Dave Halter
c4906e0e3f
Rework the parser so we can use arbitrary start nodes of the syntax.
This also includes a rework of error recovery in the parser. Error recovery is now only possible for file_input parsing, i.e. for full files.
It also refactors the tokenizer. We no longer have to add an additional newline, because the tokenizer now works correctly (this removes some confusion).
2015-12-20 22:25:41 +01:00
Dave Halter
66557903ae
\\\r\n is now handled just like \\\n.
2015-04-28 18:53:14 +02:00
Dave Halter
b6ebb2f8bf
Fixed issues with the last positions in the tokenizer, which were messed up a little a few commits ago.
2015-04-27 21:42:40 +02:00
Dave Halter
0a96083fde
Fix ur'' literals.
2015-04-27 19:21:41 +02:00
Dave Halter
902482568e
The tokenize endmarker should really be at the maximum possible position. This caused matplotlib to fail. Fixes davidhalter/jedi-vim#377.
2015-04-27 19:01:45 +02:00
farhad
32081bd156
Merge branch 'dev' into unicode_tokenize_fix2
Conflicts:
AUTHORS.txt
2015-03-06 12:14:38 +04:00
farhad
3747b009bf
fix tokenization of code containing unicode strings
2015-03-06 09:11:35 +04:00
Dave Halter
a91e240c8b
ALWAYS_BREAK_TOKEN -> ALWAYS_BREAK_TOKENS
2015-02-23 14:10:29 +01:00
Dave Halter
3ec96b25cc
Issue with backslashes again in the fast parser.
2015-02-21 18:07:21 +01:00
Dave Halter
39bf9f426b
Handle backslash escaping.
2015-02-18 17:32:34 +01:00
Dave Halter
c689573b0b
Removed line_offset from tokenize; we now have better ways to modify positions.
2015-02-05 14:00:58 +01:00
Dave Halter
e913872192
Merged the tokenize is_identifier changes.
2015-02-01 20:32:01 +01:00
Dave Halter
a3cdec819e
Fix the prefix in tokenize, which was the wrong way around.
2015-01-29 17:10:00 +01:00
Dave Halter
a221eee02c
Fix more issues in the fast parser.
2015-01-29 15:38:38 +01:00
Dave Halter
dde0e9c7c6
Fix for loop issues in the fast parser.
2015-01-29 01:36:16 +01:00
Dave Halter
4d6afd3c99
Fix fast parser tests.
2015-01-24 00:06:16 +01:00
Savor d'Isavano
c3c07c4ec2
Fixed issue #526.
2015-01-16 18:45:34 +08:00
Dave Halter
b2e54ca1eb
The tokenizer now includes all newlines and comments in its prefix.
2014-12-17 20:11:42 +01:00
Dave Halter
e53e211325
Python 2 compatibility in fake module.
2014-12-16 02:07:20 +01:00
Dave Halter
d9d3740c92
Trying to replace the old pgen2 token module with a token module more tightly coupled to the standard library.
2014-12-16 01:52:15 +01:00
Dave Halter
eaace104dd
Replace the tokenizer's output with a tuple (switching back from a Token class).
2014-12-16 00:10:07 +01:00
Dave Halter
2c684906e3
Working with dedents in error recovery.
2014-11-28 21:33:40 +01:00
Dave Halter
31600b9552
Classes and functions are new statements and should never be removed by the error recovery.
2014-11-28 02:44:34 +01:00
Dave Halter
128dbd34b6
Check parentheses level in tokenizer.
2014-11-28 02:14:38 +01:00
Dave Halter
e1d6511f2f
Trying to move the indent/dedent logic back into the tokenizer.
2014-11-28 02:04:04 +01:00
Dave Halter
97516eb26b
The new tokenizer is more or less working now. Indents are calculated as they should be.
2014-11-27 16:03:58 +01:00
Dave Halter
c7862925f5
Small tokenizer changes & tokens now have a prefix attribute instead of preceeding_whitespace.
2014-11-27 01:10:45 +01:00
Dave Halter
f43c371467
Merge @joel-wright's whitespace tokenizer branch. Thanks!
2014-11-26 15:56:11 +01:00
Dave Halter
54dce0e3b2
Fix strange issues from Python's std lib tokenizer; they might be in there as well (not sure, because I modified a lot). Fixes #449.
2014-08-04 16:47:36 +02:00
Joel Wright
07d0a43f7e
Add preceding whitespace collection to tokenizer
2014-07-30 11:59:20 +01:00
Philippe Ombredanne
6f69d7d17f
Fixed comment typo
2014-05-25 15:38:57 +02:00
Dave Halter
5740c45791
More tokenize simplifications.
2014-04-28 19:31:41 +02:00
Dave Halter
18dc92f85f
removed a few old/unnecessary tokenize definitions
2014-04-28 18:33:40 +02:00
Dave Halter
a49c624154
Tokenize corrections: add unicode literals, because they were absent from Python 3.2 (they were reintroduced in 3.3).
2014-04-22 15:17:48 +02:00
Dave Halter
bb6874bc7c
Fix problems with incomplete one-line string literals: after the start of an incomplete string literal, the whole line should be treated as an error token.
2014-04-19 13:56:29 +02:00
Dave Halter
2e12eb7861
Start integrating an Operator class to make way for operator precedence.
2014-02-26 14:44:51 +01:00
Dave Halter
e152939791
Remove encoding handling from the tokenizer; the source input is always unicode.
2014-02-26 12:55:32 +01:00
Dave Halter
40be00826e
clean up tokenize
2014-02-25 17:17:33 +01:00
Dave Halter
761c28ef00
remove __getitem__ from Token
2014-02-25 17:03:56 +01:00
Dave Halter
18e985a961
TokenInfo -> Token
2014-02-25 16:44:48 +01:00
Dave Halter
2db26abf72
start and end don't exist anymore in parser.token.Token; they are now start_pos/end_pos, as everywhere else.
2014-02-25 16:34:27 +01:00
Dave Halter
aea2c4620f
more unicode switches in the parser
2014-02-25 14:27:50 +01:00
Dave Halter
f4f79317fe
start uniting tokenize.TokenInfo and token.Token
2014-02-25 13:54:18 +01:00
Dave Halter
5b84f0b27f
Remove end_pos handling from the tokenizer; the tokens can compute it themselves.
2014-02-25 13:29:27 +01:00
Dave Halter
3a23c80ae5
Prepare for a possible tokenizer end_pos replacement.
2014-02-25 11:59:10 +01:00
Dave Halter
246118f851
Start using @ganwell's new token class (modified in some ways) as the main token class; hoping to gain a little memory/CPU/pickling performance.
2014-02-25 02:06:26 +01:00
Dave Halter
9943bb6205
remove some old parameters from Parser and FastTokenizer
2014-02-24 11:24:54 +01:00
Dave Halter
7db090a48a
moved NoErrorTokenizer to fast.FastTokenizer
2014-02-24 11:05:31 +01:00
Dave Halter
553ff66c8b
remove last_previous from NoErrorTokenizer
2014-02-23 12:51:05 +01:00
Dave Halter
c5fcebde82
Changed _compatibility.utf8 -> 'u' and fixed many of the issues with the parser's now-enforced unicode source input.
2014-02-23 11:29:00 +01:00