67 Commits

Author SHA1 Message Date
Dave Halter
c4906e0e3f Rework the parser so we can use arbitrary start nodes of the syntax.
This also includes a rework for error recovery in the parser. This is now just possible for file_input parsing, which means for full files.
Includes also a refactoring of the tokenizer. No more do we have to add an additional newline, because it now works correctly (removes certain confusion).
2015-12-20 22:25:41 +01:00
Dave Halter
66557903ae \\\r\n is as possible as \\\n. 2015-04-28 18:53:14 +02:00
Dave Halter
b6ebb2f8bf Fixed issues with last positions in the tokenizer, which was messed up a little bit a few commits ago. 2015-04-27 21:42:40 +02:00
Dave Halter
0a96083fde Fix ur'' literals. 2015-04-27 19:21:41 +02:00
Dave Halter
902482568e The tokenize endmarker should really be the maximum position possible. Caused matplotlib to fail. Fixes davidhalter/jedi-vim#377. 2015-04-27 19:01:45 +02:00
farhad
32081bd156 Merge branch 'dev' into unicode_tokenize_fix2
Conflicts:
	AUTHORS.txt
2015-03-06 12:14:38 +04:00
farhad
3747b009bf fix tokenization of code containing unicode strings 2015-03-06 09:11:35 +04:00
Dave Halter
a91e240c8b ALWAYS_BREAK_TOKEN -> ALWAYS_BREAK_TOKENS 2015-02-23 14:10:29 +01:00
Dave Halter
3ec96b25cc Issue with backslashes again in the fast parser. 2015-02-21 18:07:21 +01:00
Dave Halter
39bf9f426b Handle backslash escaping. 2015-02-18 17:32:34 +01:00
Dave Halter
c689573b0b Removed the line_offset from tokenize, we have better ways to modify positions, now. 2015-02-05 14:00:58 +01:00
Dave Halter
e913872192 Merged the tokenize is_identifier changes. 2015-02-01 20:32:01 +01:00
Dave Halter
a3cdec819e Fix the prefix in tokenize, which was the wrong way around. 2015-01-29 17:10:00 +01:00
Dave Halter
a221eee02c Fix more issues in the fast parser. 2015-01-29 15:38:38 +01:00
Dave Halter
dde0e9c7c6 Fix for loop issues in the fast parser. 2015-01-29 01:36:16 +01:00
Dave Halter
4d6afd3c99 Fix fast parser tests. 2015-01-24 00:06:16 +01:00
Savor d'Isavano
c3c07c4ec2 Fixed issue #526. 2015-01-16 18:45:34 +08:00
Dave Halter
b2e54ca1eb The tokenizer now includes all newlines and comments in its prefix. 2014-12-17 20:11:42 +01:00
Dave Halter
e53e211325 Python 2 compatibility in fake module. 2014-12-16 02:07:20 +01:00
Dave Halter
d9d3740c92 Trying to replace the old pgen2 token module with a token module more tightly coupled to the standard library. 2014-12-16 01:52:15 +01:00
Dave Halter
eaace104dd Replace the tokenizer's output with a tuple (switching back from a Token class). 2014-12-16 00:10:07 +01:00
Dave Halter
2c684906e3 Working with dedents in error recovery. 2014-11-28 21:33:40 +01:00
Dave Halter
31600b9552 classes and functions are new statements and should never get removed by the error recovery. 2014-11-28 02:44:34 +01:00
Dave Halter
128dbd34b6 Check parentheses level in tokenizer. 2014-11-28 02:14:38 +01:00
Dave Halter
e1d6511f2f Trying to move the indent/dedent logic back into the tokenizer. 2014-11-28 02:04:04 +01:00
Dave Halter
97516eb26b The new tokenizer is more or less working now. Indents are calculated as they should be. 2014-11-27 16:03:58 +01:00
Dave Halter
c7862925f5 Small tokenizer changes & tokens now have a prefix attribute instead of preceeding_whitespace. 2014-11-27 01:10:45 +01:00
Dave Halter
f43c371467 Merge @joel-wright's whitespace tokenizer branch. Thanks! 2014-11-26 15:56:11 +01:00
Dave Halter
54dce0e3b2 fix strange issues of Python's std lib tokenizer, might be in there as well (not sure, cause I modified a lot). fixes #449 2014-08-04 16:47:36 +02:00
Joel Wright
07d0a43f7e Add preceding whitespace collection to tokenizer 2014-07-30 11:59:20 +01:00
Philippe Ombredanne
6f69d7d17f Fixed comment typo 2014-05-25 15:38:57 +02:00
Dave Halter
5740c45791 again tokenize simplifications 2014-04-28 19:31:41 +02:00
Dave Halter
18dc92f85f removed a few old/unnecessary tokenize definitions 2014-04-28 18:33:40 +02:00
Dave Halter
a49c624154 tokenize corrections, add unicode literals, because they had been removed from Python 3.2 (reintroduced in 3.3) 2014-04-22 15:17:48 +02:00
Dave Halter
bb6874bc7c fix for problems with incomplete one liner string literals, after a start of an incomplete string literal the whole line should be seen as an error token 2014-04-19 13:56:29 +02:00
Dave Halter
2e12eb7861 start with the integration of an Operator class to make way for precedences 2014-02-26 14:44:51 +01:00
Dave Halter
e152939791 remove encoding stuff from tokenizer - encoding is always unicode 2014-02-26 12:55:32 +01:00
Dave Halter
40be00826e clean up tokenize 2014-02-25 17:17:33 +01:00
Dave Halter
761c28ef00 remove __getitem__ from Token 2014-02-25 17:03:56 +01:00
Dave Halter
18e985a961 TokenInfo -> Token 2014-02-25 16:44:48 +01:00
Dave Halter
2db26abf72 start and end don't exist anymore in parser.token.Token, it's now start_pos/end_pos as everywhere else 2014-02-25 16:34:27 +01:00
Dave Halter
aea2c4620f more unicode switches in the parser 2014-02-25 14:27:50 +01:00
Dave Halter
f4f79317fe start uniting tokenize.TokenInfo and token.Token 2014-02-25 13:54:18 +01:00
Dave Halter
5b84f0b27f remove end_pos stuff from tokenizer, the tokens can do that themselves 2014-02-25 13:29:27 +01:00
Dave Halter
3a23c80ae5 prepare for eventual? tokenizer end_pos replacement. 2014-02-25 11:59:10 +01:00
Dave Halter
246118f851 start using @ganwell's new token class (modified in some ways) as the main token class - hope to gain a little bit of memory/cpu/pickling performance 2014-02-25 02:06:26 +01:00
Dave Halter
9943bb6205 remove some old parameters from Parser and FastTokenizer 2014-02-24 11:24:54 +01:00
Dave Halter
7db090a48a moved NoErrorTokenizer to fast.FastTokenizer 2014-02-24 11:05:31 +01:00
Dave Halter
553ff66c8b remove last_previous from NoErrorTokenizer 2014-02-23 12:51:05 +01:00
Dave Halter
c5fcebde82 changed _compatibility.utf8 -> 'u' and removed a lot of the issues with the now enforced unicode source input of the parser 2014-02-23 11:29:00 +01:00