99 Commits

Author SHA1 Message Date
Dave Halter
b9725364ab Add a lot of comments to the diff parser 2020-04-13 11:46:36 +02:00
Dave Halter
66ecc264f9 Write 0.7.0 release notes 2020-04-13 11:15:05 +02:00
Dave Halter
63b73a05e6 Diff parser: Take care of one line function error recovery with decorator 2020-04-13 11:07:37 +02:00
Dave Halter
baec4ac58f Diff parser: Take care of one line function error recovery 2020-04-12 02:47:46 +02:00
Dave Halter
b5f58ac33c Ignore some slow files for the fuzzer 2020-04-12 01:14:24 +02:00
Dave Halter
83cb71f7a1 The fuzzer now sometimes tries to reuse previous modifications as well 2020-04-11 23:29:00 +02:00
Dave Halter
30a2b2f40d Fix an error case with prefixes 2020-04-11 22:51:17 +02:00
Dave Halter
d81e393c0c Fix indentation issues with backslashes and def error recovery 2020-04-10 21:48:28 +02:00
Dave Halter
7822f8be84 Python 2 compatibility 2020-04-09 22:47:50 +02:00
Dave Halter
93788a3e09 Add a test for the diff parser that xfails 2020-04-09 00:03:39 +02:00
Dave Halter
085f666ca1 Add more tokens that can break parens to tokenizer 2020-04-08 23:24:30 +02:00
Dave Halter
9e546e42de Diff parser: Fix another byte order mark issue 2020-04-07 22:58:47 +02:00
Dave Halter
7b14a86e0a Fix tokenizer error tokens 2020-04-07 09:55:28 +02:00
Dave Halter
f45941226f Diff parser: Fix other BOM issues 2020-04-07 01:06:03 +02:00
Dave Halter
e04552b14a Fix tests for Python 2 2020-04-06 23:52:29 +02:00
Dave Halter
cd9c213a62 Fix fstring issues when error leaves are involved 2020-04-06 23:34:27 +02:00
Dave Halter
561e81df00 Replace non utf8 errors properly in diff fuzzer 2020-04-06 02:04:48 +02:00
Dave Halter
556ce86cde Tokenizer: It should not be possible to break out of backslashes on the next line, even if it was an error 2020-04-06 01:25:06 +02:00
Dave Halter
b12dd498bb Diff parser: Fix BOM with indentation issues 2020-04-05 20:47:49 +02:00
Dave Halter
db10b4fa72 Diff parser: Need to care for error dedents in some open parentheses/always break contexts 2020-04-05 14:39:56 +02:00
Dave Halter
ed38518052 Diff parser: Make sure that nested suites get properly copied 2020-04-05 02:48:41 +02:00
Dave Halter
ebc69545c7 Fix error recovery for multi line strings at the end of the file 2020-04-05 00:13:55 +02:00
Dave Halter
67ebb6acac async is actually a token that cannot appear in brackets 2020-04-04 23:14:10 +02:00
Dave Halter
bcf76949b6 Diff parser: Remove error statements before caring about nested functions 2020-04-04 22:43:33 +02:00
Dave Halter
6c7b397cc7 Diff parser: Check indentation for copies correctly 2020-04-04 20:36:19 +02:00
Dave Halter
1927ba7254 Start using the parser count/copy count again 2020-04-04 17:49:35 +02:00
Dave Halter
a6c33411d4 Remove all the error dedent/indent additions in the diff parser
The parser should just reparse stuff that is strangely indented
2020-04-04 16:15:17 +02:00
Dave Halter
f8dce76ef7 Make sure to only copy nodes that have the same indentation in diff parser 2020-04-04 16:07:54 +02:00
Dave Halter
3242e36859 Python 2 compatibility 2020-04-04 15:45:03 +02:00
Dave Halter
734a4b0e67 Remove support for specialized treatment of form feeds
This is a very intentional change. Previously form feeds were handled very
poorly and sometimes were not counted as indentation. This obviously makes
sense. But at the same time indentation is very tricky to deal with (both for
editors and parso).

Especially in the diff parser this led to a lot of very weird issues. The
decision probably makes sense since:

1. Almost nobody uses form feeds in the first place.
2. People who use form feeds, like Barry Warsaw, often put a newline after them
   (e.g. Python's email.__init__).
3. If you write an editor you want to be able to identify a unicode character
   with a clear line/column. This would not be the case if form feeds were just
   ignored when counting.

Form feeds will still work in Jedi, will not cause parse errors and in general
you should be fine using them. It might just cause Jedi to count them as
indentation **if** you use it like '\f  foo()'. This is however confusing for
most editors anyway. It leads to a weird display e.g. in VIM, even if it's
perfectly valid code in Python.

Since parso is a code analysis parser and not the language's parser, I think it's
fine to ignore this edge case.
2020-04-04 15:38:10 +02:00
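A hedged sketch of the behavior described above: the form feed ends up in the
leaf's prefix and is counted toward the column like any other character (the
exact numbers are an assumption based on this commit message, not a documented
guarantee)::

    import parso

    module = parso.parse('\f  foo()\n')
    leaf = module.get_first_leaf()  # the Name leaf for `foo`
    print(repr(leaf.prefix), leaf.start_pos)  # e.g. '\x0c  ' (1, 3)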
Dave Halter
1047204654 Small tokenizer refactoring 2020-04-04 13:13:00 +02:00
Dave Halter
ae6af7849e Diff parser: All indent checks should use _get_indent 2020-04-04 13:08:47 +02:00
Dave Halter
e1632cdadc Fix some issues with async funcs 2020-04-04 04:01:15 +02:00
Dave Halter
7f0dd35c37 Remove the piece of shit _get_insertion_node function 2020-04-04 03:51:28 +02:00
Dave Halter
ad88783ac9 Remove get_first_indentation 2020-04-03 16:47:00 +02:00
Dave Halter
8550a52e48 Remove indents from _NodesTreeNode 2020-04-03 16:26:01 +02:00
Dave Halter
c88a736e35 Fix indent issues 2020-04-03 16:24:26 +02:00
Dave Halter
a07146f8a5 Deal with indents in diff parser more explicitly 2020-04-03 12:41:28 +02:00
Dave Halter
0c0aa31a91 Don't use max as a variable 2020-04-03 03:35:21 +02:00
Dave Halter
77327a4cea Make node insertion a bit easier 2020-04-03 03:28:14 +02:00
Dave Halter
8bbd304eb9 Define token types a bit different in diff parser 2020-04-03 01:05:11 +02:00
Dave Halter
62fd03edda Pass tokens in diff tokenizer 2020-04-03 01:01:37 +02:00
Dave Halter
12063d42fc When debugging print 2020-04-03 00:56:59 +02:00
Dave Halter
c86af743df Initialize start pos properly in diff parser 2020-04-03 00:54:13 +02:00
Dave Halter
fb2ea551d5 Move the tokenizer/diff parser closer together 2020-04-03 00:18:35 +02:00
Dave Halter
ce170e8aae WIP: Try to use the tokenizer in a more native way 2020-04-02 02:00:35 +02:00
Dave Halter
d674bc9895 Fix a backslash issue 2020-03-29 23:59:53 +02:00
Dave Halter
0d9886c22a Diff parser: Rewrite tokenizer modifications a bit 2020-03-29 22:41:59 +02:00
Dave Halter
9f8a68677d Tokenizer: It's now clearer when an error dedent appears 2020-03-29 13:50:36 +02:00
Dave Halter
a950b82066 Fix tokenizer for random invalid unicode points 2020-03-28 21:02:04 +01:00
Dave Halter
38b7763e9a Use _assert_nodes_are_equal in the fuzzer 2020-03-28 14:51:27 +01:00
Dave Halter
cf880f43d4 Tokenizer: Add error dedents only if parens are not open 2020-03-28 14:41:10 +01:00
Dave Halter
8e49d8ab5f Fix tokenizer fstring end positions 2020-03-28 11:22:32 +01:00
Dave Halter
77b3ad5843 Small flake8 refactoring 2020-03-28 10:41:00 +01:00
Dave Halter
29e3545241 Fix adding error indents/dedents only at the right places 2020-03-27 17:05:05 +01:00
Dave Halter
3d95b65b21 Fix an issue with unfinished f string literals 2020-03-27 11:17:31 +01:00
Dave Halter
b86ea25435 Add a bit to the CHANGELOG 2020-03-24 22:38:18 +01:00
Dave Halter
4c42a82ebc Allow multiple newlines in a suite, this makes the diff parser easier 2020-03-24 22:35:21 +01:00
Dave Halter
43651ef219 Diff parser: Make sure dedent start pos are matching 2020-03-24 22:27:04 +01:00
Dave Halter
419d9e3174 Diff parser: Fix a few more indentation issues 2020-03-24 22:03:29 +01:00
Dave Halter
2bef3cf6ff Fix an issue where indents were repeated unnecessarily 2020-03-24 00:24:53 +01:00
Dave Halter
8e95820d78 Don't show logs in pytest, because they already appear by default 2020-03-23 23:53:23 +01:00
Dave Halter
c18c89eb6b Diff parser: Correctly add indent issues 2020-03-23 00:16:47 +01:00
Dave Halter
afc556d809 Diff parser: Prepare for indent error leaf insertion 2020-03-22 22:57:58 +01:00
Dave Halter
cdb791fbdb Diff parser: Add error dedents if necessary, see also davidhalter/jedi#1499 2020-03-22 21:37:25 +01:00
Dave Halter
93f1cdebbc Try to make parsed trees more similar for incomplete dedents, see also davidhalter/jedi#1499 2020-03-22 21:15:22 +01:00
Dave Halter
d3ceafee01 Specify in tests how another dedent issue is recovered from 2020-03-22 19:34:12 +01:00
Dave Halter
237dc9e135 Diff parser: Make sure to pop nodes directly after error nodes, see also davidhalter/jedi#1499 2020-03-22 14:49:22 +01:00
Dave Halter
bd37353042 Move a bit of code 2020-03-22 13:46:13 +01:00
Dave Halter
51a044cc70 Fix diff parser: Invalid dedents meant that sometimes the wrong parents were chosen, fixes davidhalter/jedi#1499 2020-03-22 12:41:19 +01:00
Dave Halter
2cd0d6c9fc Fix: Dedent omission was wrong, see davidhalter/jedi#1499 2020-03-22 12:41:19 +01:00
Daniel Hahler
287a86c242 ci: Travis: use Python 3.8.2
Ref: https://github.com/davidhalter/parso/issues/103
2020-02-28 00:51:06 +01:00
Dave Halter
0234a70e95 Python 3.8.2 was released and an error message changed, fixes #103 2020-02-28 00:31:58 +01:00
Dave Halter
7ba49a9695 Prepare the 0.6.2 release 2020-02-27 02:10:06 +01:00
Dave Halter
53da7e8e6b Fix get_next_sibling on module, fixes #102 2020-02-21 18:31:13 +01:00
Dave Halter
6dd29c8efb Fix ExprStmt.get_rhs for annotations 2020-02-21 18:31:13 +01:00
Dave Halter
e4a9cfed86 Give parso refactoring tools 2020-02-21 18:31:13 +01:00
Joe Antonakakis
a7f4499644 Add venv to .gitignore (#101) 2020-02-14 14:28:07 +01:00
Dave Halter
4306e8b34b Change the release date for 0.6.1 2020-02-03 21:46:25 +01:00
Dave Halter
2ce3898690 Prepare the next release 0.6.1 2020-02-03 18:40:05 +01:00
Dave Halter
16f257356e Make end_pos public for syntax issues 2020-02-03 18:36:47 +01:00
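``Issue.end_pos`` complements the existing ``start_pos``; a minimal sketch of
reading both off a syntax issue ('1 +' is just an arbitrary broken snippet)::

    import parso

    grammar = parso.load_grammar()
    module = grammar.parse('1 +\n')
    for issue in grammar.iter_errors(module):
        print(issue.code, issue.start_pos, issue.end_pos)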
Dave Halter
c864ca60d1 Bump version to 0.6.0 2020-01-26 20:01:38 +01:00
Dave Halter
a47b5433d4 Make sure iter_funcdefs includes async functions with decorators, fixes #98 2020-01-26 20:00:56 +01:00
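A hedged sketch of the fixed behavior: a decorated async function should now
be yielded by ``iter_funcdefs``::

    import parso

    module = parso.parse('@decorator\nasync def f():\n    pass\n', version='3.8')
    # Expectation per the commit message: this prints ['f'].
    print([funcdef.name.value for funcdef in module.iter_funcdefs()])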
Dave Halter
6982cf8321 Add a bit to the changelog 2020-01-26 19:47:46 +01:00
Dave Halter
844ca3d35a del_stmt is now considered a name definition 2020-01-26 19:42:12 +01:00
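A minimal sketch of the new behavior, assuming the usual tree API::

    import parso

    module = parso.parse('del x\n')
    name = module.get_used_names()['x'][0]
    print(name.is_definition())  # True as of 0.6.0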
Dave Halter
9abe5d1e55 Forgot to increase the pickle version 2020-01-20 01:28:06 +01:00
Jarry Shaw
84874aace3 Revision on fstring issues (#100)
* f-string expression part cannot include a backslash
 * failing example `f"{'\n'}"` for tests
2020-01-09 21:49:34 +01:00
Jarry Shaw
55531ab65b Revision on assignment errors (#97)
* Revision on assignment expression errors

 * added rule for __debug__ (should be a keyword)
 * reviewed error messages
 * added new failing samples

* Adjustment upon Dave's review

 * rewind several changes in assignment errors
 * patched is_definition for assignment expressions
 * patched Python 2 inconsistent error messages in test_python_errors.py
2020-01-08 23:07:37 +01:00
Dave Halter
31c059fc30 Add a Changelog note about dropping 2.6/3.3 2020-01-06 00:05:11 +01:00
Dave Halter
cfef1d74e7 Fix a Python 2.7 issue 2020-01-06 00:02:26 +01:00
Dave Halter
9ee7409d8a Get rid of Python 3.3 artifacts 2020-01-05 23:59:38 +01:00
Dave Halter
4090c80401 Remove Python 2.6 grammar 2020-01-05 23:55:03 +01:00
Dave Halter
95f353a15f Merge branch 'rm-2.6' of https://github.com/hugovk/parso 2020-01-05 23:50:20 +01:00
Dave Halter
2b0b093276 Make sure to limit the amount of cached files parso stores, fixes davidhalter/jedi#1340 2020-01-05 23:44:51 +01:00
Tim Gates
29b57d93bd Fix simple typo: utitilies -> utilities
Closes #94
2019-12-17 10:00:28 +01:00
Hugo
d3383b6c41 Fix string/tuple concatenation 2019-08-08 16:49:42 +03:00
Hugo
9da4df20d1 Add python_requires to help pip 2019-08-08 14:57:13 +03:00
Hugo
0341f69691 Drop support for EOL Python 3.3 2019-08-08 14:57:13 +03:00
Hugo
f6bdba65c0 Drop support for EOL Python 2.6 2019-08-08 14:56:27 +03:00
42 changed files with 1320 additions and 545 deletions

.gitignore vendored
View File

@@ -11,3 +11,4 @@ parso.egg-info/
/.cache/
/.pytest_cache
test/fuzz-redo.pickle
/venv/

View File

@@ -6,7 +6,7 @@ python:
- 3.5
- 3.6
- 3.7
- 3.8
- 3.8.2
- pypy2.7-6.0
- pypy3.5-6.0
matrix:

View File

@@ -49,6 +49,7 @@ Mathias Rav (@Mortal) <rav@cs.au.dk>
Daniel Fiterman (@dfit99) <fitermandaniel2@gmail.com>
Simon Ruggier (@sruggier)
Élie Gouzien (@ElieGouzien)
Tim Gates (@timgates42) <tim.gates@iress.com>
Note: (@user) means a github user name.

View File

@@ -3,6 +3,35 @@
Changelog
---------
0.7.0 (2020-04-13)
++++++++++++++++++
- Fix a lot of annoying bugs in the diff parser. The fuzzer did not find
issues anymore even after running it for more than 24 hours (500k tests).
- Small grammar change: suites can now contain newlines even after a newline.
This should really not matter if you don't use error recovery. It allows for
nicer error recovery.
0.6.2 (2020-02-27)
++++++++++++++++++
- Bugfixes
- Add Grammar.refactor (might still be subject to change until 0.7.0)
0.6.1 (2020-02-03)
++++++++++++++++++
- Add ``parso.normalizer.Issue.end_pos`` to make it possible to know where an
issue ends
0.6.0 (2020-01-26)
++++++++++++++++++
- Dropped Python 2.6/Python 3.3 support
- del_stmt names are now considered as a definition
(for ``name.is_definition()``)
- Bugfixes
0.5.2 (2019-12-15)
++++++++++++++++++
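The 0.7.0 notes lean on the fuzzer's core invariant; a hedged sketch of one
part of it (error recovery is on by default, and however broken the input, the
tree must round-trip the source)::

    import parso

    code = 'def broken(:\n    pass\n'
    module = parso.parse(code)        # produces error nodes instead of raising
    assert module.get_code() == code  # the round-trip the fuzzer relies on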

View File

@@ -13,8 +13,8 @@ from parso.utils import parse_version_string
collect_ignore = ["setup.py"]
VERSIONS_2 = '2.6', '2.7'
VERSIONS_3 = '3.3', '3.4', '3.5', '3.6', '3.7', '3.8'
VERSIONS_2 = '2.7',
VERSIONS_3 = '3.4', '3.5', '3.6', '3.7', '3.8'
@pytest.fixture(scope='session')
@@ -87,12 +87,12 @@ def pytest_configure(config):
root = logging.getLogger()
root.setLevel(logging.DEBUG)
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.DEBUG)
#ch = logging.StreamHandler(sys.stdout)
#ch.setLevel(logging.DEBUG)
#formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
#ch.setFormatter(formatter)
root.addHandler(ch)
#root.addHandler(ch)
class Checker():

View File

@@ -43,7 +43,7 @@ from parso.grammar import Grammar, load_grammar
from parso.utils import split_lines, python_bytes_to_unicode
__version__ = '0.5.2'
__version__ = '0.7.0'
def parse(code=None, **kwargs):

View File

@@ -1,14 +1,10 @@
"""
To ensure compatibility from Python ``2.6`` - ``3.3``, a module has been
To ensure compatibility from Python ``2.7`` - ``3.3``, a module has been
created. Clearly there is huge need to use conforming syntax.
"""
import sys
import platform
# Cannot use sys.version.major and minor names, because in Python 2.6 it's not
# a namedtuple.
py_version = int(str(sys.version_info[0]) + str(sys.version_info[1]))
# unicode function
try:
unicode = unicode
@@ -39,7 +35,7 @@ def u(string):
have to cast back to a unicode (and we know that we always deal with valid
unicode, because we check that in the beginning).
"""
if py_version >= 30:
if sys.version_info.major >= 3:
return str(string)
if not isinstance(string, unicode):
@@ -48,8 +44,10 @@ def u(string):
try:
# Python 2.7
FileNotFoundError = FileNotFoundError
except NameError:
# Python 3.3+
FileNotFoundError = IOError
@@ -65,39 +63,7 @@ def utf8_repr(func):
else:
return result
if py_version >= 30:
if sys.version_info.major >= 3:
return func
else:
return wrapper
try:
from functools import total_ordering
except ImportError:
# Python 2.6
def total_ordering(cls):
"""Class decorator that fills in missing ordering methods"""
convert = {
'__lt__': [('__gt__', lambda self, other: not (self < other or self == other)),
('__le__', lambda self, other: self < other or self == other),
('__ge__', lambda self, other: not self < other)],
'__le__': [('__ge__', lambda self, other: not self <= other or self == other),
('__lt__', lambda self, other: self <= other and not self == other),
('__gt__', lambda self, other: not self <= other)],
'__gt__': [('__lt__', lambda self, other: not (self > other or self == other)),
('__ge__', lambda self, other: self > other or self == other),
('__le__', lambda self, other: not self > other)],
'__ge__': [('__le__', lambda self, other: (not self >= other) or self == other),
('__gt__', lambda self, other: self >= other and not self == other),
('__lt__', lambda self, other: not self >= other)]
}
roots = set(dir(cls)) & set(convert)
if not roots:
raise ValueError('must define at least one ordering operation: < > <= >=')
root = max(roots) # prefer __lt__ to __le__ to __gt__ to __ge__
for opname, opfunc in convert[root]:
if opname not in roots:
opfunc.__name__ = opname
opfunc.__doc__ = getattr(int, opname).__doc__
setattr(cls, opname, opfunc)
return cls

View File

@@ -17,8 +17,23 @@ from parso._compatibility import FileNotFoundError
LOG = logging.getLogger(__name__)
_CACHED_FILE_MINIMUM_SURVIVAL = 60 * 10 # 10 minutes
"""
Cached files should survive at least a few minutes.
"""
_CACHED_SIZE_TRIGGER = 600
"""
This setting limits the amount of cached files. It's basically a way to start
garbage collection.
The reasoning for this limit being as big as it is, is the following:
Numpy, Pandas, Matplotlib and Tensorflow together use about 500 files. This
makes Jedi use ~500mb of memory. Since we might want a bit more than those few
libraries, we just increase it a bit.
"""
_PICKLE_VERSION = 32
_PICKLE_VERSION = 33
"""
Version number (integer) for file system cache.
@@ -40,7 +55,7 @@ _VERSION_TAG = '%s-%s%s-%s' % (
"""
Short name for distinguish Python implementations and versions.
It's like `sys.implementation.cache_tag` but for Python < 3.3
It's like `sys.implementation.cache_tag` but for Python2
we generate something similar. See:
http://docs.python.org/3/library/sys.html#sys.implementation
"""
@@ -76,6 +91,7 @@ class _NodeCacheItem(object):
if change_time is None:
change_time = time.time()
self.change_time = change_time
self.last_used = change_time
def load_module(hashed_grammar, file_io, cache_path=None):
@@ -89,6 +105,7 @@ def load_module(hashed_grammar, file_io, cache_path=None):
try:
module_cache_item = parser_cache[hashed_grammar][file_io.path]
if p_time <= module_cache_item.change_time:
module_cache_item.last_used = time.time()
return module_cache_item.node
except KeyError:
return _load_from_file_system(
@@ -122,11 +139,27 @@ def _load_from_file_system(hashed_grammar, path, p_time, cache_path=None):
except FileNotFoundError:
return None
else:
parser_cache.setdefault(hashed_grammar, {})[path] = module_cache_item
_set_cache_item(hashed_grammar, path, module_cache_item)
LOG.debug('pickle loaded: %s', path)
return module_cache_item.node
def _set_cache_item(hashed_grammar, path, module_cache_item):
if sum(len(v) for v in parser_cache.values()) >= _CACHED_SIZE_TRIGGER:
# Garbage collection of old cache files.
# We are basically throwing everything away that hasn't been accessed
# in 10 minutes.
cutoff_time = time.time() - _CACHED_FILE_MINIMUM_SURVIVAL
for key, path_to_item_map in parser_cache.items():
parser_cache[key] = {
path: node_item
for path, node_item in path_to_item_map.items()
if node_item.last_used > cutoff_time
}
parser_cache.setdefault(hashed_grammar, {})[path] = module_cache_item
def save_module(hashed_grammar, file_io, module, lines, pickling=True, cache_path=None):
path = file_io.path
try:
@@ -136,7 +169,7 @@ def save_module(hashed_grammar, file_io, module, lines, pickling=True, cache_pat
pickling = False
item = _NodeCacheItem(module, lines, p_time)
parser_cache.setdefault(hashed_grammar, {})[path] = item
_set_cache_item(hashed_grammar, path, item)
if pickling and path is not None:
_save_to_file_system(hashed_grammar, path, item, cache_path=cache_path)
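The eviction in ``_set_cache_item`` above boils down to a last-used sweep once
the cache crosses the trigger; a standalone sketch of the same idea (names are
illustrative, not parso's)::

    import time

    CACHE_SIZE_TRIGGER = 600    # mirrors _CACHED_SIZE_TRIGGER
    MINIMUM_SURVIVAL = 60 * 10  # items used in the last 10 minutes survive
    parser_cache = {}           # grammar hash -> {path: item with .last_used}

    def set_cache_item(grammar, path, item):
        # Once the total number of cached modules crosses the trigger, drop
        # everything that has not been used recently, then store the new item.
        if sum(len(v) for v in parser_cache.values()) >= CACHE_SIZE_TRIGGER:
            cutoff = time.time() - MINIMUM_SURVIVAL
            for key, path_to_item in parser_cache.items():
                parser_cache[key] = {
                    p: it for p, it in path_to_item.items()
                    if it.last_used > cutoff
                }
        parser_cache.setdefault(grammar, {})[path] = item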

View File

@@ -13,6 +13,7 @@ from parso.python.parser import Parser as PythonParser
from parso.python.errors import ErrorFinderConfig
from parso.python import pep8
from parso.file_io import FileIO, KnownContentFileIO
from parso.normalizer import RefactoringNormalizer
_loaded_grammars = {}
@@ -137,7 +138,7 @@ class Grammar(object):
cache_path=cache_path)
return new_node
tokens = self._tokenizer(lines, start_pos)
tokens = self._tokenizer(lines, start_pos=start_pos)
p = self._parser(
self._pgen_grammar,
@@ -170,6 +171,9 @@ class Grammar(object):
return self._get_normalizer_issues(node, self._error_normalizer_config)
def refactor(self, base_node, node_to_str_map):
return RefactoringNormalizer(node_to_str_map).walk(base_node)
def _get_normalizer(self, normalizer_config):
if normalizer_config is None:
normalizer_config = self._default_normalizer_config
@@ -211,8 +215,8 @@ class PythonGrammar(Grammar):
)
self.version_info = version_info
def _tokenize_lines(self, lines, start_pos):
return tokenize_lines(lines, self.version_info, start_pos=start_pos)
def _tokenize_lines(self, lines, **kwargs):
return tokenize_lines(lines, self.version_info, **kwargs)
def _tokenize(self, code):
# Used by Jedi.
@@ -224,7 +228,7 @@ def load_grammar(**kwargs):
Loads a :py:class:`parso.Grammar`. The default version is the current Python
version.
:param str version: A python version string, e.g. ``version='3.3'``.
:param str version: A python version string, e.g. ``version='3.8'``.
:param str path: A path to a grammar file
"""
def load_grammar(language='python', version=None, path=None):

View File

@@ -12,6 +12,9 @@ class _NormalizerMeta(type):
class Normalizer(use_metaclass(_NormalizerMeta)):
_rule_type_instances = {}
_rule_value_instances = {}
def __init__(self, grammar, config):
self.grammar = grammar
self._config = config
@@ -119,7 +122,6 @@ class NormalizerConfig(object):
class Issue(object):
def __init__(self, node, code, message):
self._node = node
self.code = code
"""
An integer code that stands for the type of error.
@@ -133,6 +135,7 @@ class Issue(object):
The start position position of the error as a tuple (line, column). As
always in |parso| the first line is 1 and the first column 0.
"""
self.end_pos = node.end_pos
def __eq__(self, other):
return self.start_pos == other.start_pos and self.code == other.code
@@ -181,3 +184,20 @@ class Rule(object):
if self.is_issue(node):
issue_node = self.get_node(node)
self.add_issue(issue_node)
class RefactoringNormalizer(Normalizer):
def __init__(self, node_to_str_map):
self._node_to_str_map = node_to_str_map
def visit(self, node):
try:
return self._node_to_str_map[node]
except KeyError:
return super(RefactoringNormalizer, self).visit(node)
def visit_leaf(self, leaf):
try:
return self._node_to_str_map[leaf]
except KeyError:
return super(RefactoringNormalizer, self).visit_leaf(leaf)
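``Grammar.refactor`` (marked in the changelog as still subject to change)
drives this ``RefactoringNormalizer``: nodes found in the map are rendered as
their replacement strings while everything else is rendered unchanged. A
hedged usage sketch::

    import parso

    grammar = parso.load_grammar()
    module = grammar.parse('x = 1\n')
    name = module.get_first_leaf()  # the Name leaf for `x`
    print(grammar.refactor(module, {name: 'y'}))  # expected: y = 1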

View File

@@ -134,7 +134,7 @@ class BaseParser(object):
# However, the error recovery might have added the token again, if
# the stack is empty, we're fine.
raise InternalParseError(
"incomplete input", token.type, token.value, token.start_pos
"incomplete input", token.type, token.string, token.start_pos
)
if len(self.stack) > 1:

View File

@@ -1,9 +1,29 @@
"""
Basically a contains parser that is faster, because it tries to parse only
parts and if anything changes, it only reparses the changed parts.
The diff parser is trying to be a faster version of the normal parser by trying
to reuse the nodes of a previous pass over the same file. This is also called
incremental parsing in parser literature. The difference is mostly that with
incremental parsing you get a range that needs to be reparsed. Here we
calculate that range ourselves by using difflib. After that it's essentially
incremental parsing.
It works with a simple diff in the beginning and will try to reuse old parser
fragments.
The biggest issue of this approach is that we reuse nodes in a mutable way. The
initial design and idea is quite problematic for this parser, but it is also
pretty fast. Measurements showed that just copying nodes in Python is simply
quite a bit slower (especially for big files >3 kLOC). Therefore we did not
want to get rid of the mutable nodes, since this is usually not an issue.
This is by far the hardest software I ever wrote, exactly because the initial
design is crappy. When you have to account for a lot of mutable state, it
creates a ton of issues that you would otherwise not have. This file took
probably 3-6 months to write, which is insane for a parser.
There is a fuzzer that helps test this whole thing. Please use it if you
make changes here. If you run the fuzzer like::
test/fuzz_diff_parser.py random -n 100000
you can be pretty sure that everything is still fine. I sometimes run the
fuzzer up to 24h to make sure everything is still ok.
"""
import re
import difflib
@@ -13,7 +33,7 @@ import logging
from parso.utils import split_lines
from parso.python.parser import Parser
from parso.python.tree import EndMarker
from parso.python.tokenize import PythonToken
from parso.python.tokenize import PythonToken, BOM_UTF8_STRING
from parso.python.token import PythonTokenTypes
LOG = logging.getLogger(__name__)
@@ -21,21 +41,37 @@ DEBUG_DIFF_PARSER = False
_INDENTATION_TOKENS = 'INDENT', 'ERROR_DEDENT', 'DEDENT'
NEWLINE = PythonTokenTypes.NEWLINE
DEDENT = PythonTokenTypes.DEDENT
NAME = PythonTokenTypes.NAME
ERROR_DEDENT = PythonTokenTypes.ERROR_DEDENT
ENDMARKER = PythonTokenTypes.ENDMARKER
def _is_indentation_error_leaf(node):
return node.type == 'error_leaf' and node.token_type in _INDENTATION_TOKENS
def _get_previous_leaf_if_indentation(leaf):
while leaf and leaf.type == 'error_leaf' \
and leaf.token_type in _INDENTATION_TOKENS:
while leaf and _is_indentation_error_leaf(leaf):
leaf = leaf.get_previous_leaf()
return leaf
def _get_next_leaf_if_indentation(leaf):
while leaf and leaf.type == 'error_leaf' \
and leaf.token_type in _INDENTATION_TOKENS:
leaf = leaf.get_previous_leaf()
while leaf and _is_indentation_error_leaf(leaf):
leaf = leaf.get_next_leaf()
return leaf
def _get_suite_indentation(tree_node):
return _get_indentation(tree_node.children[1])
def _get_indentation(tree_node):
return tree_node.start_pos[1]
def _assert_valid_graph(node):
"""
Checks if the parent/children relationship is correct.
@@ -70,6 +106,10 @@ def _assert_valid_graph(node):
actual = line, len(splitted[-1])
else:
actual = previous_start_pos[0], previous_start_pos[1] + len(content)
if content.startswith(BOM_UTF8_STRING) \
and node.get_start_pos_of_prefix() == (1, 0):
# Remove the byte order mark
actual = actual[0], actual[1] - 1
assert node.start_pos == actual, (node.start_pos, actual)
else:
@@ -78,6 +118,26 @@ def _assert_valid_graph(node):
_assert_valid_graph(child)
def _assert_nodes_are_equal(node1, node2):
try:
children1 = node1.children
except AttributeError:
assert not hasattr(node2, 'children'), (node1, node2)
assert node1.value == node2.value, (node1, node2)
assert node1.type == node2.type, (node1, node2)
assert node1.prefix == node2.prefix, (node1, node2)
assert node1.start_pos == node2.start_pos, (node1, node2)
return
else:
try:
children2 = node2.children
except AttributeError:
assert False, (node1, node2)
for n1, n2 in zip(children1, children2):
_assert_nodes_are_equal(n1, n2)
assert len(children1) == len(children2), '\n' + repr(children1) + '\n' + repr(children2)
def _get_debug_error_message(module, old_lines, new_lines):
current_lines = split_lines(module.get_code(), keepends=True)
current_diff = difflib.unified_diff(new_lines, current_lines)
@@ -95,6 +155,15 @@ def _get_last_line(node_or_leaf):
if _ends_with_newline(last_leaf):
return last_leaf.start_pos[0]
else:
n = last_leaf.get_next_leaf()
if n.type == 'endmarker' and '\n' in n.prefix:
# This is a very special case and has to do with error recovery in
# Parso. The problem is basically that there's no newline leaf at
# the end sometimes (it's required in the grammar, but not needed
# actually before endmarker, CPython just adds a newline to make
# source code pass the parser, to account for that Parso error
# recovery allows small_stmt instead of simple_stmt).
return last_leaf.end_pos[0] + 1
return last_leaf.end_pos[0]
@@ -233,7 +302,7 @@ class DiffParser(object):
if operation == 'equal':
line_offset = j1 - i1
self._copy_from_old_parser(line_offset, i2, j2)
self._copy_from_old_parser(line_offset, i1 + 1, i2, j2)
elif operation == 'replace':
self._parse(until_line=j2)
elif operation == 'insert':
@@ -249,8 +318,14 @@ class DiffParser(object):
# If there is reasonable suspicion that the diff parser is not
# behaving well, this should be enabled.
try:
assert self._module.get_code() == ''.join(new_lines)
code = ''.join(new_lines)
assert self._module.get_code() == code
_assert_valid_graph(self._module)
without_diff_parser_module = Parser(
self._pgen_grammar,
error_recovery=True
).parse(self._tokenizer(new_lines))
_assert_nodes_are_equal(self._module, without_diff_parser_module)
except AssertionError:
print(_get_debug_error_message(self._module, old_lines, new_lines))
raise
@@ -268,7 +343,7 @@ class DiffParser(object):
if self._module.get_code() != ''.join(lines_new):
LOG.warning('parser issue:\n%s\n%s', ''.join(old_lines), ''.join(lines_new))
def _copy_from_old_parser(self, line_offset, until_line_old, until_line_new):
def _copy_from_old_parser(self, line_offset, start_line_old, until_line_old, until_line_new):
last_until_line = -1
while until_line_new > self._nodes_tree.parsed_until_line:
parsed_until_line_old = self._nodes_tree.parsed_until_line - line_offset
@@ -282,12 +357,18 @@ class DiffParser(object):
p_children = line_stmt.parent.children
index = p_children.index(line_stmt)
from_ = self._nodes_tree.parsed_until_line + 1
copied_nodes = self._nodes_tree.copy_nodes(
p_children[index:],
until_line_old,
line_offset
)
if start_line_old == 1 \
and p_children[0].get_first_leaf().prefix.startswith(BOM_UTF8_STRING):
# If there's a BOM in the beginning, just reparse. It's too
# complicated to account for it otherwise.
copied_nodes = []
else:
from_ = self._nodes_tree.parsed_until_line + 1
copied_nodes = self._nodes_tree.copy_nodes(
p_children[index:],
until_line_old,
line_offset
)
# Match all the nodes that are in the wanted range.
if copied_nodes:
self._copy_count += 1
@@ -333,7 +414,10 @@ class DiffParser(object):
node = self._try_parse_part(until_line)
nodes = node.children
self._nodes_tree.add_parsed_nodes(nodes)
self._nodes_tree.add_parsed_nodes(nodes, self._keyword_token_indents)
if self._replace_tos_indent is not None:
self._nodes_tree.indents[-1] = self._replace_tos_indent
LOG.debug(
'parse_part from %s to %s (to %s in part parser)',
nodes[0].get_start_pos_of_prefix()[0],
@@ -369,34 +453,39 @@ class DiffParser(object):
return self._active_parser.parse(tokens=tokens)
def _diff_tokenize(self, lines, until_line, line_offset=0):
is_first_token = True
omitted_first_indent = False
indents = []
tokens = self._tokenizer(lines, (1, 0))
stack = self._active_parser.stack
for typ, string, start_pos, prefix in tokens:
start_pos = start_pos[0] + line_offset, start_pos[1]
if typ == PythonTokenTypes.INDENT:
indents.append(start_pos[1])
if is_first_token:
omitted_first_indent = True
# We want to get rid of indents that are only here because
# we only parse part of the file. These indents would only
# get parsed as error leafs, which doesn't make any sense.
is_first_token = False
continue
is_first_token = False
was_newline = False
indents = self._nodes_tree.indents
initial_indentation_count = len(indents)
# In case of omitted_first_indent, it might not be dedented fully.
# However this is a sign for us that a dedent happened.
if typ == PythonTokenTypes.DEDENT \
or typ == PythonTokenTypes.ERROR_DEDENT \
and omitted_first_indent and len(indents) == 1:
indents.pop()
if omitted_first_indent and not indents:
tokens = self._tokenizer(
lines,
start_pos=(line_offset + 1, 0),
indents=indents,
is_first_token=line_offset == 0,
)
stack = self._active_parser.stack
self._replace_tos_indent = None
self._keyword_token_indents = {}
# print('start', line_offset + 1, indents)
for token in tokens:
# print(token, indents)
typ = token.type
if typ == DEDENT:
if len(indents) < initial_indentation_count:
# We are done here, only thing that can come now is an
# endmarker or another dedented code block.
typ, string, start_pos, prefix = next(tokens)
while True:
typ, string, start_pos, prefix = token = next(tokens)
if typ in (DEDENT, ERROR_DEDENT):
if typ == ERROR_DEDENT:
# We want to force an error dedent in the next
# parser/pass. To make this possible we just
# increase the location by one.
self._replace_tos_indent = start_pos[1] + 1
pass
else:
break
if '\n' in prefix or '\r' in prefix:
prefix = re.sub(r'[^\n\r]+\Z', '', prefix)
else:
@@ -404,36 +493,38 @@ class DiffParser(object):
if start_pos[1] - len(prefix) == 0:
prefix = ''
yield PythonToken(
PythonTokenTypes.ENDMARKER, '',
(start_pos[0] + line_offset, 0),
ENDMARKER, '',
start_pos,
prefix
)
break
elif typ == PythonTokenTypes.NEWLINE and start_pos[0] >= until_line:
yield PythonToken(typ, string, start_pos, prefix)
# Check if the parser is actually in a valid suite state.
if _suite_or_file_input_is_valid(self._pgen_grammar, stack):
start_pos = start_pos[0] + 1, 0
while len(indents) > int(omitted_first_indent):
indents.pop()
yield PythonToken(PythonTokenTypes.DEDENT, '', start_pos, '')
elif typ == NEWLINE and token.start_pos[0] >= until_line:
was_newline = True
elif was_newline:
was_newline = False
if len(indents) == initial_indentation_count:
# Check if the parser is actually in a valid suite state.
if _suite_or_file_input_is_valid(self._pgen_grammar, stack):
yield PythonToken(ENDMARKER, '', token.start_pos, '')
break
yield PythonToken(PythonTokenTypes.ENDMARKER, '', start_pos, '')
break
else:
continue
if typ == NAME and token.string in ('class', 'def'):
self._keyword_token_indents[token.start_pos] = list(indents)
yield PythonToken(typ, string, start_pos, prefix)
yield token
class _NodesTreeNode(object):
_ChildrenGroup = namedtuple('_ChildrenGroup', 'prefix children line_offset last_line_offset_leaf')
_ChildrenGroup = namedtuple(
'_ChildrenGroup',
'prefix children line_offset last_line_offset_leaf')
def __init__(self, tree_node, parent=None):
def __init__(self, tree_node, parent=None, indentation=0):
self.tree_node = tree_node
self._children_groups = []
self.parent = parent
self._node_children = []
self.indentation = indentation
def finish(self):
children = []
@@ -461,10 +552,13 @@ class _NodesTreeNode(object):
def add_child_node(self, child_node):
self._node_children.append(child_node)
def add_tree_nodes(self, prefix, children, line_offset=0, last_line_offset_leaf=None):
def add_tree_nodes(self, prefix, children, line_offset=0,
last_line_offset_leaf=None):
if last_line_offset_leaf is None:
last_line_offset_leaf = children[-1].get_last_leaf()
group = self._ChildrenGroup(prefix, children, line_offset, last_line_offset_leaf)
group = self._ChildrenGroup(
prefix, children, line_offset, last_line_offset_leaf
)
self._children_groups.append(group)
def get_last_line(self, suffix):
@@ -491,6 +585,9 @@ class _NodesTreeNode(object):
return max(line, self._node_children[-1].get_last_line(suffix))
return line
def __repr__(self):
return '<%s: %s>' % (self.__class__.__name__, self.tree_node)
class _NodesTree(object):
def __init__(self, module):
@@ -499,34 +596,19 @@ class _NodesTree(object):
self._module = module
self._prefix_remainder = ''
self.prefix = ''
self.indents = [0]
@property
def parsed_until_line(self):
return self._working_stack[-1].get_last_line(self.prefix)
def _get_insertion_node(self, indentation_node):
indentation = indentation_node.start_pos[1]
# find insertion node
while True:
node = self._working_stack[-1]
tree_node = node.tree_node
if tree_node.type == 'suite':
# A suite starts with NEWLINE, ...
node_indentation = tree_node.children[1].start_pos[1]
if indentation >= node_indentation: # Not a Dedent
# We might be at the most outer layer: modules. We
# don't want to depend on the first statement
# having the right indentation.
return node
elif tree_node.type == 'file_input':
def _update_insertion_node(self, indentation):
for node in reversed(list(self._working_stack)):
if node.indentation < indentation or node is self._working_stack[0]:
return node
self._working_stack.pop()
def add_parsed_nodes(self, tree_nodes):
def add_parsed_nodes(self, tree_nodes, keyword_token_indents):
old_prefix = self.prefix
tree_nodes = self._remove_endmarker(tree_nodes)
if not tree_nodes:
@@ -535,23 +617,27 @@ class _NodesTree(object):
assert tree_nodes[0].type != 'newline'
node = self._get_insertion_node(tree_nodes[0])
node = self._update_insertion_node(tree_nodes[0].start_pos[1])
assert node.tree_node.type in ('suite', 'file_input')
node.add_tree_nodes(old_prefix, tree_nodes)
# tos = Top of stack
self._update_tos(tree_nodes[-1])
self._update_parsed_node_tos(tree_nodes[-1], keyword_token_indents)
def _update_tos(self, tree_node):
if tree_node.type in ('suite', 'file_input'):
new_tos = _NodesTreeNode(tree_node)
def _update_parsed_node_tos(self, tree_node, keyword_token_indents):
if tree_node.type == 'suite':
def_leaf = tree_node.parent.children[0]
new_tos = _NodesTreeNode(
tree_node,
indentation=keyword_token_indents[def_leaf.start_pos][-1],
)
new_tos.add_tree_nodes('', list(tree_node.children))
self._working_stack[-1].add_child_node(new_tos)
self._working_stack.append(new_tos)
self._update_tos(tree_node.children[-1])
self._update_parsed_node_tos(tree_node.children[-1], keyword_token_indents)
elif _func_or_class_has_suite(tree_node):
self._update_tos(tree_node.children[-1])
self._update_parsed_node_tos(tree_node.children[-1], keyword_token_indents)
def _remove_endmarker(self, tree_nodes):
"""
@@ -561,7 +647,8 @@ class _NodesTree(object):
is_endmarker = last_leaf.type == 'endmarker'
self._prefix_remainder = ''
if is_endmarker:
separation = max(last_leaf.prefix.rfind('\n'), last_leaf.prefix.rfind('\r'))
prefix = last_leaf.prefix
separation = max(prefix.rfind('\n'), prefix.rfind('\r'))
if separation > -1:
# Remove the whitespace part of the prefix after a newline.
# That is not relevant if parentheses were opened. Always parse
@@ -577,6 +664,26 @@ class _NodesTree(object):
tree_nodes = tree_nodes[:-1]
return tree_nodes
def _get_matching_indent_nodes(self, tree_nodes, is_new_suite):
# There might be a random dedent where we have to stop copying.
# Invalid indents are ok, because the parser handled that
# properly before. An invalid dedent can happen, because a few
# lines above there was an invalid indent.
node_iterator = iter(tree_nodes)
if is_new_suite:
yield next(node_iterator)
first_node = next(node_iterator)
indent = _get_indentation(first_node)
if not is_new_suite and indent not in self.indents:
return
yield first_node
for n in node_iterator:
if _get_indentation(n) != indent:
return
yield n
def copy_nodes(self, tree_nodes, until_line, line_offset):
"""
Copies tree nodes from the old parser tree.
@@ -588,19 +695,38 @@ class _NodesTree(object):
# issues.
return []
self._get_insertion_node(tree_nodes[0])
indentation = _get_indentation(tree_nodes[0])
old_working_stack = list(self._working_stack)
old_prefix = self.prefix
old_indents = self.indents
self.indents = [i for i in self.indents if i <= indentation]
new_nodes, self._working_stack, self.prefix = self._copy_nodes(
self._update_insertion_node(indentation)
new_nodes, self._working_stack, self.prefix, added_indents = self._copy_nodes(
list(self._working_stack),
tree_nodes,
until_line,
line_offset,
self.prefix,
)
if new_nodes:
self.indents += added_indents
else:
self._working_stack = old_working_stack
self.prefix = old_prefix
self.indents = old_indents
return new_nodes
def _copy_nodes(self, working_stack, nodes, until_line, line_offset, prefix=''):
def _copy_nodes(self, working_stack, nodes, until_line, line_offset,
prefix='', is_nested=False):
new_nodes = []
added_indents = []
nodes = list(self._get_matching_indent_nodes(
nodes,
is_new_suite=is_nested,
))
new_prefix = ''
for node in nodes:
@@ -620,26 +746,83 @@ class _NodesTree(object):
if _func_or_class_has_suite(node):
new_nodes.append(node)
break
try:
c = node.children
except AttributeError:
pass
else:
# This case basically appears with error recovery of one line
# suites like `def foo(): bar.-`. In this case we might not
# include a newline in the statement and we need to take care
# of that.
n = node
if n.type == 'decorated':
n = n.children[-1]
if n.type in ('async_funcdef', 'async_stmt'):
n = n.children[-1]
if n.type in ('classdef', 'funcdef'):
suite_node = n.children[-1]
else:
suite_node = c[-1]
if suite_node.type in ('error_leaf', 'error_node'):
break
new_nodes.append(node)
# Pop error nodes at the end from the list
if new_nodes:
while new_nodes:
last_node = new_nodes[-1]
if (last_node.type in ('error_leaf', 'error_node')
or _is_flow_node(new_nodes[-1])):
# Error leafs/nodes don't have a defined start/end. Error
# nodes might not end with a newline (e.g. if there's an
# open `(`). Therefore ignore all of them unless they are
# succeeded with valid parser state.
# If we copy flows at the end, they might be continued
# after the copy limit (in the new parser).
# In this while loop we try to remove until we find a newline.
new_prefix = ''
new_nodes.pop()
while new_nodes:
last_node = new_nodes[-1]
if last_node.get_last_leaf().type == 'newline':
break
new_nodes.pop()
continue
if len(new_nodes) > 1 and new_nodes[-2].type == 'error_node':
# The problem here is that Parso error recovery sometimes
# influences nodes before this node.
# Since the new last node is an error node this will get
# cleaned up in the next while iteration.
new_nodes.pop()
continue
break
if not new_nodes:
return [], working_stack, prefix
return [], working_stack, prefix, added_indents
tos = working_stack[-1]
last_node = new_nodes[-1]
had_valid_suite_last = False
# Pop incomplete suites from the list
if _func_or_class_has_suite(last_node):
suite = last_node
while suite.type != 'suite':
suite = suite.children[-1]
suite_tos = _NodesTreeNode(suite)
indent = _get_suite_indentation(suite)
added_indents.append(indent)
suite_tos = _NodesTreeNode(suite, indentation=_get_indentation(last_node))
# Don't need to pass line_offset here, it's already done by the
# parent.
suite_nodes, new_working_stack, new_prefix = self._copy_nodes(
working_stack + [suite_tos], suite.children, until_line, line_offset
suite_nodes, new_working_stack, new_prefix, ai = self._copy_nodes(
working_stack + [suite_tos], suite.children, until_line, line_offset,
is_nested=True,
)
added_indents += ai
if len(suite_nodes) < 2:
# A suite only with newline is not valid.
new_nodes.pop()
@@ -650,25 +833,6 @@ class _NodesTree(object):
working_stack = new_working_stack
had_valid_suite_last = True
if new_nodes:
last_node = new_nodes[-1]
if (last_node.type in ('error_leaf', 'error_node') or
_is_flow_node(new_nodes[-1])):
# Error leafs/nodes don't have a defined start/end. Error
# nodes might not end with a newline (e.g. if there's an
# open `(`). Therefore ignore all of them unless they are
# succeeded with valid parser state.
# If we copy flows at the end, they might be continued
# after the copy limit (in the new parser).
# In this while loop we try to remove until we find a newline.
new_prefix = ''
new_nodes.pop()
while new_nodes:
last_node = new_nodes[-1]
if last_node.get_last_leaf().type == 'newline':
break
new_nodes.pop()
if new_nodes:
if not _ends_with_newline(new_nodes[-1].get_last_leaf()) and not had_valid_suite_last:
p = new_nodes[-1].get_next_leaf().prefix
@@ -688,11 +852,13 @@ class _NodesTree(object):
assert last_line_offset_leaf == ':'
else:
last_line_offset_leaf = new_nodes[-1].get_last_leaf()
tos.add_tree_nodes(prefix, new_nodes, line_offset, last_line_offset_leaf)
tos.add_tree_nodes(
prefix, new_nodes, line_offset, last_line_offset_leaf,
)
prefix = new_prefix
self._prefix_remainder = ''
return new_nodes, working_stack, prefix
return new_nodes, working_stack, prefix, added_indents
def close(self):
self._base_node.finish()
@@ -708,6 +874,8 @@ class _NodesTree(object):
lines = split_lines(self.prefix)
assert len(lines) > 0
if len(lines) == 1:
if lines[0].startswith(BOM_UTF8_STRING) and end_pos == [1, 0]:
end_pos[1] -= 1
end_pos[1] += len(lines[0])
else:
end_pos[0] += len(lines) - 1
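For reference, the diff parser above is normally reached through the public
API via ``diff_cache``; a hedged sketch of an editor-style incremental reparse
(``path`` ties the two parses together)::

    import parso

    grammar = parso.load_grammar()
    # The first parse fills the cache; the second only reparses what changed.
    grammar.parse('def f():\n    return 1\n', path='example.py', diff_cache=True)
    module = grammar.parse('def f():\n    return 2\n', path='example.py',
                           diff_cache=True)
    print(module.get_code())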

View File

@@ -176,8 +176,7 @@ class _Context(object):
self._analyze_names(self._global_names, 'global')
self._analyze_names(self._nonlocal_names, 'nonlocal')
# Python2.6 doesn't have dict comprehensions.
global_name_strs = dict((n.value, n) for n in self._global_names)
global_name_strs = {n.value: n for n in self._global_names}
for nonlocal_name in self._nonlocal_names:
try:
global_name = global_name_strs[nonlocal_name.value]
@@ -864,6 +863,7 @@ class _TryStmtRule(SyntaxRule):
@ErrorFinder.register_rule(type='fstring')
class _FStringRule(SyntaxRule):
_fstring_grammar = None
message_expr = "f-string expression part cannot include a backslash"
message_nested = "f-string: expressions nested too deeply"
message_conversion = "f-string: invalid conversion character: expected 's', 'r', or 'a'"
@@ -874,6 +874,10 @@ class _FStringRule(SyntaxRule):
if depth >= 2:
self.add_issue(fstring_expr, message=self.message_nested)
expr = fstring_expr.children[1]
if '\\' in expr.get_code():
self.add_issue(expr, message=self.message_expr)
conversion = fstring_expr.children[2]
if conversion.type == 'fstring_conversion':
name = conversion.children[1]
@@ -915,6 +919,14 @@ class _CheckAssignmentRule(SyntaxRule):
if second.type == 'yield_expr':
error = 'yield expression'
elif second.type == 'testlist_comp':
# ([a, b] := [1, 2])
# ((a, b) := [1, 2])
if is_namedexpr:
if first == '(':
error = 'tuple'
elif first == '[':
error = 'list'
# This is not a comprehension, they were handled
# further above.
for child in second.children[::2]:
@@ -964,7 +976,7 @@ class _CheckAssignmentRule(SyntaxRule):
if error is not None:
if is_namedexpr:
message = 'cannot use named assignment with %s' % error
message = 'cannot use assignment expressions with %s' % error
else:
cannot = "can't" if self._normalizer.version < (3, 8) else "cannot"
message = ' '.join([cannot, "delete" if is_deletion else "assign to", error])
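Both rules touched above are observable through ``iter_errors``; a hedged
sketch (the two snippets should trigger the backslash rule and the tuple case
respectively)::

    import parso

    grammar = parso.load_grammar(version='3.8')
    for code in ("f\"{'\\n'}\"\n", "((a, b) := [1, 2])\n"):
        module = grammar.parse(code)
        for issue in grammar.iter_errors(module):
            print(issue.message)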

View File

@@ -1,159 +0,0 @@
# Grammar for Python
# Note: Changing the grammar specified in this file will most likely
# require corresponding changes in the parser module
# (../Modules/parsermodule.c). If you can't make the changes to
# that module yourself, please co-ordinate the required changes
# with someone who can; ask around on python-dev for help. Fred
# Drake <fdrake@acm.org> will probably be listening there.
# NOTE WELL: You should also follow all the steps listed in PEP 306,
# "How to Change Python's Grammar"
# Commands for Kees Blom's railroad program
#diagram:token NAME
#diagram:token NUMBER
#diagram:token STRING
#diagram:token NEWLINE
#diagram:token ENDMARKER
#diagram:token INDENT
#diagram:output\input python.bla
#diagram:token DEDENT
#diagram:output\textwidth 20.04cm\oddsidemargin 0.0cm\evensidemargin 0.0cm
#diagram:rules
# Start symbols for the grammar:
# single_input is a single interactive statement;
# file_input is a module or sequence of commands read from an input file;
# eval_input is the input for the eval() and input() functions.
# NB: compound_stmt in single_input is followed by extra NEWLINE!
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
file_input: (NEWLINE | stmt)* ENDMARKER
eval_input: testlist NEWLINE* ENDMARKER
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
decorators: decorator+
decorated: decorators (classdef | funcdef)
funcdef: 'def' NAME parameters ':' suite
parameters: '(' [varargslist] ')'
varargslist: ((fpdef ['=' test] ',')*
('*' NAME [',' '**' NAME] | '**' NAME) |
fpdef ['=' test] (',' fpdef ['=' test])* [','])
fpdef: NAME | '(' fplist ')'
fplist: fpdef (',' fpdef)* [',']
stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | print_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | exec_stmt | assert_stmt)
expr_stmt: testlist (augassign (yield_expr|testlist) |
('=' (yield_expr|testlist))*)
augassign: ('+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' |
'<<=' | '>>=' | '**=' | '//=')
# For normal assignments, additional restrictions enforced by the interpreter
print_stmt: 'print' ( [ test (',' test)* [','] ] |
'>>' test [ (',' test)+ [','] ] )
del_stmt: 'del' exprlist
pass_stmt: 'pass'
flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
break_stmt: 'break'
continue_stmt: 'continue'
return_stmt: 'return' [testlist]
yield_stmt: yield_expr
raise_stmt: 'raise' [test [',' test [',' test]]]
import_stmt: import_name | import_from
import_name: 'import' dotted_as_names
import_from: ('from' ('.'* dotted_name | '.'+)
'import' ('*' | '(' import_as_names ')' | import_as_names))
import_as_name: NAME ['as' NAME]
dotted_as_name: dotted_name ['as' NAME]
import_as_names: import_as_name (',' import_as_name)* [',']
dotted_as_names: dotted_as_name (',' dotted_as_name)*
dotted_name: NAME ('.' NAME)*
global_stmt: 'global' NAME (',' NAME)*
exec_stmt: 'exec' expr ['in' test [',' test]]
assert_stmt: 'assert' test [',' test]
compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
try_stmt: ('try' ':' suite
((except_clause ':' suite)+
['else' ':' suite]
['finally' ':' suite] |
'finally' ':' suite))
with_stmt: 'with' with_item ':' suite
# Dave: Python2.6 actually defines a little bit of a different label called
# 'with_var'. However in 2.7+ this is the default. Apply it for
# consistency reasons.
with_item: test ['as' expr]
# NB compile.c makes sure that the default except clause is last
except_clause: 'except' [test [('as' | ',') test]]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
# Backward compatibility cruft to support:
# [ x for x in lambda: True, lambda: False if x() ]
# even while also allowing:
# lambda x: 5 if x else 2
# (But not a mix of the two)
testlist_safe: old_test [(',' old_test)+ [',']]
old_test: or_test | old_lambdef
old_lambdef: 'lambda' [varargslist] ':' old_test
test: or_test ['if' or_test 'else' test] | lambdef
or_test: and_test ('or' and_test)*
and_test: not_test ('and' not_test)*
not_test: 'not' not_test | comparison
comparison: expr (comp_op expr)*
comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'
expr: xor_expr ('|' xor_expr)*
xor_expr: and_expr ('^' and_expr)*
and_expr: shift_expr ('&' shift_expr)*
shift_expr: arith_expr (('<<'|'>>') arith_expr)*
arith_expr: term (('+'|'-') term)*
term: factor (('*'|'/'|'%'|'//') factor)*
factor: ('+'|'-'|'~') factor | power
power: atom trailer* ['**' factor]
atom: ('(' [yield_expr|testlist_comp] ')' |
'[' [listmaker] ']' |
'{' [dictorsetmaker] '}' |
'`' testlist1 '`' |
NAME | NUMBER | strings)
strings: STRING+
listmaker: test ( list_for | (',' test)* [','] )
# Dave: Renamed testlist_gexpr to testlist_comp, because in 2.7+ this is the
# default. It's more consistent like this.
testlist_comp: test ( gen_for | (',' test)* [','] )
lambdef: 'lambda' [varargslist] ':' test
trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
subscriptlist: subscript (',' subscript)* [',']
subscript: '.' '.' '.' | test | [test] ':' [test] [sliceop]
sliceop: ':' [test]
exprlist: expr (',' expr)* [',']
testlist: test (',' test)* [',']
# Dave: Rename from dictmaker to dictorsetmaker, because this is more
# consistent with the following grammars.
dictorsetmaker: test ':' test (',' test ':' test)* [',']
classdef: 'class' NAME ['(' [testlist] ')'] ':' suite
arglist: (argument ',')* (argument [',']
|'*' test (',' argument)* [',' '**' test]
|'**' test)
argument: test [gen_for] | test '=' test # Really [keyword '='] test
list_iter: list_for | list_if
list_for: 'for' exprlist 'in' testlist_safe [list_iter]
list_if: 'if' old_test [list_iter]
gen_iter: gen_for | gen_if
gen_for: 'for' exprlist 'in' or_test [gen_iter]
gen_if: 'if' old_test [gen_iter]
testlist1: test (',' test)*
# not used in grammar, but may appear in "node" passed from Parser to Compiler
encoding_decl: NAME
yield_expr: 'yield' [testlist]

View File

@@ -16,7 +16,7 @@
# eval_input is the input for the eval() and input() functions.
# NB: compound_stmt in single_input is followed by extra NEWLINE!
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
file_input: (NEWLINE | stmt)* ENDMARKER
file_input: stmt* ENDMARKER
eval_input: testlist NEWLINE* ENDMARKER
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
@@ -30,7 +30,7 @@ varargslist: ((fpdef ['=' test] ',')*
fpdef: NAME | '(' fplist ')'
fplist: fpdef (',' fpdef)* [',']
stmt: simple_stmt | compound_stmt
stmt: simple_stmt | compound_stmt | NEWLINE
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | print_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | exec_stmt | assert_stmt)

View File

@@ -16,7 +16,7 @@
# eval_input is the input for the eval() functions.
# NB: compound_stmt in single_input is followed by extra NEWLINE!
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
file_input: (NEWLINE | stmt)* ENDMARKER
file_input: stmt* ENDMARKER
eval_input: testlist NEWLINE* ENDMARKER
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
@@ -33,7 +33,7 @@ varargslist: (vfpdef ['=' test] (',' vfpdef ['=' test])* [','
| '*' [vfpdef] (',' vfpdef ['=' test])* [',' '**' vfpdef] | '**' vfpdef)
vfpdef: NAME
stmt: simple_stmt | compound_stmt
stmt: simple_stmt | compound_stmt | NEWLINE
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | nonlocal_stmt | assert_stmt)

View File

@@ -16,7 +16,7 @@
# eval_input is the input for the eval() functions.
# NB: compound_stmt in single_input is followed by extra NEWLINE!
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
file_input: (NEWLINE | stmt)* ENDMARKER
file_input: stmt* ENDMARKER
eval_input: testlist NEWLINE* ENDMARKER
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
@@ -33,7 +33,7 @@ varargslist: (vfpdef ['=' test] (',' vfpdef ['=' test])* [','
| '*' [vfpdef] (',' vfpdef ['=' test])* [',' '**' vfpdef] | '**' vfpdef)
vfpdef: NAME
stmt: simple_stmt | compound_stmt
stmt: simple_stmt | compound_stmt | NEWLINE
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | nonlocal_stmt | assert_stmt)

View File

@@ -16,7 +16,7 @@
# eval_input is the input for the eval() functions.
# NB: compound_stmt in single_input is followed by extra NEWLINE!
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
file_input: (NEWLINE | stmt)* ENDMARKER
file_input: stmt* ENDMARKER
eval_input: testlist NEWLINE* ENDMARKER
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
@@ -38,7 +38,7 @@ varargslist: (vfpdef ['=' test] (',' vfpdef ['=' test])* [','
| '*' [vfpdef] (',' vfpdef ['=' test])* [',' '**' vfpdef] | '**' vfpdef)
vfpdef: NAME
stmt: simple_stmt | compound_stmt
stmt: simple_stmt | compound_stmt | NEWLINE
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | nonlocal_stmt | assert_stmt)

View File

@@ -9,7 +9,7 @@
# eval_input is the input for the eval() functions.
# NB: compound_stmt in single_input is followed by extra NEWLINE!
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
file_input: (NEWLINE | stmt)* ENDMARKER
file_input: stmt* ENDMARKER
eval_input: testlist NEWLINE* ENDMARKER
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
decorators: decorator+
@@ -35,7 +35,7 @@ varargslist: (vfpdef ['=' test] (',' vfpdef ['=' test])* [',' [
)
vfpdef: NAME
stmt: simple_stmt | compound_stmt
stmt: simple_stmt | compound_stmt | NEWLINE
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | nonlocal_stmt | assert_stmt)

View File

@@ -9,7 +9,7 @@
# eval_input is the input for the eval() functions.
# NB: compound_stmt in single_input is followed by extra NEWLINE!
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
file_input: (NEWLINE | stmt)* ENDMARKER
file_input: stmt* ENDMARKER
eval_input: testlist NEWLINE* ENDMARKER
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
decorators: decorator+
@@ -33,7 +33,7 @@ varargslist: (vfpdef ['=' test] (',' vfpdef ['=' test])* [',' [
)
vfpdef: NAME
stmt: simple_stmt | compound_stmt
stmt: simple_stmt | compound_stmt | NEWLINE
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | nonlocal_stmt | assert_stmt)

View File

@@ -9,7 +9,7 @@
# eval_input is the input for the eval() functions.
# NB: compound_stmt in single_input is followed by extra NEWLINE!
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
file_input: (NEWLINE | stmt)* ENDMARKER
file_input: stmt* ENDMARKER
eval_input: testlist NEWLINE* ENDMARKER
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
@@ -46,7 +46,7 @@ varargslist: vfpdef ['=' test ](',' vfpdef ['=' test])* ',' '/' [',' [ (vfpdef [
)
vfpdef: NAME
stmt: simple_stmt | compound_stmt
stmt: simple_stmt | compound_stmt | NEWLINE
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | nonlocal_stmt | assert_stmt)

View File

@@ -9,7 +9,7 @@
# eval_input is the input for the eval() functions.
# NB: compound_stmt in single_input is followed by extra NEWLINE!
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
file_input: (NEWLINE | stmt)* ENDMARKER
file_input: stmt* ENDMARKER
eval_input: testlist NEWLINE* ENDMARKER
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
@@ -46,7 +46,7 @@ varargslist: vfpdef ['=' test ](',' vfpdef ['=' test])* ',' '/' [',' [ (vfpdef [
)
vfpdef: NAME
stmt: simple_stmt | compound_stmt
stmt: simple_stmt | compound_stmt | NEWLINE
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | nonlocal_stmt | assert_stmt)
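
The same swap is repeated for every bundled grammar version: NEWLINE moves out of stmt and up into file_input, matching CPython's grammar file. A minimal sketch of the user-visible invariant (assuming the public parso API; the exact child types are deliberately not asserted):

import parso

# Whatever shape the tree takes after the grammar change, parso
# guarantees a lossless round trip, including for blank lines.
code = '\n\nif x:\n    pass\n\n'
module = parso.parse(code)
assert module.get_code() == code
print([child.type for child in module.children])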

View File

@@ -172,5 +172,5 @@ A list of syntax/indentation errors I've encountered in CPython.
Version specific:
Python 3.5:
'yield' inside async function
Python 3.3/3.4:
Python 3.4:
can use starred expression only as assignment target

View File

@@ -44,8 +44,6 @@ class Parser(BaseParser):
# avoid extreme amounts of work around the subtle difference of 2/3
# grammar in list comprehensions.
'list_for': tree.SyncCompFor,
# Same here. This just exists in Python 2.6.
'gen_for': tree.SyncCompFor,
'decorator': tree.Decorator,
'lambdef': tree.Lambda,
'old_lambdef': tree.Lambda,
@@ -128,10 +126,10 @@ class Parser(BaseParser):
if self._start_nonterminal == 'file_input' and \
(token.type == PythonTokenTypes.ENDMARKER
or token.type == DEDENT and '\n' not in last_leaf.value
and '\r' not in last_leaf.value):
or token.type == DEDENT and not last_leaf.value.endswith('\n')
and not last_leaf.value.endswith('\r')):
# In Python statements need to end with a newline. But since it's
# possible (and valid in Python ) that there's no newline at the
# possible (and valid in Python) that there's no newline at the
# end of a file, we have to recover even if the user doesn't want
# error recovery.
if self.stack[-1].dfa.from_rule == 'simple_stmt':
@@ -210,6 +208,7 @@ class Parser(BaseParser):
o = self._omit_dedent_list
if o and o[-1] == self._indent_counter:
o.pop()
self._indent_counter -= 1
continue
self._indent_counter -= 1
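
A short sketch of the recovery path described in the comment above; error_recovery is an existing Grammar.parse keyword:

import parso

# A file that ends without a trailing newline still yields a complete
# tree, even when error recovery is switched off.
module = parso.parse('x = 1', error_recovery=False)
assert module.get_code() == 'x = 1'
assert module.children[-1].type == 'endmarker'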

View File

@@ -12,14 +12,12 @@ memory optimizations here.
from __future__ import absolute_import
import sys
import string
import re
from collections import namedtuple
import itertools as _itertools
from codecs import BOM_UTF8
from parso.python.token import PythonTokenTypes
from parso._compatibility import py_version
from parso.utils import split_lines
@@ -50,7 +48,7 @@ BOM_UTF8_STRING = BOM_UTF8.decode('utf-8')
_token_collection_cache = {}
if py_version >= 30:
if sys.version_info.major >= 3:
# Python 3 has str.isidentifier() to check if a char is a valid identifier
is_identifier = str.isidentifier
else:
@@ -86,7 +84,7 @@ def _all_string_prefixes(version_info, include_fstring=False, only_fstring=False
# and don't contain any permutations (include 'fr', but not
# 'rf'). The various permutations will be generated.
valid_string_prefixes = ['b', 'r', 'u']
if version_info >= (3, 0):
if version_info.major >= 3:
valid_string_prefixes.append('br')
result = set([''])
@@ -106,7 +104,7 @@ def _all_string_prefixes(version_info, include_fstring=False, only_fstring=False
# create a list with upper and lower versions of each
# character
result.update(different_case_versions(t))
if version_info <= (2, 7):
if version_info.major == 2:
# In Python 2 the order cannot just be random.
result.update(different_case_versions('ur'))
result.update(different_case_versions('br'))
@@ -164,7 +162,7 @@ def _create_token_collection(version_info):
else:
Hexnumber = r'0[xX][0-9a-fA-F]+'
Binnumber = r'0[bB][01]+'
if version_info >= (3, 0):
if version_info.major >= 3:
Octnumber = r'0[oO][0-7]+'
else:
Octnumber = '0[oO]?[0-7]+'
@@ -219,10 +217,10 @@ def _create_token_collection(version_info):
Funny = group(Operator, Bracket, Special)
# First (or only) line of ' or " string.
ContStr = group(StringPrefix + r"'[^\r\n'\\]*(?:\\.[^\r\n'\\]*)*" +
group("'", r'\\(?:\r\n?|\n)'),
StringPrefix + r'"[^\r\n"\\]*(?:\\.[^\r\n"\\]*)*' +
group('"', r'\\(?:\r\n?|\n)'))
ContStr = group(StringPrefix + r"'[^\r\n'\\]*(?:\\.[^\r\n'\\]*)*"
+ group("'", r'\\(?:\r\n?|\n)'),
StringPrefix + r'"[^\r\n"\\]*(?:\\.[^\r\n"\\]*)*'
+ group('"', r'\\(?:\r\n?|\n)'))
pseudo_extra_pool = [Comment, Triple]
all_quotes = '"', "'", '"""', "'''"
if fstring_prefixes:
@@ -259,11 +257,14 @@ def _create_token_collection(version_info):
fstring_pattern_map[t + quote] = quote
ALWAYS_BREAK_TOKENS = (';', 'import', 'class', 'def', 'try', 'except',
'finally', 'while', 'with', 'return')
'finally', 'while', 'with', 'return', 'continue',
'break', 'del', 'pass', 'global', 'assert')
if version_info >= (3, 5):
ALWAYS_BREAK_TOKENS += ('async', 'nonlocal')
pseudo_token_compiled = _compile(PseudoToken)
return TokenCollection(
pseudo_token_compiled, single_quoted, triple_quoted, endpats,
whitespace, fstring_pattern_map, ALWAYS_BREAK_TOKENS
whitespace, fstring_pattern_map, set(ALWAYS_BREAK_TOKENS)
)
@@ -312,7 +313,7 @@ class FStringNode(object):
return not self.is_in_expr() and self.format_spec_count
def _close_fstring_if_necessary(fstring_stack, string, start_pos, additional_prefix):
def _close_fstring_if_necessary(fstring_stack, string, line_nr, column, additional_prefix):
for fstring_stack_index, node in enumerate(fstring_stack):
lstripped_string = string.lstrip()
len_lstrip = len(string) - len(lstripped_string)
@@ -320,7 +321,7 @@ def _close_fstring_if_necessary(fstring_stack, string, start_pos, additional_pre
token = PythonToken(
FSTRING_END,
node.quote,
start_pos,
(line_nr, column + len_lstrip),
prefix=additional_prefix+string[:len_lstrip],
)
additional_prefix = ''
@@ -382,13 +383,14 @@ def _print_tokens(func):
"""
def wrapper(*args, **kwargs):
for token in func(*args, **kwargs):
print(token) # This print is intentional for debugging!
yield token
return wrapper
# @_print_tokens
def tokenize_lines(lines, version_info, start_pos=(1, 0)):
def tokenize_lines(lines, version_info, start_pos=(1, 0), indents=None, is_first_token=True):
"""
A heavily modified Python standard library tokenizer.
@@ -399,17 +401,19 @@ def tokenize_lines(lines, version_info, start_pos=(1, 0)):
def dedent_if_necessary(start):
while start < indents[-1]:
if start > indents[-2]:
yield PythonToken(ERROR_DEDENT, '', (lnum, 0), '')
yield PythonToken(ERROR_DEDENT, '', (lnum, start), '')
indents[-1] = start
break
yield PythonToken(DEDENT, '', spos, '')
indents.pop()
yield PythonToken(DEDENT, '', spos, '')
pseudo_token, single_quoted, triple_quoted, endpats, whitespace, \
fstring_pattern_map, always_break_tokens, = \
_get_token_collection(version_info)
paren_level = 0 # count parentheses
indents = [0]
max = 0
if indents is None:
indents = [0]
max_ = 0
numchars = '0123456789'
contstr = ''
contline = None
@@ -420,25 +424,24 @@ def tokenize_lines(lines, version_info, start_pos=(1, 0)):
new_line = True
prefix = '' # Should never be required, but here for safety
additional_prefix = ''
first = True
lnum = start_pos[0] - 1
fstring_stack = []
for line in lines: # loop over lines in stream
lnum += 1
pos = 0
max = len(line)
if first:
max_ = len(line)
if is_first_token:
if line.startswith(BOM_UTF8_STRING):
additional_prefix = BOM_UTF8_STRING
line = line[1:]
max = len(line)
max_ = len(line)
# Fake that the part before was already parsed.
line = '^' * start_pos[1] + line
pos = start_pos[1]
max += start_pos[1]
max_ += start_pos[1]
first = False
is_first_token = False
if contstr: # continued string
endmatch = endprog.match(line)
@@ -454,7 +457,7 @@ def tokenize_lines(lines, version_info, start_pos=(1, 0)):
contline = contline + line
continue
while pos < max:
while pos < max_:
if fstring_stack:
tos = fstring_stack[-1]
if not tos.is_in_expr():
@@ -469,14 +472,15 @@ def tokenize_lines(lines, version_info, start_pos=(1, 0)):
)
tos.previous_lines = ''
continue
if pos == max:
if pos == max_:
break
rest = line[pos:]
fstring_end_token, additional_prefix, quote_length = _close_fstring_if_necessary(
fstring_stack,
rest,
(lnum, pos),
lnum,
pos,
additional_prefix,
)
pos += quote_length
@@ -497,9 +501,39 @@ def tokenize_lines(lines, version_info, start_pos=(1, 0)):
pseudomatch = pseudo_token.match(string_line, pos)
else:
pseudomatch = pseudo_token.match(line, pos)
if pseudomatch:
prefix = additional_prefix + pseudomatch.group(1)
additional_prefix = ''
start, pos = pseudomatch.span(2)
spos = (lnum, start)
token = pseudomatch.group(2)
if token == '':
assert prefix
additional_prefix = prefix
# This means that we have a line with whitespace/comments at
# the end, which just results in an endmarker.
break
initial = token[0]
else:
match = whitespace.match(line, pos)
initial = line[match.end()]
start = match.end()
spos = (lnum, start)
if new_line and initial not in '\r\n#' and (initial != '\\' or pseudomatch is None):
new_line = False
if paren_level == 0 and not fstring_stack:
indent_start = start
if indent_start > indents[-1]:
yield PythonToken(INDENT, '', spos, '')
indents.append(indent_start)
for t in dedent_if_necessary(indent_start):
yield t
if not pseudomatch: # scan for tokens
match = whitespace.match(line, pos)
if pos == 0:
if new_line and paren_level == 0 and not fstring_stack:
for t in dedent_if_necessary(match.end()):
yield t
pos = match.end()
@@ -512,50 +546,18 @@ def tokenize_lines(lines, version_info, start_pos=(1, 0)):
pos += 1
continue
prefix = additional_prefix + pseudomatch.group(1)
additional_prefix = ''
start, pos = pseudomatch.span(2)
spos = (lnum, start)
token = pseudomatch.group(2)
if token == '':
assert prefix
additional_prefix = prefix
# This means that we have a line with whitespace/comments at
# the end, which just results in an endmarker.
break
initial = token[0]
if new_line and initial not in '\r\n\\#':
new_line = False
if paren_level == 0 and not fstring_stack:
i = 0
indent_start = start
while line[i] == '\f':
i += 1
# TODO don't we need to change spos as well?
indent_start -= 1
if indent_start > indents[-1]:
yield PythonToken(INDENT, '', spos, '')
indents.append(indent_start)
for t in dedent_if_necessary(indent_start):
yield t
if (initial in numchars or # ordinary number
(initial == '.' and token != '.' and token != '...')):
if (initial in numchars # ordinary number
or (initial == '.' and token != '.' and token != '...')):
yield PythonToken(NUMBER, token, spos, prefix)
elif pseudomatch.group(3) is not None: # ordinary name
if token in always_break_tokens:
if token in always_break_tokens and (fstring_stack or paren_level):
fstring_stack[:] = []
paren_level = 0
# We only want to dedent if the token is on a new line.
if re.match(r'[ \f\t]*$', line[:start]):
while True:
indent = indents.pop()
if indent > start:
yield PythonToken(DEDENT, '', spos, '')
else:
indents.append(indent)
break
m = re.match(r'[ \f\t]*$', line[:start])
if m is not None:
for t in dedent_if_necessary(m.end()):
yield t
if is_identifier(token):
yield PythonToken(NAME, token, spos, prefix)
else:
@@ -588,7 +590,7 @@ def tokenize_lines(lines, version_info, start_pos=(1, 0)):
token = line[start:pos]
yield PythonToken(STRING, token, spos, prefix)
else:
contstr_start = (lnum, start) # multiple lines
contstr_start = spos # multiple lines
contstr = line[start:]
contline = line
break
@@ -650,10 +652,22 @@ def tokenize_lines(lines, version_info, start_pos=(1, 0)):
if contstr.endswith('\n') or contstr.endswith('\r'):
new_line = True
end_pos = lnum, max
if fstring_stack:
tos = fstring_stack[-1]
if tos.previous_lines:
yield PythonToken(
FSTRING_STRING, tos.previous_lines,
tos.last_string_start_pos,
# Never has a prefix because it can start anywhere and
# include whitespace.
prefix=''
)
end_pos = lnum, max_
# As the last position we just take the maximally possible position. We
# remove -1 for the last new line.
for indent in indents[1:]:
indents.pop()
yield PythonToken(DEDENT, '', end_pos, '')
yield PythonToken(ENDMARKER, '', end_pos, additional_prefix)
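
The tokenizer is internal API, but a quick sketch shows the shape of what it yields after these changes (names as in this file; treat the exact token stream as an assumption):

from parso.utils import parse_version_string
from parso.python.tokenize import tokenize

# Each PythonToken carries type, string, start_pos and prefix; the
# INDENT/DEDENT/ERROR_DEDENT bookkeeping follows the rules changed above.
for token in tokenize('def f():\n    pass\n', parse_version_string('3.8')):
    print(token.type, repr(token.string), token.start_pos)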

View File

@@ -57,10 +57,14 @@ from parso.utils import split_lines
_FLOW_CONTAINERS = set(['if_stmt', 'while_stmt', 'for_stmt', 'try_stmt',
'with_stmt', 'async_stmt', 'suite'])
_RETURN_STMT_CONTAINERS = set(['suite', 'simple_stmt']) | _FLOW_CONTAINERS
_FUNC_CONTAINERS = set(['suite', 'simple_stmt', 'decorated']) | _FLOW_CONTAINERS
_FUNC_CONTAINERS = set(
['suite', 'simple_stmt', 'decorated', 'async_funcdef']
) | _FLOW_CONTAINERS
_GET_DEFINITION_TYPES = set([
'expr_stmt', 'sync_comp_for', 'with_stmt', 'for_stmt', 'import_name',
'import_from', 'param'
'import_from', 'param', 'del_stmt',
])
_IMPORTS = set(['import_name', 'import_from'])
@@ -95,7 +99,7 @@ class DocstringMixin(object):
class PythonMixin(object):
"""
Some Python specific utitilies.
Some Python specific utilities.
"""
__slots__ = ()
@@ -233,6 +237,8 @@ class Name(_LeafWithoutNewlines):
while node is not None:
if node.type == 'suite':
return None
if node.type == 'namedexpr_test':
return node.children[0]
if node.type in _GET_DEFINITION_TYPES:
if self in node.get_defined_names(include_setitem):
return node
@@ -993,6 +999,14 @@ class KeywordStatement(PythonBaseNode):
def keyword(self):
return self.children[0].value
def get_defined_names(self, include_setitem=False):
keyword = self.keyword
if keyword == 'del':
return _defined_names(self.children[1], include_setitem)
if keyword in ('global', 'nonlocal'):
return self.children[1::2]
return []
class AssertStmt(KeywordStatement):
__slots__ = ()
@@ -1067,7 +1081,13 @@ class ExprStmt(PythonBaseNode, DocstringMixin):
def get_rhs(self):
"""Returns the right-hand-side of the equals."""
return self.children[-1]
node = self.children[-1]
if node.type == 'annassign':
if len(node.children) == 4:
node = node.children[3]
else:
node = node.children[1]
return node
def yield_operators(self):
"""

View File

@@ -1,6 +1,7 @@
import sys
from abc import abstractmethod, abstractproperty
from parso._compatibility import utf8_repr, encoding, py_version
from parso._compatibility import utf8_repr, encoding
from parso.utils import split_lines
@@ -44,8 +45,12 @@ class NodeOrLeaf(object):
Returns the node immediately following this node in this parent's
children list. If this node does not have a next sibling, it is None
"""
parent = self.parent
if parent is None:
return None
# Can't use index(); we need to test by identity
for i, child in enumerate(self.parent.children):
for i, child in enumerate(parent.children):
if child is self:
try:
return self.parent.children[i + 1]
@@ -58,8 +63,12 @@ class NodeOrLeaf(object):
children list. If this node does not have a previous sibling, it is
None.
"""
parent = self.parent
if parent is None:
return None
# Can't use index(); we need to test by identity
for i, child in enumerate(self.parent.children):
for i, child in enumerate(parent.children):
if child is self:
if i == 0:
return None
@@ -70,6 +79,9 @@ class NodeOrLeaf(object):
Returns the previous leaf in the parser tree.
Returns `None` if this is the first element in the parser tree.
"""
if self.parent is None:
return None
node = self
while True:
c = node.parent.children
@@ -93,6 +105,9 @@ class NodeOrLeaf(object):
Returns the next leaf in the parser tree.
Returns None if this is the last element in the parser tree.
"""
if self.parent is None:
return None
node = self
while True:
c = node.parent.children
@@ -321,7 +336,7 @@ class BaseNode(NodeOrLeaf):
@utf8_repr
def __repr__(self):
code = self.get_code().replace('\n', ' ').replace('\r', ' ').strip()
if not py_version >= 30:
if not sys.version_info.major >= 3:
code = code.encode(encoding, 'replace')
return "<%s: %s@%s,%s>" % \
(type(self).__name__, code, self.start_pos[0], self.start_pos[1])

View File

@@ -2,8 +2,9 @@ from collections import namedtuple
import re
import sys
from ast import literal_eval
from functools import total_ordering
from parso._compatibility import unicode, total_ordering
from parso._compatibility import unicode
# The following is a list in Python that are line breaks in str.splitlines, but
# not in Python. In Python only \r (Carriage Return, 0xD) and \n (Line Feed,
@@ -122,7 +123,7 @@ def _parse_version(version):
match = re.match(r'(\d+)(?:\.(\d)(?:\.\d+)?)?$', version)
if match is None:
raise ValueError('The given version is not in the right format. '
'Use something like "3.2" or "3".')
'Use something like "3.8" or "3".')
major = int(match.group(1))
minor = match.group(2)
@@ -163,13 +164,13 @@ class PythonVersionInfo(namedtuple('Version', 'major, minor')):
def parse_version_string(version=None):
"""
Checks for a valid version number (e.g. `3.2` or `2.7.1` or `3`) and
Checks for a valid version number (e.g. `3.8` or `2.7.1` or `3`) and
returns a corresponding version info that is always two characters long in
decimal.
"""
if version is None:
version = '%s.%s' % sys.version_info[:2]
if not isinstance(version, (unicode, str)):
raise TypeError("version must be a string like 3.2.")
raise TypeError('version must be a string like "3.8"')
return _parse_version(version)
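
A sketch of the normalization behavior, matching the updated docstring and error message:

from parso.utils import parse_version_string

v = parse_version_string('3.8')
assert (v.major, v.minor) == (3, 8)
# Micro versions are accepted and dropped.
assert parse_version_string('2.7.1') == (2, 7)
try:
    parse_version_string(3.8)
except TypeError as exc:
    print(exc)  # version must be a string like "3.8"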

View File

@@ -27,6 +27,7 @@ setup(name='parso',
packages=find_packages(exclude=['test']),
package_data={'parso': ['python/grammar*.txt']},
platforms=['any'],
python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*',
classifiers=[
'Development Status :: 4 - Beta',
'Environment :: Plugins',
@@ -34,10 +35,8 @@ setup(name='parso',
'License :: OSI Approved :: MIT License',
'Operating System :: OS Independent',
'Programming Language :: Python :: 2',
'Programming Language :: Python :: 2.6',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.3',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',

View File

@@ -281,11 +281,11 @@ if sys.version_info >= (3, 6):
# Same as above, but for f-strings.
'f"s" b""',
'b"s" f""',
# f-string expression part cannot include a backslash
r'''f"{'\n'}"''',
]
if sys.version_info >= (2, 7):
# This is something that raises a different error in 2.6 than in the other
# versions. Just skip it for 2.6.
FAILING_EXAMPLES.append('[a, 1] += 3')
FAILING_EXAMPLES.append('[a, 1] += 3')
if sys.version_info[:2] == (3, 5):
# yields are not allowed in 3.5 async functions. Therefore test them
@@ -350,4 +350,14 @@ if sys.version_info[:2] >= (3, 8):
# Not in that issue
'(await a := x)',
'((await a) := x)',
# new discoveries
'((a, b) := (1, 2))',
'([a, b] := [1, 2])',
'({a, b} := {1, 2})',
'({a: b} := {1: 2})',
'(a + b := 1)',
'(True := 1)',
'(False := 1)',
'(None := 1)',
'(__debug__ := 1)',
]
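
Every snippet in FAILING_EXAMPLES is expected to produce at least one error. For instance, a sketch using the public error API:

import parso

grammar = parso.load_grammar(version='3.8')
tree = grammar.parse('(True := 1)\n')
# Invalid assignment-expression targets are reported as syntax errors.
assert list(grammar.iter_errors(tree))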

View File

@@ -50,6 +50,11 @@ def find_python_files_in_tree(file_path):
yield file_path
return
for root, dirnames, filenames in os.walk(file_path):
if 'chardet' in root:
# Stuff like chardet/langcyrillicmodel.py is just very slow to
# parse and machine generated, so ignore those.
continue
for name in filenames:
if name.endswith('.py'):
yield os.path.join(root, name)
@@ -102,9 +107,17 @@ class LineCopy:
class FileModification:
@classmethod
def generate(cls, code_lines, change_count):
def generate(cls, code_lines, change_count, previous_file_modification=None):
if previous_file_modification is not None and random.random() > 0.5:
# We want to keep the previous modifications in some cases to make
# more complex parser issues visible.
code_lines = previous_file_modification.apply(code_lines)
added_modifications = previous_file_modification.modification_list
else:
added_modifications = []
return cls(
list(cls._generate_line_modifications(code_lines, change_count)),
added_modifications
+ list(cls._generate_line_modifications(code_lines, change_count)),
# work with changed trees more than with normal ones.
check_original=random.random() > 0.8,
)
@@ -158,18 +171,18 @@ class FileModification:
yield l
def __init__(self, modification_list, check_original):
self._modification_list = modification_list
self.modification_list = modification_list
self._check_original = check_original
def _apply(self, code_lines):
def apply(self, code_lines):
changed_lines = list(code_lines)
for modification in self._modification_list:
for modification in self.modification_list:
modification.apply(changed_lines)
return changed_lines
def run(self, grammar, code_lines, print_code):
code = ''.join(code_lines)
modified_lines = self._apply(code_lines)
modified_lines = self.apply(code_lines)
modified_code = ''.join(modified_lines)
if print_code:
@@ -197,7 +210,7 @@ class FileModification:
class FileTests:
def __init__(self, file_path, test_count, change_count):
self._path = file_path
with open(file_path) as f:
with open(file_path, errors='replace') as f:
code = f.read()
self._code_lines = split_lines(code, keepends=True)
self._test_count = test_count
@@ -228,8 +241,12 @@ class FileTests:
def run(self, grammar, debugger):
def iterate():
fm = None
for _ in range(self._test_count):
fm = FileModification.generate(self._code_lines, self._change_count)
fm = FileModification.generate(
self._code_lines, self._change_count,
previous_file_modification=fm
)
self._file_modifications.append(fm)
yield fm

View File

@@ -5,12 +5,14 @@ Test all things related to the ``jedi.cache`` module.
from os import unlink
import pytest
import time
from parso.cache import _NodeCacheItem, save_module, load_module, \
_get_hashed_path, parser_cache, _load_from_file_system, _save_to_file_system
from parso import load_grammar
from parso import cache
from parso import file_io
from parso import parse
@pytest.fixture()
@@ -87,3 +89,53 @@ def test_modulepickling_simulate_deleted_cache(tmpdir):
cached2 = load_module(grammar._hashed, io)
assert cached2 is None
def test_cache_limit():
def cache_size():
return sum(len(v) for v in parser_cache.values())
try:
parser_cache.clear()
future_node_cache_item = _NodeCacheItem('bla', [], change_time=time.time() + 10e6)
old_node_cache_item = _NodeCacheItem('bla', [], change_time=time.time() - 10e4)
parser_cache['some_hash_old'] = {
'/path/%s' % i: old_node_cache_item for i in range(300)
}
parser_cache['some_hash_new'] = {
'/path/%s' % i: future_node_cache_item for i in range(300)
}
assert cache_size() == 600
parse('somecode', cache=True, path='/path/somepath')
assert cache_size() == 301
finally:
parser_cache.clear()
class _FixedTimeFileIO(file_io.KnownContentFileIO):
def __init__(self, path, content, last_modified):
super(_FixedTimeFileIO, self).__init__(path, content)
self._last_modified = last_modified
def get_last_modified(self):
return self._last_modified
@pytest.mark.parametrize('diff_cache', [False, True])
@pytest.mark.parametrize('use_file_io', [False, True])
def test_cache_last_used_update(diff_cache, use_file_io):
p = '/path/last-used'
parser_cache.clear() # Clear, because then it's easier to find stuff.
parse('somecode', cache=True, path=p)
node_cache_item = next(iter(parser_cache.values()))[p]
now = time.time()
assert node_cache_item.last_used < now
if use_file_io:
f = _FixedTimeFileIO(p, 'code', node_cache_item.last_used - 10)
parse(file_io=f, cache=True, diff_cache=diff_cache)
else:
parse('somecode2', cache=True, path=p, diff_cache=diff_cache)
node_cache_item = next(iter(parser_cache.values()))[p]
assert now < node_cache_item.last_used < time.time()
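
These tests poke the module cache directly; a sketch of the observable behavior (mirroring the cache_size() helper above):

from parso import parse
from parso.cache import parser_cache

parser_cache.clear()
# Parsing with cache=True and a path stores one node cache item,
# keyed by grammar hash and path.
parse('x = 1\n', cache=True, path='/path/example')
assert sum(len(v) for v in parser_cache.values()) == 1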

View File

@@ -8,7 +8,7 @@ import pytest
from parso.utils import split_lines
from parso import cache
from parso import load_grammar
from parso.python.diff import DiffParser, _assert_valid_graph
from parso.python.diff import DiffParser, _assert_valid_graph, _assert_nodes_are_equal
from parso import parse
ANY = object()
@@ -69,6 +69,9 @@ class Differ(object):
_assert_valid_graph(new_module)
without_diff_parser_module = parse(code)
_assert_nodes_are_equal(new_module, without_diff_parser_module)
error_node = _check_error_leaves_nodes(new_module)
assert expect_error_leaves == (error_node is not None), error_node
if parsers is not ANY:
@@ -88,15 +91,15 @@ def test_change_and_undo(differ):
# Parse the function and a.
differ.initialize(func_before + 'a')
# Parse just b.
differ.parse(func_before + 'b', copies=1, parsers=1)
differ.parse(func_before + 'b', copies=1, parsers=2)
# b has changed to a again, so parse that.
differ.parse(func_before + 'a', copies=1, parsers=1)
differ.parse(func_before + 'a', copies=1, parsers=2)
# Same as before parsers should not be used. Just a simple copy.
differ.parse(func_before + 'a', copies=1)
# Now that we have a newline at the end, everything is easier in Python
# syntax, we can parse once and then get a copy.
differ.parse(func_before + 'a\n', copies=1, parsers=1)
differ.parse(func_before + 'a\n', copies=1, parsers=2)
differ.parse(func_before + 'a\n', copies=1)
# Getting rid of an old parser: Still no parsers used.
@@ -135,7 +138,7 @@ def test_if_simple(differ):
differ.initialize(src + 'a')
differ.parse(src + else_ + "a", copies=0, parsers=1)
differ.parse(else_, parsers=1, copies=1, expect_error_leaves=True)
differ.parse(else_, parsers=2, expect_error_leaves=True)
differ.parse(src + else_, parsers=1)
@@ -152,7 +155,7 @@ def test_func_with_for_and_comment(differ):
# COMMENT
a""")
differ.initialize(src)
differ.parse('a\n' + src, copies=1, parsers=2)
differ.parse('a\n' + src, copies=1, parsers=3)
def test_one_statement_func(differ):
@@ -236,7 +239,7 @@ def test_backslash(differ):
def y():
pass
""")
differ.parse(src, parsers=2)
differ.parse(src, parsers=1)
src = dedent(r"""
def first():
@@ -247,7 +250,7 @@ def test_backslash(differ):
def second():
pass
""")
differ.parse(src, parsers=1)
differ.parse(src, parsers=2)
def test_full_copy(differ):
@@ -261,10 +264,10 @@ def test_wrong_whitespace(differ):
hello
'''
differ.initialize(code)
differ.parse(code + 'bar\n ', parsers=3)
differ.parse(code + 'bar\n ', parsers=2, expect_error_leaves=True)
code += """abc(\npass\n """
differ.parse(code, parsers=2, copies=1, expect_error_leaves=True)
differ.parse(code, parsers=2, expect_error_leaves=True)
def test_issues_with_error_leaves(differ):
@@ -279,7 +282,7 @@ def test_issues_with_error_leaves(differ):
str
''')
differ.initialize(code)
differ.parse(code2, parsers=1, copies=1, expect_error_leaves=True)
differ.parse(code2, parsers=1, expect_error_leaves=True)
def test_unfinished_nodes(differ):
@@ -299,7 +302,7 @@ def test_unfinished_nodes(differ):
a(1)
''')
differ.initialize(code)
differ.parse(code2, parsers=1, copies=2)
differ.parse(code2, parsers=2, copies=2)
def test_nested_if_and_scopes(differ):
@@ -365,7 +368,7 @@ def test_totally_wrong_whitespace(differ):
'''
differ.initialize(code1)
differ.parse(code2, parsers=4, copies=0, expect_error_leaves=True)
differ.parse(code2, parsers=2, copies=0, expect_error_leaves=True)
def test_node_insertion(differ):
@@ -439,7 +442,7 @@ def test_in_class_movements(differ):
""")
differ.initialize(code1)
differ.parse(code2, parsers=2, copies=1)
differ.parse(code2, parsers=1)
def test_in_parentheses_newlines(differ):
@@ -484,7 +487,7 @@ def test_indentation_issue(differ):
""")
differ.initialize(code1)
differ.parse(code2, parsers=1)
differ.parse(code2, parsers=2)
def test_endmarker_newline(differ):
@@ -585,7 +588,7 @@ def test_if_removal_and_reappearence(differ):
la
''')
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=4, expect_error_leaves=True)
differ.parse(code2, parsers=3, copies=2, expect_error_leaves=True)
differ.parse(code1, parsers=1, copies=1)
differ.parse(code3, parsers=1, copies=1)
@@ -618,8 +621,8 @@ def test_differing_docstrings(differ):
''')
differ.initialize(code1)
differ.parse(code2, parsers=3, copies=1)
differ.parse(code1, parsers=3, copies=1)
differ.parse(code2, parsers=2, copies=1)
differ.parse(code1, parsers=2, copies=1)
def test_one_call_in_function_change(differ):
@@ -649,7 +652,7 @@ def test_one_call_in_function_change(differ):
''')
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=1, expect_error_leaves=True)
differ.parse(code2, parsers=2, copies=1, expect_error_leaves=True)
differ.parse(code1, parsers=2, copies=1)
@@ -711,7 +714,7 @@ def test_docstring_removal(differ):
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=2)
differ.parse(code1, parsers=2, copies=1)
differ.parse(code1, parsers=3, copies=1)
def test_paren_in_strange_position(differ):
@@ -783,7 +786,7 @@ def test_parentheses_before_method(differ):
differ.initialize(code1)
differ.parse(code2, parsers=2, copies=1, expect_error_leaves=True)
differ.parse(code1, parsers=1, copies=1)
differ.parse(code1, parsers=2, copies=1)
def test_indentation_issues(differ):
@@ -824,10 +827,10 @@ def test_indentation_issues(differ):
''')
differ.initialize(code1)
differ.parse(code2, parsers=2, copies=2, expect_error_leaves=True)
differ.parse(code1, copies=2)
differ.parse(code2, parsers=3, copies=1, expect_error_leaves=True)
differ.parse(code1, copies=1, parsers=2)
differ.parse(code3, parsers=2, copies=1)
differ.parse(code1, parsers=1, copies=2)
differ.parse(code1, parsers=2, copies=1)
def test_error_dedent_issues(differ):
@@ -860,7 +863,7 @@ def test_error_dedent_issues(differ):
''')
differ.initialize(code1)
differ.parse(code2, parsers=6, copies=2, expect_error_leaves=True)
differ.parse(code2, parsers=3, copies=0, expect_error_leaves=True)
differ.parse(code1, parsers=1, copies=0)
@@ -892,8 +895,8 @@ Some'random text: yeah
''')
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=1, expect_error_leaves=True)
differ.parse(code1, parsers=1, copies=1)
differ.parse(code2, parsers=2, copies=1, expect_error_leaves=True)
differ.parse(code1, parsers=2, copies=1)
def test_many_nested_ifs(differ):
@@ -946,7 +949,7 @@ def test_with_and_funcdef_in_call(differ, prefix):
code2 = insert_line_into_code(code1, 3, 'def y(self, args):\n')
differ.initialize(code1)
differ.parse(code2, parsers=3, expect_error_leaves=True)
differ.parse(code2, parsers=1, expect_error_leaves=True)
differ.parse(code1, parsers=1)
@@ -961,14 +964,10 @@ def test_wrong_backslash(differ):
code2 = insert_line_into_code(code1, 3, '\\.whl$\n')
differ.initialize(code1)
differ.parse(code2, parsers=2, copies=2, expect_error_leaves=True)
differ.parse(code2, parsers=3, copies=1, expect_error_leaves=True)
differ.parse(code1, parsers=1, copies=1)
def test_comment_change(differ):
differ.initialize('')
def test_random_unicode_characters(differ):
"""
Those issues were all found with the fuzzer.
@@ -984,12 +983,11 @@ def test_random_unicode_characters(differ):
differ.parse(s, parsers=1, expect_error_leaves=True)
differ.parse('')
differ.parse(s + '\n', parsers=1, expect_error_leaves=True)
differ.parse(u' result = (\r\f\x17\t\x11res)', parsers=2, expect_error_leaves=True)
differ.parse(u' result = (\r\f\x17\t\x11res)', parsers=1, expect_error_leaves=True)
differ.parse('')
differ.parse(' a( # xx\ndef', parsers=2, expect_error_leaves=True)
differ.parse(' a( # xx\ndef', parsers=1, expect_error_leaves=True)
@pytest.mark.skipif(sys.version_info < (2, 7), reason="No set literals in Python 2.6")
def test_dedent_end_positions(differ):
code1 = dedent('''\
if 1:
@@ -998,7 +996,7 @@ def test_dedent_end_positions(differ):
c = {
5}
''')
code2 = dedent('''\
code2 = dedent(u'''\
if 1:
if ⌟ഒᜈྡྷṭb:
2
@@ -1041,7 +1039,7 @@ def test_random_character_insertion(differ):
# 4
''')
differ.initialize(code1)
differ.parse(code2, copies=1, parsers=3, expect_error_leaves=True)
differ.parse(code2, copies=1, parsers=1, expect_error_leaves=True)
differ.parse(code1, copies=1, parsers=1)
@@ -1102,8 +1100,8 @@ def test_all_sorts_of_indentation(differ):
end
''')
differ.initialize(code1)
differ.parse(code2, copies=1, parsers=4, expect_error_leaves=True)
differ.parse(code1, copies=1, parsers=3)
differ.parse(code2, copies=1, parsers=1, expect_error_leaves=True)
differ.parse(code1, copies=1, parsers=1, expect_error_leaves=True)
code3 = dedent('''\
if 1:
@@ -1113,7 +1111,7 @@ def test_all_sorts_of_indentation(differ):
d
\x00
''')
differ.parse(code3, parsers=2, expect_error_leaves=True)
differ.parse(code3, parsers=1, expect_error_leaves=True)
differ.parse('')
@@ -1130,7 +1128,7 @@ def test_dont_copy_dedents_in_beginning(differ):
''')
differ.initialize(code1)
differ.parse(code2, copies=1, parsers=1, expect_error_leaves=True)
differ.parse(code1, parsers=2)
differ.parse(code1, parsers=1, copies=1)
def test_dont_copy_error_leaves(differ):
@@ -1150,7 +1148,7 @@ def test_dont_copy_error_leaves(differ):
''')
differ.initialize(code1)
differ.parse(code2, parsers=1, expect_error_leaves=True)
differ.parse(code1, parsers=2)
differ.parse(code1, parsers=1)
def test_error_dedent_in_between(differ):
@@ -1174,7 +1172,7 @@ def test_error_dedent_in_between(differ):
z
''')
differ.initialize(code1)
differ.parse(code2, copies=1, parsers=1, expect_error_leaves=True)
differ.parse(code2, copies=1, parsers=2, expect_error_leaves=True)
differ.parse(code1, copies=1, parsers=2)
@@ -1200,8 +1198,8 @@ def test_some_other_indentation_issues(differ):
a
''')
differ.initialize(code1)
differ.parse(code2, copies=2, parsers=1, expect_error_leaves=True)
differ.parse(code1, copies=2, parsers=2)
differ.parse(code2, copies=0, parsers=1, expect_error_leaves=True)
differ.parse(code1, copies=1, parsers=1)
def test_open_bracket_case1(differ):
@@ -1241,8 +1239,8 @@ def test_open_bracket_case2(differ):
d
''')
differ.initialize(code1)
differ.parse(code2, copies=1, parsers=2, expect_error_leaves=True)
differ.parse(code1, copies=2, parsers=0, expect_error_leaves=True)
differ.parse(code2, copies=0, parsers=1, expect_error_leaves=True)
differ.parse(code1, copies=0, parsers=1, expect_error_leaves=True)
def test_some_weird_removals(differ):
@@ -1267,7 +1265,7 @@ def test_some_weird_removals(differ):
''')
differ.initialize(code1)
differ.parse(code2, copies=1, parsers=1, expect_error_leaves=True)
differ.parse(code3, copies=1, parsers=2, expect_error_leaves=True)
differ.parse(code3, copies=1, parsers=3, expect_error_leaves=True)
differ.parse(code1, copies=1)
@@ -1286,3 +1284,467 @@ def test_async_copy(differ):
differ.initialize(code1)
differ.parse(code2, copies=1, parsers=1)
differ.parse(code1, copies=1, parsers=1, expect_error_leaves=True)
def test_parent_on_decorator(differ):
code1 = dedent('''\
class AClass:
@decorator()
def b_test(self):
print("Hello")
print("world")
def a_test(self):
pass''')
code2 = dedent('''\
class AClass:
@decorator()
def b_test(self):
print("Hello")
print("world")
def a_test(self):
pass''')
differ.initialize(code1)
module_node = differ.parse(code2, parsers=1)
cls = module_node.children[0]
cls_suite = cls.children[-1]
assert len(cls_suite.children) == 3
def test_wrong_indent_in_def(differ):
code1 = dedent('''\
def x():
a
b
''')
code2 = dedent('''\
def x():
//
b
c
''')
differ.initialize(code1)
differ.parse(code2, parsers=1, expect_error_leaves=True)
differ.parse(code1, parsers=1)
def test_backslash_issue(differ):
code1 = dedent('''
pre = (
'')
after = 'instead'
''')
code2 = dedent('''
pre = (
'')
\\if
''')
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=1, expect_error_leaves=True)
differ.parse(code1, parsers=1, copies=1)
def test_paren_with_indentation(differ):
code1 = dedent('''
class C:
def f(self, fullname, path=None):
x
def load_module(self, fullname):
a
for prefix in self.search_path:
try:
b
except ImportError:
c
else:
raise
def x():
pass
''')
code2 = dedent('''
class C:
def f(self, fullname, path=None):
x
(
a
for prefix in self.search_path:
try:
b
except ImportError:
c
else:
raise
''')
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=1, expect_error_leaves=True)
differ.parse(code1, parsers=3, copies=1)
def test_error_dedent_in_function(differ):
code1 = dedent('''\
def x():
a
b
c
d
''')
code2 = dedent('''\
def x():
a
b
c
d
e
''')
differ.initialize(code1)
differ.parse(code2, parsers=2, copies=1, expect_error_leaves=True)
def test_with_formfeed(differ):
code1 = dedent('''\
@bla
async def foo():
1
yield from []
return
return ''
''')
code2 = dedent('''\
@bla
async def foo():
1
\x0cimport
return
return ''
''')
differ.initialize(code1)
differ.parse(code2, parsers=ANY, copies=ANY, expect_error_leaves=True)
def test_repeating_invalid_indent(differ):
code1 = dedent('''\
def foo():
return
@bla
a
def foo():
a
b
c
''')
code2 = dedent('''\
def foo():
return
@bla
a
b
c
''')
differ.initialize(code1)
differ.parse(code2, parsers=2, copies=1, expect_error_leaves=True)
def test_another_random_indent(differ):
code1 = dedent('''\
def foo():
a
b
c
return
def foo():
d
''')
code2 = dedent('''\
def foo():
a
c
return
def foo():
d
''')
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=3)
def test_invalid_function(differ):
code1 = dedent('''\
a
def foo():
def foo():
b
''')
code2 = dedent('''\
a
def foo():
def foo():
b
''')
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=1, expect_error_leaves=True)
def test_async_func2(differ):
code1 = dedent('''\
async def foo():
return ''
@bla
async def foo():
x
''')
code2 = dedent('''\
async def foo():
return ''
{
@bla
async def foo():
x
y
''')
differ.initialize(code1)
differ.parse(code2, parsers=ANY, copies=ANY, expect_error_leaves=True)
def test_weird_ending(differ):
code1 = dedent('''\
def foo():
a
return
''')
code2 = dedent('''\
def foo():
a
nonlocal xF"""
y"""''')
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=1, expect_error_leaves=True)
def test_nested_class(differ):
code1 = dedent('''\
def c():
a = 3
class X:
b
''')
code2 = dedent('''\
def c():
a = 3
class X:
elif
''')
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=1, expect_error_leaves=True)
def test_class_with_paren_breaker(differ):
code1 = dedent('''\
class Grammar:
x
def parse():
y
parser(
)
z
''')
code2 = dedent('''\
class Grammar:
x
def parse():
y
parser(
finally ;
)
z
''')
differ.initialize(code1)
differ.parse(code2, parsers=3, copies=1, expect_error_leaves=True)
def test_byte_order_mark(differ):
code2 = dedent('''\
x
\ufeff
else :
''')
differ.initialize('\n')
differ.parse(code2, parsers=2, expect_error_leaves=True)
code3 = dedent('''\
\ufeff
if:
x
''')
differ.initialize('\n')
differ.parse(code3, parsers=2, expect_error_leaves=True)
def test_byte_order_mark2(differ):
code = u'\ufeff# foo'
differ.initialize(code)
differ.parse(code + 'x', parsers=ANY)
def test_byte_order_mark3(differ):
code1 = u"\ufeff#\ny\n"
code2 = u'x\n\ufeff#\n\ufeff#\ny\n'
differ.initialize(code1)
differ.parse(code2, expect_error_leaves=True, parsers=ANY, copies=ANY)
differ.parse(code1, parsers=1)
def test_backslash_insertion(differ):
code1 = dedent('''
def f():
x
def g():
base = "" \\
""
return
''')
code2 = dedent('''
def f():
x
def g():
base = "" \\
def h():
""
return
''')
differ.initialize(code1)
differ.parse(code2, parsers=2, copies=1, expect_error_leaves=True)
differ.parse(code1, parsers=2, copies=1)
def test_fstring_with_error_leaf(differ):
code1 = dedent("""\
def f():
x
def g():
y
""")
code2 = dedent("""\
def f():
x
F'''
def g():
y
{a
\x01
""")
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=1, expect_error_leaves=True)
def test_yet_another_backslash(differ):
code1 = dedent('''\
def f():
x
def g():
y
base = "" \\
"" % to
return
''')
code2 = dedent('''\
def f():
x
def g():
y
base = "" \\
\x0f
return
''')
differ.initialize(code1)
differ.parse(code2, parsers=ANY, copies=ANY, expect_error_leaves=True)
differ.parse(code1, parsers=ANY, copies=ANY)
def test_backslash_before_def(differ):
code1 = dedent('''\
def f():
x
def g():
y
z
''')
code2 = dedent('''\
def f():
x
>\\
def g():
y
x
z
''')
differ.initialize(code1)
differ.parse(code2, parsers=3, copies=1, expect_error_leaves=True)
def test_backslash_with_imports(differ):
code1 = dedent('''\
from x import y, \\
''')
code2 = dedent('''\
from x import y, \\
z
''')
differ.initialize(code1)
differ.parse(code2, parsers=1)
differ.parse(code1, parsers=1)
def test_one_line_function_error_recovery(differ):
code1 = dedent('''\
class X:
x
def y(): word """
# a
# b
c(self)
''')
code2 = dedent('''\
class X:
x
def y(): word """
# a
# b
c(\x01+self)
''')
differ.initialize(code1)
differ.parse(code2, parsers=1, copies=1, expect_error_leaves=True)
def test_one_line_property_error_recovery(differ):
code1 = dedent('''\
class X:
x
@property
def encoding(self): True -
return 1
''')
code2 = dedent('''\
class X:
x
@property
def encoding(self): True -
return 1
''')
differ.initialize(code1)
differ.parse(code2, parsers=2, copies=1, expect_error_leaves=True)
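
All of these tests drive DiffParser through the Differ helper; the public entry point is the diff_cache flag. A sketch, with a hypothetical path used only as a cache key:

import parso

grammar = parso.load_grammar()
path = '/path/example'
grammar.parse('def f():\n    a\n', path=path, cache=True, diff_cache=True)
# The second parse updates the cached tree incrementally instead of
# reparsing the whole file.
module = grammar.parse('def f():\n    b\n', path=path, cache=True, diff_cache=True)
assert module.get_code() == 'def f():\n    b\n'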

View File

@@ -1,3 +1,5 @@
from textwrap import dedent
from parso import parse, load_grammar
@@ -83,3 +85,65 @@ def test_invalid_token_in_fstr():
assert error1.type == 'error_leaf'
assert error2.value == '"'
assert error2.type == 'error_leaf'
def test_dedent_issues1():
code = dedent('''\
class C:
@property
f
g
end
''')
module = load_grammar(version='3.8').parse(code)
klass, endmarker = module.children
suite = klass.children[-1]
assert suite.children[2].type == 'error_leaf'
assert suite.children[3].get_code(include_prefix=False) == 'f\n'
assert suite.children[5].get_code(include_prefix=False) == 'g\n'
assert suite.type == 'suite'
def test_dedent_issues2():
code = dedent('''\
class C:
@property
if 1:
g
else:
h
end
''')
module = load_grammar(version='3.8').parse(code)
klass, endmarker = module.children
suite = klass.children[-1]
assert suite.children[2].type == 'error_leaf'
if_ = suite.children[3]
assert if_.children[0] == 'if'
assert if_.children[3].type == 'suite'
assert if_.children[3].get_code() == '\n g\n'
assert if_.children[4] == 'else'
assert if_.children[6].type == 'suite'
assert if_.children[6].get_code() == '\n h\n'
assert suite.children[4].get_code(include_prefix=False) == 'end\n'
assert suite.type == 'suite'
def test_dedent_issues3():
code = dedent('''\
class C:
f
g
''')
module = load_grammar(version='3.8').parse(code)
klass, endmarker = module.children
suite = klass.children[-1]
assert len(suite.children) == 4
assert suite.children[1].get_code() == ' f\n'
assert suite.children[1].type == 'simple_stmt'
assert suite.children[2].get_code() == ''
assert suite.children[2].type == 'error_leaf'
assert suite.children[2].token_type == 'ERROR_DEDENT'
assert suite.children[3].get_code() == ' g\n'
assert suite.children[3].type == 'simple_stmt'

View File

@@ -118,3 +118,16 @@ def test_carriage_return_at_end(code, types):
assert tree.get_code() == code
assert [c.type for c in tree.children] == types
assert tree.end_pos == (len(code) + 1, 0)
@pytest.mark.parametrize('code', [
' ',
' F"""',
' F"""\n',
' F""" \n',
' F""" \n3',
' f"""\n"""',
' f"""\n"""\n',
])
def test_full_code_round_trip(code):
assert parse(code).get_code() == code

View File

@@ -28,4 +28,4 @@ def test_invalid_grammar_version(string):
def test_grammar_int_version():
with pytest.raises(TypeError):
load_grammar(version=3.2)
load_grammar(version=3.8)

View File

@@ -5,9 +5,9 @@ tests of pydocstyle.
import difflib
import re
from functools import total_ordering
import parso
from parso._compatibility import total_ordering
from parso.utils import python_bytes_to_unicode

View File

@@ -142,7 +142,7 @@ def test_yields(each_version):
def test_yield_from():
y, = get_yield_exprs('def x(): (yield from 1)', '3.3')
y, = get_yield_exprs('def x(): (yield from 1)', '3.8')
assert y.type == 'yield_expr'
@@ -222,3 +222,19 @@ def test_is_definition(code, name_index, is_definition, include_setitem):
name = name.get_next_leaf()
assert name.is_definition(include_setitem=include_setitem) == is_definition
def test_iter_funcdefs():
code = dedent('''
def normal(): ...
async def asyn(): ...
@dec
def dec_normal(): ...
@dec1
@dec2
async def dec_async(): ...
def broken
''')
module = parse(code, version='3.8')
func_names = [f.name.value for f in module.iter_funcdefs()]
assert func_names == ['normal', 'asyn', 'dec_normal', 'dec_async']

View File

@@ -29,13 +29,17 @@ def _invalid_syntax(code, version=None, **kwargs):
print(module.children)
def test_formfeed(each_py2_version):
s = u"""print 1\n\x0Cprint 2\n"""
t = _parse(s, each_py2_version)
assert t.children[0].children[0].type == 'print_stmt'
assert t.children[1].children[0].type == 'print_stmt'
s = u"""1\n\x0C\x0C2\n"""
t = _parse(s, each_py2_version)
def test_formfeed(each_version):
s = u"foo\n\x0c\nfoo\n"
t = _parse(s, each_version)
assert t.children[0].children[0].type == 'name'
assert t.children[1].children[0].type == 'name'
s = u"1\n\x0c\x0c\n2\n"
t = _parse(s, each_version)
with pytest.raises(ParserSyntaxError):
s = u"\n\x0c2\n"
_parse(s, each_version)
def test_matrix_multiplication_operator(works_ge_py35):
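
The rewritten test reflects the form-feed decision from the release notes: \x0c between top-level statements is ordinary prefix whitespace. A sketch:

import parso

code = 'foo\n\x0c\nbar\n'
module = parso.parse(code)
# The form feed survives the round trip inside a leaf prefix.
assert module.get_code() == code
assert module.children[0].children[0].type == 'name'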

View File

@@ -37,7 +37,7 @@ def test_python_exception_matches(code):
error, = errors
actual = error.message
assert actual in wanted
# Somehow in Python3.3 the SyntaxError().lineno is sometimes None
# Somehow in Python2.7 the SyntaxError().lineno is sometimes None
assert line_nr is None or line_nr == error.start_pos[0]
@@ -118,22 +118,12 @@ def _get_actual_exception(code):
assert False, "The piece of code should raise an exception."
# SyntaxError
# Python 2.6 has a bit different error messages here, so skip it.
if sys.version_info[:2] == (2, 6) and wanted == 'SyntaxError: unexpected EOF while parsing':
wanted = 'SyntaxError: invalid syntax'
if wanted == 'SyntaxError: non-keyword arg after keyword arg':
# The python 3.5+ way, a bit nicer.
wanted = 'SyntaxError: positional argument follows keyword argument'
elif wanted == 'SyntaxError: assignment to keyword':
return [wanted, "SyntaxError: can't assign to keyword",
'SyntaxError: cannot assign to __debug__'], line_nr
elif wanted == 'SyntaxError: assignment to None':
# Python 2.6 does has a slightly different error.
wanted = 'SyntaxError: cannot assign to None'
elif wanted == 'SyntaxError: can not assign to __debug__':
# Python 2.6 does has a slightly different error.
wanted = 'SyntaxError: cannot assign to __debug__'
elif wanted == 'SyntaxError: can use starred expression only as assignment target':
# Python 3.4/3.4 have a bit of a different warning than 3.5/3.6 in
# certain places. But in others this error makes sense.
@@ -331,4 +321,3 @@ def test_invalid_fstrings(code, message):
def test_trailing_comma(code):
errors = _get_error_list(code)
assert not errors

View File

@@ -5,7 +5,6 @@ from textwrap import dedent
import pytest
from parso._compatibility import py_version
from parso.utils import split_lines, parse_version_string
from parso.python.token import PythonTokenTypes
from parso.python import tokenize
@@ -137,7 +136,7 @@ def test_identifier_contains_unicode():
''')
token_list = _get_token_list(fundef)
unicode_token = token_list[1]
if py_version >= 30:
if sys.version_info.major >= 3:
assert unicode_token[0] == NAME
else:
# Unicode tokens in Python 2 seem to be identified as operators.
@@ -185,19 +184,19 @@ def test_ur_literals():
assert typ == NAME
check('u""')
check('ur""', is_literal=not py_version >= 30)
check('Ur""', is_literal=not py_version >= 30)
check('UR""', is_literal=not py_version >= 30)
check('ur""', is_literal=not sys.version_info.major >= 3)
check('Ur""', is_literal=not sys.version_info.major >= 3)
check('UR""', is_literal=not sys.version_info.major >= 3)
check('bR""')
# Starting with Python 3.3 this ordering is also possible.
if py_version >= 33:
if sys.version_info.major >= 3:
check('Rb""')
# Starting with Python 3.6 format strings were introduced.
check('fr""', is_literal=py_version >= 36)
check('rF""', is_literal=py_version >= 36)
check('f""', is_literal=py_version >= 36)
check('F""', is_literal=py_version >= 36)
check('fr""', is_literal=sys.version_info >= (3, 6))
check('rF""', is_literal=sys.version_info >= (3, 6))
check('f""', is_literal=sys.version_info >= (3, 6))
check('F""', is_literal=sys.version_info >= (3, 6))
def test_error_literal():
@@ -239,7 +238,7 @@ xfail_py2 = dict(marks=[pytest.mark.xfail(sys.version_info[0] == 2, reason='Pyth
(' foo', [INDENT, NAME, DEDENT]),
(' foo\n bar', [INDENT, NAME, NEWLINE, ERROR_DEDENT, NAME, DEDENT]),
(' foo\n bar \n baz', [INDENT, NAME, NEWLINE, ERROR_DEDENT, NAME,
NEWLINE, ERROR_DEDENT, NAME, DEDENT]),
NEWLINE, NAME, DEDENT]),
(' foo\nbar', [INDENT, NAME, NEWLINE, DEDENT, NAME]),
# Name stuff
@@ -250,6 +249,21 @@ xfail_py2 = dict(marks=[pytest.mark.xfail(sys.version_info[0] == 2, reason='Pyth
pytest.param(u'²', [ERRORTOKEN], **xfail_py2),
pytest.param(u'ä²ö', [NAME, ERRORTOKEN, NAME], **xfail_py2),
pytest.param(u'ää²¹öö', [NAME, ERRORTOKEN, NAME], **xfail_py2),
(' \x00a', [INDENT, ERRORTOKEN, NAME, DEDENT]),
(dedent('''\
class BaseCache:
a
def
b
def
c
'''), [NAME, NAME, OP, NEWLINE, INDENT, NAME, NEWLINE,
ERROR_DEDENT, NAME, NEWLINE, INDENT, NAME, NEWLINE, DEDENT,
NAME, NEWLINE, INDENT, NAME, NEWLINE, DEDENT, DEDENT]),
(' )\n foo', [INDENT, OP, NEWLINE, ERROR_DEDENT, NAME, DEDENT]),
('a\n b\n )\n c', [NAME, NEWLINE, INDENT, NAME, NEWLINE, INDENT, OP,
NEWLINE, DEDENT, NAME, DEDENT]),
(' 1 \\\ndef', [INDENT, NUMBER, NAME, DEDENT]),
]
)
def test_token_types(code, types):
@@ -258,7 +272,7 @@ def test_token_types(code, types):
def test_error_string():
t1, newline, endmarker = _get_token_list(' "\n')
indent, t1, newline, token, endmarker = _get_token_list(' "\n')
assert t1.type == ERRORTOKEN
assert t1.prefix == ' '
assert t1.string == '"'
@@ -319,16 +333,18 @@ def test_brackets_no_indentation():
def test_form_feed():
error_token, endmarker = _get_token_list(dedent('''\
indent, error_token, dedent_, endmarker = _get_token_list(dedent('''\
\f"""'''))
assert error_token.prefix == '\f'
assert error_token.string == '"""'
assert endmarker.prefix == ''
assert indent.type == INDENT
assert dedent_.type == DEDENT
def test_carriage_return():
lst = _get_token_list(' =\\\rclass')
assert [t.type for t in lst] == [INDENT, OP, DEDENT, NAME, ENDMARKER]
assert [t.type for t in lst] == [INDENT, OP, NAME, DEDENT, ENDMARKER]
def test_backslash():
@@ -339,6 +355,7 @@ def test_backslash():
@pytest.mark.parametrize(
('code', 'types'), [
# f-strings
('f"', [FSTRING_START]),
('f""', [FSTRING_START, FSTRING_END]),
('f" {}"', [FSTRING_START, FSTRING_STRING, OP, OP, FSTRING_END]),
@@ -394,7 +411,7 @@ def test_backslash():
]),
]
)
def test_fstring(code, types, version_ge_py36):
def test_fstring_token_types(code, types, version_ge_py36):
actual_types = [t.type for t in _get_token_list(code, version_ge_py36)]
assert types + [ENDMARKER] == actual_types
@@ -414,3 +431,13 @@ def test_fstring(code, types, version_ge_py36):
def test_fstring_assignment_expression(code, types, version_ge_py38):
actual_types = [t.type for t in _get_token_list(code, version_ge_py38)]
assert types + [ENDMARKER] == actual_types
def test_fstring_end_error_pos(version_ge_py38):
f_start, f_string, bracket, f_end, endmarker = \
_get_token_list('f" { "', version_ge_py38)
assert f_start.start_pos == (1, 0)
assert f_string.start_pos == (1, 2)
assert bracket.start_pos == (1, 3)
assert f_end.start_pos == (1, 5)
assert endmarker.start_pos == (1, 6)
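
A sketch of the position guarantee the new test checks, via the internal tokenizer (internal API, subject to change):

from parso.utils import parse_version_string
from parso.python.tokenize import tokenize

# Even an unterminated f-string expression yields tokens with exact
# start positions, ending in FSTRING_END and ENDMARKER.
for t in tokenize('f" { "', parse_version_string('3.8')):
    print(t.type, t.start_pos)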

View File

@@ -1,11 +1,9 @@
[tox]
envlist = {py26,py27,py33,py34,py35,py36,py37,py38}
envlist = {py27,py34,py35,py36,py37,py38}
[testenv]
extras = testing
deps =
py26,py33: pytest>=3.0.7,<3.3
py27,py34: pytest<3.3
py26,py33: setuptools<37
coverage: coverage
setenv =
# https://github.com/tomchristie/django-rest-framework/issues/1957