Make pgen2's grammar ambiguity detection handle more cases

Under the old implementation,

```
outer: A [inner] B C
inner: B C [inner]
```

wouldn't get detected as the ambiguous grammar that it is, whereas

```
outer: A rest
rest: [inner] B C
inner: B C [inner]
```

would.

This would manifest itself as non-determinism in the DFA state
generation. See the discussion on #62 for a full explanation.

This modifies the ambiguity detection to work on a broader class of
issues, so it should now hopefully detect all cases where the given
grammar is ambiguous.
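
To make the two grammars above concrete, here is a minimal sketch of the
kind of check involved. It is not pgen2's actual code: the grammar
encoding (lists of items, with "[x]" meaning an optional x) and the
names split_item, first_terminals, choices_at and find_conflicts are all
invented for this illustration, and only bracketed items are treated as
nullable. The idea is to look at every position in a production, not
only the first, and flag any terminal that can be reached through two
different next items.

```python
def split_item(item):
    """Return (name, is_optional) for an item such as "B" or "[inner]"."""
    if item.startswith("[") and item.endswith("]"):
        return item[1:-1], True
    return item, False


def first_terminals(symbol, grammar, seen=()):
    """Return the set of terminals that can begin `symbol`."""
    if symbol not in grammar:   # anything without a rule is a terminal
        return {symbol}
    if symbol in seen:          # guard against left recursion
        return set()
    result = set()
    for alternative in grammar[symbol]:
        for item in alternative:
            name, optional = split_item(item)
            result |= first_terminals(name, grammar, seen + (symbol,))
            if not optional:
                break
    return result


def choices_at(sequences, grammar):
    """Map each possible next terminal to the items it can be reached through."""
    plans = {}
    for items in sequences:
        for item in items:
            name, optional = split_item(item)
            for terminal in first_terminals(name, grammar):
                plans.setdefault(terminal, set()).add(name)
            if not optional:
                break
    return plans


def find_conflicts(grammar):
    """Return (rule, position, terminal, choices) for every ambiguous decision."""
    conflicts = []
    for rule, alternatives in grammar.items():
        # Position 0 is shared by all alternatives; later positions are
        # checked per alternative (a simplification of real DFA states).
        points = [(0, alternatives)]
        for alt in alternatives:
            points += [(i, [alt[i:]]) for i in range(1, len(alt))]
        for position, sequences in points:
            for terminal, via in sorted(choices_at(sequences, grammar).items()):
                if len(via) > 1:
                    conflicts.append((rule, position, terminal, sorted(via)))
    return conflicts


# The first grammar from this message: the conflict sits after "A".
DIRECT = {
    "outer": [["A", "[inner]", "B", "C"]],
    "inner": [["B", "C", "[inner]"]],
}
# The second grammar: the same conflict, but at the start of "rest".
INDIRECT = {
    "outer": [["A", "rest"]],
    "rest": [["[inner]", "B", "C"]],
    "inner": [["B", "C", "[inner]"]],
}

print(find_conflicts(DIRECT))
print(find_conflicts(INDIRECT))
```

In this sketch, the conflict in the second grammar is found at position 0
of rest, while the conflict in the first grammar is found at position 1
of outer. A check that only looked at position 0 of each production
would therefore report the second grammar and miss the first, which
matches the behaviour described above.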

At some point, we could extend this logic to allow developers to
optionally set precedence of grammar productions, which could resolve
ambiguities, but that's not a strict requirement for parsing Python.

Benjamin Woodruff
2019-05-07 16:56:29 -04:00
committed by Dave Halter
parent c0ace63a69
commit 0032bae041
2 changed files with 60 additions and 12 deletions

@@ -293,11 +293,40 @@ def test_left_recursion():
 def test_ambiguities():
-    with pytest.raises(ValueError, match='ambiguous'):
+    with pytest.raises(
+        ValueError,
+        match=r"foo is ambiguous.*given a TokenType\(NAME\).*bar or baz"
+    ):
         generate_grammar('foo: bar | baz\nbar: NAME\nbaz: NAME\n', tokenize.PythonTokenTypes)

-    with pytest.raises(ValueError, match='ambiguous'):
+    with pytest.raises(
+        ValueError,
+        match=r"foo is ambiguous.*given a ReservedString\(x\).*bar or baz"
+    ):
         generate_grammar('''foo: bar | baz\nbar: 'x'\nbaz: "x"\n''', tokenize.PythonTokenTypes)

-    with pytest.raises(ValueError, match='ambiguous'):
+    with pytest.raises(
+        ValueError,
+        match=r"foo is ambiguous.*given a ReservedString\(x\).*bar or foo"
+    ):
         generate_grammar('''foo: bar | 'x'\nbar: 'x'\n''', tokenize.PythonTokenTypes)
+
+    # an ambiguity with the second (not the first) child of a production
+    with pytest.raises(
+        ValueError,
+        match=r"outer is ambiguous.*given a ReservedString\(b\).*inner or outer"
+    ):
+        generate_grammar(
+            'outer: "a" [inner] "b" "c"\ninner: "b" "c" [inner]\n',
+            tokenize.PythonTokenTypes,
+        )
+
+    # an ambiguity hidden by a level of indirection (middle)
+    with pytest.raises(
+        ValueError,
+        match=r"outer is ambiguous.*given a ReservedString\(b\).*middle or outer"
+    ):
+        generate_grammar(
+            'outer: "a" [middle] "b" "c"\nmiddle: inner\ninner: "b" "c" [inner]\n',
+            tokenize.PythonTokenTypes,
+        )