Using syntactic predicates in Xtext, part 2
This blog is a continuation of the previous one about how to use syntactic predicates in Xtext. As promised, I’ll provide a few more examples, most of which come from the realm of GPL-like languages.
But first, a little summary is in order. As stated in the previous blog, a syntactic predicate is an annotation in an Xtext grammar which indicates to the ANTLR parser generator how a (potential) ambiguity should be resolved by picking the (first) one which is decorated with ‘=>‘. The annotation can be applied to:
- a(n individual) keyword (such as ‘else‘),
- a rule call (unassigned or as part of an assignment) and
- a grouped parse expression, i.e. a parse expression between parentheses.
One thing to keep in mind -not only for syntactic predicates but in general- that an Xtext grammar has at least three and often four responsibilities:
- defining the lexing behavior through definition and inclusion of terminals;
- defining the parsing behavior through parser rules which determine how tokens are matched and consumed;
- defining how the model is populated;
- (when not using an existing Ecore model) defining the meta model.
Syntactic predicates influence the second of these but not the others. It is, after all, a syntactic predicate, not a semantic one – which Xtext doesn’t have in any case. Just as without using syntactic predicates, parsing behavior is not influenced by how the model is populated: instead, it is governed solely by the types of the tokens it receives from the lexer. This is easily forgotten when you’re trying to write grammars with cross-references like this:
SomeParserRule: Alternative1 | Alternative2; Alternative1: ref1=[ReferencedType1|ID]; Alternative1: ref2=[ReferencedType2|ID];
In this case, the parser will always consume the ID token as part of Alternative1 even if its value is the (qualified) name of something of ReferencedType2. In fact, ANTLR will issue a warning about alternative 2 being unreachable so it is disabled. For a workaround this problem, see this older blog: it uses a slightly different use case as motivation but the details are the same. The only thing a syntactic predicate can do here is to explicitly favor one alternative over the other.
Some examples from Xbase
The Xtend and the Xbase languages that Xtext ships with both use plenty of syntactic predicates to avoid ambiguities in their grammars and to avoid having to use backtracking altogether. This already indicates that syntactic predicates are a necessary tool, especially when creating GPL-like or otherwise quite expressive DSLs. Note again that syntactic predicates are typically found near/inside optional parts of grammar rules since optionality automatically implies an alternative parsing route.
A good example can be found in the Xbase grammar in the form of the XReturnExpression rule: see GitHub. It uses a syntactic predicate on an assignment to force the optional XExpression following the ‘return‘ keyword to be parsed as part of the XReturnExpression rather than being an XExpression all on its own – which would have totally different semantics, but could be a viable interpretation considering Xtend doesn’t require separating/ending semi-colons.
The Xbase grammar also shows that syntactic predicates are an effective way to disambiguate the use of pairs of parentheses for denoting a list of arguments to a function call from that for grouping inside an expression: once again, see GitHub – here, the syntactic predicate applies to a grouped parse expression, i.e. everything between the parentheses pair starting just before the ‘=>‘.
Even if you don’t (have to) use syntactic predicates yourself, it’s important to know of their existence. As an example, the other day I was prototyping a DSL which used the JvmTypeReference type rule from Xbase followed by an angled bracket pair (‘<‘, ‘>’) which held ID tokens functioning as cross-references. I was momentarily surprised to see parse errors arise in my example along the lines of “Couldn't resolve reference to JvmType 'administrator'.” The stuff between the angled brackets was being interpreted as a generic type parameter!
It turns out that the JvmTypeReference parser rule uses a syntactic predicate on an angled bracket pair surrounding generic type parameters. This explains both the behavior and the lack of warnings by ANTLR about grammar ambiguities. You’d probably have a hard time figuring out this behavior before finding an innocuous ‘=>‘ here. In the end, I changed “my” angled brackets to square brackets to resolve this. This shows that syntactic predicates, just like backtracking, can be a double-edged sword: it can solve some of your problems but you have to really know how it works to be able to understand what’s going on.
I hope that this was useful for you: please let me know whether it is! I’m not planning on a third installment but you never know: a particular enticing use case might just do the trick.