Using syntactic predicates in Xtext, part 2
This blog is a continuation of the previous one about how to use syntactic predicates in Xtext. As promised, I’ll provide a few more examples, most of which come from the realm of GPL-like languages.
But first, a little summary is in order. As stated in the previous blog, a syntactic predicate is an annotation in an Xtext grammar which tells the ANTLR parser generator how a (potential) ambiguity should be resolved: by picking the (first) alternative which is decorated with ‘=>‘. The annotation can be applied to:
- a(n individual) keyword (such as ‘else‘),
- a rule call (unassigned or as part of an assignment) and
- a grouped parse expression, i.e. a parse expression between parentheses.
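To make the three placements concrete, here is a minimal sketch of each. All rule and feature names (IfStatement, Statement, VariableDeclaration, ExpressionStatement, Call, Function, Expression) are made up for illustration and are assumed to be defined elsewhere in the same grammar; none of them comes from a shipped grammar.

```xtext
// 1. Predicate on an individual keyword: the classic dangling 'else'.
//    The 'else' is attached to the nearest enclosing 'if'.
IfStatement:
    'if' '(' condition=Expression ')' then=Statement
    (=> 'else' else=Statement)?;

// 2. Predicate on an (unassigned) rule call: if the upcoming input can be
//    matched as a VariableDeclaration, take that alternative.
Statement:
    => VariableDeclaration | ExpressionStatement;

// 3. Predicate on a grouped parse expression: commit to this rule only if
//    the whole group (a name immediately followed by '(') matches.
Call:
    => (function=[Function|ID] '(')
    (args+=Expression (',' args+=Expression)*)? ')';
```

In each case the predicate only tells the parser which route to prefer when more than one is syntactically possible; it does not change what the other rules accept.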
One thing to keep in mind, not only for syntactic predicates but in general, is that an Xtext grammar has at least three and often four responsibilities:
- defining the lexing behavior through definition and inclusion of terminals;
- defining the parsing behavior through parser rules which determine how tokens are matched and consumed;
- defining how the model is populated;
- (when not using an existing Ecore model) defining the meta model.
Syntactic predicates influence the second of these but not the others. It is, after all, a syntactic predicate, not a semantic one – which Xtext doesn’t have in any case. Just as without using syntactic predicates, parsing behavior is not influenced by how the model is populated: instead, it is governed solely by the types of the tokens it receives from the lexer. This is easily forgotten when you’re trying to write grammars with cross-references like this:
SomeParserRule: Alternative1 | Alternative2;
Alternative1: ref1=[ReferencedType1|ID];
Alternative2: ref2=[ReferencedType2|ID];
In this case, the parser will always consume the ID token as part of Alternative1, even if its value is the (qualified) name of something of ReferencedType2. In fact, ANTLR will issue a warning that Alternative2 is unreachable and has therefore been disabled. For a workaround for this problem, see this older blog: it uses a slightly different use case as motivation but the details are the same. The only thing a syntactic predicate can do here is to explicitly favor one alternative over the other.
Some examples from Xbase
The Xtend and the Xbase languages that Xtext ships with both use plenty of syntactic predicates to avoid ambiguities in their grammars and to avoid having to use backtracking altogether. This already indicates that syntactic predicates are a necessary tool, especially when creating GPL-like or otherwise quite expressive DSLs. Note again that syntactic predicates are typically found near/inside optional parts of grammar rules since optionality automatically implies an alternative parsing route.
A good example can be found in the Xbase grammar in the form of the XReturnExpression rule: see GitHub. It uses a syntactic predicate on an assignment to force the optional XExpression following the ‘return‘ keyword to be parsed as part of the XReturnExpression rather than being an XExpression all on its own – which would have totally different semantics, but could be a viable interpretation considering Xtend doesn’t require separating/ending semi-colons.
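The shape of that rule can be sketched as follows. This is a simplified paraphrase and not a verbatim copy of the Xbase grammar, whose exact text may differ between Xtext versions; XExpression is assumed to be defined elsewhere.

```xtext
// Simplified sketch of the return rule: the predicate sits on the
// assignment inside the optional group.
XReturnExpression returns XExpression:
    {XReturnExpression} 'return' (=> expression=XExpression)?;
```

Without the predicate, input such as "return x" could be read either as a return of x or as a bare return followed by a separate expression x. With the predicate, the parser picks the first reading whenever an expression follows the keyword.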
The Xbase grammar also shows that syntactic predicates are an effective way to disambiguate the use of pairs of parentheses for denoting a list of arguments to a function call from that for grouping inside an expression: once again, see GitHub – here, the syntactic predicate applies to a grouped parse expression, i.e. everything between the parentheses pair starting just before the ‘=>‘.
Unforeseen consequences
Even if you don’t (have to) use syntactic predicates yourself, it’s important to know of their existence. As an example, the other day I was prototyping a DSL which used the JvmTypeReference type rule from Xbase followed by an angled bracket pair (‘<‘, ‘>’) which held ID tokens functioning as cross-references. I was momentarily surprised to see parse errors arise in my example along the lines of “Couldn't resolve reference to JvmType 'administrator'.” The stuff between the angled brackets was being interpreted as a generic type parameter!
It turns out that the JvmTypeReference parser rule uses a syntactic predicate on an angled bracket pair surrounding generic type parameters. This explains both the behavior and the lack of warnings by ANTLR about grammar ambiguities. You’d probably have a hard time figuring out this behavior before finding an innocuous ‘=>‘ here. In the end, I changed “my” angled brackets to square brackets to resolve this. This shows that syntactic predicates, just like backtracking, can be a double-edged sword: they can solve some of your problems, but you have to really know how they work to be able to understand what’s going on.
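The clash can be sketched as follows. The first rule only approximates the shape of the type reference rule in the Xbase grammars and is not a verbatim copy; the second rule, including the names Assignment, Role and roles, is a hypothetical stand-in for the prototype described above.

```xtext
// Approximate shape of the Xbase type reference rule: the predicate makes
// the parser commit to generic type arguments as soon as it sees '<'.
JvmParameterizedTypeReference:
    type=[JvmType|QualifiedName]
    (=> '<' arguments+=JvmArgumentTypeReference
        (',' arguments+=JvmArgumentTypeReference)* '>')?;

// Hypothetical DSL rule: the '<' written here is never reached, because the
// rule call before it has already consumed '<' as the start of type arguments.
Assignment:
    type=JvmTypeReference '<' roles+=[Role|ID] (',' roles+=[Role|ID])* '>';

// The change described above: swap the angled brackets for square ones.
// Assignment:
//     type=JvmTypeReference '[' roles+=[Role|ID] (',' roles+=[Role|ID])* ']';
```

Because the predicate resolves the choice silently, ANTLR reports nothing for the second rule, which is why the problem only shows up as a failed reference to a JvmType at link time.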
I hope that this was useful for you: please let me know whether it is! I’m not planning on a third installment but you never know: a particularly enticing use case might just do the trick.
Hi, it's difficult for me to understand all the text in English.
Do you have a translation into Spanish or French?
My Spanish is non-existent and my French sucks. You could try Google Translate. And please note that the DSLs here have nothing to do with data lines.
I am in the situation you are describing for function parameters and grouped expression:
Msg:
    msgs+=msgNOP (',' msgs+=msgNOP)*;
msgNOP:
    '(' msg=Msg ')'
    | call=FunctionCall;
FunctionCall:
    declFun=[Function] (parleft='(' args=Msg parright=')')?;
The question mark here is causing compiler warnings, so I would like to disambiguate the grammar with syntactic predicates. However, the link to the github is broken and I do not find anything with predicates on your general repository?
Could you please help or fix the link?
Hi Rémi,
The repo I referenced has indeed moved in the meantime. The corrected links are https://github.com/eclipse/xtext-extras/blob/master/org.eclipse.xtext.xbase/src/org/eclipse/xtext/xbase/Xbase.xtext (for the first two links), and https://github.com/eclipse/xtext-extras/blob/master/org.eclipse.xtext.xbase/src/org/eclipse/xtext/xbase/Xtype.xtext
I haven’t been doing anything Xtext-related for a long while now, and I’m a little too busy right now to retrieve it from the secondary memory banks, so I’ll have to leave you dangling with that, I’m afraid. Just to be sure: what are the compiler warnings?
No problem, your link was helpful. I had the classical ‘non LL(*) Decision’ message, but this seems to do the trick:
msgNOP:
    (=> '(' msg=Msg ')')
    | FunctionCall
    | (=> variable=[Function]);
FunctionCall:
    => (function=[Function] '(')
    args=Msg ')';
If the parser first tries ‘(‘, then the pair ID + ‘(‘, and then the rest, I get a valid evaluation. Thank you for your help!
Glad I turned out to be of help! 🙂
It seems you have a lot of syntactic predicates now. I suspect that only the one before the variable reference to a Function is really necessary. You might want to experiment with that a bit, as it can help with performance and with avoiding other ambiguity problems further down the road.
Indeed, I will try to simplify it, thank you for your help.
I also have a tricky requirement for my grammar, which is reserved instance names. For example, I define:
Number:
    name=NumID;
NumberReservedID:
    'empty';
NumID:
    ID | NumberReservedID;
NumberRef:
    ref=[Number|NumID] | NumberReservedID;
I do not know if syntactic predicates are the adequate mechanism to use here, but I would like the parser to try to resolve ’empty’ as a reference, and then if it does not exist, try to parse it as a string.
As it is, I could not at the same time have the parser consider ’empty’ as a reference if it was declared, and authorize it as a keyword if it was not.
Did you ever encounter such a situation?
’empty’ is a keyword now, so the lexer wins over the parser. If you want to change that, you need to make a standard library: a model that’s automatically imported everywhere. This standard library should then define ’empty’ as something of an appropriate meta type. This is a good “trick” to make the grammar as small as possible, yet very flexible, even though it takes some more setup. I believe there are blogs by Sebastian Zarnekow explaining how to do this.