The Xtext grammar language and its invisible infix operator
The nice thing about the Xtext framework is that it eats its own dog food: its grammar definition language (also called Xtext, of course: the framework suffers slightly from name reuse) is itself created using Xtext. There’s a wondrous principle called bootstrapping at work here 😉
Anyway, this means you can simply inspect the .xtext file and the other Java customizations directly in the plugin in case you don’t believe the User Guide or find that it’s lacking. One such case (at least for me) was the construction of unordered groups and why the following grammar fragment didn’t produce the result I expected:
Entity: transient?='transient'? & abstract?='abstract'? 'entity' name=ID;
The result I expected was to parse things like “transient entity Foo” and “abstract entity Bar”. It does that, but it doesn’t accept “abstract transient entity Foo”, while it does accept “abstract entity Bar transient”, which certainly wasn’t what I intended!
After looking into the grammar, I was a little surprised to find that the part of the language for defining parser rules actually uses an expression language for everything between the ‘:’ and ‘;’. (I shouldn’t have been, of course, since there’s an explicit reference to EBNF expressions and you can use parentheses to group what we can now call, in all certainty, subexpressions.)
This expression language sports the following infix operators, ordered in increasing precedence:
1. ‘|’ for the regular alternative operator,
2. ‘&’ for the unordered group operator and
3. concatenation of “abstract tokens”, meaning assignments, keywords, rule calls, actions and everything parenthesized, not separated by anything other than (optional!) whitespace.
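One way to picture this precedence ladder is as three nested parser rules, one per operator, each delegating to the next tighter-binding level. The fragment below is a simplified sketch of that layering, not a verbatim copy of the framework's own .xtext file: the feature name `elements` and the undefined `AbstractToken` rule are assumptions made for illustration, so check the actual file in the plugin for the real definitions.

```xtext
// Sketch only: names and features are illustrative assumptions.

// Lowest precedence: '|' separates unordered groups.
Alternatives returns AbstractElement:
    UnorderedGroup ({Alternatives.elements+=current} ('|' elements+=UnorderedGroup)+)?;

// Middle precedence: '&' separates (ordered) groups.
UnorderedGroup returns AbstractElement:
    Group ({UnorderedGroup.elements+=current} ('&' elements+=Group)+)?;

// Highest precedence: plain juxtaposition of abstract tokens, with no
// operator symbol between them at all.
// AbstractToken (assignments, keywords, rule calls, actions and
// parenthesized expressions) is left undefined in this sketch.
Group returns AbstractElement:
    AbstractToken ({Group.elements+=current} (elements+=AbstractToken)+)?;
```

Because Group sits at the bottom of the chain, a run of adjacent tokens is gathered into one group before any ‘&’ or ‘|’ gets to combine it with its neighbours.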
Item 3 says that token concatenation is an invisible infix operator…spooky! So,
transient?='transient'? & abstract?='abstract'? 'entity' name=ID
actually means the same as
transient?='transient'? & ( abstract?='abstract'? 'entity' name=ID )
because the invisible token concatenation operator has higher precedence than the unordered group operator. This explains why “abstract entity Bar transient” is accepted (and yields the same AST as “transient abstract entity Bar”) while “abstract transient entity Foo” is not: there, the abstract keyword is separated from the entity keyword. The fix is easy enough: enclose the entire unordered group in parentheses, just as you would a group of alternatives.
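Applied to the rule from the beginning of this post, the fix would look roughly like this (a sketch of the parenthesized version, using the same rule and feature names as above):

```xtext
// The parentheses confine '&' to the two optional modifiers, so the
// concatenation with 'entity' and name=ID happens outside the unordered group.
Entity:
    (transient?='transient'? & abstract?='abstract'?) 'entity' name=ID;
```

With this version, “abstract transient entity Foo” and “transient abstract entity Foo” should both be accepted, while “abstract entity Bar transient” should no longer parse.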
I also noticed that (lone) keywords and rule calls can have cardinality postfixes as well, at least syntax/grammar-wise. I haven’t checked what happens at and after generation, or whether the semantics are what you’d intuitively expect. It’s certainly something I haven’t seen used in any grammar so far!