A pure-Java arithmetics parser
Well, it’s been over a month since the last post…I’ll try and post more often in the near future.
Anyway, one of the things I did in the past month was to write a parser for arithmetic expressions by hand. This was mainly inspired by Sven Efftinge’s blog and the workshop I gave which showed that explaining such a parser is actually quite hard. Sven’s movie which actually showed the call stack is rather more enlightening than any verbal account of the matter, so I started thinking about how to do the same thing but then interactively (and without the need to do a voice over which I don’t like to do since I hate hearing my own voice). In the end, I figured that Eclipse’s debugger already provides that functionality so a parser (and lexer) hand-written in Java should already do the trick.
The parser can be found alongside the Xtext workshop material I published earlier on Google Code. Remarks:
- It is a regular Eclipse project which doesn’t use anything beyond a standard Java JRE (>=5) and JUnit4.
- The project contains three parser classes: ParserWithLeftAssociativity, ParserWithSomeRightAssociativity and ParserWithSomeNonAssociativity -the correspondence (to Sven’s blog) should be obvious 😉
- The parsers share a common lexer implementation and the two latter parsers derive from the first one both for convenience and for sake of clarity -again, the correspondence should be obvious.
- I matched the parser implementation with the structure of the Xtext grammar definition and also (to a limited extent) with the structure of a generated Antlr-based parser. Comments refer to bits of the grammar which are being matched at that point in the code.
- Arithmetic expression can contain identifiers (matching /[a-zA-Z]+/) which are essentially defined on-the-fly. I added this to make the lexer slightly more interesting.
- I didn’t try and match the structure of the lexer to that of an Xtext-generated one. Instead, it’s rather free-form and doesn’t follow any particular lexer pattern I know (e.g., from the “Dragon book”): I’ve gone for a “whatever works and communicates” approach -I hope it communicates to you as well 😉
- Error detection, reporting and recovery is basic…at most. (The parser behaves slightly better than the lexer.)
- Unit tests are not too good (or actually not asserting anything for the lexer…).
To see the parser and its rule call stack, simply enable a breakpoint at Parser#ruleExpression and run a unit test in debug mode. Let me you know if you find bugs, have possible enhancements or think this is useful (or not!).