Archive for the ‘The How’ Category

Implementing existing DSLs with Xtext – a case study, part 0

November 3, 2011

This is the first installment in a series of blog posts on how to implement a “typical” existing DSL using Xtext. It’s intended as an informal guide showing how I tend to go about implementing a DSL, solving often-occurring problems and making design decisions. I’m actually going to implement two DSLs: CSS (version 3, on average) and Less. These are technical DSLs rather than domain DSLs, but I’ve only got a finite amount of time per week so live with it, alright?

The case of existing DSLs is both an interesting and a common one: we don’t have to design the language which means we can focus purely on the implementation and don’t have to bother with everything that goes into designing a good language. On the other hand, we are often forced to deal with “features” which might be cumbersome to implement in the “parsing tech du jour”. In the green field case, you’d have the freedom to reconsider having that feature in the language at all, but not here.

For this case study, I chose Less because it appealed to me: I kind-of loathe working with CSS directly. Although Less provides a way of getting feedback on syntactical and semantic validity through less.js, and highlighting/language assist modules have been developed for various editors, I’d rather have a full-fledged editor for the Less language than a tedious “change-compile-fix”-type development cycle. Also, it seemed a rather good showcase for Xtext outside of the now-popular JVM language envelope.

It became apparent early on that I’d need a full implementation of CSS as well, since Less extends CSS, which gives me a chance to show how to do language composition and grammar generation with Xtext. Other aspects which will be addressed are: expression sub-languages, type systems, evaluation and code generation using model-to-model transformation.

This post doesn’t have any technical details yet: sorry! You can follow my efforts on/through GitHub, which holds the sources as well as a downloadable ZIP with deployable Eclipse plug-in JARs.

Categories: The How, Xtext

To mock a…DSL?

October 3, 2011

Even with all these language workbenches and DSL frameworks which make creating a DSL a lot easier than it used to be, it’s still not a trivial matter: it takes a non-trivial effort to implement even a rough first draft of a DSL. This also means that it makes sense to get an idea of what your DSL should actually look like before you really start hacking away at the implementation. This is especially true for graphical DSLs, which are generally harder to build than textual DSLs, so you can save a lot of wasted effort by investing in a well-executed mockup phase.

Very rarely will you have an actual language specification upfront. The situation in which you’re creating a completely new DSL is both much more common and much more interesting: it is first and foremost an adventure into Domain Land in which you get to meet new people, see interesting things and discover hoards of hidden knowledge, often buried in caches of moldy Word or Excel documents. Often, a language-of-sorts is already around, but without a clearly-defined syntax, let alone a formal definition, and without any tooling. It’s our job, as DSL builders, to bring order to this chaos.

Get to know the domain

That’s why the first item on the agenda is getting to know the domain, and the people living it, really well. We geeks have a tendency to cut corners here: after all, we managed to ace the Compiler and Algorithm Analysis courses, so what could possibly be difficult about, say, refrigerator configurations, right? But bear in mind that a DSL is, well, domain-specific, so you’d better make sure it fits into the heads of the people who make up your domain. And it had better fit well, making them more productive even taking into account that they need to learn new tools and probably a new way of working as well.

If the fit is sub-optimal, they’re likely to bin your efforts as soon as they’ve come across the first hurdle, which probably even means that you’ve lost your foot in the door regarding model-driven/DSL solutions. (Another way of saying the same thing is that a domain is defined by the group of people who are considered part of it, not the other way around.)

Make mockups

Therefore, the second item on the agenda is coming up with a bunch of mockups. The intent of these mockups is (1) for your domain users to get an idea of their DSL by gauging what actual content would look like expressed in it, and (2) for you to gain feedback from that to validate and guide your DSL design and implementation. It’s important that you use actual content the domain users are really familiar with for these mockups: introducing a DSL tends to be seen as a disruptive innovation even by the most innovation-prone people (and we all know that organizations are rife with that kind…) so your domain users must be able to see where things are going for them.

You don’t achieve that by using something that is too generic/unspecific (e.g., of the ‘Hello world!’ type), too small (which probably means you’re not even beginning to touch corner cases where it really matters what the DSL looks like and how much expressivity it allows) or not broad enough (i.e., overly focusing on a particular aspect the DSL addresses instead of achieving a good spread).

It’s also good to have a few variants for the DSL. There are a lot of (often fairly continuously-valued) parameters you can tweak in a DSL:

  1. Is the syntax overly verbose or on-the-verge-of-cryptic minimal?
  2. How is the language factored: one big language or several small ones, each addressing one specific aspect?
  3. How is the prose (i.e., content expressed in the DSL) modularized: one big model, or spread over multiple files?
  4. Is there a way to make content re-usable or to make abstractions?
  5. How is visibility (scoping) and naming (e.g., qualified names with local imports) organized?

Each of these parameters (and also the many more I didn’t list) determines how good the fit with the people in the domain and their way of working is going to be. Also, the eventual language design is going to directly influence the engineering and the complexity of the implementation. The mockup phase is as good a time as any to find out how much leeway you have in the design to lighten the load and optimize implementation efforts, and proposing variants is a good way of exploring that space.

What to make mockups with

What do you use for mockups? Anything that allows you to approximate the eventual syntax of the language. For textual DSLs, you can use anything that allows you to mimic the syntax highlighting (i.e., font style and color – I consider highlighting part of the syntax). For graphical DSLs, anything besides Paint could possibly work. In both cases, you’d best pick something that both you and your domain users are already familiar with so that it is easy for everyone involved to contribute to the process by changing stuff and even coming up with their own mockups. Chances are OpenOffice or any of its commercial rivals provide you with more than enough.

Obviously, you’re going to miss out on the tooling aspect: a good editing experience (content assist, navigation of references, outlines, semantic search/compare, etc.) makes up a large part of any modern DSL. Keep in mind, though, that the DSL prose is going to be read much more often than it is going to be written (i.e., modified or created), and the use of mockups reflects that. Martin Fowler coined the term ‘business-readable DSLs’ because of this and because domain users who are really able to write DSL prose seem to be relatively rare. In any case, you should check whether your domain users will actually be able to create and modify prose using only the proposed syntax and no tooling.


Having arrived at a good consensus on the DSL’s syntax and design, you can start hammering away at an implementation, knowing that the first working draft should not come as a surprise to your domain users. In true Agile fashion, you should present and share working but incomplete versions of the DSL implementation as soon and as often as possible and elicit feedback. This also counters the traditional “MDSD as a magical black box” thinking which is often found among the uninitiated.

To conclude: making mockups of your DSL before you start implementing is useful and saves you a lot of wasted implementation effort later on.

Categories: DSLs, The How

Annotation-based dispatch for scope providers

One of the slightly awkward aspects of Xtext is that the org.eclipse.xtext.scoping.impl.AbstractDeclarativeScopeProvider class essentially relies on a naming convention to dispatch scoping to the correct method. Such a strategy is quite brittle when you’re changing your grammar (or the Ecore meta model) as it doesn’t alert you to the fact that method names should change as well.

In an earlier blog, I already gave a tip to help deal with both this as well as with knowing which method to actually implement, but that doesn’t make this a statically-checked enterprise yet. The challenge here is to come up with a suitable compile-time Java representation of the references and types (or in grammar-terms: features and parser rules) involved, otherwise it wouldn’t be static checking, right? Unfortunately, the rather useful Java representation of references provided by the <MyDSL>PackageImpl class is a purely run-time representation.

Instead, I chose to make do with the IDs defined for types and their features in the <MyDSL>Package class to come up with an implementation of an annotation-based strategy for scope provider specification. It turns out that this is rather easy to do by extending the AbstractDeclarativeScopeProvider class. I’ve pushed my implementation (aptly called AbstractAnnotationBasedDeclarativeScopeProvider, which would probably score a triple word value in a game of Scrabble) to my open-source GitHub repository: the source and the plugin JAR.


Usage is quite simple (as it should be): have your scope provider class extend AbstractAnnotationBasedDeclarativeScopeProvider and add either a ScopeForType or a ScopeForReference annotation (both conveniently contained in the AbstractAnnotationBasedDeclarativeScopeProvider class) to the scoping methods. The ScopeForType annotation takes the class(ifier) ID of the EClass to scope for. The ScopeForReference annotation also takes the feature ID of the reference (contained by the EClass specified) to scope for. Both these IDs are found in the <MyDSL>Package class as simple int constants. Note that it’s not checked whether these IDs actually belong together (in the second case), as the <MyDSL>Package class doesn’t actually encode that information.

As long as you use the <MyDSL>Package class to obtain IDs, this notation is checked at compile-time so that when something changes, chances are good that the annotation breaks and you’re alerted to the fact you have to change something.

As an example, consider the DomainmodelScopeProvider scope implementation class for the Domain Model example project shipped with Xtext 1.0.x: have that class extend AbstractAnnotationBasedDeclarativeScopeProvider and add the following annotation to the scope_Reference_opposite method to use the annotation-based strategy.

@ScopeForReference(classId=DomainmodelPackage.REFERENCE, featureId=DomainmodelPackage.REFERENCE__OPPOSITE)

The current implementation is nice enough to tell you (as a warning in the log4j log) that it’s encountered a method which seems to comply with the naming-based strategy, but it nevertheless refuses to call that method to avoid mixed-strategy behavior. I might change this behavior in the future to remove this nicety or to make it configurable (and non-default), though.
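A minimal, self-contained sketch of the dispatch idea, using plain Java reflection, might look like the following. This is not the code in my repository: the ScopeDispatchSketch class, the annotation declared inside it and the int constants (standing in for the IDs generated into <MyDSL>Package) are all hypothetical, and the scoping methods return strings instead of IScope instances so that it runs without EMF or Xtext on the classpath.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class ScopeDispatchSketch {

    // Hypothetical stand-ins for the int constants generated into <MyDSL>Package.
    static final int ENTITY = 1;
    static final int ENTITY__SUPER_TYPE = 1;
    static final int REFERENCE = 3;
    static final int REFERENCE__OPPOSITE = 2;

    // Hypothetical annotation mirroring ScopeForReference: class ID plus feature ID.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ScopeForReference {
        int classId();
        int featureId();
    }

    // A "scope provider" whose method names are irrelevant: only the annotations count.
    static class MyScopeProvider {
        @ScopeForReference(classId = REFERENCE, featureId = REFERENCE__OPPOSITE)
        public String oppositeScope(Object context) {
            return "scope for Reference.opposite";
        }

        @ScopeForReference(classId = ENTITY, featureId = ENTITY__SUPER_TYPE)
        public String superTypeScope(Object context) {
            return "scope for Entity.superType";
        }
    }

    // Finds the method annotated for (classId, featureId) and invokes it with the context;
    // returns null when nothing matches (a real provider would fall back to a default scope).
    static String dispatch(Object provider, int classId, int featureId, Object context) {
        for (Method m : provider.getClass().getDeclaredMethods()) {
            ScopeForReference a = m.getAnnotation(ScopeForReference.class);
            if (a != null && a.classId() == classId && a.featureId() == featureId) {
                try {
                    return (String) m.invoke(provider, context);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MyScopeProvider provider = new MyScopeProvider();
        System.out.println(dispatch(provider, REFERENCE, REFERENCE__OPPOSITE, null));
    }
}
```

Because the lookup only looks at the annotation, renaming a method is harmless, while renaming or removing one of the referenced constants breaks compilation. A real implementation would presumably cache the (classId, featureId) to Method mapping rather than scanning all methods on every call.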

Design decisions

I could have used a Class<? extends EObject> instance to specify the EClass (or rather, the Java type of the instances of that). The example given above would then look as follows.

@ScopeForReference(type=Reference.class, featureId=DomainmodelPackage.REFERENCE__OPPOSITE)

However, you still need the right feature ID to specify the EReference, so I chose to stick to using two IDs which clearly belong together, since that communicates a little more clearly. I also thought it’s better to use standard Ecore infrastructure throughout and not to rely on the particular way Ecore maps to actual Java classes.

Let me know…

…what you think of this approach! Is it useful? Is the notation clear? Do you have a preference for the Class<? extends EObject>-style? Be sure to drop me a line.

Categories: The How, Xtext

Checklist for Xtext DSL implementations

April 8, 2011

Currently I’m in the US, working with a team that’s building a number of DSLs with Xtext and has been doing that for some time already. The interesting thing is that this team is quite proficient at doing that and at tackling all sorts of gnarly problems (coming either from a legacy language which they have to emulate to some level or from requirements coming from the end users), even though most of them have only been working with Xtext for a few months. However, during the past week I realized that I unconsciously use a collection of properties which I check/look for in Xtext DSLs, and since I use it unconsciously, I wasn’t really aware that not everyone was using the same thing. In effect, the team had already run into problems which they had solved, either completely or partly, in places which were downstream from the root causes of the problem. The root causes generally resided at the level of the grammar or scope provider implementation and would (for the most part) have been covered by my unconscious checklist. Had the team had my checklist, they’d probably have saved both time and headaches.

Since existing sources (i.e., the Xtext User Guide and, e.g., Markus Völter’s “MD* Best Practices” paper) are either reference-style or quite general and somewhat hard to map to daily Xtext practice, I figured I’d better make this list explicit. I divvied the checklist up into three sections: one concerning the Generate<MyDsl>.mwe2 file, one concerning the grammar file and one concerning the Java artifacts which augment the grammar.

Generator workflow

  1. Do the EPackages imported in the grammar file correspond 1:1 with the referencedGenModels in the workflow?
  2. Do you know/understand what the configured fragments (especially pertaining to naming, scoping, validation) provide out-of-the-box?
  3. Is backtracking set to false (default) in the options configuration for the XtextAntlrGeneratorFragment? I find that backtracking is rarely needed, and unless it is, enabling it introduces quite a performance hit. More importantly, it might hide ambiguities in the grammar (i.e., they don’t get reported during the generation phase) at points where you didn’t need backtracking anyway.

To expand a little on the second item, here’s a list of the most important choices you’ve got:

  • naming: exporting.SimpleNamesFragment versus exporting.QualifiedNamesFragment
  • scoping: scoping.ImportURIScopingFragment versus scoping.ImportNamespacesScopingFragment
  • validation.JavaValidatorFragment has two composedChecks by default: ImportUriValidator which validates importURI occurrences (only useful in case you’ve configured the ImportURIGlobalScopeProvider in the runtime module, either manually or by using ImportURIScopingFragment), and NamesAreUniqueValidator (which checks whether all objects exported from the current Resource have unique qualified names).
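As a rough, hypothetical illustration of the kind of check NamesAreUniqueValidator performs, here's a plain-Java sketch in which strings stand in for the qualified names exported from one Resource. The class and method names are made up, and the sketch says nothing about how the real validator is implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueNamesSketch {

    // Returns every qualified name that occurs more than once, in order of first repetition.
    static List<String> duplicateNames(List<String> exportedQualifiedNames) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : exportedQualifiedNames) {
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return new ArrayList<>(duplicates);
    }

    public static void main(String[] args) {
        List<String> names = List.of("shop.Customer", "shop.Order", "shop.Customer");
        // Each duplicate would become a validation error on the offending objects.
        System.out.println(duplicateNames(names)); // [shop.Customer]
    }
}
```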


Grammar

  1. Any left-recursion? This should be pretty obvious since the Xtext generator breaks anyway and leaves the DSL projects in an unbuildable state.
  2. No ambiguities (red error messages coming from the ANTLR parser generator)? Ambiguities generally come either from ambiguities at the token level (e.g., alternatives in a choice ‘|’ which consume the same token type) or from overlapping terminal rules (somewhat rarer, since creating new terminal rules and/or changing existing ones is fortunately not that common).
  3. Does the grammar provide semantics which are not syntactical in nature? Generally: grammar is for syntax, the rest (scope provision, validation, name provision, etc.) is for semantics.
  4. Did you document the grammar by documenting the semantics of each of the rules, also specifying aspects such as naming, scoping, validation, formatting, etc. (unfortunately, in comment-form only)? Since the grammar is the starting point of the DSL implementation, it’s usually best to put as much info in there as possible…
  5. Did you add a {Foo} unassigned action to the rules which do not necessarily assign to the type? (Saves you from unexpected NullPointerExceptions.)

To expand on the second item pertaining to ambiguities:

  • Most ambiguities of the first kind are introduced by an incorrect setup of an expression sub-language. Make sure you use the pattern(s) described in Sven’s and two of my blog posts.
  • Favor recursive over linear structures in the context of references into recursive structures. This makes implementing the scope provider a lot easier (or even: possible). For an example of this, see this blog post.

Java artifacts

First some checks which pertain to implementation of the custom local scope provider:

  1. Are you using the “narrow” form (signature: IScope scope_<Type>_<Reference>(<ContextType> context, EReference ref), where Reference is a feature of Type) as much as possible?
  2. Are you using the “wide” form (signature: IScope scope_<Type>(<ContextType> context, EReference ref)) where it makes sense?
  3. Have you chosen the ContextType (see previous items) to be convenient so you don’t need to travel up the containment hierarchy?
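Since both forms are naming conventions, the sketch below spells them out as plain string computations, together with a simplified model of why the choice of ContextType matters: the method is looked up for the context object’s type first, then for the types of its containers. All names are hypothetical and strings stand in for EClasses. As far as I know, the <Type> in the wide form is the type the reference points to, but do check that against your Xtext version.

```java
import java.util.List;
import java.util.Set;

public class ScopeMethodLookupSketch {

    // Narrow form: scope_<Type>_<Reference>, where <Reference> is a feature of <Type>.
    static String narrowName(String containingType, String referenceName) {
        return "scope_" + containingType + "_" + referenceName;
    }

    // Wide form: scope_<Type>, where <Type> is (assumed to be) the type the reference points to.
    static String wideName(String referencedType) {
        return "scope_" + referencedType;
    }

    // Simplified model of the lookup: starting at the context object's type and moving
    // outwards through its containers, return "name(ContextType)" for the first method
    // declared with that parameter type, or null if there is none.
    static String find(Set<String> declaredMethods, String methodName, List<String> contextTypesInnermostFirst) {
        for (String contextType : contextTypesInnermostFirst) {
            String signature = methodName + "(" + contextType + ")";
            if (declaredMethods.contains(signature)) {
                return signature;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Only an Entity-typed variant is declared, so a Reference context has to be
        // walked up one containment level before the method applies.
        Set<String> declared = Set.of("scope_Reference_opposite(Entity)");
        String name = narrowName("Reference", "opposite");
        System.out.println(find(declared, name, List.of("Reference", "Entity", "DomainModel")));
    }
}
```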

For the rest of the Java artifacts:

  1. Is your custom IQualifiedNameProvider implementation bound in the runtime module?
  2. Does the bound IQualifiedNameProvider implementation compute a qualified name for the model root? (Important in case you’re using the org.eclipse.xtext.mwe.Reader class.)
  3. Have you implemented value converters (see §5.7) for all the data type rules in the grammar?
  4. Have you bound the value converter class in the runtime module?
Categories: DSLs, The How, Xtext

Deploying plugins using ANT

April 3, 2011

Whenever you’re developing DSLs in the form of Eclipse plugins, you’ll have to come up with a means of deploying these plugins to the Eclipse installations of your language’s users. Several times, I’ve used a simple ANT script to do just that:

<?xml version="1.0"?>
<project name="DSL deployment" default="deploy">

	<!-- eclipse.home is only set when the script runs in the same JRE as Eclipse -->
	<target name="check" unless="eclipse.home">
		<echo message="Property {eclipse.home} not set (run this ANT script inside the same JRE as Eclipse)." />
	</target>

	<target name="deploy" depends="check" if="eclipse.home">
		<echo message="Deploying plugin JARs to Eclipse installation (${eclipse.home})" />
		<copy todir="${eclipse.home}/dropins">
			<fileset dir="${basedir}/lib/plugins" />
		</copy>
		<echo message="Please restart Eclipse to activate/update the plugins." />
	</target>

</project>


You simply put all the plugins to be deployed in the lib/ directory of the containing Eclipse project and run the ANT script inside the same JRE as Eclipse, using the settings on the JRE panel of the ANT Run Configuration. The script checks for this and will signal if it’s not run in the proper way. After deployment, you’ll have to (have) Eclipse restarted to activate the plugins. The plugins are placed in the dropins directory rather than the plugins directory, which allows you to easily distinguish them from “regular” plugins.

This setup has the advantage that you can have Eclipse export the plugins to the lib/ directory of the containing Eclipse project by pointing the Export wizard to that directory on the Destination tab. In case of errors, the export’s log file gets dumped in the lib/ directory as well.

Categories: DSLs, The How

Open-sourcing some DSLs

March 31, 2011

In the past two months, I’ve re-visited several technical domains I’ve dealt with in the past and implemented DSLs for these (in Xtext 1.0.y) which might be somewhat generically useful. I’m open-sourcing these on GitHub under the MIT license, in the hope that they may prove useful to folks as examples of what you can do with DSLs in general and Xtext in particular, and how you can achieve that.

Feel free to pull the repo and play around with it. Be sure to drop me a line in case you found it useful (or not, in which case: why?). Do consider though that I’m providing this as-is and that I don’t intend to provide anything remotely resembling serious support on it – unless you’re willing to hire me, of course 😉

Categories: DSLs, The How, Xtext

More on pre- and postfix operators in Xtext

March 24, 2011

In the previous blog, I entirely glossed over the aspect of associativity of operators. It turns out that the unary minus operator we constructed is right-associative and the signum operator is non-associative. In general, prefix operators can either be right-associative, meaning the operations are grouped from right to left, or non-associative, meaning that the operator can’t bind with operators of the same precedence (including itself). Postfix operators can either be left-associative (exemplified by “x++--”) or non-associative.

We already ascertained the right-associativity of the unary minus by means of the last line in the unit test in the previous post: if unary minus were non-associative, then “--2” would trigger a parse error and not be equal to 2. We can check the non-associativity of the signum operator with the following test (still as part of the CalculatorTest class), which asserts that parsing “-2s s” yields exactly one parse error:

    public void test_associativity_of_signum() throws Exception {
        // "-2s s" must produce exactly one parse error while Signum is non-associative
        getResourceAndExpect(new StringInputStream("module test  -2s s;"), 1);
    }

(I’ll explain about the extra space in the expression string “-2s s” in a minute.)

It’s not difficult to make our prefix operator non-associative: we simply have the grammar rule UnitaryMinus call the rule for the next precedence level instead of calling itself.

UnitaryMinus returns Expression:
    Signum | ({UnitaryMinus} '-' expr=Signum);

We can test this by trying to parse “--2” and expecting one parse error.

Making the postfix operator left-associative is equally simple: we replace the ?-cardinality with a *-cardinality.

Signum returns Expression:
    PrimaryExpression ({Signum.expr=current} 's')*;

We can test this by checking whether “2s s” is parsed (and equals 1).
(The following “obvious” solution “obviously” introduces left-recursion:

Signum returns Expression:
    Signum ({Signum.expr=current} 's')?;

As usual with implementing left-recursive grammars with LL parser technology, this is circumvented with tree rewriting, in the case of Xtext using actions.)
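Both changes are easy to see in a hand-written recursive-descent version. In this hypothetical Java sketch, reduced to integer literals plus the two operators, each call returns a parenthesized string so the grouping is visible: the non-associative prefix operator parses its operand one level up, and the left-associative postfix operator is a loop that keeps wrapping the tree built so far, which is the job {Signum.expr=current} does in the grammar instead of left recursion.

```java
public class AssociativitySketch {
    private final String in;
    private int pos;

    private AssociativitySketch(String in) { this.in = in; }

    // Parses the whole input and returns its grouping as a parenthesized string.
    static String parse(String input) {
        AssociativitySketch p = new AssociativitySketch(input);
        String result = p.unaryMinus();
        p.skipWs();
        if (p.pos != input.length()) throw new IllegalArgumentException("parse error at " + p.pos);
        return result;
    }

    // Non-associative prefix, like UnitaryMinus: Signum | '-' Signum.
    // The operand is parsed one level up, so a second '-' is a parse error.
    private String unaryMinus() {
        if (accept('-')) return "-(" + signum() + ")";
        return signum();
    }

    // Left-associative postfix, like Signum: PrimaryExpression ('s')*.
    // Each iteration wraps the tree built so far ("current"), giving ((2)s)s.
    private String signum() {
        String current = number();
        while (accept('s')) current = "(" + current + ")s";
        return current;
    }

    private String number() {
        skipWs();
        int start = pos;
        while (pos < in.length() && Character.isDigit(in.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + pos);
        return in.substring(start, pos);
    }

    private void skipWs() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
    }

    private boolean accept(char c) {
        skipWs();
        if (pos < in.length() && in.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(parse("2s s")); // ((2)s)s
        System.out.println(parse("-2s"));  // -((2)s)
        try {
            parse("--2");
        } catch (IllegalArgumentException e) {
            System.out.println("--2 rejected: " + e.getMessage());
        }
    }
}
```

Because it reads single characters rather than tokens, this sketch also accepts “2ss” (returning ((2)s)s); the parser Xtext generates does not, because of how its lexer treats “ss”.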

The only problem with this grammar is that it doesn’t parse something like “-2ss”. The reason for that is that the substring “ss” is lexed as an ID token instead of a sequence of ‘s’ terminals. By introducing an extra hidden token (whitespace or comments) in between, we force the lexer to produce a sequence of ‘s’ terminals, but that means we’re bothering the DSL’s users with it. A better solution would be to choose a character for the postfix operator that doesn’t clash with the ID terminal rule (the σ character would’ve been a rather obvious choice, in this case) or to rewrite the ID terminal rule so that it doesn’t match the regexp /s+/.
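The lexing behavior can be mimicked with a much-simplified longest-match tokenizer. This hypothetical Java sketch is not how ANTLR’s generated lexer is implemented; it only approximates its observable behavior here: at each position the longest match wins, and a keyword (‘-’ or ‘s’) wins a tie with an equally long ID. The ID pattern roughly follows Xtext’s default ID terminal, minus the optional leading ‘^’.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchLexerSketch {

    private static final String[] KEYWORDS = {"-", "s"};
    // Roughly Xtext's default ID terminal (leading '^' omitted), plus numbers and whitespace.
    private static final String[] RULE_NAMES = {"ID", "NUMBER", "WS"};
    private static final Pattern[] RULES = {
        Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*"),
        Pattern.compile("[0-9]+(\\.[0-9]+)?"),
        Pattern.compile("\\s+")
    };

    // Returns the tokens as "TYPE:text"; whitespace (a hidden token) is dropped.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestEnd = pos;
            // Keywords are tried first and only replaced by a strictly longer match,
            // so a keyword wins a tie with an equally long ID.
            for (String keyword : KEYWORDS) {
                if (input.startsWith(keyword, pos) && pos + keyword.length() > bestEnd) {
                    bestType = "'" + keyword + "'";
                    bestEnd = pos + keyword.length();
                }
            }
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestType = RULE_NAMES[i];
                    bestEnd = m.end();
                }
            }
            if (bestType == null) {
                throw new IllegalArgumentException("no token matches at position " + pos);
            }
            if (!bestType.equals("WS")) {
                tokens.add(bestType + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("-2ss"));  // ['-':-, NUMBER:2, ID:ss]
        System.out.println(tokenize("-2s s")); // ['-':-, NUMBER:2, 's':s, 's':s]
    }
}
```

In this sketch, σ would not match `[a-zA-Z_]`, so it could never be swallowed by ID, which is the point of picking such a character.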

Categories: DSLs, The How, Xtext

Pre- and postfix operators in Xtext

March 21, 2011

While re-visiting some Xtext DSLs I made earlier, I came across one which had some ambiguities (reported by ANTLR) due to the use of a prefix operator in an expression sub-language. Fixing that stymied me enough to warrant this blog. (It’s actually a sort of addendum to Sven Efftinge’s excellent blog on expression parsing. I’ll do a blog later on to recap everything.)

For the sake of minimizing effort for all parties involved, we’ll extend the Arithmetics example that’s shipped with Xtext with a unary minus as prefix operator and the signum function (which returns -1, 0, or 1 depending on whether the operand is negative, zero or positive) as postfix operator. (I chose the signum function because it’s an instance method of the BigDecimal class, which is used for the interpreter.)

If you’ve had a good, hard look at the grammar of the Arithmetics example or Sven’s blog, you might have realized that the patterns there amount exactly to implementing a classical recursive-descent parser, right in Xtext’s grammar definition language. The rules of the expression language form a (strictly-ordered) sequence, each of which starts off by calling the next rule in the sequence before trying to match anything else (and doing a tree rewrite with the result of the rule call). The net effect is that the sequence order equals the precedence order, with the first rule corresponding to the lowest precedence level and the last rule in the sequence corresponding to the highest, typically consisting of the ubiquitous parentheses and possibly other things like literals, variable references and such.
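To make the analogy concrete, here’s a hand-written recursive-descent evaluator in Java with the same shape as the Arithmetics rules: one method per precedence level, each of which first calls the next level and then loops over its own operators, which is what the `( ... )*` parts with their {Multi.left=current}-style actions amount to. It’s a sketch under simplifying assumptions: it evaluates BigDecimal values directly instead of building a model, it only knows numbers, the four operators and parentheses, and the class name is made up.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class PrecedenceCascade {
    private final String input;
    private int pos;

    PrecedenceCascade(String input) {
        this.input = input;
    }

    static BigDecimal evaluate(String input) {
        PrecedenceCascade parser = new PrecedenceCascade(input);
        BigDecimal result = parser.addition();
        parser.skipWhitespace();
        if (parser.pos != input.length()) {
            throw new IllegalArgumentException("unexpected input at " + parser.pos);
        }
        return result;
    }

    // Lowest precedence: Multiplication (('+' | '-') Multiplication)*
    private BigDecimal addition() {
        BigDecimal left = multiplication();
        while (true) {
            if (accept('+')) left = left.add(multiplication());
            else if (accept('-')) left = left.subtract(multiplication());
            else return left;
        }
    }

    // Next level: PrimaryExpression (('*' | '/') PrimaryExpression)*
    private BigDecimal multiplication() {
        BigDecimal left = primary();
        while (true) {
            if (accept('*')) left = left.multiply(primary());
            else if (accept('/')) left = left.divide(primary(), MathContext.DECIMAL128);
            else return left;
        }
    }

    // Highest precedence: parentheses (back to the lowest level) or a number literal.
    private BigDecimal primary() {
        if (accept('(')) {
            BigDecimal inner = addition();
            if (!accept(')')) throw new IllegalArgumentException("')' expected at " + pos);
            return inner;
        }
        skipWhitespace();
        int start = pos;
        while (pos < input.length() && (Character.isDigit(input.charAt(pos)) || input.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("number expected at " + pos);
        }
        return new BigDecimal(input.substring(start, pos));
    }

    private void skipWhitespace() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
    }

    private boolean accept(char c) {
        skipWhitespace();
        if (pos < input.length() && input.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1 + 2 * 3"));   // 7
        System.out.println(evaluate("(1 + 2) * 3")); // 9
    }
}
```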

We’re going to extend that pattern to deal with pre- and postfix operators as well. The relevant section of the Arithmetics grammar consists of lines 46-52 of the Arithmetics.xtext file:

Multiplication returns Expression:
    PrimaryExpression (({Multi.left=current} '*' | {Div.left=current} '/') right=PrimaryExpression)*;

PrimaryExpression returns Expression:
    '(' Expression ')' |
    {NumberLiteral} value=NUMBER |
    {FunctionCall} func=[AbstractDefinition] ('(' args+=Expression (',' args+=Expression)* ')')?;

We’re going to add two rules, called UnitaryMinus and Signum, in between the Multiplication and PrimaryExpression rules, so we have to change the Multiplication rule:

Multiplication returns Expression:
    UnitaryMinus (({Multi.left=current} '*' | {Div.left=current} '/') right=UnitaryMinus)*;

Matching the unary minus prefix operator is simple enough:

UnitaryMinus returns Expression:
    '-' expr=UnitaryMinus;

Since this rule always consumes at least one token, the ‘-’ character, from the input, we can recursively call UnitaryMinus without causing left-recursion. The upshot of this is that ‘--37’ is parsed as -(-(37)). Unfortunately, the rule (as it is) would break the sequence of rules, so that we’d lose the higher levels of precedence altogether. To prevent that, we also call the next rule, Signum, as an alternative:

UnitaryMinus returns Expression:
    Signum | ({UnitaryMinus} '-' expr=UnitaryMinus);

(We need the {UnitaryMinus} action here to make sure the Xtext generator generates a corresponding type in the Ecore model which holds the parsed info.)

Implementing the postfix operator is a matter of calling the next rule (PrimaryExpression) and performing a tree rewrite in case the postfix operator can be matched:

Signum returns Expression:
    PrimaryExpression ({Signum.expr=current} 's')?;

This is all that’s required for the grammar. Now, we only have to fix the Calculator class and add a couple of test cases in the CalculatorTest class. The Calculator class uses the PolymorphicDispatcher class I wrote about earlier which means we just have to add methods corresponding to the new UnitaryMinus and Signum types:

	protected BigDecimal internalEvaluate(UnitaryMinus unitaryMinus, ImmutableMap<String,BigDecimal> values) {
		return evaluate(unitaryMinus.getExpr(),values).negate(); // evaluate the operand, then flip its sign
	}
	protected BigDecimal internalEvaluate(Signum signum, ImmutableMap<String,BigDecimal> values) {
		// BigDecimal.signum() returns an int (-1, 0 or 1), so wrap it back into a BigDecimal
		return new BigDecimal( evaluate(signum.getExpr(),values).signum() );
	}

We add a couple of test cases, specifically for the new operators:

    public void test_unitary_minus_and_signum() throws Exception {
        check(-1,	"1 + -2");	// == 1 + (-2)
        check(1,	"-1 + 2");	// == (-1) + 2, not -(1 + 2)
        check(-1,	"-3.7s");	// == -(3.7s) == -1
        check(0,	"1 + -7s");	// == 1 + -(7s) == 1 + -(1) == 0
        check(2,	"--2");		// == -(-2)
    }
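The grammar changes and the test cases above can be cross-checked against a hand-written equivalent. The following Java sketch mirrors the rule sequence Addition, Multiplication, UnitaryMinus, Signum and PrimaryExpression with one method each. It’s an illustration under assumptions, not what Xtext generates or what the Calculator class does: it evaluates BigDecimal values while parsing instead of building UnitaryMinus and Signum objects, it works on single characters (so it has no ID token), and the class name is made up.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class PrefixPostfixSketch {
    private final String in;
    private int pos;

    private PrefixPostfixSketch(String in) { this.in = in; }

    static BigDecimal evaluate(String input) {
        PrefixPostfixSketch p = new PrefixPostfixSketch(input);
        BigDecimal result = p.addition();
        p.skipWs();
        if (p.pos != input.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return result;
    }

    // Addition: Multiplication (('+' | '-') Multiplication)*
    private BigDecimal addition() {
        BigDecimal left = multiplication();
        while (true) {
            if (accept('+')) left = left.add(multiplication());
            else if (accept('-')) left = left.subtract(multiplication());
            else return left;
        }
    }

    // Multiplication: UnitaryMinus (('*' | '/') UnitaryMinus)*
    private BigDecimal multiplication() {
        BigDecimal left = unaryMinus();
        while (true) {
            if (accept('*')) left = left.multiply(unaryMinus());
            else if (accept('/')) left = left.divide(unaryMinus(), MathContext.DECIMAL128);
            else return left;
        }
    }

    // UnitaryMinus: Signum | '-' UnitaryMinus
    // Recursing is safe because '-' has already been consumed; it also makes the
    // prefix operator right-associative, so "--2" means -(-(2)).
    private BigDecimal unaryMinus() {
        if (accept('-')) return unaryMinus().negate();
        return signum();
    }

    // Signum: PrimaryExpression ('s')?  (at most one postfix operator)
    private BigDecimal signum() {
        BigDecimal operand = primary();
        if (accept('s')) return new BigDecimal(operand.signum());
        return operand;
    }

    // PrimaryExpression: '(' Addition ')' | number
    private BigDecimal primary() {
        if (accept('(')) {
            BigDecimal inner = addition();
            if (!accept(')')) throw new IllegalArgumentException("')' expected at " + pos);
            return inner;
        }
        skipWs();
        int start = pos;
        while (pos < in.length() && (Character.isDigit(in.charAt(pos)) || in.charAt(pos) == '.')) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + pos);
        return new BigDecimal(in.substring(start, pos));
    }

    private void skipWs() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
    }

    private boolean accept(char c) {
        skipWs();
        if (pos < in.length() && in.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1 + -7s")); // 0
        System.out.println(evaluate("--2"));     // 2
    }
}
```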

Now go out and add your pre- and postfix operators to your DSL today! 🙂

Tips and tricks for Xpand

January 17, 2011

I’ve been using Xpand (in combination with Xtend) for some four years now and it still is my favorite (target language-agnostic) templating engine. Like all programmers, I picked up a few habits which I’ll conveniently rebrand as “tips and tricks” 🙂

Separating templates from logic

Xpand is closely tied to Xtend: not only do they share an expression sub-language, but they’re also each other’s perfect partners in crime, since Xtend (as a separate language) allows you to factor out re-used or intricate expressions into a separate file, so the Xpand template only needs to call (hopefully aptly-named) functions in a separate Xtend file. As a bonus, this approach allows you to use the cached keyword so that commonly used but expensive expressions have to be evaluated only once, and it offers a more convenient way of documenting functionality (see also XtendTools).

Xtend libs

Xtext ships with a number of standard libraries which can be quite helpful, especially the io library. These libraries are all contained in the org.eclipse.xtend.util.stdlib plugin so be sure to add this one to the plugin dependencies (in the META-INF/ file). I’ll highlight a few of the libraries (in alphabetical order):

  • counter.ext: provides counters inside templates (if you really need these; most of the time you don’t and you simply haven’t thought hard enough about it yet ;));
  • crossref.ext: getReferencingObjects(EObject eObject) compiles a list of all objects (in the current Resource) referencing eObject;
  • elementprops.ext: provides access to System properties;
  • io.ext: provides console logging;
  • naming.ext: provides common naming, such as computing a qualified name from the containment hierarchy;
  • properties.ext: provides access to properties injected into the workflow using the org.eclipse.xtend.util.stdlib.PropertiesReader workflow component;
  • uid.ext: provides common UID functionality/generation.

I’d advise anyone to go over these libraries by looking at the .ext files, checking out the (somewhat sparse) documentation and do some ‘finger exercises’ using them.

Polymorphic dispatch and the use of a sentinel

Like most things in the Xtext universe, Xpand features polymorphic dispatch, i.e., in case of an overloaded DEFINE block, Xpand decides which one to call based on the actual (runtime) type of the arguments. This is extremely useful but also presents a slight inconvenience in case of a type hierarchy (as opposed to a number of types having only (E)Object as common super type). As an example, consider the hierarchy where Apple and Orange are types having Fruit as common, direct super type. Now, when you write DEFINE blocks for Apple and Orange with common name ‘squeeze‘ and you call squeeze from someplace you’re dealing with Fruits, the Xpand editor will complain that it “Couldn’t find definition squeeze for type metamodel::Fruit”. The Xpand editor is not able to see that you’ve covered all essential cases, even if Fruit is defined as abstract. The only way to deal with this is to write a third DEFINE block which acts as a sentinel to cover the Fruit case (har, har).

However, this sentinel can be put to good use by mixing in a fail-safe in case your meta model changes. E.g., if you add Banana to the type hierarchy of Fruit without writing a separate DEFINE block for it, the sentinel block will be executed, but obviously it cannot produce meaningful output on its own. That’s why I tend to put some error logging in the DEFINE block, like so:

«DEFINE squeeze FOR Fruit»
«( "no (specific) DEFINE block 'squeeze' defined for type " + metaType.name ).error()»
«ENDDEFINE»

Here, I use the error function from the io.ext library mentioned above which you include as follows:

«EXTENSION org::eclipse::xtend::util::stdlib::io»

Now, you’ll get a meaningful error message in the console when executing the template, indicating that Banana is missing a (specific) DEFINE block. (Of course, you could just as well factor the inner expression of the sentinel block out into a separate Xtend file.)
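Outside of Xpand, the same pattern can be sketched in plain Java. The block below is a hypothetical analog, not how Xpand is implemented: Java overloading is resolved at compile time, so the sketch dispatches explicitly by walking from the runtime class up the superclass chain and picking the most specific registered definition. It only approximates Xpand’s dispatch (single argument, classes only, no interfaces), and the Fruit classes, the `Squeezer` class and the juice strings are made up for illustration. The Fruit entry plays the sentinel role and reports Banana as missing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical metamodel mirroring the Fruit example from the text.
abstract class Fruit {}
class Apple extends Fruit {}
class Orange extends Fruit {}
class Banana extends Fruit {} // added later, without its own 'squeeze' definition

class Squeezer {
    // One entry per "DEFINE squeeze FOR <type>"; keys are metamodel classes.
    private static final Map<Class<?>, Function<Fruit, String>> DEFINES = new HashMap<>();

    static {
        DEFINES.put(Apple.class, f -> "apple juice");
        DEFINES.put(Orange.class, f -> "orange juice");
        // Sentinel for the common super type: only reached when no more specific
        // definition exists, so it reports the gap instead of producing output.
        DEFINES.put(Fruit.class, f -> {
            System.err.println("no (specific) DEFINE block 'squeeze' defined for type "
                    + f.getClass().getSimpleName());
            return "";
        });
    }

    // Walks from the runtime class up the superclass chain and applies the
    // most specific definition found (single dispatch on classes only).
    static String squeeze(Fruit fruit) {
        for (Class<?> c = fruit.getClass(); c != null; c = c.getSuperclass()) {
            Function<Fruit, String> define = DEFINES.get(c);
            if (define != null) {
                return define.apply(fruit);
            }
        }
        // Cannot happen while the Fruit sentinel is registered.
        throw new IllegalStateException("no definition for " + fruit.getClass());
    }

    public static void main(String[] args) {
        for (Fruit f : new Fruit[] {new Apple(), new Orange(), new Banana()}) {
            System.out.println(f.getClass().getSimpleName() + " -> '" + squeeze(f) + "'");
        }
    }
}
```

As with the Xpand version, adding Banana does not break anything silently: the sentinel runs and the console tells you which type still lacks a specific definition.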

Readable template versus readable output

One of the choices you have when creating a template is whether to make the template readable exclusive-or to make the output readable, i.e. nicely formatted. My advice: choose the former (readable template), especially if there’s a beautifier for the output format. For a lot of the more common output formats, such as Java and XML, Xpand beautifiers have been written. Have a look here (navigate to the ‘PostProcessor’ section) to learn more about them and how to configure them. You might not be an instant fan of the beautification offered out of the box, but it generally beats the trouble of getting the template output nicely formatted. Moreover, this choice fits my earlier reasoning that meta software (such as code generators and templates) is software as well, so the readability argument is very much in scope here.

Getting the template output nicely formatted is frankly a pain in the buttocks, for a number of reasons: (1) Xpand constructs often seem to add whitespace unintentionally and (2) it’s hard to get indentation correct with “recursive content”. In fact, the Xpand constructs do not add whitespace: it’s just that everything between ‘»’ and ‘«’ (while inside a FILE statement of course) is part of the template text and ends up in the output. This means that something like

«FOREACH entities AS e»
entity «e.name»;
«ENDFOREACH»

will produce a newline in front of each entity in the output because of the newline after the first ‘»’. Xpand allows you to add a minus sign just before that ‘»’, which has the effect that the whole uninterrupted sequence of whitespace characters (i.e., everything up to the next non-whitespace character) is gobbled up so it doesn’t produce anything in the output. This might help a lot in getting unwanted whitespace out of the way, but it certainly doesn’t help with producing the correct indentation, so the ‘-’ is a double-edged sword.

Again, in general I’d go for a readable template unless there are very pressing reasons (such as the lack of a beautifier) to make the output look nice as-is. In some cases, it might even be worthwhile to combine a readable template with a custom-made parser for the target language and a (separate) serializer/pretty-printer/beautifier for that target language. (In fact, this is how the XML beautifier arose as part of openArchitectureWare.) This adds a step to the pipeline, but it also adds some validation of the templates’ output. Xtext could very well be a perfect fit for this case, as you can quite probably even re-use the existing meta model.
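To give an idea of what such a separate beautification step does, here is a deliberately naive sketch in Java (11+) that re-indents brace-structured output produced by a readable, but sloppily indented, template. The `BraceBeautifier` class is hypothetical and is not one of the Xpand beautifiers; it ignores braces inside string literals and comments and simply drops blank lines, which a real beautifier or a parser-based pretty-printer would handle properly.

```java
// Naive brace-based re-indenter, a stand-in for a real beautifier.
// Assumption: braces inside string literals or comments are NOT handled.
class BraceBeautifier {

    static String beautify(String raw, String indentUnit) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (String line : raw.split("\n")) {
            String trimmed = line.strip();
            if (trimmed.isEmpty()) {
                continue; // drop blank lines left behind by template constructs
            }
            // A line starting with '}' closes the current block, so print it one level out.
            int printDepth = trimmed.startsWith("}") ? depth - 1 : depth;
            out.append(indentUnit.repeat(Math.max(0, printDepth))).append(trimmed).append('\n');
            // Adjust the nesting level by the net number of braces on this line.
            depth = Math.max(0, depth + count(trimmed, '{') - count(trimmed, '}'));
        }
        return out.toString();
    }

    private static int count(String s, char c) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Output as a readable template might produce it: correct, but badly indented.
        String raw = "class Order {\n\n      int amount;\nint amount() {\n   return amount;\n}\n  }\n";
        System.out.print(beautify(raw, "    "));
    }
}
```

Running `main` prints the `Order` class with one level of four-space indentation per brace level. The template only has to get the content right; the layout is fixed in one place, after generation.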

Categories: MDSD, The How, Xpand

DSLs “versus” software engineering

January 13, 2011 8 comments

(New Year’s resolution: blog more frequently ;))

One aspect of using DSLs for software development which seems to be a bit underplayed (IMHO) is the role of good-ole software engineering, by which I happen to mean the practice of creating software through a systematic approach involving analysis, design, implementation and testing in a controlled and predictable manner. It seems to me there’s something of a sentiment or expectation that with DSLs you don’t have to do software engineering anymore, as if everything that makes creating software difficult somehow instantly disappears when using DSLs to model the software to as large a degree as is desirable and productive.

There are two main reasons for using DSLs:

  1. Empowering domain stakeholders (and other participants) by establishing a ubiquitous language for communication and providing them (and disciplines downstream) with dedicated tooling for that language;
  2. Separating the essential complexity (the what and why) from the incidental complexity (the how) by focusing the language on the former and hiding the latter “somewhere else”. (This also means that the software model does the same with less “code”.)

So, how does software engineering come into play then? Well, as I see it, there are two sides to this equation: (1) the DSL itself and the way it’s used and (2) the “meta software”, i.e. the software (parser, editor, tooling, etc.) which brings the DSL to life.

Engineering a DSL

To me, the fundamental value of software engineering is the set of concepts such as Separation of Concerns, Loose Coupling, Strong Coherence, (Unit) Testing etc., which allow us to create quality software. I simply define software quality somewhat unconventionally as “it works as required plus it can be productively changed in case the requirements change”. (It’s interesting to see that Kent Beck defines the concepts Loose Coupling and Strong Coherence in terms of how change spreads through a code base.) A DSL can easily be said to “work” if the two advantages mentioned above are realized: stakeholder empowerment and the separation of essential from incidental complexity (essentially another incarnation of Separation of Concerns). Unfortunately, it’s almost impossible to make this S.M.A.R.T., so you’ll have to rely on your craftsmanship here.

The most obvious aspect of DSL design which contributes directly to the change part of quality is modularization. This aspect has two directions: (1) how do you distribute different aspects across parts of the language and (2) how can you cut up the entire domain into manageable pieces. Both of these directions benefit directly from the application of concepts like Separation of Concerns, Loose Coupling and Strong Coherence. As an example, consider a (vertical) DSL for Web application development which would typically address data, screen and interaction modeling: Do you create a separate DSL for the data model? Can you divide that up as well? Do you separate screens and interaction/flow? Do you take the use case as unit of granularity? Etcetera… The answers to all these questions are “It depends…”, and you’ll have to address them time and time again for each situation and for each change to it.

But software engineering on the DSL side doesn’t stop there: the DSL instance, i.e., the model that’s built using the DSL, must be viewed as software as well; after all, it’s the centerpiece of the software creation. E.g., as soon as you can modularize your model, you’ll have to think about how to divide the DSL instance into several pieces. Questions which come into play here: How large should each piece be? What kind of inter-piece dependencies do I want to allow or minimize? (This already depends on how modularization is realized on the language level.) How does the versioning system you use affect these decisions? Again, Separation of Concerns, Loose Coupling and Strong Coherence are key concepts to keep in mind here.

You also might want to think about a way to (unit) test the instance. A famous example is business rules: it is very valuable to be able to test the execution of such rules in different scenarios to validate that the results are what your domain stakeholders are expecting. How you code such tests depends (as ever) on the situation: sometimes it’s better to code them against the business rules’ execution engine (which is something different than testing that execution engine itself!), sometimes you enhance the DSL (or, probably better, create a separate DSL) for this purpose.
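As an illustration of the first option, the sketch below (Java 16+, because of the record) tests a DSL instance through its execution engine. Everything in it is hypothetical: `DiscountRule` stands in for whatever the business rules DSL looks like, `DiscountEngine` for its interpreter, and the model is written inline instead of being parsed from a DSL file. The scenarios encode what the domain stakeholders expect, so, assuming the engine itself is tested separately, they check the rules in the model.

```java
import java.util.List;

// Hypothetical DSL instance: discount rules as domain stakeholders might write them.
record DiscountRule(String name, double minimumTotal, int percent) {}

// Hypothetical execution engine interpreting the rules: the highest applicable discount wins.
class DiscountEngine {
    static int discountFor(List<DiscountRule> rules, double orderTotal) {
        int best = 0;
        for (DiscountRule rule : rules) {
            if (orderTotal >= rule.minimumTotal()) {
                best = Math.max(best, rule.percent());
            }
        }
        return best;
    }
}

// Scenario tests for the *model*, not for the engine itself.
class DiscountRulesTest {
    // In a real setup this would be loaded from the DSL file.
    static final List<DiscountRule> MODEL = List.of(
            new DiscountRule("silver", 100.0, 5),
            new DiscountRule("gold", 500.0, 10));

    static void scenario(String description, double orderTotal, int expectedPercent) {
        int actual = DiscountEngine.discountFor(MODEL, orderTotal);
        if (actual != expectedPercent) {
            throw new AssertionError(description + ": expected " + expectedPercent
                    + "% but got " + actual + "%");
        }
    }

    public static void main(String[] args) {
        scenario("small order gets no discount", 50.0, 0);
        scenario("silver threshold is inclusive", 100.0, 5);
        scenario("large order gets the gold discount", 750.0, 10);
        System.out.println("all scenarios pass");
    }
}
```

In practice you’d use a test framework such as JUnit and run these scenarios as part of the automated build; the plain `main` just keeps the sketch self-contained.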

Engineering the meta software

By “meta software” I mean all the software which is involved with the DSL and which does not directly contribute to the realization of requirements/business value. This ranges from parser definitions (when using a parser generator), parser implementations, model validators and model importers to code generators or model interpreters/execution engines. It’s important to realize that this is software in the traditional sense as well, not just a bunch of “utility scripts” lying around. In fact, your meta software has to be at least as good as “regular” software, since it typically has a large impact on the rest of the software development because the model does more in the same amount of “code”. Among other things, this means that you should create automated tests for all components, that these tests should be part of the continuous integration (automated build) and that everything should be checked into a versioning system. It also means that you should design the meta software really well, e.g. with an eye towards Separation of Concerns, both inside components and across components. It’s advisable to choose a set of tools/frameworks which integrate well and offer as much IDE and compile-time support as possible, to make meta software development as productive, error- and care-free as possible. (I once did a project in which a UML model was fed directly into a combination of a dynamic language and StringTemplate: in the end you get used to it, but it’s painful all the way…)

If the DSL changes, the current model (DSL instance) might break and must then be repaired; depending on the amount of breakage, you could do that manually or expend the effort to devise an automated migration facility. This also means that it should be clear at all times which version of the DSL a model requires: it’s usually best to explicitly version both the DSL and the model, but you might be able to get away with an incremental push from a DSL development branch to the modeling branch. In the latter case, you really need to make sure that you tag the meta software together with releases of the software in order to be able to go back in history reliably.
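Making the required version explicit can be as simple as having the meta software refuse models it wasn’t built for. The sketch below (Java 11+) assumes a hypothetical convention in which every model file starts with a line like `dsl-version: 3`; the class name, the header format and the messages are made up. A real setup might trigger the automated migration instead of failing when it encounters an older version.

```java
// Sketch of an explicit DSL version check, run before a model is processed.
// Assumption: every model file starts with a line such as "dsl-version: 3";
// this header convention is hypothetical, not part of any particular tool.
class ModelVersionCheck {
    static final int SUPPORTED_VERSION = 3; // the DSL version this meta software implements

    static int declaredVersion(String modelText) {
        String firstLine = modelText.lines().findFirst().orElse("").strip();
        String prefix = "dsl-version:";
        if (!firstLine.startsWith(prefix)) {
            throw new IllegalArgumentException("model does not declare a DSL version");
        }
        return Integer.parseInt(firstLine.substring(prefix.length()).strip());
    }

    static void requireCompatible(String modelText) {
        int version = declaredVersion(modelText);
        if (version < SUPPORTED_VERSION) {
            throw new IllegalStateException("model uses DSL version " + version
                    + "; migrate it to version " + SUPPORTED_VERSION + " first");
        }
        if (version > SUPPORTED_VERSION) {
            throw new IllegalStateException("model requires DSL version " + version
                    + ", but this meta software only supports " + SUPPORTED_VERSION);
        }
    }

    public static void main(String[] args) {
        requireCompatible("dsl-version: 3\nentity Customer {}\n");
        System.out.println("model is compatible");
    }
}
```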

Using a DSL has a two- (or even multi-) step nature: the DSL instance is parsed first and then fed to either a code generator or a model interpreter. So, if you change the DSL you’ll probably have to change the second step as well. In fact, changes typically “flow backwards”: changes to or enhancements of the code generator or model interpreter often require a change to the DSL as well, e.g. in the form of a new language construct. This phenomenon is sometimes called coupled evolution. Again, having automated tests for all individual components/steps is really necessary to avoid running into problems. In case you use a model interpreter, it’s usually quite easy to write (unit) tests against it.

Change “versus” code generation

In case you have a code generator, things are typically more difficult because (1) the generation is often target language-agnostic, since most template engines (such as StringTemplate and Xpand) simply spit out “plain text”, and (2) there’s usually non-generated code as well (either framework code or code behind the Generation Gap) which has to be integrated with the generated code. I’ve found that in these cases it’s extremely useful to use a relatively small reference application for the meta software discipline. Such a reference application would consist of a smallish DSL instance/model which does, however, touch all language constructs in the DSL (especially the new ones) and a fair amount of combinations of those, plus non-generated code consisting of framework code, hand-crafted code behind the Generation Gap and, most importantly, unit tests to validate the reference application. As always, the process of generating from the reference model, building the reference application from generated + non-generated code and running the unit tests should be automated, so that the correctness of the meta software can be verified reliably and quickly.

Something which is very useful in this scenario is traceability, i.e. being able to see where generated code came from: in particular, which template produced a particular piece of code and which model element was the primary input for that template. A modest but already quite useful form of traceability can be realized by generating comments with tracing information along with the real code. This is reminiscent of logging, also because care must be taken that the tracing information stays succinct and doesn’t overly “litter” the actual code.
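A minimal sketch of such tracing comments, written as a hand-coded Java (16+) generator rather than as templates to keep it self-contained: each generated class starts with a single comment line naming the producing template and the primary model element. The `Entity` record, the template name `entityClass` and the comment format are all hypothetical.

```java
import java.util.List;

// Generator-side traceability: every generated artifact starts with a one-line
// comment naming the producing template and the primary model element.
class TracingGenerator {

    // Hypothetical model element type.
    record Entity(String qualifiedName, List<String> attributes) {
        String simpleName() {
            return qualifiedName.substring(qualifiedName.lastIndexOf('.') + 1);
        }
    }

    // Kept to a single line so the trace does not litter the generated code.
    static String trace(String template, String elementName) {
        return "// generated by template '" + template + "' from model element '" + elementName + "'\n";
    }

    // Plays the role of a template named "entityClass".
    static String entityClass(Entity e) {
        StringBuilder sb = new StringBuilder(trace("entityClass", e.qualifiedName()));
        sb.append("public class ").append(e.simpleName()).append(" {\n");
        for (String attribute : e.attributes()) {
            sb.append("    private String ").append(attribute).append(";\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(entityClass(new Entity("shop.Order", List.of("id", "status"))));
    }
}
```

Keeping the trace to one line per artifact is one way to get the benefit of tracing without the littering mentioned above; finer-grained traces (per attribute, say) quickly start to look like overly chatty logging.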

Wrapping up

I hope I’ve been able to make the case that “old-fashioned” software engineering has a definite merit in the DSL scenario. Of course, there’s a lot more to say on this topic. Personally, I’m hoping that the upcoming book on DSL Engineering by Eelco Visser and Markus Voelter treats this really well.