
Some musings on DSL design

I’d like to say a few things about language design and the DSLs I encounter “in the wild”. (Yes, a non-technical, non-Xtext-specific blog again: sorry ’bout that.)

First of all, about 90-95% of all the DSLs I encounter, e.g. on the Eclipse TMF Xtext forum, either deal with entity models (about 70% of the time) or are intrinsically “vertical” in nature. With “vertical” I mean that a certain aspect of the solution space (or implementation) is addressed, rather than an aspect coming from the problem (or business domain) space. Examples of non-entity vertical DSLs would be screen descriptions and (Spring) configurations. Of course, the entity model usually is a low-hanging fruit in a(ny) typical application development project to enjoy the benefits of formal modeling, so that doesn’t surprise me at all.

What does surprise me, however, is that:

  1. people are not actually creating more business-readable DSLs;
  2. people apparently are sticking to a C-like syntax a fairly large percentage of the time!

By C-like syntax I mean in particular: the use of semicolons (‘;’) to “separate” or “close” statements, and the use of commas to separate items in a list. DSLs are often a lot less complex than C or Java and have fewer degrees of freedom than full-blown GPLs, so it ought to be straightforward (for the parser) to determine where a construct starts and ends. In fact, I’ve never encountered a situation where I actually needed a token to separate or close constructs to get my (Xtext) grammar definition to work.

And, in the end, syntactically really complex languages like Groovy and Scala also manage to do without separating/closing ;’s. So, why bother your DSL users with having to type an extra character? There’s a reason that modern IDEs like Eclipse have a feature to type that semicolon automatically for you. Maybe you want to avoid someone (possibly/especially yourself!) having to write that feature for your language?
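To make the point about statement separators a bit more concrete, here is a minimal sketch in Python (not Xtext). The entity language, the sample text and the function names are all made up for illustration. Every construct opens with a keyword or has a fixed shape, so the parser can tell where each one starts and ends without a single semicolon.

```python
import re

# Hypothetical mini-DSL: every construct opens with a keyword or has a
# fixed shape, so no separating or closing token is required.
SOURCE = """
entity Customer {
    name : String
    age : Integer
}
entity Order {
    total : Decimal
}
"""

# Identifiers plus the three punctuation characters the language uses.
TOKEN = re.compile(r"[A-Za-z_]\w*|[{}:]")


def tokenize(text):
    """Split the text into identifiers and the punctuation { } :."""
    return TOKEN.findall(text)


def parse_entities(tokens):
    """Parse 'entity Name { field : Type ... }' blocks without separators."""
    entities = {}
    pos = 0
    while pos < len(tokens):
        if tokens[pos] != "entity":
            raise SyntaxError(f"expected 'entity', got {tokens[pos]!r}")
        name = tokens[pos + 1]
        if tokens[pos + 2] != "{":
            raise SyntaxError("expected '{' after entity name")
        pos += 3
        fields = {}
        # A field is always 'name : Type'; the closing brace ends the block.
        while tokens[pos] != "}":
            field, colon, type_name = tokens[pos:pos + 3]
            if colon != ":":
                raise SyntaxError(f"expected ':' after {field!r}")
            fields[field] = type_name
            pos += 3
        pos += 1  # skip the closing '}'
        entities[name] = fields
    return entities


model = parse_entities(tokenize(SOURCE))
```

This sketch does no error recovery and assumes well-formed input. A real grammar with expressions may need more care, but for a declarative language of this kind the keywords and braces already carry all the structure.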

It’s the same for the comma as a list item separator: lists typically are already delimited at the start and the end (“[...]”) so it’s completely clear for the parser that we’re in a list context. Inside lists, it’s even more unlikely that you need a closing token at all, if only because I’ve never seen lists with an ending comma. One big advantage of not having item separators is that you can easily shuffle list items around without having to take care of the commas as well. (This also saves some effort when generating DSL texts.)
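The same goes for lists. The following is only a sketch under assumptions: the bracket syntax, the whitespace-only separation and the names `parse_list` and `emit_list` are hypothetical, and it handles flat lists only. It shows that the brackets are enough for the parser, and that a generator can emit every item line in exactly the same shape.

```python
import re


def parse_list(text):
    """Parse a flat, bracket-delimited list whose items are separated by whitespace only."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    if len(tokens) < 2 or tokens[0] != "[" or tokens[-1] != "]":
        raise SyntaxError("a list must start with '[' and end with ']'")
    return tokens[1:-1]


def emit_list(items):
    """Generate list text, one item per line; the last item needs no special case."""
    lines = ["["]
    for item in items:
        lines.append("    " + item)  # every item line has the same shape
    lines.append("]")
    return "\n".join(lines)


colors = parse_list("[ RED GREEN BLUE ]")
reordered = emit_list(sorted(colors))
```

Because each item sits on its own line without a trailing token, moving a line up or down in the generated text keeps the list valid.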

This preference might also go a little way in explaining my first point of surprise, since DSLs with a C-syntax flavor tend to evoke reactions like “…but I don’t want to have to do anything with programming!” So, folks: could we please refrain from using unnecessary statement and list item separator tokens in our DSLs? *pretty please with sugar on top*

Categories: DSLs
  1. August 30, 2010 at 12:02 pm

    Nice length! (People understanding your text will make a better reply, I suppose)

  2. September 21, 2010 at 6:54 pm

    Nice post, I agree with you… I think that DSLs can be used for greater things than just “shortcuts” for writing code or configuration files.
    In my free time, I’m trying to set up a DSL supporting a software analysis framework, mainly based on expressing use case scenarios and high-level business rules (see http://www.virage-it.be/2010/05/mix-textual-and-graphical-languages-to-improve-formalism-of-software-analysis/).
    Just after reading your post, I have checked my grammar… no CSV or semicolon ;o)

    • September 21, 2010 at 7:00 pm

      @Laurent: nice blog post and DSL on UC scenarios! 🙂 We’ll bridge that chasm with all “non-techies” one day 😉

  3. Tjerk
    October 25, 2010 at 11:26 pm

    In my opinion, commas make lists more readable, but it of course depends on the complexity of your grammar.

    • October 26, 2010 at 6:30 am

      I’d say: the less complex the grammar, the less readability you gain with commas. Or the other way around: if your list items are hard to tell apart, you might need something more than mere commas. Maybe pipes or newlines? My point is: having commas as a list separator seems to be a knee-jerk reaction of sorts which might be aligned very well with your domain (stakeholders), or quite the opposite… Important thing is to think about it and validate your assumptions on readability and usability with the domain stakeholders, whenever possible.

  4. Tjerk
    October 26, 2010 at 7:35 am

    Yes, I agree with that. But your textual grammar is just syntactical sugar; it’s all about the metamodel. You could even write multiple grammars for a specific metamodel… probably not smart to do. But a visual concrete syntax would possibly be easier to use for domain experts. However, it all depends on a lot of factors.

    Nice articles by the way!

  5. October 26, 2010 at 7:43 am

    Thanks for the compliment!

    Yes, in the end “it all depends” 🙂 Textual grammar may be syntactical sugar but it may be very important tactical sugar, even for domain experts, and certainly if the domain is something technical like “Web development”.

    I catch myself fiddling around with list orders (e.g., method/function arguments, enum literals, etc.) a lot, because I think a good, clear ordering can help in clarifying (domain) code. Not all of the languages I do that in (e.g., Xtext-related ones) have Refactoring support for that, so this action tends to take up some time.

  6. Jan
    October 26, 2010 at 7:47 am

    I fully agree with you that C-style syntax is bloating a DSL, but there are a couple of things to keep in mind, especially for new DSL users/designers.

    From my experience, lowering the barrier for users who are sceptical towards DSLs is essential. A well-known syntax might make them feel more comfortable right from the beginning.

    Adding a semicolon to terminate a statement as well as curly braces and commas can help to avoid ambiguities in a language. I have seen quite a few customers turning on backtracking for their Xtext languages just to get rid of Antlr’s ambiguity warnings, later on complaining about bad parser performance.

    And, as a last point, people seem to naturally understand the notion of containment if you use curly braces.

    So, to sum it up, you’re absolutely right that you don’t gain the complete benefit of DSLs with such a syntax, but you might make it easier for newbies.

  7. October 26, 2010 at 12:29 pm

    An interesting corollary: most of the graphical Domain-Specific Modeling languages in the Eclipse GMF newsgroup seem to be very simple, often poor copies of existing languages, e.g. state diagrams or some kind of entity+attribute or entity+containment language. In contrast, many more of the languages that we see MetaEdit+ users making are what I would consider bona fide new Domain-Specific Modeling languages, where the concepts are taken from the problem domain.

    I suppose that partly this just reflects the different composition of the communities: the Eclipse posters are mainly academics, whereas MetaEdit+ users are mainly from industry. Mind you, I’ve seen some nice DSM languages from academic MetaEdit+ users too, and fruitful cooperation between industry and academia.

    I wonder if Jan’s points about newbies are mostly considering the situation where the DSL newbies are programmers? I’m not sure the population at large finds curly braces and semicolons at the end of statements “natural”. The main point is true, though: the best language in the world is no good if you can’t persuade people to use it.

    • October 26, 2010 at 12:47 pm

      I think you should substitute “academics” in your reasoning with “techy people who think they can think for other domains”, which demystifies the existence of nice DSLs made by “real academics” as well 😉

      As always, it depends on the domain. I’m guessing Jan uses “newbies” mainly to indicate people who have worked with GPLs but not with DSLs as such, and/or the DSL designers providing for such users.

      @Jan: I think that using semicolons just to get some performance out of the parser is a smell 😉 Groovy/Scala can do without semicolons so I’d still be hard-pressed to believe you actually need semicolons to disambiguate the language. Of course, not everyone is willing or able to pour time into a (customized) lexer/parser (combination) just to get both the concerns and exactly the grammar they want.

      Btw: I’m quite fond of curly braces, myself 😉

  8. October 26, 2010 at 1:24 pm

    Hi Meinte,

    > With “vertical” I mean that a certain aspect of the solution space (or implementation) is addressed, rather than an aspect coming from the problem (or business domain) space.

    In my opinion vertical means targeting a problem domain, horizontal means targeting a solution space aspect. Horizontal points at the fact that you stay in the solution domain and focus on different aspects. Vertical means you are moving to a much higher abstraction level, i.e. the problem space.

    > So, folks: could we please refrain from using unnecessary statement and list item separator tokens in our DSLs? *pretty please with sugar on top*

    I don’t think you can make this statement in general. Creating a good DSL means selecting your target user and thinking about the methodology (how the DSLs are used). This will lead to design choices regarding the abstract and concrete syntax. See also: http://www.theenterprisearchitect.eu/archive/2010/09/06/15-lessons-learned-during-the-development-of-a-model-driven-software-factory

    • October 26, 2010 at 1:30 pm

      Horizontal and vertical tend to switch around in my head, from time to time, so please don’t pin me down on that 😉

      @2nd item: In the end “it depends”, obviously. I just wanted to draw attention to something which I perceive as a knee-jerk reaction by DSL designers steeped in C-style syntax.
