Readings in DSLs, Maintenance and Misc.

May 12, 2009

Warning: Many links are PDFs.

ANTLR: A predicated-LL(k) parser generator.

An introduction to ANTLR. Pragprog is explicitly positioning ANTLR to be a tool of choice for the coming wave of mainstream DSL creation.

One thing I like about Lex and Yacc is that, given the Dragon book and a couple of weeks, I could recreate most of their functionality. So if I wanted a Yacc written in JavaScript or Ruby, I could make one. The principles are clearly laid out. I’d like to see a similarly thorough exposition of the techniques and ideas underlying ANTLR.

SWYN: A visual representation for regular expressions.
SWYN is a system for constructing regular expressions by example.

Anecdotal claim, from a Cognitive Dimensions perspective, that regular expressions are hard to comprehend. Mentions that users have no clear mental model of the expression evaluator. Reports an experiment comparing several alternative representations of regular expressions: notation has a large effect on performance.

The perils and pitfalls of mining SourceForge

A useful collection of pragmatic tips on how to mine a web site like SourceForge, as well as practical notes on the validity of conclusions drawn from it. SourceForge is not representative of software projects in general, so the paper warns against claiming that it is.

Supporting the DSL spectrum

A review of DSL approaches including XML configuration files, monadic parsers, small compilers built with tools such as Lex and Yacc, Excel and Access, a commercial PowerPoint-based system, and embedded languages.

Anecdotal claim of 50:1 program length improvement for a military DSL project.

Weaving a debugging aspect into domain-specific language grammars

Grammar-driven generation of domain-specific language debuggers

Debugging domain-specific languages in Eclipse

I group these three papers together because they appear to come from a single project with overlapping authors. They describe an approach to DSL construction in which debugging code is woven into the parser using aspect-oriented programming, or the parser is explicitly instrumented to facilitate debugging. An Eclipse plugin consumes the resulting debugger info to allow DSL-level debugging. The demonstration uses a toy procedural DSL. No empirical evaluation.

Using this same framework, these researchers also generate unit testing tools and they’re working on profilers.

Domain-specific languages in software development–and the relation to partial evaluation

The early chapters of this PhD thesis include a thorough and balanced review of the field of DSLs.

Call for empirical studies on DSLs.

Little Languages: Little Maintenance?

A high-level description of a DSL project for financial engineering, with extensive discussion of the thinking that went into it as well as the tools used for implementation. The claim that DSLs improve maintainability rests on intuitive arguments, organized along maintainability dimensions taken from previous literature, rather than on measurements.

Call for empirical data collection:

“Collection of empirical data concerning maintenance costs in systems built using domain-specific languages, following, e.g., the maintenance metrics as proposed by Grady (1987).”

Domain-specific languages: An annotated bibliography

A comprehensive survey of the field of DSLs. Only two mentions of maintainability. (I note those two papers below.) Their lab seems to be maintaining the survey, so there is a more recent unpublished version online.

The next 700 programming languages

This is a CACM paper from 1966.

Landin proposes a family of languages built on a common foundation of an Abstract Syntax Tree (and underlying semantics and abstract machine) with different resultant syntaxes. It includes a mention of the desire to customize programming languages to problem domains. As such it might be among the first published mentions of the idea of DSLs.

The discussion section is actually a transcript of a discussion between computer scientists! In it they debate, among other things:

  • Semantic indentation (We are still arguing about this over Python!)
  • Declarative languages, their utility and role, and whether to embed them in more general languages
  • Tail recursion before it was called tail recursion
  • Function currying before it was called currying
  • The difficulties caused by state for program transformation
  • The frustration of implicitly specifying sequencing in imperative languages. Matrix multiplication as an inherently parallelizable problem they cannot parallelize!
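
For anyone who has not met the two terms, here is a minimal Python sketch of what currying and tail recursion mean today. It is my own illustration, not code from the paper, and the function names (`add`, `sum_to`, `sum_to_loop`) are made up for the example.

```python
def add(x):
    """Curried addition: take one argument, return a function awaiting the next."""
    def add_x(y):
        return x + y
    return add_x


def sum_to(n, acc=0):
    """Tail-recursive sum of 1..n: the recursive call is the very last step,
    so nothing remains to be done in this frame once it returns."""
    if n == 0:
        return acc
    return sum_to(n - 1, acc + n)


def sum_to_loop(n, acc=0):
    """The same computation as a loop, which is the form a compiler that
    eliminates tail calls could reduce sum_to to."""
    while n != 0:
        n, acc = n - 1, acc + n
    return acc


if __name__ == "__main__":
    add_two = add(2)          # partially applied: still waiting for y
    print(add_two(3))
    print(sum_to(10), sum_to_loop(10))
```

CPython does not eliminate tail calls, so `sum_to` still uses one stack frame per step there; the loop version is only meant to show why the transformation is possible.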

A software engineering experiment in software component generation

“During the post-experiment debriefing, however, the subjects’ answers to questions about ease of error location were not consistent with this data. Subjectively, only two of the four subjects stated that it was easier to locate errors in [the DSL]. In particular, the [DSL Compiler]’s error messages were poor, which impeded location of syntactic errors. For the [general purpose language] version, the error messages were good and the use of the Ada debugger allowed tracing of problems.”

Despite a small sample of four developers, the authors empirically demonstrate a high-confidence 3x productivity improvement and 2x reliability improvement for DSL use versus a general purpose language equivalent. Really nice exposition.

“For the [DSL], a minor extension to the [DSL language] was sufficient to encompass the out-of-scope problem. However, this extension could not be provided by the subjects themselves; it required the expertise of the OGI research team who designed and implemented the [DSL language].

“Extension of the [GPL equivalent] was partially successful in handling the nine problems encountered. Extensibility was gained through the Ada programming ability of the subjects that allowed them to produce new code templates. However, in the majority of instances, extensibility problems were handled not by writing new templates, but by finding workarounds or accepting partial solutions.”

KHEPERA: A system for rapid implementation of domain specific languages

Khepera is a source-to-source compiler construction framework that carefully tracks AST node provenance through transformations, to allow DSL-level debugging of the end-product. Disadvantage: Must use Khepera to make your DSL!
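
To make the provenance idea concrete, here is a rough Python sketch of the general technique. It is not Khepera’s actual API or data model: the `Node` class, the `origin` field, the constant-folding pass and the `/* dsl:N */` comment format are all assumptions of mine for illustration. The point is only that every node a transformation creates inherits the DSL source line of the node it replaces, so the generated code can still be traced back to the DSL.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    kind: str                          # "num", "var", "add" or "mul"
    children: Tuple["Node", ...] = ()
    value: Optional[int] = None        # used by "num"
    name: Optional[str] = None         # used by "var"
    origin: Optional[int] = None       # DSL source line this node derives from


def fold_constants(node: Node) -> Node:
    """Fold constant subtrees, copying `origin` onto every node the pass creates."""
    kids = tuple(fold_constants(c) for c in node.children)
    if node.kind in ("add", "mul") and all(k.kind == "num" for k in kids):
        a, b = kids[0].value, kids[1].value
        result = a + b if node.kind == "add" else a * b
        # The new node replaces `node`, so it keeps that node's provenance.
        return Node("num", value=result, origin=node.origin)
    return Node(node.kind, kids, node.value, node.name, node.origin)


def emit(node: Node) -> str:
    """Generate target code, tagging each expression with its DSL line."""
    tag = f" /* dsl:{node.origin} */"
    if node.kind == "num":
        return f"{node.value}{tag}"
    if node.kind == "var":
        return f"{node.name}{tag}"
    op = "+" if node.kind == "add" else "*"
    left, right = (emit(c) for c in node.children)
    return f"({left} {op} {right}){tag}"


# (2 * 3) written on DSL line 1, added to x on DSL line 2.
product = Node("mul", (Node("num", value=2, origin=1),
                       Node("num", value=3, origin=1)), origin=1)
tree = Node("add", (product, Node("var", name="x", origin=2)), origin=2)
folded = fold_constants(tree)

if __name__ == "__main__":
    print(emit(folded))
```

A debugger built on this kind of output could read the tags to map a position in the generated code back to a DSL line, even though the multiplication no longer exists after folding.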

Measuring and managing software maintenance

A fluffy high-level view of HP’s metrics and software quality improvement programs as of 1987.

It turns out it is far easier to type ‘—‘ and a newline than it is to post a whole new blog post. There is a reason for lightweight blogging applications! It is shocking that even the delightful WordPress is too heavyweight for some tasks.


3 Responses to “Readings in DSLs, Maintenance and Misc.”

  1. Neil Says:

    Thanks for the summaries. I’ve found Fowler’s book-in-progress pretty useful (short on maintenance details), and there’s this InfoQ video:

  2. Terence Parr Says:

    Concerning ANTLR, you said “One thing I like about Lex and Yacc is that, given the Dragon book and a couple weeks, I could recreate most of their functionality.” Really? Surely somebody would have done that if it only takes that long to compete. I don’t even think you could type fast enough let alone think about reproducing all of the goodies in ANTLR. There’s like 100,000 lines of code there. I’m sure this is just bravado from lack of experience with ANTLR and such, but I figured that I would respond, anyway.

  3. aran Says:

    @Terence: I think we’re in agreement. ANTLR is a complex, rich piece of software, whereas the core of Lex and Yacc is much simpler.
