Archive for December, 2009

Language Monoliths

December 30, 2009

“Write programs that do one thing and do it well.”

The Little Computer Scientist appreciates simple design. To design systems, design teams of cooperating specialists. To keep warm, wear a coat on your body, scarf on your neck, boots on your feet, and mittens on your hands. Don’t wear a seamless full-body winter suit: it will be hard to go pee.

Computer people mostly get it. We write special-purpose programs and break them down into special-purpose pieces. We try not to make mittens that spontaneously unravel when you change your hat.

We get it, that is, until we build our programming languages.

Programmers have gotten used to the idea that choosing a programming language is a big important decision. Why? Because you can’t just choose a syntax expressing a few core abstractions. For each language, only a handful of editors are reasonable. There’s a standard library to mentally index. There’s an incumbent community of developers, and they have a philosophy of software development. There are idioms to learn and implicit “you’re a bad programmer unless you do it this way” rules. There’s a generally accepted tao of quality assurance. There’s a small council of key thought leaders and They have RSS feeds. There’s a preferred build tool, a deployment tool, and the non-standard-library standard library, like CPAN or RubyGems or PyPI.

Now, there are many valid forces that push things to be this way. The people who go with the grain aren’t dumb. But we should recognize the problem, and the opportunities that follow from it. Monoliths are bad. Programming languages are intrinsically monolithic. Working to oppose monolithic forces is good.

Many projects do it right. XML, JSON, Protocol Buffers and Avro tackle interlanguage data representation: programmers should use them to define their data objects. Tools like CouchDB with language-agnostic HTTP interfaces are winners. CoffeeScript puts a new syntax on JavaScript: A worthy experiment. Scala and Clojure compile to the JVM: There’s little excuse for language designers building VMs any more.

The Little Computer Scientist hopes to see more work breaking languages down. Make them smaller, less monolithic, more flexible. Compose them of special-purpose pieces. Expose the AST as a protobuf. Ship a meta-circular interpreter and a small compiler so that building a correct new implementation is a day’s work. Isolate the syntax as much as possible so it’s easy to build correct syntax-highlighting editors, pretty printers and style checkers.

Decouple, specialize, cooperate. One thing done well is a good thing.

The Death of JavaScript

December 24, 2009

Jeremy Ashkenas released CoffeeScript today, a programming language that compiles to human-readable JavaScript.

It addresses a real need, exemplified by this image:

JavaScript was designed quickly and extended by dumb committees. It’s a messy language with a potent core. CoffeeScript aims to give programmers the core without the mess. It’s nice. It’s broken in many of the same ways as JavaScript, because it’s just syntactic sugar on JavaScript, and JavaScript’s problems run deeper than syntax. But it can grow to fix these.

CoffeeScript and languages like it are the reason why JavaScript is going to die. Steve Yegge thinks JavaScript is the next big language, poised to take over the programming world. I disagree.

CoffeeScript is a baby in the growing family of languages that compile to JavaScript. These include Scheme, Hop, Java (via GWT), Smalltalk (via Clamato) and Objective-J. These languages are good. A programming language is a programmer’s main tool—and is thus the most rewarding place for improvement.

Imagine you can quantify the productivity of programmers using JavaScript. Over time, everything’s getting better: libraries, JavaScript VMs, tools like Google Closure. JavaScript programmers are writing more and better code, faster, as time goes on. Hurray!

There is another productivity curve. This is the curve of compiles-to-JavaScript language users. In this world, you can increase your productivity by using better languages.

Here’s the question: Does the compiles-to-JavaScript curve intersect the raw JavaScript curve anytime soon? If not, Steve Yegge is right. JavaScript wins as long as web browsers are the dominant application platform. But if it does… if it has happened already, even… then JavaScript as a human-use language will die, relegated to being a target language for compilers, like assembly code.

Here’s a brainstorm of some implications.

  • The ECMAScript committee should stop making up features to make JavaScript easier on programmers, and start changing features to make JavaScript more of a good target language.
  • The V8, SpiderMonkey, Carakan and other JavaScript implementations should start benchmarking on workloads that come out of code generators. (Actually, V8 already does this. They should do it more.)
  • The folks working on server-side JavaScript frameworks should just stop and go lie down, for the love of god.
  • Finally, future HTML standards should make allowances for JavaScript to be replaced. Something along the lines of <script language="CoffeeScript" compiler="coffee-script.js">code</script> could mean: Run code if this browser understands CoffeeScript natively. Otherwise, load coffee-script.js, find the compile function inside, call compile(code), and eval the result as JavaScript. I say this not as a proposal for how forward compatibility should work (I can think of at least one other way) but to make the point that we should be talking about it.

Research Ideas

December 17, 2009

I’m planning Project February: What Aran Does When His Master’s Is Done. I’ve got a startup on standby, and interviews with companies large and small.

One possibility is to stay in academia for a PhD. If I were to do this, I would need a research direction. This would then influence who I’d court as a supervisor (Greg is leaving for industry) and therefore where I’d go to school.

I could stay at the University of Toronto (I’m already in. I love it here. Path of least resistance.) or look elsewhere and aim for a later start date. Washington is on the up and up. Stanford is the de facto Silicon Valley university. It would be a good fit for my entrepreneurial ambitions. And I follow the work of researchers at many other UK and American universities.

Of course, there’s no guarantee I could get into these places. I have no connections, no reputation, no GRE scores and currently no publication record.

In any case, the first step is to figure out if I want to do more research. Since I’m brainstorming anyway, I figured I’d post my thoughts.

Dataflow languages

Dataflow languages don’t just separate what from how, as declarative languages do; they also separate what and how from when. Spreadsheets are dataflow, and there are a couple of dataflow languages and libraries, but I don’t feel dataflow programming as a general-purpose paradigm has gotten the attention it deserves. Dataflow becomes especially interesting when the execution is distributed and operations are expensive, since there are all sorts of wonderful scheduling problems and optimization opportunities.
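To make “separating what from when” a little more concrete, here is a minimal sketch in C of spreadsheet-style dataflow. It assumes a toy model that is not from any particular language or library: each node is either an input cell or a formula over two upstream nodes. Changing an input only marks dependents dirty; nothing is recomputed until someone asks for a value. The names (`make_input`, `make_node`, `set_input`, `get`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy dataflow graph: the program says WHAT each node is,
   and the runtime decides WHEN to (re)compute it. */
#define MAX_NODES 16

typedef double (*combine_fn)(double, double);

typedef struct {
    bool is_input;
    double value;        /* cached value */
    bool dirty;          /* true if value must be recomputed before use */
    int in_a, in_b;      /* upstream node indices; -1 for inputs */
    combine_fn fn;       /* how to combine the upstream values */
    int recomputations;  /* how often this node was evaluated */
} Node;

static Node nodes[MAX_NODES];
static int node_count = 0;

static int add_node(bool is_input, double value, int a, int b, combine_fn fn) {
    assert(node_count < MAX_NODES);
    /* Formula nodes start dirty: they have never been computed. */
    nodes[node_count] = (Node){ is_input, value, !is_input, a, b, fn, 0 };
    return node_count++;
}

static int make_input(double v) { return add_node(true, v, -1, -1, NULL); }

static int make_node(int a, int b, combine_fn fn) {
    return add_node(false, 0.0, a, b, fn);
}

/* Mark every node that transitively depends on `id` as dirty. A node that
   is already dirty has dirty dependents too, so we can stop there. */
static void invalidate_dependents(int id) {
    for (int i = 0; i < node_count; i++) {
        Node *n = &nodes[i];
        if (!n->is_input && !n->dirty && (n->in_a == id || n->in_b == id)) {
            n->dirty = true;
            invalidate_dependents(i);
        }
    }
}

static void set_input(int id, double v) {
    assert(nodes[id].is_input);
    nodes[id].value = v;
    invalidate_dependents(id);   /* schedule work, but do none yet */
}

/* Pull-based evaluation: recompute only dirty nodes, only on demand. */
static double get(int id) {
    Node *n = &nodes[id];
    if (!n->is_input && n->dirty) {
        n->value = n->fn(get(n->in_a), get(n->in_b));
        n->dirty = false;
        n->recomputations++;
    }
    return n->value;
}

static double add(double x, double y) { return x + y; }
static double mul(double x, double y) { return x * y; }
```

With inputs `a = 2`, `b = 3`, `s = a + b` and `p = s * a`, asking for `p` twice computes it once; changing `b` recomputes `s` and `p` only when `p` is next requested. A real dataflow system would replace this lazy, single-threaded policy with a scheduler, which is where the interesting distributed problems start.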

Performance Types

Traditional separation of interface from implementation deals only with functionality. But sometimes, we want to specify performance characteristics in an interface. Take real-time programming or large-scale programming for example. What’s a good language for describing performance specifications? Can we encode it into a type system?

A/B User Interfaces

Modern web user interfaces are optimized on-the-fly. Can we make better tools for UI designers to manage hundreds of variations of a UI?

Algorithmic Information Theory and Machine Learning

Algorithmic Information Theory gives us a quantitative way of thinking about intelligence. Can we learn new things by bringing this view to bear on machine learning? Example research project: define a programming language whose values are boolean strings, and do learning by highly optimized program generation.
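As a rough illustration of “learning by program generation”, here is a toy sketch in C. Everything in it is an assumption for illustration: values are 8-bit boolean strings, the language has just four hypothetical operations (`NOT`, `SHL`, `SHR`, `REV`), and learning means enumerating programs shortest-first and returning the first one consistent with every input/output example. Preferring the shortest consistent program is a crude stand-in for the “shortest description” bias of algorithmic information theory; a serious version would need a far richer language and much smarter search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy language: a program is a sequence of ops, applied
   left to right to an 8-bit string. */
typedef enum { OP_NOT, OP_SHL, OP_SHR, OP_REV, OP_COUNT } Op;

static uint8_t reverse_bits(uint8_t x) {
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (x & 1u));
        x >>= 1;
    }
    return r;
}

static uint8_t apply(Op op, uint8_t x) {
    switch (op) {
    case OP_NOT: return (uint8_t)~x;
    case OP_SHL: return (uint8_t)(x << 1);
    case OP_SHR: return (uint8_t)(x >> 1);
    case OP_REV: return reverse_bits(x);
    default:     return x;
    }
}

static uint8_t run(const Op *prog, size_t len, uint8_t x) {
    for (size_t i = 0; i < len; i++) x = apply(prog[i], x);
    return x;
}

#define MAX_LEN 6

/* Enumerate programs by increasing length (simplest first). Returns the
   length of the first program consistent with all n examples and copies
   it into `out`, or returns -1 if none exists up to MAX_LEN. */
static int learn(const uint8_t *in, const uint8_t *want, size_t n,
                 Op out[MAX_LEN]) {
    for (int len = 0; len <= MAX_LEN; len++) {
        Op prog[MAX_LEN] = {0};
        for (;;) {
            size_t ok = 0;
            while (ok < n && run(prog, (size_t)len, in[ok]) == want[ok]) ok++;
            if (ok == n) {
                for (int i = 0; i < len; i++) out[i] = prog[i];
                return len;
            }
            /* Advance `prog` like an odometer in base OP_COUNT. */
            int pos = len - 1;
            while (pos >= 0 && prog[pos] == OP_COUNT - 1) prog[pos--] = 0;
            if (pos < 0) break;   /* all programs of this length tried */
            prog[pos]++;
        }
    }
    return -1;
}
```

Given the examples `0x01 → 0x7F`, `0x0F → 0x0F` and `0xA0 → 0xFA` (generated by reversing the bits and then inverting them), the search finds the two-op program `NOT, REV`, which computes the same function. The search cost grows as `OP_COUNT` to the power of the program length, which is exactly why “highly optimized” generation would be the research problem.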

ESE

I find it easy to come up with ideas in Empirical Software Engineering, so there’s a disproportionate number of these below.

Tool-aware development

We’re all familiar with software quality attributes like performance, readability, security and so on. And we’re aware of good coding practices like mnemonic variable names and short functions. There’s a new quality attribute in town, though, and as far as I know, nobody’s talking about it yet.

Modern software engineers use a variety of tools, such as version control systems. In the presence of these tools, certain code constructs become riskier or more awkward to work with.

For example, in C, you might see a construct like

#define some_flag_1 0x0001
#define some_flag_2 0x0002
#define some_flag_3 0x0004

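Nothing wrong with this on the surface, but in the presence of concurrent development, version control and automerges, this code is a hotspot for merge errors. If two programmers on different branches each add a flag, both are likely to pick 0x0008, and if their additions don’t touch the same lines, the merge can succeed silently with two flags sharing one bit.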
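One possible tool-aware alternative, sketched in C under the assumption that the flag values are only used inside one build: let the compiler assign bit positions from declaration order, so two flags can never share a bit no matter how the merge goes. If two branches both add a flag at the end of the list, the usual result is a textual conflict that a human has to resolve, which is the failure mode you want. The names `some_flag_bit`, `SOME_FLAG_COUNT` and `SOME_FLAG` are hypothetical.

```c
#include <assert.h>

/* Bit positions are assigned by the compiler, in declaration order,
   so no two flags can end up with the same value. */
enum some_flag_bit {
    SOME_FLAG_1_BIT,
    SOME_FLAG_2_BIT,
    SOME_FLAG_3_BIT,
    /* Add new flags directly above this line, one per line. */
    SOME_FLAG_COUNT
};

/* Turn a bit position into a mask; unsigned long has at least 32 bits. */
#define SOME_FLAG(bit) (1ul << (bit))

/* Refuse to build if the flags outgrow the guaranteed mask width. */
static_assert(SOME_FLAG_COUNT <= 32, "too many flags for a 32-bit mask");
```

The trade-off: values now depend on declaration order, so inserting a flag in the middle renumbers the ones after it. That is harmless for in-memory flags but wrong for flags written to files or exchanged between separately compiled binaries, where explicit values (and a review rule about them) are still needed.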

As another example,

y = f(x);
if(y < z)

is more friendly to someone stepping through a debugger than

if(f(x) < z)

because most debuggers only let you step through statements, not expressions.

One last example: On some teams, everyone uses the same tools. On others, everyone uses different tools. Some coding practices are annoying to deal with in certain development environments. Homogeneous tool environments call for different practices than heterogeneous environments. I believe research is warranted into the ways our coding practices should change based on the development context.

Tool-aware languages

How do you design a programming language differently when you know the developer won’t just have a text editor, but will have a debugger, static checker, syntax highlighter, unit test suite, doctests, version control system, code reviews and more?

Software Development Practice Interactions

Nobody uses code reviews or unit tests or pair programming or whatever in isolation. And not all bugs are equally important. Which practices find which bugs? And how do they overlap?
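One simple way to start asking “which practices find which bugs, and how do they overlap” is plain set arithmetic over a shared list of known bugs. The C sketch below assumes a hypothetical bookkeeping scheme in which bug number i is bit i of a 64-bit mask and each practice records the bugs it caught; it computes what two practices both caught, what one caught that the other missed, and their Jaccard overlap. It ignores bug severity, which a real study would need to weight.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scheme: known bug i (0..63) is bit i of the mask. */
typedef uint64_t BugSet;

static int count_bugs(BugSet s) {
    int n = 0;
    while (s) { s &= s - 1; n++; }   /* clear the lowest set bit */
    return n;
}

/* Bugs caught by both practices. */
static BugSet caught_by_both(BugSet a, BugSet b) { return a & b; }

/* Bugs caught by `a` but missed by `b`: a's marginal contribution. */
static BugSet caught_only_by(BugSet a, BugSet b) { return a & ~b; }

/* Jaccard overlap: size(A intersect B) / size(A union B),
   defined as 0 when neither practice caught anything. */
static double overlap(BugSet a, BugSet b) {
    int u = count_bugs(a | b);
    assert(u >= 0);
    return u == 0 ? 0.0 : (double)count_bugs(a & b) / u;
}
```

For example, if code review caught bugs 0, 1 and 3 (`0xB`) and unit tests caught bugs 1 and 2 (`0x6`), they share one bug, review alone found two, and the overlap is 1/4.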

Empirical Bug Prioritization

Modern software is shipped rife with known bugs. Project managers prioritize bugs and optimize for “good-enough” quality in a largely ad-hoc way. Can we offer something better than “estimated number of affected users * expected annoyance” for prioritization? Are there unpublished best practices to be discovered and optimized?
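For reference, the baseline the question wants to beat fits in a few lines. This C sketch simply scores each bug as estimated affected users times expected annoyance and sorts highest first with the standard `qsort`. The `Bug` fields and the annoyance scale are hypothetical; the point is how little information this formula uses (no severity classes, no workarounds, no cost to fix).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bug record for the naive prioritization formula. */
typedef struct {
    int id;
    double affected_users;   /* estimated number of users who hit the bug */
    double annoyance;        /* expected annoyance per user, e.g. 0..1 */
} Bug;

/* Baseline score: estimated affected users times expected annoyance. */
static double priority(const Bug *b) {
    return b->affected_users * b->annoyance;
}

/* qsort comparator: higher priority sorts first. */
static int by_priority_desc(const void *x, const void *y) {
    double px = priority((const Bug *)x);
    double py = priority((const Bug *)y);
    return (px < py) - (px > py);
}

static void prioritize(Bug *bugs, size_t n) {
    assert(n == 0 || bugs != NULL);
    qsort(bugs, n, sizeof *bugs, by_priority_desc);
}
```

A bug hitting 1000 users with mild annoyance (0.1) outranks one hitting 50 users with severe annoyance (0.9), which is precisely the kind of judgment an empirically grounded method might want to revisit.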

Technique Drills for Programmers

Technique drills are used in just about every other skilled human activity. Why not programming?

Bug introduction in detail

How far can we push Ko’s framework on error introduction? Can we do a better job of putting programmers under the microscope and really zoom in on what happens when bugs are introduced?

Syntax Pragmatics

I’m pretty sure from my work on DSLs that syntax matters. Can I drop another level of abstraction and find out why? This would be an investigation of the connections between semiotics, programming languages and information visualization.

A Language for Empirical Programming Language Design

If we want to learn about the usability of programming language features, we’d like to be able to turn features on and off. Most languages are designed to be as good as possible for real-world use. What about a language with dozens of syntactic and semantic knobs just for experimenting on features in isolation?
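To show what a single “semantic knob” might look like, here is a tiny C sketch: one surface syntax for arithmetic, two semantics. With the hypothetical `use_precedence` knob on, `*` binds tighter than `+`; with it off, operators apply strictly left to right. An experiment could then give two groups of programmers the same expressions under different knob settings. The `Knobs` struct and `eval` function are illustrative assumptions, and the parser only handles well-formed input: non-negative integers joined by `+` or `*`, with no spaces.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical experiment knobs; a real research language would have
   dozens of these, covering both syntax and semantics. */
typedef struct {
    bool use_precedence;   /* true: '*' binds tighter than '+' */
} Knobs;

static long parse_number(const char **p) {
    long v = 0;
    while (isdigit((unsigned char)**p)) v = v * 10 + (*(*p)++ - '0');
    return v;
}

/* term := number ('*' number)* */
static long parse_term(const char **p) {
    long v = parse_number(p);
    while (**p == '*') { (*p)++; v *= parse_number(p); }
    return v;
}

static long eval(const char *src, Knobs k) {
    const char **p = &src;
    if (k.use_precedence) {
        /* expr := term ('+' term)* */
        long v = parse_term(p);
        while (**p == '+') { (*p)++; v += parse_term(p); }
        return v;
    }
    /* Knob off: every operator has the same precedence. */
    long v = parse_number(p);
    while (**p == '+' || **p == '*') {
        char op = *(*p)++;
        long rhs = parse_number(p);
        v = (op == '+') ? v + rhs : v * rhs;
    }
    return v;
}
```

Under the standard setting `2+3*4` evaluates to 14; with the knob off it evaluates to 20. Isolating one such difference at a time is what would let an experiment attribute a change in error rates to that feature alone.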