Research Ideas

December 17, 2009

I’m planning Project February: What Aran Does When His Master’s Is Done. I’ve got a startup on standby, and interviews with companies large and small.

One possibility is to stay in academia for a PhD. If I were to do this, I would need a research direction. This would then influence who I’d court as a supervisor (Greg is leaving for industry) and therefore where I’d go to school.

I could stay at the University of Toronto (I’m already in. I love it here. Path of least resistance.) or look elsewhere and aim for a later start date. Washington is on the up and up. Stanford is the de facto Silicon Valley university. It would be a good fit for my entrepreneurial ambitions. And I follow the work of researchers at many other UK and American universities.

Of course, there’s no guarantee I could get into these places. I have no connections, no reputation, no GRE scores and currently no publication record.

In any case, the first step is to figure out if I want to do more research. Since I’m brainstorming anyway, I figure I’d post my thoughts.

Dataflow languages

Dataflow languages don’t just separate what from how, as declarative languages do; they separate what and how from when. Spreadsheets are dataflow, and there are a couple of dataflow languages and libraries, but I don’t feel dataflow programming as a general-purpose paradigm has gotten the attention it deserves. Dataflow becomes especially interesting when the execution is distributed and operations are expensive, since there are all sorts of wonderful scheduling problems and optimization opportunities.
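
To make the "when" part concrete, here's a minimal sketch of a spreadsheet-style cell graph in C. The names (cell, sum_inputs, cell_eval, MAX_INPUTS) are made up for illustration, and the pull-based recomputation is just one simple strategy, not a claim about how any real dataflow language works. The programmer only declares which cells depend on which; the order and timing of evaluation is left to cell_eval.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INPUTS 4  /* hypothetical limit, chosen to keep the sketch small */

typedef struct cell cell;

struct cell {
    double value;                         /* current value of the cell */
    double (*formula)(const cell *self);  /* NULL for plain input cells */
    cell *inputs[MAX_INPUTS];             /* cells this one depends on */
    size_t n_inputs;
};

/* Example formula: the sum of the values of all input cells. */
double sum_inputs(const cell *self) {
    double total = 0.0;
    for (size_t i = 0; i < self->n_inputs; i++)
        total += self->inputs[i]->value;
    return total;
}

/* Pull-based evaluation: bring the dependencies up to date first,
 * then recompute this cell. The caller never states an order. */
double cell_eval(cell *c) {
    assert(c != NULL && c->n_inputs <= MAX_INPUTS);
    if (c->formula == NULL)
        return c->value;
    for (size_t i = 0; i < c->n_inputs; i++)
        cell_eval(c->inputs[i]);
    c->value = c->formula(c);
    return c->value;
}
```

Because evaluation order is the runtime's decision, a smarter cell_eval could cache, batch or distribute the work without any change to the cell declarations, which is where the scheduling questions come from.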

Performance Types

Traditional separation of interface from implementation deals only with functionality. But sometimes, we want to specify performance characteristics in an interface. Take real-time programming or large-scale programming for example. What’s a good language for describing performance specifications? Can we encode it into a type system?

A/B User Interfaces

Modern web user interfaces are optimized on-the-fly. Can we make better tools for UI designers to manage hundreds of variations of a UI?

Algorithmic Information Theory and Machine Learning

Algorithmic Information Theory gives us a quantitative way of thinking about intelligence. Can we learn new things by bringing this view to bear on machine learning? Example research project: Define a programming language with boolean strings as values, do learning by highly-optimized program generation.
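
As a toy illustration of the example project, here's a sketch in C. Everything in it is an assumption made up for the example: the three-instruction language (OP_NOT, OP_SHL, OP_SET), the 8-bit value standing in for a boolean string, and the brute-force shortest-first search, which is the opposite of highly optimized. It only shows the shape of the idea: "learning" an observed string means finding a short program that outputs it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical instruction set over an 8-bit boolean string. */
enum op { OP_NOT, OP_SHL, OP_SET, OP_COUNT };

/* Run a program of `len` instructions, encoded as the base-OP_COUNT
 * digits of `code`, starting from the all-zero string. */
uint8_t run_program(unsigned long code, int len) {
    uint8_t v = 0;
    for (int i = 0; i < len; i++) {
        switch (code % OP_COUNT) {
        case OP_NOT: v = (uint8_t)~v;       break;  /* flip every bit */
        case OP_SHL: v = (uint8_t)(v << 1); break;  /* shift left by one */
        case OP_SET: v = (uint8_t)(v | 1u); break;  /* set the lowest bit */
        }
        code /= OP_COUNT;
    }
    return v;
}

/* Enumerate programs shortest-first and return the length of the first
 * one that outputs `target`, or -1 if none exists up to `max_len`. */
int shortest_program_length(uint8_t target, int max_len) {
    assert(max_len >= 0 && max_len <= 20);  /* keeps OP_COUNT^len small */
    unsigned long n_programs = 1;           /* OP_COUNT^len */
    for (int len = 0; len <= max_len; len++) {
        for (unsigned long code = 0; code < n_programs; code++)
            if (run_program(code, len) == target)
                return len;
        n_programs *= OP_COUNT;
    }
    return -1;
}
```

The returned length is a crude, language-dependent stand-in for description length. The search cost grows as OP_COUNT^len, which is exactly why the program generation would need to be heavily optimized.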


I find it easy to come up with ideas in Empirical Software Engineering, so there’s a disproportionate number of these below.

Tool-aware development

We’re all familiar with software quality attributes like performance, readability, security and so on. And we’re aware of good coding practices like mnemonic variable names and short functions. There’s a new quality attribute in town, though, and as far as I know, nobody’s talking about it yet.

Modern software engineers use a variety of tools, such as version control systems. In the presence of these tools, certain constructs become worse.

For example, in C, you might see a construct like

#define some_flag_1 0x0001
#define some_flag_2 0x0002
#define some_flag_3 0x0004

Nothing wrong with this on the surface, but in the presence of concurrent development, version control and automerges, this code is a hotspot for a merge error if two programmers add flags.
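
One possible merge-friendlier shape, as a sketch only: derive each mask from an enum position instead of writing the value by hand. The names here (some_flag_bit, SOME_FLAG, SOME_FLAG_BIT_COUNT) are hypothetical, and the 32-bit limit assumes a 32-bit unsigned int.

```c
#include <assert.h>

/* Each flag gets a bit position from its place in the enum, so two
 * programmers who both add a flag can't end up with the same value,
 * however the merge turns out. */
enum some_flag_bit {
    SOME_FLAG_1_BIT,
    SOME_FLAG_2_BIT,
    SOME_FLAG_3_BIT,
    SOME_FLAG_BIT_COUNT  /* keep last: number of flags defined */
};

/* Turn a bit position into a mask. */
#define SOME_FLAG(bit) (1u << (bit))

/* Fail the build, not the merge, if the flags outgrow the mask. */
static_assert(SOME_FLAG_BIT_COUNT <= 32, "flags must fit in a 32-bit mask");
```

The trade-off is that values now depend on declaration order, so this doesn't suit flags that are written to disk or sent over the wire.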

As another example,

y = f(x);
if(y < z)

is more friendly to someone stepping through a debugger than

if(f(x) < z)

because most debuggers only let you step through statements, not expressions.

One last example: On some teams, everyone uses the same tools. On others, everyone uses different tools. Some coding practices are annoying to deal with in certain development environments. Homogeneous tool environments call for different practices than heterogeneous environments. I believe research is warranted into the ways our coding practices should change based on the development context.

Tool-aware languages

How do you design a programming language differently when you know the developer won’t just have a text editor, but will have a debugger, static checker, syntax highlighter, unit test suite, doctests, version control system, code reviews and more?

Software Development Practice Interactions

Nobody uses code reviews or unit tests or pair programming or whatever in isolation. And not all bugs are equally important. Which practices find which bugs? And how do they overlap?

Empirical Bug Prioritization

Modern software is shipped rife with known bugs. Project managers prioritize bugs and optimize for “good-enough” quality in a largely ad-hoc way. Can we offer something better than “estimated number of affected users * expected annoyance” for prioritization? Are there unpublished best practices to be discovered and optimized?
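
For reference, the baseline formula is only a few lines of C. This is a sketch of the ad-hoc approach, not a proposal; the struct layout, the function names and the 0-to-1 annoyance scale are all assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct bug {
    int id;
    double affected_users;  /* estimated number of affected users */
    double annoyance;       /* expected annoyance per user, hypothetical 0..1 scale */
};

/* The baseline: estimated number of affected users * expected annoyance. */
double bug_score(const struct bug *b) {
    return b->affected_users * b->annoyance;
}

/* qsort comparator: higher scores sort first. */
static int by_score_desc(const void *lhs, const void *rhs) {
    double a = bug_score(lhs);
    double b = bug_score(rhs);
    return (a < b) - (a > b);
}

/* Reorder the array in place, most important bug first. */
void prioritize(struct bug *bugs, size_t n) {
    assert(bugs != NULL || n == 0);
    if (n > 0)
        qsort(bugs, n, sizeof *bugs, by_score_desc);
}
```

Both inputs are guesses, and the product ignores fix cost, severity and interactions between bugs, which is part of what makes the question interesting.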

Technique Drills for Programmers

Technique drills are used in just about every other skilled human activity. Why not programming?

Bug introduction in detail

How far can we push Ko’s framework on error introduction? Can we do a better job of putting programmers under the microscope and really zoom in on what happens when bugs are introduced?

Syntax Pragmatics

I’m pretty sure from my work on DSLs that syntax matters. Can I drop another level of abstraction and find out why? This would be an investigation of the connections between semiotics, programming languages and information visualization.

A Language for Empirical Programming Language Design

If we want to learn about the usability of programming language features, we’d like to be able to turn features on and off. Most languages are designed to be as good as possible for real-world use. What about a language with dozens of syntactic and semantic knobs just for experimenting on features in isolation?


2 Responses to “Research Ideas”

  1. Ian Says:

    There are some really good ideas there – the Machine Learning one sounds very interesting. The tool aware development sounds very practical – it certainly applies to the development we’re doing … and trying to convince release management that you’re making a change for maintainability can be a challenge.

    Another idea I would put forward for you is efficient scheduling algorithms for these new CPUs that have multi-cores, turbo boost, and hyper threading. Some of what is being done today seems counter to what *real* programs/systems might want … i.e. when there is less going on (fewer cores active), the processor kicks into turbo boost mode … so it can get 1 task done really fast, or many tasks done at a moderate pace. Similarly, scheduling lower priority work on virtual cpus and stuff like that. I have no idea what kind of research (if any) is being done in this seemingly emerging area.

    …sounds like you’ll have more research to do before Project February can be completed!

  2. George Says:

    I think your AIT & ML direction is a bad choice for you to pursue for a PhD.

    On a somewhat unrelated note, it seems that if anyone has been strip-mining information theory for insights it has been machine learning researchers.

    Algorithmic information theory sometimes strikes me as a silly name since there are so many things in the field that aren’t computable.

    I recommend “the industry”, but maybe that is only because the other ideas you listed don’t really excite me that much, so if they excite you obviously you should go for them.

