Archive for September, 2009

11 Proposals For A Better DCS

September 16, 2009

Incoming graduate chair Peter Marbach has been leading a series of “town hall” meetings to build support for a Department of Computer Science conference, organized by graduate students. The stated goals are a) to improve the quality of the graduate experience by promoting interaction between areas and b) to build the Department’s reputation in industry and academic circles by effectively communicating our successes.

I’m happy that Peter cares enough to try something big, and doubly happy that he is trying to listen to us and involve us in his work. But, while the goals are lofty, I don’t agree with the conference approach. I think that there are lightweight projects that would contribute more to interaction between areas, and I feel that hosting a conference is not an effective way to build a reputation for our work. I would also criticize the proposal for lacking specificity (especially budget, organizing team, schedule) and accountability (no simple success or failure metrics).

I hate to criticize an ambitious and well-meaning proposal without suggesting a replacement. Accordingly, I’ll list some specific proposals below, without taking credit for any of the ideas. Before I do, I’ll list a few of the community projects that already exist.

CSGSBS “big” events

The CSGSBS hosts large, general-interest events a few times a year, including an upcoming BBQ.

CSGSBS “little” events

The CSGSBS coordinates weekly events including cookie breaks, movie nights and pub nights. These are attended by a small percentage (~10%) of the grad student population, and most weeks attract the same crowd again and again.

Research-In-Action showcase

Annual conference hosted on campus with posters and demos from graduate students. A few industry and political guests attend. Typically heavily weighted toward demo-friendly DGP work.

Area talks

Most areas have weekly talks. Outsiders are welcome in theory, though an interested outsider would have to find the right mailing list and often the material is too advanced.

11 Proposals for a better DCS

1. Lightweight talks

1.a Area intro talk series

Each week, a different area gives the talk. At a casual, accessible level, the talk covers why the area is interesting, the most important fundamental closed and open problems in the area, key results, and how each student’s work in the area fits into the big picture. The speaker recommends a survey paper or introductory book for students interested in more. No posters, textual or mathematical slides or chalkboards allowed. Each proposed speaker is vetted by the organizer in advance to ensure a minimum public speaking ability.

1.b Monthly Mini-TED

Monthly casual 1.5-hour meeting in a lounge with coffee. Three speakers give TED-format talks on any beginner-accessible topic of intellectual interest not related to research. Given the variety of DCS members’ interests, we might hear talks on sailing, design, typography, photography, quantum physics or hooping.

2. DCS Portfolio

2.a Technical reports database

For communication of early or small research, revive the technical reports database or provide a DCS-only externally-accessible site with ArXiv.org-like functionality.

2.b Publication portfolio

Provide an RSS feed with links to PDFs of all research published by members of the department.

2.c Entrepreneurial portfolio

Directory of links to companies founded by members of the DCS.

2.d Graduate success portfolio

Track the alumni network (undergrad and grad) and list the institutions they join after graduation.

2.e Blogroll

Directory and firehose of blogs written by DCS members. Michael Famelis has done good work in this area for software engineering. Extend it to the whole department. For bonus marks, provide a global WordPress installation on cs.toronto.edu for DCS member blogs.

2.f Software Portfolio

Directory of software written by members of the DCS. This can include side projects, not just research-related work. Subsection for links to software to which DCS members have contributed. For bonus marks, provide distributed version control hosting and virtual private servers for hosting web applications.

3. Activities

3.a Pick-up sports

Weekly pick-up soccer or ultimate games in the summer and fall.

4. Structural Changes

This is the controversial stuff.

4.a Common-space whiteboards

Move whiteboards out of private, locked areas into public spaces where they can foster discussion.

4.b Cooperative Education

This is the biggest proposal, with highest cost and greatest potential payoff.

For the undergraduate program, provide institutional support for four-month co-op terms after first and second years, and break the PEY into four consecutive four-month terms. For MSc students, provide institutional support for corporate research internships between the completion of classes and the start of full-time research. A common infrastructure would support both missions.

Why co-op?

Work terms create a feedback loop with in-class education. Real work motivates study and supports good study habits. Programming skills honed in the workplace support theoretical learning in the classroom or lab. Problems discovered in the workplace can motivate research directions.

Work terms create a feedback loop with companies. With more work terms, more companies see more students. Companies are eager to hire students who have worked for them before. Companies are willing to contribute resources to universities that are an important part of their HR pipeline.

Weakness of the PEY

I claim the following weaknesses in the PEY program:

  1. Students make long-term commitments on their first-ever round of interviews. They often have no professional experience, and thus are not yet qualified for advanced work. This restricts their placement to beginner work, and that commitment lasts sixteen months instead of four.
  2. Some students just don’t fit with some companies. In other cases, companies abuse long-term students for menial tasks such as manual testing. Four-month terms salvage these cases, without restricting a student from returning to a particularly good employer.
  3. Students see only one corporate culture, team culture and set of engineering practices before graduation.
  4. Students only get one round of interview practice.

Each of these is addressed in the proposed model.

Addressing Common Objections to Co-op for the DCS

I have heard two common objections.

  1. Prohibitive cost: This program would be expensive to administer. Answer: Everyone wants something for nothing, but a program like this is an investment. Other universities successfully mitigate the costs of similar programs. We can do it too.
  2. Competition with Waterloo (and other schools): The implementation of such a program would position us as an alternative to Waterloo’s well-oiled machine, and would contradict our years of PEY marketing. Answer: We should neither hesitate to adopt a good idea just because someone else uses it, nor should we allow a desire for self-consistency to trump innovation.

Summary

A frank and open discussion of ideas might lead us to improvements in our graduate experience, provided we restrict ourselves to specific proposals with measurable outcomes.

Here’s where you come in

  1. Leave a comment under your real name, and rank the ideas you care about. “++” means “would actively support, defend and work for”, “+” means “support passively”, “-” means “oppose passively” and “--” means “would actively oppose, attack and work against.” So if you hate co-op, dislike pick-up sports, but would fight to see whiteboards moved to hallways, write this: [3.a-, 4.a++, 4.b--].
  2. Go find someone else in the department, force them to read this through and make them think through their opinions on these issues.

Let’s send a message about how to make our department better.

Having an Awesome Time Doing Science

September 10, 2009

I noticed that a two-day-old highlight on a paper looked richer than new highlights, even though the highlighter was the same.

The experiment: Highlight a block on a piece of paper. Cover half the block, leaving half exposed to the sunlight and evil fluorescent office lighting.

Controlling for: Effect of time on highlight colour.

Observation: Covered half is lighter than half exposed to light.

Conclusion: Lighting interacts with highlight colour.

Threat to validity: What if air flow is the real cause of highlight colour change?

Future research: Does it matter if the light is fluorescent or sunlight?

Intelligence from scratch

September 9, 2009

Sometimes I get bored at breakfast and start to derive the first principles of fields I know nothing about. I don’t feel bad about it—I could learn something new, but if I get something wrong or get stuck, no harm done. Today’s topic is intelligence.

What does it mean for a critter to be intelligent? I would start with, “an intelligent critter acts in ways that tend to cause outcomes it likes”.

I want to work on this definition a bit. First, I want to separate out that “likes” bit. I think knowing what “likes” means is a hard and worthy problem, but I’m not interested in it today. So I’ll say instead, “an intelligent critter acts in ways that tend to cause predicted outcomes.” If the critter can predict the outcomes of its actions, then it could choose actions with outcomes it liked.

Now, our critters need to know what’s going on in the world. We all make decisions based on a constant input stream of information from the world. Sights, sounds, smells and tastes are just the beginning—we can also feel whether our arms are above our head, and whether that leftover Chinese food is sitting properly, and whether that bad knee tells us a storm is coming. If we’re general with these “internal” senses, we might find it useful later to think about our “likes” as a part of that input stream. We might even go so far as to talk about remembering as if it was turning on an internal input stream that we control.

These input streams change all the time and they’re full of rich experience. If we wanted to build a robot critter, it would need to process megabytes and megabytes of data every instant. I’m happy to leave this problem for someone else, and treat our critter’s input as a big list of bits [0,1,0,0… ( millions more )…] that gets updated many times per second. Maybe the first bunch of bits say what color the world is in the top-left corner of your left eye, and the last bunch talk about whether you’re getting a massage behind your right shoulder-blade. It doesn’t really matter.

Real quick now, we apply the same principle to the critter’s output and treat it as a similar big vector of bits. Maybe turning on the first 100 bits means “raise your hand straight up.”

So we’ve got these big vectors of bits, and they change over time. Our little critter finds patterns in the input and predicts how actions on the output bits will change the input patterns. Then it chooses the actions that give it new inputs that it likes.

Let’s say our critter learns that some output bit always affects the same input bits in the next time step. That’s close to learning that a bunch of input bits are related to each other. So I’m not going to worry about finding patterns within one input vector. Instead I want to focus on finding patterns in the inputs as time changes. So, I’ll construct an idealized problem. Every time step, the critter gets to choose a one or zero as its output, and it gets as input a one or a zero. The critter is also allowed to remember its past inputs and outputs, because we’ve now given up the trick of treating memory retrieval as just another input.

What happens if the critter’s actions have no effect on its inputs? Well, the critter would feel very frustrated and have a low sense of self-efficacy. It might still be able to amuse itself by detecting a pattern in its input, though. We could call this problem “pattern recognition.”

What happens if the critter’s input has no pattern? First, that implies that the critter would be frustrated as above. An input with no pattern would be a world in which every input bit was a coin toss. No critter could do better than 50% for very long. This teaches us that it matters “how much” pattern there is in the input. Just to make something up for now, we’ll call that quantity Q and come back to it later.

With these cases in mind, let’s write down more precisely the problem of intelligence:

Given input [{0,1}, {0,1}, …, {0,1}] and output [{0,1}, {0,1}, …, {0,1}], with n = |input| = |output|, give [input(n+1 | output(n+1) = 0), input(n+1 | output(n+1) = 1)]. That is, given a history of n input bits and n output bits, provide a prediction for input bit n+1 for each possible choice of output bit n+1. (So the prediction has two bits.)

In the pattern recognition problem, the two bits of the prediction are always equal, because the output bit has no effect.

Over time, we can measure how well our critter does. Each turn, the critter chooses an output, and then we get to find out if its input prediction given this output was right. (It’s too bad, but we only ever get to test the possibilities the critter chooses.) If the critter always gets the right predictions, we would call it omniscient. If the critter’s predictions are completely random and unintelligent, it would get 50% right. (If it did worse than 50%, say getting 40% right, it could just flip its predictions to get 60% right.) So we can put an objective measure on our critter’s perceived intelligence, which is how often its predictions are right, and the maximum limit of this intelligence depends on Q, “how much pattern” there is.
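
If you want to make that scoring idea concrete, here is a minimal Python sketch of the interaction loop. Every name in it (echo_universe, coin_flip_critter, run_trial) is mine, invented purely for illustration: the universe turns the critter's output history into the next input bit, the critter returns an output plus its two-bit prediction, and we count how often the prediction for the chosen output turned out right.

import random

def echo_universe(outputs):
    """A very simple universe: the next input bit is the critter's last output."""
    return outputs[-1] if outputs else 0

def coin_flip_critter(inputs, outputs):
    """An unintelligent critter: random output, random two-bit prediction."""
    prediction = {0: random.randint(0, 1), 1: random.randint(0, 1)}
    choice = random.randint(0, 1)
    return choice, prediction

def run_trial(universe, critter, steps=1000):
    """Fraction of turns where the critter's prediction for its chosen output was right."""
    inputs, outputs, correct = [], [], 0
    for _ in range(steps):
        choice, prediction = critter(inputs, outputs)
        outputs.append(choice)
        next_input = universe(outputs)
        # We only get to check the prediction for the output actually chosen.
        if prediction[choice] == next_input:
            correct += 1
        inputs.append(next_input)
    return correct / steps

print(run_trial(echo_universe, coin_flip_critter))   # hovers around 0.5, as argued above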

That was too much pseudo-math, so let’s take a walk in the park and look at some ants before talking about what “how much pattern” means.*

An ant can seem to behave intelligently. It leads other ants to food, it carries dead ants away from the ant hill, and fights off mean bugs like spiders. Of course, ants aren’t really intelligent. We find this out when we take an ant away from its environment and it can’t adapt.

The ant isn’t really learning patterns in its environment to act on them. The ant detects and acts on shallow indicators of patterns that its DNA already knows. Evolution executed a learning algorithm, partially evaluating it on the “pattern” of the ant’s environment. It was an engineering tradeoff, sacrificing adaptability to get a simplified design.

From the ant we learn that our critter can have some built-in knowledge, and it behaves just like built-in knowledge in other theories of communication.

Let’s return to discussing “how much pattern.” Kolmogorov teaches us that we can measure the information content of a string by the length of the smallest program that generates that string. There might be other measures we want to use, but I’m satisfied that there is at least one measure that gives us something to work with.
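
Kolmogorov complexity itself isn't computable, but if you want a feel for what “how much pattern” means in practice, a crude and common stand-in is how far a string shrinks under an off-the-shelf compressor. This little Python comparison is only an illustration of the idea, not a definition of Q:

import random
import zlib

patterned = ("01" * 500).encode()                            # obvious pattern
random.seed(0)
noisy = bytes(random.randint(0, 255) for _ in range(1000))   # coin tosses, no pattern

print(len(zlib.compress(patterned)))   # small: the pattern is cheap to describe
print(len(zlib.compress(noisy)))       # about as long as the original: nothing to exploit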

The critter’s input is its environment’s output. In our simplified world, we’ll say that the universe is a program that feeds the critter its bits. The critter, wishing to learn the patterns in its input, needs to figure out the universe’s program.

For the sake of the argument, we’ll limit the universe’s program to be finite. If the universe’s program grows infinitely, the critter can never win, and that’s no fun.

Suddenly it makes sense to think of the (infinite) set of all possible finite programs. (Actually, the set of all possible programs that input and output one bit per time-step.) We’ll call it PROGRAMS, because that’s appropriately big and bold. The universe is a finite program, so it’s an element of PROGRAMS. The critter’s learning algorithm is a program, so it’s an element of PROGRAMS. Finally, the critter’s learning algorithm has the unenviable task of searching PROGRAMS for the universe’s program.

Let’s give every program in PROGRAMS a number. I don’t know how to do this, but I’m happy to pretend that I can for now. If you’re feeling really generous, we’ll eliminate programs that don’t terminate with an output each turn, because they’re no fun. Then one possibility for the critter’s learning algorithm looks something like:

program_num = 0;
/* Find a simple model of the universe 
   that's consistent with our memory */
while(!consistent(PROGRAMS[program_num], 
                  input_so_far, 
                  output_so_far)) 
{
    program_num = program_num + 1;
}
/* Generate predictions of what the 
   universe will do next */
/* If I output 0, what will the universe do? */
prediction[0] = PROGRAMS[program_num](output_so_far + 0);
/* If I output 1, what will the universe do? */
prediction[1] = PROGRAMS[program_num](output_so_far + 1);
/* Emotions and feelings go here */
return better_for_me_of(prediction[0], prediction[1]);

I hope you get the idea.

This critter is actually very greedy and short-sighted. Another critter might simulate the universe for a few turns ahead before deciding what outcome it likes best. A very clever critter might become curious, and use carefully chosen outputs to strategically eliminate possibilities.

There are lots of interesting things we can think about next. For example, what happens when the universe is a program that deliberately tries to fool the critter’s program? Exactly how much better would a critter do if, each turn, we informed it what would have happened if it had chosen the other output? What is the Q of a neuron? What is the Q of a strand of DNA of some length?

We can also play a fun game to try and develop a critter program that doesn’t require us to enumerate PROGRAMS. I think this would be a fun competition. Let’s see what we can learn right away:

At time 0, the critter knows nothing about the universe. Its choice of output is arbitrary. Call it a.

At time 1, the universe has provided input i, which is either a or ~a. The universe’s program could be: Always output i, or always echo the critter’s output, or something else. I think we can prove, though, that the critter’s optimal strategy is to predict and output i on the next turn, just in case the universe is one of the really simple programs.
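
To make the early-turns argument concrete, here is a toy Python version of the “find the first consistent program” critter from the pseudocode above, restricted to four hand-picked hypotheses instead of all of PROGRAMS. All the names (HYPOTHESES, pick_model, predict) are mine, purely for illustration.

HYPOTHESES = [
    ("always 0",  lambda my_out: 0),
    ("always 1",  lambda my_out: 1),
    ("echo me",   lambda my_out: my_out),
    ("negate me", lambda my_out: 1 - my_out),
]

def pick_model(inputs, outputs):
    """Return the first hypothesis consistent with the whole history, if any."""
    for name, rule in HYPOTHESES:
        if all(rule(outputs[t]) == inputs[t] for t in range(len(inputs))):
            return name, rule
    return None

def predict(inputs, outputs):
    """Two-bit prediction: the next input if I output 0, and the next input if I output 1."""
    model = pick_model(inputs, outputs)
    if model is None:
        return {0: 0, 1: 0}   # no hypothesis survives; guess arbitrarily
    _name, rule = model
    return {0: rule(0), 1: rule(1)}

# At time 0 every hypothesis is still consistent, so the choice of output is
# arbitrary, as above. After one turn, the surviving hypotheses are exactly the
# "really simple programs" that agree with the first input i.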

After that, things get a little more complicated, and it’s lunch time now.

* This is how far I got before I decided it was time to get dressed and go to school. At school I spoke with Zak, who pointed me at Algorithmic Information Theory, which is a new-ish branch of mathematics and computer science that studies these things.

How much of total software cost goes to maintenance?

September 8, 2009

I thought I had a simple question: How much of total software cost goes to maintenance? I want a number that is recent, backed by evidence, and generally citation-worthy.

It turns out it’s hard to find. Software engineering course notes online tend not to cite their sources. Recent estimates tend to come from magazine articles citing Gartner or Forrester. A 2000 IEEE-published article was cited as “90%,” but digging into the paper I find:

According to an informal industry poll, 85 to 90 percent of IS budgets goes to legacy system operation and maintenance.

If you know where I should look, please get in touch.

Cite this page

September 3, 2009

Get BibTex-formatted citations for web pages in one click.

WordPress won’t let me embed JavaScript links in a page, so:

  1. Go here.
  2. Bookmark the link called “Cite this page”, e.g. by dragging it to your browser bookmarks bar.

Unescaped code (you might need to View Source on this page to get the whole thing):

javascript:(function(){var d = Date().toString().split(" "); alert("@misc{" + document.title.split(" ")[0] + ",\n    title={" + document.title + "},\n    url=\"\\url{" + window.location +"}\",\n    note=\"[Online; accessed " + d[2] + "-" + d[1] + "-" + d[3] + "]\",\n    author={" +(document.getElementsByName('author')[0] || {content:'TODO put in author'}).content + "}\n"  + "}")})()

Tested in Safari 4.

Mrrr.

September 2, 2009

And I quote:

For these reasons, our general approach to DSL development consists of two major tasks:

1. Tailor the DSL engineering process to the project’s context.

2. Apply the tailored DSL engineering process to develop the DSL.

To paraphrase: “Our process, that we are publishing here in this peer-reviewed journal, is: Make up a process, then use it.”

Even though this is a cheap shot, I would also like to point out that this same article has typos. Like, they misspelled “Haskell” as “Haskel,” and “built” as “build.”

I wonder if I am the only human being on the planet who is reading this thing.