What do you want?

November 6, 2009

I was asked today what I wanted in a job.

It was surprisingly difficult to answer.

I like achievable challenges. That’s obvious. But—getting some CSS to display just right on Internet Explorer 6 is an achievable challenge, but it’s not work I’ll enjoy. Crushing a hard bug is fun, but debugging code that should never have made it through review is not. Writing efficient code is fun. Dealing with low-level memory management, not-so-much. Type errors in loosely typed languages feel preventable. Not fun. Designing a good API: Fun. Dealing with legacy constraints: Not fun.

These days it’s popular to filter software engineers: Front-end or back-end? Applications or systems?

But this isn’t a key distinction for me. It doesn’t capture the essence of what I care about. I could be happy working on front-end or back-end code. I could be unhappy working on front-end or back-end code.

Now, moments after my interview ended, I’ve managed to figure out the answer…

Accidental complexity: Not fun. Intrinsic complexity: Fun.


On Mobile Phones and Driving

November 5, 2009

Worth repeating, from Jakob Nielsen’s excellent newsletter:

In an opinion poll, the New York Times asked whether Americans wanted to outlaw mobile phone use while driving:

> http://www.nytimes.com/2009/11/02/technology/02textingside.html

80% of respondents said that using a HAND-HELD cellphone while driving should be illegal.

But almost 90% said that it should be legal to talk on a HANDS-FREE cellphone.

Completely OPPOSITE INTUITIONS about the danger of these two ways of using a mobile phone. This despite the fact that all studies show that it’s EQUALLY DANGEROUS to use hands-free and hand-held mobile phones while driving.

Research has found that the danger comes from the cognitive distraction of carrying out a conversation with somebody who’s not in the car. The problem is not holding the phone with one hand while driving with the other. The problem is the conversation and the way it lays claim to limited cognitive resources.

This finding is completely counter-intuitive: how can it endanger your life to carry on a simple, everyday task like a conversation? The survey clearly demonstrates that people’s perception of danger is completely divorced from the actual danger.

When driving, you are piloting a power plant, and a single mistake will cause human beings to die. This is a task worthy of your full attention.


Job Interview Notes

November 4, 2009

A message I sent to the students of CSC 290 (Communication for Computer Scientists) following our class with practice job interviews.

Alexey said: “I wish I had not used `you know`, `er`, and `well` that often. But these words come out instinctively, as they give me time (a second or two) to think of what I am going to say next. Still, there must be other ways.

Here is a golden phrase to memorize when you need time to think about your answer for a question:

That’s a good question. Let me think for a moment so I can give you a really good answer.

You can also buy time by paraphrasing their question:

Just so I know exactly what you’re asking—Are you saying <your interpretation of their question>?

Of course, if you use one of these on a question that should be easy, you’ll seem dumb. And you probably can’t get away with using either more than once.

Yanshuai mentioned that he should have prepared a pitch for his computer vision project. In fact, you should prepare a pitch for every single item on your resume. You should also prepare pitches about your experiences in overcoming challenges or conflict, working in a team, dealing with difficult people, disagreeing with a superior or with your team, and so on.

You shouldn’t memorize your pitch, though. Instead, think about and jot notes about the importance of the project, the transferable skills you learned, and how it might relate to work at X.

Lots of interviewers also ask, “Tell me about yourself.” “Why do you want to work for X?” “Why should we hire you?” “Where do you see yourself in 5 years?” There is no excuse for stumbling on any of these questions. To answer them, remember: They are asking you to tell them about a personal quality or skill set. Lil gives an example: “I can multitask and meet difficult deadlines.  For example, when I was taking X course, I was also working on project Y, and had deadline W at work.  I was able to complete all these tasks by ___ and ___ resulting in ___.

George mentioned doing mock interviews with a friend. This is a great idea.

Leon said, “I practiced answering questions rather than exploring my experiences themselves. If I had another chance, I would have prepared differently by studying past experiences and situations carefully.”

This is an important point and applies to more than just job interviews. Once you have developed good foundational speaking skills, your best preparation is not canned answers or prepared words. Your preparation will be about becoming deeply comfortable with your material. Then the words will flow.

Reza asked, “Also knowing that the interview is short should I let him ask more questions?

While it is important to give complete answers, you should always leave the interviewer in control. In fact, you should carefully listen to the interviewer to get a sense of whether he or she wants you to take charge and talk a lot, or whether he or she would prefer to move things along.

In an interview, it may feel as if you should talk a lot, but in fact a big part of the game is listening to your interviewer (deeply listening—not just staying quiet while they talk.) You should listen actively—for example, paraphrase them and ask questions. Another one from Lil: “So you’re saying that the researcher wants you to organize data on 10,000 irradiated mice and you’re looking for a way to automate it:  Is that it? What are the resource constraints?

I should probably explain “deep listening” better. Here’s an attempt.

We learn as children that “listening” means not interrupting the other person.

“Active listening” is a skill that involves engaging the other person in a conversation to gain a clear understanding of what they are saying.

When I say “deep listening,” I mean listening with all your brainpower. It’s hard work. You’re not just trying to understand their words. You’re also analyzing: What are they saying between the lines? How does this person feel? What’s going on in the back of their mind? What are they like as a person? What’s important to them?”

Actually, if you do this, you won’t need to worry about eye contact or eager posture since it will come naturally.

Taking a step back, the best way to prepare for an interview has nothing to do with interview questions, practice, listening or public speaking skills. To prepare, you must build your resume. By that I don’t mean putting words on page. Build your resume means actively seeking out new experiences. Then you can put them on your resume. Joining and contributing to team projects boosts your leadership and teamwork credentials. Writing open-source code outside of class is one of the best ways to get new technical experiences.

And of course,

  • Smile
  • Make eye contact
  • Shake hands firmly
  • Sit up
  • Judo: Turn weakness into strength
  • Give complete answers
  • Demonstrate enthusiasm and confidence
  • Be specific

Most importantly:

  • Prepare
  • Listen to your interviewer

If you have other important comments that I’ve missed, please let me know.

Your loquacious TA,

Aran


Units 0.03 is out

October 31, 2009

Units 0.03 is on PyPI. It includes a bug fix (A litre is a cubic decimetre, not a cubic centimetre!). It also has patches for PyPy that let you run a Python interpreter with built-in support for quantity literals. With these patches you can type 239 cm * cm * cm + 0.502 L and it will do what you expect.


Improve your Java: Restrict public methods to interface implementations

October 29, 2009

Good programming is all about restriction and constraint. Most possible programs do nothing interesting. Type systems, unit tests, code reviews and all that jazz are just ways of throwing out uninteresting programs.

Here is a chart view.

There are many more broken programs that good ones.

Notice the thickness of the blue slice.

This realization is what makes me feel comfortable proposing that we, as programmers, should mostly be thinking about new ways we can restrict ourselves.

Here is one way you can restrict your Java (or other classical Object-Oriented) programs to make them better.

Aran’s Public Interface Implementation Rule: All public methods on a class should be implementations of some explicit interface.

See, Java is tricky, because interfaces are interfaces, but so are classes. When you expose public methods on a class that are not part of an interface, a user of the method must specify the class name on the left hand side:

YourClassThatsAlsoAnInterface p = new YourClassThatsAlsoAnInterface();

That means their code is now glued to your implementation. They might not like that.

It would be better if you let them code to an interface and substitute implementations as desired. Thus, don’t bother putting public methods on a class that aren’t implementations of an interface.

YourInterface p = new yourClassThatsJustAClass();
YourInterface p2 = new theirReimplementationOfYourClassThatsJustAClass();

Incidentally, this rule totally enables Joshua Bloch’s Effective Java, Item 34: Refer to objects by their interfaces.

As Zak points out, the tradeoff is that you may hurt some compiler optimizations. If you have benchmarks to prove that following this rule causes unacceptable performance degradation, you have my permission to ignore it. Thanks for asking.

Leave your thoughts in the comments.


A Hack for “Unknown or expired link”

October 28, 2009

You’re using a web site, you fill in a big long form, and, upon submitting, you get “Unknown or expired link.” Why does this happen? The short[1] explanation is that the web site gave you a hidden code to submit with your form, and while you were typing, your code expired. If you hit “Refresh” to get a new code, your browser will probably throw away the contents of your form. You’re lucky you can hit “Back” from the “Unknown or expired link” page.

What to do?

Choice number 1: Copy out your form contents, refresh the page, and paste the contents back in. Slow, boring and painful.

Choice number 2: Get a new code and edit your current page to use it. Fast and fun!

How to get a new code: Open a new browser window, go to the page with the form, use “View source”, find the form you were filling in, find the hidden field in the form with a weird value, and copy the strange value.

How to edit your current page to use it: Go to the window where you’ve filled in your form, pop open your DOM editor, find the weird hidden field, and replace its value with your new code. Now when you submit your form, it will be using a fresh code!

OK, so maybe it isn’t easy, but at least it’s fast.

[1]: The long explanation is that the web site is probably using a continuation-passing style for managing client-side state and it garbage-collected your form submission continuation. I didn’t think you wanted to know.


11 Proposals For A Better DCS

September 16, 2009

Incoming graduate chair Peter Marbach has been leading a series of “town hall” meetings to build support for a Department of Computer Science conference, organized by graduate students. The stated goals are a) to improve the quality of the graduate experience by promoting interaction between areas and b) to build the Department’s reputation in industry and academic circles by effectively communicating our successes.

I’m happy that Peter cares enough to try something big, and doubly happy that he is trying to listen to us and involve us in his work. But, while the goals are lofty, I don’t agree with the conference approach. I think that there are lightweight projects that would contribute more to interaction between areas, and I feel that hosting a conference is not an effective way to build a reputation for our work. I would also criticize the proposal for lacking specificity (especially budget, organizing team, schedule) and accountability (no simple success or failure metrics).

I hate to criticize an ambitious and well-meaning proposal without suggesting a replacement. Accordingly, I’ll list some specific proposals below, without taking credit for any of the ideas. Before I do, I’ll list a few of the community projects that already exist.

CSGSBS “big” events

The CSGSBS hosts large, general-interest events a few times a year, including an upcoming BBQ.

CSGSBS “little” events

The CSGSBS coordinates weekly events including cookie breaks, movie nights and pub nights. There are attended by a small percentage (~10%) of the grad student population, and most weeks attract the same crowd again and again.

Research-In-Action showcase

Annual conference hosted on campus with posters and demos from graduate students. A few industry and political guests attend. Typically heavily-weighted on demo-friendly DGP work.

Area talks

Most areas have weekly talks. Outsiders are welcome in theory, though an interested outsider would have to find the right mailing list and often the material is too advanced.

11 Proposals for a better DCS

1. Lightweight talks

1.a Area intro talk series

Each week, a different area gives the talk. At a casual, accessible level, the talk covers why the area is interesting, the most important fundamental closed and open problems in the area, key results, and how each student’s work in the area fits into the big picture. The speaker recommends a survey paper or introductory book for students interested in more. No posters, textual or mathematical slides or chalkboards allowed. Each proposed speaker is vetted by the organizer in advance to ensure a minimum public speaking ability.

1.b Monthly Mini-TED

Monthly casual 1.5 hour meeting in a lounge with coffee. Three speakers give TED-format talks on any beginner-accessible topic of intellectual interest not related to research. Given the variety of DCS member  interests, we might hear talks on sailing, design, typography, photography, quantum physics or hooping.

2. DCS Portfolio

2.a Technical reports database

For communication of early or small research, revive the technical reports database or provide a DCS-only externally-accessible site with ArXiv.org-like functionality.

2.b Publication portfolio

Provide an RSS feed with links to PDFs of all research published by members of the department.

2.c Entrepreneurial portfolio

Directory of links to companies founded by members of the DCS.

2.d Graduate success portfolio

Track the alumni network (undergrad and grad) and list the institutions they join after graduation.

2.e Blogroll

Directory and firehose of blogs written by DCS members. Michael Famelis has done good work in this area for software engineering. Extend it to the whole department. For bonus marks, provide a global WordPress installation on cs.toronto.edu for DCS member blogs.

2.f Software Portfolio

Directory of software written by members of the DCS. This can include side projects, not just research-related work. Subsection for links to software to which DCS members have contributed. For bonus marks, provide distributed version control hosting and virtual private servers for hosting web applications.

3. Activities

3.a Pick-up sports

Weekly pick-up soccer or ultimate games in the summer and fall.

4. Structural Changes

This is the controversial stuff.

4.a Common-space whiteboards

Move whiteboards out of private, locked areas into public spaces where they can foster discussion.

4.b Cooperative Education

This is the biggest proposal, with highest cost and greatest potential payoff.

For the undergraduate program, provide institutional support for four-month co-op terms after first and second years, and break the PEY into four consecutive four-month terms. For MSc students provide institutional support for corporate research internships between completion of class and start of full-time research. A common infrastructure would support both missions.

Why co-op?

Work terms create a feedback loop with in-class education. Real work motivates study and supports good study habits. Programming skills honed in the workplace support theoretical learning in the classroom or lab. Problems discovered in the workplace can motivate research directions.

Work terms create a feedback loop with companies. With more work terms, more companies see more students. Companies are eager to hire students who have worked for them before. Companies are willing to contribute resources to universities that are an important part of their HR pipeline.

Weakness of the PEY

I claim the following weaknesses in the PEY program:

  1. Students make long-term commitments on their first-ever round of interviews. They often have no professional experience, and thus are not yet qualified for advanced work. This restricts their placement to beginner work, but this commitment to beginner work lasts sixteen months instead of four.
  2. Some students just don’t fit with some companies. In other cases, companies abuse long-term students for menial tasks such as manual testing. Four-month terms salvage these cases, without restricting a student from returning to a particularly good employer.
  3. Students see only one corporate culture, team culture and set of engineering practices before graduation.
  4. Students only get one round of interview practice.

Each of these is addressed in the proposed model.

Addressing Common Objections to Co-op for the DCS

I have heard two common objections.

  1. Prohibitive cost: This program would be expensive to administer. Answer: Everyone wants something for nothing, but a program like this is an investment. Other universities successfully mitigate the costs of similar programs. We can do it too.
  2. Competition with Waterloo (and other schools): The implementation of such a program would position us as an alternative to Waterloo’s well-oiled machine, and would contradict our years of PEY marketing. Answer: We should neither hesitate to adopt a good idea just because someone else uses it, nor should we allow a desire for self-consistency to trump innovation.

Summary

A frank an open discussion of ideas might lead us to improvements in our graduate experience, provided we restrict ourselves to specific proposals with measurable outcomes.

Here’s where you come in

  1. Leave a comment in your real name, and rank the ideas you care about. “++” for “would actively support, defend and work for”, “+” is “support passively”, “-” is “oppose passively” and “–” is “would actively oppose, attack and work against.”  So if you hate co-op, dislike pick-up sports, but would fight to see whiteboards moved to hallways, write this: [3.a-, 4.a++, 4.b--].
  2. Go find someone else in the department, force them to read this through and make them think through their opinions on these issues.

Let’s send a message about how to make our department better.


Having an Awesome Time Doing Science

September 10, 2009

I noticed that an two-day old highlight on a paper looked richer than new highlights, even though the highlighter was the same.

The experiment: Highlight a block on a piece of paper. Cover half the block, leaving half exposed to the sunlight and evil fluorescent office lighting.

Controlling for: Effect of time on highlight colour.

Observation: Covered half is lighter than half exposed to light.

Conclusion: Lighting interacts with highlight colour.

Threat to validity: What if air flow is the real cause of highlight colour change?

Future research: Does it matter if the light is fluorescent or sunlight?


Intelligence from scratch

September 9, 2009

Sometimes I get bored at breakfast and start to derive the first principles of fields I know nothing about. I don’t feel bad about it—I could learn something new, but if I get something wrong or get stuck, no harm done. Today’s topic is intelligence.

What does it mean for a critter to be intelligent? I would start with, “an intelligent critter acts in ways that tend to cause outcomes it likes”.

I want to work on this definition a bit. First, I want to separate out that “likes” bit. I think knowing what “likes” means is a hard and worthy problem, but I’m not interested in it today. So I’ll say instead, “an intelligent critter acts in ways that tend to cause predicted outcomes.” If the critter can predict the outcomes of its actions, then it could choose actions with outcomes it liked.

Now, our critters need to know what’s going on in the world. We all make decisions based on a constant input stream of information from the world. Sights, sounds, smells and tastes and are just the beginning—we can also feel whether our arms are above our head, and whether that leftover chinese food is sitting properly, and whether that bad knee tells us a storm is coming. If we’re general with these “internal” senses, we might find it useful later to think about our “likes” as a part of that input stream. We might even go so far as to talk about remembering as if it was turning on an internal input stream that we control.

These input streams change all the time and they’re full of rich experience. If we wanted to build a robot critter, it would need to process megabytes and megabytes of data every instant. I’m happy to leave this problem for someone else, and treat our critter’s input as a big list of bits [0,1,0,0... ( millions more )...] that gets updated many times per second. Maybe the first bunch of bits say what color the world is in the top-left corner of your left eye, and the last bunch talk about whether you’re getting a massage behind your right shoulder-blade. It doesn’t really matter.

Real quick now, we apply the same principle to the critter’s output and treat is as a similar big vector of bits. Maybe turning on the first 100 bits means “raise your hand straight up.”

So we’ve got these big vectors of bits, and they change over time. Our little critter finds patterns in the input and predicts how actions on the output bits will change the input patterns. Then it chooses the actions that give it new inputs that it likes.

Let’s say our critter learns that some output bit always affects the same input bits in the next time step. That’s close to learning that a bunch of input bits are related to each other. So I’m not going to worry about finding patterns within one input vector. Instead I want to focus on finding patterns in the inputs as time changes. So, I’ll construct an idealized problem. Every time step, the critter gets to choose a one or zero as its output, and it gets as input a one or a zero. The critter is also allowed to remember its past inputs and outputs, because we’ve now broken the system of imagining memory retrieval as just another input.

What happens if the critter’s actions have no effect on its inputs? Well, the critter would feel very frustrated and have a low sense of self-efficacy. It might still be able to amuse itself by detecting a pattern in its input, though. We could call this problem “pattern recognition.”

What happens if the critter’s input has no pattern? First, that implies that the critter would be frustrated as above. An input with no pattern would be a world in which every input bit was a coin toss. No critter could do better than 50% for very long. This teaches us that it matters “how much” pattern there is in the input. Just to make something up for now, we’ll call that quantity Q and come back to it later.

With these cases in mind, let’s write down more precisely the problem of intelligence:

Given input [{0,1}, {0,1}, ..., {0,1}] and output [{0,1}, {0,1}, ..., {0,1}], n = |input| == |output|, give [input(n+1 | output(n) = 0), input(n+1 | output(n) = 1)]. That is, given a history of n input bits and n output bits, provide a prediction for input bit n+1 for both possibilities of output bit. (That is, the prediction has two bits.)

In the pattern recognition problem, the two bits of the prediction are always equal, because the output bit has no effect.

Over time, we can measure how well our critter does. Each turn, the critter chooses an output, and then we get to find out if its input prediction given this output was right. (It’s too bad, but we only ever get to test the possibilities the critter chooses.) If the critter always gets the right predictions, we would call it omniscient. If the critter’s predictions are completely random and unintelligent, it would get 50% right. (If it did worse than 50%, say getting 40% right, it could just flip its predictions to get 60% right.) So we can put an objective measure our critter’s perceived intelligence, which is how often its predictions are right, and the maximum limit of this intelligence depends on Q, “how much pattern” there is.

That was too much pseudo-math, so let’s take a walk in the park and look at some ants before talking about what “how much pattern” means.*

An ant can seem to behave intelligently. It leads other ants to food, it carries dead ants away from the ant hill, and fights off mean bugs like spiders. Of course, ants aren’t really intelligent. We find this out when we take an ant away from its environment and it can’t adapt.

The ant isn’t really learning patterns in its environment to act on them. The ant detects and acts on shallow indicators of patterns that its DNA already knows. Evolution executed a learning algorithm, partially evaluating it on the “pattern” of the ant’s environment. It was an engineering tradeoff, sacrificing adaptability to get a simplified design.

From the ant we learn that our critter can have some built-in knowledge, and it behaves just like built-in knowledge in other theories of communication.

Let’s return to discussing “how much pattern.” Kolmogorov teaches us that we can measure the information content of a string by the length of the smallest program that generates that string. There might be other measures we want to use, but I’m satisfied that there is at least one measure that gives us something to work with.

The critter’s input is its environment’s output. In our simplified world, we’ll say that the the universe is a program that feeds the critter its bits. The critter, wishing to learn the patterns in its input, needs to figure out the universe’s program.

For the sake of the argument, we’ll limit the universe’s program to be finite. If the universe’s program grows infinitely, the critter can never win, and that’s no fun.

Suddenly it makes sense to think of the (infinite) set of all possible finite programs. (Actually, the set of all possible programs that input and output one bit per time-step.) We’ll call it PROGRAMS, because that’s appropriately big and bold. The universe is a finite program, so it’s an element of PROGRAMS. The critter’s learning algorithm is a program, so it’s an element of PROGRAMS. Finally, the critter’s learning algorithm has the unenviable task of searching PROGRAMS for the universe’s program.

Let’s give every program in PROGRAMS a number. I don’t know how to do this, but I’m happy to pretend that I can for now. If you’re feeling really generous, we’ll eliminate programs that don’t terminate with an output each turn, because they’re no fun. Then one possibility for the critter’s learning algorithm looks something like:

program_num = 0;
/* Find a simple model of the universe 
   that's consistent with our memory */
while(!consistent(PROGRAMS[program_num], 
                  input_so_far, 
                  output_so_far)) 
{
    program_num = program_num + 1;
}
/* Generate predictions of what the 
   universe will do next */
/* If I output 0, what will the universe do? */
prediction[0] = PROGRAMS[program_num](output_so_far + 0)
/* If I output 1, what will the universe do? */
prediction[1] = PROGRAMS[program_num](output_so_far + 1)
/* Emotions and feelings go here */
return better_for_me_of(prediction[0], prediction[1])

I hope you get the idea.

This critter is actually very greedy and short-sighted. Another critter might simulate the universe for a few turns ahead before deciding what outcome it likes best. A very clever critter might become curious, and use carefully chosen outputs to strategically eliminate possibilities.

There’s lots of interesting things we can think about next. For example, what happens when the universe is a program that deliberately tries to fool the critter’s program? Exactly much better would a critter do if, each turn, we informed it what would have happened if it had chosen the other output? What is the Q of a neuron? What is the Q of a strand of DNA of some length?

We can also play a fun game to try and develop a critter program that doesn’t require us to enumerate PROGRAMS. I think this would be a fun competition. Let’s see what we can learn right away:

At time 0, the critter knows nothing about the universe. Its choice of output is arbitrary. Call it a.

At time 1, the universe has provided input i, which is either a or ~a. The universe’s program could be: Always output i, or always echo the critter’s output, or something else. I think we can prove, though, that the critter’s optimal strategy is to predict and output i on the next turn, just in case the universe is one of the really simple programs.

After that, things get a little more complicated, and it’s lunch time now.

* This is how far I got before I decided it was time to get dressed and go to school. At school I spoke with Zak, who pointed me at Algorithmic Information Theory, which is a new-ish branch of mathematics and computer science that studies these things.


How much of total software cost goes to maintenance?

September 8, 2009

I thought I had a simple question: How much of total software cost goes to maintenance? I want a number that is recent, backed by evidence, and generally citation-worthy.

It turns out it’s hard to find. Software engineering course notes online tend not to cite their sources. Recent estimates tend to come from magazine articles citing Gartner or Forrester. A 2000 IEEE-published article was cited as “90%,” but digging into the paper I find:

According to an informal industry poll,85 to 90 percent of IS budgets goes to legacy system operation and maintenance.

If you know where I should look, please get in touch.


Cite this page

September 3, 2009

Get BibTex-formatted citations for web pages in one click.

WordPress won’t let me embed JavaScript links in a page, so:

  1. Go here.
  2. Bookmark the link called “Cite this page”, e.g. by dragging it to your browser bookmarks bar.

Unescaped code (you might need to View Source on this page to get the whole thing):

javascript:(function(){var d = Date().toString().split(" "); alert("@misc{" + document.title.split(" ")[0] + ",\n    title={" + document.title + "},\n    url=\"\\url{" + window.location +"}\",\n    note=\"[Online; accessed " + d[2] + "-" + d[1] + "-" + d[3] + "]\",\n    author={" +(document.getElementsByName('author')[0] || {content:'TODO put in author'}).content + "}\n"  + "}")})()

Tested in Safari 4.


Mrrr.

September 2, 2009

And I quote:

For these reasons, our general approach to DSL development consists of two major tasks:

1. Tailor the DSL engineering process to the project’s context.

2. Apply the tailored DSL engineering process to develop the DSL.

To paraphrase: “Our process, that we are publishing here in this peer-reviewed journal, is: Make up a process, then use it.”
Even though this is a cheap shot, I would also like to point out that this same article has typos. Like, they misspelled “Haskell” as “Haskel,”, and “built” as “build.”
I wonder if I am the only human being on the planet who is reading this thing.

Units is out

August 17, 2009

Units is a new library for making Python programs safer. It’s also kind of fun.

For more, check out http://pypi.python.org/pypi/units/.


Case Study Databases vs. Ethics and Participation

August 5, 2009

Robert Yin advocates the creation of a case study database alongside the publication of  case study. In my interpretation of this idea, the case study database includes the raw data collected, such as transcripts, interview recordings, and documents collected from archives.  The creation of an “analysis-free” data repository allows others to reanalyze the data to reach different conclusions or for new purposes. This potential for reanalysis in turn strengthens the case study.

For my study, that would mean publishing my interview recordings, my transcriptions of those recordings, and the statistics about software repositories that I collect.

Greg points out two issues with this that have caused me to reverse my plan.

  1. This open approach is rare enough that it might have trouble getting through the ethics board. The withdrawal protocols become awkward, for instance.
  2. This might discourage participation. People might not want to be interviewed if the interview recording will be made available.

I think I will try with a new policy:

  1. Raw data (interview recordings) will be kept private and secure immediately after recording. They will be destroyed after publication of an analysis.
  2. Objective data such as interview transcripts will be kept private and secure. They will be destroyed after publication of an analysis.
  3. A summary of the interviews and other derivative data will be publicly available. If a subject requests it, this summary will be anonymized and the subject will be asked to approve the anonymized version prior to publication. Subject withdrawal will result in destruction of the summary if it has not yet been published. If it has already been published then withdrawal is not possible.

Google Calendar Labs “Free or Busy” Should Be IM Integrated

August 2, 2009

Google Calendar Labs has introduced a new feature called “Free or Busy,” described as “See which of your friends are free or busy right now.”

You make a list of people who’s status you’d like to see, and a widget on the side displays their current calendar status based on their privacy and sharing settings.

It’s a neat idea, but in my opinion the calendar is the wrong place to view this information.  I use my calendar to see what I should be doing or to schedule things for later, not to check up on my friends. If I want to see what a friend is up to right now, I’ll check their instant messenger status, or maybe Facebook or Twitter. I already manage my friend lists in those applications.

Instant messengers are still the best tool for distributed presence information. This free/busy widget would make a great instant-messenger plugin.