Archive for November, 2008

Little Design Decisions in the Modeling of a Location

November 30, 2008

Alternate title: Procrastination Sunday Afternoon

Let’s say you want your computer program to know about locations.

What are you building?

A simulation for an assignment.

What programming language should I use?

Python. I want to learn Python better and I’ll visualize the simulation in Nodebox, which I’ve used before.

What location format should you use?

Latitudes and longitudes. The real application would involve mapping these; the mapping tools I’ve used understand latitudes and longitudes. Six decimal places gives me sub-metre accuracy (a millionth of a degree of latitude is about 11 cm).

Should I make a class to represent locations, or just use something simple like tuples of floats?

Use a class. You can do some simple validations and unit tests. Good defensive programming. 

What should the class be called?

‘location’. It’s not important enough for me to go learn Python naming conventions right now.

How should the class represent the data?

Python properties seem to be the “new” way of doing basic data in Python, I’ll use those. Lesson learned the hard way: Must remember to inherit from ‘object’ when using Python properties or else they silently don’t work.

Should arbitrary numbers be allowed as latitudes and longitudes or should they be restricted to the degree ranges that maps actually use?

Restrict. I might as well have the class guarantee latitudes between -90 and 90, longitudes between -180 and 180. 

Should the class validate its inputs and reject bad input or fix them up to be valid? 

Validate and reject. “Be liberal in what you accept” is bunk.
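A minimal sketch of what the decisions so far might look like, not the class I actually wrote. The name `Location` (the post settles on lowercase ‘location’), the attribute names `lat` and `lng`, and the choice of ValueError are all assumptions made for illustration.

```python
class Location(object):
    """A latitude/longitude pair that rejects out-of-range values.

    Inheriting from object matters on Python 2, where properties
    only work on new-style classes.
    """

    def __init__(self, lat, lng):
        # Assigning through the properties runs the validation below.
        self.lat = lat
        self.lng = lng

    @property
    def lat(self):
        return self._lat

    @lat.setter
    def lat(self, value):
        # Validate and reject: bad input raises instead of being fixed up.
        if not -90.0 <= value <= 90.0:
            raise ValueError("latitude out of range: %r" % (value,))
        self._lat = float(value)

    @property
    def lng(self):
        return self._lng

    @lng.setter
    def lng(self, value):
        if not -180.0 <= value <= 180.0:
            raise ValueError("longitude out of range: %r" % (value,))
        self._lng = float(value)
```

With this sketch, `Location(43.7, -79.4)` is accepted, while `Location(91, 0)` raises ValueError rather than being silently clamped.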

Should you test your validations?

Might as well. I want to learn how unit testing works in Python.

Should you include an operation to move a location?

I guess so. 

Should it be named or should you overload addition?

Lat/lng pairs act like vectors in some ways, but I don’t want to go there. I’m not curious to know about Python operator overloading right now. I’ll keep it simple and just call the operation moveBy.

Should there be one moveBy for Latitudes and one for Longitudes, or keep them together?

I’ll keep them together, the alternative would be annoying.

Should both arguments be mandatory?

No, I’ll make them optional named arguments.

What happens when moveBy wants to move a location outside the range of valid latitudes and longitudes?

If I validate, then one call to moveBy could be valid while another identical call was invalid based on internal state of the location. That’s undesirable. I guess I’d better have moveBy fix up the location. It’s not that complicated: I just move the current lat or long into the positive domain, add the parameter modulo the range of allowed values, and adjust the new lat or long back to being centred on 0. This will be error prone, though, I’ll want to test it thoroughly.
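The fix-up arithmetic described above can be sketched roughly as follows. The helper names `wrap` and `move_by` and the parameters `d_lat` and `d_lng` are hypothetical, and the half-open result range is an assumption of this sketch: a value that lands exactly on the upper bound comes back as the lower bound.

```python
def wrap(value, delta, half_range):
    """Add delta to value and wrap the result into [-half_range, half_range)."""
    full_range = 2 * half_range
    # Shift into the positive domain, add the move, wrap with modulo,
    # then shift back so the result is centred on 0 again.
    return (value + half_range + delta) % full_range - half_range


def move_by(lat, lng, d_lat=0.0, d_lng=0.0):
    """Return the moved (lat, lng) pair; both deltas are optional."""
    return wrap(lat, d_lat, 90.0), wrap(lng, d_lng, 180.0)
```

For example, `move_by(0.0, 170.0, d_lng=20.0)` yields a longitude of -170.0. Applying the same modulo wrap to latitude makes a move past 90 reappear near -90, which is what the description above implies but is not how crossing a pole behaves on a real globe.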

You’re using floating point numbers internally; successive small moveBy calls could introduce numerical inaccuracies. Should you take steps to remedy this?

No, I don’t care enough about numerical stability here and I would have to work hard to learn what needs to be done.

Should you really have made a mutable data container? Are locations really mutable? It’s weird to use them as hash keys now, maybe you should have used an immutable pattern. If you make it immutable, operations like moveBy just become factories for new immutable locations.

I’ve already built and tested this class. It’s not worth changing over now. Besides, mutable is probably more efficient.


I don’t know why I was so unusually introspective of my own thoughts during this time, but there it is.

(43.7, 79.4)

Notes on “Design for Privacy in Ubiquitous Computing Environments”

November 25, 2008

Design for Privacy in Ubiquitous Computing Environments: Victoria Bellotti and Abigail Sellen

Design for Privacy explores how ubiquitous computing applications change existing privacy conventions and expectations. It focuses on the media lab at EuroPARC, where nearly everything is recorded and transmitted widely. The authors extract the privacy design principles that have emerged from EuroPARC.

The two broad principles are control and feedback. Control is about what the user can do about information collected. Feedback is about what the user knows about information collected.

The paper talks about how the researchers at EuroPARC have adapted to having every aspect of their working lives watched and transmitted. It doesn’t bother them. Yet, visitors to the lab are disturbed. To me this shows that even highly intelligent people can be convinced to give up a great deal of privacy. Since privacy is necessary for security, I interpret this fact to mean that we must be vigorous in our opposition to privacy violations, lest we stop noticing them.

The paper talks about Disembodiment and Dissociation. To cut through the academic jargon, even PhD researchers who spend all day thinking about recording devices make mistakes about when they’re being recorded and where that information is going. What hope is there for the rest of us?

Furthermore, they talk about a problem they call “Breakdown of social and behavioural norms and practices.” To quote from the paper, 

For example, breakdowns associated with disembodiment include a tendency for users to engage in unintentional, prolonged observation of others over AV links.

In other words, the researchers got creepy.

To remediate these problems, the authors propose a list of design principles. If obeyed, these principles ought to safeguard against abuses of a ubiquitous information capture system.

I’d be very interested to see if systems designed with the principles really solve the problems.

Notes on “Privacy by Design – Principles of Privacy-Aware Ubiquitous Systems”

November 25, 2008

Privacy by Design – Principles of Privacy-Aware Ubiquitous Systems: Mark Langheinrich

Privacy by Design explores the intellectual history of privacy and applies it to ubiquitous computing. It explores the legal history of privacy in Western culture, outlines the principles of the top thought leaders, and explores contradicting viewpoints on the importance of privacy issues.

Ubiquitous Computing applications can mean more eyes and ears in our lives. Current research hasn’t spent much time exploring the privacy implications of these applications. Langheinrich makes an important contribution by summarizing the key principles to make it easy for a ubiquitous computing application developer to think through the privacy dimension of a technology. The structure he establishes allows some systematic evaluation of a system.

The principles are simple:

  • A ubiquitous computing system should notify nearby people what it is recording.
  • A ubiquitous computing system should allow nearby people to control what is recorded or opt out of the recording.
  • A ubiquitous computing system should not violate the anonymity of a person.
  • A ubiquitous computing system should keep local information local.
  • A ubiquitous computing system should secure the information it collects.
  • A ubiquitous computing system should provide access to the information it collects. It should be possible to remove information.

I think the paper represents an important call to action for the ubiquitous computing community. These applications are increasingly widespread. At the same time, current lack of attention to privacy pushes back the boundary of acceptability.

Like many ubiquitous computing research papers, this one lacks a connection to reality. In the end, thinking about privacy issues and fixing them takes time and money. These are scarce resources and enforcing privacy principles will require more than papers in journals.

Notes on “At Home with Ubiquitous Computing: Seven Challenges”

November 17, 2008

At Home with Ubiquitous Computing: Seven Challenges: W. Keith Edwards and Rebecca E. Grinter

The authors outline seven key design challenges for ubiquitous computing applications in the home.

  1. No group of technologies will be introduced wholesale. Ubicomp technologies will enter the home piece by piece, so we cannot take advantage of whole-system design.
  2. We must somehow solve a hardware version of the polymorphism paradox. That is, adding new types of devices or new device capabilities can require modification of all existing devices.
  3. Most homes do not have a technology expert on call.
  4. Technologies must integrate into home routines.
  5. New technologies may change housework in undesirable ways.
  6. Home technologies must be highly reliable.
  7. Many applications must solve the “wizard” problem, in which unintelligent and perfectly intelligent systems are desirable but anything in between just causes problems.

One, two and seven are broadly applicable to many problem domains. There is no generic solution to #1. A good solution to #2 is unlikely as it’s a fundamentally hard problem. #3 is an instantiation of a need for simplicity in home technologies. I don’t think system designers need to worry too much about #4. I would consider it likely that home routines will bend themselves to new technology. #5 is a fundamental problem with any innovation and also shouldn’t trouble a system designer. We shouldn’t stop inventing things just because they might have some undesirable side-effects. #6 is a major open problem in software engineering. #7 is a well-known problem in all software design. Its implications are well understood and for this reason it is easier than the others.

Just as algorithm designers need to understand complexity to know when they have stumbled on known hard problems, home application designers should be aware of these principles to recognize a hard problem. This paper has value in that it provides a framework for thinking about the challenges in a clear way.

Notes on “The Heterogeneous Home”

November 17, 2008

The Heterogeneous Home: Ryan Aipperspach, Ben Hooker, Allison Woodruff

The gist: Homes are becoming too similar. More variety will help us be healthier and happier. We can use technology to provide some variety.

The good: <This space intentionally left blank.>

The bad: The authors don’t provide evidence for any of their claims and they didn’t do anything.

The abstract of this paper begins,

Due to several recent trends, the domestic environment has become more homogeneous and undifferentiated. Drawing on concepts from environmental psychology, we critique these trends. We propose heterogeneity as a new framework for domestic design, and we present design sketches…

The introduction begins,

A growing number of scholars have noted the increasing homogeneity, or uniform and undifferentiated nature, of the domestic environment. For example, the modern housing landscape has been critiqued as offering limited variation in internal form and structure…

It is remarkable how much information I gleaned from these few sentences. From these alone, I immediately inferred:

  • The paper would be full of needless words. (Yes. E.g.: “Increased homogeneity in the domestic environment plainly offers attractions such as convenience.”)
  • The references list would be short on peer-reviewed journal articles in Ubiquitous Computing and long on books, old material and non-academic fluff. (Yes. By my count 19 of 55 references were peer-reviewed research in Ubicomp. Most of these were “We built something cool” conference submissions)
  • It would be hard to find a contribution to knowledge in the paper. (Yes.)

Just for fun, I quote from the “Approach” section the crux of the method:

“To generate the design sketches, the authors engaged in a collaborative dialog with each other that drew on several resources and perspectives.”

This crucial sentence almost slipped by, buried in a giant paragraph on the third page of the paper. To paraphrase: “We talked about homes and fluffed it into a nine-page conference paper.”

Notes on “Fishpong”

November 9, 2008

Fishpong: Encouraging Human-to-Human Interaction in Informal Social Environments: Jennifer Yoon, Jun Oishi, Jason Nawyn, Kazue Kobayashi, Neeti Gupta

Fishpong is a slow, casual game. An animated table with a display shows water. When someone puts a coffee mug on the table, it generates ripples and a fish. The fish swims in straight lines around the table. With one player, the fish might swim off the table, so the player must block it with his or her mug. With more than one player, it’s a game of pong with the mugs as paddles and fish as balls.

The idea is that strangers might spontaneously begin conversations when they find they are inadvertently playing a leisurely game together. This claim has not yet been evaluated.

It’s a clever idea. I’d like to see if it works in a real coffee shop.

Notes on “Uncle Roy All Around You”

November 9, 2008

Uncle Roy All Around You: Implicating the City in a Location-Based Performance: Steve Benford, Martin Flintham, Adam Drozd, Rob Anastasi, Duncan Rowland, Nick Tandavanitj, Matt Adams, Ju Row-Farr, Amanda Oldroyd, Jon Sutton

Uncle Roy All Around You is a game/performance in which participants travel London hunting clues to find a specific office. Along the way, actors and remote players watching online guide them.

The authors spin a paper out of the performance by talking about trust and explaining users’ reactions to the game elements. There aren’t many surprises in their findings: Online players are less engaged than real, on-the-ground players; Users did things while playing the game they wouldn’t normally do, such as stepping into a limousine; Unlabeled performers caused users to suspect everyone of being a performer; Users found workarounds for the game’s technical limitations.

I’m not quite sure if the paper had a point, so it is difficult to comment on the strengths or weaknesses of the paper in making its point.

I note that 19 papers cite it (via Google Scholar); of these, fewer than ten are not written by one of the paper’s authors.

On the Validation of Tool-Oriented Research

November 9, 2008

Or, Why the Little Computer Scientist Has Beef With “We Built Something Cool”-type Academic Papers.

Many papers describe a technology the authors built. I call these the “We Built Something Cool” papers. Most of the papers from my Ubiquitous Computing class fall into this category, along with a few Software Engineering reading group papers.

Here are some examples of papers in the category, of varying quality:

  • Fishpong: Encouraging Human-to-Human Interaction in Informal Social Environments by Yoon et al.
  • Uncle Roy All Around You: Implicating the City in a Location-Based Performance by Benford et al.
  • In Praise of Tweaking by Gulley
  • Digital Family Portraits: Supporting Peace of Mind for Extended Family Members by Mynatt et al.

There’s a common pattern: The researchers build a system, describe it, and publish. It’s getting old. 

Without some sort of validation of the utility of the system, there’s no answer to a claim of uselessness. We have a word for “useless things built for interest or fun”: Hobby. Go publish what you built on your blog. See if the world cares.

Here is the Little Computer Scientist’s Five-Step Plan for “We Built Something Cool” Papers That Don’t Suck:

  1. Name a clear problem. Quantitatively show that it exists and is worth fixing. (For those papers that don’t solve a problem but instead enhance the status quo, give a quantitative baseline to which you will compare your system.)
  2. Survey existing solutions for the problem. Explain their inadequacy.
  3. Describe the design of your solution. Describe the domain-specific lessons learned during the iterative development and testing of the solution.
  4. Show that your solution solves the problem better than anything else. Make sure that the testing generalizes to the target population. The results should be reproducible. I should be able to borrow or duplicate your system, evaluate it in the same way as you did, and get the same result.
  5. Describe how your solution has been made available to the public. Are you commercializing it? Is it available for download? Give some indication that it is on its way to enthusiastic adoption by those affected by the problem.

Some will argue that this bar is too high. There may be something to what they say. After all, it’s easy for me to say “do more.” I’m not writing papers yet; I’m no more than a practitioner-in-training and a cheap critic. It’s just that I’ve noticed a pattern in my reviews which is caused by a pattern in the papers I’ve reviewed.

How to Tell the Truth With Statistics

November 5, 2008

Carolyn graciously lent me a copy of “How to Lie with Statistics” by Darrell Huff. Despite its 1954 publication date, this book is remarkably relevant today. Below, I explain why the book, despite its high quality, will never achieve its aim, and I suggest a substitute.

How to Lie with Statistics is a gentle introduction to deceit with numbers. It is brief, the writing is elegant and light-hearted, and every single one of the lies described in the book is still in widespread use more than fifty years later. The book includes an informal catalogue of common statistical errors, reserving special scorn for the Precision Bias.

It is a valiant effort to craft an accessible and persuasive introduction to the issues. The author seems to believe that with sufficient widespread education, we can banish misleading numbers. I disagree. The problem is hard, in that the tiny individual payoff will never justify the effort needed to detect and oppose numerical deception. We need an easy way of certifying and enforcing honest data presentation. 

The core of Statistics is the comparison of expectations to results. All of the lies present accurate and precise numerical results (technical honesty) but mislead about the appropriate comparable expectation (de facto dishonesty). The situation is complicated by the fact that even professionals frequently have difficulty crafting the proper expectations. Malicious numerists always have plausible deniability.

To put it another way, statisticians have considerable flexibility in methods and presentation. Special interests abuse the flexibility for their own purposes. 

There is an analogy to accounting. Accountants have considerable flexibility in methods and presentation of financial results. Accounting is about leveraging that flexibility to avoid taxes. In response to the inevitable plethora of abuses, accountants developed the Generally Accepted Accounting Principles (GAAP), a catalogue of rules to govern the business.

I propose the development of Generally Accepted Numerical Principles. We must formalize the Expectation side of the statistical Expectation-Results dichotomy so that we may call out a liar and impose consequences where necessary.

How might such a system work? I would leave the details to the expert statisticians, but one way would be to develop a formal catalogue of Expectations given specific Results. It might look something like the following (though this is not the formal proposal):

Use of “Average”

A number called an “average” in isolation entails the following assumptions:

  • The number presented is an arithmetic mean.
  • The sample of the average is an unbiased representative of the stated population.
  • The population has a normal distribution in the variable. 
  • The median is within 0.1 standard deviations of the mean.
  • p < 0.05

N out of M/Percentages

A statement of the form “N out of M Practitioners <statement>” or “X% of Practitioners <statement>” implies:

  • The sample is an unbiased representative of the stated population
  • The population has a normal distribution in the variable 
  • p < 0.01

Line Graphs

A line graph must:

  • Have axes labelled and units included
  • y-axis has 0 at the origin and no discontinuities
  • All data points collected with equal sample characteristics

Appropriate uses could be given a trustworthy logo or stamp. Publications could be “GANP 2011 certified” indicating that they obey the rules of the GANP. It would become easy for lay people to know what numbers to trust.

Obviously, the development of such a catalogue would be a monumental task. The organizing committees would be subject to perpetual corruption and interference attempts. The first several iterations of the GANP would permit rampant abuses while loopholes were found and closed. Chaos, confusion and doubt would run amok. During the development of the rules, at least 452,235,239 people will die and more than 1.37 billion will suffer in poverty. Nevertheless, four out of five University of Toronto experts agree, this is a good idea.

Idea: Programmer Productivity Measurement via Inspection of Version Control

November 2, 2008

Measuring programmer productivity is a hard problem. Programmers game the system to maximize their top line and it’s hard to agree on a sensible definition of the bottom line. I understand this. I don’t claim to have solved the problem of measuring programmer productivity, because that would be crazy. Regardless, I’d be curious to see the result of the following:

Take a software project that has:

  • A version control system
  • An automated test or benchmark suite
  • Multiple full-time contributors
  • A large code base (thus, )
  • A significant history measured in years (thus, )
  • Lots of commit data

Now, we use the following system on existing data:

  • A programmer gets points for lines of code committed, if and only if that code is exercised by the automated test or benchmark suite. The programmer gets negative points for code that isn’t exercised by the automated tests. This aims to “reward” useful contributions.
  • Those points are made negative if later another programmer changes or deletes the line of code. We assume that code was buggy or needed cleanup, costing another programmer or future self productivity. This is meant to “penalize” prolific-but-bad coders, introducers of bugs, and code churners.
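A rough sketch of the scoring rule above, under simplifying assumptions: each committed line has already been reduced to a record saying who wrote it, whether the test or benchmark suite exercises it, and whether it was later changed or deleted. The record type `CommittedLine`, its field names, and the one-point-per-line weighting are hypothetical; extracting these records from a real version control history and coverage tool is the hard part and is not shown.

```python
from collections import defaultdict, namedtuple

# Hypothetical record for one line of committed code.
CommittedLine = namedtuple("CommittedLine", ["author", "covered", "later_changed"])


def score_programmers(lines):
    """Return a dict mapping each author to a net score.

    +1 for a line that is exercised by the tests and never reworked;
    -1 for a line that is untested, or that was later changed or deleted.
    """
    scores = defaultdict(int)
    for line in lines:
        if line.covered and not line.later_changed:
            scores[line.author] += 1
        else:
            scores[line.author] -= 1
    return dict(scores)
```

For instance, an author with two surviving tested lines and one tested line that was later rewritten nets a score of 1, while an author whose only line is untested nets -1. A fuller version would probably distinguish rework by the original author from rework by someone else.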

The overall point of the exercise is to see whether we can detect hidden net-negative contributors: programmers whom raw lines-of-code counts would not reveal, or who seem to get lots done at the cost of their teammates’ productivity.

I’d be happy for other suggestions on how to go about detecting these.

Idea: “Flow” detector and status publisher

November 2, 2008

Idea: An activity monitor for computer programmers that detects when the user is in the zone and automatically updates a public status (e.g. IM status or Twitter) accordingly.

Detecting “in the zone” might be hard. First I would try simple monitoring of task switches plus idle time. If a user is using one window with low idle time, they’re probably in the zone. If they go idle or switch windows (e.g. checking email or the news), they’re not in the zone.
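The simple first attempt might be sketched like this. It assumes some monitor already supplies input events as (timestamp in seconds, window identifier) pairs; the function name `in_the_zone`, the ten-minute span and the one-minute idle limit are arbitrary assumptions, not tuned values.

```python
def in_the_zone(events, now, span=600.0, max_idle=60.0):
    """Guess whether the user is in the zone at time `now`.

    events: iterable of (timestamp, window_id) pairs for input activity.
    Returns True only if, over the last `span` seconds, all activity was
    in a single window and no idle gap exceeded `max_idle` seconds.
    """
    recent = sorted(e for e in events if now - span <= e[0] <= now)
    if not recent:
        return False  # no activity at all: idle
    if len(set(window_id for _, window_id in recent)) > 1:
        return False  # switched windows (e.g. checked email)
    # Include both ends of the span so idle time at the edges counts too.
    times = [now - span] + [t for t, _ in recent] + [now]
    return all(later - earlier <= max_idle
               for earlier, later in zip(times, times[1:]))
```

A real tool would call something like this periodically and push the result to the status service only when the answer changes.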

If that turns out not to work, then I would try inputting a large number of features into a machine-learning algorithm of some sort to see if a pattern can be detected. These features would include the active program (or set of active programs), idle time, keystroke and mouse activity frequency and time since last task switch.

One hopes that a colleague would be less likely to cause an interruption if an easily available tool shows that the user is in the zone.

It might make sense to publish all of the features as raw data and let colleagues make their own decisions about whether the user is in the zone.

An experiment would passively observe users of the tool compared to users without it over a few days of work. An experimenter codes “in the zone” vs. “not in the zone” time and we look for a significant difference in the number of flow interruptions.

Notes on “Power Laws in Software”

November 2, 2008

Power Laws in Software: Panagiotis Louridas, Diomidis Spinellis and Vasileios Vlachos

The gist: Dependencies in software projects follow power laws (colloquially known as the 80-20 rule).

The good: The paper includes a convincing demonstration across nearly twenty major software projects of a statistically quantifiable effect. The authors take care to avoid unjustified claims of causality or importance.

The bad: The causes of this effect are left to be determined. Nothing more than rough ideas are offered.

I liked that the paper mentioned network effects and self-reinforcing winner-take-all effects as hallmarks of power laws.

Notes on “Embedding Behavior Modification Strategies into a Consumer Electronic Device: A Case Study”

November 2, 2008

Embedding Behavior Modification Strategies into a Consumer Electronic Device: A Case Study: Jason Nawyn, Stephen S. Intille, Kent Larson

The Gist: The authors built a television remote control that gently encourages exercise and discourages extended television viewing. Along the way, they thoroughly elaborated several design principles to support effective persuasive technologies. In a preliminary study with one user, the remote control showed promise to achieve its goals.

The Good: The descriptions of persuasive technology design principles are well thought out. It’s clear how they influenced and improved the design of the remote control system.

The principles include:

  • Just-in-time interactions: The technology needs to change behaviour at the behaviour’s time and place
  • No time commitment: The technology may grab attention but must not require a user to explicitly devote time to it.
  • Sustain interaction over time: The technology must not be abandoned by users after the initial novelty period.
  • Non-coercive: The technology must not force the desired behaviour. 
  • No extrinsic motivation: The technology must not treat the undesired behaviour as a reward.

The remote control system cleverly meets these criteria. It suggests alternative activities to TV at key moments (e.g. between shows or at commercial segments.) It makes it easy to switch from TV-watching to active games and monitor total TV watching time. It never actively interferes with TV watching or annoys the user.

The Bad: They didn’t have the resources to fully evaluate the remote, so it’s unclear that the technology actually accomplishes its goals.

Notes on “Jogging over a Distance”

November 2, 2008

Jogging over a Distance—Supporting a “Jogging Together” Experience Although Being Apart: Florian Mueller, Alex Thorogood, Shannon O’Brien

The gist: This is a typical Ubicomp “we built something cool” paper. In this case, two joggers in different locations wear headsets that keep an audio connection alive. Spatial audio positioning makes it sound as if the other jogger is ahead or behind based on whether the other jogger is running faster or slower. 

The good:

  • Cool idea. 
  • They came up with a neat trick to do the spatial positioning with stereo headphones: Position the “front” sound at 1:30 and the “back” sound at 7:30 so that “front” sound is also slightly right and “back” sound is also slightly left.

The bad: They did no formal study of the utility of the system. This is a cool product prototype without much science backing it up. If they believe in it, they should commercialize and sell it, or do a formal study showing that users like it or that it encourages good behaviour.