Notes on “An Experimental Evaluation of the Effectiveness and Efficiency of the Test Driven Development”

October 12, 2008

An Experimental Evaluation of the Effectiveness and Efficiency of the Test Driven Development: Atul Gupta, Pankaj Jalote.

The Gist: The authors compared test-driven development with traditional development. The programmers were students and the project was a 200-line program. Conclusions: TDD leads to greater efficiency, and writing more test cases correlates with higher code quality.

The Good: This was a straightforward study and a well-argued paper.

The Bad: I don’t believe the conclusions.

  • The TDD group was not allowed to do any design, and at the end they did not feel confident in the design that emerged. This suggests a threat to external validity: on a project this small, the lack of up-front design may simply not have had a chance to hurt, which would not hold for larger projects.
  • The traditional-methods group performed some manual testing. This suggests a confound: the groups differed in manual vs. automated testing, not only in test-before vs. test-after.
  • Their conclusions are based on ad hoc definitions of code quality, development effort, and developer productivity. I believe these measures must be independently validated before they are used to draw conclusions. For example, I reject their measure of developer productivity, which relies on the number of lines of code delivered.
The survey of related research highlights seven other studies, of which three found some benefit for TDD, three found no change, and one found an improvement in quality at the cost of increased development time.
How I’d change the study: 
  • I’d give the groups equivalent instructions about every aspect of the software process except the testing. Both groups would be instructed to create automated tests only. The only difference would be: “Write tests before code” vs. “Write tests after code.”
  • Programmer productivity is commonly claimed to vary by as much as 100:1, and we don't really know the underlying distribution of skill, except that it is non-normal. With random assignment, that alone produces large expected differences between small groups, which may be one reason it's hard to find statistically significant effects with group sizes like 8 to 10. I'd boost the sample sizes to improve the chance of detecting a real effect.
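The point about small-group variance can be made concrete with a quick simulation. This is a sketch, not anything from the paper: the lognormal skill distribution, its spread, and the group size of 9 are all assumptions chosen for illustration.

```python
# Sketch (assumptions, not data from the study): simulate how often two
# identically drawn small groups differ substantially in mean
# "productivity" when individual skill is heavy-tailed (lognormal here).
import random
import statistics

def simulate(group_size=9, trials=10_000, sigma=1.0, seed=42):
    """Return the fraction of trials in which two groups drawn from the
    SAME distribution differ in mean productivity by a factor >= 1.5."""
    rng = random.Random(seed)
    big_gaps = 0
    for _ in range(trials):
        a = [rng.lognormvariate(0, sigma) for _ in range(group_size)]
        b = [rng.lognormvariate(0, sigma) for _ in range(group_size)]
        ratio = statistics.mean(a) / statistics.mean(b)
        if ratio >= 1.5 or ratio <= 1 / 1.5:
            big_gaps += 1
    return big_gaps / trials

# Even with no treatment effect at all, a noticeable fraction of trials
# shows a 1.5x gap between the two groups' means.
print(simulate())
```

Under these assumptions, any real before/after-testing effect has to be large to stand out against noise of this size, which is the argument for bigger samples.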
My (mostly unsubstantiated) hypotheses about TDD:
  • In problem domains where the design is obvious, TDD helps.
  • TDD brings no benefit to beginners, and small benefits after a little practice.
  • TDD hurts exploratory programming.
  • TDD hurts when iterating on interfaces. TDD helps when iterating on implementations.
  • TDD encourages more thorough testing than test-after.
  • TDD creates a small increase in total development time because more tests are written.
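The test-first cycle these hypotheses refer to can be sketched in miniature. The `word_count` function below is a made-up example, not code from the study; the point is only the ordering: the test exists, and fails, before any implementation does.

```python
# Test-first (TDD) in miniature: the test is written before the
# implementation exists, so it initially fails and drives the minimal
# implementation below. `word_count` is a hypothetical example function.

def test_word_count():
    assert word_count("") == 0
    assert word_count("one two three") == 3
    assert word_count("  padded   spaces ") == 2

# Minimal implementation, written only after the test above:
def word_count(text: str) -> int:
    # str.split() with no argument collapses runs of whitespace,
    # which is what the third test case demands.
    return len(text.split())

test_word_count()  # passes once the implementation satisfies the test
```

In test-after development the same two blocks would be written in the opposite order, which is exactly the manipulation the redesigned study above would isolate.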