Notes on “Software Libraries and their Reuse: Entropy, Kolmogorov Complexity, and Zipf’s Law”

October 11, 2008

Software Libraries and Their Reuse: Entropy, Kolmogorov Complexity, and Zipf’s Law: Todd L. Veldhuizen

During the last meeting of Greg’s reading group, I mentioned the idea of measuring the information content of a program as a language-independent way of measuring program size. All the senior members rolled their eyes and started talking about Kolmogorov complexity and fundamental limits of program compression. Greg added Veldhuizen’s paper to the reading list for the next meeting.

The gist:

  • Reusable libraries allow us to compress programs by allowing us to refer to a library rather than write code from scratch. There is a fundamental limit to the savings we can get from libraries, however, because we need an increasing amount of code to uniquely identify the library to use as the number of libraries grows. 
  • We will never be able to construct programs for complex domains by gluing together components. 
  • There will always be room for new libraries.
  • Narrow domains can have lots of reuse. Wide domains will never have lots of reuse.
  • Libraries evolve toward maximum reusability. Programs evolve toward maximum reuse.
It’s interesting to find a theoretical limit on the reusability of code, but I think that our industry’s problems reusing software have little to do with it. Before we get there, we need to solve problems like cross-platform interoperability, API learnability and semantic code search.
This paper has very little with what I was trying to say about the information content of programs, but I can understand why everybody went there since so many of the key words are similar. I’ll have to flesh out the original idea better and present it more clearly next time.
%d bloggers like this: