Notes on “Semantic Hashing”

October 13, 2008

Semantic Hashing: Ruslan Salakhutdinov, Geoffrey Hinton

I read this paper at George’s suggestion.

The gist: The authors present a way to create semantic hash codes for documents. The system maps each document to a binary string such that similar documents get strings separated by a low Hamming distance. They accomplish this with a two-step procedure for training a stack of Restricted Boltzmann Machines.
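To make the Hamming distance part concrete for myself, here is a minimal sketch of the lookup side only. It assumes the binary codes have already been produced somehow; the 8-bit codes and document names below are made up for illustration and are not from the paper, and the brute-force scan is my own simplification rather than the authors' retrieval method.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def neighbours(query: int, codes: dict[str, int], radius: int) -> list[str]:
    """Return the names of documents whose code is within `radius` bits of `query`."""
    return sorted(
        name for name, code in codes.items()
        if hamming_distance(query, code) <= radius
    )


# Hypothetical 8-bit codes; a real system would learn these from the documents.
codes = {
    "doc_a": 0b10110010,
    "doc_b": 0b10110011,  # differs from doc_a in one bit
    "doc_c": 0b01001101,  # differs from doc_a in every bit
}

print(neighbours(0b10110010, codes, radius=2))
```

With these made-up codes, a query equal to doc_a's code and a radius of 2 picks up doc_a and doc_b and skips doc_c. The hard part, learning codes for which nearby means similar, is what the paper is about and is not shown here.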

I like the promise of this paper, though I’m not convinced I could reproduce its results given only the information in the paper. Perhaps this is deliberate on the part of the authors, or perhaps it’s because I’m not a machine learning expert. I’m certainly tempted to try to build such a system and sell it to corporate intranet search companies.

I love that they trounce LSA. Once I wrote a rather long paper about LSA. I was rather taken with it at the time.

I am beginning to believe that dropping Hinton’s course was a bad judgement call. Good suggestion, George!


2 Responses to “Notes on “Semantic Hashing””

  1. George Says:

    The power of machine learning compels you! Glad you enjoyed the paper.

    As Geoff promised, the lectures have kept getting better.

  2. Lucian Says:

    “dropping Hinton’s course” – which course are you talking about? Right now a Coursera course is in progress, but certainly you are not talking about it. Which else, then?

