Analytics

Sunday, June 23, 2013

HashCode "Randomness"

The basis of this post is a very confusing bug I ran into recently involving what appeared to be nondeterministic behavior in a program that has no explicit randomness. Consider the following example:


It looks innocent enough, but when you run it there is a very good chance it will not print out the same number all three times. Set data structures generally do not give you any guarantees as to the ordering of the elements, but Scala's HashSet also does not have any randomness, so the question remains as to why the head of the set would be different each time. Java veterans should be cringing by now at the fact that we are putting an object that does not override the equals() and hashCode() methods into a hash data structure, one of the basic rules of Java collections. And that is indeed the reason for the "nondeterministic" behavior we observe; the JVM's default hash code implementation can return basically anything (but tries to return unique numbers), so depending on the values of the hash codes for each instance of Foo that we create, the set may (and does) order them differently.

As it does with many confusing things in Java, Scala has ways of helping you out here. In the above code example, if we turn Foo into a case class, the same value will be printed every time. This is because case classes automatically have equals() and hashCode() implemented so you do not have to worry about doing it yourself, which is a good argument for making simple value wrapper classes always be case classes (not to mention the nice pattern matching functionality you get). For reference, the automatically generated hash code for case classes can be found here; it is an implementation of the MurmurHash algorithm using the hash codes for each of the fields in the class. So the hash code of the Foo case class is stable across different instances of the JVM as long as Scala does not change the implementation (which they could very well do).

In the end, this turned out to be just a lesson in Java basics brought to light by some old, convoluted code that I came across. It does make me wonder whether the Java and/or Scala compilers could emit a warning when they detect objects being put into a hash data structure without overriding equals() and hashCode() because I am sure that would save a lot of headaches. There is definitely a limit to how much can be done through static analysis, but perhaps a combination of that and runtime checks could do the trick.

No comments:

Post a Comment