When hashCodes Collide: Overriding hashCode() and equals()

When hashCodes Collide: Overriding hashCode() and equals()

We’ve seen these guys around before (the hashCode and equals methods). We know they are important. In fact, we’ve heard that there is some soft of “contract” we’re supposed to uphold if we override them. So what does it all mean? Let’s take a quick look…

The hashCode() and equals() methods are part of the java.lang.Object class. This means that any class that extends java.lang.Object implements the hashCode() and equals() methods.

The equals(Object o) simply allows us to tests for equality between two objects. We can override the equals() method on our objects but there are some rules that we should keep in mind if we do so. These rules are to remind us what it means for two objects to be equal, and we should preserve the meaning when modifying the default behavior of the equals method.

  • REFLEXIVE: x.equals(x) should always return true – this states that an object should always equal itself
  • SYMMETRIC: if x.equals(y) = true, then y.equals(x) = true – this states that if object1 is equal to object2, then we must also be able to say object2 is equal to object1
  • TRANSITIVE: if x.equals(y) = true and y.equals(z) = true, then x.equals(z) = true
  • CONSISTENT: Multiple invocations of x.equals(y) should always consistently return the same result, for a given object x and y.

The hashCode() method of Object is used anytime we insert the object into a HashMap, HashTable, or HashSet. The hashCode() method effectively generates an index to map objects to.

To over simplify, think of a hashtable’s main data structure as an array; then if we call hashTable.put(key,  value) the HashTable will call key.hashCode() to figure out the index to store the object key in the array.

Just like the contract for equals() that keep us honest, we have a contract for hashCode() as well. But let’s pay careful attention to the wording here because it will help us understand what is really going on. I have truncated a couple of the rules so that we can focus on the meat of it. The truncated rules end with an ellipsis (…), the full rules are available here:

  1. Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer… – Similar to equals(), this states that our hashCode() function should be consistent and always return the same result if the object has not changed.
  2. If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. – This intuitively makes sense if you think about how the hashCode() maps objects to where they belong in the array. If object are equal they should be stored together.
  3. It is not required that if two objects are unequal according to the equals() method, then calling the hashCode method on each of the two objects must produce distinct integer results… – This is where it gets interesting and I’ll dig in to it below. “…However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.”

If we look again at the somewhat awkwardly worded #3, we see that it says “it is acceptable that two completely different objects produce the same hashcode”. This situation is called a hash collision and is completely legal. How is this possible? How do we find objects in the hashtable if different objects can map to the same location? To illustrate how this works, let’s extend the conceptual model of the hashtable we started above.

I previously over simplified a  hashtable’s main data structure as an array. Let’s continue with this simplification and say that each index of the array stores a Linked List of objects. This means that if we have 3 objects that map to index 5 of the hashtable’s array, then index 5 of the array will have a Linked List with 3 nodes; one for each object mapped to that location.

We now have a hashtable construct in our minds that looks like the following- Notice that we have a hashcode collision and the Cat object mapped to the same location as the Dog objects:

hash_function2

We see that the hash function might map different objects to the same location in a hashtable and store the objects as a Linked List. So how do we find the object again from the hashtable? Simply by traversing the linked list returned from the index at hashCode() and performing equals() on each node until we find the one we are looking for.

Now we see how the hashCode() function and the equals() function are tied together; when we call hashtable.get(object), the hashCode() maps us to where we will find a collection of objects, and then each one is checked with equals() until we find the the object we are looking for.

If we think about it, if hashcode always returned 1, for example, we would still be able to store and retrieve all of our objects; they would just be in one large list. So why do we need hashCode anyway? The short answer is performance.

Let’s take a very quick look at Big-O notation to show the significance of the hashCode() function. Let’s assume that our hashCode() function does a very good job at distributing different objects among the array structure so that the Linked List structures are very small. This hashtable will have an effective search time of O(1). For contrast, let’s assume the worst case scenario where the hashcode maps every single object to a single location in the array so that we are left with 1 long Linked List with all of our objects. The Big-O notation for searching a Linked List is O(n).

We can clearly see from Big-O analysis that we want our hashCode function to map different objects to different locations in the array structure so that we want to maximize performance. The third contract item from hashCode() above makes a lot of sense.

What we covered

We covered what the equals() and hashCode() functions do on Objects. We also covered how they are tied together in the use of a hashtable data structure, and why it is important (but not required) for hashcode() to generate different codes for different objects. I hope this information was useful!

References

java.lang.Object: http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html

Leave a Reply