Hashtable
Hashtable is an implementation of a key-value pair data structure in Java. You store and retrieve a ‘value’ using a ‘key’, which acts as an identifier for the stored value; obviously the ‘key’ should be unique. java.util.Hashtable extends Dictionary and implements Map. Any non-null object can be used as a key or a value. The key of a Hashtable must implement the hashCode() and equals() methods.
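As a quick illustration, here is a minimal sketch of storing and retrieving values by key with java.util.Hashtable (the class and calls are standard; the keys and values are made up for the example):

import java.util.Hashtable;

public class HashtableBasics {
    public static void main(String[] args) {
        Hashtable<String, String> capitals = new Hashtable<>();

        // store a 'value' against a unique 'key'
        capitals.put("India", "New Delhi");
        capitals.put("France", "Paris");

        // retrieve the 'value' using its 'key'
        System.out.println(capitals.get("India")); // New Delhi

        // Hashtable does not allow null keys or values:
        // capitals.put(null, "x"); // would throw NullPointerException
    }
}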
Hashing and Hashtable
Before looking at Java's Hashtable in detail, you should understand hashing in general. Assume that v is a value to be stored and k is the key used for storage and retrieval, and that h is a hash function; then v is stored at position h(k) of the table. To retrieve a value you compute h(k), which gives you the position of v directly. So in a key-value pair table, you need not scan sequentially through the keys to find a value.
h(k) is the hashing function, and it is used to find the location at which to store the corresponding value v. h(k) cannot map to an unbounded space: the storage allocated for a Hashtable within a program is limited. So the hashing function h(k) should return a number within that allocated spectrum (the logical address space).
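One common way to keep the result within that spectrum is to take the key's hash code modulo the number of buckets. The helper below is a sketch of that idea (the class and method names are made up for illustration; the mask-and-modulus idiom is typical rather than a copy of the library source):

class HashDemo {
    // Map an arbitrary key to a bucket index within the allocated table.
    static int bucketIndex(Object key, int tableLength) {
        int h = key.hashCode();
        // clear the sign bit so the modulus is non-negative,
        // keeping the index within [0, tableLength)
        return (h & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex("India", 11));  // always in 0..10
        System.out.println(bucketIndex("France", 11)); // always in 0..10
    }
}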
Collision in Hashtable
When we try to restrict the hashing function's output within the allocated address spectrum, there is a possibility of a collision. For two different keys k1 and k2, if h(k1) = h(k2), this is called a collision in the hashtable. What this means is that our hashing function directs us to store two different values (with two different keys) in the same location.
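A concrete example: in Java the strings "Aa" and "BB" happen to produce the same hash code, so any table that derives the bucket purely from hashCode() would send these two different keys to the same bucket.

public class CollisionDemo {
    public static void main(String[] args) {
        // two different keys, same hash code: h(k1) == h(k2)
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112
        System.out.println("Aa".equals("BB")); // false, so the keys really differ
    }
}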
When we have a collision, there are multiple methodologies available to resolve it. To name a few hashtable collision resolution techniques: ‘separate chaining’, ‘open addressing’, ‘robin hood hashing’, ‘cuckoo hashing’, etc. Java's Hashtable uses ‘separate chaining’ for collision resolution.
Collision Resolution in Java’s Hashtable
Java uses separate chaining for collision resolution. Recall that Hashtable stores elements in buckets. In separate chaining, every bucket holds a reference to a linked list. Now assume you have stored an element in bucket 1. That means bucket 1 holds a reference to a linked list, and a node of that linked list has two cells: one for the key and one for its corresponding value.
Why do you want to store the key? Because when there is a collision, i.e., when two keys result in the same hash code and direct you to the same bucket (assume bucket 1), you want to store the second element in that same bucket too. You add this second element to the already created linked list as the adjacent node.
Now, when you retrieve a value, the hash code is computed and directs you to a bucket which has two elements. You scan only those two elements sequentially and compare the keys using their equals() method. When the key matches, you get the respective value. Hopefully you can now see the reason behind the condition that your key object must implement hashCode() and equals().
Java has a private static class Entry inside Hashtable. It is the node of that linked list, and you can see there that it stores both the key and the value.
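The sketch below is a stripped-down illustration of separate chaining, not Hashtable's actual source: an Entry node holding the key, the value and a link to the next node, a put() that chains colliding entries in the same bucket, and a get() that scans the chain comparing keys with equals(). The class name and fixed capacity are made up for the example.

// A simplified separate-chaining table, for illustration only.
class TinyChainedTable<K, V> {

    // roughly what a bucket node looks like: key, value, link to next node
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] buckets = new Entry[11]; // fixed capacity for the sketch

    private int indexFor(K key) {
        return (key.hashCode() & 0x7FFFFFFF) % buckets.length;
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        // if the key already exists in this bucket's chain, overwrite its value
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        // otherwise chain a new entry in front of whatever is already in the bucket
        buckets[i] = new Entry<>(key, value, buckets[i]);
    }

    public V get(K key) {
        int i = indexFor(key);
        // scan only this bucket's chain, comparing keys with equals()
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}

The real Hashtable additionally keeps the stored hash in each entry and resizes the bucket array as it grows, but the scan-and-equals() idea is the same.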
For more clarity, open http://research.cs.vt.edu/AVresearch/hashing/buckethash.php
Hashtable performance
To get better performance from your Java Hashtable, you need to
1) use the initialCapacity and loadFactor arguments, and
2) use them wisely,
while instantiating a Hashtable.
initialCapacity is the number of buckets to be created at the time of Hashtable instantiation. The number of buckets and the probability of collision are inversely proportional: if you have more buckets than needed, then you have a lower chance of a collision.
For example, if you are going to store 10 elements and you set the initialCapacity to 100, then you will have 100 buckets. You are going to calculate hashCode() only 10 times over a spectrum of 100 buckets, so the possibility of a collision is very low.
But if you supply an initialCapacity of 10 for the Hashtable, then the possibility of collision is much larger. loadFactor decides when to automatically increase the size of the Hashtable. The default initialCapacity is 11 and the default loadFactor is 0.75; that is, if the Hashtable is 3/4th full, its size is increased.
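As a rough sketch of how the load factor turns into a resize trigger (the arithmetic is illustrative and the variable names are made up):

class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 11;          // default initialCapacity
        float loadFactor = 0.75f;   // default loadFactor
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 8: roughly the point at which the table grows
    }
}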
New capacity in a Java Hashtable is calculated as follows:
int newCapacity = oldCapacity * 2 + 1;
If you give a smaller capacity and load factor, it will rehash() often, which will cause performance issues. Therefore, for efficient Hashtable performance in Java, give an initialCapacity about 25% larger than you need, and a loadFactor of 0.75, when you instantiate it.
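Putting that advice into code, here is a minimal sketch; the Hashtable(initialCapacity, loadFactor) constructor is standard, while the 25% headroom and the sample data are just the rule of thumb above applied to made-up numbers:

import java.util.Hashtable;

public class HashtableSizing {
    public static void main(String[] args) {
        int expectedEntries = 1000;

        // ~25% more buckets than the number of entries we expect to store
        int initialCapacity = (int) (expectedEntries * 1.25); // 1250
        float loadFactor = 0.75f;

        Hashtable<String, Integer> table = new Hashtable<>(initialCapacity, loadFactor);

        for (int i = 0; i < expectedEntries; i++) {
            table.put("key-" + i, i);
        }
        System.out.println(table.size()); // 1000
    }
}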