Hashtable
Hashtable is an implementation of a key-value pair data structure in Java. You store and retrieve a ‘value’ using a ‘key’, which acts as an identifier for the stored value; obviously the ‘key’ should be unique. java.util.Hashtable extends Dictionary and implements Map. Any non-null object can be used as a key or a value. The key of a Hashtable must implement the hashCode() and equals() methods.
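As a quick illustration, here is a minimal sketch of storing and retrieving values by key with java.util.Hashtable (the class and calls are standard; the keys and values are made up for the example):

import java.util.Hashtable;

public class HashtableBasics {
    public static void main(String[] args) {
        Hashtable<String, String> capitals = new Hashtable<>();

        // store a 'value' against a unique 'key'
        capitals.put("India", "New Delhi");
        capitals.put("France", "Paris");

        // retrieve the 'value' using its 'key'
        System.out.println(capitals.get("India")); // New Delhi

        // Hashtable does not allow null keys or values:
        // capitals.put(null, "x"); // would throw NullPointerException
    }
}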
Hashing and Hashtable
Before looking at Java's Hashtable in detail, you should understand hashing in general. Assume that v is a value to be stored and k is the key used for storage and retrieval, and that h is a hash function; then v is stored at position h(k) of the table. To retrieve a value you compute h(k), which gives you the position of v directly. So in a key-value pair table, you need not scan sequentially through the keys to find a value.
h(k) is the hashing function, and it is used to find the location at which to store the corresponding value v. h(k) cannot map to an unbounded space: the storage allocated for a Hashtable within a program is limited. So the hashing function h(k) should return a number within that allocated spectrum (the logical address space).
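One common way to keep the result within that spectrum is to take the key's hash code modulo the number of buckets. The helper below is a sketch of that idea (the class and method names are made up for illustration; the mask-and-modulus idiom is typical rather than a copy of the library source):

class HashDemo {
    // Map an arbitrary key to a bucket index within the allocated table.
    static int bucketIndex(Object key, int tableLength) {
        int h = key.hashCode();
        // clear the sign bit so the modulus is non-negative,
        // keeping the index within [0, tableLength)
        return (h & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex("India", 11));  // always in 0..10
        System.out.println(bucketIndex("France", 11)); // always in 0..10
    }
}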
Collision in Hashtable
When we try to restrict the hashing function's output within the allocated address spectrum, there is a possibility of a collision. For two different keys k1 and k2, if h(k1) = h(k2), this is called a collision in the hashtable. What this means is that our hashing function directs us to store two different values (with two different keys) in the same location.
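A concrete example: in Java the strings "Aa" and "BB" happen to produce the same hash code, so any table that derives the bucket purely from hashCode() would send these two different keys to the same bucket.

public class CollisionDemo {
    public static void main(String[] args) {
        // two different keys, same hash code: h(k1) == h(k2)
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112
        System.out.println("Aa".equals("BB")); // false, so the keys really differ
    }
}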
When we have a collision, there are multiple methodologies available to resolve it. To name a few hashtable collision resolution techniques: ‘separate chaining’, ‘open addressing’, ‘robin hood hashing’, ‘cuckoo hashing’, etc. Java's Hashtable uses ‘separate chaining’ for collision resolution.
Collision Resolution in Java’s Hashtable
Java uses separate chaining for collision resolution. Recall that Hashtable stores elements in buckets. In separate chaining, every bucket holds a reference to a linked list. Now assume you have stored an element in bucket 1. That means bucket 1 holds a reference to a linked list, and a node of that linked list has two cells: one for the key and one for its corresponding value.
Why do you want to store the key? Because when there is a collision, i.e., when two keys result in the same hash code and direct you to the same bucket (assume bucket 1), you want to store the second element in that same bucket too. You add this second element to the already created linked list as the adjacent node.
Now, when you retrieve a value, the hash code is computed and directs you to a bucket which has two elements. You scan only those two elements sequentially and compare the keys using their equals() method. When the key matches, you get the respective value. Hopefully you can now see the reason behind the condition that your key object must implement hashCode() and equals().
Java has a private static class Entry inside Hashtable. It is the node of that linked list, and you can see there that it stores both the key and the value.
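The sketch below is a stripped-down illustration of separate chaining, not Hashtable's actual source: an Entry node holding the key, the value and a link to the next node, a put() that chains colliding entries in the same bucket, and a get() that scans the chain comparing keys with equals(). The class name and fixed capacity are made up for the example.

// A simplified separate-chaining table, for illustration only.
class TinyChainedTable<K, V> {

    // roughly what a bucket node looks like: key, value, link to next node
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] buckets = new Entry[11]; // fixed capacity for the sketch

    private int indexFor(K key) {
        return (key.hashCode() & 0x7FFFFFFF) % buckets.length;
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        // if the key already exists in this bucket's chain, overwrite its value
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        // otherwise chain a new entry in front of whatever is already in the bucket
        buckets[i] = new Entry<>(key, value, buckets[i]);
    }

    public V get(K key) {
        int i = indexFor(key);
        // scan only this bucket's chain, comparing keys with equals()
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}

The real Hashtable additionally keeps the stored hash in each entry and resizes the bucket array as it grows, but the scan-and-equals() idea is the same.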
For more clarity, open http://research.cs.vt.edu/AVresearch/hashing/buckethash.php
Hashtable performance
To get better performance from your Java Hashtable, you need to
1) use the initialCapacity and loadFactor arguments, and
2) use them wisely,
while instantiating a Hashtable.
initialCapacity is the number of buckets to be created at the time of Hashtable instantiation. The number of buckets and the probability of collision are inversely proportional: if you have more buckets than needed, then you have a lower chance of a collision.
For example, if you are going to store 10 elements and you set the initialCapacity to 100, then you will have 100 buckets. You are going to calculate hashCode() only 10 times over a spectrum of 100 buckets, so the possibility of a collision is very low.
But if you supply an initialCapacity of 10 for the Hashtable, then the possibility of collision is much larger. loadFactor decides when to automatically increase the size of the Hashtable. The default initialCapacity is 11 and the default loadFactor is 0.75; that is, if the Hashtable is 3/4th full, its size is increased.
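As a rough sketch of how the load factor turns into a resize trigger (the arithmetic is illustrative and the variable names are made up):

class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 11;          // default initialCapacity
        float loadFactor = 0.75f;   // default loadFactor
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 8: roughly the point at which the table grows
    }
}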
New capacity in a Java Hashtable is calculated as follows:
int newCapacity = oldCapacity * 2 + 1;
If you give a smaller capacity and load factor, it will rehash() often, which will cause performance issues. Therefore, for efficient Hashtable performance in Java, give an initialCapacity about 25% larger than you need, and a loadFactor of 0.75, when you instantiate it.
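Putting that advice into code, here is a minimal sketch; the Hashtable(initialCapacity, loadFactor) constructor is standard, while the 25% headroom and the sample data are just the rule of thumb above applied to made-up numbers:

import java.util.Hashtable;

public class HashtableSizing {
    public static void main(String[] args) {
        int expectedEntries = 1000;

        // ~25% more buckets than the number of entries we expect to store
        int initialCapacity = (int) (expectedEntries * 1.25); // 1250
        float loadFactor = 0.75f;

        Hashtable<String, Integer> table = new Hashtable<>(initialCapacity, loadFactor);

        for (int i = 0; i < expectedEntries; i++) {
            table.put("key-" + i, i);
        }
        System.out.println(table.size()); // 1000
    }
}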