Hashtable

Hashtables store key-value pairs for efficient insertion, retrieval and deletion using Hashing.

Implementation

The values are stored in array buckets, so we need to be able to convert the hash to an array index. So when we get a key, it is converted to hash which is then converted to an index - once we have the index it can be retrieved or inserted in constant time.

Conversion of Hash code to Array Index

Conversion to integer: The hash code which is often a hexadecimal string is converted to an integer first. This can be done using:
- Base conversion: Convert hexadecimal or binary representation to integer
- String to integer conversion
Modulo operation: Since the code has to be converted into an array index, it needs to be scaled down to a valid index if its out of bounds. This is done via modulo operations:
$index = hash\_integer\; \%\; array\_size$

Once you have the index, you can simply use it to insert or retrieve the element.

Collision Handling

There could be and almost always would be situations, where you end up at the same index via different keys or hash codes. It is inevitable if there are more keys than the array size.

Here are the two most common ways to handle it:

Chaining

Each bucket in array holds a linked list. The elements are simply chained in case of collisions. The key values are also stored in order to retrieve the element later.

In worst case, (with a bad hash function) all elements could collide and we would end up with a LinkedList to traverse, and retrieval would take O(n) time.

Separate chaining with Balanced Trees

Instead of a linked list - balanced trees can be used to optimize the retrieval further. This would offer a worst case complexity of O(log n).

Open Addressing

All elements are stored in array itself, collisions are resolved by finding the next available slot in the array. There are different ways to search for the next slot:

Linear Probing: Algorithm linearly checks for next slot ( $i+1$ , $i+2$ etc) wrapping around to start if required.
Quadratic Probing: Algorithm quadraticaly checks for next slot ( $i+1^2$ , $i+2^2$ etc).
Double Hashing:Performs a second hash, to determine the interval - unique interval for each key - better distribution.