HashMaps and Hash-based collections
Internal Implementation
Implementation of hash, the structure and rehashing is similar in all hash-based collections.
class HashMap <K,V> implements Map<K,V>{
Entry <K,V> {} //Inner class
Entry next; //reference to another entry to store entries like linked lists
int hash; //Hashvalue of key to avoid calculating everytime its needed
}
- Initially, it is stored as an array of initial capacity (0-15 if initial cap is 16).
- Once the array is filled more than 75% (load factor - 0.75), the table size doubles to 32. The hashtable is rehashed (Internal data structures are rebuilt)
- The hashcode of the key object is fetched from
hashcode()
method which returns anint
hashcode. This is converted to array index. - When a collision occurs, the key is stored along with value in the linked list (Hashtable#Chaining). While retrieving, the linked list is traversed performing a comparison through
equals()
method. This is usually a deep comparison, but you can override and make it shallow as well. - Java 8 improvement: When the number of nodes in a HashMap reaches the
TREEIFY_THRESHOLD
(constant), the nodes are converted to TreeNode. Basically, from linked list to a balanced tree. This improves the worst case performance from to . After the HashMap becomes small due to removal or resizing, TreeNodes are converted back to nodes. Since, HashSet also uses HashMap - it also benefits from this.TREEIFY_THRESHOLD
= 8 in Java 8 -> not configurable
- HashMap Bucket Resizing Overhead: You should create HashMap with initial capacity close to your expected volume. If its too small, and the load is in millions, whenever the bucket will reach its load factor it will resize and rehash → which is a very costly operation. Also, make sure to use full capacity else a lot of memory of unused blocks will go wasted.
Other Hash-based Collections
HashSet
HashSet
is a wrapper on top ofHashMap
which stores its values in thekey
s (so as to avoid duplicates) and stores some vague value for thevalue
part.