Data Structures - Hash Tables
Hash tables are containers of key and value pairs. Each key is unique and is mapped to some value. Hash Tables use a hash function to compute an index, hash code, and store that pair into a slot of some reserved part in memory. Hash tables are widely used because of their time complexity, lookup, search and delete all take O(1). One problem with hash tables is a collision. This happens when a hash function is imperfect and computes the same hash code for two different keys. Hash collisions are handled by chaining, using a linked list to keep the record of all the values that got the same hash code generated.
Difference from maps/dictionaries
Although they function pretty much the same way there are some differences between an implementation of a hash table and dictionary/ordered and unordered maps.
Dictionary or map is a generic data type, they are not and implementation, hash tables are used to implement dictionaries/maps. There is one other method that we’ll touch on later in tree section an that’s red black trees.
Function is the same, they store key-value pairs, difference can be that hash tables are always unordered as a result of hash function but dictionaries/maps are sometimes ordered.
Implementing a hash table
Let’s go over what functions one hash tables needs. Here is the checklist:
- Initialize the table size or bucket size.
- Create a helper class HashElement to store key value pairs
- Create a HashCode() function, this needs to be fast since every time we search, add or remove and item we need to calculate a hash, our hash function will just be key modulo bucket size
- Create a function Add() to insert element at a key
- Create a function Remove() to remove element at a key
- Choose a method to handle collisions, we’ll use chaining, I’ll cover other handling methods later so you can try and implement one by yourself
#include <list>
using namespace std;
class HashElement
{
public:
int key, value;
HashElement(int key, int value) : key(key), value(value){}
};
class HashTable
{
int BUCKET_SIZE;
list<HashElement*> *bucketArray;
public:
HashTable(int size)
{
BUCKET_SIZE = size;
bucketArray = new list<HashElement*>[BUCKET_SIZE];
}
//Computing the index in the bucket array
int HashCode(int key)
{
return key % BUCKET_SIZE;
}
//Inserting elements
void Add(int key, int value)
{
int index = HashCode(key);
HashElement* item = new HashElement(key, value);
bucketArray[index].push_back(item);
}
//Removing elements
void Remove(int key)
{
int index = HashCode(key);
//Find key in the bucket
list<HashElement*> :: iterator i;
for(i = bucketArray[index].begin(); i != bucketArray[index].end(); i++)
{
if( (**i).key == key)
{
break;
}
}
//If the key was found delete the element
if( i != bucketArray[index].end() )
{
bucketArray[index].erase(i);
}
}
};
Other Collision Handling Methods
These two methods are what is called open addressing techniques. This means that every slot in the reserved bucket size is occupied by only one element, as oppose to what we used where duplicate elements would be stored in a list on the same index.
Linear probing: technique works on the concept of keep incrementing until you find an empty slot.
Double hashing technique: In this technique we use two hashing functions h1(k) and h2(k). If the slot at h1(k) is occupied then the second hashing function h2(k) used to increment the index and find an empty slot, it’s a more complicated linear probing.