Data Structures - Hash Tables

Hash tables are containers of key and value pairs. Each key is unique and is mapped to some value. Hash Tables use a hash function to compute an index, hash code, and store that pair into a slot of some reserved part in memory. Hash tables are widely used because of their time complexity, lookup, search and delete all take O(1). One problem with hash tables is a collision. This happens when a hash function is imperfect and computes the same hash code for two different keys. Hash collisions are handled by chaining, using a linked list to keep the record of all the values that got the same hash code generated.

Difference from maps/dictionaries

Although they function pretty much the same way there are some differences between an implementation of a hash table and dictionary/ordered and unordered maps.

Dictionary or map is a generic data type, they are not and implementation, hash tables are used to implement dictionaries/maps. There is one other method that we’ll touch on later in tree section an that’s red black trees.

Function is the same, they store key-value pairs, difference can be that hash tables are always unordered as a result of hash function but dictionaries/maps are sometimes ordered.

Implementing a hash table

Let’s go over what functions one hash tables needs. Here is the checklist:

  • Initialize the table size or bucket size.
  • Create a helper class HashElement to store key value pairs
  • Create a HashCode() function, this needs to be fast since every time we search, add or remove and item we need to calculate a hash, our hash function will just be key modulo bucket size
  • Create a function Add() to insert element at a key
  • Create a function Remove() to remove element at a key
  • Choose a method to handle collisions, we’ll use chaining, I’ll cover other handling methods later so you can try and implement one by yourself
I’ll be implementing this in C++ but other languages will be on my GitHub repo linked here.
#include <list>

using namespace std;

class HashElement
{
public:
    int key, value;
    HashElement(int key, int value) : key(key), value(value){}
};


class HashTable
{
    int BUCKET_SIZE;

    list<HashElement*> *bucketArray;
public:
    HashTable(int size)
    {
        BUCKET_SIZE = size;
        bucketArray = new list<HashElement*>[BUCKET_SIZE];
    }
		//Computing the index in the bucket array
    int HashCode(int key)
    {
        return key % BUCKET_SIZE;
    }
		//Inserting elements
    void Add(int key, int value)
    {
        int index = HashCode(key);
        HashElement* item = new HashElement(key, value);
        bucketArray[index].push_back(item);
    }
		//Removing elements
    void Remove(int key)
    {
        int index = HashCode(key);
				//Find key in the bucket
        list<HashElement*> :: iterator i;
        for(i = bucketArray[index].begin(); i != bucketArray[index].end(); i++)
        {
            if( (**i).key == key)
            {
                break;
            }
        }
				//If the key was found delete the element
        if( i != bucketArray[index].end() )
        {
            bucketArray[index].erase(i);
        }
    }

};

Other Collision Handling Methods

These two methods are what is called open addressing techniques. This means that every slot in the reserved bucket size is occupied by only one element, as oppose to what we used where duplicate elements would be stored in a list on the same index.

Linear probing: technique works on the concept of keep incrementing until you find an empty slot.

Double hashing technique: In this technique we use two hashing functions h1(k) and h2(k). If the slot at h1(k) is occupied then the second hashing function h2(k) used to increment the index and find an empty slot, it’s a more complicated linear probing.