Hash Tables

srinath shrestha srinath shrestha

Hash tables are, at their core, straightforward. The idea is this: in an ordinary unsorted array, finding an element requires examining each one in sequence. A hash table eliminates that search by computing, from the value itself, exactly where in the array that value ought to live. You do not search for the element. You calculate its address and go directly there.

The mechanism that does this is called the hash function.


The Hash Function

c
int hash(int key) {
    int index = key % TABLE_SIZE;
    if (index < 0) {
        index += TABLE_SIZE;
    }
    return index;
}

TABLE_SIZE is the length of your array. The function divides the key by that size and returns the remainder. The remainder becomes the index.

An illustration. Suppose you have 101 slots, numbered 0 to 100. A key of 105 divides as follows: 105 ÷ 101 leaves a remainder of 4. The value 105 lives at index 4.

Negative keys require a small correction. In C, -5 % 101 evaluates to -5 rather than 96. Adding TABLE_SIZE wraps the result into the valid range: (-5 % 101) + 101 = 96.

Why 101 Slots and Not 100?

This is where the design becomes interesting. Consider keys that are multiples of 100 — 100, 200, 300, 400 — with a table of exactly 100 slots:

100 % 100 = 0
200 % 100 = 0
300 % 100 = 0

Every key maps to index 0. The remaining 99 slots are never used. The hash table has effectively become a linked list — with the overhead of an array attached to it for no benefit.

With 101 slots — a prime number:

100 % 101 = 100
200 % 101 = 99
300 % 101 = 98

The keys distribute across the table. This is why TABLE_SIZE should always be a prime: prime numbers resist the clustering behaviour that composite numbers permit.


Collisions

Here a natural question arises. If prime numbers distribute keys so evenly, why do collisions occur at all?

Because the number of possible keys is not bounded by your table size. A table of 7 slots has indices 0 through 6. Feed it a hundred keys and some indices will necessarily accommodate more than one:

keys → [10, 20, 30, 35, 36, 37, 40, 50, 70, 80, 90, 99]

Index 0  →  35, 70
Index 1  →  36, 50, 99
Index 2  →  30, 37
Index 3  →  10, 80
Index 4  →  (empty)
Index 5  →  40
Index 6  →  20, 90

Collisions are not a failure of the hash function. They are an arithmetic inevitability when your input set is larger than your table. The prime number minimises them. The linked list at each index handles the remainder.

Structure of the Table

Each index holds a pointer to the head of a linked list. When a collision occurs, the new element is prepended to that list:

index 0  →  Node → Node → NULL
index 1  →  Node → Node → Node → NULL
index 2  →  Node → NULL
index 3  →  NULL

The prime number does its job. The linked list does its job. Both are necessary; neither is redundant.


The Code

c
typedef struct Node {
    int key;
    char value[20];
    struct Node *next;
} Node;

typedef enum {
    INSERT_SUCCESS   =  0,
    INSERT_DUPLICATE =  1,
    INSERT_MEM_ERROR = -1
} InRes;

int hash(int key) {
    int index = key % TABLE_SIZE;
    if (index < 0) {   // handle negative keys
        index += TABLE_SIZE;
    }
    return index;
}

InRes insert(int key, const char *val, Node *arr[]) {
    int index = hash(key);
    Node *curr = arr[index];

    // 1. Check for duplicates
    // We don't want: index -> 7 -> 7 -> 7 -> 7 -> NULL
    while (curr != NULL) {
        if (curr->key == key) {
            return INSERT_DUPLICATE;
        }
        curr = curr->next;
    }

    // 2. Create new node
    Node *new_node = (Node *)malloc(sizeof(Node));
    if (new_node == NULL)
        return INSERT_MEM_ERROR;

    new_node->key = key;
    strncpy(new_node->value, val, 19);  // max 19 chars + null terminator
    new_node->value[19] = '\0';

    // 3. Link it at the front
    new_node->next = arr[index];
    arr[index] = new_node;

    return INSERT_SUCCESS;
}

void printTable(Node *arr[]) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (arr[i] == NULL)
            continue;  // skip empty buckets, nothing to see here
        printf("bucket[%d]: ", i);
        Node *curr = arr[i];
        while (curr != NULL) {
            printf("[%d: %s] -> ", curr->key, curr->value);
            curr = curr->next;
        }
        printf("NULL\n");
    }
}

void freeTable(Node *arr[]) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        Node *curr = arr[i];
        while (curr != NULL) {
            Node *temp = curr;
            curr = curr->next;
            free(temp);
        }
    }
}

/*
 * How freeTable traversal works:
 *
 * [ Person A ] -> [ Person B ] -> [ Person C ]
 *      ^    ^
 *     curr temp
 *
 * [ Person A ] -> [ Person B ] -> [ Person C ]
 *      ^                ^
 *     temp            curr   (leader moves ahead, temp gets freed)
 */

void printRes(int res) {
    if (res == INSERT_SUCCESS) {
        printf("Insertion successful\n");
    } else if (res == INSERT_DUPLICATE) {
        printf("Duplicate key\n");
    } else {
        printf("Memory error\n");
    }
}

int main() {
    Node *hashTable[TABLE_SIZE] = {NULL};

    insert(234, "Srinath", hashTable);
    insert(65,  "Alice",   hashTable);
    insert(987, "Bob",     hashTable);

    printTable(hashTable);

    // Searching also gives you access to the value
    int searchKey = 65;
    Node *curr = hashTable[hash(searchKey)];
    while (curr) {
        if (curr->key == searchKey) {
            printf("Found: %s\n", curr->value);
            break;
        }
        curr = curr->next;
    }

    freeTable(hashTable);
    return 0;
}

That is a hash table. An array that knows where things live before you ask.


One Limitation Worth Noting

Everything above assumes integer keys. The modulo operation requires a number, and key % TABLE_SIZE works precisely for that. String keys — a char array — cannot be fed to % directly. They require a function that converts the string to an integer first, then applies the modulo. Simply summing ASCII values works in principle but distributes poorly. The standard approach for string keys is the djb2 algorithm, which we shall examine separately.