手把手教你用C语言造轮子：从零实现哈希表（附可直接运行代码）-CSDN博客

本文链接：https://blog.csdn.net/notion2025/article/details/147956822

文章目录

前言：为什么要自己造轮子？

“都2024年了还用C语言写哈希表？直接用现成的库不香吗？” 这是我上周在公司被同事怼的原话（笑）。但作为一个有追求的程序猿，造轮子才是真正的快乐啊！更何况，理解哈希表底层实现能让你：

面试装逼成功率+50%（亲测有效）
排查内存泄漏问题时不再懵逼
真正理解那些高级语言里的字典/Map到底在干嘛

（小声BB：其实我就是被某个开源项目的哈希表坑过才决定自己搞的…）

一、设计蓝图：我们的哈希表长啥样？

先画个草图再写代码是好习惯（虽然我经常偷懒）。我们的迷你哈希表需要：

#define INITIAL_SIZE 16  // 初始容量别太小
#define LOAD_FACTOR 0.75 // 负载因子高了就扩容

typedef struct {
    char* key;
    void* value;
    struct Entry* next; // 链表解决冲突
} Entry;

typedef struct {
    Entry** buckets;    // 桶数组
    int size;           // 当前元素数
    int capacity;       // 总容量
} HashTable;

这里有几个关键点要注意：

链表法处理冲突：Java的HashMap同款方案，简单可靠
void指针存值：灵活支持各种数据类型（但类型安全就…你懂的）
动态扩容机制：避免哈希表变成链表（别笑，真有人这么栽过）

二、核心模块拆解

1. 哈希函数：简单≠没用

unsigned int hash(const char* key, int capacity) {
    unsigned long hash = 5381; // 魔法数字！
    int c;
    
    while ((c = *key++)) {
        hash = ((hash << 5) + hash) + c; // hash * 33 + c
    }
    return hash % capacity;
}

这个DJB2算法虽然简单，但在大多数场景下够用了。不过要注意：

永远别自己写加密哈希！（血的教训）
如果键不是字符串，得另写哈希函数
%运算前可以加个质数容量提升分散性

2. 插入操作：小心这些坑！

void insert(HashTable* table, const char* key, void* value) {
    // 检查扩容
    if ((float)table->size / table->capacity > LOAD_FACTOR) {
        resize(table);
    }
    
    unsigned int index = hash(key, table->capacity);
    Entry* entry = create_entry(key, value);
    
    // 头插法更高效
    entry->next = table->buckets[index];
    table->buckets[index] = entry;
    table->size++;
}

注意这里用头插法而不是尾插法，为什么？因为：

插入O(1)时间复杂度
最新插入的数据可能被更频繁访问（局部性原理）
实现简单不用遍历链表

但！在并发环境下可能有问题（不过咱们这次先不考虑线程安全）

3. 动态扩容：性能关键点

void resize(HashTable* table) {
    int new_capacity = table->capacity * 2;
    Entry** new_buckets = calloc(new_capacity, sizeof(Entry*));
    
    // 重新哈希所有元素
    for (int i = 0; i < table->capacity; i++) {
        Entry* current = table->buckets[i];
        while (current) {
            unsigned int new_index = hash(current->key, new_capacity);
            Entry* next = current->next;
            
            current->next = new_buckets[new_index];
            new_buckets[new_index] = current;
            
            current = next;
        }
    }
    
    free(table->buckets);
    table->buckets = new_buckets;
    table->capacity = new_capacity;
}

扩容时的重新哈希（rehashing）是性能瓶颈所在。这里有个优化技巧：可以在扩容时跳过空桶，不过对于我们的教学代码来说，保持简单更重要。

4. 查找与删除：细节决定成败

查找函数容易写，但要注意内存安全：

void* get(HashTable* table, const char* key) {
    unsigned int index = hash(key, table->capacity);
    Entry* current = table->buckets[index];
    
    while (current) {
        if (strcmp(current->key, key) == 0) {
            return current->value;
        }
        current = current->next;
    }
    return NULL; // 找不到返回NULL
}

删除操作要特别注意指针操作：

int remove_entry(HashTable* table, const char* key) {
    unsigned int index = hash(key, table->capacity);
    Entry* current = table->buckets[index];
    Entry* prev = NULL;
    
    while (current) {
        if (strcmp(current->key, key) == 0) {
            if (prev) {
                prev->next = current->next;
            } else {
                table->buckets[index] = current->next;
            }
            free_entry(current);
            table->size--;
            return 1;
        }
        prev = current;
        current = current->next;
    }
    return 0; // 没找到
}

这里最容易犯的错是忘记处理头节点的情况（prev为NULL时），导致链表断裂。

三、完整代码实现

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// ...（前面提到的结构体定义和函数实现）

int main() {
    HashTable* table = create_table();
    
    int a = 42;
    insert(table, "answer", &a);
    
    char* str = "hello world";
    insert(table, "greeting", str);
    
    int* val = (int*)get(table, "answer");
    if (val) {
        printf("The answer is %d\n", *val); // 输出42
    }
    
    remove_entry(table, "greeting");
    destroy_table(table);
    return 0;
}