哈希表

哈希表（Hash Table）是一种高效的数据结构，能够在平均 O(1) 时间内完成查找、插入和删除操作。掌握哈希表的原理和应用对于优化算法性能至关重要。

哈希表基础

哈希表的特点

快速查找：平均 O(1) 时间查找
动态大小：可以动态添加和删除元素
键值对存储：通过键快速访问值

哈希表的原理

哈希表通过哈希函数将键映射到数组索引，从而实现快速访问。

键 (Key) → 哈希函数 (Hash Function) → 索引 (Index) → 值 (Value)

哈希函数

好的哈希函数的特性

确定性：相同的输入总是产生相同的输出
均匀分布：尽可能均匀地分布到所有可能的索引
快速计算：计算速度快
雪崩效应：输入的微小变化导致输出的巨大变化

常见哈希函数

1. 除留余数法

def hash_function(key, table_size):
    return key % table_size

2. 乘法哈希

def hash_function(key, table_size):
    A = 0.6180339887  # (sqrt(5) - 1) / 2
    return int(table_size * ((key * A) % 1))

3. 字符串哈希

def hash_string(s, table_size):
    hash_value = 0
    for char in s:
        hash_value = (hash_value * 31 + ord(char)) % table_size
    return hash_value

冲突处理

当两个不同的键映射到同一个索引时，就会发生冲突。常见的冲突处理方法有：

1. 链地址法（Chaining）

使用链表存储冲突的元素。

class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]
    
    def hash_function(self, key):
        return key % self.size
    
    def insert(self, key, value):
        index = self.hash_function(key)
        for pair in self.table[index]:
            if pair[0] == key:
                pair[1] = value
                return
        self.table[index].append([key, value])
    
    def get(self, key):
        index = self.hash_function(key)
        for pair in self.table[index]:
            if pair[0] == key:
                return pair[1]
        return None
    
    def delete(self, key):
        index = self.hash_function(key)
        for i, pair in enumerate(self.table[index]):
            if pair[0] == key:
                del self.table[index][i]
                return True
        return False

2. 开放寻址法（Open Addressing）

当发生冲突时，寻找下一个可用的位置。

线性探测（Linear Probing）

class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [None] * size
        self.keys = [None] * size
    
    def hash_function(self, key):
        return key % self.size
    
    def insert(self, key, value):
        index = self.hash_function(key)
        while self.keys[index] is not None:
            if self.keys[index] == key:
                self.table[index] = value
                return
            index = (index + 1) % self.size
        self.keys[index] = key
        self.table[index] = value
    
    def get(self, key):
        index = self.hash_function(key)
        while self.keys[index] is not None:
            if self.keys[index] == key:
                return self.table[index]
            index = (index + 1) % self.size
        return None

二次探测（Quadratic Probing）

def insert(self, key, value):
    index = self.hash_function(key)
    i = 0
    while self.keys[index] is not None:
        if self.keys[index] == key:
            self.table[index] = value
            return
        i += 1
        index = (self.hash_function(key) + i * i) % self.size
    self.keys[index] = key
    self.table[index] = value

Python 中的字典

Python 的 dict 类型就是哈希表的实现：

# 创建字典
my_dict = {}

# 插入/更新：O(1) 平均
my_dict['key'] = 'value'

# 查找：O(1) 平均
value = my_dict.get('key')

# 删除：O(1) 平均
del my_dict['key']

# 遍历：O(n)
for key, value in my_dict.items():
    print(key, value)

哈希表的应用

1. 快速查找

# 两数之和
def two_sum(nums, target):
    hash_map = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in hash_map:
            return [hash_map[complement], i]
        hash_map[num] = i
    return []

2. 频率统计

# 统计字符频率
def char_frequency(s):
    freq = {}
    for char in s:
        freq[char] = freq.get(char, 0) + 1
    return freq

3. 去重

# 数组去重
def remove_duplicates(nums):
    seen = set()
    result = []
    for num in nums:
        if num not in seen:
            seen.add(num)
            result.append(num)
    return result

4. 缓存（LRU Cache）

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
    
    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)
        return self.cache[key]
    
    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

常见题目

1. 两数之和

def two_sum(nums, target):
    hash_map = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in hash_map:
            return [hash_map[complement], i]
        hash_map[num] = i
    return []

2. 最长连续序列

def longest_consecutive(nums):
    num_set = set(nums)
    max_length = 0
    
    for num in num_set:
        if num - 1 not in num_set:
            current_num = num
            current_length = 1
            
            while current_num + 1 in num_set:
                current_num += 1
                current_length += 1
            
            max_length = max(max_length, current_length)
    
    return max_length

3. 字母异位词分组

def group_anagrams(strs):
    groups = {}
    for s in strs:
        key = ''.join(sorted(s))
        if key not in groups:
            groups[key] = []
        groups[key].append(s)
    return list(groups.values())

性能分析

时间复杂度

平均情况：O(1) 查找、插入、删除
最坏情况：O(n) 所有元素冲突

空间复杂度

O(n) 存储 n 个元素

优化建议

选择合适的哈希函数：减少冲突
动态调整大小：当负载因子过高时扩容
使用链地址法：简单且有效
考虑使用集合（Set）：如果只需要键不需要值

总结

哈希表是算法优化的重要工具：

原理：通过哈希函数将键映射到索引
冲突处理：链地址法、开放寻址法
应用：快速查找、频率统计、去重、缓存
性能：平均 O(1) 时间，空间 O(n)

掌握哈希表的使用能够显著提升算法效率。

接下来，让我们学习图，这是最复杂的非线性数据结构之一。

上一章节树、字典树和堆深入理解树结构、二叉搜索树、字典树和堆的实现与应用下一章节图深入理解图的表示方法、遍历算法和常见图算法