Caching Strategy

The Problem

In the previous section, I gave all weather data the same 1-hour cache TTL.

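After the service had been running in production for a while, I started getting feedback from users:

User A: "Why is your data always behind the weather bureau's?"
User B: "It's raining outside right now, but your API still says it's sunny."
User C: "Can you update the weather data in real time?"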

I realized that a 1-hour cache TTL was too long.

But if I shortened it:

External API calls would increase
Response times would get slower

How do you balance performance against data freshness? That became the next problem I had to solve.

Analyzing the Data

I studied how each kind of weather data changes:

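Current temperature

Change frequency: continuous
Change magnitude: 0 to 1 degree per minute
User expectation: as close to real time as possible

Weather conditions

Change frequency: once every few hours
Change magnitude: sunny → cloudy → rainy
User expectation: updated within 1 hour

City information

Change frequency: almost never
Change magnitude: city name, latitude and longitude stay the same
User expectation: can be cached permanently

Humidity

Change frequency: every few tens of minutes
Change magnitude: 5-10%
User expectation: updated within 30 minutes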

The analysis made one thing clear: different data changes at different rates, so a single cache TTL cannot fit all of it.

Optimization Strategies

Based on how often each kind of data changes, I designed several caching strategies:

Strategy 1: Layered caching

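import json

import redis
from flask import Flask, jsonify, request

app = Flask(__name__)
cache = redis.Redis(decode_responses=True)  # assumes a local Redis server

# fetch_current_temperature, fetch_weather_condition and fetch_city_info
# call the external weather APIs and return dicts (defined elsewhere).

@app.route('/api/weather')
def get_weather():
    city = request.args.get('city')
    if not city:
        return jsonify({'error': 'missing city parameter'}), 400

    # 1. Check the full-result cache (all data combined)
    full_cache_key = f'weather:{city}:full'
    full_cached = cache.get(full_cache_key)
    if full_cached:
        return jsonify(json.loads(full_cached))

    # 2. Look up each kind of data in its own cache entry
    current_cache_key = f'weather:{city}:current'
    condition_cache_key = f'weather:{city}:condition'
    city_cache_key = f'weather:{city}:info'

    current_cached = cache.get(current_cache_key)
    condition_cached = cache.get(condition_cache_key)
    city_cached = cache.get(city_cache_key)

    # 3. Call only the external APIs whose data is missing from the cache.
    #    Cache hits are JSON strings and must be decoded back into dicts.
    if current_cached:
        current_data = json.loads(current_cached)
    else:
        # Call the real-time temperature API
        current_data = fetch_current_temperature(city)
        cache.setex(current_cache_key, 300, json.dumps(current_data))  # 5 minutes

    if condition_cached:
        condition_data = json.loads(condition_cached)
    else:
        # Call the weather-condition API
        condition_data = fetch_weather_condition(city)
        cache.setex(condition_cache_key, 3600, json.dumps(condition_data))  # 1 hour

    if city_cached:
        city_data = json.loads(city_cached)
    else:
        # Call the city-info API
        city_data = fetch_city_info(city)
        cache.setex(city_cache_key, 86400, json.dumps(city_data))  # 24 hours

    # 4. Merge the data (later dicts win on duplicate keys)
    result = {
        **city_data,
        **condition_data,
        **current_data
    }

    # 5. Cache the combined result for a shorter time.
    #    This stacks on the layer TTLs: temperature served from here can be
    #    up to 5 + 10 = 15 minutes old.
    cache.setex(full_cache_key, 600, json.dumps(result))  # 10 minutes

    return jsonify(result)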
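The per-layer lookup above can also be sketched without Flask or Redis, which makes the expiry behavior easy to test. This is a minimal sketch under stated assumptions: `TTLCache` is a hypothetical in-memory stand-in that mimics only Redis `get`/`setex`, the fetchers are fake stubs, and a `humidity` layer with the 30-minute TTL from this section's TTL table is added to show that a new layer becomes one more dictionary entry instead of another `if` block.

```python
import json
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for Redis get/setex, for illustration only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def setex(self, key: str, ttl: int, value: str) -> None:
        self._store[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries behave like a cache miss
            return None
        return value


# Per-layer TTLs in seconds. Order matters: later (faster-changing) layers
# override earlier ones on duplicate keys, as in the Flask version.
LAYER_TTLS = {'info': 86400, 'condition': 3600, 'humidity': 1800, 'current': 300}


def get_layered(city: str, cache: TTLCache,
                fetchers: Dict[str, Callable[[str], dict]]) -> dict:
    """Merge all layers, calling a fetcher only for layers missing from the cache."""
    result: dict = {}
    for layer, ttl in LAYER_TTLS.items():
        key = f'weather:{city}:{layer}'
        cached = cache.get(key)
        if cached is not None:
            data = json.loads(cached)  # cache hits are JSON strings
        else:
            data = fetchers[layer](city)
            cache.setex(key, ttl, json.dumps(data))
        result.update(data)
    return result


# Demo with a fake clock, so expiry can be simulated without waiting
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
counts = {layer: 0 for layer in LAYER_TTLS}


def fake_fetcher(layer: str) -> Callable[[str], dict]:
    def fetch(city: str) -> dict:
        counts[layer] += 1
        return {layer: f'{layer} for {city}'}
    return fetch


fetchers = {layer: fake_fetcher(layer) for layer in LAYER_TTLS}
get_layered('Beijing', cache, fetchers)  # cold cache: every layer is fetched
now[0] = 600                             # 10 minutes later
get_layered('Beijing', cache, fetchers)  # only 'current' (5-minute TTL) expired
print(counts)  # {'info': 1, 'condition': 1, 'humidity': 1, 'current': 2}
```

A Redis-backed version would keep the same loop and swap `TTLCache` for the real client; with `decode_responses=True`, redis-py returns `str` from `get` and `None` on a miss, matching this stand-in.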

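Cache TTL settings

Data type	TTL	Reason
Current temperature	5 minutes	Temperature changes quickly and users expect near-real-time data
Weather conditions	1 hour	Conditions change slowly
Humidity	30 minutes	Humidity changes at a moderate pace
City information	24 hours	City information almost never changes
Full result	10 minutes	Balances performance and freshness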
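The full-result row interacts with the others: because that cache is filled from the layer caches, the two TTLs add up for anything served from it. The sketch below only does that arithmetic with the TTLs from the table; it assumes staleness is measured from the moment a value was fetched from the external API.

```python
# Per-layer TTLs in seconds, taken from the table above
LAYER_TTLS = {'current': 300, 'humidity': 1800, 'condition': 3600, 'info': 86400}
FULL_TTL = 600  # TTL of the combined full-result cache


def worst_case_staleness(layer_ttl: int, full_ttl: int) -> int:
    """Upper bound (seconds since fetch) on the age of a value served from the full cache.

    A layer entry can be almost layer_ttl seconds old when the full result is
    assembled, and the full result can then be served for almost full_ttl more.
    """
    return layer_ttl + full_ttl


for layer, ttl in LAYER_TTLS.items():
    worst = worst_case_staleness(ttl, FULL_TTL)
    print(f'{layer}: layer TTL {ttl // 60} min, worst case {worst // 60} min')
```

For temperature this gives 15 minutes rather than 5. If the 5-minute bound matters, one option is to cap the full-result TTL at the remaining lifetime of the temperature entry (the Redis `TTL` command returns a key's remaining time in seconds).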

Strategy 2: Proactive refresh

For some critical data, I can't wait for the cache to expire before refreshing it.

Scheduled refresh task

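import json
import logging
import threading
import time

# `cache` (the Redis client) and `fetch_weather_data` are assumed to be
# defined elsewhere in the application.

def update_hot_cities():
    """Periodically refresh the cached data of popular cities."""
    hot_cities = ['北京', '上海', '深圳', '广州', '杭州']  # Beijing, Shanghai, Shenzhen, Guangzhou, Hangzhou

    while True:
        for city in hot_cities:
            # Call the API proactively to refresh the cache
            try:
                data = fetch_weather_data(city)
                # The 1-hour TTL outlives the 10-minute refresh interval,
                # so the entry does not expire between refresh rounds
                cache.setex(f'weather:{city}:full', 3600, json.dumps(data))
                logging.info(f'Updated {city}')
            except Exception as e:
                logging.error(f'Failed to update {city}: {e}')

        # Refresh every 10 minutes
        time.sleep(600)

# Start the background thread. Each process that runs this module starts
# its own thread, so a multi-process server multiplies the refresh calls.
update_thread = threading.Thread(target=update_hot_cities, daemon=True)
update_thread.start()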
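The `while True` loop above cannot be stopped and is hard to test. A possible restructuring, sketched below, separates one refresh round from the scheduling loop and uses `threading.Event.wait` as an interruptible sleep. `fetch` and `setex` are passed in as parameters (in the application they would presumably be `fetch_weather_data` and `cache.setex`); the 600-second interval and 3600-second TTL mirror the code above.

```python
import json
import logging
import threading
from typing import Callable, Iterable, List


def refresh_once(cities: Iterable[str], fetch: Callable[[str], dict],
                 setex: Callable[[str, int, str], object], ttl: int = 3600) -> List[str]:
    """Refresh the full-result cache entry of every city once; return the cities that failed."""
    failed = []
    for city in cities:
        try:
            data = fetch(city)
            setex(f'weather:{city}:full', ttl, json.dumps(data))
        except Exception:
            # One failing city must not stop the others from being refreshed
            logging.exception('Failed to update %s', city)
            failed.append(city)
    return failed


def start_refresher(cities: Iterable[str], fetch: Callable[[str], dict],
                    setex: Callable[[str, int, str], object],
                    interval: float = 600, ttl: int = 3600) -> threading.Event:
    """Refresh every `interval` seconds in a daemon thread until the returned event is set."""
    stop = threading.Event()
    cities = list(cities)  # allow any iterable to be reused on every round

    def loop() -> None:
        refresh_once(cities, fetch, setex, ttl)
        # Event.wait returns True as soon as stop.set() is called
        while not stop.wait(interval):
            refresh_once(cities, fetch, setex, ttl)

    threading.Thread(target=loop, daemon=True, name='hot-city-refresher').start()
    return stop
```

Calling `stop.set()` ends the loop at the next wait instead of relying on the daemon thread being killed at interpreter exit, and `refresh_once` can be tested directly with fake `fetch`/`setex` functions.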

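Advantages:

When a user makes a request, the cache already holds fresh data
Data for popular cities always stays fresh

Disadvantages:

More external API calls
The list of popular cities has to be maintained

This approach reminded me of the idea of "preloading": getting the data ready before users need it.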
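The extra API cost of the refresher is easy to estimate, since it runs on a fixed schedule whether or not anyone requests those cities. A small calculation, assuming each refresh of a city is one external call (if `fetch_weather_data` makes several calls, for example one per layer, the hypothetical `calls_per_city` parameter scales the result):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_refresh_calls(num_cities: int, interval_seconds: int,
                        calls_per_city: int = 1) -> int:
    """External API calls per day made by the background refresher alone."""
    rounds_per_day = SECONDS_PER_DAY // interval_seconds
    return rounds_per_day * num_cities * calls_per_city


print(daily_refresh_calls(5, 600))   # 144 rounds x 5 cities = 720
print(daily_refresh_calls(10, 600))  # 144 rounds x 10 cities = 1440
```

Under that assumption, the five cities in the code above cost 720 calls a day, and a top-10 list costs 1,440.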

Strategy 3: User-triggered refresh

Let users refresh the data on demand:

@app.route('/api/weather')
def get_weather():
    city = request.args.get('city')
    if not city:
        return jsonify({'error': 'missing city parameter'}), 400
    force_refresh = request.args.get('refresh', 'false').lower() == 'true'

    cache_key = f'weather:{city}:full'

    # Serve from the cache unless the user asked for a forced refresh
    if not force_refresh:
        cached_data = cache.get(cache_key)
        if cached_data:
            return jsonify(json.loads(cached_data))

    # Fetch fresh data from the API; setex overwrites any existing entry,
    # so the old value stays available if the fetch raises an error
    data = fetch_weather_data(city)
    cache.setex(cache_key, 3600, json.dumps(data))

    return jsonify(data)

Usage:

# Normal request (served from the cache)
GET /api/weather?city=北京

# Forced refresh (bypasses the cache)
GET /api/weather?city=北京&refresh=true

This approach hands the choice to users: when real-time data matters, they can refresh on demand.
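One risk with `refresh=true` is that any client can bypass the cache on every request, which would bring back the API load that caching removed. A possible mitigation, which is an addition and not part of the design described here, is a cooldown: honor a forced refresh only when the cached entry is older than a minimum age. The sketch is framework-free; `store` is a plain dict standing in for Redis, each entry keeps its fetch time next to the JSON payload, and `MIN_REFRESH_AGE = 60` and `get_weather_with_cooldown` are hypothetical names.

```python
import json
import time
from typing import Callable, Dict, Optional, Tuple

MIN_REFRESH_AGE = 60  # hypothetical: forced refreshes of younger data are ignored


def get_weather_with_cooldown(city: str, force_refresh: bool,
                              store: Dict[str, Tuple[float, str]],
                              fetch: Callable[[str], dict], ttl: int = 3600,
                              now: Optional[float] = None) -> dict:
    """Serve cached data; honor force_refresh only if the entry is old enough."""
    now = time.time() if now is None else now
    key = f'weather:{city}:full'
    entry = store.get(key)  # (fetched_at, json_payload) or None
    if entry is not None:
        fetched_at, payload = entry
        age = now - fetched_at
        too_recent_to_refresh = age < MIN_REFRESH_AGE
        if age < ttl and (not force_refresh or too_recent_to_refresh):
            return json.loads(payload)
    # Cache miss, expired entry, or an allowed forced refresh
    data = fetch(city)
    store[key] = (now, json.dumps(data))
    return data
```

With Redis, the fetch time could be stored inside the cached JSON while expiry is still handled by `setex`.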

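Final Design

Combining the strategies above, I settled on this design:

1. Layered caching

Set a TTL for each kind of data based on how often it changes
Avoid unnecessary external API calls

2. Scheduled refresh

Proactively refresh the top 10 popular cities every 10 minutes
Keep data fresh for the most frequently requested cities

3. User choice

Provide a refresh parameter so users can force a refresh
Serve scenarios with strict real-time requirements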

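Results

Before optimization

Metric	Value
Average response time	2000 ms
External API calls	100,000/day
Data freshness	Up to 1 hour stale

After optimization (simple cache)

Metric	Value
Average response time	35 ms
External API calls	5,000/day
Data freshness	Up to 1 hour stale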

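After optimization (layered cache)

Metric	Value
Average response time	40 ms
External API calls	8,000/day
Max staleness: current temperature	5 minutes
Max staleness: weather conditions	1 hour
Max staleness: popular cities	10 minutes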

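Trade-offs

Layered cache vs. simple cache:

Dimension	Simple cache	Layered cache
Response time	35 ms	40 ms
API calls	5,000/day	8,000/day
Data freshness	1 hour	5-60 minutes
Code complexity	Low	Medium

My conclusion: for weather data, users care more about freshness than about response time, so layered caching is the better choice.

The code is somewhat more complex, but the improvement in user experience is worth it.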
