Caching Strategy

Last verified: 2026-05-09.

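1. Definition and Scope

A caching strategy is the part of an Agent system's design that reuses results that can safely be reused, in order to reduce latency, cost, and load on external dependencies. Cacheable objects include retrieval results, tool query results, compiled prompts, model responses, embedding vectors, and parsed files.

A cache is neither memory nor a source of truth. A cache hit only shows that some input once produced some result; it does not show that the result is still correct.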

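2. Why It Matters

Cost and latency in an Agent system are often inflated by repeated context, repeated retrieval, and repeated tool queries. Sensible caching reduces token usage, lowers API call volume, and protects downstream systems. Incorrect caching, however, introduces stale facts, permission leaks, and security bypasses.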

3. Core Mechanisms

The cache key must include every factor that affects the output:

cache_key = hash(
  tenant_id,
  user_permission_scope,
  operation,
  normalized_input,
  prompt_version,
  model_id,
  tool_schema_version,
  data_version
)
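
The key composition above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: `CacheKeyParts` and `build_cache_key` are hypothetical names, SHA-256 over canonical JSON is one possible choice of hash, and `normalized_input` is assumed to be JSON-serializable.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class CacheKeyParts:
    """Every factor that can change the output (fields follow the list above)."""
    tenant_id: str
    user_permission_scope: str
    operation: str
    normalized_input: Any  # assumed to be JSON-serializable
    prompt_version: str
    model_id: str
    tool_schema_version: str
    data_version: str


def build_cache_key(parts: CacheKeyParts) -> str:
    # Serialize with sorted keys and fixed separators so that logically equal
    # inputs always produce the same byte string, then hash that string.
    payload = json.dumps(asdict(parts), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the tenant and permission scope are part of the hashed payload, two users with different scopes cannot share an entry even when the rest of the request is identical.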

Cache tiers: see the cache layer table in section 11.

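4. Architecture Patterns

Pattern	Suitable for	Not suitable for
Cache-aside	Tool queries, retrieval results	Write-heavy data with strict consistency requirements
Write-through	Configuration; data that is rarely written and frequently read	Latency-sensitive write paths
TTL cache	Data that changes over time, such as prices, inventory, and policies	Transactional actions that cannot tolerate brief staleness
Semantic cache	Reusing answers for similar questions	High-risk, fact-critical, or permission-sensitive scenarios
Prompt/model cache	Repeated long contexts	Prompts or permissions that change frequently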

5. Engineering Implementation

A wrapper for cached reads:

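def cached_tool_call(ctx, tool_name, payload):
    # The key covers tenant, permission scope, tool, arguments, and versions.
    key = build_cache_key(ctx.tenant_id, ctx.scope, tool_name, payload, ctx.versions)
    cached = cache.get(key)
    if cached is not None and not cached.is_expired():
        trace.add_event("cache.hit", key=key, age_ms=cached.age_ms)
        return cached.value

    # Miss or expired entry: record it, then fall through to the real tool.
    trace.add_event("cache.miss", key=key)
    result = tool_client.call(tool_name, payload)
    # Store only results the policy allows, and only after sanitizing them.
    if is_cacheable(result, ctx):
        cache.set(key, sanitize(result), ttl=policy.ttl(tool_name))
    return result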

Cache policy should be expressed as configuration:

tools:
  crm.lookup_customer:
    cache: false
  catalog.search:
    cache: true
    ttl_seconds: 300
    key_fields: [tenant_id, query, locale, permission_scope]
  policy.lookup:
    cache: true
    ttl_seconds: 3600
    version_field: policy_version

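6. Production Practices

By default, do not cache results that contain sensitive personal information, vary by permission, or reflect transaction state.
Write both cache hits and cache misses to the trace so that the origin of an answer can be explained.
Set each TTL according to the freshness requirements of the data rather than using a single global TTL.
Use request coalescing for hot keys to prevent cache breakdown (a stampede on the source when a hot key expires).
Give batch evaluation and replay their own cache namespace so they cannot pollute production.
Bind model response caches to the hashes of the prompt, model, tools, and context.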
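
Request coalescing for hot keys can be sketched with the standard `threading` module. This is a minimal in-process illustration under stated assumptions: `SingleFlight` and `_Call` are hypothetical names, and a multi-process deployment would need a distributed lock or a similar mechanism instead.

```python
import threading
from typing import Any, Callable, Dict


class _Call:
    """State shared between the leader and the callers waiting on it."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Any = None


class SingleFlight:
    """Coalesces concurrent calls for the same key into one execution."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            # Only the first caller for this key runs fn.
            try:
                call.result = fn()
            except BaseException as exc:
                call.error = exc
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
                call.done.set()
        else:
            # Later callers wait for the leader and share its outcome.
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

A caller would wrap the cache-miss path, for example `flights.do(key, lambda: tool_client.call(tool_name, payload))`, so that a burst of misses on one key results in a single call to the source.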

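7. Common Anti-patterns

Omitting tenant and permission scope from the cache key, which leaks data across users.
Caching the model's final answer without caching the sources and versions used to construct it.
Using long TTLs for time-sensitive data.
Caching content contaminated by prompt injection and reusing it in later sessions.
Tracking only hit rate while ignoring incorrect hits and stale hits.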

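8. Evaluation Methods

Hit rate: break it down by operation, tenant, and scenario rather than looking only at the overall rate.
Correctness: sample cache hits and compare them against results from the source system.
Cost: tokens, model calls, and tool calls saved by caching.
Security: test cross-tenant access, cross-permission access, and cache cleanup after data deletion.
Degradation: when Redis or the KV store is unavailable, verify that the system degrades rather than failing outright.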
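
The degradation check assumes the read path treats a cache outage as a miss. A minimal sketch of such a path follows; `get_or_compute` is a hypothetical helper, and the `cache` object is assumed to expose `get(key)` and `set(key, value, ttl)`, which is not the API of any particular client library. A stored value of `None` is indistinguishable from a miss in this sketch.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("agent.cache")


def get_or_compute(cache: Any, key: str, compute: Callable[[], Any], ttl: int) -> Any:
    """Treat cache errors as misses so the source system stays reachable."""
    try:
        cached = cache.get(key)
    except Exception:
        logger.warning("cache read failed; falling back to source", exc_info=True)
        cached = None
    if cached is not None:
        return cached

    value = compute()
    try:
        cache.set(key, value, ttl)
    except Exception:
        # A failed write must not turn a successful computation into an error.
        logger.warning("cache write failed; returning uncached result", exc_info=True)
    return value
```

A degradation test can then inject a cache double whose methods raise, and assert that the call still returns the value from the source.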

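9. Security and Governance

Redact or encrypt sensitive fields before cache values are written to storage.
Permission changes, user deletion, and policy updates must trigger invalidation or a version switch.
Keep replay caches isolated from production caches.
Set a safety threshold for the semantic cache; similar questions must not be treated as sharing the same authorization context.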
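
The semantic cache rule can be illustrated with a lookup that filters on authorization scope before it considers similarity. This is a sketch under stated assumptions: `SemanticEntry` and `semantic_lookup` are hypothetical names, embeddings are plain lists of floats, the scan is linear, and the 0.95 threshold is an illustrative value rather than a recommendation.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class SemanticEntry:
    embedding: Sequence[float]
    auth_scope: str
    answer: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_lookup(
    entries: Iterable[SemanticEntry],
    query_embedding: Sequence[float],
    auth_scope: str,
    threshold: float = 0.95,  # illustrative value; tune per use case
) -> Optional[str]:
    best: Optional[SemanticEntry] = None
    best_score = threshold
    for entry in entries:
        # Similar wording never overrides authorization: scopes must match exactly.
        if entry.auth_scope != auth_scope:
            continue
        score = cosine(entry.embedding, query_embedding)
        if score >= best_score:
            best, best_score = entry, score
    return best.answer if best is not None else None
```

Checking the scope first means an entry written under one authorization context is invisible to every other context, however close the question is.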

10. Authoritative References

RFC 9111 HTTP Caching: https://datatracker.ietf.org/doc/html/rfc9111
Redis caching use cases: https://redis.io/solutions/use-cases/caching/
AWS caching guidance: https://aws.amazon.com/caching/
OpenAI Prompt caching: https://developers.openai.com/api/docs/guides/prompt-caching
OpenAI Production best practices: https://developers.openai.com/api/docs/guides/production-best-practices

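11. Second Refinement: Agent Cache Layers

Agent caching is not just "storing the answer". A production system needs to distinguish deterministic caches, approximate caches, context caches, and external tool caches.

Cache layer	Key	Value	TTL	Invalidated when
Prompt prefix cache	Defined by the model provider	Shared prompt tokens	Controlled by the provider	The prompt prefix changes
Retrieval cache	query hash + corpus version	Document chunks	Minutes to hours	The knowledge base version changes
Tool result cache	tool + normalized args + auth scope	Tool result	Set by business semantics	The data source is updated or permissions change
Planning cache	task intent + constraints	Initial plan	Short	The prompt, model, or tools change
Final answer cache	normalized request + context version	Answer	Very short, or not cached	The user context changes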

12. Cache Decision Flow

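import hashlib
import json

def cache_key(tool_name, args, auth_scope, corpus_version):
    # Sorted keys and fixed separators give one canonical form per args value.
    normalized = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{tool_name}:{normalized}:{auth_scope}:{corpus_version}".encode()).hexdigest()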
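
The key function covers only how an entry is addressed; the decision of whether to consult the cache at all can be sketched separately. The flow below is an assumption assembled from the rules stated elsewhere in this document (no caching of executed side effects, default bypass for personal data and transaction state, per-tool TTLs). `Decision`, `CallTraits`, and `decide` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    BYPASS = "bypass"
    USE_CACHE = "use_cache"


@dataclass(frozen=True)
class CallTraits:
    has_side_effects: bool
    contains_personal_data: bool
    depends_on_transaction_state: bool
    ttl_seconds: int  # 0 means no cache policy is configured for the tool


def decide(traits: CallTraits) -> Decision:
    # Executed side effects are never replayed from a cache.
    if traits.has_side_effects:
        return Decision.BYPASS
    # Sensitive or transactional results are not cached by default.
    if traits.contains_personal_data or traits.depends_on_transaction_state:
        return Decision.BYPASS
    # Without an explicit positive TTL there is no policy to cache under.
    if traits.ttl_seconds <= 0:
        return Decision.BYPASS
    return Decision.USE_CACHE
```

Counting how often `decide` returns `Decision.BYPASS` gives the numerator of a bypass-rate metric.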

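13. Cache Metrics and Incident Boundaries

Metric	Description
Hit Rate	A high hit rate does not imply correctness; read it together with the stale rate
Stale Response Rate	Share of responses that returned outdated information
Cost Saved	Token and tool-call spend avoided through caching
Latency Saved P95	The actual benefit to user experience
Permission Leak Count	Number of incidents in which users with different permissions hit the same cache entry
Cache Bypass Rate	Share of requests that bypassed the cache because they were sensitive, high-risk, or low-confidence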
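
Three of these metrics can be derived from per-request events, as sketched below. `CacheEvent` and `summarize` are hypothetical names, and the denominators are assumptions: hit rate is computed over requests that actually consulted the cache, while the stale and bypass rates are computed over all requests.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class CacheEvent:
    hit: bool
    stale: bool = False     # a hit whose value no longer matched the source
    bypassed: bool = False  # the cache was deliberately not consulted


def summarize(events: Iterable[CacheEvent]) -> Dict[str, float]:
    all_events = list(events)
    lookups = [e for e in all_events if not e.bypassed]
    hits = [e for e in lookups if e.hit]

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "hit_rate": ratio(len(hits), len(lookups)),
        "stale_response_rate": ratio(sum(e.stale for e in hits), len(all_events)),
        "cache_bypass_rate": ratio(len(all_events) - len(lookups), len(all_events)),
    }
```

Reporting the stale rate next to the hit rate makes it visible when a rising hit rate is being bought with outdated answers.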

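Security requirements: the cache key must include the permission scope and the data version; values must be redacted or encrypted; for tools that perform high-risk actions, only query results may be cached, never the outcome of an executed side effect; and user deletion requests must be able to purge the related cache entries.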

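14. Disaster Recovery and Replay

A cache is not a source of truth; during disaster recovery it must be possible to clear it completely and rebuild from the source of truth.
For replay, either pin a cache snapshot or record the cache hits observed in production; otherwise differences in cache state will cause replay drift.
Run evaluations with both a cold cache and a warm cache so that cost and latency are not measured only on the ideal path.
Protect against cache penetration with request coalescing, singleflight, rate limiting, and negative caching.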
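
Negative caching, the last protection listed, can be sketched as follows. `NegativeCache` is a hypothetical in-memory class, the TTL values are illustrative, and the loader is assumed to return `None` when the source has no record for the key.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

_MISSING = object()  # sentinel stored for keys the source reported as absent


class NegativeCache:
    """In-memory cache that also remembers 'not found' for a short time."""

    def __init__(
        self,
        ttl: float = 300.0,
        negative_ttl: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, load: Callable[[str], Optional[Any]]) -> Optional[Any]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: either a real value or a remembered miss.
            return None if entry[1] is _MISSING else entry[1]
        value = load(key)
        if value is None:
            # Remember the miss briefly so repeated lookups skip the source.
            self._entries[key] = (now + self._negative_ttl, _MISSING)
        else:
            self._entries[key] = (now + self._ttl, value)
        return value
```

The negative TTL is kept much shorter than the positive one so that a record created shortly after a miss becomes visible quickly.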

15. Supplementary Authoritative References

OpenAI prompt caching: https://platform.openai.com/docs/guides/prompt-caching (verified 2026-05-09)
Redis caching use cases: https://redis.io/solutions/use-cases/caching/ (verified 2026-05-09)
