How to cache chat model responses

Prerequisites

This guide assumes familiarity with the following concepts:

LangChain provides an optional caching layer for chat models. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, if you often request the same completion multiple times.
It can speed up your application, because a cached response is returned without another round trip to the LLM provider.

import { ChatOpenAI } from "@langchain/openai";

// To make the caching really obvious, let's use a slower model.
const model = new ChatOpenAI({
  model: "gpt-4",
  cache: true,
});

In Memory Cache

The default cache is stored in memory. This means that if you restart your application, the cache will be cleared.

console.time();

// The first time, it is not yet in cache, so it should take longer
const res = await model.invoke("Tell me a joke!");
console.log(res);

console.timeEnd();

/*
  AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
      additional_kwargs: { function_call: undefined, tool_calls: undefined }
    },
    lc_namespace: [ 'langchain_core', 'messages' ],
    content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
    name: undefined,
    additional_kwargs: { function_call: undefined, tool_calls: undefined }
  }
  default: 2.224s
*/

console.time();

// The second time it is, so it goes faster
const res2 = await model.invoke("Tell me a joke!");
console.log(res2);

console.timeEnd();
/*
  AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
      additional_kwargs: { function_call: undefined, tool_calls: undefined }
    },
    lc_namespace: [ 'langchain_core', 'messages' ],
    content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
    name: undefined,
    additional_kwargs: { function_call: undefined, tool_calls: undefined }
  }
  default: 181.98ms
*/
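
To make the timing difference above concrete, here is a minimal sketch of what an in-memory cache layer does conceptually. It is not LangChain's implementation: `SketchMemoryCache`, `cachedInvoke`, `slowModel`, and `stats` are hypothetical names, and keying entries by the prompt plus a model-configuration string is an assumption made for illustration.

```typescript
// Hypothetical sketch: a Map-backed cache keyed by prompt + model configuration.
class SketchMemoryCache {
  private store = new Map<string, string>();

  private key(prompt: string, llmKey: string): string {
    // Serializing the pair avoids collisions such as ("ab", "c") vs ("a", "bc").
    return JSON.stringify([prompt, llmKey]);
  }

  async lookup(prompt: string, llmKey: string): Promise<string | null> {
    return this.store.get(this.key(prompt, llmKey)) ?? null;
  }

  async update(prompt: string, llmKey: string, value: string): Promise<void> {
    this.store.set(this.key(prompt, llmKey), value);
  }
}

// Counts how often the (fake) provider is actually called.
const stats = { providerCalls: 0 };

// Stand-in for a real model call; no network access involved.
async function slowModel(prompt: string): Promise<string> {
  stats.providerCalls += 1;
  return `response to: ${prompt}`;
}

async function cachedInvoke(
  cache: SketchMemoryCache,
  llmKey: string,
  prompt: string
): Promise<string> {
  const hit = await cache.lookup(prompt, llmKey);
  if (hit !== null) return hit; // Cache hit: the provider is not called.
  const fresh = await slowModel(prompt);
  await cache.update(prompt, llmKey, fresh);
  return fresh;
}
```

Calling `cachedInvoke` twice with the same prompt and key reaches the fake provider only once. Because the store is a plain `Map` held by the process, its contents disappear on restart, which matches the behavior described for the default cache.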

Caching with Redis

LangChain also provides a Redis-based cache. This is useful if you want to share the cache across multiple processes or servers. To use it, you'll need to install the ioredis package, along with @langchain/community and @langchain/core:


npm install ioredis @langchain/community @langchain/core

yarn add ioredis @langchain/community @langchain/core

pnpm add ioredis @langchain/community @langchain/core

Then, you can pass a cache option when you instantiate the LLM. For example:

import { ChatOpenAI } from "@langchain/openai";
import { Redis } from "ioredis";
import { RedisCache } from "@langchain/community/caches/ioredis";

const client = new Redis("redis://localhost:6379");

const cache = new RedisCache(client, {
  ttl: 60, // Optional key expiration value
});

const model = new ChatOpenAI({ cache });

const response1 = await model.invoke("Do something random!");
console.log(response1);
/*
  AIMessage {
    content: "Sure! I'll generate a random number for you: 37",
    additional_kwargs: {}
  }
*/

const response2 = await model.invoke("Do something random!");
console.log(response2);
/*
  AIMessage {
    content: "Sure! I'll generate a random number for you: 37",
    additional_kwargs: {}
  }
*/

await client.disconnect();

API Reference:

ChatOpenAI from @langchain/openai
RedisCache from @langchain/community/caches/ioredis
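
The `ttl` option in the example above sets a key expiration. As a rough illustration of what expiration means for a cache, the sketch below implements a tiny in-process cache with a time-to-live. It does not talk to Redis; `SketchTtlCache` and its injectable clock are hypothetical, and it assumes the TTL is expressed in seconds.

```typescript
interface Entry {
  value: string;
  expiresAt: number; // Milliseconds on the injected clock.
}

// Hypothetical sketch of TTL semantics: entries behave as misses once expired.
class SketchTtlCache {
  private store = new Map<string, Entry>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be exercised without waiting.
  constructor(ttlSeconds: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  set(key: string, value: string): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): string | null {
    const entry = this.store.get(key);
    if (entry === undefined) return null;
    if (this.now() >= entry.expiresAt) {
      this.store.delete(key); // Expired: drop the entry and report a miss.
      return null;
    }
    return entry.value;
  }
}
```

With a TTL of 60, a response cached at time zero is served until just before the 60-second mark and treated as a miss afterwards, so the next identical request would go back to the provider.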

Caching with Upstash Redis

LangChain provides an Upstash Redis-based cache. Like the Redis-based cache, this cache is useful if you want to share the cache across multiple processes or servers. The Upstash Redis client uses HTTP and supports edge environments. To use it, you'll need to install the @upstash/redis package:


npm install @upstash/redis

yarn add @upstash/redis

pnpm add @upstash/redis

You'll also need an Upstash account and a Redis database to connect to. Once you've set those up, retrieve your REST URL and REST token.

Then, you can pass a cache option when you instantiate the LLM. For example:

import { ChatOpenAI } from "@langchain/openai";
import { UpstashRedisCache } from "@langchain/community/caches/upstash_redis";

// See https://docs.upstash.com/redis/howto/connectwithupstashredis#quick-start for connection options
const cache = new UpstashRedisCache({
  config: {
    url: "UPSTASH_REDIS_REST_URL",
    token: "UPSTASH_REDIS_REST_TOKEN",
  },
  ttl: 3600,
});

const model = new ChatOpenAI({ cache });

API Reference:

ChatOpenAI from @langchain/openai
UpstashRedisCache from @langchain/community/caches/upstash_redis

You can also directly pass in a previously created @upstash/redis client instance:

import { Redis } from "@upstash/redis";
import https from "https";

import { ChatOpenAI } from "@langchain/openai";
import { UpstashRedisCache } from "@langchain/community/caches/upstash_redis";

// const client = new Redis({
//   url: process.env.UPSTASH_REDIS_REST_URL!,
//   token: process.env.UPSTASH_REDIS_REST_TOKEN!,
//   agent: new https.Agent({ keepAlive: true }),
// });

// Or simply call Redis.fromEnv() to automatically load the UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN environment variables.
const client = Redis.fromEnv({
  agent: new https.Agent({ keepAlive: true }),
});

const cache = new UpstashRedisCache({ client });
const model = new ChatOpenAI({ cache });

API Reference:

ChatOpenAI from @langchain/openai
UpstashRedisCache from @langchain/community/caches/upstash_redis

Caching with Vercel KV

LangChain provides a Vercel KV-based cache. Like the Redis-based cache, this cache is useful if you want to share the cache across multiple processes or servers. The Vercel KV client uses HTTP and supports edge environments. To use it, you'll need to install the @vercel/kv package:


npm install @vercel/kv

yarn add @vercel/kv

pnpm add @vercel/kv

You'll also need a Vercel account and a KV database to connect to. Once you've set those up, retrieve your REST URL and REST token.

Then, you can pass a cache option when you instantiate the LLM. For example:

import { ChatOpenAI } from "@langchain/openai";
import { VercelKVCache } from "@langchain/community/caches/vercel_kv";
import { createClient } from "@vercel/kv";

// See https://vercel.com/docs/storage/vercel-kv/kv-reference#createclient-example for connection options
const cache = new VercelKVCache({
  client: createClient({
    url: "VERCEL_KV_API_URL",
    token: "VERCEL_KV_API_TOKEN",
  }),
  ttl: 3600,
});

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  cache,
});

API Reference:

ChatOpenAI from @langchain/openai
VercelKVCache from @langchain/community/caches/vercel_kv
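
The example above uses placeholder strings for the REST URL and token. One way to supply real values is to read them from the environment and fail fast when they are missing. The sketch below is only an illustration: `requireEnv` and `readKvConfig` are hypothetical helpers, and treating `VERCEL_KV_API_URL` and `VERCEL_KV_API_TOKEN` as environment variable names is an assumption carried over from the placeholders.

```typescript
// An environment-like lookup table; in Node.js, process.env has this shape.
type EnvLike = Record<string, string | undefined>;

// Hypothetical helper: returns the variable or throws a descriptive error.
function requireEnv(env: EnvLike, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Hypothetical helper: collects the two connection settings in one place.
// The variable names are assumptions, not names required by any library.
function readKvConfig(env: EnvLike): { url: string; token: string } {
  return {
    url: requireEnv(env, "VERCEL_KV_API_URL"),
    token: requireEnv(env, "VERCEL_KV_API_TOKEN"),
  };
}
```

In a Node.js process you could call `readKvConfig(process.env)` and pass the resulting `url` and `token` to `createClient`, so that a missing credential surfaces at startup rather than on the first cache lookup.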

Caching with Cloudflare KV

info

This integration is only supported in Cloudflare Workers.

If you're deploying your project as a Cloudflare Worker, you can use LangChain's Cloudflare KV-powered LLM cache.

For information on how to set up KV in Cloudflare, see the official documentation.

Note: If you are using TypeScript, you may need to install types if they aren't already present:


npm install -S @cloudflare/workers-types

yarn add @cloudflare/workers-types

pnpm add @cloudflare/workers-types

import type { KVNamespace } from "@cloudflare/workers-types";

import { ChatOpenAI } from "@langchain/openai";
import { CloudflareKVCache } from "@langchain/cloudflare";

export interface Env {
  KV_NAMESPACE: KVNamespace;
  OPENAI_API_KEY: string;
}

export default {
  async fetch(_request: Request, env: Env) {
    try {
      const cache = new CloudflareKVCache(env.KV_NAMESPACE);
      const model = new ChatOpenAI({
        cache,
        model: "gpt-3.5-turbo",
        apiKey: env.OPENAI_API_KEY,
      });
      const response = await model.invoke("How are you today?");
      return new Response(JSON.stringify(response), {
        headers: { "content-type": "application/json" },
      });
    } catch (err: any) {
      console.log(err.message);
      return new Response(err.message, { status: 500 });
    }
  },
};

API Reference:

ChatOpenAI from @langchain/openai
CloudflareKVCache from @langchain/cloudflare
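
A key-value store like the one above holds strings, so a KV-backed cache has to serialize responses before writing them. The sketch below illustrates that idea against an in-memory stand-in, which can be handy for local tests outside a Worker. It is not the `CloudflareKVCache` implementation: `KvLike`, `FakeKv`, and `SketchKvCache` are hypothetical, and `KvLike` assumes only a string `get` and `put`.

```typescript
// Minimal subset of a KV binding assumed by this sketch: string get/put only.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// In-memory stand-in for a KV namespace, for use outside a Worker.
class FakeKv implements KvLike {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async put(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Hypothetical cache that stores JSON-serialized responses in a KV-like store.
class SketchKvCache {
  private kv: KvLike;

  constructor(kv: KvLike) {
    this.kv = kv;
  }

  async lookup(prompt: string, llmKey: string): Promise<string[] | null> {
    const raw = await this.kv.get(JSON.stringify([prompt, llmKey]));
    return raw === null ? null : (JSON.parse(raw) as string[]);
  }

  async update(prompt: string, llmKey: string, value: string[]): Promise<void> {
    await this.kv.put(JSON.stringify([prompt, llmKey]), JSON.stringify(value));
  }
}
```

Because the cache depends only on the small `KvLike` interface, the same logic can be exercised with `FakeKv` in a unit test and with a real binding inside a Worker, assuming the binding offers compatible `get` and `put` methods.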

Caching on the File System

danger

This cache is not recommended for production use. It is only intended for local development.

LangChain provides a simple file system cache. By default the cache is stored in a temporary directory, but you can specify a custom directory if you want.

import { LocalFileCache } from "langchain/cache/file_system";
const cache = await LocalFileCache.create();

Next steps

You've now learned how to cache model responses to save time and money.

Next, check out the other how-to guides on chat models, like how to get a model to return structured output or how to create your own custom chat model.
