Upstash Ratelimit Callback

In this guide, we will go over how to add rate limiting based on the number of requests or the number of tokens using UpstashRatelimitHandler. This handler uses Upstash's ratelimit library, which in turn utilizes Upstash Redis.

Upstash Ratelimit works by sending an HTTP request to Upstash Redis every time the limit method is called. The user's remaining tokens/requests are checked and updated. Based on the remaining tokens, we can stop the execution of costly operations, like invoking an LLM or querying a vector store:

const response = await ratelimit.limit();
if (response.success) {
  executeCostlyOperation();
}

UpstashRatelimitHandler allows you to incorporate this rate-limiting logic into your chain in a few minutes.

Setup

First, you will need to go to the Upstash Console and create a Redis database (see our docs). After creating a database, you will need to set the environment variables:

UPSTASH_REDIS_REST_URL="****"
UPSTASH_REDIS_REST_TOKEN="****"

Next, you will need to install Upstash Ratelimit and @langchain/community:

npm install @upstash/ratelimit @langchain/community @langchain/core

You are now ready to add rate limiting to your chain!

Ratelimiting Per Request

Let's imagine that we want to allow our users to invoke our chain 10 times per minute. Achieving this is as simple as:

const UPSTASH_REDIS_REST_URL = "****";
const UPSTASH_REDIS_REST_TOKEN = "****";

import {
  UpstashRatelimitHandler,
  UpstashRatelimitError,
} from "@langchain/community/callbacks/handlers/upstash_ratelimit";
import { RunnableLambda } from "@langchain/core/runnables";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// create ratelimit
const ratelimit = new Ratelimit({
  redis: new Redis({
    url: UPSTASH_REDIS_REST_URL,
    token: UPSTASH_REDIS_REST_TOKEN,
  }),
  // 10 requests per window, where window size is 60 seconds:
  limiter: Ratelimit.fixedWindow(10, "60 s"),
});

// create handler
const user_id = "user_id"; // should be a method which gets the user id
const handler = new UpstashRatelimitHandler(user_id, {
  requestRatelimit: ratelimit,
});

// create mock chain
const chain = new RunnableLambda({ func: (str: string): string => str });

try {
  const response = await chain.invoke("hello world", {
    callbacks: [handler],
  });
  console.log(response);
} catch (err) {
  if (err instanceof UpstashRatelimitError) {
    console.log("Handling ratelimit.");
  }
}

Note that we pass the handler to the invoke method instead of passing the handler when defining the chain.

For rate limiting algorithms other than FixedWindow, see the upstash-ratelimit docs.

Before executing any steps in our pipeline, ratelimit will check whether the user has passed the request limit. If so, an UpstashRatelimitError is raised.

Ratelimiting Per Token

Another option is to rate limit chain invocations based on:

  1. the number of tokens in the prompt

  2. the number of tokens in the prompt and the LLM completion

This only works if you have an LLM in your chain. Another requirement is that the LLM you are using should return the token usage in its LLMOutput. The format of the returned token usage dictionary depends on the LLM. To learn how to configure the handler for your LLM, see the end of the Configuration section below.

How it works

The handler will get the remaining tokens before calling the LLM. If the number of remaining tokens is more than 0, the LLM will be called. Otherwise, an UpstashRatelimitError is raised.

After the LLM is called, the token usage information is subtracted from the user's remaining tokens. No error is raised at this stage of the chain.
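This check-then-subtract flow can be sketched as follows. Note this is a simplified stand-in for the handler's internals, using an in-memory counter instead of Upstash Redis; the function and type names are made up for illustration:

```typescript
// Simplified sketch of the token-based flow (NOT the actual
// UpstashRatelimitHandler source): check before the LLM call,
// subtract actual usage afterwards.
type TokenStore = { remaining: number };

// Before the LLM runs: throw if the user has no tokens left.
function beforeLlmCall(store: TokenStore): void {
  if (store.remaining <= 0) {
    throw new Error("UpstashRatelimitError: token limit reached");
  }
}

// After the LLM returns: subtract the reported token usage.
// No error is raised at this stage, even if the balance goes negative.
function afterLlmCall(store: TokenStore, totalTokens: number): void {
  store.remaining -= totalTokens;
}

const store: TokenStore = { remaining: 100 };
beforeLlmCall(store); // passes: 100 > 0
afterLlmCall(store, 120); // a single completion may exceed the balance
console.log(store.remaining); // -20
```

In this sketch, a large completion can push the balance below zero; the next invocation is then blocked by the up-front check rather than mid-call.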

Configuration

For the first configuration, simply initialize the handler like this:

const user_id = "user_id"; // should be a method which gets the user id
const handler = new UpstashRatelimitHandler(user_id, {
  requestRatelimit: ratelimit,
});

For the second configuration, here is how to initialize the handler:

const user_id = "user_id"; // should be a method which gets the user id
const handler = new UpstashRatelimitHandler(user_id, {
  tokenRatelimit: ratelimit,
});

You can also employ rate limiting based on requests and tokens at the same time, simply by passing both the requestRatelimit and tokenRatelimit parameters.
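A combined setup might look like the following sketch. It assumes two separate Ratelimit instances sharing one Redis client; the window sizes here are arbitrary placeholders:

```typescript
import { UpstashRatelimitHandler } from "@langchain/community/callbacks/handlers/upstash_ratelimit";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});

// One limiter counts requests, the other counts tokens:
const requestRatelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.fixedWindow(10, "60 s"), // 10 requests per minute
});
const tokenRatelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.fixedWindow(500, "60 s"), // 500 tokens per minute
});

const user_id = "user_id"; // should be a method which gets the user id
const handler = new UpstashRatelimitHandler(user_id, {
  requestRatelimit,
  tokenRatelimit,
});
```

With both limits configured, a request is rejected as soon as either limit is exhausted.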

For token usage to work correctly, the LLM step in LangChain.js should return a token usage field in the following format:

{
  "tokenUsage": {
    "totalTokens": 123,
    "promptTokens": 456,
    "otherFields": "..."
  },
  "otherFields": "..."
}

Not all LLMs in LangChain.js comply with this format, however. If your LLM returns the same values under different keys, you can pass the llmOutputTokenUsageField, llmOutputTotalTokenField and llmOutputPromptTokenField parameters to the handler:

const handler = new UpstashRatelimitHandler(user_id, {
  requestRatelimit: ratelimit,
  llmOutputTokenUsageField: "usage",
  llmOutputTotalTokenField: "total",
  llmOutputPromptTokenField: "prompt",
});
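To make the field mapping concrete, here is a small standalone sketch of how such configurable field names could be used to pull token counts out of an LLM output object. This is an illustration only, not the handler's actual source; readTokenUsage and FieldConfig are made-up names:

```typescript
// Illustration (assumption: not the actual handler source) of reading token
// counts from an LLM output object using configurable field names.
interface FieldConfig {
  tokenUsageField: string; // e.g. "tokenUsage" or "usage"
  totalTokenField: string; // e.g. "totalTokens" or "total"
  promptTokenField: string; // e.g. "promptTokens" or "prompt"
}

function readTokenUsage(
  llmOutput: Record<string, any>,
  config: FieldConfig
): { total: number; prompt: number } {
  const usage = llmOutput[config.tokenUsageField];
  if (!usage) {
    throw new Error(`Missing field: ${config.tokenUsageField}`);
  }
  return {
    total: usage[config.totalTokenField],
    prompt: usage[config.promptTokenField],
  };
}

// With the default-style output shown above:
const output = { tokenUsage: { totalTokens: 123, promptTokens: 456 } };
const usage = readTokenUsage(output, {
  tokenUsageField: "tokenUsage",
  totalTokenField: "totalTokens",
  promptTokenField: "promptTokens",
});
console.log(usage.total, usage.prompt); // 123 456
```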

Here is an example with a chain utilizing an LLM:

const UPSTASH_REDIS_REST_URL = "****";
const UPSTASH_REDIS_REST_TOKEN = "****";
const OPENAI_API_KEY = "****";

import {
  UpstashRatelimitHandler,
  UpstashRatelimitError,
} from "@langchain/community/callbacks/handlers/upstash_ratelimit";
import { RunnableLambda, RunnableSequence } from "@langchain/core/runnables";
import { OpenAI } from "@langchain/openai";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// create ratelimit
const ratelimit = new Ratelimit({
  redis: new Redis({
    url: UPSTASH_REDIS_REST_URL,
    token: UPSTASH_REDIS_REST_TOKEN,
  }),
  // 500 tokens per window, where window size is 60 seconds:
  limiter: Ratelimit.fixedWindow(500, "60 s"),
});

// create handler
const user_id = "user_id"; // should be a method which gets the user id
const handler = new UpstashRatelimitHandler(user_id, {
  tokenRatelimit: ratelimit,
});

// create mock chain
const asStr = new RunnableLambda({ func: (str: string): string => str });
const model = new OpenAI({
  apiKey: OPENAI_API_KEY,
});
const chain = RunnableSequence.from([asStr, model]);

// invoke chain with handler:
try {
  const response = await chain.invoke("hello world", {
    callbacks: [handler],
  });
  console.log(response);
} catch (err) {
  if (err instanceof UpstashRatelimitError) {
    console.log("Handling ratelimit.");
  }
}