
How to track token usage

Prerequisites

This guide assumes familiarity with the following concepts:

This notebook goes over how to track your token usage for specific calls.

Using AIMessage.usage_metadata

A number of model providers return token usage information as part of the chat generation response. When available, this information will be included on the AIMessage objects produced by the corresponding model.

LangChain AIMessage objects include a usage_metadata attribute for supported providers. When populated, this attribute will be an object with standard keys (e.g., "input_tokens" and "output_tokens").
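For reference, the sketch below shows one way to read this attribute in TypeScript. The UsageMetadataSketch interface and logUsage helper are illustrative names only, not part of the library; recent versions of @langchain/core also export their own UsageMetadata type.

import type { AIMessage } from "@langchain/core/messages";

// Illustrative shape of AIMessage.usage_metadata for supported providers
// (providers may attach extra detail fields beyond these standard keys).
interface UsageMetadataSketch {
  input_tokens: number; // tokens in the prompt
  output_tokens: number; // tokens in the generated completion
  total_tokens: number; // input_tokens + output_tokens
}

// Hypothetical helper that logs usage when the provider populates it.
const logUsage = (message: AIMessage): void => {
  const usage: UsageMetadataSketch | undefined = message.usage_metadata;
  if (usage) {
    console.log(
      `input=${usage.input_tokens} output=${usage.output_tokens} total=${usage.total_tokens}`
    );
  }
};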

OpenAI

npm install @langchain/openai @langchain/core

import { ChatOpenAI } from "@langchain/openai";

const chatModel = new ChatOpenAI({
  model: "gpt-3.5-turbo-0125",
});

const res = await chatModel.invoke("Tell me a joke.");

console.log(res.usage_metadata);

/*
{ input_tokens: 12, output_tokens: 17, total_tokens: 29 }
*/


Anthropic

npm install @langchain/anthropic @langchain/core

import { ChatAnthropic } from "@langchain/anthropic";

const chatModel = new ChatAnthropic({
  model: "claude-3-haiku-20240307",
});

const res = await chatModel.invoke("Tell me a joke.");

console.log(res.usage_metadata);

/*
{ input_tokens: 12, output_tokens: 98, total_tokens: 110 }
*/


Using AIMessage.response_metadata

A number of model providers return token usage information as part of the chat generation response. When available, this is included in the AIMessage.response_metadata field.

OpenAI

import { ChatOpenAI } from "@langchain/openai";

const chatModel = new ChatOpenAI({
  model: "gpt-4o-mini",
});

const res = await chatModel.invoke("Tell me a joke.");

console.log(res.response_metadata);

/*
  {
    tokenUsage: { completionTokens: 15, promptTokens: 12, totalTokens: 27 },
    finish_reason: 'stop'
  }
*/


Anthropic

import { ChatAnthropic } from "@langchain/anthropic";

const chatModel = new ChatAnthropic({
  model: "claude-3-sonnet-20240229",
});

const res = await chatModel.invoke("Tell me a joke.");

console.log(res.response_metadata);

/*
  {
    id: 'msg_017Mgz6HdgNbi3cwL1LNB9Dw',
    model: 'claude-3-sonnet-20240229',
    stop_sequence: null,
    usage: { input_tokens: 12, output_tokens: 30 },
    stop_reason: 'end_turn'
  }
*/


Streaming

Some providers support token count metadata in a streaming context.

OpenAI

For example, OpenAI will return a message chunk at the end of a stream with token usage information. This behavior is supported by @langchain/openai >= 0.1.0 and can be enabled by passing a stream_options parameter when making your call.

info

By default, the last message chunk in a stream will include a finish_reason in the message's response_metadata attribute. If we include token usage in streaming mode, an additional chunk containing usage metadata will be added to the end of the stream, such that finish_reason appears on the second to last message chunk.

import type { AIMessageChunk } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";
import { concat } from "@langchain/core/utils/stream";

// Instantiate the model
const model = new ChatOpenAI();

const response = await model.stream("Hello, how are you?", {
  // Pass the stream options
  stream_options: {
    include_usage: true,
  },
});

// Iterate over the response, only saving the last chunk
let finalResult: AIMessageChunk | undefined;
for await (const chunk of response) {
  if (finalResult) {
    finalResult = concat(finalResult, chunk);
  } else {
    finalResult = chunk;
  }
}

console.log(finalResult?.usage_metadata);

/*
{ input_tokens: 13, output_tokens: 30, total_tokens: 43 }
*/
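To see the chunk ordering described in the callout above, you can also log each chunk's metadata as it arrives. The following is a rough sketch using the same stream_options call as the example above; the exact per-chunk contents can vary by provider and library version.

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({ model: "gpt-4o-mini" });

const stream = await model.stream("Hello, how are you?", {
  stream_options: {
    include_usage: true,
  },
});

for await (const chunk of stream) {
  // Content chunks come first, the second-to-last chunk carries
  // finish_reason, and the final chunk carries usage_metadata.
  console.log({
    content: chunk.content,
    finish_reason: chunk.response_metadata?.finish_reason,
    usage_metadata: chunk.usage_metadata,
  });
}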


Using callbacks

You can also use the handleLLMEnd callback to get the full output from the LLM, including token usage for supported models. Here's an example of how you could do that:

import { ChatOpenAI } from "@langchain/openai";

const chatModel = new ChatOpenAI({
  model: "gpt-4o-mini",
  callbacks: [
    {
      handleLLMEnd(output) {
        console.log(JSON.stringify(output, null, 2));
      },
    },
  ],
});

await chatModel.invoke("Tell me a joke.");

/*
  {
    "generations": [
      [
        {
          "text": "Why did the scarecrow win an award?\n\nBecause he was outstanding in his field!",
          "message": {
            "lc": 1,
            "type": "constructor",
            "id": [
              "langchain_core",
              "messages",
              "AIMessage"
            ],
            "kwargs": {
              "content": "Why did the scarecrow win an award?\n\nBecause he was outstanding in his field!",
              "tool_calls": [],
              "invalid_tool_calls": [],
              "additional_kwargs": {},
              "response_metadata": {
                "tokenUsage": {
                  "completionTokens": 17,
                  "promptTokens": 12,
                  "totalTokens": 29
                },
                "finish_reason": "stop"
              }
            }
          },
          "generationInfo": {
            "finish_reason": "stop"
          }
        }
      ]
    ],
    "llmOutput": {
      "tokenUsage": {
        "completionTokens": 17,
        "promptTokens": 12,
        "totalTokens": 29
      }
    }
  }
*/
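If you only want the token counts rather than the full serialized output, you can read them off llmOutput inside the same callback. This is a minimal sketch based on the output shape shown above; the tokenUsage field is only present for providers that report it.

import { ChatOpenAI } from "@langchain/openai";

const chatModel = new ChatOpenAI({
  model: "gpt-4o-mini",
  callbacks: [
    {
      handleLLMEnd(output) {
        // llmOutput.tokenUsage mirrors the aggregate counts shown above.
        const tokenUsage = output.llmOutput?.tokenUsage;
        if (tokenUsage) {
          console.log(
            `prompt=${tokenUsage.promptTokens} completion=${tokenUsage.completionTokens} total=${tokenUsage.totalTokens}`
          );
        }
      },
    },
  ],
});

await chatModel.invoke("Tell me a joke.");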


Next steps

You've now seen a few examples of how to track chat model token usage for supported providers.

Next, check out the other how-to guides on chat models in this section, like how to get a model to return structured output or how to add caching to your chat models.