How to track token usage
This notebook goes over how to track your token usage for specific calls.
Using AIMessage.usage_metadata
A number of model providers return token usage information as part of the chat generation response. When available, this information will be included on the AIMessage objects produced by the corresponding model.
LangChain AIMessage objects include a usage_metadata attribute for supported providers. When populated, this attribute will be an object with standard keys (e.g., "input_tokens" and "output_tokens").
OpenAI
- npm: npm install @langchain/openai @langchain/core
- Yarn: yarn add @langchain/openai @langchain/core
- pnpm: pnpm add @langchain/openai @langchain/core
import { ChatOpenAI } from "@langchain/openai";
const chatModel = new ChatOpenAI({
model: "gpt-3.5-turbo-0125",
});
const res = await chatModel.invoke("Tell me a joke.");
console.log(res.usage_metadata);
/*
{ input_tokens: 12, output_tokens: 17, total_tokens: 29 }
*/
API Reference:
- ChatOpenAI from
@langchain/openai
Anthropic
- npm: npm install @langchain/anthropic @langchain/core
- Yarn: yarn add @langchain/anthropic @langchain/core
- pnpm: pnpm add @langchain/anthropic @langchain/core
import { ChatAnthropic } from "@langchain/anthropic";
const chatModel = new ChatAnthropic({
model: "claude-3-haiku-20240307",
});
const res = await chatModel.invoke("Tell me a joke.");
console.log(res.usage_metadata);
/*
{ input_tokens: 12, output_tokens: 98, total_tokens: 110 }
*/
API Reference:
- ChatAnthropic from
@langchain/anthropic
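Because usage_metadata uses the same standard keys across supported providers, it is straightforward to aggregate usage over multiple calls. The snippet below is a minimal sketch of one way to do this; the running totals object is purely illustrative, not a LangChain API.
import { ChatOpenAI } from "@langchain/openai";
const chatModel = new ChatOpenAI({
  model: "gpt-4o-mini",
});
// Illustrative running totals, accumulated across calls.
const totals = { input_tokens: 0, output_tokens: 0, total_tokens: 0 };
for (const prompt of ["Tell me a joke.", "Tell me another joke."]) {
  const res = await chatModel.invoke(prompt);
  // usage_metadata may be undefined for providers that don't report usage.
  if (res.usage_metadata) {
    totals.input_tokens += res.usage_metadata.input_tokens;
    totals.output_tokens += res.usage_metadata.output_tokens;
    totals.total_tokens += res.usage_metadata.total_tokens;
  }
}
console.log(totals);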
Using AIMessage.response_metadata
A number of model providers return token usage information as part of the chat generation response. When available, this is included in the AIMessage.response_metadata field.
OpenAI
import { ChatOpenAI } from "@langchain/openai";
const chatModel = new ChatOpenAI({
model: "gpt-4o-mini",
});
const res = await chatModel.invoke("Tell me a joke.");
console.log(res.response_metadata);
/*
{
tokenUsage: { completionTokens: 15, promptTokens: 12, totalTokens: 27 },
finish_reason: 'stop'
}
*/
API Reference:
- ChatOpenAI from
@langchain/openai
Anthropic
import { ChatAnthropic } from "@langchain/anthropic";
const chatModel = new ChatAnthropic({
model: "claude-3-sonnet-20240229",
});
const res = await chatModel.invoke("Tell me a joke.");
console.log(res.response_metadata);
/*
{
id: 'msg_017Mgz6HdgNbi3cwL1LNB9Dw',
model: 'claude-3-sonnet-20240229',
stop_sequence: null,
usage: { input_tokens: 12, output_tokens: 30 },
stop_reason: 'end_turn'
}
*/
API Reference:
- ChatAnthropic from
@langchain/anthropic
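Note that, unlike usage_metadata, the shape of response_metadata is provider-specific: in the examples above, ChatOpenAI reports a tokenUsage object (promptTokens, completionTokens, totalTokens) while ChatAnthropic reports a usage object (input_tokens, output_tokens). If you need to read both, a small normalization step helps; the helper below is a hypothetical sketch covering only the two shapes shown above.
import type { AIMessage } from "@langchain/core/messages";
// Hypothetical helper covering the two response_metadata shapes shown above.
function tokenCountsFromResponseMetadata(message: AIMessage) {
  const meta = message.response_metadata ?? {};
  if (meta.tokenUsage) {
    // OpenAI-style metadata
    return { input: meta.tokenUsage.promptTokens, output: meta.tokenUsage.completionTokens };
  }
  if (meta.usage) {
    // Anthropic-style metadata
    return { input: meta.usage.input_tokens, output: meta.usage.output_tokens };
  }
  return undefined;
}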
Streaming
Some providers support token count metadata in a streaming context.
OpenAI
For example, OpenAI will return a message chunk at the end of a stream with token usage information. This behavior is supported by @langchain/openai >= 0.1.0 and can be enabled by passing a stream_options parameter when making your call.
By default, the last message chunk in a stream will include a finish_reason in the message's response_metadata attribute. If we include token usage in streaming mode, an additional chunk containing usage metadata will be added to the end of the stream, such that finish_reason appears on the second-to-last message chunk.
import type { AIMessageChunk } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";
import { concat } from "@langchain/core/utils/stream";
// Instantiate the model
const model = new ChatOpenAI();
const response = await model.stream("Hello, how are you?", {
// Pass the stream options
stream_options: {
include_usage: true,
},
});
// Iterate over the response, only saving the last chunk
let finalResult: AIMessageChunk | undefined;
for await (const chunk of response) {
if (finalResult) {
finalResult = concat(finalResult, chunk);
} else {
finalResult = chunk;
}
}
console.log(finalResult?.usage_metadata);
/*
{ input_tokens: 13, output_tokens: 30, total_tokens: 43 }
*/
API Reference:
- AIMessageChunk from
@langchain/core/messages
- ChatOpenAI from
@langchain/openai
- concat from
@langchain/core/utils/stream
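If you want to see where the usage data arrives in the stream rather than aggregating the chunks, you can log each chunk individually. The sketch below assumes the same stream_options setup as above: most chunks carry only content, the second-to-last carries finish_reason, and the final chunk carries usage_metadata.
import { ChatOpenAI } from "@langchain/openai";
const model = new ChatOpenAI({ model: "gpt-4o-mini" });
const stream = await model.stream("Hello, how are you?", {
  stream_options: {
    include_usage: true,
  },
});
for await (const chunk of stream) {
  // Inspect each chunk to see which one carries finish_reason vs. usage_metadata.
  console.log({
    content: chunk.content,
    finish_reason: chunk.response_metadata?.finish_reason,
    usage_metadata: chunk.usage_metadata,
  });
}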
Using callbacks
You can also use the handleLLMEnd callback to get the full output from the LLM, including token usage for supported models. Here's an example of how you could do that:
import { ChatOpenAI } from "@langchain/openai";
const chatModel = new ChatOpenAI({
model: "gpt-4o-mini",
callbacks: [
{
handleLLMEnd(output) {
console.log(JSON.stringify(output, null, 2));
},
},
],
});
await chatModel.invoke("Tell me a joke.");
/*
{
"generations": [
[
{
"text": "Why did the scarecrow win an award?\n\nBecause he was outstanding in his field!",
"message": {
"lc": 1,
"type": "constructor",
"id": [
"langchain_core",
"messages",
"AIMessage"
],
"kwargs": {
"content": "Why did the scarecrow win an award?\n\nBecause he was outstanding in his field!",
"tool_calls": [],
"invalid_tool_calls": [],
"additional_kwargs": {},
"response_metadata": {
"tokenUsage": {
"completionTokens": 17,
"promptTokens": 12,
"totalTokens": 29
},
"finish_reason": "stop"
}
}
},
"generationInfo": {
"finish_reason": "stop"
}
}
]
],
"llmOutput": {
"tokenUsage": {
"completionTokens": 17,
"promptTokens": 12,
"totalTokens": 29
}
}
}
*/
API Reference:
- ChatOpenAI from
@langchain/openai
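Since handleLLMEnd fires once per model call, a common pattern is to keep a running total across calls. Below is a minimal sketch assuming the OpenAI-style llmOutput.tokenUsage shape shown above; other providers may expose usage under different keys, or not at all.
import { ChatOpenAI } from "@langchain/openai";
// Illustrative running total, accumulated by the callback below.
let totalTokens = 0;
const chatModel = new ChatOpenAI({
  model: "gpt-4o-mini",
  callbacks: [
    {
      handleLLMEnd(output) {
        // For ChatOpenAI, llmOutput.tokenUsage has promptTokens/completionTokens/totalTokens.
        const usage = output.llmOutput?.tokenUsage;
        if (usage?.totalTokens) {
          totalTokens += usage.totalTokens;
        }
      },
    },
  ],
});
await chatModel.invoke("Tell me a joke.");
await chatModel.invoke("Tell me another joke.");
console.log(totalTokens);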
Next steps
You've now seen a few examples of how to track chat model token usage for supported providers.
Next, check out the other how-to guides on chat models in this section, like how to get a model to return structured output or how to add caching to your chat models.