
Chat models

Overview

Large Language Models (LLMs) are advanced machine learning models that excel in a wide range of language-related tasks such as text generation, translation, summarization, question answering, and more, without needing task-specific tuning for every scenario.

Modern LLMs are typically accessed through a chat model interface that takes a list of messages as input and returns a message as output.
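The request/response shape can be sketched with plain data. This is a toy stand-in for illustration only; real LangChain messages are classes such as HumanMessage and AIMessage, and a real model calls a provider API:

```typescript
// Hypothetical message shape mirroring the interface described above.
type Role = "system" | "human" | "assistant";
interface Message {
  role: Role;
  content: string;
}

// A toy "chat model": a list of messages in, one message out.
function invokeToyModel(messages: Message[]): Message {
  const lastHuman = [...messages].reverse().find((m) => m.role === "human");
  return { role: "assistant", content: `You said: ${lastHuman?.content ?? ""}` };
}

const reply = invokeToyModel([
  { role: "system", content: "You are a helpful assistant." },
  { role: "human", content: "Hello!" },
]);
console.log(reply.role); // "assistant"
```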

The newest generation of chat models offers additional capabilities:

  • Tool calling: Many popular chat models offer a native tool calling API. This API allows developers to build rich applications that enable AI to interact with external services, APIs, and databases. Tool calling can also be used to extract structured information from unstructured data and perform various other tasks.

  • Structured output: A technique to make a chat model respond in a structured format, such as JSON that matches a given schema.

  • Multimodality: The ability to work with data other than text; for example, images, audio, and video.

Features

LangChain provides a consistent interface for working with chat models from different providers while offering additional features for monitoring, debugging, and optimizing the performance of applications that use LLMs.

  • Integrations with many chat model providers (e.g., Anthropic, OpenAI, Ollama, Microsoft Azure, Google Vertex, Amazon Bedrock, Hugging Face, Cohere, Groq). Please see chat model integrations for an up-to-date list of supported models.

  • Use either LangChain's messages format or OpenAI format.

  • Standard tool calling API: a standard interface for binding tools to models, accessing tool call requests made by models, and sending tool results back to the model.

  • Standard API for structured outputs via the withStructuredOutput method.

  • Integration with LangSmith for monitoring and debugging production-grade applications based on LLMs.

  • Additional features like standardized token usage, rate limiting, caching, and more.
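The two supported message formats differ mainly in their role names. The sketch below uses plain objects and a simple role mapping as an illustration (OpenAI conventionally uses "user"/"assistant", LangChain "human"/"ai"); it is not LangChain's actual conversion code:

```typescript
// Simplified shapes for the two formats, as plain objects.
interface OpenAIMessage { role: "system" | "user" | "assistant"; content: string; }
interface LangChainMessage { role: "system" | "human" | "ai"; content: string; }

const roleMap: Record<OpenAIMessage["role"], LangChainMessage["role"]> = {
  system: "system",
  user: "human",
  assistant: "ai",
};

function toLangChainFormat(msg: OpenAIMessage): LangChainMessage {
  return { role: roleMap[msg.role], content: msg.content };
}

const converted = toLangChainFormat({ role: "user", content: "Hi" });
console.log(converted); // { role: "human", content: "Hi" }
```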

Integrations

LangChain has many chat model integrations that allow you to use a wide variety of models from different providers.

These integrations are of two types:

  1. Official models: These are models that are officially supported by LangChain and/or the model provider. You can find these models in the @langchain/<provider> packages.

  2. Community models: These are models that are mostly contributed and supported by the community. You can find these models in the @langchain/community package.

LangChain chat models are named with a convention that prefixes "Chat" to their class names (e.g., ChatOllama, ChatAnthropic, ChatOpenAI, etc.).

Please review the chat model integrations for a list of supported models.

note

Models whose names do not include the prefix "Chat", or that include "LLM" as a suffix, typically refer to older models that do not follow the chat model interface and instead use an interface that takes a string as input and returns a string as output.

Interface

LangChain chat models implement the BaseChatModel interface. Because BaseChatModel also implements the Runnable interface, chat models support a standard streaming interface, optimized batching, and more. Please see the Runnable interface for more details.

Many of the key methods of chat models operate on messages as input and return messages as output.

Chat models offer a standard set of parameters that can be used to configure the model. These parameters are typically used to control the behavior of the model, such as the temperature of the output, the maximum number of tokens in the response, and the maximum time to wait for a response. Please see the standard parameters section for more details.

note

In documentation, we will often use the terms "LLM" and "chat model" interchangeably. This is because most modern LLMs are exposed to users via a chat model interface.

However, LangChain also has implementations of older LLMs that do not follow the chat model interface and instead use an interface that takes a string as input and returns a string as output. These models are typically named without the "Chat" prefix (e.g., Ollama, Anthropic, OpenAI, etc.). These models implement the BaseLLM interface and may be named with the "LLM" suffix (e.g., OpenAILLM, etc.). Generally, users should not use these models.

Key methods

The key methods of a chat model are:

  1. invoke: The primary method for interacting with a chat model. It takes a list of messages as input and returns a message as output.

  2. stream: A method that allows you to stream the output of a chat model as it is generated.

  3. batch: A method that allows you to batch multiple requests to a chat model together for more efficient processing.

  4. bindTools: A method that allows you to bind a tool to a chat model for use in the model's execution context.

  5. withStructuredOutput: A wrapper around the invoke method for models that natively support structured output.

Other important methods can be found in the BaseChatModel API reference.
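The shape of the first three key methods can be sketched with a toy class. This is an illustration of the method signatures, not BaseChatModel itself; a real implementation calls a provider API and returns proper message objects:

```typescript
interface Msg { role: string; content: string; }

class ToyChatModel {
  // invoke: messages in, one message out.
  invoke(messages: Msg[]): Msg {
    const text = messages.map((m) => m.content).join(" ");
    return { role: "assistant", content: text.toUpperCase() };
  }

  // stream: yield the output in chunks as it is "generated".
  *stream(messages: Msg[]): Generator<string> {
    for (const word of this.invoke(messages).content.split(" ")) yield word;
  }

  // batch: process several independent requests together.
  batch(requests: Msg[][]): Msg[] {
    return requests.map((r) => this.invoke(r));
  }
}

const model = new ToyChatModel();
console.log(model.invoke([{ role: "human", content: "hello" }]).content); // "HELLO"
console.log([...model.stream([{ role: "human", content: "a b" }])]); // ["A", "B"]
```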

Inputs and outputs

Modern LLMs are typically accessed through a chat model interface that takes messages as input and returns messages as output. Messages are typically associated with a role (e.g., "system", "human", "assistant") and one or more content blocks that contain text or potentially multimodal data (e.g., images, audio, video).

LangChain supports two message formats to interact with chat models:

  1. LangChain Message Format: LangChain's own message format, which is used by default and internally by LangChain.

  2. OpenAI's Message Format: OpenAI's message format.

Standard parameters

Many chat models have standardized parameters that can be used to configure the model:

  • model: The name or identifier of the specific AI model you want to use (e.g., "gpt-3.5-turbo" or "gpt-4").

  • temperature: Controls the randomness of the model's output. A higher value (e.g., 1.0) makes responses more creative, while a lower value (e.g., 0.1) makes them more deterministic and focused.

  • timeout: The maximum time (in seconds) to wait for a response from the model before canceling the request. Ensures the request doesn't hang indefinitely.

  • maxTokens: Limits the total number of tokens (words and punctuation) in the response. This controls how long the output can be.

  • stop: Specifies stop sequences that indicate when the model should stop generating tokens. For example, you might use specific strings to signal the end of a response.

  • maxRetries: The maximum number of attempts the system will make to resend a request if it fails due to issues like network timeouts or rate limits.

  • apiKey: The API key required for authenticating with the model provider. This is usually issued when you sign up for access to the model.

  • baseUrl: The URL of the API endpoint where requests are sent. This is typically provided by the model's provider and is necessary for directing your requests.

Some important things to note:

  • Standard parameters only apply to model providers that expose parameters with the intended functionality. For example, some providers do not expose a configuration for maximum output tokens, so maxTokens can't be supported on these.

  • Standard parameters are currently only enforced on integrations that have their own integration packages (e.g., @langchain/openai, @langchain/anthropic, etc.); they're not enforced on models in @langchain/community.

ChatModels also accept other parameters that are specific to that integration. To find all the parameters supported by a ChatModel, head to the API reference for that model.
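The standard parameters above can be collected in a plain interface. The sketch below is illustrative only: the default values shown are assumptions for the example, not what any particular provider integration actually uses:

```typescript
// The standard parameters from the table above, as a typed object.
interface StandardParams {
  model: string;
  temperature?: number;
  timeout?: number;
  maxTokens?: number;
  stop?: string[];
  maxRetries?: number;
  apiKey?: string;
  baseUrl?: string;
}

// Fill in illustrative defaults; explicit values take precedence.
function withDefaults(params: StandardParams): StandardParams {
  return { temperature: 0.7, maxRetries: 2, ...params };
}

const config = withDefaults({ model: "gpt-4", temperature: 0 });
console.log(config.temperature); // 0 (the explicit value wins over the default)
console.log(config.maxRetries); // 2
```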

Tool calling

Chat models can call tools to perform tasks such as fetching data from a database, making API requests, or running custom code. Please see the tool calling guide for more information.
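The round trip can be sketched as follows. All names and shapes here are illustrative toys, not LangChain's actual tool-calling types: the model emits a tool call request, the application executes it, and the result is sent back to the model:

```typescript
// A toy tool call request, as a model might emit it.
interface ToolCall { name: string; args: Record<string, number>; }

// The application's registry of available tools.
const tools: Record<string, (args: Record<string, number>) => number> = {
  add: (args) => args.a + args.b,
};

function executeToolCall(call: ToolCall): number {
  const tool = tools[call.name];
  if (!tool) throw new Error(`unknown tool: ${call.name}`);
  return tool(call.args);
}

// Pretend the model responded with this tool call request:
const call: ToolCall = { name: "add", args: { a: 2, b: 3 } };
const result = executeToolCall(call);
console.log(result); // 5 — would be sent back to the model as a tool message
```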

Structured outputs

Chat models can be requested to respond in a particular format (e.g., JSON or matching a particular schema). This feature is extremely useful for information extraction tasks. Please read more about the technique in the structured outputs guide.
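The "matching a particular schema" part can be illustrated with a hand-rolled check on the model's raw text. This is a minimal sketch; in practice one would typically pair withStructuredOutput with a schema library rather than validating by hand:

```typescript
// The schema we want the model's JSON response to match.
interface Person { name: string; age: number; }

// Validate raw model output against the schema; null on any mismatch.
function parsePerson(raw: string): Person | null {
  try {
    const obj = JSON.parse(raw);
    if (typeof obj?.name === "string" && typeof obj?.age === "number") {
      return { name: obj.name, age: obj.age };
    }
  } catch {
    // fall through: malformed JSON
  }
  return null;
}

console.log(parsePerson('{"name": "Ada", "age": 36}')); // { name: "Ada", age: 36 }
console.log(parsePerson("not json")); // null
```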

Multimodality

Large Language Models (LLMs) are not limited to processing text. They can also be used to process other types of data, such as images, audio, and video. This is known as multimodality.

Currently, only some LLMs support multimodal inputs, and almost none support multimodal outputs. Please consult the specific model documentation for details.

Context window

A chat model's context window refers to the maximum size of the input sequence the model can process at one time. While the context windows of modern LLMs are quite large, they still present a limitation that developers must keep in mind when working with chat models.

If the input exceeds the context window, the model may not be able to process the entire input and could raise an error. In conversational applications, this is especially important because the context window determines how much information the model can "remember" throughout a conversation. Developers often need to manage the input within the context window to maintain a coherent dialogue without exceeding the limit. For more details on handling memory in conversations, refer to the memory guide.

The size of the input is measured in tokens, which are the unit of processing the model uses.
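Managing a conversation against the window can be sketched as trimming to the most recent messages that fit. The 4-characters-per-token heuristic below is a crude assumption for illustration; real token counts come from the model's own tokenizer:

```typescript
interface ChatMsg { role: string; content: string; }

// Rough estimate: ~4 characters per token (assumption; tokenizers differ).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the most recent messages that fit within the window.
function trimToWindow(messages: ChatMsg[], windowTokens: number): ChatMsg[] {
  const kept: ChatMsg[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > windowTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

const history: ChatMsg[] = [
  { role: "human", content: "a".repeat(40) },     // ~10 tokens
  { role: "assistant", content: "b".repeat(40) }, // ~10 tokens
  { role: "human", content: "c".repeat(40) },     // ~10 tokens
];
console.log(trimToWindow(history, 25).length); // 2 (the oldest message is dropped)
```

Real applications often use smarter strategies (summarizing old turns, pinning the system message), but the budget-from-the-end idea is the same.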

Advanced topics

Caching

Chat model APIs can be slow, so a natural question is whether to cache the results of previous conversations. Theoretically, caching can help improve performance by reducing the number of requests made to the model provider. In practice, caching chat model responses is a complex problem and should be approached with caution.

The reason is that getting a cache hit is unlikely after the first or second interaction in a conversation if you rely on caching the exact inputs into the model. For example, how likely is it that multiple conversations start with exactly the same message? What about exactly the same three messages?
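An exact-input cache is straightforward to sketch, which makes the limitation easy to see: the key is the entire serialized message list, so any divergence in the conversation misses. This is a toy wrapper, not LangChain's cache implementation:

```typescript
type Invoke = (messages: { role: string; content: string }[]) => string;

// Wrap a model with an exact-input cache keyed on the serialized messages.
function withCache(model: Invoke): Invoke {
  const cache = new Map<string, string>();
  return (messages) => {
    const key = JSON.stringify(messages);
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    const result = model(messages);
    cache.set(key, result);
    return result;
  };
}

let calls = 0;
const cached = withCache(() => {
  calls++;
  return `reply #${calls}`;
});

const msgs = [{ role: "human", content: "Hi" }];
console.log(cached(msgs)); // "reply #1"
console.log(cached(msgs)); // "reply #1" again — cache hit, model not called
console.log(calls); // 1
```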

An alternative approach is to use semantic caching, where you cache responses based on the meaning of the input rather than the exact input itself. This can be effective in some situations, but not in others.

A semantic cache introduces a dependency on another model on the critical path of your application (e.g., the semantic cache may rely on an embedding model to convert text to a vector representation), and it's not guaranteed to capture the meaning of the input accurately.

However, there might be situations where caching chat model responses is beneficial. For example, if you have a chat model that is used to answer frequently asked questions, caching responses can help reduce the load on the model provider and improve response times.

Please see the how to cache chat model responses guide for more details.

Related resources

Conceptual guides