
How to embed text data

info

Head to Integrations for documentation on built-in integrations with text embedding providers.

Prerequisites

This guide assumes familiarity with the following concepts:

Embeddings create a vector representation of a piece of text. This is useful because it means we can reason about text in vector space and do things like semantic search, where we look for the pieces of text that are most similar to a given query in that space.

The base Embeddings class in LangChain exposes two methods: one for embedding documents and one for embedding a query. The former takes multiple texts as input, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers use different embedding methods for documents (to be searched over) versus queries (the search query itself).

Get started

Below is an example of how to use OpenAI embeddings. Because providers sometimes embed queries and documents differently, the embedding class exposes both an embedQuery and an embedDocuments method.

npm install @langchain/openai @langchain/core


import { OpenAIEmbeddings } from "@langchain/openai";

// Reads the OPENAI_API_KEY environment variable by default.
const embeddings = new OpenAIEmbeddings();
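If you want to control which embedding model is called, the constructor accepts an options object. The sketch below is illustrative only: the model name is an example, and the field names shown (model, apiKey) reflect recent versions of @langchain/openai; older releases used modelName and openAIApiKey.

import { OpenAIEmbeddings } from "@langchain/openai";

// Example configuration only; adjust the model and key handling to your setup.
const configuredEmbeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small", // which OpenAI embedding model to use
  apiKey: process.env.OPENAI_API_KEY, // shown explicitly; this is also the default lookup
});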

Embed queries

const res = await embeddings.embedQuery("Hello world");
/*
  [
    -0.004845875, 0.004899438, -0.016358767, -0.024475135, -0.017341806,
    0.012571548, -0.019156644, 0.009036391, -0.010227379, -0.026945334,
    0.022861943, 0.010321903, -0.023479493, -0.0066544134, 0.007977734,
    0.0026371893, 0.025206111, -0.012048521, 0.012943339, 0.013094575,
    -0.010580265, -0.003509951, 0.004070787, 0.008639394, -0.020631202,
    ... 1511 more items
  ]
*/
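The result is a plain array of numbers: the output above has 1,536 entries in total (the 25 printed plus the 1,511 elided), which is the vector size returned by the default OpenAI embedding model.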

Embed documents

const documentRes = await embeddings.embedDocuments(["Hello world", "Bye bye"]);
/*
  [
    [
      -0.004845875, 0.004899438, -0.016358767, -0.024475135, -0.017341806,
      0.012571548, -0.019156644, 0.009036391, -0.010227379, -0.026945334,
      0.022861943, 0.010321903, -0.023479493, -0.0066544134, 0.007977734,
      0.0026371893, 0.025206111, -0.012048521, 0.012943339, 0.013094575,
      -0.010580265, -0.003509951, 0.004070787, 0.008639394, -0.020631202,
      ... 1511 more items
    ],
    [
      -0.009446913, -0.013253193, 0.013174579, 0.0057552797, -0.038993083,
      0.0077763423, -0.0260478, -0.0114384955, -0.0022683728, -0.016509168,
      0.041797023, 0.01787183, 0.00552271, -0.0049789557, 0.018146982,
      -0.01542166, 0.033752076, 0.006112323, 0.023872782, -0.016535373,
      -0.006623321, 0.016116094, -0.0061090477, -0.0044155475, -0.016627092,
      ... 1511 more items
    ]
  ]
*/
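To tie this back to the semantic search idea from the introduction, here is a minimal sketch that ranks the documents above by cosine similarity to a query. It reuses the embeddings instance from earlier; the helper function and the query string are made up for illustration, and in practice you would usually put the document vectors in a vector store rather than comparing them by hand.

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const texts = ["Hello world", "Bye bye"];
const docVectors = await embeddings.embedDocuments(texts);
const queryVector = await embeddings.embedQuery("Hi there");

// Rank documents by similarity to the query; higher scores are more similar.
const ranked = texts
  .map((text, i) => ({ text, score: cosineSimilarity(queryVector, docVectors[i]) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked);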

Next steps

You've now learned how to use embedding models with queries and documents.

Next, check out how to avoid excessively recomputing embeddings with caching, or the full tutorial on retrieval-augmented generation.