Vectara
Vectara is a platform for building GenAI applications. It provides an easy-to-use API for document indexing and querying that is managed by Vectara and is optimized for performance and accuracy.
You can use Vectara as a vector store with LangChain.js.
👉 Embeddings Included
Vectara uses its own embeddings under the hood, so you don't have to provide any yourself or call another service to obtain embeddings. This also means that if you provide your own embeddings, they'll be a no-op.
import { VectaraStore } from "@langchain/community/vectorstores/vectara";
import { OpenAIEmbeddings } from "@langchain/openai";

// `args` is your Vectara connection config: { customerId, corpusId, apiKey }.
const store = await VectaraStore.fromTexts(
  ["hello world", "hi there"],
  [{ foo: "bar" }, { foo: "baz" }],
  // This won't have an effect. Provide a FakeEmbeddings instance instead for clarity.
  new OpenAIEmbeddings(),
  args
);
Setup
You'll need to:
Create a free Vectara account.
Create a corpus to store your data.
Create an API key with QueryService and IndexService access so you can access this corpus.
Configure your .env file or provide args to connect LangChain to your Vectara corpus:
VECTARA_CUSTOMER_ID=your_customer_id
VECTARA_CORPUS_ID=your_corpus_id
VECTARA_API_KEY=your-vectara-api-key
Note that you can provide multiple corpus IDs separated by commas to query multiple corpora at once, for example: VECTARA_CORPUS_ID=3,8,9,43. For indexing into multiple corpora, you'll need to create a separate VectaraStore instance for each corpus, as in the sketch below.
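For example, a minimal sketch (assuming every corpus belongs to the same customer ID and is reachable with the same API key) might look like this:

import { VectaraStore } from "@langchain/community/vectorstores/vectara";

// Parse the comma-separated corpus IDs from the environment variable.
const corpusIds = String(process.env.VECTARA_CORPUS_ID)
  .split(",")
  .map((id) => Number(id.trim()));

// One VectaraStore per corpus for indexing; all share the same account credentials.
const storesByCorpus = corpusIds.map(
  (corpusId) =>
    new VectaraStore({
      customerId: Number(process.env.VECTARA_CUSTOMER_ID),
      corpusId,
      apiKey: String(process.env.VECTARA_API_KEY),
    })
);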
Usage
import { VectaraStore } from "@langchain/community/vectorstores/vectara";
import { VectaraSummaryRetriever } from "@langchain/community/retrievers/vectara_summary";
import { Document } from "@langchain/core/documents";
// Create the Vectara store.
const store = new VectaraStore({
customerId: Number(process.env.VECTARA_CUSTOMER_ID),
corpusId: Number(process.env.VECTARA_CORPUS_ID),
apiKey: String(process.env.VECTARA_API_KEY),
verbose: true,
});
// Add two documents with some metadata.
const doc_ids = await store.addDocuments([
new Document({
pageContent: "Do I dare to eat a peach?",
metadata: {
foo: "baz",
},
}),
new Document({
pageContent: "In the room the women come and go talking of Michelangelo",
metadata: {
foo: "bar",
},
}),
]);
// Perform a similarity search.
const resultsWithScore = await store.similaritySearchWithScore(
"What were the women talking about?",
1,
{
lambda: 0.025,
}
);
// Print the results.
console.log(JSON.stringify(resultsWithScore, null, 2));
/*
[
[
{
"pageContent": "In the room the women come and go talking of Michelangelo",
"metadata": {
"lang": "eng",
"offset": "0",
"len": "57",
"foo": "bar"
}
},
0.4678752
]
]
*/
const retriever = new VectaraSummaryRetriever({ vectara: store, topK: 3 });
const documents = await retriever.invoke("What were the women talking about?");
console.log(JSON.stringify(documents, null, 2));
/*
[
{
"pageContent": "<b>In the room the women come and go talking of Michelangelo</b>",
"metadata": {
"lang": "eng",
"offset": "0",
"len": "57",
"foo": "bar"
}
},
{
"pageContent": "<b>In the room the women come and go talking of Michelangelo</b>",
"metadata": {
"lang": "eng",
"offset": "0",
"len": "57",
"foo": "bar"
}
},
{
"pageContent": "<b>In the room the women come and go talking of Michelangelo</b>",
"metadata": {
"lang": "eng",
"offset": "0",
"len": "57",
"foo": "bar"
}
}
]
*/
// Delete the documents.
await store.deleteDocuments(doc_ids);
API Reference:
- VectaraStore from @langchain/community/vectorstores/vectara
- VectaraSummaryRetriever from @langchain/community/retrievers/vectara_summary
- Document from @langchain/core/documents
Note that lambda is a parameter related to Vectara's hybrid search capability, providing a tradeoff between neural search and boolean/exact matching, as described in Vectara's documentation. We recommend the value of 0.025 as a default, while providing a way for advanced users to customize this value if needed, as in the sketch below.
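For example, a minimal sketch (reusing the store from the usage example above, and assuming you want to tune the hybrid search balance) might look like this:

// Pass a custom lambda through the filter argument of a similarity search.
// Adjust lambda to change the balance between neural search and boolean/exact matching.
const tunedResults = await store.similaritySearch(
  "What were the women talking about?",
  3,
  {
    lambda: 0.1,
  }
);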
APIs
Vectara's LangChain vector store consumes Vectara's core APIs:
Indexing API for storing documents in a Vectara corpus.
Search API for querying this data. This API supports hybrid search.
Related
- Vector store conceptual guide
- Vector store how-to guides