Azure Cosmos DB for NoSQL
Azure Cosmos DB for NoSQL supports querying items with flexible schemas and has native support for JSON. It now also offers vector indexing and search, designed to handle high-dimensional vectors and enable efficient, accurate vector search at any scale. You can store vectors directly in your documents alongside your data: each document in your database can contain not only traditional schema-free data, but also high-dimensional vectors as additional properties.
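To make the idea of "vectors stored as document properties" concrete, here is a minimal in-memory sketch. The `ExampleDoc` shape, the `vector` property name, and the `bruteForceSearch` helper are all hypothetical, and the exhaustive cosine-similarity scan is only an illustration of what a vector search returns; it is not how Azure Cosmos DB indexes or searches vectors.

```typescript
// Hypothetical document shape: ordinary fields plus an embedding stored as a property.
interface ExampleDoc {
  id: string;
  text: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exhaustive scan: score every document, sort by descending similarity, keep the top k.
function bruteForceSearch(docs: ExampleDoc[], query: number[], k: number) {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(doc.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-dimensional embeddings; real embeddings have hundreds or thousands of dimensions.
const exampleDocs: ExampleDoc[] = [
  { id: "1", text: "first", vector: [1, 0, 0] },
  { id: "2", text: "second", vector: [0, 1, 0] },
  { id: "3", text: "third", vector: [0.9, 0.1, 0] },
];

const top = bruteForceSearch(exampleDocs, [1, 0, 0], 2);
console.log(top.map((r) => r.doc.id)); // [ '1', '3' ]
```

A vector index exists to avoid this kind of full scan, which becomes too slow as the number of documents grows.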
Learn how to leverage the vector search capabilities of Azure Cosmos DB for NoSQL from this page. If you don't have an Azure account, you can create a free account to get started.
Setup
You'll first need to install the @langchain/azure-cosmosdb package:
# npm
npm install @langchain/azure-cosmosdb @langchain/core
# Yarn
yarn add @langchain/azure-cosmosdb @langchain/core
# pnpm
pnpm add @langchain/azure-cosmosdb @langchain/core
You'll also need a running Azure Cosmos DB for NoSQL instance. You can deploy a free version from the Azure Portal by following this guide.
Once your instance is running, make sure you have its connection string. You can find it in the Azure Portal, under the "Settings / Keys" section of your instance. Then set the environment variable that matches how you authenticate:
# Use connection string to authenticate
AZURE_COSMOSDB_NOSQL_CONNECTION_STRING=
# Use managed identity to authenticate
AZURE_COSMOSDB_NOSQL_ENDPOINT=
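As a rough sketch of how an application might choose between the two authentication modes above, the following helper inspects an environment map and reports which mode applies. The `resolveCosmosAuth` function and the `CosmosAuth` type are hypothetical and not part of @langchain/azure-cosmosdb. Preferring the connection string when both variables are set is an assumption of this sketch.

```typescript
// Hypothetical result type describing which authentication mode to use.
type CosmosAuth =
  | { mode: "connectionString"; connectionString: string }
  | { mode: "managedIdentity"; endpoint: string };

// Hypothetical helper: picks an authentication mode from environment variables.
// Assumption: the connection string takes precedence when both are defined.
function resolveCosmosAuth(
  env: Record<string, string | undefined>
): CosmosAuth {
  const connectionString = env.AZURE_COSMOSDB_NOSQL_CONNECTION_STRING;
  if (connectionString) {
    return { mode: "connectionString", connectionString };
  }
  const endpoint = env.AZURE_COSMOSDB_NOSQL_ENDPOINT;
  if (endpoint) {
    return { mode: "managedIdentity", endpoint };
  }
  throw new Error(
    "Set AZURE_COSMOSDB_NOSQL_CONNECTION_STRING or AZURE_COSMOSDB_NOSQL_ENDPOINT"
  );
}

// Example with a literal map; in a Node.js application you would pass process.env.
const auth = resolveCosmosAuth({
  AZURE_COSMOSDB_NOSQL_ENDPOINT: "https://my-cosmosdb.documents.azure.com:443/",
});
console.log(auth.mode); // managedIdentity
```

Failing fast when neither variable is set surfaces a misconfiguration at startup rather than on the first query.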
Using Azure Managed Identity
If you're using Azure Managed Identity, you can configure the credentials like this:
import { AzureCosmosDBNoSQLVectorStore } from "@langchain/azure-cosmosdb";
import { OpenAIEmbeddings } from "@langchain/openai";
// Create Azure Cosmos DB vector store
const store = new AzureCosmosDBNoSQLVectorStore(new OpenAIEmbeddings(), {
  // Or use environment variable AZURE_COSMOSDB_NOSQL_ENDPOINT
  endpoint: "https://my-cosmosdb.documents.azure.com:443/",
  // Database and container must already exist
  databaseName: "my-database",
  containerName: "my-container",
});
API Reference:
- AzureCosmosDBNoSQLVectorStore from @langchain/azure-cosmosdb
- OpenAIEmbeddings from @langchain/openai
When using Azure Managed Identity and role-based access control (RBAC), you must create the database and container beforehand, because RBAC does not provide permissions to create databases and containers. You can find more information about the permission model in the Azure Cosmos DB documentation.
Usage example
The example below indexes documents loaded from a file into Azure Cosmos DB for NoSQL, runs a vector search query, and finally uses a chain to answer a question in natural language based on the retrieved documents.
import { AzureCosmosDBNoSQLVectorStore } from "@langchain/azure-cosmosdb";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
// Load documents from file
const loader = new TextLoader("./state_of_the_union.txt");
const rawDocuments = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 0,
});
const documents = await splitter.splitDocuments(rawDocuments);
// Create Azure Cosmos DB vector store
const store = await AzureCosmosDBNoSQLVectorStore.fromDocuments(
  documents,
  new OpenAIEmbeddings(),
  {
    databaseName: "langchain",
    containerName: "documents",
  }
);
// Perform a similarity search
const resultDocuments = await store.similaritySearch(
  "What did the president say about Ketanji Brown Jackson?"
);
console.log("Similarity search results:");
console.log(resultDocuments[0].pageContent);
/*
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
*/
// Use the store as part of a chain
const model = new ChatOpenAI({ model: "gpt-3.5-turbo-1106" });
const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    "Answer the user's questions based on the below context:\n\n{context}",
  ],
  ["human", "{input}"],
]);
const combineDocsChain = await createStuffDocumentsChain({
  llm: model,
  prompt: questionAnsweringPrompt,
});
const chain = await createRetrievalChain({
  retriever: store.asRetriever(),
  combineDocsChain,
});
const res = await chain.invoke({
  input: "What is the president's top priority regarding prices?",
});
console.log("Chain response:");
console.log(res.answer);
/*
The president's top priority is getting prices under control.
*/
// Clean up
await store.delete();
API Reference:
- AzureCosmosDBNoSQLVectorStore from @langchain/azure-cosmosdb
- ChatPromptTemplate from @langchain/core/prompts
- ChatOpenAI from @langchain/openai
- OpenAIEmbeddings from @langchain/openai
- createStuffDocumentsChain from langchain/chains/combine_documents
- createRetrievalChain from langchain/chains/retrieval
- TextLoader from langchain/document_loaders/fs/text
- RecursiveCharacterTextSplitter from @langchain/textsplitters
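To clarify what the chain in the example above does with the retrieved documents, here is a simplified, dependency-free sketch of the "stuff documents" idea: the page contents are joined and substituted into the `{context}` placeholder of the system prompt. The `stuffPrompt` function and the `RetrievedDoc` type are hypothetical, and the blank-line separator is an assumption of this sketch. The actual chain may format documents differently and offers more options.

```typescript
// Hypothetical minimal document type: only the text content matters here.
interface RetrievedDoc {
  pageContent: string;
}

interface PromptMessage {
  role: "system" | "human";
  content: string;
}

// Joins the retrieved documents and fills the {context} placeholder.
function stuffPrompt(
  systemTemplate: string,
  docs: RetrievedDoc[],
  input: string
): PromptMessage[] {
  // Assumption: documents are separated by a blank line.
  const context = docs.map((doc) => doc.pageContent).join("\n\n");
  return [
    {
      role: "system",
      // A replacer function avoids special handling of "$" sequences in the context.
      content: systemTemplate.replace("{context}", () => context),
    },
    { role: "human", content: input },
  ];
}

const messages = stuffPrompt(
  "Answer the user's questions based on the below context:\n\n{context}",
  [{ pageContent: "First chunk." }, { pageContent: "Second chunk." }],
  "What is the president's top priority regarding prices?"
);
console.log(messages[0].content);
```

The model then receives both messages, so its answer is grounded in whatever the retriever returned for the question.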
Related
Vector store conceptual guide
Vector store how-to guides