Azure AI Search
Azure AI Search (formerly known as Azure Search and Azure Cognitive Search) is a distributed, RESTful search engine optimized for speed and relevance on production-scale workloads on Azure. It also supports vector search using the k-nearest neighbor (kNN) algorithm, as well as semantic search.
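To make "k-nearest neighbor" concrete, here is a minimal brute-force sketch: it scores every stored vector against the query with cosine similarity and keeps the k best. This only illustrates the idea and is not how Azure AI Search is implemented; a production engine uses its own index structures (for example the approximate HNSW algorithm) to avoid comparing the query with every vector. The `knn` and `cosineSimilarity` names and the toy vectors are hypothetical.

```typescript
// Illustrative only: exact (brute-force) kNN search by cosine similarity.
type Doc = { id: string; vector: number[] };

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every document, sorts by descending similarity and keeps the top k.
function knn(query: number[], docs: Doc[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 2-dimensional "embeddings" (hypothetical data).
const docs: Doc[] = [
  { id: "a", vector: [1, 0] },
  { id: "b", vector: [0.7, 0.7] },
  { id: "c", vector: [0, 1] },
];

// The two closest documents to [1, 0.1] are "a" and then "b".
console.log(knn([1, 0.1], docs, 2).map((r) => r.id));
```

Real embeddings have hundreds or thousands of dimensions, which is why exhaustive comparison becomes expensive and search engines offer approximate indexes.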
This vector store integration supports full text search, vector search, and hybrid search (a combination of the two) for the best ranking performance.
Learn how to use the vector search capabilities of Azure AI Search from this page. If you don't have an Azure account, you can create a free account to get started.
Setup
You'll first need to install the @azure/search-documents SDK and the @langchain/community package, using one of the following commands depending on your package manager (npm, Yarn, or pnpm):
npm install -S @langchain/community @langchain/core @azure/search-documents
yarn add @langchain/community @langchain/core @azure/search-documents
pnpm add @langchain/community @langchain/core @azure/search-documents
The example further down also imports from the @langchain/openai, langchain, and @langchain/textsplitters packages, so install those as well if you want to run it.
You'll also need a running Azure AI Search instance. You can deploy a free version on the Azure Portal at no cost by following this guide.
Once your instance is running, make sure you have the endpoint and the admin key (query keys can only be used to search documents, not to index, update, or delete them). The endpoint is the URL of your instance, which you can find in the Azure Portal under the "Overview" section of your instance. The admin key is under the "Keys" section of your instance. Then set the following environment variables:
# Azure AI Search connection settings
AZURE_AISEARCH_ENDPOINT=
AZURE_AISEARCH_KEY=
# If you're using Azure OpenAI API, you'll need to set these variables
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_API_INSTANCE_NAME=
AZURE_OPENAI_API_DEPLOYMENT_NAME=
AZURE_OPENAI_API_EMBEDDINGS_DEPLOYMENT_NAME=
AZURE_OPENAI_API_VERSION=
# Or you can use the OpenAI API directly
OPENAI_API_KEY=
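Because a missing variable usually surfaces later as an authentication or connection error, it can help to check the settings up front. The sketch below is a hypothetical helper (not part of LangChain or the Azure SDK) that returns the names of required variables that are unset or empty. It assumes Azure OpenAI is in use whenever AZURE_OPENAI_API_KEY is set and falls back to OPENAI_API_KEY otherwise, mirroring the two options listed above.

```typescript
// Hypothetical helper: list missing settings before creating the vector store.
const AZURE_SEARCH_VARS = ["AZURE_AISEARCH_ENDPOINT", "AZURE_AISEARCH_KEY"] as const;
const AZURE_OPENAI_VARS = [
  "AZURE_OPENAI_API_KEY",
  "AZURE_OPENAI_API_INSTANCE_NAME",
  "AZURE_OPENAI_API_DEPLOYMENT_NAME",
  "AZURE_OPENAI_API_EMBEDDINGS_DEPLOYMENT_NAME",
  "AZURE_OPENAI_API_VERSION",
] as const;
const OPENAI_VARS = ["OPENAI_API_KEY"] as const;

type Env = Record<string, string | undefined>;

// A variable counts as set only if it is present and not blank.
function isSet(env: Env, name: string): boolean {
  const value = env[name];
  return value !== undefined && value.trim() !== "";
}

// Assumption: Azure OpenAI is used if AZURE_OPENAI_API_KEY is set, otherwise OpenAI.
function missingSettings(env: Env): string[] {
  const llmVars: readonly string[] = isSet(env, "AZURE_OPENAI_API_KEY")
    ? AZURE_OPENAI_VARS
    : OPENAI_VARS;
  return [...AZURE_SEARCH_VARS, ...llmVars].filter((name) => !isSet(env, name));
}

// Placeholder values for illustration; the admin key is missing here.
console.log(
  missingSettings({
    AZURE_AISEARCH_ENDPOINT: "https://my-service.search.windows.net",
    OPENAI_API_KEY: "placeholder",
  })
); // reports AZURE_AISEARCH_KEY
```

In a Node.js script you would pass `process.env` and throw an error listing the names if the returned array is not empty.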
About hybrid search
Hybrid search combines the strengths of full text search and vector search, and generally ranks results better than either approach alone. It's enabled by default in Azure AI Search vector stores, but you can select a different search query type by setting the search.type property when creating the vector store.
You can read more about hybrid search and how it may improve your search results in the official documentation.
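Azure's documentation describes hybrid queries as merging the full text and vector result lists with Reciprocal Rank Fusion (RRF): each document receives 1 / (k + rank) from every list it appears in, and the sums decide the final order. The sketch below shows that scoring rule in plain TypeScript. The constant k = 60 is an assumption (a commonly used default), and the document IDs are hypothetical; the service performs this fusion itself, so you never need to implement it.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)), ranks start at 1.
function reciprocalRankFusion(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

// Hypothetical ranked IDs returned by the two halves of a hybrid query.
const fullTextResults = ["doc3", "doc1", "doc2"];
const vectorResults = ["doc1", "doc4", "doc3"];

// Fused order: doc1, doc3, doc4, doc2
console.log(reciprocalRankFusion([fullTextResults, vectorResults]).map((r) => r.id));
```

Because RRF only uses ranks, it does not need the full text and vector scores to be on comparable scales, and documents that rank well in both lists (doc1 here) rise to the top.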
In some scenarios, such as retrieval-augmented generation (RAG), you may want to enable semantic ranking in addition to hybrid search to improve the relevance of the search results. You can enable semantic ranking by setting the search.type property to AzureAISearchQueryType.SemanticHybrid when creating the vector store.
Note that semantic ranking is only available in the Basic and higher pricing tiers, and is subject to regional availability.
You can read more about the performance of hybrid search combined with semantic ranking in this blog post.
Example: index docs, vector search and LLM integration
Below is an example that indexes documents from a file in Azure AI Search, runs a hybrid search query, and finally uses a chain to answer a question in natural language based on the retrieved documents.
import {
  AzureAISearchVectorStore,
  AzureAISearchQueryType,
} from "@langchain/community/vectorstores/azure_aisearch";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Load documents from a file and split them into chunks of up to 1000 characters
const loader = new TextLoader("./state_of_the_union.txt");
const rawDocuments = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 0,
});
const documents = await splitter.splitDocuments(rawDocuments);

// Create the Azure AI Search vector store, embedding and indexing the chunks.
// Connection settings come from the environment variables set up earlier.
const store = await AzureAISearchVectorStore.fromDocuments(
  documents,
  new OpenAIEmbeddings(),
  {
    search: {
      type: AzureAISearchQueryType.SimilarityHybrid,
    },
  }
);

// The first time you run this, the index will be created.
// You may need to wait a bit for the index to be created before you can perform
// a search, or you can create the index manually beforehand.

// Performs a hybrid (full text + vector) search, since the store was created
// with AzureAISearchQueryType.SimilarityHybrid
const resultDocuments = await store.similaritySearch(
  "What did the president say about Ketanji Brown Jackson?"
);

console.log("Similarity search results:");
console.log(resultDocuments[0].pageContent);
/*
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
*/

// Use the store as the retriever of a question-answering chain
const model = new ChatOpenAI({ model: "gpt-3.5-turbo-1106" });

const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    "Answer the user's questions based on the below context:\n\n{context}",
  ],
  ["human", "{input}"],
]);

// Inserts the retrieved documents into the {context} placeholder of the prompt
const combineDocsChain = await createStuffDocumentsChain({
  llm: model,
  prompt: questionAnsweringPrompt,
});

// Retrieves relevant documents for the input, then answers with the LLM
const chain = await createRetrievalChain({
  retriever: store.asRetriever(),
  combineDocsChain,
});

const response = await chain.invoke({
  input: "What is the president's top priority regarding prices?",
});

console.log("Chain response:");
console.log(response.answer);
/*
The president's top priority is getting prices under control.
*/
API Reference:
- AzureAISearchVectorStore from @langchain/community/vectorstores/azure_aisearch
- AzureAISearchQueryType from @langchain/community/vectorstores/azure_aisearch
- ChatPromptTemplate from @langchain/core/prompts
- ChatOpenAI from @langchain/openai
- OpenAIEmbeddings from @langchain/openai
- createStuffDocumentsChain from langchain/chains/combine_documents
- createRetrievalChain from langchain/chains/retrieval
- TextLoader from langchain/document_loaders/fs/text
- RecursiveCharacterTextSplitter from @langchain/textsplitters
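In the example, createStuffDocumentsChain "stuffs" every retrieved document into the prompt's {context} variable before the model is called. The sketch below approximates only that step in plain TypeScript so the data flow is visible. It is a hypothetical re-implementation, not LangChain's code: the `stuffContext` name and the blank-line separator are assumptions, and the real chain also formats each document and invokes the LLM.

```typescript
// Hypothetical sketch of the "stuff" step: concatenate documents into {context}.
type RetrievedDoc = { pageContent: string };

const SYSTEM_TEMPLATE =
  "Answer the user's questions based on the below context:\n\n{context}";

function stuffContext(template: string, docs: RetrievedDoc[], separator = "\n\n"): string {
  const context = docs.map((d) => d.pageContent).join(separator);
  // A replacer function keeps "$" sequences in the documents from being
  // interpreted as special replacement patterns.
  return template.replace("{context}", () => context);
}

const systemMessage = stuffContext(SYSTEM_TEMPLATE, [
  { pageContent: "First retrieved chunk." },
  { pageContent: "Second retrieved chunk." },
]);
console.log(systemMessage);
```

Since every retrieved chunk ends up in a single prompt, the number of documents the retriever returns directly affects prompt length, which is one reason the example splits the source file into chunks of at most 1000 characters.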
Related
- Vector store conceptual guide
- Vector store how-to guides