How to create and query vector stores
Head to Integrations for documentation on built-in integrations with vectorstore providers.
This guide assumes familiarity with the following concepts:
One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are "most similar" to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
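"Most similar" is typically measured with cosine similarity between embedding vectors. As a quick illustration of the math involved (this is not part of the LangChain API, just a sketch of the underlying idea):

```typescript
// Cosine similarity between two embedding vectors.
// 1 means same direction (very similar), 0 means orthogonal (unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A vector store computes a score like this between the query embedding and each stored embedding, then returns the highest-scoring documents.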
This walkthrough uses a basic, unoptimized implementation called MemoryVectorStore that stores embeddings in-memory and does an exact, linear search for the most similar embeddings. LangChain contains many built-in integrations - see this section for more, or the full list of integrations.
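To make "exact, linear search" concrete, here is a toy version of what such an in-memory store does under the hood. The `ToyMemoryStore` class and the bag-of-letters `embed` function are illustrative stand-ins, not LangChain APIs; a real store calls an embedding model and returns `Document` objects:

```typescript
// Stand-in embedding: counts of letters a-z. A real store would call an
// embedding model (e.g. OpenAIEmbeddings) here instead.
function embed(text: string): number[] {
  const v = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i] += 1;
  }
  return v;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Toy in-memory vector store: embeddings live in an array, and querying is
// an exact linear scan - score every stored vector, sort, take the top k.
class ToyMemoryStore {
  private entries: { vector: number[]; pageContent: string }[] = [];

  addTexts(texts: string[]): void {
    for (const t of texts) {
      this.entries.push({ vector: embed(t), pageContent: t });
    }
  }

  similaritySearch(query: string, k: number): string[] {
    const q = embed(query);
    return this.entries
      .map((e) => ({ text: e.pageContent, score: cosine(q, e.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k)
      .map((r) => r.text);
  }
}
```

This linear scan is O(n) per query, which is why production stores use approximate indexes (HNSW, IVF, etc.) instead.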
Creating a new index
Most of the time, you'll need to load and prepare the data you want to search over. Here's an example that loads a recent speech from a file:
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { TextLoader } from "langchain/document_loaders/fs/text";
// Create docs with a loader
const loader = new TextLoader("src/document_loaders/example_data/example.txt");
const docs = await loader.load();
// Load the docs into the vector store
const vectorStore = await MemoryVectorStore.fromDocuments(
docs,
new OpenAIEmbeddings()
);
// Search for the most similar document
const resultOne = await vectorStore.similaritySearch("hello world", 1);
console.log(resultOne);
/*
[
Document {
pageContent: "Hello world",
metadata: { id: 2 }
}
]
*/
API Reference:
- MemoryVectorStore from langchain/vectorstores/memory
- OpenAIEmbeddings from @langchain/openai
- TextLoader from langchain/document_loaders/fs/text
Most of the time, you'll need to split the loaded text as a preparation step. See this section to learn more about text splitters.
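As a rough sketch of what a text splitter does - LangChain's splitters are smarter, preferring natural boundaries like paragraphs and sentences - fixed-size chunking with overlap looks like this (illustrative only, not the library's implementation):

```typescript
// Minimal fixed-size chunking with overlap, the core idea behind text
// splitters. Overlap preserves context that would otherwise be cut at a
// chunk boundary.
function splitText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - chunkOverlap;
  }
  return chunks;
}
```

Each resulting chunk is then embedded and stored as its own entry in the vector store.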
Creating a new index from texts
If you have already prepared the data you want to search over, you can initialize a vector store directly from text chunks:
Install dependencies with your package manager of choice:

- npm: npm install @langchain/openai @langchain/core
- Yarn: yarn add @langchain/openai @langchain/core
- pnpm: pnpm add @langchain/openai @langchain/core
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
const vectorStore = await MemoryVectorStore.fromTexts(
["Hello world", "Bye bye", "hello nice world"],
[{ id: 2 }, { id: 1 }, { id: 3 }],
new OpenAIEmbeddings()
);
const resultOne = await vectorStore.similaritySearch("hello world", 1);
console.log(resultOne);
/*
[
Document {
pageContent: "Hello world",
metadata: { id: 2 }
}
]
*/
API Reference:
- MemoryVectorStore from langchain/vectorstores/memory
- OpenAIEmbeddings from @langchain/openai
Which one to pick?
Here's a quick guide to help you pick the right vector store for your use case:

- If you're after something that can just run inside your Node.js application, in-memory, without any other servers to stand up, then go for HNSWLib, Faiss, LanceDB, or CloseVector.
- If you're looking for something that can run in-memory in browser-like environments, then go for MemoryVectorStore or CloseVector.
- If you come from Python and were looking for something similar to FAISS, try HNSWLib or Faiss.
- If you're looking for an open-source, full-featured vector database that you can run locally in a Docker container, then go for Chroma.
- If you're looking for an open-source vector database that offers low-latency, local embedding of documents and supports apps on the edge, then go for Zep.
- If you're looking for an open-source, production-ready vector database that you can run locally (in a Docker container) or host in the cloud, then go for Weaviate.
- If you're already using Supabase, look at the Supabase vector store to use the same Postgres database for your embeddings too.
- If you're looking for a production-ready vector store you don't have to worry about hosting yourself, then go for Pinecone.
- If you're already using SingleStore, or if you need a distributed, high-performance database, you might want to consider the SingleStore vector store.
- If you're looking for an online MPP (Massively Parallel Processing) data warehousing service, you might want to consider the AnalyticDB vector store.
- If you're in search of a cost-effective vector database that lets you run vector search with SQL, look no further than MyScale.
- If you're in search of a vector database you can load from both the browser and the server side, check out CloseVector. It's a vector database that aims to be cross-platform.
- If you're looking for a scalable, open-source columnar database with excellent performance for analytical queries, then consider ClickHouse.
Next steps
You've now learned how to load data into a vectorstore.

Next, check out the full tutorial on retrieval-augmented generation.