Typesense
Vector store that utilizes the Typesense search engine.
Basic Usage
- npm: npm install @langchain/openai @langchain/community @langchain/core
- Yarn: yarn add @langchain/openai @langchain/community @langchain/core
- pnpm: pnpm add @langchain/openai @langchain/community @langchain/core
import {
  Typesense,
  TypesenseConfig,
} from "@langchain/community/vectorstores/typesense";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Client } from "typesense";
import { Document } from "@langchain/core/documents";
const vectorTypesenseClient = new Client({
  nodes: [
    {
      // Ideally should come from your .env file
      host: "...",
      port: 123,
      protocol: "https",
    },
  ],
  // Ideally should come from your .env file
  apiKey: "...",
  numRetries: 3,
  connectionTimeoutSeconds: 60,
});
const typesenseVectorStoreConfig = {
  // Typesense client
  typesenseClient: vectorTypesenseClient,
  // Name of the collection to store the vectors in
  schemaName: "your_schema_name",
  // Optional column names to be used in Typesense
  columnNames: {
    // "vec" is the default name for the vector column in Typesense, but you can change it to whatever you want
    vector: "vec",
    // "text" is the default name for the text column in Typesense, but you can change it to whatever you want
    pageContent: "text",
    // Names of the columns that you save in your Typesense schema and that need to be retrieved as metadata when searching
    metadataColumnNames: ["foo", "bar", "baz"],
  },
  // Optional search parameters to be passed to Typesense when searching
  searchParams: {
    q: "*",
    filter_by: "foo:[fooo]",
    query_by: "",
  },
  // You can override the default Typesense import function if you want to do something more complex
  // Default import function:
  // async importToTypesense<
  //   T extends Record<string, unknown> = Record<string, unknown>
  // >(data: T[], collectionName: string) {
  //   const chunkSize = 2000;
  //   for (let i = 0; i < data.length; i += chunkSize) {
  //     const chunk = data.slice(i, i + chunkSize);
  //     await this.caller.call(async () => {
  //       await this.client
  //         .collections<T>(collectionName)
  //         .documents()
  //         .import(chunk, { action: "emplace", dirty_values: "drop" });
  //     });
  //   }
  // }
  import: async (data, collectionName) => {
    await vectorTypesenseClient
      .collections(collectionName)
      .documents()
      .import(data, { action: "emplace", dirty_values: "drop" });
  },
} satisfies TypesenseConfig;
/**
 * Creates a Typesense vector store from a list of documents.
 * Will update documents if there is a document with the same id, at least with the default import function.
 * @param documents list of documents to create the vector store from
 * @returns Typesense vector store
 */
const createVectorStoreWithTypesense = async (documents: Document[] = []) =>
  Typesense.fromDocuments(
    documents,
    new OpenAIEmbeddings(),
    typesenseVectorStoreConfig
  );
/**
 * Returns a Typesense vector store from an existing index.
 * @returns Typesense vector store
 */
const getVectorStoreWithTypesense = async () =>
  new Typesense(new OpenAIEmbeddings(), typesenseVectorStoreConfig);
// Do a similarity search
const vectorStore = await getVectorStoreWithTypesense();
const documents = await vectorStore.similaritySearch("hello world");
// Add filters based on metadata using Typesense search parameters.
// This excludes documents with author "JK Rowling": if both Joe Rowling and JK Rowling exist, only Joe Rowling is returned.
await vectorStore.similaritySearch("Rowling", undefined, {
  filter_by: "author:!=JK Rowling",
});
// Delete documents by ID
await vectorStore.deleteDocuments(["document_id_1", "document_id_2"]);
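The metadata filter in the example above is a plain Typesense filter_by string. When filters are assembled from user input, a small builder can keep the syntax consistent. The following is a minimal sketch: buildFilterBy and FilterCondition are hypothetical names introduced here for illustration and are not part of LangChain or Typesense. It covers only exact match, not-equal, and list membership, and it does not escape special characters in values.

```typescript
// Hypothetical helper (not part of LangChain or Typesense): builds a
// Typesense filter_by string from simple field conditions.
type FilterCondition =
  | { field: string; op: "=" | "!="; value: string | number }
  | { field: string; op: "in"; values: Array<string | number> };

function buildFilterBy(conditions: FilterCondition[]): string {
  return conditions
    .map((c) =>
      c.op === "in"
        ? // Membership in a list of values, e.g. foo:[a,b]
          `${c.field}:[${c.values.join(",")}]`
        : // Exact match (:=) or not-equal (:!=), e.g. author:!=JK Rowling
          `${c.field}:${c.op}${c.value}`
    )
    // Conditions are combined with a logical AND.
    .join(" && ");
}

// Reproduces the filter string used in the example above.
const exampleFilter = buildFilterBy([
  { field: "author", op: "!=", value: "JK Rowling" },
]);
```

The resulting string could then be passed as the third argument, for example vectorStore.similaritySearch("Rowling", undefined, { filter_by: exampleFilter }).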
Constructor
Before starting, create a schema in Typesense with an id, a field for the vector, and a field for the text. Add as many other fields as needed for the metadata.
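As a rough illustration of that prerequisite, the sketch below builds a collection schema object with a vector field and a text field, plus optional metadata fields. The types and the buildCollectionSchema helper are hypothetical and defined locally so the example is self-contained. The dimension 1536 is an assumption about the embedding model in use, so substitute the size your embeddings actually produce. No id field is declared, on the assumption that Typesense manages document ids implicitly; verify this against your Typesense version.

```typescript
// Local types so the sketch is self-contained; the field shape follows the
// general form of Typesense collection schemas (name, type, num_dim).
interface FieldSketch {
  name: string;
  type: "string" | "int32" | "float" | "bool" | "float[]";
  num_dim?: number; // only meaningful for vector fields
  optional?: boolean;
}

interface CollectionSchemaSketch {
  name: string;
  fields: FieldSketch[];
}

// Hypothetical helper: returns a schema with the default "vec" and "text"
// column names described in this page, followed by any metadata fields.
function buildCollectionSchema(
  schemaName: string,
  vectorDimensions: number,
  metadataFields: FieldSketch[] = []
): CollectionSchemaSketch {
  return {
    name: schemaName,
    fields: [
      { name: "vec", type: "float[]", num_dim: vectorDimensions },
      { name: "text", type: "string" },
      ...metadataFields,
    ],
  };
}

// 1536 is an assumed embedding size, not a value taken from this page.
const schema = buildCollectionSchema("your_schema_name", 1536, [
  { name: "foo", type: "string", optional: true },
]);
// With a real client this object would be passed to something like:
//   await vectorTypesenseClient.collections().create(schema);
```

The metadata field names should match the metadataColumnNames passed in the vector store config so that they can be returned as metadata when searching.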
- constructor(embeddings: Embeddings, config: TypesenseConfig): Constructs a new instance of the Typesense class.
  - embeddings: An instance of the Embeddings class used for embedding documents.
  - config: Configuration object for the Typesense vector store.
    - typesenseClient: Typesense client instance.
    - schemaName: Name of the Typesense schema in which documents will be stored and searched.
    - searchParams (optional): Typesense search parameters. Default is { q: '*', per_page: 5, query_by: '' }.
    - columnNames (optional): Column names configuration.
      - vector (optional): Vector column name. Default is 'vec'.
      - pageContent (optional): Page content column name. Default is 'text'.
      - metadataColumnNames (optional): Metadata column names. Default is an empty array [].
    - import (optional): Replaces the default import function for importing data to Typesense. This can affect the functionality of updating documents.
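To illustrate the import option, here is a minimal sketch of a replacement import function that sends documents in smaller batches than the default 2000 shown in the usage example. The names chunkArray, makeChunkedImport, and ImportCapableClient are hypothetical and introduced only for this example. The interface is a narrow stand-in that covers just the client calls used here, not the real Typesense client type. The sketch keeps the "emplace" action so that documents with an existing id are still updated.

```typescript
// Narrow structural type covering only the calls used below (an assumption,
// not the actual Typesense client type).
interface ImportCapableClient {
  collections(name: string): {
    documents(): {
      import(
        docs: Record<string, unknown>[],
        options: { action: string; dirty_values: string }
      ): Promise<unknown>;
    };
  };
}

// Splits an array into consecutive chunks of at most chunkSize items.
function chunkArray<T>(items: T[], chunkSize: number): T[][] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("chunkSize must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Returns an import function that uploads the data one chunk at a time.
function makeChunkedImport(client: ImportCapableClient, chunkSize: number) {
  return async (
    data: Record<string, unknown>[],
    collectionName: string
  ): Promise<void> => {
    for (const chunk of chunkArray(data, chunkSize)) {
      await client
        .collections(collectionName)
        .documents()
        .import(chunk, { action: "emplace", dirty_values: "drop" });
    }
  };
}
```

A function built this way, for example makeChunkedImport(vectorTypesenseClient, 500), could be supplied as the import option in the config, possibly with a type adjustment to match TypesenseConfig. Changing the action away from "emplace" would change how existing documents are updated.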
Methods
- async addDocuments(documents: Document[]): Promise<void>: Adds documents to the vector store. The documents will be updated if there is a document with the same ID.
- static async fromDocuments(docs: Document[], embeddings: Embeddings, config: TypesenseConfig): Promise<Typesense>: Creates a Typesense vector store from a list of documents. Documents are added to the vector store during construction.
- static async fromTexts(texts: string[], metadatas: object[], embeddings: Embeddings, config: TypesenseConfig): Promise<Typesense>: Creates a Typesense vector store from a list of texts and associated metadata. Texts are converted to documents and added to the vector store during construction.
- async similaritySearch(query: string, k?: number, filter?: Record<string, unknown>): Promise<Document[]>: Searches for similar documents based on a query. Returns an array of similar documents.
- async deleteDocuments(documentIds: string[]): Promise<void>: Deletes documents from the vector store based on their IDs.
Related
- Vector store conceptual guide
- Vector store how-to guides