Supabase Hybrid Search
LangChain supports hybrid search with a Supabase Postgres database. Hybrid search combines the Postgres pgvector
extension (similarity search) with Postgres full-text search (keyword search) to retrieve documents. You can add documents with the addDocuments
function of SupabaseVectorStore. SupabaseHybridSearch accepts an embeddings instance, a Supabase client, the number of results for similarity search, and the number of results for keyword search as parameters. The retriever's getRelevantDocuments
function (or invoke, as used in the usage example below) returns a list of documents with duplicates removed, sorted by relevance score.
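The merge step described above (combine both result lists, drop duplicates, sort by score) can be sketched as follows. This is a minimal illustration, not the library's implementation: the `ScoredDoc` type and the `mergeHybridResults` function are hypothetical names introduced here, and deduplicating by `id` while keeping the higher score is an assumption about one reasonable policy.

```typescript
// Hypothetical shape for a scored search hit; not a LangChain type.
interface ScoredDoc {
  id: number;
  content: string;
  score: number;
}

// Merge two result lists, keep the best-scoring copy of each id,
// and return the survivors sorted by score, highest first.
function mergeHybridResults(
  similarityHits: ScoredDoc[],
  keywordHits: ScoredDoc[]
): ScoredDoc[] {
  const bestById = new Map<number, ScoredDoc>();
  for (const hit of similarityHits.concat(keywordHits)) {
    const seen = bestById.get(hit.id);
    if (seen === undefined || hit.score > seen.score) {
      bestById.set(hit.id, hit);
    }
  }
  return Array.from(bestById.values()).sort((a, b) => b.score - a.score);
}

const merged = mergeHybridResults(
  [
    { id: 1, content: "hello world", score: 0.92 },
    { id: 2, content: "goodbye", score: 0.75 },
  ],
  [
    { id: 2, content: "goodbye", score: 0.4 },
    { id: 3, content: "hello again", score: 0.1 },
  ]
);
// ids in order: 1, 2, 3 (document 2 appears once, with its higher score)
console.log(merged.map((d) => d.id));
```

Note that cosine similarity and full-text rank are not on the same scale, so sorting the two kinds of score together is only a rough ordering.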
Setup
Install the library with:
- npm: npm install -S @supabase/supabase-js
- Yarn: yarn add @supabase/supabase-js
- pnpm: pnpm add @supabase/supabase-js
Create a table and search functions in your database
Run this in your database:
-- Enable the pgvector extension to work with embedding vectors
create extension vector;

-- Create a table to store your documents
create table documents (
  id bigserial primary key,
  content text, -- corresponds to Document.pageContent
  metadata jsonb, -- corresponds to Document.metadata
  embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed
);

-- Create a function to similarity search for documents
create function match_documents (
  query_embedding vector(1536),
  match_count int DEFAULT null,
  filter jsonb DEFAULT '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
#variable_conflict use_column
begin
  return query
  select
    id,
    content,
    metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where metadata @> filter
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;

-- Create a function to keyword search for documents
create function kw_match_documents(query_text text, match_count int)
returns table (id bigint, content text, metadata jsonb, similarity real)
as $$
begin
  return query execute
  format('select id, content, metadata, ts_rank(to_tsvector(content), plainto_tsquery($1)) as similarity
  from documents
  where to_tsvector(content) @@ plainto_tsquery($1)
  order by similarity desc
  limit $2')
  using query_text, match_count;
end;
$$ language plpgsql;
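To check that the functions above are set up correctly, it can help to call one of them directly before wiring up the retriever. The sketch below calls kw_match_documents through an `rpc` method, passing arguments under the same names as the SQL parameters (`query_text`, `match_count`). The `RpcClient` interface, the `KeywordRow` type, the `keywordSearch` helper, and the in-memory `fakeClient` are all hypothetical names introduced for illustration; `RpcClient` is a simplified stand-in that is assumed to resemble the `rpc` method of a supabase-js client, not its exact type.

```typescript
// Row shape returned by kw_match_documents, per the SQL definition above.
interface KeywordRow {
  id: number;
  content: string;
  metadata: Record<string, unknown>;
  similarity: number;
}

// Simplified stand-in for the one client method used here (an assumption,
// not the exact supabase-js type).
interface RpcClient {
  rpc(
    fn: string,
    args: Record<string, unknown>
  ): PromiseLike<{ data: unknown; error: { message: string } | null }>;
}

async function keywordSearch(
  client: RpcClient,
  queryText: string,
  matchCount: number
): Promise<KeywordRow[]> {
  // Argument names must match the SQL function's parameter names.
  const { data, error } = await client.rpc("kw_match_documents", {
    query_text: queryText,
    match_count: matchCount,
  });
  if (error) {
    throw new Error(`kw_match_documents failed: ${error.message}`);
  }
  return (data ?? []) as KeywordRow[];
}

// In-memory stand-in so the sketch runs without a database.
const fakeClient: RpcClient = {
  rpc: async (fn, args) => ({
    data:
      fn === "kw_match_documents"
        ? [
            {
              id: 1,
              content: `match for ${String(args.query_text)}`,
              metadata: {},
              similarity: 0.3,
            },
          ]
        : [],
    error: null,
  }),
};

keywordSearch(fakeClient, "hello", 2).then((rows) => {
  console.log(rows.map((row) => row.content));
});
```

With a real database, the client created in the usage example would take the place of `fakeClient`, possibly with a type cast if the stand-in interface does not line up with the installed supabase-js version.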
Usage
- npm: npm install @langchain/openai @langchain/community @langchain/core
- Yarn: yarn add @langchain/openai @langchain/community @langchain/core
- pnpm: pnpm add @langchain/openai @langchain/community @langchain/core
import { OpenAIEmbeddings } from "@langchain/openai";
import { createClient } from "@supabase/supabase-js";
import { SupabaseHybridSearch } from "@langchain/community/retrievers/supabase";

export const run = async () => {
  const client = createClient(
    process.env.SUPABASE_URL || "",
    process.env.SUPABASE_PRIVATE_KEY || ""
  );

  const embeddings = new OpenAIEmbeddings();

  const retriever = new SupabaseHybridSearch(embeddings, {
    client,
    // Below are the defaults, expecting that you set up your supabase table and functions according to the guide above. Please change if necessary.
    similarityK: 2,
    keywordK: 2,
    tableName: "documents",
    similarityQueryName: "match_documents",
    keywordQueryName: "kw_match_documents",
  });

  const results = await retriever.invoke("hello bye");

  console.log(results);
};
API Reference:
- OpenAIEmbeddings from @langchain/openai
- SupabaseHybridSearch from @langchain/community/retrievers/supabase
Related
- Retriever conceptual guide
- Retriever how-to guides