How to create a time-weighted retriever
This guide assumes familiarity with the following concepts:
This guide covers the TimeWeightedVectorStoreRetriever, which uses a combination of semantic similarity and a time decay.
The algorithm for scoring them is:
semantic_similarity + (1.0 - decay_rate) ^ hours_passed
Notably, hours_passed refers to the hours passed since the object in the retriever was last accessed, not since it was created. This means that frequently accessed objects remain "fresh."
let score = (1.0 - this.decayRate) ** hoursPassed + vectorRelevance;
this.decayRate is a configurable decimal number between 0 and 1. A lower number means that documents will be "remembered" for longer, while a higher number strongly weights more recently accessed documents.
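For example, a minimal sketch of tuning this (assuming the constructor accepts a decayRate option, mirroring the this.decayRate field used in the scoring line above) might look like:

import { TimeWeightedVectorStoreRetriever } from "langchain/retrievers/time_weighted";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

// decayRate here is an assumed constructor option: a small value means the time
// term decays slowly, so scores are driven mostly by semantic similarity.
const slowDecayRetriever = new TimeWeightedVectorStoreRetriever({
  vectorStore: new MemoryVectorStore(new OpenAIEmbeddings()),
  memoryStream: [],
  searchKwargs: 2,
  decayRate: 0.01,
});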
Note that setting a decay rate of exactly 0 or 1 makes hoursPassed irrelevant and makes this retriever equivalent to a standard vector lookup.
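To see why, here is a standalone sketch of the scoring arithmetic (plain TypeScript written for illustration, not the library's internal code), evaluated for a few decay rates:

// score = (1 - decayRate) ^ hoursPassed + vectorRelevance
const score = (decayRate: number, hoursPassed: number, vectorRelevance: number) =>
  (1.0 - decayRate) ** hoursPassed + vectorRelevance;

console.log(score(0.01, 24, 0.5)); // ~1.29: slow decay, a recently accessed document stays competitive
console.log(score(0.99, 24, 0.5)); // ~0.50: fast decay, the time term vanishes within hours
console.log(score(0, 24, 0.5)); // 1.50: (1 - 0) ** h is always 1, so hoursPassed has no effect
console.log(score(1, 24, 0.5)); // 0.50: (1 - 1) ** h is 0 for any h > 0, so hoursPassed again has no effect

In the first two cases the score still depends on when the document was last accessed; in the last two it reduces to the vector relevance alone, i.e. a standard vector lookup.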
It is important to note that due to required metadata, all documents must be added to the backing vector store using the addDocuments method on the retriever, not the vector store itself.
- npm
- Yarn
- pnpm
npm install @langchain/openai @langchain/core
yarn add @langchain/openai @langchain/core
pnpm add @langchain/openai @langchain/core
import { TimeWeightedVectorStoreRetriever } from "langchain/retrievers/time_weighted";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
const vectorStore = new MemoryVectorStore(new OpenAIEmbeddings());
const retriever = new TimeWeightedVectorStoreRetriever({
vectorStore,
memoryStream: [],
searchKwargs: 2,
});
const documents = [
"My name is John.",
"My name is Bob.",
"My favourite food is pizza.",
"My favourite food is pasta.",
"My favourite food is sushi.",
].map((pageContent) => ({ pageContent, metadata: {} }));
// All documents must be added using this method on the retriever (not the vector store!)
// so that the correct access history metadata is populated
await retriever.addDocuments(documents);
const results1 = await retriever.invoke("What is my favourite food?");
console.log(results1);
/*
[
Document { pageContent: 'My favourite food is pasta.', metadata: {} }
]
*/
const results2 = await retriever.invoke("What is my favourite food?");
console.log(results2);
/*
[
Document { pageContent: 'My favourite food is pasta.', metadata: {} }
]
*/
API Reference:
- TimeWeightedVectorStoreRetriever from langchain/retrievers/time_weighted
- MemoryVectorStore from langchain/vectorstores/memory
- OpenAIEmbeddings from @langchain/openai
Next steps
You've now learned how to use time as a factor when performing retrieval.
Next, check out the broader tutorial on RAG, or this section to learn how to create your own custom retriever over any data source.