How to create a time-weighted retriever

Prerequisites

This guide assumes familiarity with the following concepts:

This guide covers the TimeWeightedVectorStoreRetriever, which combines semantic similarity with a time decay.

The algorithm for scoring documents is:

semantic_similarity + (1.0 - decay_rate) ^ hours_passed

Notably, hours_passed refers to the hours passed since the object in the retriever was last accessed, not since it was created. This means that frequently accessed objects remain "fresh."

let score = (1.0 - this.decayRate) ** hoursPassed + vectorRelevance;
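
To make the formula concrete, here is a minimal standalone sketch of the scoring step. The helper name timeWeightedScore, its parameters, and the sample similarity values are illustrative assumptions, not part of the LangChain API; only the formula itself comes from the text above.

```typescript
// Hypothetical helper mirroring the formula above; not the library's implementation.
function timeWeightedScore(
  vectorRelevance: number, // semantic similarity reported by the vector store
  hoursPassed: number, // hours since the document was last accessed
  decayRate: number // configurable value between 0 and 1
): number {
  return (1.0 - decayRate) ** hoursPassed + vectorRelevance;
}

const decayRate = 0.01;

// A slightly less similar document that was accessed one hour ago...
const recent = timeWeightedScore(0.7, 1, decayRate);
// ...versus a more similar document left untouched for a week (168 hours).
const stale = timeWeightedScore(0.8, 168, decayRate);

console.log(recent > stale); // true
```

With these made-up numbers, the recently accessed document outranks the more similar but stale one, which is the behavior the recency term is meant to produce.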

this.decayRate is a configurable decimal number between 0 and 1. A lower number means that documents are "remembered" for longer, while a higher number weights recently accessed documents more strongly.

Note that setting a decay rate of exactly 0 or 1 makes hoursPassed irrelevant and makes this retriever equivalent to a standard vector lookup.
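
The effect of the decay rate, including the two edge cases, can be checked with a short sketch. The helper recencyTerm and the sample hour values are assumptions made for illustration; the expression is the recency part of the scoring formula given earlier.

```typescript
// Recency part of the scoring formula: (1 - decayRate) ^ hoursPassed.
// Hypothetical helper for illustration only.
function recencyTerm(decayRate: number, hoursPassed: number): number {
  return (1.0 - decayRate) ** hoursPassed;
}

const hours = [1, 24, 168];

// decayRate = 0: the term is always 1, so every document gets the same constant bonus.
console.log(hours.map((h) => recencyTerm(0, h))); // [ 1, 1, 1 ]

// decayRate = 1: the term is 0 once any time has passed, so only similarity remains.
console.log(hours.map((h) => recencyTerm(1, h))); // [ 0, 0, 0 ]

// An intermediate rate makes the term shrink as more hours pass.
const decayed = hours.map((h) => recencyTerm(0.05, h));
console.log(decayed[0] > decayed[1] && decayed[1] > decayed[2]); // true
```

In both edge cases the recency term no longer distinguishes between documents, so the ranking is driven by semantic similarity alone.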

It is important to note that, because of the metadata the retriever requires, all documents must be added to the backing vector store using the addDocuments method on the retriever, not the one on the vector store itself.
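
The following sketch illustrates why this matters. It is not the library's code: this guide does not list the exact metadata fields, so the StampedDoc type, the lastAccessedAt field, and both helpers are hypothetical stand-ins for whatever the retriever records when documents are added through it.

```typescript
// Hypothetical document shape; lastAccessedAt is an illustrative field name.
interface StampedDoc {
  pageContent: string;
  metadata: { lastAccessedAt?: number };
}

// What a retriever-level add step might do: record when the document was last touched.
function stampDoc(doc: StampedDoc, now: number): StampedDoc {
  return { ...doc, metadata: { ...doc.metadata, lastAccessedAt: now } };
}

// Hours since last access; undefined when the timestamp was never recorded.
function hoursPassed(doc: StampedDoc, now: number): number | undefined {
  if (doc.metadata.lastAccessedAt === undefined) return undefined;
  return (now - doc.metadata.lastAccessedAt) / (1000 * 60 * 60);
}

const addedAt = Date.UTC(2024, 0, 1, 0);
const queriedAt = Date.UTC(2024, 0, 1, 6);

// Added "through the retriever": the timestamp is present.
const stamped = stampDoc({ pageContent: "My name is John.", metadata: {} }, addedAt);
// Added "directly to the store": nothing to compute a time decay from.
const unstamped: StampedDoc = { pageContent: "My name is Bob.", metadata: {} };

console.log(hoursPassed(stamped, queriedAt)); // 6
console.log(hoursPassed(unstamped, queriedAt)); // undefined
```

Without the access timestamp there is no value of hours_passed to plug into the scoring formula, which is the situation the warning above is meant to prevent.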

npm install langchain @langchain/openai @langchain/core

import { TimeWeightedVectorStoreRetriever } from "langchain/retrievers/time_weighted";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

const vectorStore = new MemoryVectorStore(new OpenAIEmbeddings());

const retriever = new TimeWeightedVectorStoreRetriever({
  vectorStore,
  memoryStream: [],
  searchKwargs: 2,
});

const documents = [
  "My name is John.",
  "My name is Bob.",
  "My favourite food is pizza.",
  "My favourite food is pasta.",
  "My favourite food is sushi.",
].map((pageContent) => ({ pageContent, metadata: {} }));

// All documents must be added using this method on the retriever (not the vector store!)
// so that the correct access history metadata is populated
await retriever.addDocuments(documents);

const results1 = await retriever.invoke("What is my favourite food?");

console.log(results1);

/*
  [
    Document { pageContent: 'My favourite food is pasta.', metadata: {} }
  ]
*/

const results2 = await retriever.invoke("What is my favourite food?");

console.log(results2);

/*
  [
    Document { pageContent: 'My favourite food is pasta.', metadata: {} }
  ]
*/

Next steps

You've now learned how to use time as a factor when performing retrieval.

Next, check out the broader tutorial on RAG, or the guide on creating your own custom retriever over any data source.