How to write a custom retriever class

Prerequisites

This guide assumes familiarity with the following concepts:

Retrievers

To create your own retriever, extend the BaseRetriever class and implement a _getRelevantDocuments method. It takes the query string as its first parameter, plus an optional runManager used for tracing, and returns an array of Documents fetched from some source. Fetching can involve a database call, a web request using fetch, or any other source. Note the underscore before _getRelevantDocuments(): the base class exposes a non-prefixed version that wraps your implementation, so tracing of the call is handled automatically.
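The wrapping described above can be pictured with a small, dependency-free sketch. The names `Doc`, `SketchBaseRetriever`, `StaticSketchRetriever`, and `traceLog` are hypothetical stand-ins invented for illustration; they are not LangChain's classes or internals, and the real base class routes tracing through callback managers rather than a string array.

```typescript
// Simplified stand-in for a document; NOT the real LangChain Document class.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical base class illustrating the wrapping pattern only.
abstract class SketchBaseRetriever {
  // Stand-in for a tracing backend: records what happened on each call.
  readonly traceLog: string[] = [];

  // Subclasses implement only the underscore-prefixed method.
  protected abstract _getRelevantDocuments(query: string): Promise<Doc[]>;

  // Public entry point: wraps the subclass method with tracing.
  async invoke(query: string): Promise<Doc[]> {
    this.traceLog.push(`start: ${query}`);
    try {
      const docs = await this._getRelevantDocuments(query);
      this.traceLog.push(`end: ${docs.length} docs`);
      return docs;
    } catch (err) {
      this.traceLog.push(`error: ${String(err)}`);
      throw err;
    }
  }
}

// A subclass supplies retrieval logic and gets tracing without extra code.
class StaticSketchRetriever extends SketchBaseRetriever {
  protected async _getRelevantDocuments(query: string): Promise<Doc[]> {
    return [
      { pageContent: `Some document pertaining to ${query}`, metadata: {} },
    ];
  }
}
```

Callers use only the public method, which is why a subclass author never has to write tracing code in the underscore-prefixed method.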

Here's an example of a custom retriever that returns static documents:

import {
  BaseRetriever,
  type BaseRetrieverInput,
} from "@langchain/core/retrievers";
import type { CallbackManagerForRetrieverRun } from "@langchain/core/callbacks/manager";
import { Document } from "@langchain/core/documents";

export interface CustomRetrieverInput extends BaseRetrieverInput {}

export class CustomRetriever extends BaseRetriever {
  lc_namespace = ["langchain", "retrievers"];

  constructor(fields?: CustomRetrieverInput) {
    super(fields);
  }

  async _getRelevantDocuments(
    query: string,
    runManager?: CallbackManagerForRetrieverRun
  ): Promise<Document[]> {
    // Pass `runManager?.getChild()` when invoking internal runnables to enable tracing
    // const additionalDocs = await someOtherRunnable.invoke(params, runManager?.getChild());
    return [
      // ...additionalDocs,
      new Document({
        pageContent: `Some document pertaining to ${query}`,
        metadata: {},
      }),
      new Document({
        pageContent: `Some other document pertaining to ${query}`,
        metadata: {},
      }),
    ];
  }
}

Then, you can call .invoke() as follows:

const retriever = new CustomRetriever({});

await retriever.invoke("LangChain docs");

[
  Document {
    pageContent: 'Some document pertaining to LangChain docs',
    metadata: {}
  },
  Document {
    pageContent: 'Some other document pertaining to LangChain docs',
    metadata: {}
  }
]

Next steps

You've now seen an example of implementing your own custom retriever.

Next, check out the individual sections for deeper dives on specific retrievers, or the broader tutorial on RAG.
