ArxivRetriever

The arXiv Retriever lets users query the arXiv database for academic articles. It supports both full-document retrieval (with PDF parsing) and summary-based retrieval.

For detailed documentation of all ArxivRetriever features and configurations, head to the API reference.

Features

Query Flexibility: Search using natural language queries or specific arXiv IDs.
Full-Document Retrieval: Optionally fetch and parse the full PDFs.
Summaries as Documents: Retrieve summaries only, for faster results.
Customizable Options: Configure the maximum number of results and the output format (full documents or summaries).

Integration details

Retriever	Source	Package
`ArxivRetriever`	Academic articles from arXiv	`@langchain/community`

Setup

Ensure the following dependencies are installed:

pdf-parse for parsing PDFs
fast-xml-parser for parsing XML responses from the arXiv API

npm install pdf-parse fast-xml-parser
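The arXiv API answers queries with Atom XML feeds, which is why an XML parser is needed. As a rough illustration of the kind of request involved, the sketch below builds a search URL from the `search_query`, `start`, and `max_results` parameters described in the public arXiv API user manual. `buildArxivSearchUrl` is a hypothetical helper written for this page; ArxivRetriever may build its requests differently.

```typescript
// Hypothetical helper (not part of @langchain/community): builds an arXiv API
// search URL. Endpoint and parameter names follow the arXiv API user manual.
const ARXIV_API_ENDPOINT = "http://export.arxiv.org/api/query";

function buildArxivSearchUrl(query: string, maxResults = 5): string {
  const params = new URLSearchParams({
    // "all:" searches every field (title, abstract, authors, and so on).
    search_query: `all:${query}`,
    // Offset of the first result to return.
    start: "0",
    // Upper bound on the number of entries in the returned Atom feed.
    max_results: String(maxResults),
  });
  // URLSearchParams percent-encodes the values (spaces become "+").
  return `${ARXIV_API_ENDPOINT}?${params.toString()}`;
}

console.log(buildArxivSearchUrl("quantum computing"));
```

Fetching such a URL returns an Atom feed with one `<entry>` element per article, which is the kind of response fast-xml-parser turns into plain JavaScript objects.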

Instantiation

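import { ArxivRetriever } from "@langchain/community/retrievers/arxiv";

const retriever = new ArxivRetriever({
  getFullDocuments: false, // Set to true to fetch and parse full documents (PDFs) instead of summaries
  maxSearchResults: 5, // Maximum number of results to retrieve
});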

Usage

Use the invoke method to search arXiv for relevant articles. You can pass either a natural language query or a specific arXiv ID.
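Since invoke accepts both kinds of input, it helps to know what an arXiv identifier looks like. The sketch below uses a hypothetical `classifyQuery` helper (not part of the library, and not necessarily how ArxivRetriever makes the distinction) to separate new-style IDs such as `2101.00001` and pre-2007 IDs such as `hep-th/9901001` from free-text queries.

```typescript
// Hypothetical helper, not part of @langchain/community: decides whether a
// query string looks like an arXiv identifier or free-form search text.
type ArxivQuery =
  | { kind: "id"; id: string }
  | { kind: "search"; text: string };

// New-style IDs: YYMM.NNNN or YYMM.NNNNN, with an optional version such as v2.
const NEW_STYLE_ID = /^\d{4}\.\d{4,5}(v\d+)?$/;
// Old-style IDs: archive[.SUBJECT]/YYMMNNN, e.g. hep-th/9901001 or math.GT/0309136.
const OLD_STYLE_ID = /^[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/;

function classifyQuery(raw: string): ArxivQuery {
  // Strip an optional "arXiv:" prefix, since IDs are often cited that way.
  const candidate = raw.trim().replace(/^arxiv:/i, "");
  if (NEW_STYLE_ID.test(candidate) || OLD_STYLE_ID.test(candidate)) {
    return { kind: "id", id: candidate };
  }
  return { kind: "search", text: raw.trim() };
}

console.log(classifyQuery("quantum computing"));
console.log(classifyQuery("arXiv:2101.00001"));
console.log(classifyQuery("hep-th/9901001v1"));
```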

const query = "quantum computing";

const documents = await retriever.invoke(query);
documents.forEach((doc) => {
  console.log("Title:", doc.metadata.title);
  console.log("Content:", doc.pageContent); // Summary, or parsed PDF text when getFullDocuments is true
});

Use within a chain

Like other retrievers, ArxivRetriever can be incorporated into LLM applications via chains. The example below reuses the retriever instance created above and calls ChatOpenAI from @langchain/openai, which expects an OPENAI_API_KEY environment variable:

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import type { Document } from "@langchain/core/documents";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const prompt = ChatPromptTemplate.fromTemplate(`
Answer the question based only on the context provided.

Context: {context}

Question: {question}`);

const formatDocs = (docs: Document[]) => {
  return docs.map((doc) => doc.pageContent).join("\n\n");
};

const ragChain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocs),
    question: new RunnablePassthrough(),
  },
  prompt,
  llm,
  new StringOutputParser(),
]);

const answer = await ragChain.invoke(
  "What are the latest advances in quantum computing?"
);
console.log(answer);
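To make the data flow of this chain concrete, here is a dependency-free sketch that roughly mirrors it: the object in the first step sends the same question to both keys (the retriever branch and the passthrough), the results fill the prompt template, and the model output comes back as a string. `fakeRetriever` and `fakeLlm` are hypothetical stand-ins; this is not how RunnableSequence is implemented internally.

```typescript
// Dependency-free sketch of the RAG chain's data flow. The retriever and
// model below are hypothetical stand-ins, not LangChain classes.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Stand-in for retriever.invoke(): returns two canned documents.
async function fakeRetriever(query: string): Promise<Doc[]> {
  return [
    { pageContent: `Summary A about ${query}`, metadata: { title: "Paper A" } },
    { pageContent: `Summary B about ${query}`, metadata: { title: "Paper B" } },
  ];
}

// Same logic as formatDocs in the chain: join page contents with blank lines.
const formatDocs = (docs: Doc[]): string =>
  docs.map((doc) => doc.pageContent).join("\n\n");

// Fills the {context} and {question} placeholders of the prompt template.
function formatPrompt(context: string, question: string): string {
  return `Answer the question based only on the context provided.\n\nContext: ${context}\n\nQuestion: ${question}`;
}

// Stand-in for the chat model plus StringOutputParser: returns a fixed string.
async function fakeLlm(prompt: string): Promise<string> {
  return `(model answer for a ${prompt.length}-character prompt)`;
}

async function ragChain(question: string): Promise<string> {
  // Step 1: both branches receive the same input; the question passes through unchanged.
  const [context, passthrough] = await Promise.all([
    fakeRetriever(question).then(formatDocs),
    Promise.resolve(question),
  ]);
  // Steps 2 and 3: fill the prompt, then call the model and return its text.
  return fakeLlm(formatPrompt(context, passthrough));
}

ragChain("What are the latest advances in quantum computing?").then(console.log);
```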

API reference

For detailed documentation of all ArxivRetriever features and configurations, head to the API reference.

Related

Retriever conceptual guide
Retriever how-to guides
