ArxivRetriever
The arXiv Retriever allows users to query the arXiv database for academic articles. It supports both full-document retrieval (PDF parsing) and summary-based retrieval.
For detailed documentation of all ArxivRetriever features and configurations, head to the API reference.
Features
Query flexibility: Search using natural language queries or specific arXiv IDs.
Full-document retrieval: Optionally fetch and parse each article's PDF.
Summaries as documents: Retrieve article summaries instead of full text for faster results.
Customizable options: Configure the maximum number of results and the output format.
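To make the query-flexibility point concrete, here is a minimal sketch of how a query string could be routed to the public arXiv API at export.arxiv.org/api/query: input that looks like an arXiv ID becomes an `id_list` lookup, and anything else becomes an `all:` free-text `search_query`. The helpers `isArxivId` and `buildArxivQueryUrl` and the ID regex are hypothetical and are not part of `@langchain/community`; ArxivRetriever's internal logic may differ.

```typescript
// Hypothetical helpers (NOT part of @langchain/community): a sketch of how a
// query could be routed to the public arXiv API.

// Assumed ID formats: new-style "2101.00001" / "2101.00001v2" and
// old-style "hep-th/9901001" / "math.GT/0309136".
const ARXIV_ID_PATTERN =
  /^(\d{4}\.\d{4,5}(v\d+)?|[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?)$/;

function isArxivId(query: string): boolean {
  return ARXIV_ID_PATTERN.test(query.trim());
}

function buildArxivQueryUrl(query: string, maxResults: number): string {
  const q = query.trim();
  // IDs use an exact id_list lookup; other text searches all fields via "all:".
  const param = isArxivId(q)
    ? `id_list=${encodeURIComponent(q)}`
    : `search_query=${encodeURIComponent(`all:${q}`)}`;
  return `https://export.arxiv.org/api/query?${param}&max_results=${maxResults}`;
}

console.log(buildArxivQueryUrl("2101.00001", 5));
console.log(buildArxivQueryUrl("quantum computing", 5));
```

Keeping ID detection in its own function makes the routing rule easy to test without any network call.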
Integration details
| Retriever | Source | Package |
|---|---|---|
| ArxivRetriever | Academic articles from arXiv | @langchain/community |
Setup
Ensure the following dependencies are installed:
pdf-parse for parsing PDFs
fast-xml-parser for parsing XML responses from the arXiv API
npm install pdf-parse fast-xml-parser
Instantiation
import { ArxivRetriever } from "@langchain/community/retrievers/arxiv";

const retriever = new ArxivRetriever({
getFullDocuments: false, // Set to true to fetch and parse full documents (PDFs)
maxSearchResults: 5, // Maximum number of results to retrieve
});
Usage
Use the invoke method to search arXiv for relevant articles. You can use either natural language queries or specific arXiv IDs.
const query = "quantum computing";
const documents = await retriever.invoke(query);
documents.forEach((doc) => {
console.log("Title:", doc.metadata.title);
console.log("Content:", doc.pageContent); // Summary by default; parsed PDF text when getFullDocuments is true
});
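When `getFullDocuments` is `true`, `pageContent` can hold an entire parsed paper, so printing it in full as in the usage example quickly becomes unreadable. The sketch below turns returned documents into a compact numbered list. `DocLike` is an assumed minimal stand-in for LangChain's `Document` (only `pageContent` and `metadata.title` are used, as in the example), and `toReadingList` is a hypothetical helper, not part of any LangChain package.

```typescript
// Assumed minimal stand-in for LangChain's Document type: only the fields
// used in the usage example (pageContent and metadata.title) are relied on.
interface DocLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: one numbered line per result, with whitespace collapsed
// and the content cut to at most `maxChars` characters.
function toReadingList(docs: DocLike[], maxChars = 80): string[] {
  return docs.map((doc, i) => {
    const rawTitle = doc.metadata.title;
    const title = typeof rawTitle === "string" ? rawTitle : "(untitled)";
    const text = doc.pageContent.replace(/\s+/g, " ").trim();
    const preview =
      text.length > maxChars ? `${text.slice(0, maxChars)}...` : text;
    return `${i + 1}. ${title}: ${preview}`;
  });
}

// Placeholder input for illustration only; real documents come from the retriever.
const sample: DocLike[] = [
  { pageContent: "Placeholder\n  abstract text.", metadata: { title: "Placeholder title" } },
];
toReadingList(sample).forEach((line) => console.log(line));
```

With a real retriever, `toReadingList(documents).forEach((line) => console.log(line))` could replace the `forEach` loop in the usage example.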
Use within a chain
Like other retrievers, ArxivRetriever can be incorporated into LLM applications via chains. Below is an example of using the retriever within a chain:
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import {
RunnablePassthrough,
RunnableSequence,
} from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import type { Document } from "@langchain/core/documents";
const llm = new ChatOpenAI({
model: "gpt-4o-mini",
temperature: 0,
});
const prompt = ChatPromptTemplate.fromTemplate(`
Answer the question based only on the context provided.
Context: {context}
Question: {question}`);
const formatDocs = (docs: Document[]) => {
return docs.map((doc) => doc.pageContent).join("\n\n");
};
const ragChain = RunnableSequence.from([
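{
// Retrieve articles and join their text into a single context string
context: retriever.pipe(formatDocs),
// Forward the user's question unchanged
question: new RunnablePassthrough(),
},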
prompt,
llm,
new StringOutputParser(),
]);
const answer = await ragChain.invoke("What are the latest advances in quantum computing?");
console.log(answer);
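The first step of `ragChain` runs the retriever and `RunnablePassthrough` on the same input, then `formatDocs` joins the article texts with blank lines before the prompt is filled. The dependency-free sketch below reproduces only that data flow so the final prompt string can be inspected without calling arXiv or OpenAI. `fillPrompt` and `buildRagPrompt` are hypothetical, and `fillPrompt` is a simplified stand-in for `ChatPromptTemplate`'s `{variable}` substitution, not its actual implementation.

```typescript
// Assumed minimal stand-in for LangChain's Document type.
interface DocLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Same joining logic as formatDocs in the chain example.
const formatDocs = (docs: DocLike[]): string =>
  docs.map((doc) => doc.pageContent).join("\n\n");

// Hypothetical, simplified {variable} substitution. Unknown placeholders are
// left as-is, and substituted values are not rescanned for placeholders.
function fillPrompt(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    key in values ? values[key] : match
  );
}

const template = `Answer the question based only on the context provided.
Context: {context}
Question: {question}`;

// Mirrors the chain's first step: context from the documents, question passed through.
function buildRagPrompt(docs: DocLike[], question: string): string {
  return fillPrompt(template, { context: formatDocs(docs), question });
}

// Placeholder document for illustration only.
console.log(
  buildRagPrompt([{ pageContent: "First article text.", metadata: {} }], "What is covered?")
);
```

Printing this intermediate string is a cheap way to check how much retrieved text reaches the model, which matters most when `getFullDocuments` is `true`.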
API reference
For detailed documentation of all ArxivRetriever features and configurations, head to the API reference.
Related
Retriever conceptual guide
Retriever how-to guides