Cohere Rerank
对文档进行重新排序可以极大地改进任何 RAG 应用和文档检索系统。
¥Reranking documents can greatly improve any RAG application and document retrieval system.
从高层次来看,rerank API 是一种语言模型,它分析文档并根据其与给定查询的相关性对其进行重新排序。
¥At a high level, a rerank API is a language model which analyzes documents and reorders them based on their relevance to a given query.
Cohere 提供了用于对文档进行重新排序的 API。在本例中,我们将向你展示如何使用它。
¥Cohere offers an API for reranking documents. In this example we'll show you how to use it.
设置
¥Setup
- npm
- Yarn
- pnpm
npm install @langchain/cohere @langchain/core
yarn add @langchain/cohere @langchain/core
pnpm add @langchain/cohere @langchain/core
import { CohereRerank } from "@langchain/cohere";
import { Document } from "@langchain/core/documents";
const query = "What is the capital of the United States?";
const docs = [
new Document({
pageContent:
"Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
}),
new Document({
pageContent:
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
}),
new Document({
pageContent:
"Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
}),
new Document({
pageContent:
"Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.",
}),
new Document({
pageContent:
"Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.",
}),
];
const cohereRerank = new CohereRerank({
apiKey: process.env.COHERE_API_KEY, // Default
model: "rerank-english-v2.0",
});
const rerankedDocuments = await cohereRerank.rerank(docs, query, {
topN: 5,
});
console.log(rerankedDocuments);
/**
[
{ index: 3, relevanceScore: 0.9871293 },
{ index: 1, relevanceScore: 0.29961726 },
{ index: 4, relevanceScore: 0.27542195 },
{ index: 0, relevanceScore: 0.08977329 },
{ index: 2, relevanceScore: 0.041462272 }
]
*/
API Reference:
- CohereRerank from
@langchain/cohere - Document from
@langchain/core/documents
这里,我们可以看到 .rerank() 方法仅返回文档的索引(与输入文档的索引匹配)及其相关性得分。
¥Here, we can see the .rerank() method returns just the index of the documents (matching the indexes of the input documents) and their relevancy scores.
如果我们想从方法本身返回文档,我们可以使用 .compressDocuments() 方法。
¥If we'd like to have the documents returned from the method itself, we can use the .compressDocuments() method.
import { CohereRerank } from "@langchain/cohere";
import { Document } from "@langchain/core/documents";
const query = "What is the capital of the United States?";
const docs = [
new Document({
pageContent:
"Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
}),
new Document({
pageContent:
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
}),
new Document({
pageContent:
"Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
}),
new Document({
pageContent:
"Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.",
}),
new Document({
pageContent:
"Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.",
}),
];
const cohereRerank = new CohereRerank({
apiKey: process.env.COHERE_API_KEY, // Default
topN: 3, // Default
model: "rerank-english-v2.0",
});
const rerankedDocuments = await cohereRerank.compressDocuments(docs, query);
console.log(rerankedDocuments);
/**
[
Document {
pageContent: 'Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.',
metadata: { relevanceScore: 0.9871293 }
},
Document {
pageContent: 'The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.',
metadata: { relevanceScore: 0.29961726 }
},
Document {
pageContent: 'Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.',
metadata: { relevanceScore: 0.27542195 }
}
]
*/
API Reference:
- CohereRerank from
@langchain/cohere - Document from
@langchain/core/documents
从结果中,我们可以看到它返回了排名前 3 位的文档,并为每个文档分配了一个 relevanceScore。
¥From the results, we can see it returned the top 3 documents, and assigned a relevanceScore to each.
不出所料,relevanceScore 得分最高的文档是引用华盛顿特区的文档,得分为 98.7%!
¥As expected, the document with the highest relevanceScore is the one that references Washington, D.C., with a score of 98.7%!
与 CohereClient 结合使用
¥Usage with CohereClient
如果你在 Azure、AWS Bedrock 或独立实例上使用 Cohere,则可以使用 CohereClient 和你的端点创建 CohereRerank 实例。
¥If you are using Cohere on Azure, AWS Bedrock or a standalone instance you can use the CohereClient to create a CohereRerank instance with your endpoint.
import { CohereRerank } from "@langchain/cohere";
import { CohereClient } from "cohere-ai";
import { Document } from "@langchain/core/documents";
const query = "What is the capital of the United States?";
const docs = [
new Document({
pageContent:
"Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
}),
new Document({
pageContent:
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
}),
new Document({
pageContent:
"Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
}),
new Document({
pageContent:
"Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.",
}),
new Document({
pageContent:
"Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.",
}),
];
const client = new CohereClient({
token: process.env.COHERE_API_KEY,
environment: "<your-cohere-deployment-url>", // optional
// other params
});
const cohereRerank = new CohereRerank({
client, // apiKey will be ignored even if provided
model: "rerank-english-v2.0",
});
const rerankedDocuments = await cohereRerank.rerank(docs, query, {
topN: 5,
});
console.log(rerankedDocuments);
/*
[
{ index: 3, relevanceScore: 0.9871293 },
{ index: 1, relevanceScore: 0.29961726 },
{ index: 4, relevanceScore: 0.27542195 },
{ index: 0, relevanceScore: 0.08977329 },
{ index: 2, relevanceScore: 0.041462272 }
]
*/
API Reference:
- CohereRerank from
@langchain/cohere - Document from
@langchain/core/documents