How to select examples by similarity
This guide assumes familiarity with the following concepts:
This object selects examples based on similarity to the inputs. It does this by finding the examples whose embeddings have the greatest cosine similarity with the inputs.
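For intuition, "greatest cosine similarity" here means the angle between two embedding vectors is smallest. A minimal sketch of the metric itself (this is for illustration only, not the library's internal implementation):

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
// Ranges from -1 (opposite) through 0 (orthogonal) to 1 (same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way score 1, regardless of magnitude;
// orthogonal vectors score 0.
console.log(cosineSimilarity([1, 0], [2, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

The example selector embeds the incoming input, then returns the stored examples whose embeddings score highest under this metric.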
The fields of the examples object will be used as parameters to format the examplePrompt passed to the FewShotPromptTemplate. Each example should therefore contain all required fields for the example prompt you are using.
- npm
- Yarn
- pnpm
npm install @langchain/openai @langchain/community @langchain/core
yarn add @langchain/openai @langchain/community @langchain/core
pnpm add @langchain/openai @langchain/community @langchain/core
import { OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";
// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(
"Input: {input}\nOutput: {output}"
);
// Create a SemanticSimilarityExampleSelector that will be used to select the examples.
const exampleSelector = await SemanticSimilarityExampleSelector.fromExamples(
[
{ input: "happy", output: "sad" },
{ input: "tall", output: "short" },
{ input: "energetic", output: "lethargic" },
{ input: "sunny", output: "gloomy" },
{ input: "windy", output: "calm" },
],
new OpenAIEmbeddings(),
HNSWLib,
{ k: 1 }
);
// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: "Give the antonym of every input",
suffix: "Input: {adjective}\nOutput:",
inputVariables: ["adjective"],
});
// Input is about the weather, so it should select e.g. the sunny/gloomy example
console.log(await dynamicPrompt.format({ adjective: "rainy" }));
/*
Give the antonym of every input
Input: sunny
Output: gloomy
Input: rainy
Output:
*/
// Input is a measurement, so it should select the tall/short example
console.log(await dynamicPrompt.format({ adjective: "large" }));
/*
Give the antonym of every input
Input: tall
Output: short
Input: large
Output:
*/
API Reference:
- OpenAIEmbeddings from
@langchain/openai
- HNSWLib from
@langchain/community/vectorstores/hnswlib
- PromptTemplate from
@langchain/core/prompts
- FewShotPromptTemplate from
@langchain/core/prompts
- SemanticSimilarityExampleSelector from
@langchain/core/example_selectors
By default, each field in the examples object is concatenated together, embedded, and stored in the vectorstore for later similarity search against user queries.
If you only want to embed specific keys (e.g., you only want to search for examples that have a similar query to the one the user provides), you can pass an inputKeys array in the final options parameter.
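Conceptually, the effect of inputKeys looks like the following sketch (hypothetical helper, not the library's actual internals): by default every field's value contributes to the embedded text, while inputKeys restricts it to the listed fields.

```typescript
// Sketch only: how an example's fields might be turned into the
// text that gets embedded for similarity search.
type Example = Record<string, string>;

function textToEmbed(example: Example, inputKeys?: string[]): string {
  // Default: concatenate every field value.
  // With inputKeys: only the listed fields contribute.
  const keys = inputKeys ?? Object.keys(example);
  return keys.map((k) => example[k]).join(" ");
}

const example = { input: "happy", output: "sad" };
console.log(textToEmbed(example)); // "happy sad"
console.log(textToEmbed(example, ["input"])); // "happy"
```

Restricting the embedded text to the user-facing field usually gives better retrieval, since the outputs would otherwise dilute the similarity signal.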
Loading from an existing vectorstore
You can also use a pre-initialized vector store by passing an instance to the SemanticSimilarityExampleSelector constructor directly, as shown below. You can also add more examples via the addExample method:
// Ephemeral, in-memory vector store for demo purposes
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";
const embeddings = new OpenAIEmbeddings();
const memoryVectorStore = new MemoryVectorStore(embeddings);
const examples = [
{
query: "healthy food",
output: `galbi`,
},
{
query: "healthy food",
output: `schnitzel`,
},
{
query: "foo",
output: `bar`,
},
];
const exampleSelector = new SemanticSimilarityExampleSelector({
vectorStore: memoryVectorStore,
k: 2,
// Only embed the "query" key of each example
inputKeys: ["query"],
});
for (const example of examples) {
// Format and add an example to the underlying vector store
await exampleSelector.addExample(example);
}
// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);
// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: `Answer the user's question, using the below examples as reference:`,
suffix: "User question: {query}",
inputVariables: ["query"],
});
const formattedValue = await dynamicPrompt.format({
query: "What is a healthy food?",
});
console.log(formattedValue);
/*
Answer the user's question, using the below examples as reference:
<example>
<user_input>
healthy
</user_input>
<output>
galbi
</output>
</example>
<example>
<user_input>
healthy
</user_input>
<output>
schnitzel
</output>
</example>
User question: What is a healthy food?
*/
const model = new ChatOpenAI({});
const chain = dynamicPrompt.pipe(model);
const result = await chain.invoke({ query: "What is a healthy food?" });
console.log(result);
/*
AIMessage {
content: 'A healthy food can be galbi or schnitzel.',
additional_kwargs: { function_call: undefined }
}
*/
API Reference:
- MemoryVectorStore from
langchain/vectorstores/memory
- OpenAIEmbeddings from
@langchain/openai
- ChatOpenAI from
@langchain/openai
- PromptTemplate from
@langchain/core/prompts
- FewShotPromptTemplate from
@langchain/core/prompts
- SemanticSimilarityExampleSelector from
@langchain/core/example_selectors
Metadata filtering
When adding examples, each field is available as metadata in the produced document. If you would like further control over your search space, you can add extra fields to your examples and pass a filter parameter when initializing your selector:
// Ephemeral, in-memory vector store for demo purposes
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { Document } from "@langchain/core/documents";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";
const embeddings = new OpenAIEmbeddings();
const memoryVectorStore = new MemoryVectorStore(embeddings);
const examples = [
{
query: "healthy food",
output: `lettuce`,
food_type: "vegetable",
},
{
query: "healthy food",
output: `schnitzel`,
food_type: "veal",
},
{
query: "foo",
output: `bar`,
food_type: "baz",
},
];
const exampleSelector = new SemanticSimilarityExampleSelector({
vectorStore: memoryVectorStore,
k: 2,
// Only embed the "query" key of each example
inputKeys: ["query"],
// Filter type will depend on your specific vector store.
// See the section of the docs for the specific vector store you are using.
filter: (doc: Document) => doc.metadata.food_type === "vegetable",
});
for (const example of examples) {
// Format and add an example to the underlying vector store
await exampleSelector.addExample(example);
}
// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);
// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: `Answer the user's question, using the below examples as reference:`,
suffix: "User question:\n{query}",
inputVariables: ["query"],
});
const model = new ChatOpenAI({});
const chain = dynamicPrompt.pipe(model);
const result = await chain.invoke({
query: "What is exactly one type of healthy food?",
});
console.log(result);
/*
AIMessage {
content: 'One type of healthy food is lettuce.',
additional_kwargs: { function_call: undefined }
}
*/
API Reference:
- MemoryVectorStore from
langchain/vectorstores/memory
- OpenAIEmbeddings from
@langchain/openai
- ChatOpenAI from
@langchain/openai
- PromptTemplate from
@langchain/core/prompts
- FewShotPromptTemplate from
@langchain/core/prompts
- Document from
@langchain/core/documents
- SemanticSimilarityExampleSelector from
@langchain/core/example_selectors
Custom vectorstore retrievers
You can also pass a vectorstore retriever instead of a vectorstore. This can be useful if you want to use a retrieval method other than plain similarity search, such as maximal marginal relevance:
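For intuition, maximal marginal relevance (MMR) greedily picks items that are relevant to the query but dissimilar to items already selected, trading relevance against diversity. A simplified sketch (not the vector store's actual implementation; `mmrSelect`, its inputs, and the lambda default are illustrative):

```typescript
// Sketch of maximal marginal relevance over precomputed scores.
// `similarities[i]` is the query-document score for document i;
// `pairwise[i][j]` is the similarity between documents i and j.
// `lambda` trades relevance (1) against diversity (0).
function mmrSelect(
  similarities: number[],
  pairwise: number[][],
  k: number,
  lambda = 0.5
): number[] {
  const selected: number[] = [];
  const remaining = similarities.map((_, i) => i);
  while (selected.length < k && remaining.length > 0) {
    let bestIdx = -1;
    let bestScore = -Infinity;
    for (const i of remaining) {
      // Penalize candidates that resemble something already picked.
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(...selected.map((j) => pairwise[i][j]));
      const score = lambda * similarities[i] - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }
    selected.push(bestIdx);
    remaining.splice(remaining.indexOf(bestIdx), 1);
  }
  return selected;
}

// With two near-duplicate relevant docs (0 and 1), MMR picks the most
// relevant one first, then a diverse doc instead of the duplicate.
const picked = mmrSelect(
  [0.9, 0.85, 0.2],
  [
    [1, 0.95, 0.1],
    [0.95, 1, 0.1],
    [0.1, 0.1, 1],
  ],
  2
);
console.log(picked); // [0, 2]
```

This is why MMR helps example selection: with several near-identical examples in the store, plain similarity search would return all of them, while MMR spends the budget of k slots on more varied examples.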
/* eslint-disable @typescript-eslint/no-non-null-assertion */
// Requires a vectorstore that supports maximal marginal relevance search
import { Pinecone } from "@pinecone-database/pinecone";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";
const pinecone = new Pinecone();
const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX!);
/**
* Pinecone allows you to partition the records in an index into namespaces.
* Queries and other operations are then limited to one namespace,
* so different requests can search different subsets of your index.
* Read more about namespaces here: https://docs.pinecone.io/guides/indexes/use-namespaces
*
* NOTE: If you have namespace enabled in your Pinecone index, you must provide the namespace when creating the PineconeStore.
*/
const namespace = "pinecone";
const pineconeVectorstore = await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings(),
{ pineconeIndex, namespace }
);
const pineconeMmrRetriever = pineconeVectorstore.asRetriever({
searchType: "mmr",
k: 2,
});
const examples = [
{
query: "healthy food",
output: `lettuce`,
food_type: "vegetable",
},
{
query: "healthy food",
output: `schnitzel`,
food_type: "veal",
},
{
query: "foo",
output: `bar`,
food_type: "baz",
},
];
const exampleSelector = new SemanticSimilarityExampleSelector({
vectorStoreRetriever: pineconeMmrRetriever,
// Only embed the "query" key of each example
inputKeys: ["query"],
});
for (const example of examples) {
// Format and add an example to the underlying vector store
await exampleSelector.addExample(example);
}
// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);
// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: `Answer the user's question, using the below examples as reference:`,
suffix: "User question:\n{query}",
inputVariables: ["query"],
});
const model = new ChatOpenAI({});
const chain = dynamicPrompt.pipe(model);
const result = await chain.invoke({
query: "What is exactly one type of healthy food?",
});
console.log(result);
/*
AIMessage {
content: 'lettuce.',
additional_kwargs: { function_call: undefined }
}
*/
API Reference:
- OpenAIEmbeddings from
@langchain/openai
- ChatOpenAI from
@langchain/openai
- PineconeStore from
@langchain/pinecone
- PromptTemplate from
@langchain/core/prompts
- FewShotPromptTemplate from
@langchain/core/prompts
- SemanticSimilarityExampleSelector from
@langchain/core/example_selectors
Next steps
You've now learned a bit about using similarity in an example selector.

Next, check out this guide on how to use a length-based example selector.