如何使用提示来改进结果
¥How to use prompting to improve results
本指南假设你熟悉以下内容:
¥This guide assumes familiarity with the following:
在本指南中,我们将介绍用于改进 SQL 查询生成的提示策略。我们将主要关注在提示中获取相关数据库特定信息的方法。
¥In this guide we'll go over prompting strategies to improve SQL query generation. We'll largely focus on methods for getting relevant database-specific information in your prompt.
设置
¥Setup
首先,安装所需的包并设置环境变量。本示例将使用 OpenAI 作为 LLM。
¥First, install the required packages and set your environment variables. This example will use OpenAI as the LLM.
npm install @langchain/community @langchain/openai typeorm sqlite3
export OPENAI_API_KEY="your api key"
# Uncomment the below to use LangSmith. Not required.
# export LANGSMITH_API_KEY="your api key"
# export LANGSMITH_TRACING=true
# Reduce tracing latency if you are not in a serverless environment
# export LANGCHAIN_CALLBACKS_BACKGROUND=true
以下示例将使用与 Chinook 数据库的 SQLite 连接。按照以下 安装步骤 操作,在与本注意本相同的目录中创建 Chinook.db
:
¥The below example will use a SQLite connection with Chinook database. Follow these installation steps to create Chinook.db
in the same directory as this notebook:
将 this 文件保存为
Chinook_Sqlite.sql
¥Save this file as
Chinook_Sqlite.sql
运行 sqlite3
Chinook.db
¥Run sqlite3
Chinook.db
运行
.read Chinook_Sqlite.sql
¥Run
.read Chinook_Sqlite.sql
测试
SELECT * FROM Artist LIMIT 10;
¥Test
SELECT * FROM Artist LIMIT 10;
现在,Chinhook.db
已在我们的目录中,我们可以使用 Typeorm 驱动的 SqlDatabase
类与其进行交互:
¥Now, Chinhook.db
is in our directory and we can interface with it using the Typeorm-driven SqlDatabase
class:
import { SqlDatabase } from "langchain/sql_db";
import { DataSource } from "typeorm";
const datasource = new DataSource({
type: "sqlite",
database: "../../../../Chinook.db",
});
const db = await SqlDatabase.fromDataSourceParams({
appDataSource: datasource,
});
console.log(db.allTables.map((t) => t.tableName));
/**
[
'Album', 'Artist',
'Customer', 'Employee',
'Genre', 'Invoice',
'InvoiceLine', 'MediaType',
'Playlist', 'PlaylistTrack',
'Track'
]
*/
API Reference:
- SqlDatabase from
langchain/sql_db
特定于方言的提示
¥Dialect-specific prompting
我们可以做的最简单的事情之一就是使我们的提示特定于我们正在使用的 SQL 方言。使用内置 createSqlQueryChain
和 SqlDatabase
时,系统会为你处理以下任何一种方言:
¥One of the simplest things we can do is make our prompt specific to the SQL dialect we're using.
When using the built-in createSqlQueryChain
and SqlDatabase
, this is handled for you for any of the following dialects:
import { SQL_PROMPTS_MAP } from "langchain/chains/sql_db";
console.log({ SQL_PROMPTS_MAP: Object.keys(SQL_PROMPTS_MAP) });
/**
{
SQL_PROMPTS_MAP: [ 'oracle', 'postgres', 'sqlite', 'mysql', 'mssql', 'sap hana' ]
}
*/
// For example, using our current DB we can see that we’ll get a SQLite-specific prompt:
console.log({
sqlite: SQL_PROMPTS_MAP.sqlite,
});
/**
{
sqlite: PromptTemplate {
inputVariables: [ 'dialect', 'table_info', 'input', 'top_k' ],
template: 'You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.\n' +
'Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database.\n' +
'Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.\n' +
'Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.\n' +
'\n' +
'Use the following format:\n' +
'\n' +
'Question: "Question here"\n' +
'SQLQuery: "SQL Query to run"\n' +
'SQLResult: "Result of the SQLQuery"\n' +
'Answer: "Final answer here"\n' +
'\n' +
'Only use the following tables:\n' +
'{table_info}\n' +
'\n' +
'Question: {input}',
}
}
*/
API Reference:
- SQL_PROMPTS_MAP from
langchain/chains/sql_db
表定义和示例行
¥Table definitions and example rows
基本上在任何 SQL 链中,我们都需要向模型提供至少一部分数据库模式。如果没有这个,它将无法编写有效的查询。我们的数据库自带一些便捷方法,以便提供相关上下文。具体来说,我们可以获取表名、表结构以及每个表中的行样本:
¥In basically any SQL chain, we'll need to feed the model at least part of the database schema. Without this it won't be able to write valid queries. Our database comes with some convenience methods to give us the relevant context. Specifically, we can get the table names, their schemas, and a sample of rows from each table:
import { db } from "../db.js";
const context = await db.getTableInfo();
console.log(context);
/**
CREATE TABLE Album (
AlbumId INTEGER NOT NULL,
Title NVARCHAR(160) NOT NULL,
ArtistId INTEGER NOT NULL
)
SELECT * FROM "Album" LIMIT 3;
AlbumId Title ArtistId
1 For Those About To Rock We Salute You 1
2 Balls to the Wall 2
3 Restless and Wild 2
CREATE TABLE Artist (
ArtistId INTEGER NOT NULL,
Name NVARCHAR(120)
)
SELECT * FROM "Artist" LIMIT 3;
ArtistId Name
1 AC/DC
2 Accept
3 Aerosmith
CREATE TABLE Customer (
CustomerId INTEGER NOT NULL,
FirstName NVARCHAR(40) NOT NULL,
LastName NVARCHAR(20) NOT NULL,
Company NVARCHAR(80),
Address NVARCHAR(70),
City NVARCHAR(40),
State NVARCHAR(40),
Country NVARCHAR(40),
PostalCode NVARCHAR(10),
Phone NVARCHAR(24),
Fax NVARCHAR(24),
Email NVARCHAR(60) NOT NULL,
SupportRepId INTEGER
)
SELECT * FROM "Customer" LIMIT 3;
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
1 Luís Gonçalves Embraer - Empresa Brasileira de Aeronáutica S.A. Av. Brigadeiro Faria Lima,
2170 São José dos Campos SP Brazil 12227-000 +55 (12) 3923-5555 +55 (12) 3923-5566 luisg@embraer.com.br 3
2 Leonie Köhler null Theodor-Heuss-Straße 34 Stuttgart null Germany 70174 +49 0711 2842222 null leonekohler@surfeu.de 5
3 François Tremblay null 1498 rue Bélanger Montréal QC Canada H2G 1A7 +1 (514) 721-4711 null ftremblay@gmail.com 3
CREATE TABLE Employee (
EmployeeId INTEGER NOT NULL,
LastName NVARCHAR(20) NOT NULL,
FirstName NVARCHAR(20) NOT NULL,
Title NVARCHAR(30),
ReportsTo INTEGER,
BirthDate DATETIME,
HireDate DATETIME,
Address NVARCHAR(70),
City NVARCHAR(40),
State NVARCHAR(40),
Country NVARCHAR(40),
PostalCode NVARCHAR(10),
Phone NVARCHAR(24),
Fax NVARCHAR(24),
Email NVARCHAR(60)
)
SELECT * FROM "Employee" LIMIT 3;
EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City State Country PostalCode Phone Fax Email
1 Adams Andrew General Manager null 1962-02-18 00:00:00 2002-08-14 00:00:00 11120 Jasper Ave NW Edmonton AB Canada T5K 2N1 +1 (780) 428-9482 +1 (780) 428-3457 andrew@chinookcorp.com
2 Edwards Nancy Sales Manager 1 1958-12-08 00:00:00 2002-05-01 00:00:00 825 8 Ave SW Calgary AB Canada T2P 2T3 +1 (403) 262-3443 +1 (403) 262-3322 nancy@chinookcorp.com
3 Peacock Jane Sales Support Agent 2 1973-08-29 00:00:00 2002-04-01 00:00:00 1111 6 Ave SW Calgary AB Canada T2P 5M5 +1 (403) 262-3443 +1 (403) 262-6712 jane@chinookcorp.com
CREATE TABLE Genre (
GenreId INTEGER NOT NULL,
Name NVARCHAR(120)
)
SELECT * FROM "Genre" LIMIT 3;
GenreId Name
1 Rock
2 Jazz
3 Metal
CREATE TABLE Invoice (
InvoiceId INTEGER NOT NULL,
CustomerId INTEGER NOT NULL,
InvoiceDate DATETIME NOT NULL,
BillingAddress NVARCHAR(70),
BillingCity NVARCHAR(40),
BillingState NVARCHAR(40),
BillingCountry NVARCHAR(40),
BillingPostalCode NVARCHAR(10),
Total NUMERIC(10,2) NOT NULL
)
SELECT * FROM "Invoice" LIMIT 3;
InvoiceId CustomerId InvoiceDate BillingAddress BillingCity BillingState BillingCountry BillingPostalCode Total
1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34 Stuttgart null Germany 70174 1.98
2 4 2009-01-02 00:00:00 Ullevålsveien 14 Oslo null Norway 0171 3.96
3 8 2009-01-03 00:00:00 Grétrystraat 63 Brussels null Belgium 1000 5.94
CREATE TABLE InvoiceLine (
InvoiceLineId INTEGER NOT NULL,
InvoiceId INTEGER NOT NULL,
TrackId INTEGER NOT NULL,
UnitPrice NUMERIC(10,2) NOT NULL,
Quantity INTEGER NOT NULL
)
SELECT * FROM "InvoiceLine" LIMIT 3;
InvoiceLineId InvoiceId TrackId UnitPrice Quantity
1 1 2 0.99 1
2 1 4 0.99 1
3 2 6 0.99 1
CREATE TABLE MediaType (
MediaTypeId INTEGER NOT NULL,
Name NVARCHAR(120)
)
SELECT * FROM "MediaType" LIMIT 3;
MediaTypeId Name
1 MPEG audio file
2 Protected AAC audio file
3 Protected MPEG-4 video file
CREATE TABLE Playlist (
PlaylistId INTEGER NOT NULL,
Name NVARCHAR(120)
)
SELECT * FROM "Playlist" LIMIT 3;
PlaylistId Name
1 Music
2 Movies
3 TV Shows
CREATE TABLE PlaylistTrack (
PlaylistId INTEGER NOT NULL,
TrackId INTEGER NOT NULL
)
SELECT * FROM "PlaylistTrack" LIMIT 3;
PlaylistId TrackId
1 3402
1 3389
1 3390
CREATE TABLE Track (
TrackId INTEGER NOT NULL,
Name NVARCHAR(200) NOT NULL,
AlbumId INTEGER,
MediaTypeId INTEGER NOT NULL,
GenreId INTEGER,
Composer NVARCHAR(220),
Milliseconds INTEGER NOT NULL,
Bytes INTEGER,
UnitPrice NUMERIC(10,2) NOT NULL
)
SELECT * FROM "Track" LIMIT 3;
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young,
Malcolm Young,
Brian Johnson 343719 11170334 0.99
2 Balls to the Wall 2 2 1 U. Dirkschneider,
W. Hoffmann,
H. Frank,
P. Baltes,
S. Kaufmann,
G. Hoffmann 342562 5510424 0.99
3 Fast As a Shark 3 2 1 F. Baltes,
S. Kaufman,
U. Dirkscneider & W. Hoffman 230619 3990994 0.99
*/
API Reference:
快照示例
¥Few-shot examples
在提示中包含将自然语言问题转换为针对数据库的有效 SQL 查询的示例,通常会提高模型性能,尤其是对于复杂查询。
¥Including examples of natural language questions being converted to valid SQL queries against our database in the prompt will often improve model performance, especially for complex queries.
假设我们有以下示例:
¥Let's say we have the following examples:
export const examples = [
{ input: "List all artists.", query: "SELECT * FROM Artist;" },
{
input: "Find all albums for the artist 'AC/DC'.",
query:
"SELECT * FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'AC/DC');",
},
{
input: "List all tracks in the 'Rock' genre.",
query:
"SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');",
},
{
input: "Find the total duration of all tracks.",
query: "SELECT SUM(Milliseconds) FROM Track;",
},
{
input: "List all customers from Canada.",
query: "SELECT * FROM Customer WHERE Country = 'Canada';",
},
{
input: "How many tracks are there in the album with ID 5?",
query: "SELECT COUNT(*) FROM Track WHERE AlbumId = 5;",
},
{
input: "Find the total number of invoices.",
query: "SELECT COUNT(*) FROM Invoice;",
},
{
input: "List all tracks that are longer than 5 minutes.",
query: "SELECT * FROM Track WHERE Milliseconds > 300000;",
},
{
input: "Who are the top 5 customers by total purchase?",
query:
"SELECT CustomerId, SUM(Total) AS TotalPurchase FROM Invoice GROUP BY CustomerId ORDER BY TotalPurchase DESC LIMIT 5;",
},
{
input: "Which albums are from the year 2000?",
query: "SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000';",
},
{
input: "How many employees are there",
query: 'SELECT COUNT(*) FROM "Employee"',
},
];
API Reference:
我们可以像这样用它们创建一个简短的提示:
¥We can create a few-shot prompt with them like so:
import { FewShotPromptTemplate, PromptTemplate } from "@langchain/core/prompts";
import { examples } from "./examples.js";
const examplePrompt = PromptTemplate.fromTemplate(
`User input: {input}\nSQL Query: {query}`
);
const prompt = new FewShotPromptTemplate({
examples: examples.slice(0, 5),
examplePrompt,
prefix: `You are a SQLite expert. Given an input question, create a syntactically correct SQLite query to run.
Unless otherwise specified, do not return more than {top_k} rows.
Here is the relevant table info: {table_info}
Below are a number of examples of questions and their corresponding SQL queries.`,
suffix: "User input: {input}\nSQL query: ",
inputVariables: ["input", "top_k", "table_info"],
});
console.log(
await prompt.format({
input: "How many artists are there?",
top_k: "3",
table_info: "foo",
})
);
/**
You are a SQLite expert. Given an input question, create a syntactically correct SQLite query to run.
Unless otherwise specified, do not return more than 3 rows.
Here is the relevant table info: foo
Below are a number of examples of questions and their corresponding SQL queries.
User input: List all artists.
SQL Query: SELECT * FROM Artist;
User input: Find all albums for the artist 'AC/DC'.
SQL Query: SELECT * FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'AC/DC');
User input: List all tracks in the 'Rock' genre.
SQL Query: SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');
User input: Find the total duration of all tracks.
SQL Query: SELECT SUM(Milliseconds) FROM Track;
User input: List all customers from Canada.
SQL Query: SELECT * FROM Customer WHERE Country = 'Canada';
User input: How many artists are there?
SQL query:
*/
API Reference:
- FewShotPromptTemplate from
@langchain/core/prompts
- PromptTemplate from
@langchain/core/prompts
动态小样本示例
¥Dynamic few-shot examples
如果我们有足够多的示例,我们可能只想在提示中包含最相关的示例,因为它们不适合模型的上下文窗口,或者因为示例的长尾会分散模型的注意力。具体来说,对于任何输入,我们都希望包含与该输入最相关的示例。
¥If we have enough examples, we may want to only include the most relevant ones in the prompt, either because they don't fit in the model's context window or because the long tail of examples distracts the model. And specifically, given any input we want to include the examples most relevant to that input.
我们可以使用 ExampleSelector 来实现这一点。在本例中,我们将使用 SemanticSimilarityExampleSelector
,它将示例存储在我们选择的向量数据库中。运行时,它会在输入和我们的示例之间执行相似性搜索,并返回语义最相似的示例:
¥We can do just this using an ExampleSelector. In this case we'll use a SemanticSimilarityExampleSelector
,
which will store the examples in the vector database of our choosing.
At runtime it will perform a similarity search between the input and our examples, and return the most semantically similar ones:
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { FewShotPromptTemplate, PromptTemplate } from "@langchain/core/prompts";
import { createSqlQueryChain } from "langchain/chains/sql_db";
import { examples } from "./examples.js";
import { db } from "../db.js";
const exampleSelector = await SemanticSimilarityExampleSelector.fromExamples<
typeof MemoryVectorStore
>(examples, new OpenAIEmbeddings(), MemoryVectorStore, {
k: 5,
inputKeys: ["input"],
});
console.log(
await exampleSelector.selectExamples({ input: "how many artists are there?" })
);
/**
[
{ input: 'List all artists.', query: 'SELECT * FROM Artist;' },
{
input: 'How many employees are there',
query: 'SELECT COUNT(*) FROM "Employee"'
},
{
input: 'How many tracks are there in the album with ID 5?',
query: 'SELECT COUNT(*) FROM Track WHERE AlbumId = 5;'
},
{
input: 'Which albums are from the year 2000?',
query: "SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000';"
},
{
input: "List all tracks in the 'Rock' genre.",
query: "SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');"
}
]
*/
// To use it, we can pass the ExampleSelector directly in to our FewShotPromptTemplate:
const examplePrompt = PromptTemplate.fromTemplate(
`User input: {input}\nSQL Query: {query}`
);
const prompt = new FewShotPromptTemplate({
exampleSelector,
examplePrompt,
prefix: `You are a SQLite expert. Given an input question, create a syntactically correct SQLite query to run.
Unless otherwise specified, do not return more than {top_k} rows.
Here is the relevant table info: {table_info}
Below are a number of examples of questions and their corresponding SQL queries.`,
suffix: "User input: {input}\nSQL query: ",
inputVariables: ["input", "top_k", "table_info"],
});
console.log(
await prompt.format({
input: "How many artists are there?",
top_k: "3",
table_info: "foo",
})
);
/**
You are a SQLite expert. Given an input question, create a syntactically correct SQLite query to run.
Unless otherwise specified, do not return more than 3 rows.
Here is the relevant table info: foo
Below are a number of examples of questions and their corresponding SQL queries.
User input: List all artists.
SQL Query: SELECT * FROM Artist;
User input: How many employees are there
SQL Query: SELECT COUNT(*) FROM "Employee"
User input: How many tracks are there in the album with ID 5?
SQL Query: SELECT COUNT(*) FROM Track WHERE AlbumId = 5;
User input: Which albums are from the year 2000?
SQL Query: SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000';
User input: List all tracks in the 'Rock' genre.
SQL Query: SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');
User input: How many artists are there?
SQL query:
*/
// Now we can use it in a chain:
const llm = new ChatOpenAI({
temperature: 0,
});
const chain = await createSqlQueryChain({
db,
llm,
prompt,
dialect: "sqlite",
});
console.log(await chain.invoke({ question: "how many artists are there?" }));
/**
SELECT COUNT(*) FROM Artist;
*/
API Reference:
- MemoryVectorStore from
langchain/vectorstores/memory
- SemanticSimilarityExampleSelector from
@langchain/core/example_selectors
- ChatOpenAI from
@langchain/openai
- OpenAIEmbeddings from
@langchain/openai
- FewShotPromptTemplate from
@langchain/core/prompts
- PromptTemplate from
@langchain/core/prompts
- createSqlQueryChain from
langchain/chains/sql_db
后续步骤
¥Next steps
现在你已经学习了一些用于改进 SQL 生成的提示策略。
¥You've now learned about some prompting strategies to improve SQL generation.
接下来,查看本节中的其他指南,例如 如何查询大型数据库。
¥Next, check out some of the other guides in this section, like how to query over large databases.