Couchbase
Couchbase is an award-winning distributed NoSQL cloud database that delivers versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications. Couchbase embraces AI with coding assistance for developers and vector search for their applications.
Vector Search is part of the Full Text Search Service (Search Service) in Couchbase.
This tutorial explains how to use Vector Search in Couchbase. You can work with either Couchbase Capella or a self-managed Couchbase Server.
Installation
You will need the couchbase and @langchain/community packages to use the Couchbase vector store. For this tutorial, we will use OpenAI embeddings, which come from @langchain/openai.
npm:
npm install couchbase @langchain/openai @langchain/community @langchain/core
Yarn:
yarn add couchbase @langchain/openai @langchain/community @langchain/core
pnpm:
pnpm add couchbase @langchain/openai @langchain/community @langchain/core
Create Couchbase Connection Object
We first create a connection to the Couchbase cluster and then pass the cluster object to the Vector Store. Here, we connect using a username and password. You can also connect to your cluster in any other supported way.
For more information on connecting to the Couchbase cluster, please check the Node SDK documentation.
import { Cluster } from "couchbase";
const connectionString = "couchbase://localhost"; // or couchbases://localhost if you are using TLS
const dbUsername = "Administrator"; // valid database user with read access to the bucket being queried
const dbPassword = "Password"; // password for the database user
const couchbaseClient = await Cluster.connect(connectionString, {
  username: dbUsername,
  password: dbPassword,
  configProfile: "wanDevelopment",
});
Create the Search Index
Currently, the Search index needs to be created from the Couchbase Capella or Server UI, or using the REST interface.
For this example, let us use the Import Index feature of the Search Service in the UI.
Let us define a Search index named vector-index on the testing bucket. The index covers the _default collection in the _default scope of the testing bucket, with the vector field set to embedding (1536 dimensions) and the text field set to text. We also index and store all the fields under metadata in the document as a dynamic mapping to account for varying document structures. The similarity metric is set to dot_product.
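To make the dot_product similarity metric concrete, here is a minimal sketch of how a dot product between a query vector and a document vector is computed. The function name dotProduct and the toy 3-dimensional vectors are illustrative assumptions; the Search Service computes this internally over the 1536-dimensional embeddings, and this sketch is not its implementation.

```typescript
// Dot product of two equal-length vectors: the sum of element-wise products.
// A larger value means the vectors point in a more similar direction
// (for vectors of comparable length).
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i += 1) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Toy 3-dimensional vectors stand in for 1536-dimensional embeddings.
const queryVector = [1, 0, 2];
const similarDocVector = [2, 0, 1];
const unrelatedDocVector = [0, 3, 0];

console.log(dotProduct(queryVector, similarDocVector)); // 4
console.log(dotProduct(queryVector, unrelatedDocVector)); // 0
```

The dimension check mirrors why the index must declare the same number of dimensions as the embedding model produces: vectors of different lengths cannot be compared.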
How to Import an Index to the Full Text Search service?
Couchbase Server:
- Click on Search -> Add Index -> Import
- Copy the following index definition into the Import screen
- Click on Create Index to create the index.
Couchbase Capella:
- Copy the following index definition to a new file, index.json
- Import the file in Capella using the instructions in the documentation.
- Click on Create Index to create the index.
Index Definition
{
  "name": "vector-index",
  "type": "fulltext-index",
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "type_field",
      "type_field": "type"
    },
    "mapping": {
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": true,
        "properties": {
          "metadata": {
            "dynamic": true,
            "enabled": true
          },
          "embedding": {
            "enabled": true,
            "dynamic": false,
            "fields": [
              {
                "dims": 1536,
                "index": true,
                "name": "embedding",
                "similarity": "dot_product",
                "type": "vector",
                "vector_index_optimized_for": "recall"
              }
            ]
          },
          "text": {
            "enabled": true,
            "dynamic": false,
            "fields": [
              {
                "index": true,
                "name": "text",
                "store": true,
                "type": "text"
              }
            ]
          }
        }
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": true,
      "store_dynamic": true,
      "type_field": "_type"
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 16
    }
  },
  "sourceType": "gocbcore",
  "sourceName": "testing",
  "sourceParams": {},
  "planParams": {
    "maxPartitionsPerPIndex": 103,
    "indexPartitions": 10,
    "numReplicas": 0
  }
}
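Before importing a definition like the one above, it can help to confirm that the vector field name and dimensions match what your embedding model produces. The following is a hedged sketch of such a check. The types and the checkVectorField helper are hypothetical and cover only the parts of the definition inspected here; they are not part of the Couchbase SDK or LangChain.

```typescript
// Minimal types covering only the parts of the index definition inspected here.
interface FieldDefinition {
  name: string;
  type: string;
  dims?: number;
  similarity?: string;
}

interface IndexDefinition {
  name: string;
  params: {
    mapping: {
      default_mapping: {
        properties?: Record<string, { fields?: FieldDefinition[] }>;
      };
    };
  };
}

// Returns a list of problems; an empty list means the vector field looks usable.
function checkVectorField(
  definition: IndexDefinition,
  embeddingKey: string,
  expectedDims: number
): string[] {
  const fields =
    definition.params.mapping.default_mapping.properties?.[embeddingKey]?.fields ?? [];
  const vectorField = fields.find((field) => field.type === "vector");
  if (!vectorField) {
    return [`No vector field is mapped for "${embeddingKey}"`];
  }
  const problems: string[] = [];
  if (vectorField.dims !== expectedDims) {
    problems.push(
      `Expected ${expectedDims} dimensions but the index declares ${vectorField.dims}`
    );
  }
  return problems;
}

// Trimmed copy of the definition above, keeping only the embedding mapping.
const trimmedDefinition: IndexDefinition = {
  name: "vector-index",
  params: {
    mapping: {
      default_mapping: {
        properties: {
          embedding: {
            fields: [
              { name: "embedding", type: "vector", dims: 1536, similarity: "dot_product" },
            ],
          },
        },
      },
    },
  },
};

console.log(checkVectorField(trimmedDefinition, "embedding", 1536)); // []
console.log(checkVectorField(trimmedDefinition, "embedding", 3072).length); // 1
```

In practice you would parse index.json with JSON.parse and pass the result in; the trimmed object is used here only to keep the sketch self-contained.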
For more details on how to create a search index with support for Vector fields, please refer to the documentation.
To use this vector store, CouchbaseVectorStoreArgs needs to be configured. textKey and embeddingKey are optional; set them only if you want to use specific keys for the text and embedding fields.
const couchbaseConfig: CouchbaseVectorStoreArgs = {
  cluster: couchbaseClient,
  bucketName: "testing",
  scopeName: "_default",
  collectionName: "_default",
  indexName: "vector-index",
  textKey: "text",
  embeddingKey: "embedding",
};
Create Vector Store
We create the vector store object with the cluster information and the search index name.
const store = await CouchbaseVectorStore.initialize(
  embeddings, // embeddings object to create embeddings from text
  couchbaseConfig
);
Basic Vector Search Example
The following example shows how to use Couchbase vector search to perform a similarity search. We load the "state_of_the_union.txt" file via the TextLoader, split the text into 500-character chunks with no overlap, and index all these chunks into Couchbase.
After the data is indexed, we run a simple query to find the top 4 chunks that are similar to the query "What did president say about Ketanji Brown Jackson". The example also shows how to get the similarity score.
import { OpenAIEmbeddings } from "@langchain/openai";
import {
  CouchbaseVectorStoreArgs,
  CouchbaseVectorStore,
} from "@langchain/community/vectorstores/couchbase";
import { Cluster } from "couchbase";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { CharacterTextSplitter } from "@langchain/textsplitters";
const connectionString =
  process.env.COUCHBASE_DB_CONN_STR ?? "couchbase://localhost";
const databaseUsername = process.env.COUCHBASE_DB_USERNAME ?? "Administrator";
const databasePassword = process.env.COUCHBASE_DB_PASSWORD ?? "Password";
// Load documents from file
const loader = new TextLoader("./state_of_the_union.txt");
const rawDocuments = await loader.load();
const splitter = new CharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 0,
});
const docs = await splitter.splitDocuments(rawDocuments);
const couchbaseClient = await Cluster.connect(connectionString, {
  username: databaseUsername,
  password: databasePassword,
  configProfile: "wanDevelopment",
});
// Open AI API Key is required to use OpenAIEmbeddings, some other embeddings may also be used
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
});
const couchbaseConfig: CouchbaseVectorStoreArgs = {
  cluster: couchbaseClient,
  bucketName: "testing",
  scopeName: "_default",
  collectionName: "_default",
  indexName: "vector-index",
  textKey: "text",
  embeddingKey: "embedding",
};
const store = await CouchbaseVectorStore.fromDocuments(
  docs,
  embeddings,
  couchbaseConfig
);
const query = "What did president say about Ketanji Brown Jackson";
const resultsSimilaritySearch = await store.similaritySearch(query);
console.log("resulting documents: ", resultsSimilaritySearch[0]);
// Similarity Search With Score
const resultsSimilaritySearchWithScore = await store.similaritySearchWithScore(
  query,
  1
);
console.log("resulting documents: ", resultsSimilaritySearchWithScore[0][0]);
console.log("resulting scores: ", resultsSimilaritySearchWithScore[0][1]);
const result = await store.similaritySearch(query, 1, {
  fields: ["metadata.source"],
});
console.log(result[0]);
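To illustrate what chunkSize: 500 and chunkOverlap: 0 mean in the example above, here is a simplified sketch of fixed-size chunking. It is a stand-in for illustration only, not how CharacterTextSplitter works internally (the real splitter also takes separators into account). The chunkText name is a hypothetical helper.

```typescript
// Splits text into windows of at most chunkSize characters.
// Each window starts (chunkSize - chunkOverlap) characters after the previous
// one, so chunkOverlap = 0 yields back-to-back chunks with no shared text.
function chunkText(text: string, chunkSize: number, chunkOverlap = 0): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in the range [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// A 1200-character input yields chunks of 500, 500, and 200 characters.
const sampleChunks = chunkText("a".repeat(1200), 500, 0);
console.log(sampleChunks.map((chunk) => chunk.length)); // [ 500, 500, 200 ]
```

Each chunk becomes one document in Couchbase, with its own embedding stored under the embeddingKey.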
Specifying Fields to Return
You can specify the fields to return from the document using the fields parameter in the filter during searches. These fields are returned as part of the metadata object. You can fetch any field that is stored in the index. The textKey of the document is returned as part of the document's pageContent.
If you do not specify any fields to be fetched, all the fields stored in the index are returned.
If you want to fetch one of the fields in the metadata, you need to specify it using dot (.) notation. For example, to fetch the source field in the metadata, you need to use metadata.source.
const result = await store.similaritySearch(query, 1, {
  fields: ["metadata.source"],
});
console.log(result[0]);
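The dot notation used in fields can be read as a path into the nested document. The sketch below shows that idea on a plain object. getByPath and the sample document are hypothetical and only illustrate how a name such as metadata.source maps to a nested value; they do not reproduce how the Search Service resolves fields.

```typescript
// Follows a dot-separated path (e.g. "metadata.source") through nested objects.
// Returns undefined when any segment along the path is missing.
function getByPath(doc: Record<string, unknown>, path: string): unknown {
  let current: unknown = doc;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) {
      return undefined;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Hypothetical stored document with the text and metadata keys used in this tutorial.
const storedDocument = {
  text: "example chunk",
  metadata: { source: "./state_of_the_union.txt" },
};

console.log(getByPath(storedDocument, "metadata.source")); // ./state_of_the_union.txt
console.log(getByPath(storedDocument, "metadata.author")); // undefined
```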
Hybrid Search
Couchbase allows you to do hybrid searches by combining vector search results with searches on non-vector fields of the document, such as the metadata object.
The results are based on the combination of the results from both vector search and the searches supported by the full text search service. The scores of each of the component searches are added up to get the total score of the result.
To perform hybrid searches, there is an optional key, searchOptions, in the filter parameter that can be passed to all the similarity searches. The different search/query possibilities for searchOptions can be found in the Couchbase Search documentation.
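As a rough illustration of "scores are added up", the sketch below sums per-document scores from two component searches and ranks by the total. The ScoredHit type, the combineScores helper, and the sample scores are assumptions made for illustration. How the Search Service merges hits internally is not described here, and this sketch assumes a document found by only one component simply keeps that component's score.

```typescript
interface ScoredHit {
  id: string;
  score: number;
}

// Adds up the scores each document received from the component searches
// and returns the documents ordered by total score, highest first.
function combineScores(vectorHits: ScoredHit[], textHits: ScoredHit[]): ScoredHit[] {
  const totals: Record<string, number> = {};
  for (const hit of vectorHits.concat(textHits)) {
    totals[hit.id] = (totals[hit.id] ?? 0) + hit.score;
  }
  return Object.keys(totals)
    .map((id) => ({ id, score: totals[id] }))
    .sort((a, b) => b.score - a.score);
}

// Made-up scores: doc1 matches both searches, doc2 only the vector search.
const sampleVectorHits: ScoredHit[] = [
  { id: "doc1", score: 0.5 },
  { id: "doc2", score: 0.75 },
];
const sampleTextHits: ScoredHit[] = [{ id: "doc1", score: 0.5 }];

console.log(combineScores(sampleVectorHits, sampleTextHits));
// doc1 totals 1 (0.5 + 0.5) and ranks above doc2 at 0.75
```

One consequence worth noting: a document that matches the text query can outrank a document with a higher vector score alone.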
Create Diverse Metadata for Hybrid Search
To simulate hybrid search, let us add some varied metadata to the existing documents. We add three fields to the metadata of every document: date, set to January 1 of a year from 2010 to 2019; rating, between 1 and 5; and author, set to either John Doe or Jane Doe. The values cycle with the document index rather than being truly random. We also declare a few sample queries.
for (let i = 0; i < docs.length; i += 1) {
  docs[i].metadata.date = `${2010 + (i % 10)}-01-01`;
  docs[i].metadata.rating = 1 + (i % 5);
  docs[i].metadata.author = ["John Doe", "Jane Doe"][i % 2];
}
const store = await CouchbaseVectorStore.fromDocuments(
  docs,
  embeddings,
  couchbaseConfig
);
const query = "What did the president say about Ketanji Brown Jackson";
const independenceQuery = "Any mention about independence?";
Example: Search by Exact Value
We can search for exact matches on a textual field like the author in the metadata object.
const exactValueResult = await store.similaritySearch(query, 4, {
  fields: ["metadata.author"],
  searchOptions: {
    query: { field: "metadata.author", match: "John Doe" },
  },
});
console.log(exactValueResult[0]);
Example: Search by Partial Match
We can search for partial matches by specifying a fuzziness for the search. This is useful when you want to match slight variations or misspellings of a search term.
Here, "Johny" is close (fuzziness of 1) to "John Doe".
const partialMatchResult = await store.similaritySearch(query, 4, {
  fields: ["metadata.author"],
  searchOptions: {
    query: { field: "metadata.author", match: "Johny", fuzziness: 1 },
  },
});
console.log(partialMatchResult[0]);
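To see why "Johny" can match "John Doe" with a fuzziness of 1, it helps to look at edit distance. The sketch below computes the Levenshtein distance between two terms. It assumes that fuzziness is the maximum allowed Levenshtein distance between terms and that the analyzer lowercases "John Doe" and splits it into the terms "john" and "doe". Both are assumptions about the Search Service, and the levenshtein function is an illustrative helper rather than its implementation.

```typescript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions, or substitutions needed to turn string a into string b.
function levenshtein(a: string, b: string): number {
  // previous[j] is the distance between the first i-1 characters of a
  // and the first j characters of b.
  let previous: number[] = [];
  for (let j = 0; j <= b.length; j += 1) {
    previous.push(j);
  }
  for (let i = 1; i <= a.length; i += 1) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j += 1) {
      const substitutionCost = a[i - 1] === b[j - 1] ? 0 : 1;
      current.push(
        Math.min(
          previous[j] + 1, // delete a character from a
          current[j - 1] + 1, // insert a character into a
          previous[j - 1] + substitutionCost // substitute (or keep) a character
        )
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// "johny" is one deletion away from "john", so it is within fuzziness 1.
console.log(levenshtein("johny", "john")); // 1
// "johny" is further than one edit from "doe", so that term does not match.
console.log(levenshtein("johny", "doe") <= 1); // false
```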
Example: Search by Date Range Query
We can search for documents that fall within a date range on a date field like metadata.date.
const dateRangeResult = await store.similaritySearch(independenceQuery, 4, {
  fields: ["metadata.date", "metadata.author"],
  searchOptions: {
    query: {
      start: "2016-12-31",
      end: "2017-01-02",
      inclusiveStart: true,
      inclusiveEnd: false,
      field: "metadata.date",
    },
  },
});
console.log(dateRangeResult[0]);
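The inclusive flags in the query above decide whether the boundary dates themselves count as matches. The sketch below evaluates the same range locally to show the intended semantics. The DateRange type and inDateRange helper are hypothetical and use the option names from the example above; the actual evaluation happens inside the Search Service.

```typescript
interface DateRange {
  start: string;
  end: string;
  inclusiveStart: boolean;
  inclusiveEnd: boolean;
}

// Returns true when the ISO date string falls inside the range,
// honoring the inclusive/exclusive flag on each boundary.
function inDateRange(value: string, range: DateRange): boolean {
  const time = Date.parse(value);
  const start = Date.parse(range.start);
  const end = Date.parse(range.end);
  const afterStart = range.inclusiveStart ? time >= start : time > start;
  const beforeEnd = range.inclusiveEnd ? time <= end : time < end;
  return afterStart && beforeEnd;
}

// Same boundaries as the query above.
const exampleRange: DateRange = {
  start: "2016-12-31",
  end: "2017-01-02",
  inclusiveStart: true,
  inclusiveEnd: false,
};

// The sample metadata only contains January 1 dates, so only 2017 matches.
console.log(inDateRange("2017-01-01", exampleRange)); // true
console.log(inDateRange("2016-01-01", exampleRange)); // false
console.log(inDateRange("2017-01-02", exampleRange)); // false (end is exclusive)
```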
Example: Search by Numeric Range Query
We can search for documents whose value falls within a range on a numeric field like metadata.rating.
const ratingRangeResult = await store.similaritySearch(independenceQuery, 4, {
  fields: ["metadata.rating"],
  searchOptions: {
    query: {
      min: 3,
      max: 5,
      inclusiveMin: false,
      inclusiveMax: true,
      field: "metadata.rating",
    },
  },
});
console.log(ratingRangeResult[0]);
Example: Combining Multiple Search Conditions
Different queries can be combined using AND (conjuncts) or OR (disjuncts) operators.
In this example, we look for documents with a rating between 3 and 4 whose date falls between 2016-12-31 and 2017-01-02.
// Uses the query string declared earlier; the original texts[0] was never defined.
const multipleConditionsResult = await store.similaritySearch(query, 4, {
  fields: ["metadata.rating", "metadata.date"],
  searchOptions: {
    query: {
      conjuncts: [
        { min: 3, max: 4, inclusive_max: true, field: "metadata.rating" },
        { start: "2016-12-31", end: "2017-01-02", field: "metadata.date" },
      ],
    },
  },
});
console.log(multipleConditionsResult[0]);
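The difference between conjuncts (AND) and disjuncts (OR) can be sketched with plain predicates over a metadata object. Everything below (SampleMetadata, allOf, anyOf, and the boundary choices in the two conditions) is an illustrative assumption rather than the Search Service's evaluation logic.

```typescript
interface SampleMetadata {
  rating: number;
  date: string; // ISO date, e.g. "2017-01-01"
}

type Condition = (metadata: SampleMetadata) => boolean;

// AND: every condition must hold (the role conjuncts plays in a query).
function allOf(conditions: Condition[]): Condition {
  return (metadata) => conditions.every((condition) => condition(metadata));
}

// OR: at least one condition must hold (the role disjuncts plays in a query).
function anyOf(conditions: Condition[]): Condition {
  return (metadata) => conditions.some((condition) => condition(metadata));
}

// Rating from 3 to 4, both ends included (an assumption for this sketch).
const ratingFrom3To4: Condition = (m) => m.rating >= 3 && m.rating <= 4;
// ISO dates in the same format compare correctly as plain strings.
const dateInWindow: Condition = (m) =>
  m.date >= "2016-12-31" && m.date < "2017-01-02";

const matchesBoth = allOf([ratingFrom3To4, dateInWindow]);
const matchesEither = anyOf([ratingFrom3To4, dateInWindow]);

const inWindow: SampleMetadata = { rating: 3, date: "2017-01-01" };
const outsideWindow: SampleMetadata = { rating: 4, date: "2012-01-01" };

console.log(matchesBoth(inWindow), matchesEither(inWindow)); // true true
console.log(matchesBoth(outsideWindow), matchesEither(outsideWindow)); // false true
```

Swapping conjuncts for disjuncts in a query therefore widens the result set: the second sample document only matches under OR.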
Other Queries
Similarly, you can use any of the supported query methods, such as Geo Distance, Polygon Search, Wildcard, and Regular Expressions, in the searchOptions key of the filter parameter. Please refer to the documentation for more details on the available query methods and their syntax.
Frequently Asked Questions
Question: Should I create the Search index before creating the CouchbaseVectorStore object?
Yes, currently you need to create the Search index before creating the CouchbaseVectorStore object.
Question: I am not seeing all the fields that I specified in my search results.
In Couchbase, we can only return the fields stored in the Search index. Please ensure that the field you are trying to access in the search results is part of the Search index.
One way to handle this is to index and store a document's fields dynamically in the index.
- In Capella, go to "Advanced Mode", then under the "General Settings" chevron check "[X] Store Dynamic Fields" or "[X] Index Dynamic Fields".
- In Couchbase Server, in the Index Editor (not the Quick Editor), under the "Advanced" chevron check "[X] Store Dynamic Fields" or "[X] Index Dynamic Fields".
Note that these options will increase the size of the index.
For more details on dynamic mappings, please refer to the documentation.
Question: I am unable to see the metadata object in my search results.
This is most likely because the metadata field in the document is not indexed and/or stored by the Couchbase Search index. To index the metadata field in the document, you need to add it to the index as a child mapping.
If you choose to map all the fields in the mapping, you will be able to search by all metadata fields. Alternatively, to optimize the index, you can select the specific fields inside the metadata object to be indexed. You can refer to the docs to learn more about creating and indexing child mappings.
Related
- Vector store conceptual guide
- Vector store how-to guides