
Google Vertex AI Matching Engine

Compatibility

Only available on Node.js.

The Google Vertex AI Matching Engine "provides the industry's leading high-scale low latency vector database. These vector databases are commonly referred to as vector similarity-matching or an approximate nearest neighbor (ANN) service."

Setup

caution

This module expects an endpoint and a deployed index to already exist, because creating them takes close to one hour. To learn more, see the "Create Index and deploy it to an Endpoint" section of the LangChain Python documentation.

Before running this code, make sure the Vertex AI API is enabled for the relevant project in your Google Cloud dashboard and that you have authenticated to Google Cloud using one of these methods:

  • You are logged into an account that is permitted to access the project (using gcloud auth application-default login).

  • You are running on a machine using a service account that is permitted to access the project.

  • You have downloaded the credentials for a service account that is permitted to access the project and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of this file.

Install the LangChain packages and the authentication library with:

npm install @langchain/community @langchain/core google-auth-library

The Matching Engine does not store the actual document contents, only embeddings. Therefore, you'll need a docstore. The example below uses Google Cloud Storage, which requires the following package:

npm install @google-cloud/storage

Usage

Initializing the engine

When creating the MatchingEngine object, you'll need some information about the Matching Engine configuration. You can get this information from the Cloud Console for Matching Engine:

  • The ID of the Index

  • The ID of the Index Endpoint

You will also need a document store. An InMemoryDocstore is fine for initial testing, but you will want something like a GoogleCloudStorageDocstore to store documents more permanently.

import { MatchingEngine } from "@langchain/community/vectorstores/googlevertexai";
import { Document } from "langchain/document";
import { SyntheticEmbeddings } from "langchain/embeddings/fake";
import { GoogleCloudStorageDocstore } from "@langchain/community/stores/doc/gcs";

// Synthetic embeddings keep the example self-contained.
const embeddings = new SyntheticEmbeddings({
  vectorSize: Number.parseInt(
    process.env.SYNTHETIC_EMBEDDINGS_VECTOR_SIZE ?? "768",
    10
  ),
});

// Document contents are stored in a Google Cloud Storage bucket.
const store = new GoogleCloudStorageDocstore({
  bucket: process.env.GOOGLE_CLOUD_STORAGE_BUCKET!,
});

// IDs of the deployed index and its endpoint, taken from the Cloud Console.
const config = {
  index: process.env.GOOGLE_VERTEXAI_MATCHINGENGINE_INDEX!,
  indexEndpoint: process.env.GOOGLE_VERTEXAI_MATCHINGENGINE_INDEXENDPOINT!,
  apiVersion: "v1beta1",
  docstore: store,
};

const engine = new MatchingEngine(embeddings, config);
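The example above reads its settings with non-null assertions (`!`), so a missing environment variable only shows up later as a confusing runtime failure. A minimal sketch of an up-front check follows. The helpers `requireEnv` and `readMatchingEngineSettings` are hypothetical names introduced here for illustration and are not part of LangChain.

```typescript
// Hypothetical helpers (not part of LangChain): validate settings before use.
type Env = Record<string, string | undefined>;

// Return the value of `name`, or throw a descriptive error if it is unset or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Collect the three settings that the initialization example reads.
function readMatchingEngineSettings(env: Env) {
  return {
    bucket: requireEnv(env, "GOOGLE_CLOUD_STORAGE_BUCKET"),
    index: requireEnv(env, "GOOGLE_VERTEXAI_MATCHINGENGINE_INDEX"),
    indexEndpoint: requireEnv(env, "GOOGLE_VERTEXAI_MATCHINGENGINE_INDEXENDPOINT"),
  };
}
```

In a Node.js script you could call `readMatchingEngineSettings(process.env)` once and pass the returned values into the docstore and engine configuration.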

Adding documents

const doc = new Document({ pageContent: "this" });
await engine.addDocuments([doc]);

Any metadata in a document is converted into Matching Engine "allow list" values that can be used to filter during a query.

const documents = [
  new Document({
    pageContent: "this apple",
    metadata: {
      color: "red",
      category: "edible",
    },
  }),
  new Document({
    pageContent: "this blueberry",
    metadata: {
      color: "blue",
      category: "edible",
    },
  }),
  new Document({
    pageContent: "this firetruck",
    metadata: {
      color: "red",
      category: "machine",
    },
  }),
];

// Add all our documents
await engine.addDocuments(documents);
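To make the metadata-to-"allow list" conversion concrete, here is a minimal sketch of the idea. It is an illustration under assumptions, not the library's actual implementation: the `SketchRestriction` type and the `metadataToAllowLists` function are hypothetical, and the sketch only handles string values.

```typescript
// Hypothetical shape, modeled on the namespace/allowList/denyList filter objects.
interface SketchRestriction {
  namespace: string;
  allowList?: string[];
  denyList?: string[];
}

// Illustrative only: each metadata key becomes a namespace, and its value
// becomes the single entry of that namespace's allow list.
function metadataToAllowLists(
  metadata: Record<string, string>
): SketchRestriction[] {
  return Object.entries(metadata).map(([namespace, value]) => ({
    namespace,
    allowList: [value],
  }));
}

// The "this apple" document's metadata yields one entry per key.
const appleRestrictions = metadataToAllowLists({
  color: "red",
  category: "edible",
});
console.log(JSON.stringify(appleRestrictions));
```

Under this model, the apple document is tagged with `color: red` and `category: edible`, which is what a query-time filter is later matched against.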

Documents are also assumed to have an "id" parameter. If it is not set, an ID is assigned and returned as part of the Document.
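The ID behavior described above can be modeled with a short sketch. This is an assumption-laden illustration rather than the library's code: `SketchDocument` and `ensureIds` are hypothetical names, and the ID generator is injected so that the example does not claim any particular ID scheme.

```typescript
// Minimal stand-in for a document that may or may not carry an id.
interface SketchDocument {
  pageContent: string;
  id?: string;
}

// Hypothetical helper: return copies of the documents, keeping existing ids
// and calling `makeId` only for documents that have none.
function ensureIds(
  docs: SketchDocument[],
  makeId: () => string
): Required<SketchDocument>[] {
  return docs.map((doc) => ({
    pageContent: doc.pageContent,
    id: doc.id ?? makeId(),
  }));
}

// A counter-based generator keeps the example deterministic; a real
// application would more likely use a UUID.
let idCounter = 0;
const docsWithIds = ensureIds(
  [
    { pageContent: "this apple", id: "apple-1" },
    { pageContent: "this blueberry" },
  ],
  () => `generated-${++idCounter}`
);
```

Setting IDs yourself is useful when you need to delete or replace specific documents later without first running a search to discover their IDs.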

Querying documents

A straightforward k-nearest-neighbor search that returns all results can be done with any of the standard methods:

const results = await engine.similaritySearch("this");
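For intuition about what a k-nearest-neighbor search computes, the following sketch does an exact, brute-force version over a few in-memory vectors. It is not how the Matching Engine works internally (the service is an approximate nearest neighbor index), and cosine similarity is an assumption here, since the distance measure depends on how the index was configured. The names `Item`, `cosineSimilarity`, and `nearestNeighbors` are hypothetical.

```typescript
interface Item {
  id: string;
  vector: number[];
}

// Cosine similarity between two equal-length, non-zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact k-nearest neighbors: score every item, sort by descending similarity,
// and keep the first k. The input array is copied, not mutated.
function nearestNeighbors(query: number[], items: Item[], k: number): Item[] {
  return [...items]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector)
    )
    .slice(0, k);
}

const sampleItems: Item[] = [
  { id: "a", vector: [1, 0] },
  { id: "b", vector: [0, 1] },
  { id: "c", vector: [0.9, 0.1] },
];
const topTwoIds = nearestNeighbors([1, 0], sampleItems, 2).map((item) => item.id);
```

Here the query vector is closest to "a" and then "c", so those are the two IDs kept. An ANN service trades a small amount of this exactness for much lower latency at scale.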

Querying documents with a filter / restriction

We can limit which documents are returned based on the metadata that was set for each document. For example, to limit the results to documents with a red color:

import { Restriction } from "@langchain/community/vectorstores/googlevertexai";

// Only return documents whose "color" metadata is "red".
const redFilter: Restriction[] = [
  {
    namespace: "color",
    allowList: ["red"],
  },
];
const redResults = await engine.similaritySearch("this", 4, redFilter);

For something more complicated, such as things that are red but not edible:

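// Allow red documents, but exclude anything categorized as edible.
const filter: Restriction[] = [
  {
    namespace: "color",
    allowList: ["red"],
  },
  {
    namespace: "category",
    denyList: ["edible"],
  },
];
// Named differently from the earlier `results` so both snippets can share one file.
const redInedibleResults = await engine.similaritySearch("this", 4, filter);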
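To see which of the three sample documents such a filter keeps, here is a simplified local model of the allow list / deny list semantics. It is a sketch under assumptions, not the service's implementation: it assumes one string value per namespace and that all restrictions must hold at once. The names `FilterRule`, `passesRule`, and `matchesAllRules` are hypothetical.

```typescript
interface FilterRule {
  namespace: string;
  allowList?: string[];
  denyList?: string[];
}
type Metadata = Record<string, string | undefined>;

// A document passes one rule when its value for the namespace is in the allow
// list (if one is given) and is not in the deny list (if one is given).
function passesRule(metadata: Metadata, rule: FilterRule): boolean {
  const value = metadata[rule.namespace];
  if (rule.allowList && (value === undefined || !rule.allowList.includes(value))) {
    return false;
  }
  if (rule.denyList && value !== undefined && rule.denyList.includes(value)) {
    return false;
  }
  return true;
}

// Every rule must pass for the document to be kept.
function matchesAllRules(metadata: Metadata, rules: FilterRule[]): boolean {
  return rules.every((rule) => passesRule(metadata, rule));
}

const sampleDocs: { name: string; metadata: Metadata }[] = [
  { name: "apple", metadata: { color: "red", category: "edible" } },
  { name: "blueberry", metadata: { color: "blue", category: "edible" } },
  { name: "firetruck", metadata: { color: "red", category: "machine" } },
];
const redNotEdible: FilterRule[] = [
  { namespace: "color", allowList: ["red"] },
  { namespace: "category", denyList: ["edible"] },
];
const keptNames = sampleDocs
  .filter((doc) => matchesAllRules(doc.metadata, redNotEdible))
  .map((doc) => doc.name);
```

Under this model only the firetruck survives: the blueberry fails the color allow list, and the apple is red but is excluded by the category deny list.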

Deleting documents

Documents are deleted by ID.

import { IdDocument } from "@langchain/community/vectorstores/googlevertexai";

// Look up existing documents, collect their IDs, then delete them.
const oldResults: IdDocument[] = await engine.similaritySearch("this", 10);
const oldIds = oldResults.map((doc) => doc.id!);
await engine.delete({ ids: oldIds });
