Llama CPP

Compatibility

Only available on Node.js.

This module is based on the node-llama-cpp Node.js bindings for llama.cpp, allowing you to work with a locally running LLM. This allows you to work with a much smaller quantized model that can run in a laptop environment, which is ideal for testing and scratch-padding ideas without running up a bill!

Setup

You'll need to install major version 3 of the node-llama-cpp module to communicate with your local model.

npm install -S node-llama-cpp@3 @langchain/community @langchain/core

You will also need a local Llama 3 model (or a model supported by node-llama-cpp). You will need to pass the path to this model to the LlamaCpp module as part of the parameters (see example).

Out of the box, node-llama-cpp is tuned for running on macOS with support for the Metal GPU of Apple M-series processors. If you need to turn this off, or need support for the CUDA architecture, refer to the node-llama-cpp documentation.

For advice on getting and preparing llama3, see the documentation for the LLM version of this module.

A note to LangChain.js contributors: if you want to run the tests associated with this module, you will need to put the path to your local model in the environment variable LLAMA_PATH.
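
For example, a minimal sketch of picking up that variable before initializing a model (the guard itself is purely illustrative and not part of the test suite):

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

// Resolve the local model path from the LLAMA_PATH environment variable.
const llamaPath = process.env.LLAMA_PATH;

if (!llamaPath) {
  throw new Error("Set LLAMA_PATH to the path of your local GGUF model.");
}

const model = await ChatLlamaCpp.initialize({ modelPath: llamaPath });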

Usage

Basic use

In this case we pass in a prompt wrapped as a message and expect a response.

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { HumanMessage } from "@langchain/core/messages";

const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";

const model = await ChatLlamaCpp.initialize({ modelPath: llamaPath });

const response = await model.invoke([
  new HumanMessage({ content: "My name is John." }),
]);
console.log({ response });

/*
  AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: 'Hello John.',
      additional_kwargs: {}
    },
    lc_namespace: [ 'langchain', 'schema' ],
    content: 'Hello John.',
    name: undefined,
    additional_kwargs: {}
  }
*/

API Reference:

  • ChatLlamaCpp from @langchain/community/chat_models/llama_cpp
  • HumanMessage from @langchain/core/messages

System messages

We can also provide a system message. Note that with the llama_cpp module, a system message will cause a new session to be created.

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { SystemMessage, HumanMessage } from "@langchain/core/messages";

const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";

const model = await ChatLlamaCpp.initialize({ modelPath: llamaPath });

const response = await model.invoke([
  new SystemMessage(
    "You are a pirate, responses must be very verbose and in pirate dialect, add 'Arr, m'hearty!' to each sentence."
  ),
  new HumanMessage("Tell me where Llamas come from?"),
]);
console.log({ response });

/*
  AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "Arr, m'hearty! Llamas come from the land of Peru.",
      additional_kwargs: {}
    },
    lc_namespace: [ 'langchain', 'schema' ],
    content: "Arr, m'hearty! Llamas come from the land of Peru.",
    name: undefined,
    additional_kwargs: {}
  }
*/

API Reference:

  • ChatLlamaCpp from @langchain/community/chat_models/llama_cpp
  • SystemMessage from @langchain/core/messages
  • HumanMessage from @langchain/core/messages

Chains

This module can also be used with chains. Note that using more complex chains will require a suitably powerful version of llama3, such as the 70B version.

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { LLMChain } from "langchain/chains";
import { PromptTemplate } from "@langchain/core/prompts";

const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";

const model = await ChatLlamaCpp.initialize({
  modelPath: llamaPath,
  temperature: 0.5,
});

const prompt = PromptTemplate.fromTemplate(
  "What is a good name for a company that makes {product}?"
);
const chain = new LLMChain({ llm: model, prompt });

const response = await chain.invoke({ product: "colorful socks" });

console.log({ response });

/*
  {
    text: `I'm not sure what you mean by "colorful socks" but here are some ideas:\n` +
      '\n' +
      '- Sock-it to me!\n' +
      '- Socks Away\n' +
      '- Fancy Footwear'
  }
*/

API Reference:

  • ChatLlamaCpp from @langchain/community/chat_models/llama_cpp
  • LLMChain from langchain/chains
  • PromptTemplate from @langchain/core/prompts
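
As an alternative to LLMChain, roughly the same chain can be sketched by piping the prompt directly into the model with the runnable composition API in @langchain/core (a minimal sketch, assuming the same model and prompt as above):

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { PromptTemplate } from "@langchain/core/prompts";

const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";

const model = await ChatLlamaCpp.initialize({
  modelPath: llamaPath,
  temperature: 0.5,
});

const prompt = PromptTemplate.fromTemplate(
  "What is a good name for a company that makes {product}?"
);

// Piping the prompt into the chat model produces a runnable chain.
const chain = prompt.pipe(model);

// The chat model returns an AIMessage; its content holds the suggestion.
const response = await chain.invoke({ product: "colorful socks" });
console.log(response.content);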

Streaming

We can also stream with Llama CPP. This can be done using a raw 'single prompt' string:

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";

const model = await ChatLlamaCpp.initialize({
  modelPath: llamaPath,
  temperature: 0.7,
});

const stream = await model.stream("Tell me a short story about a happy Llama.");

for await (const chunk of stream) {
  console.log(chunk.content);
}

/*

Once
upon
a
time
,
in
a
green
and
sunny
field
...
*/

API Reference:

  • ChatLlamaCpp from @langchain/community/chat_models/llama_cpp

Or you can provide multiple messages. Note that this takes the input and then submits a Llama3-formatted prompt to the model.

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { SystemMessage, HumanMessage } from "@langchain/core/messages";

const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";

const llamaCpp = await ChatLlamaCpp.initialize({
  modelPath: llamaPath,
  temperature: 0.7,
});

const stream = await llamaCpp.stream([
  new SystemMessage(
    "You are a pirate, responses must be very verbose and in pirate dialect."
  ),
  new HumanMessage("Tell me about Llamas?"),
]);

for await (const chunk of stream) {
  console.log(chunk.content);
}

/*

Ar
rr
r
,
me
heart
y
!

Ye
be
ask
in
'
about
llam
as
,
e
h
?
...
*/

API Reference:

  • ChatLlamaCpp from @langchain/community/chat_models/llama_cpp
  • SystemMessage from @langchain/core/messages
  • HumanMessage from @langchain/core/messages

Using the invoke method, we can also achieve stream generation, and use a signal to abort the generation.

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { SystemMessage, HumanMessage } from "@langchain/core/messages";

const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";

const model = await ChatLlamaCpp.initialize({
  modelPath: llamaPath,
  temperature: 0.7,
});

const controller = new AbortController();

setTimeout(() => {
  controller.abort();
  console.log("Aborted");
}, 5000);

await model.invoke(
  [
    new SystemMessage(
      "You are a pirate, responses must be very verbose and in pirate dialect."
    ),
    new HumanMessage("Tell me about Llamas?"),
  ],
  {
    signal: controller.signal,
    callbacks: [
      {
        handleLLMNewToken(token) {
          console.log(token);
        },
      },
    ],
  }
);
/*

Once
upon
a
time
,
in
a
green
and
sunny
field
...
Aborted

AbortError

*/

API Reference:

  • ChatLlamaCpp from @langchain/community/chat_models/llama_cpp
  • SystemMessage from @langchain/core/messages
  • HumanMessage from @langchain/core/messages
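
When the controller fires, the pending invoke call rejects, as shown by the AbortError in the output above. A minimal sketch of catching it so the abort does not crash the process (the error-name check is an assumption based on that output):

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { HumanMessage } from "@langchain/core/messages";

const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";

const model = await ChatLlamaCpp.initialize({ modelPath: llamaPath });

const controller = new AbortController();
setTimeout(() => controller.abort(), 5000);

try {
  await model.invoke([new HumanMessage("Tell me about Llamas?")], {
    signal: controller.signal,
  });
} catch (e) {
  // Treat the abort as expected; rethrow anything else.
  if (e instanceof Error && e.name === "AbortError") {
    console.log("Generation was aborted.");
  } else {
    throw e;
  }
}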
