
Llama CPP

Compatibility

Only available on Node.js.

This module is based on the node-llama-cpp Node.js bindings for llama.cpp, allowing you to work with a locally running LLM. This lets you use a much smaller quantized model capable of running on a laptop, ideal for testing and sketching out ideas without running up a bill!

Setup

You'll need to install major version 3 of the node-llama-cpp module to communicate with your local model.

npm install -S node-llama-cpp@3
npm install @langchain/community @langchain/core

You will also need a local Llama 3 model (or a model supported by node-llama-cpp). You will need to pass the path to this model to the LlamaCpp module as part of the parameters (see example).

Out of the box, node-llama-cpp is tuned for running on macOS with support for the Metal GPU of Apple M-series processors. If you need to turn this off, or need support for the CUDA architecture, refer to the documentation for node-llama-cpp.
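For instance, a minimal sketch of forcing CPU-only inference, assuming the LlamaCpp wrapper in your installed version of @langchain/community accepts a gpuLayers option that is passed through to node-llama-cpp (check the node-llama-cpp documentation for the options your version actually supports):

import { LlamaCpp } from "@langchain/community/llms/llama_cpp";

// Sketch only: gpuLayers: 0 asks node-llama-cpp to keep every layer on the CPU.
// The exact option names and defaults depend on your installed versions.
const cpuOnlyModel = await LlamaCpp.initialize({
  modelPath: "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin",
  gpuLayers: 0,
});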

A note to LangChain.js contributors: if you want to run the tests associated with this module, you will need to put the path to your local model in the environment variable LLAMA_PATH.
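For example, in your shell (the path is a placeholder for wherever your model lives):

export LLAMA_PATH="/path/to/your/model/gguf-llama3-Q4_0.bin"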

Guide to installing Llama3

Getting a local Llama3 model running on your machine is a prerequisite, so this is a quick guide to getting and building Llama 3.1-8B (the smallest version) and then quantizing it so that it will run comfortably on a laptop. To do this you will need python3 on your machine (3.11 is recommended), plus gcc and make so that llama.cpp can be built.
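You can check that these prerequisites are in place before starting (the exact versions will vary):

python3 --version
gcc --version
make --version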

Getting the Llama3 models

To get a copy of Llama3 you need to visit Meta AI and request access to their models. Once Meta AI grants you access, you will receive an email containing a unique URL for the files; this will be needed in the next steps. Now create a directory to work in, for example:

mkdir llama3
cd llama3

Now we need to go to the Meta AI llama-models repo, which can be found here. The repo contains instructions for downloading the model of your choice; use the unique URL you received in your email. The rest of the tutorial assumes that you have downloaded Llama3.1-8B, but any model from here on should work. Once the model has downloaded, make sure to save the download path, as it will be needed later.
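For example, you might record it in your shell so you can reuse it during conversion (the directory name is hypothetical; use wherever your download actually landed):

export LLAMA3_MODEL_DIR="$HOME/llama3/Meta-Llama-3.1-8B"

This is the path you will pass as --input_dir in the conversion step below.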

Converting and quantizing the model

In this step we need to use llama.cpp, so we need to download that repo.

cd ..
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

Now we need to build the llama.cpp tools and set up our Python environment. These steps assume that your install of Python can be run using python3 and that the virtual environment can be called llama3; adjust accordingly for your own situation.

cmake -B build
cmake --build build --config Release
python3 -m venv llama3
source llama3/bin/activate

After activating your llama3 environment you should see (llama3) prefixing your command prompt, letting you know this is the active environment. Note: if you come back later to build another model or re-quantize the model, don't forget to activate the environment again. Also, if you update llama.cpp you will need to rebuild the tools and possibly install new or updated dependencies! Now that we have an active Python environment, we need to install the Python dependencies.

python3 -m pip install -r requirements.txt

Having done this, we can start converting and quantizing the Llama3 model for local use via llama.cpp. A conversion to a Hugging Face model is needed first, followed by a conversion to a GGUF model. First, we need the following script, convert_llama_weights_to_hf.py; copy and paste it into your current working directory. Note that using the script may require you to pip install extra dependencies; do so as needed. Then we can convert the model. Before converting, let's create directories to store the Hugging Face conversion and the final model.

mkdir models/8B
mkdir models/8B-GGUF
python3 convert_llama_weights_to_hf.py --model_size 8B --input_dir <dir-to-your-model> --output_dir models/8B --llama_version 3
python3 convert_hf_to_gguf.py --outtype f16 --outfile models/8B-GGUF/gguf-llama3-f16.bin models/8B

This should create a converted Hugging Face model and the final GGUF model in the directories we created. Note that this is only a converted model, so it is still around 16 GB in size; in the next step we will quantize it down to around 4 GB.

./build/bin/llama-quantize ./models/8B-GGUF/gguf-llama3-f16.bin ./models/8B-GGUF/gguf-llama3-Q4_0.bin Q4_0

Running this should result in a new model being created in the models/8B-GGUF directory, called gguf-llama3-Q4_0.bin; this is the model we can use with LangChain. You can validate that the model is working by testing it with the llama.cpp tools.

./build/bin/llama-cli -m ./models/8B-GGUF/gguf-llama3-Q4_0.bin -cnv -p "You are a helpful assistant"

Running this command fires up the model for a chat session. By the way, if you are running out of disk space, this small model is the only one we need, so you can back up and/or delete the original and converted 13.5 GB models.

Usage

import { LlamaCpp } from "@langchain/community/llms/llama_cpp";

const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";
const question = "Where do Llamas come from?";

const model = await LlamaCpp.initialize({ modelPath: llamaPath });

console.log(`You: ${question}`);
const response = await model.invoke(question);
console.log(`AI : ${response}`);

API Reference:

  • LlamaCpp from @langchain/community/llms/llama_cpp

Streaming

import { LlamaCpp } from "@langchain/community/llms/llama_cpp";

const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";

const model = await LlamaCpp.initialize({
  modelPath: llamaPath,
  temperature: 0.7,
});

const prompt = "Tell me a short story about a happy Llama.";

const stream = await model.stream(prompt);

for await (const chunk of stream) {
  console.log(chunk);
}

/*


Once
upon
a
time
,
in
the
rolling
hills
of
Peru
...
*/

API Reference:

  • LlamaCpp from @langchain/community/llms/llama_cpp
