- Created by 虚拟的现实 on February 13, 2025 · 2 minute read
Preface
In most cases we can load the models that ollama ships with directly, but when the model we need is not in the ollama library, for example a model we fine-tuned ourselves, we have to prepare it on our own. Here we use llama.cpp to convert and quantize our own model into a format that Ollama can run.
There are several ways to download models; from within mainland China, you can use the ModelScope (魔搭) approach described in D003-节省时间:AI 模型靠谱下载方案汇总.
Download the phi-2 model
```bash
docker pull python:3.10-slim
docker run -d --name=downloader -v `pwd`:/models python:3.10-slim tail -f /etc/hosts

# the remaining commands run inside the container, e.g. via: docker exec -it downloader bash
sed -i 's/snapshot.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list.d/debian.sources
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install modelscope
pip install "huggingface_hub[cli]"
python -c "from modelscope import snapshot_download;snapshot_download('AI-ModelScope/phi-2', cache_dir='./models/')"
```
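If you would rather pull directly from Hugging Face than from ModelScope, the huggingface-cli tool installed above can do the same job. A minimal sketch; microsoft/phi-2 is the official repository id, and the target directory is just an example:

```bash
# download phi-2 from Hugging Face into a local directory
# (the --local-dir path is an example, adjust as needed)
huggingface-cli download microsoft/phi-2 --local-dir ./models/phi-2
```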
Build a recent version of llama.cpp
Configure the base system environment
```bash
sed -i 's/snapshot.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list.d/debian.sources
sed -i 's/deb.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list.d/debian.sources
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
apt update
```
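The slim Debian images used above do not ship a compiler, so if you are building inside such a container you will also need the basic toolchain first. A small addition, assuming a Debian-based image:

```bash
# assumed prerequisite on a Debian slim base: compiler, cmake and git
# for the llama.cpp build in the next step
apt install -y build-essential cmake git
```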
Download and build llama.cpp
- Clone the source code
- Change into the working directory
- Build llama.cpp (default build, plus optional Metal/CUDA variants shown below)
```bash
git clone https://github.com/ggerganov/llama.cpp.git --depth=1
cd llama.cpp

# default build
cmake -B build
cmake --build build --config Release

# on macOS, Apple Metal is enabled by default;
# to build WITHOUT Metal, disable it explicitly
cmake -B build -DGGML_METAL=OFF
cmake --build build --config Release

# if you have an Nvidia GPU, build with CUDA support
apt install nvidia-cuda-toolkit -y
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```
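After a successful build, the binaries land under build/bin in recent llama.cpp versions. A quick smoke test, assuming that layout:

```bash
# list the built binaries and print the build version
ls build/bin
./build/bin/llama-cli --version
```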
Convert the model with llama.cpp
```bash
apt install python3-full
pip install numpy pyyaml torch safetensors transformers sentencepiece
./convert_hf_to_gguf.py /data/models/models/AI-ModelScope/phi-2/
./convert_hf_to_gguf.py /data/models/models/LLM-Research/Meta-Llama-3___1-8B-Instruct/
```

For Meta-Llama-3.1-8B-Instruct, the conversion log looks like this:
```
INFO:hf-to-gguf:Loading model: Meta-Llama-3___1-8B-Instruct
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:rope_freqs.weight, torch.float32 --> F32, shape = {64}
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00004.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {4096, 128256}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {14336, 4096}
```
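By default the script writes the resulting .gguf next to the source model; the output path and tensor precision can be set explicitly with its --outfile and --outtype flags. For example (paths here are illustrative):

```bash
# write an f16 GGUF to an explicit path
./convert_hf_to_gguf.py /data/models/models/AI-ModelScope/phi-2/ \
  --outfile ./phi-2-f16.gguf --outtype f16
```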
Note: the torch package is quite large, close to 700 MB.
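The converted GGUF is still unquantized f16. To finish the workflow from the preface, quantize it with the llama-quantize binary built earlier and register the result with Ollama via a Modelfile. A minimal sketch, reusing the example file name above (the model name and paths are placeholders):

```bash
# quantize the f16 GGUF down to Q4_K_M (file names are examples)
./build/bin/llama-quantize ./phi-2-f16.gguf ./phi-2-Q4_K_M.gguf Q4_K_M

# point an Ollama Modelfile at the quantized file and create the model
cat > Modelfile <<'EOF'
FROM ./phi-2-Q4_K_M.gguf
EOF
ollama create phi-2-local -f Modelfile
ollama run phi-2-local
```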