
Preface

Most of the time we can simply load the models that Ollama ships with. But when the model we need is not available in Ollama, for example a model we have fine-tuned ourselves, we have to prepare it on our own. Here we use llama.cpp to convert and quantize our own model into a format that Ollama can run.

There are several ways to download models. If you are in mainland China, you can refer to the ModelScope download method described in "D003-节省时间:AI 模型靠谱下载方案汇总".

Download the phi-2 model

# Pull a slim Python image and keep a container running, with the current directory mounted at /models
docker pull python:3.10-slim
docker run -d --name=downloader -v `pwd`:/models python:3.10-slim tail -f /etc/hosts

# Run the remaining commands inside the container, e.g. via: docker exec -it downloader bash
# Optional: switch APT and PyPI to the TUNA mirrors (useful in mainland China)
sed -i 's/snapshot.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list.d/debian.sources
sed -i 's/deb.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list.d/debian.sources
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

# Install the ModelScope SDK (huggingface_hub is optional, only needed for Hugging Face downloads)
pip install modelscope
pip install huggingface_hub

# Download phi-2 from ModelScope into ./models/ under the current working directory
python -c "from modelscope import snapshot_download;snapshot_download('AI-ModelScope/phi-2', cache_dir='./models/')"

Build a recent version of llama.cpp

Configure the base system environment

# Optional: switch APT and PyPI to the TUNA mirrors (useful in mainland China)
sed -i 's/snapshot.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list.d/debian.sources
sed -i 's/deb.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list.d/debian.sources
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
apt update
# The build below needs git, CMake and a C/C++ toolchain
apt install -y git cmake build-essential

Download and compile llama.cpp

  • Clone the source code
  • Change into the working directory
  • Build llama.cpp in the default (CPU-only) mode
git clone https://github.com/ggerganov/llama.cpp.git --depth=1
cd llama.cpp
cmake -B build
cmake --build build --config Release

# On macOS, Metal support is enabled by default, so the regular build above already uses it.
# To build without Metal instead, reconfigure and rebuild:
cmake -B build -DGGML_METAL=OFF
cmake --build build --config Release

# If you are using an Nvidia GPU
apt install nvidia-cuda-toolkit -y
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
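
After the build finishes, the executables end up under build/bin. A quick sanity check, assuming a recent llama.cpp where the tools carry the llama- prefix:

# List the built tools and print version/build info
ls build/bin/
./build/bin/llama-cli --version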

Convert the model with llama.cpp

# Python dependencies required by the conversion script
apt install -y python3-full
pip install numpy pyyaml torch safetensors transformers sentencepiece

# Convert a downloaded checkpoint directory to GGUF (adjust the paths to where your models live)
./convert_hf_to_gguf.py /data/models/models/AI-ModelScope/phi-2/
./convert_hf_to_gguf.py /data/models/models/LLM-Research/Meta-Llama-3___1-8B-Instruct/

INFO:hf-to-gguf:Loading model: Meta-Llama-3___1-8B-Instruct
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:rope_freqs.weight,           torch.float32 --> F32, shape = {64}
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00004.safetensors'
INFO:hf-to-gguf:token_embd.weight,           torch.bfloat16 --> F16, shape = {4096, 128256}
INFO:hf-to-gguf:blk.0.attn_norm.weight,      torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_down.weight,       torch.bfloat16 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_up.weight,         torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.attn_k.weight,         torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight,    torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_q.weight,         torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_v.weight,         torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.1.attn_norm.weight,      torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.ffn_down.weight,       torch.bfloat16 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.1.ffn_up.weight,         torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.attn_k.weight,         torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.1.attn_output.weight,    torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_q.weight,         torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_v.weight,         torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.2.attn_norm.weight,      torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.ffn_down.weight,       torch.bfloat16 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.2.ffn_up.weight,         torch.bfloat16 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.attn_k.weight,         torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.2.attn_output.weight,    torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.2.attn_q.weight,         torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.2.attn_v.weight,         torch.bfloat16 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.3.attn_norm.weight,      torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.3.ffn_down.weight,       torch.bfloat16 --> F16, shape = {14336, 4096}

Note: the torch package is quite large, close to 700 MB.
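
The conversion writes an F16 GGUF file into the model directory (as the log above shows, bfloat16 weights are exported as F16; the exact output file name is printed at the end of the run). To get the smaller quantized model that the preface is after, run that file through llama-quantize. A minimal sketch, where the input file name is an assumption and Q4_K_M is one common quantization type:

# Quantize the converted GGUF down to 4-bit Q4_K_M (file names are illustrative)
./build/bin/llama-quantize \
  /data/models/models/AI-ModelScope/phi-2/phi-2-F16.gguf \
  /data/models/models/AI-ModelScope/phi-2/phi-2-Q4_K_M.gguf \
  Q4_K_M

The resulting .gguf can then be referenced from an Ollama Modelfile (FROM ./phi-2-Q4_K_M.gguf) and imported with ollama create.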
