LoRA Fine-Tuning of the Qwen Large Language Model on Windows

帅东

Last modified: 2024-11-23 16:29:02

Tags: language model, LoRA, fine-tuning

First published: 2024-11-23 16:12:46

Article link: https://blog.csdn.net/PROGRAM_anywhere/article/details/143992946

Environment requirements

Python 3.8 or later
PyTorch 1.12 or later (2.0 or later recommended)
transformers 4.32 or later
CUDA 11.4 or later recommended (for GPU users, flash-attention users, etc.)

Fine-tuning steps

1. Download the resources

Qwen: https://github.com/QwenLM/Qwen
Qwen-1_8B-Chat model: https://modelscope.cn/models/Qwen/Qwen-1_8B-Chat
torch: https://download.pytorch.org/whl/torch_stable.html
flash-attention: https://github.com/Dao-AILab/flash-attention/releases/

2. Set up the environment

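conda create -n qwen python==3.10.1
conda activate qwen

# Install torch from the locally downloaded wheel
pip install "F:\llm\ptorch\torch-2.1.2+cu121-cp310-cp310-win_amd64.whl"

# Dependencies
cd F:\github\Qwen
pip install -r requirements.txt

# Dependencies for the web demo (graphical inference UI)
pip install -r requirements_web_demo.txt

# If direct installation fails, download the packages manually and install them locally
pip install "peft<0.8.0" deepspeed

# Optional, speeds up the model. Download the wheel from the link above and install it locally; my manual build had not finished after 3 hours
pip install F:\llm\flash_attn-2.4.1+cu121torch2.1cxx11abiFALSE-cp310-cp310-win_amd64.whl

# Model (clone it from F:\github so that it ends up at F:\github\Qwen-1_8B-Chat, the path used in the later steps)
git clone https://www.modelscope.cn/Qwen/Qwen-1_8B-Chat.git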

3. Prepare the fine-tuning data

The fine-tuning data format given in the official repository:

[{"id":"identity_0","conversations":[{"from":"user","value":"你好"},{"from":"assistant","value":"我是一个语言模型，我叫通义千问。"}]}]

The data I prepared, already converted to this format:
DISC-Law-SFT-Triplet-released-Qwen.json
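
Converting an existing instruction dataset into this format takes only a few lines. The following is a minimal sketch, assuming each source record is a JSON object with `input` and `output` fields (an assumption about the raw data, adjust the keys to your files); the function name `to_qwen_format` is made up for illustration.

```python
import json


def to_qwen_format(records):
    """Turn {"input": ..., "output": ...} records into Qwen's conversation format."""
    converted = []
    for i, rec in enumerate(records):
        converted.append({
            "id": f"identity_{i}",
            "conversations": [
                {"from": "user", "value": rec["input"]},        # assumed key
                {"from": "assistant", "value": rec["output"]},  # assumed key
            ],
        })
    return converted


# Sample record reusing the text of the official format example above
raw = [{"input": "你好", "output": "我是一个语言模型，我叫通义千问。"}]
print(json.dumps(to_qwen_format(raw), ensure_ascii=False))
```

When writing the result to disk, `json.dump(..., ensure_ascii=False)` on a file opened with `encoding="utf-8"` keeps the Chinese text readable instead of escaping it.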

4. Adjust the fine-tuning parameters

Single-GPU LoRA training
The original script is Qwen/finetune/finetune_lora_single_gpu.sh
Because it has to run on Windows, I converted it to a .bat file

set CUDA_DEVICE_MAX_CONNECTIONS=1
set CUDA_VISIBLE_DEVICES=0

python finetune.py ^
  --model_name_or_path F:\github\Qwen-1_8B-Chat ^
  --data_path F:\llm\data\DISC-Law-SFT\DISC-Law-SFT-Triplet-released-Qwen.json ^
  --bf16 True ^
  --output_dir output_qwen_lora\law ^
  --num_train_epochs 1 ^
  --per_device_train_batch_size 8 ^
  --per_device_eval_batch_size 1 ^
  --gradient_accumulation_steps 8 ^
  --evaluation_strategy "no" ^
  --save_strategy "steps" ^
  --save_steps 1000 ^
  --save_total_limit 10 ^
  --learning_rate 3e-4 ^
  --weight_decay 0.1 ^
  --adam_beta2 0.95 ^
  --warmup_ratio 0.01 ^
  --lr_scheduler_type "cosine" ^
  --logging_steps 1 ^
  --report_to "none" ^
  --model_max_length 500 ^
  --lazy_preprocess True ^
  --gradient_checkpointing ^
  --use_lora

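Parameter overview:
model_name_or_path (MODEL in the original .sh): path to the model
data_path (DATA in the original .sh): path to the custom dataset
output_dir: where the trained model is written
num_train_epochs: number of training epochs
model_max_length: maximum sequence length the model processes; set it according to your data
per_device_train_batch_size: training batch size per device
save_steps: save a checkpoint every n steps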
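
How these values interact is easy to work out by hand. The sketch below assumes a single GPU and a made-up sample count (the real size of the dataset is not stated here), and it only approximates how the trainer counts steps; the function name is hypothetical.

```python
import math


def steps_per_epoch(num_samples, per_device_batch, grad_accum, num_devices=1):
    """Return (effective_batch_size, approximate optimizer steps per epoch)."""
    effective = per_device_batch * grad_accum * num_devices
    return effective, math.ceil(num_samples / effective)


# Batch settings from the .bat file above; 16_000 samples is a placeholder.
effective, steps = steps_per_epoch(num_samples=16_000, per_device_batch=8, grad_accum=8)
print(effective, steps)  # 64 250
```

With `save_steps` set to 1000, a run with fewer optimizer steps than that would not write any intermediate checkpoint, so it may be worth lowering `save_steps` for short runs.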

5. Train the LoRA model

.\finetune\finetune_lora_single_gpu.bat

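The model has about 1.9 billion parameters in total, of which we train a little over 50 million, roughly 2.8%.
We train for only one epoch: the goal is just to see that fine-tuning has an effect, not to get good results. The default of 5 epochs would take 3 hours 50 minutes, while 1 epoch takes only 1 hour 17 minutes. Without flash-attention it may take another 20 to 30 minutes.
If you run out of GPU memory, lower per_device_train_batch_size a little.
The log shows a warning: sequence length is longer than the specified maximum sequence length for this model (649 > 512)
It means some training samples are too long. The code truncates them by default, which only affects quality; raising the length limit would roughly double the training time, so I left it alone.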
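
The trainable-parameter share can be reproduced by hand. This is a sketch under stated assumptions: the Qwen-1_8B layer shapes, the base-model size, and the LoRA settings (rank 64 on c_attn, c_proj, w1, w2) are what I believe the defaults of Qwen's finetune.py and model config to be, so check them against your checkout.

```python
# Assumed Qwen-1_8B shapes and LoRA defaults; verify against your own config.
HIDDEN = 2048                # hidden size
FFN = 5504                   # MLP inner width (intermediate_size // 2)
LAYERS = 24                  # number of transformer blocks
RANK = 64                    # LoRA rank
BASE_PARAMS = 1_836_828_672  # assumed parameter count of the base model

# (in_features, out_features) of every LoRA-wrapped linear layer in one block
targets = {
    "attn.c_attn": (HIDDEN, 3 * HIDDEN),
    "attn.c_proj": (HIDDEN, HIDDEN),
    "mlp.w1": (HIDDEN, FFN),
    "mlp.w2": (HIDDEN, FFN),
    "mlp.c_proj": (FFN, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # LoRA adds A (rank x in_features) and B (out_features x rank)
    return rank * (in_features + out_features)


trainable = LAYERS * sum(lora_params(i, o, RANK) for i, o in targets.values())
total = BASE_PARAMS + trainable
print(f"trainable params: {trainable:,} || all params: {total:,} "
      f"|| trainable%: {100 * trainable / total:.2f}")
```

Under these assumptions the script reports 53,673,984 trainable parameters out of 1,890,502,656, which is 2.84%.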

The 4060 Ti 16 GB GPU was almost fully utilized.

6. Merge the model

Save the following code as qwen_lora_merge.py and use it to merge the LoRA adapter into the base model:

import os

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer


def save_model_and_tokenizer(path_to_adapter, new_model_directory):
    """Merge the LoRA adapter into the base model and save it with its tokenizer."""
    if not os.path.exists(path_to_adapter):
        raise FileNotFoundError(f"Adapter path does not exist: {path_to_adapter}")
    os.makedirs(new_model_directory, exist_ok=True)
    try:
        # Load the base model together with the LoRA adapter
        model = AutoPeftModelForCausalLM.from_pretrained(
            path_to_adapter,
            device_map="auto",
            trust_remote_code=True
        ).eval()
        # Fold the LoRA weights into the base weights and drop the adapter wrappers
        merged_model = model.merge_and_unload()
        merged_model.save_pretrained(
            new_model_directory,
            max_shard_size="2048MB",
            safe_serialization=True
        )
        # Load the tokenizer from the adapter directory and save it next to the merged model
        tokenizer = AutoTokenizer.from_pretrained(
            path_to_adapter,
            trust_remote_code=True
        )
        save_tokenizer(tokenizer, new_model_directory)
    except Exception as e:
        print(f"Merge failed: {e}")
        raise


def save_tokenizer(tokenizer, directory):
    tokenizer.save_pretrained(directory)


if __name__ == "__main__":
    lora_model_path = "F:\\github\\Qwen\\output_qwen_lora\\law"
    new_model_directory = "F:\\github\\Qwen\\output_qwen_merge\\Qwen-1_8B-Chat_law_merge"
    save_model_and_tokenizer(lora_model_path, new_model_directory)

python .\qwen_lora_merge.py

7. Verify the fine-tuned model

For a targeted test, we ask a question taken directly from the training data:

基于下列案件，推测可能的判决结果。\n被告人白某某在大东区小河沿公交车站乘坐被害人张某某驾驶的133路公交车，被告人白某某因未能下车而与司机张某某发生争执，并在该公交车行驶中用手拉拽档杆，被证人韩某某拉开后，被告人白某某又用手拉拽司机张某某的右胳膊，导致该车失控撞向右侧马路边停放的轿车和一个路灯杆，路灯杆折断后将福锅记炖品店的牌匾砸坏。经鉴定，公交车受损价值人民币5,189.9元，轿车受损价值人民币1,449.57元，路灯杆受损价值人民币2,927.15元，福锅记饭店牌匾受损价值人民币9,776元，本案损失价值共计人民币19,342.6元。

(1) Run the original model

python web_demo.py --server-name 0.0.0.0 -c F:\github\Qwen-1_8B-Chat

Open http://localhost:8000/ in a browser to chat.

(2) Run the new model

python web_demo.py --server-name 0.0.0.0 -c F:\github\Qwen\output_qwen_merge\Qwen-1_8B-Chat_law_merge

Open http://localhost:8000/ in a browser to chat.
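The answer format has clearly changed: every reply now opens with 【根据《xxx》xxx的规定】 ("According to the provisions of xxx in 《xxx》"), and the content looks fairly plausible. On closer inspection, though, the cited provisions are wrong. To improve this, train for more epochs, raise model_max_length to 700, and possibly add more training samples, then check the results again.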
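
When comparing many replies from the two models, the format change can be checked programmatically. This is a minimal sketch, assuming fine-tuned replies begin with 根据《…》; the helper name and the second sample string are made up for illustration, and the check says nothing about whether the cited provision is correct.

```python
import re

# Matches replies that open with 根据《<law name>》 ("According to <law name>"),
# optionally preceded by whitespace or an opening 【 bracket.
CITATION_PREFIX = re.compile(r"^\s*【?根据《[^》]+》")


def opens_with_citation(reply: str) -> bool:
    """Return True if the reply starts in the fine-tuned citation style."""
    return CITATION_PREFIX.match(reply) is not None


replies = {
    "fine-tuned style": "根据《xxx》xxx的规定",
    "other style": "这个案件可能的判决结果是",  # made-up sample
}
for name, text in replies.items():
    print(name, opens_with_citation(text))
```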

Environment issues

Error when running the fine-tuning script

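My environment differs from the official one: here the Accelerator constructor does not accept the dispatch_batches argument, so I commented it out by hand in
Lib\site-packages\transformers\trainer.py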

        self.accelerator = Accelerator(
            #dispatch_batches=self.args.dispatch_batches,
            split_batches=self.args.split_batches,
            deepspeed_plugin=self.args.deepspeed_plugin,
            gradient_accumulation_plugin=gradient_accumulation_plugin,
        )
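
Before editing a file in site-packages, it can help to confirm which keyword arguments the installed constructor really accepts. The sketch below uses only the standard library; `FakeAccelerator` is a hypothetical stand-in for `accelerate.Accelerator` in an environment like mine, and `accepts_kwarg` is a made-up helper name.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    # A **kwargs parameter accepts any keyword
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class FakeAccelerator:
    # Hypothetical stand-in: a constructor without dispatch_batches
    def __init__(self, split_batches=False, deepspeed_plugin=None,
                 gradient_accumulation_plugin=None):
        pass


wanted = {"dispatch_batches": None, "split_batches": False}
supported = {k: v for k, v in wanted.items() if accepts_kwarg(FakeAccelerator, k)}
print(supported)  # {'split_batches': False}
```

With accelerate installed, passing the real `Accelerator` class to `accepts_kwarg` would show whether the argument still exists. Installing transformers and accelerate versions that match each other is probably a cleaner fix than patching trainer.py, but which versions pair up would need checking.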

Overall environment configuration

torch                         2.1.2+cu121
flash_attn                    2.4.1
deepspeed                     0.15.5+unknown
peft                          0.7.1

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:55:00_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0