Deploying Qwen2.5-VL-3B Locally on Windows


Deployment Environment

Python 3.9.6
GPU: laptop RTX 4080, 12 GB VRAM
RAM: 32 GB
CPU: Intel i9-14900HX

I. Deployment Process

1. git clone https://github.com/QwenLM/Qwen2.5-VL  // download the source; if the clone fails, just download the zip archive and extract it

2. pip install git+https://github.com/huggingface/transformers accelerate

3. pip install qwen-vl-utils[decord]==0.0.8  // if the [decord] extra will not install on Windows, plain pip install qwen-vl-utils works and falls back to torchvision for video decoding

4. pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121  // install the CUDA-enabled PyTorch build, otherwise the model will not use your GPU
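To confirm the CUDA build is actually being used, a quick check (a minimal sketch; run it in the same Python environment you installed into) is:

import torch
print(torch.__version__)              # should end in +cu121
print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # should report the RTX 4080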

5. pip install -U gradio gradio_client  // upgrade Gradio and the Gradio client

6. Go into the Qwen2.5-VL-main root directory, open a cmd terminal, and run:
python web_demo_mm.py --checkpoint-path "Qwen/Qwen2.5-VL-3B-Instruct"
   
Once the model has finished downloading, open http://localhost:7860 in a browser to reach the WebUI:

(Screenshot: the Qwen2.5-VL Gradio WebUI in the browser)
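If you would rather call the model from a script than through the WebUI, the sketch below follows the quick-start pattern from the Qwen2.5-VL README; the image path is a placeholder, and max_new_tokens=128 is just an illustrative value:

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# load the 3B instruct model onto the GPU; torch_dtype="auto" picks a half-precision dtype
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "file:///C:/path/to/your/image.jpg"},  # placeholder path
        {"type": "text", "text": "Describe this image."},
    ],
}]

# build the chat prompt and pull out the vision inputs
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

# generate, then strip the prompt tokens from each output before decoding
output_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])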

II. Things to Watch Out For

1. You may hit plenty of dependency conflicts or missing-module errors during deployment; the only real fix is to install whatever is missing or downgrade versions as needed. My pip list is below for reference:

accelerate          1.0.1
aiofiles            23.2.1
annotated-types     0.7.0
anyio               4.6.2
av                  12.3.0
certifi             2025.1.31
charset-normalizer  3.4.1
click               8.1.8
colorama            0.4.6
contourpy           1.1.1
cycler              0.12.1
decord              0.6.0
exceptiongroup      1.2.2
fastapi             0.115.11
ffmpy               0.5.0
filelock            3.16.1
fonttools           4.56.0
fsspec              2025.3.0
gradio              4.44.1
gradio_client       1.3.0
h11                 0.14.0
httpcore            1.0.7
httpx               0.28.1
huggingface-hub     0.29.3
idna                3.10
importlib_metadata  8.5.0
importlib_resources 6.4.5
intel-openmp        2021.4.0
Jinja2              3.1.6
joblib              1.4.2
kiwisolver          1.4.7
markdown-it-py      3.0.0
MarkupSafe          2.1.5
matplotlib          3.9.4
mdurl               0.1.2
mkl                 2021.4.0
mpmath              1.3.0
networkx            3.1
nltk                3.9.1
numpy               1.26.4
orjson              3.10.10
packaging           24.2
pandas              2.2.3
Pillow              9.5.0
pip                 25.0.1
psutil              7.0.0
pydantic            2.6.2
pydantic_core       2.16.3
pydub               0.25.1
pygame              2.6.1
Pygments            2.19.1
pyparsing           3.1.4
python-dateutil     2.9.0.post0
python-multipart    0.0.20
pytz                2025.1
PyYAML              6.0.2
qwen-vl-utils       0.0.8
regex               2024.5.15
requests            2.32.3
rich                13.9.4
ruff                0.11.0
sacremoses          0.1.1
safetensors         0.5.3
semantic-version    2.10.0
setuptools          75.3.2
setuptools-rust     1.10.2
shellingham         1.5.4
six                 1.17.0
sniffio             1.3.1
starlette           0.44.0
sympy               1.13.3
tbb                 2021.13.1
tokenizers          0.21.1
tomlkit             0.12.0
torch               2.3.0+cu121
torchaudio          2.3.0+cu121
torchvision         0.18.0+cu121
tqdm                4.67.1
transformers        4.50.0.dev0
typer               0.15.2
typing_extensions   4.12.2
tzdata              2025.1
urllib3             2.2.3
uvicorn             0.33.0
websockets          12.0
wheel               0.45.1
zipp                3.20.2

2. If Hugging Face is unreachable, edit web_demo_mm.py and add the following at the very top of the file (it must run before transformers/huggingface_hub are imported):

import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
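Equivalently, you can set the variable in the cmd session before launching the demo (standard Windows cmd syntax), which avoids editing the script:

set HF_ENDPOINT=https://hf-mirror.com
python web_demo_mm.py --checkpoint-path "Qwen/Qwen2.5-VL-3B-Instruct"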

3. If a queue.Empty error occurs at runtime, open web_demo_mm.py and raise the streamer timeout, since generation can take longer than the default allows:

streamer = TextIteratorStreamer(tokenizer, timeout=300.0, skip_prompt=True, skip_special_tokens=True)

After test runs I found that my machine struggles even with the 3B model: a single reply takes tens of seconds, and high-resolution images exhaust the VRAM outright. Still, for personal tinkering it works fine.
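If large images keep blowing the VRAM budget, one mitigation is to cap how many visual tokens the processor produces. This is a sketch based on the min_pixels/max_pixels options documented in the Qwen2.5-VL README; the exact values are assumptions you would tune for a 12 GB card, and the same idea applies wherever the processor is constructed:

from transformers import AutoProcessor

# cap the per-image token budget; lower max_pixels further if you still hit OOM
min_pixels = 256 * 28 * 28
max_pixels = 640 * 28 * 28  # assumed ceiling for 12 GB; the README example uses 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)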
