DiTFastAttn: Efficient Attention Compression for Diffusion Transformer Models


DiTFastAttn project repository: https://gitcode.com/gh_mirrors/di/DiTFastAttn

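Project Overview

DiTFastAttn is an attention compression method for Diffusion Transformers (DiT). DiT models perform well on image and video generation, but self-attention scales quadratically with the number of tokens, which makes inference expensive at high resolutions and long sequence lengths. DiTFastAttn is a post-training compression method that targets this bottleneck without retraining the model.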
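To make the quadratic cost concrete, here is a rough back-of-the-envelope sketch. The 8x VAE downsampling factor, the patch size of 2, and the head dimension of 64 are assumptions typical of DiT-style latent models, not values taken from this article, and the function names are made up for illustration.

```python
def attention_tokens(image_size: int, vae_downsample: int = 8, patch_size: int = 2) -> int:
    """Token count for a square image, assuming a latent-space DiT (hypothetical defaults)."""
    latent_side = image_size // vae_downsample
    tokens_per_side = latent_side // patch_size
    return tokens_per_side * tokens_per_side


def attention_multiply_adds(num_tokens: int, head_dim: int) -> int:
    """Multiply-adds for one head: N*N*d for Q @ K^T plus N*N*d for weights @ V."""
    return 2 * num_tokens * num_tokens * head_dim


for size in (256, 512, 1024):
    n = attention_tokens(size)
    print(f"{size}x{size}: {n} tokens, {attention_multiply_adds(n, head_dim=64):,} multiply-adds per head")
```

Under these assumptions, doubling the image side quadruples the token count and multiplies the per-head attention cost by 16, which is why attention tends to dominate at high resolution.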

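Technical Analysis

During DiT inference, the self-attention computation exhibits three main kinds of redundancy:

Spatial redundancy: many attention heads focus mostly on local information.
Temporal redundancy: the attention outputs of adjacent denoising steps are highly similar.
Conditional redundancy: the conditional and unconditional inference passes produce notably similar attention outputs.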

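To address these redundancies, DiTFastAttn proposes three techniques:

Window Attention with Residual Caching: reduces spatial redundancy.
Temporal Similarity Reduction: exploits the similarity between steps.
Conditional Redundancy Elimination: skips redundant computation during conditional generation.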
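The first technique can be sketched in NumPy as follows. This is a minimal illustration of the idea, not the project's implementation: it assumes a single head, a 1D token order, and a dense mask (so it shows the numerics but saves no compute), and the class and method names are hypothetical. At one step it computes both full and windowed attention and caches their difference; at a later step it computes only the windowed attention and adds the cached residual back.

```python
import numpy as np


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def attention(q, k, v, window=None):
    """Single-head attention; with `window` set, token i only sees tokens j with |i - j| <= window // 2."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    if window is not None:
        idx = np.arange(n)
        outside = np.abs(idx[:, None] - idx[None, :]) > window // 2
        scores = np.where(outside, -np.inf, scores)  # masked entries get zero weight
    return softmax(scores) @ v


class WindowAttentionWithResidualCache:
    """Hypothetical wrapper: caches (full - windowed) output and reuses it at later steps."""

    def __init__(self, window):
        self.window = window
        self.residual = None

    def full_step(self, q, k, v):
        full = attention(q, k, v)
        local = attention(q, k, v, self.window)
        self.residual = full - local  # the long-range contribution the window misses
        return full

    def cached_step(self, q, k, v):
        if self.residual is None:
            raise RuntimeError("call full_step once before cached_step")
        return attention(q, k, v, self.window) + self.residual


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 16)) for _ in range(3))
layer = WindowAttentionWithResidualCache(window=8)
layer.full_step(q, k, v)

# Simulate the next denoising step as a small perturbation of the inputs.
q2, k2, v2 = (x + 0.01 * rng.standard_normal(x.shape) for x in (q, k, v))
reference = attention(q2, k2, v2)
err_window_only = np.abs(attention(q2, k2, v2, window=8) - reference).mean()
err_with_cache = np.abs(layer.cached_step(q2, k2, v2) - reference).mean()
print(f"window only: {err_window_only:.4f}, window + cached residual: {err_with_cache:.4f}")
```

The cached residual only helps when adjacent steps are similar, which is exactly the temporal redundancy described above. A real implementation would also use a kernel that skips the masked region instead of a dense mask.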

For more details, see the paper cited at the end of this post.
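The two sharing ideas, reusing attention outputs across steps and across the conditional and unconditional passes, can be sketched with a small cache wrapper. The mode names (`full`, `share_cfg`, `share_step`), the class, and the stand-in attention function are all hypothetical and chosen for illustration; the project's own interface may look quite different.

```python
from typing import Callable, Optional, Sequence, Tuple


class SharedAttention:
    """Hypothetical wrapper that reuses attention outputs according to a per-step plan."""

    def __init__(self, attn_fn: Callable[[float], float], plan: Sequence[str]):
        self.attn_fn = attn_fn
        self.plan = plan
        self.prev: Optional[Tuple[float, float]] = None  # (cond, uncond) from the last computed step
        self.calls = 0  # how many times attention was actually evaluated

    def _compute(self, x: float) -> float:
        self.calls += 1
        return self.attn_fn(x)

    def __call__(self, step: int, cond_input: float, uncond_input: float) -> Tuple[float, float]:
        mode = self.plan[step]
        if mode == "share_step" and self.prev is not None:
            return self.prev  # temporal sharing: reuse the previous step's outputs
        cond_out = self._compute(cond_input)
        if mode == "share_cfg":
            uncond_out = cond_out  # conditional sharing: skip the unconditional pass
        else:
            uncond_out = self._compute(uncond_input)
        self.prev = (cond_out, uncond_out)
        return self.prev


# Stand-in for a real attention layer, used only to count evaluations.
layer = SharedAttention(attn_fn=lambda x: 2.0 * x, plan=["full", "share_cfg", "share_step", "share_step"])
outputs = [layer(step, cond_input=1.0 + step, uncond_input=0.5 + step) for step in range(4)]
print(f"attention evaluations: {layer.calls} instead of {2 * len(outputs)}")
```

In this toy plan, attention is evaluated 3 times rather than 8 across four steps. How aggressively outputs can be shared in practice depends on how similar they really are for a given layer and step.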

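Application Scenarios

DiTFastAttn is mainly useful in the following scenarios:

Image generation: when a DiT model generates high-resolution images, DiTFastAttn can speed up inference and reduce compute consumption.
Video generation: when a DiT model generates video sequences, DiTFastAttn can cut redundant attention computation and make generation more efficient.
Latency-sensitive applications: in image and video pipelines where response time matters, lowering the attention cost helps bring generation closer to real-time use.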

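Key Features

1. Efficiency

By removing redundancy in the self-attention computation, DiTFastAttn lowers the attention cost of DiT models and speeds up inference.

2. Flexibility

DiTFastAttn offers several compression techniques, so users can pick the ones that suit their application.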
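One way to picture "picking a technique" is a threshold rule: for each layer and step, try the candidates from most to least aggressive and keep the first whose output error stays under a threshold. The sketch below is an assumption about how a setting like the `--threshold` flag used later in this post could drive such a choice; the candidate names, their ordering, and the error values are placeholders, not measurements or the project's actual search.

```python
from typing import Dict

# Candidates ordered from most to least compute saved (an assumed ordering).
CANDIDATES = ["share_step", "window_residual+share_cfg", "window_residual", "share_cfg"]


def choose_method(errors: Dict[str, float], threshold: float) -> str:
    """Return the most aggressive candidate whose error is within the threshold, else 'full'."""
    for name in CANDIDATES:
        if errors.get(name, float("inf")) <= threshold:
            return name
    return "full"


# Placeholder error values for two hypothetical (layer, step) pairs.
early_step_errors = {"share_step": 0.20, "window_residual+share_cfg": 0.09,
                     "window_residual": 0.04, "share_cfg": 0.06}
late_step_errors = {"share_step": 0.01, "window_residual+share_cfg": 0.02,
                    "window_residual": 0.01, "share_cfg": 0.01}

print(choose_method(early_step_errors, threshold=0.05))
print(choose_method(late_step_errors, threshold=0.05))
```

A lower threshold keeps more layers on full attention and preserves quality, while a higher one trades quality for speed.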

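3. Extensibility

Because the techniques operate on self-attention layers, they may in principle carry over to other models built on self-attention, such as GPT or BERT. Note, however, that the temporal and conditional redundancies arise from iterative diffusion sampling, so only the spatial (window) technique has an obvious counterpart outside diffusion models.

4. Ease of Use

DiTFastAttn provides a concise Python interface that can be integrated into existing DiT models.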

Setup and Usage

Installation

conda create -n difa python=3.10
conda activate difa

pip install torch numpy packaging matplotlib scikit-image ninja
pip install git+https://github.com/huggingface/diffusers
pip install thop pytorch_fid accelerate "torchmetrics[image]" beautifulsoup4 ftfy flash-attn transformers SentencePiece

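Preparing the Dataset

Sample real images from ImageNet into data/real_images; they are used to compute IS and FID:

python data/sample_real_images.py <imagenet_path>

If you use PixArt, place the COCO dataset in data/mscoco.

Usage

All experiment code is in the experiments/ folder.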

Compressing DiT

python run_dit.py --n_calib 8 --n_steps 50 --window_size 128 --threshold 0.05 --eval_n_images 5000

Compressing PixArt 1k

python run_pixart.py --n_calib 6 --n_steps 50 --window_size 512 --threshold 0.0725 --eval_n_images 5000

Compressing Opensora

python run_opensora.py --threshold 0.05 --window_size 50 --n_calib 4 --use_cache

Note: before running the Opensora script, install opensora by following its README. If you run into problems, switch to opensora commit ea41df3d6cc5f38.

Updates and Citation

See CHANGELOGS.md for update information.

To cite this work, use the following entry:

@inproceedings{
  yuan2024ditfastattn,
  title={Di{TF}astAttn: Attention Compression for Diffusion Transformer Models},
  author={Zhihang Yuan and Hanling Zhang and Lu Pu and Xuefei Ning and Linfeng Zhang and Tianchen Zhao and Shengen Yan and Guohao Dai and Yu Wang},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=51HQpkQy3t}
}

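With DiTFastAttn, you can speed up DiT inference substantially while largely preserving generation quality, which opens up more options for image and video generation. Developers are welcome to try DiTFastAttn and contribute to the project.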


Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.