We provide the code for TaskDiffusion, a novel multi-task dense prediction framework based on diffusion models. The code supports the PASCAL-Context and NYUD-v2 benchmarks with ViT backbones.
- TaskDiffusion builds a novel decoder module based on diffusion models that captures the underlying conditional distribution of the predictions.
- To further unlock the potential of diffusion models for multi-task dense prediction, TaskDiffusion introduces a novel joint denoising diffusion process that captures task relations during denoising (see the conceptual sketch below).
- Our proposed TaskDiffusion achieves new state-of-the-art (SOTA) performance with superior efficiency on PASCAL-Context and NYUD-v2.
Please check the paper for more details.
*Figure: Framework overview of the proposed TaskDiffusion for multi-task scene understanding.*
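To make the joint denoising idea concrete, below is a minimal conceptual sketch of a denoiser that predicts the noise for all task maps at once, so tasks can exchange information at every denoising step. This is not the code in this repository; all names (`JointDenoiser`, `image_feats`, the timestep injection, the channel counts) are hypothetical simplifications.

```python
import torch
import torch.nn as nn

class JointDenoiser(nn.Module):
    """Predicts the noise of all task maps jointly, so each task's
    denoising step depends on all the others (hypothetical sketch)."""

    def __init__(self, task_channels, feat_dim=256):
        super().__init__()
        self.task_channels = task_channels
        total_c = sum(task_channels)
        # One shared trunk over the concatenated task maps plus image
        # features; the joint input is what couples the tasks.
        self.trunk = nn.Sequential(
            nn.Conv2d(total_c + feat_dim, feat_dim, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(feat_dim, total_c, 3, padding=1),
        )

    def forward(self, noisy_maps, image_feats, t_emb):
        # noisy_maps: one (B, C_k, H, W) noisy prediction per task
        # image_feats: (B, feat_dim, H, W) backbone features (the condition)
        # t_emb: (B, feat_dim) timestep embedding
        x = torch.cat(noisy_maps, dim=1)
        cond = image_feats + t_emb[:, :, None, None]  # inject the timestep
        eps = self.trunk(torch.cat([x, cond], dim=1))
        # Split the joint noise estimate back into per-task estimates
        return list(torch.split(eps, self.task_channels, dim=1))

# Toy usage: semantic segmentation (21 channels) and depth (1 channel)
denoiser = JointDenoiser([21, 1])
maps = [torch.randn(2, 21, 32, 32), torch.randn(2, 1, 32, 32)]
eps_seg, eps_depth = denoiser(maps, torch.randn(2, 256, 32, 32), torch.randn(2, 256))
```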
You can use the following commands to prepare your environment:
```bash
conda create -n taskdiffusion python=3.7
conda activate taskdiffusion
pip install tqdm Pillow==9.5 easydict pyyaml imageio scikit-image tensorboard six
pip install opencv-python==4.7.0.72 setuptools==59.5.0
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install timm==0.5.4 einops==0.4.1
```
You can download the PASCAL-Context and NYUD-v2 datasets from ATRC's repository as PASCALContext.tar.gz and NYUDv2.tar.gz, then extract them:
```bash
tar xfvz PASCALContext.tar.gz
tar xfvz NYUDv2.tar.gz
```
Attention: you need to set the root directory of your datasets as the `db_root` variable in `configs/mypath.py`.
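For example, `configs/mypath.py` might look like the following; only the `db_root` variable is required by the instruction above, and any surrounding file contents are an assumption:

```python
# configs/mypath.py (sketch; only db_root matters here)
# Point db_root at the directory that contains the extracted
# PASCALContext/ and NYUDv2/ folders.
db_root = '/path/to/your/datasets'
```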
You can train your own model with the following commands.

PASCAL-Context:
```bash
bash run_TaskDiffusion_pascal.sh
```
NYUD-v2:
```bash
bash run_TaskDiffusion_nyud.sh
```
If you want to train a model based on ViT-Base, modify the `--config_exp` argument in the corresponding `.sh` file.
You can also modify the output directory in ./configs.
The training script itself includes evaluation. For inference with pre-trained models, you can use the following commands.

PASCAL-Context:
```bash
bash infer_TaskDiffusion_pascal.sh
```
NYUD-v2:
```bash
bash infer_TaskDiffusion_nyud.sh
```
For boundary evaluation, you can use the evaluation tools in this repo, following TaskPrompter.
We provide the pretrained models on PASCAL-Context and NYUD-v2.
| Version | Dataset | Download | Depth (RMSE) | Segmentation (mIoU) | Human parsing (mIoU) | Saliency (maxF) | Normals (mErr) | Boundary (odsF) |
|---|---|---|---|---|---|---|---|---|
| TaskDiffusion (ViT-L) | PASCAL-Context | Link (Extraction code: j9u5) | - | 81.21 | 69.62 | 84.94 | 13.55 | 74.89 |
| TaskDiffusion w/ MLoRE (ViT-L) | PASCAL-Context | Link (Extraction code: gwhp) | - | 81.58 | 71.30 | 85.05 | 13.43 | 76.07 |
| TaskDiffusion (ViT-B) | PASCAL-Context | Link (Extraction code: xidm) | - | 78.83 | 67.40 | 85.31 | 13.38 | 74.68 |
| TaskDiffusion (ViT-L) | NYUD-v2 | Link (Extraction code: ngfp) | 0.5020 | 55.65 | - | - | 18.43 | 78.64 |
| TaskDiffusion w/ MLoRE (ViT-L) | NYUD-v2 | Link (Extraction code: fx2m) | 0.5033 | 56.66 | - | - | 18.13 | 78.89 |
To evaluate a pre-trained model, change `--trained_model MODEL_PATH` in the corresponding infer `.sh` script so that it points to the downloaded checkpoint.
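If you want to sanity-check a downloaded checkpoint before wiring it into the script, a minimal PyTorch snippet like the one below works; note that the checkpoint layout (whether the weights are nested under a `'model'` key) is an assumption and may differ in this repo:

```python
import torch

# Load the downloaded checkpoint on CPU and peek at its contents.
ckpt = torch.load('MODEL_PATH.pth', map_location='cpu')
# Some checkpoints nest the weights under a key such as 'model' (assumption).
state = ckpt.get('model', ckpt) if isinstance(ckpt, dict) else ckpt
print(f'{len(state)} entries, e.g.:', list(state)[:3])
```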
If you find our work helpful, please cite (BibTeX):
```bibtex
@inproceedings{yangmulti,
  title={Multi-Task Dense Predictions via Unleashing the Power of Diffusion},
  author={Yang, Yuqi and Jiang, Peng-Tao and Hou, Qibin and Zhang, Hao and Chen, Jinwei and Li, Bo},
  booktitle={The Thirteenth International Conference on Learning Representations}
}
```
If you have any questions, please feel free to contact me (yangyq2000 AT mail DOT nankai DOT edu DOT cn).
This repository is built upon the nice frameworks provided by TaskPrompter and InvPT.