A comprehensive and structured list of research papers on diffusion-based large language models (dLLMs).
Last major update (April 2026): added the latest 2026 arXiv works, NeurIPS 2025 spotlights, ICLR 2026 papers, frontier-scale dLLMs (LLaDA2.0/2.1, Dream-VLA, etc.), and a new section on Agentic / Tool-Use behavior.
## Contents
- Surveys & Useful Resources
- Core Methodologies
- Reasoning & Policy Optimization
- Token Ordering & Generation Strategies
- System Efficiency & Acceleration
- Multi-modal & Physical AI
- Agentic & Tool-Use dLLMs
- Theory, Guidance & Applications
- Seminal Diffusion Papers
## Surveys & Useful Resources
- Gemini Diffusion
- Mercury (Inception Labs)
- Dream-7B
- DreamOn
- LLaDA2.X (InclusionAI / Ant Group)
- What are Diffusion Language Models? (Lilian Weng)
- Generative Modeling by Estimating Gradients (Yang Song)
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Diffusion Models for Non-autoregressive Text Generation: A Survey | 2023.03 | IJCAI | Early NAR-text survey |
| A Survey of Diffusion Models in NLP | 2023.05 | Arxiv | Early NLP survey |
| Discrete Diffusion in Large Language and Multimodal Models: A Survey | 2025.06 | Arxiv | dLLM + dMLLM |
| Diffusion-based Large Language Models Survey | 2025.08 | TechRxiv | - |
| A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models | 2025.08 | Arxiv | - |
| A Survey on Diffusion Language Models | 2025.08 | Arxiv | VILA-Lab; comprehensive |
| Efficient Diffusion Language Models: A Comprehensive Survey | 2026.01 | - | Efficiency-focused |
| Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants | 2026.01 | Arxiv | Perspective / roadmap |
## Core Methodologies
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models (DiffuGPT / DiffuLLaMA) | 2024.10 | ICLR | >7B, GPT2/LLaMA2 Adaptation |
| Large Language Models to Diffusion Finetuning | 2025.01 | ICML | >7B |
| TESS 2: A Large-Scale Generalist Diffusion Language Model | 2025.02 | ACL | >7B, Adapted from Mistral |
| SDAR: A Synergistic Diffusion-AutoRegression Paradigm | 2025.10 | Arxiv | >7B, Qwen3-based BD |
| From Next-Token to Next-Block: Principled Adaptation Path | 2025.11 | Arxiv | >7B, Adaptation Path |
| Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed | 2025.12 | Arxiv | >7B |
| LLaDA2.0: Scaling Up Diffusion Language Models to 100B | 2025.12 | Arxiv | >7B, AR→dLLM at 100B |
## System Efficiency & Acceleration
This section covers hybrids that interleave block-level AR with intra-block diffusion, "forcing" approaches that retain causal masks for KV-cache reuse, and compression techniques such as distillation, sparsity, and quantization.
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | 2024.10 | ICLR | <7B, Distillation |
| Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction | 2025.08 | Arxiv | >7B, Sparsity |
| DLLMQuant: Quantizing Diffusion-based Large Language Models | 2025.08 | Arxiv | >7B, Quantization |
| Quantization Meets dLLMs: Post-training Quantization Study | 2025.08 | Arxiv | >7B, Quantization |
| FS-DFM: Few-Step Diffusion Language Model | 2025.09 | Arxiv | >7B |
| SparseD: Sparse Attention for Diffusion Language Models | 2025.09 | Arxiv | >7B, Sparsity |
| LLaDA-MoE: A Sparse MoE Diffusion Language Model | 2025.09 | Arxiv | >7B, MoE |
| Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct | 2025.10 | Arxiv | >7B, Distillation |
| CDLM: Consistency Diffusion Language Models For Faster Sampling | 2025.11 | Arxiv | >7B, Consistency |
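The hybrid block-AR decoding idea recurs throughout this section: blocks are produced left to right (so causal-mask KV caching still applies across blocks), while tokens inside the current block are filled in by iterative parallel unmasking, most-confident first. A toy sketch of that loop follows; the `MASK` id, `toy_scorer`, and the commit-half-per-step schedule are illustrative stand-ins, not any specific paper's method:

```python
MASK = -1  # illustrative stand-in for the [MASK] token id


def toy_scorer(seq, pos):
    """Stand-in for the denoiser: predict (token, confidence) for one
    masked position. A real dLLM runs a bidirectional transformer over
    the whole sequence here."""
    token = (pos * 7) % 50          # deterministic dummy prediction
    conf = 1.0 / (1 + pos % 3)      # deterministic dummy confidence
    return token, conf


def decode_blockwise(prompt, n_blocks=3, block_size=4, steps_per_block=2):
    """Hybrid decoding: blocks are generated left-to-right (AR over
    blocks), while tokens inside each block are filled in parallel by
    iterative unmasking, highest-confidence first."""
    seq = list(prompt)
    for _ in range(n_blocks):
        block_start = len(seq)
        seq.extend([MASK] * block_size)         # open a fully-masked block
        for _ in range(steps_per_block):
            masked = [i for i in range(block_start, len(seq)) if seq[i] == MASK]
            if not masked:
                break
            preds = {i: toy_scorer(seq, i) for i in masked}
            # commit the most-confident half (at least one) this step
            k = max(1, len(masked) // 2)
            for i in sorted(masked, key=lambda i: -preds[i][1])[:k]:
                seq[i] = preds[i][0]
        # finalize any stragglers so the block is complete before moving on
        for i in range(block_start, len(seq)):
            if seq[i] == MASK:
                seq[i] = toy_scorer(seq, i)[0]
    return seq
```

Real systems replace `toy_scorer` with a single denoiser forward pass over the block, and tune block size and steps-per-block to trade quality against throughput.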
### Inference Frameworks & Engines
Production-grade frameworks and runtime engineering for dLLMs.
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| dInfer: An Efficient Inference Framework for Diffusion Language Models | 2025.10 | Arxiv | Modular framework, >1100 TPS |
| JetEngine (SDAR) | 2025.10 | Repo | Lightweight engine for SDAR (3700+ TPS on H200) |
| Mercury: Ultra-Fast Language Models Based on Diffusion | 2025.06 | Arxiv | Inception Labs commercial dLLM |
| Seed Diffusion: Large-Scale dLLM with High-Speed Inference | 2025.08 | Arxiv | ByteDance code-focused dLLM |
## Multi-modal & Physical AI

### Vision-Language-Action (VLA)
Scope note: this section covers VLA models that use a diffusion/masked-diffusion language model as the backbone (dVLM-based VLA) or apply discrete diffusion as the action-decoding mechanism (not continuous diffusion action heads grafted onto an AR VLM). Pure continuous-diffusion-policy VLAs such as DiVLA (Wen et al., 2024), HybridVLA, and ProgressVLA are intentionally excluded because their language model is autoregressive; only the action head is diffusion-based.
(a) dVLM-backbone VLA: the language backbone itself is a diffusion language model.
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| LLaDA-VLA: Vision Language Diffusion Action Models | 2025.06 | Arxiv | First LLaDA(d-VLM)-based VLA |
| dVLA: Diffusion VLA with Multimodal Chain-of-Thought | 2025.09 | Arxiv | dLLM backbone + multimodal CoT |
| Dream-VLA: Open Vision-Language-Action Model with Diffusion Backbone | 2025.12 | Arxiv | dVLA from Dream-7B; first dLLM pretrained VLA |
| MMaDA-VLA: Large Diffusion VLA with Unified Multi-Modal Instruction and Generation | 2026.03 | Arxiv | Native discrete-diffusion VLA from MMaDA |
(b) Discrete-diffusion action decoding: the language backbone may still be an AR VLM, but action chunks are decoded via discrete diffusion. These works are closely tied to the dLLM literature through shared inference techniques.
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Discrete Diffusion VLA: Action Decoding in VLA Policies | 2025.08 | Arxiv | Unified-transformer + discrete-diffusion actions |
| E0: Enhancing Generalization and Fine-Grained Control in VLA Models via Tweedie Discrete Diffusion | 2025.11 | Arxiv | AR-VLM backbone + Tweedie discrete diffusion on action tokens |
### Autonomous Driving
Scope note: works that apply discrete diffusion / masked-diffusion language modeling to driving trajectories, action codebooks, or tokenized world states. Continuous trajectory-diffusion planners (e.g., classical Diffusion Policy applied to driving) are out of scope.
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion | 2023.11 | ICLR | Discrete diffusion on tokenized point-cloud world model |
| ReflectDrive: Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving | 2025.09 | Arxiv | dLLM finetuned on discretized 2D driving space |
| Efficient and Explainable End-to-End Autonomous Driving via Masked Vision-Language-Action Diffusion | 2026.02 | Arxiv | Discrete action codebook + masked diffusion |
## Agentic & Tool-Use dLLMs
An emerging line of work: how dLLMs behave as agents (planning, multi-turn interaction, tool calling). This is critical for connecting dLLMs to robotics and physical-AI agent stacks.
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check | 2026.01 | Arxiv | Embodied + tool-call eval |
| Agents of Diffusion: Enhancing Diffusion Language Models with Multi-Agent Reinforcement Learning for Structured Data Generation | 2026.01 | Arxiv | Multi-agent RL |
| DLLM Agent: See Farther, Run Faster | 2026.02 | Arxiv | dLLM-as-agent comparison |
## Seminal Diffusion Papers
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Deep Unsupervised Learning using Nonequilibrium Thermodynamics | 2015.03 | ICML | Formulation |
| Denoising Diffusion Probabilistic Models (DDPM) | 2020.06 | NeurIPS | - |
| Denoising Diffusion Implicit Models (DDIM) | 2020.10 | ICLR | - |
| Score-Based Generative Modeling through SDEs | 2020.11 | ICLR | - |
| Diffusion Models Beat GANs on Image Synthesis | 2021.05 | NeurIPS | CG |
| Structured Denoising Diffusion in Discrete State-Spaces (D3PM) | 2021.07 | NeurIPS | Discrete |
| Vector Quantized Diffusion Model (VQ-Diffusion) | 2021.11 | CVPR | VQ |
| High-Resolution Image Synthesis with Latent Diffusion (LDM) | 2021.12 | CVPR | - |
| Progressive Distillation for Fast Sampling | 2022.02 | ICLR | Distillation |
| DPM-Solver: Fast ODE Solver for Sampling | 2022.06 | NeurIPS | - |
| Classifier-Free Diffusion Guidance | 2022.07 | NeurIPS | CFG |
| Analog Bits: Generating Discrete Data using Diffusion | 2022.08 | ICLR | Self-conditioning |
| Scalable Diffusion Models with Transformers (DiT) | 2022.12 | ICCV | Scalable focus |
| Consistency Models | 2023.03 | ICML | - |
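Most masked dLLMs in this list build on the absorbing-state ("mask") variant of D3PM above: the forward process independently replaces each token with a mask symbol with probability given by the noise level t, so at t = 1 the sequence is fully masked, which is exactly where dLLM generation starts before running the process in reverse. A minimal sketch with a linear schedule (the `MASK` symbol and schedule are illustrative assumptions):

```python
import random

MASK = "[MASK]"  # illustrative mask symbol


def absorbing_forward(tokens, t, rng):
    """D3PM absorbing-state forward process with a linear schedule:
    each token is independently replaced by MASK with probability t,
    where t in [0, 1] is the noise level. At t=0 the sequence is
    untouched; at t=1 it is fully masked."""
    return [MASK if rng.random() < t else tok for tok in tokens]


rng = random.Random(0)
x0 = ["the", "cat", "sat", "on", "the", "mat"]
for t in (0.0, 0.5, 1.0):
    print(t, absorbing_forward(x0, t, rng))
```

Training a masked dLLM amounts to sampling t, corrupting x0 this way, and teaching the model to predict the original tokens at the masked positions.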
## Contact
- Maintainers: jake630@snu.ac.kr / wjk9904@snu.ac.kr
- Contributions via Pull Requests are welcome!