AIDASLab/Awesome-Diffusion-LLM

Awesome-Large-Language-Diffusion-Models


A comprehensive and structured list of research papers on large language diffusion models (dLLMs).

Last major update: April 2026. Added the latest 2026 arXiv works, NeurIPS 2025 spotlights, ICLR 2026 papers, frontier-scale dLLMs (LLaDA2.0/2.1, Dream-VLA, etc.), and a new section on agentic / tool-use behavior.


⚙️ Framework (Taxonomy)

  1. Surveys & Useful Resources
  2. Core Methodologies
  3. Reasoning & Policy Optimization
  4. Token Ordering & Generation Strategies
  5. System Efficiency & Acceleration
  6. Multi-modal & Physical AI
  7. Agentic & Tool-Use dLLMs
  8. Theory, Guidance & Applications
  9. Seminal Diffusion Papers

1. Surveys & Useful Resources

📚 Blogs & Reports

📝 Survey & Perspective Papers

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Diffusion Models for Non-autoregressive Text Generation: A Survey | 2023.03 | IJCAI | Early NAR-text survey |
| A Survey of Diffusion Models in NLP | 2023.05 | arXiv | Early NLP survey |
| Discrete Diffusion in Large Language and Multimodal Models: A Survey | 2025.06 | arXiv | dLLM + dMLLM |
| Diffusion-based Large Language Models Survey | 2025.08 | TechRxiv | - |
| A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models | 2025.08 | arXiv | - |
| A Survey on Diffusion Language Models | 2025.08 | arXiv | VILA-Lab; comprehensive |
| Efficient Diffusion Language Models: A Comprehensive Survey | 2026.01 | | Efficiency-focused |
| Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants | 2026.01 | arXiv | Perspective / roadmap |

2. Core Methodologies

2.1 Discrete & Masked Diffusion

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| DiffusER: Discrete Diffusion via Edit-based Reconstruction | 2022.10 | ICLR | <7B |
| SSD-LM: Semi-autoregressive Simplex-based Diffusion for Modular Control | 2022.10 | ACL | <7B, Simplex |
| DiffusionBERT: Improving Generative Masked Language Models | 2022.11 | ACL | <7B, Masked |
| A Reparameterized Discrete Diffusion Model for Text Generation | 2023.02 | COLM | <7B |
| David helps Goliath: Inference-Time Collaboration Between Small and Large Diffusion LMs | 2023.05 | NAACL | >7B, Scale-collaboration |
| TESS: Text-to-Text Self-Conditioned Simplex Diffusion | 2023.05 | EACL | <7B, Simplex |
| Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning | 2023.08 | arXiv | >7B, Scaling |
| Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (SEDD) | 2023.10 | ICML | <7B, Discrete |
| Simplified and Generalized Masked Diffusion for Discrete Data (MD4) | 2024.06 | NeurIPS | - |
| Simple and Effective Masked Diffusion Language Models (MDLM) | 2024.06 | NeurIPS | <7B, Masked |
| Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data (RADD) | 2024.06 | ICLR | <7B, Masked |
| Scaling up Masked Diffusion Models on Text (SMDM) | 2024.10 | ICLR | <7B, 1.1B Scaling |
| Energy-Based Diffusion Language Models for Text Generation (EDLM) | 2024.10 | ICLR | <7B |
| Conditional MASK Discrete Diffusion Language Model | 2024.11 | EMNLP | <7B |
| Non-Markovian Discrete Diffusion with Causal Language Models | 2025.02 | NeurIPS | <7B |
| Large Language Diffusion Models (LLaDA) | 2025.02 | NeurIPS | >7B, LLaDA-8B |
| Anchored Diffusion Language Model (ADLM) | 2025.05 | NeurIPS | >7B; ANELBO objective |
| LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs | 2025.06 | arXiv | >7B, Context Scaling |
| Esoteric Language Models (Eso-LMs) | 2025.06 | arXiv | AR + MDM hybrid |
| Dream 7B: Diffusion Large Language Models | 2025.08 | arXiv | >7B, Dream-7B |
| Sequential Diffusion Language Models | 2025.09 | arXiv | >7B |
| LLaDA-MoE: A Sparse MoE Diffusion Language Model | 2025.09 | arXiv | >7B, 7B-A1B MoE from scratch |
| UltraLLaDA: Scaling Context to 128K | 2025.10 | arXiv | >7B, Context Scaling |
| Next Semantic Scale Prediction via Hierarchical Diffusion Language Models | 2025.10 | NeurIPS | - |
| Masked Diffusion Models as Energy Minimization | 2025.10 | NeurIPS | <7B |
| Soft-Masked Diffusion Language Models | 2025.10 | arXiv | <7B |
| Variational Masked Diffusion Models | 2025.10 | arXiv | <7B |
| Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way | 2025.10 | arXiv | >7B, Variable Length |
| Diffusion Language Models are Super Data Learners | 2025.11 | arXiv | Data efficiency |
| DiffuMamba: High-Throughput Diffusion LMs with Mamba Backbone | 2025.11 | arXiv | Non-Transformer Backbone |
| TiDAR: Think in Diffusion, Talk in Autoregression | 2025.11 | arXiv | >7B |
| C2DLM: Causal Concept-Guided Diffusion Large Language Models | 2025.11 | arXiv | >7B |
| LLaDA2.0: Scaling Up Diffusion Language Models to 100B | 2025.12 | arXiv | >100B, MoE; Ant Group |
| Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models | 2026.01 | ACL | Soft tokens, Masked |
| LLaDA2.1: Speeding Up Text Diffusion via Token Editing | 2026.02 | arXiv | Editable State Evolution |
| Introspective Diffusion Language Models (I-DLM) | 2026.04 | arXiv | Introspective consistency |

2.2 Continuous & Latent Space Diffusion

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Diffusion-LM Improves Controllable Text Generation | 2022.05 | NeurIPS | <7B, Embedding |
| DiffuSeq: Sequence to Sequence Text Generation | 2022.10 | ICLR | <7B, Embedding |
| Latent Diffusion for Language Generation | 2022.12 | NeurIPS | <7B, Latent |
| Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning | 2022.12 | NAACL | <7B |
| Empowering Diffusion Models on the Embedding Space for Text Generation | 2022.12 | NAACL | <7B, Embedding |
| Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise | 2022.12 | ICML | <7B, Embedding |
| DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises | 2023.02 | TACL | <7B, Embedding |
| Likelihood-Based Diffusion Language Models (Plaid) | 2023.05 | NeurIPS | <7B, Plaid 1B |
| PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model | 2023.06 | NeurIPS | <7B, Latent |
| Edit Flows: Flow Matching with Edit Operations | 2025.06 | arXiv | - |
| Coevolutionary Continuous Discrete Diffusion: Latent Reasoner | 2025.10 | arXiv | >7B; CCDD |

2.3 AR-to-Diffusion Adaptation

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models (DiffuLLM) | 2024.10 | ICLR | >7B, GPT2/LLaMA2 Adaptation |
| Large Language Models to Diffusion Finetuning | 2025.01 | ICML | >7B |
| TESS 2: A Large-Scale Generalist Diffusion Language Model | 2025.02 | ACL | >7B, Adapted from Mistral |
| SDAR: A Synergistic Diffusion-AutoRegression Paradigm | 2025.10 | arXiv | >7B, Qwen3-based BD |
| From Next-Token to Next-Block: Principled Adaptation Path | 2025.11 | arXiv | >7B, Adaptation Path |
| Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed | 2025.12 | arXiv | >7B |
| LLaDA2.0: Scaling Up Diffusion Language Models to 100B | 2025.12 | arXiv | >7B, AR→dLLM at 100B |

2.4 Hybrid AR-Diffusion (Block / Forcing)

New section: hybrids that interleave block-level autoregression with intra-block diffusion, plus "forcing" approaches that retain causal masks for KV-cache reuse.

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models (BD3-LM) | 2025.03 | ICLR | <7B, Interpolation |
| D2F: Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing | 2025.08 | ICLR | >7B, Faster-than-AR |
| Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding | 2025.08 | arXiv | >7B |
| Fast-dLLM v2: Efficient Block-Diffusion LLM | 2025.09 | arXiv | >7B, Block Decoding |
| SDAR: Synergistic Diffusion-AutoRegression Paradigm | 2025.10 | arXiv | >7B, Block hybrid |
| Encoder-Decoder Block Diffusion Language Models for Efficient Training and Inference (E2D2) | 2025.10 | NeurIPS | Block Enc-Dec |
| WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference | 2025.12 | arXiv | Causal-attn diffusion |
| ReFusion: Diffusion LLM with Parallel Autoregressive Decoding | 2025.12 | arXiv | Slot-level interleaving |
| Swordsman: Entropy-Driven Adaptive Block Partition for Efficient Diffusion Language Models | 2026.02 | arXiv | Adaptive block |
| DFlash: Block Diffusion for Flash Speculative Decoding | 2026.02 | arXiv | Block + speculative |
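The hybrid recipe behind this section (blocks generated left to right, AR-style, while tokens inside each block are unmasked over a few diffusion steps) can be sketched in a few lines. This is an illustrative toy, not any specific paper's method: `toy_denoiser` is a hypothetical stand-in that always predicts tokens from a fixed target sentence, so only the decoding control flow is meaningful.

```python
import math
import random

MASK = "<mask>"

def toy_denoiser(tokens):
    """Stand-in for a masked-diffusion model: returns (prediction, confidence)
    for every masked position. Here it simply copies from a fixed target
    sentence; a real model would predict all masked tokens in parallel."""
    target = "the quick brown fox jumps over the lazy dog".split()
    return {i: (target[i], random.random())
            for i, t in enumerate(tokens) if t == MASK}

def block_diffusion_decode(length, block_size=4, steps_per_block=2):
    """Blocks are produced left-to-right (so finished blocks could be
    KV-cached as in AR decoding); within a block, tokens are revealed over
    a few denoising steps, highest-confidence positions first."""
    out = []
    for start in range(0, length, block_size):
        block = [MASK] * min(block_size, length - start)
        for step in range(steps_per_block):
            preds = toy_denoiser(out + block)
            # keep only predictions that fall inside the current block
            in_block = {i: p for i, p in preds.items() if i >= start}
            # reveal an even share of the remaining masks at each step
            k = max(1, math.ceil(len(in_block) / (steps_per_block - step)))
            best = sorted(in_block.items(), key=lambda kv: -kv[1][1])[:k]
            for i, (tok, _) in best:
                block[i - start] = tok
        out += block
    return out

print(" ".join(block_diffusion_decode(9)))
```

The split between an outer AR loop over blocks and an inner parallel-unmasking loop is the common shape; the papers above differ mainly in how block boundaries, unmasking schedules, and caching are chosen.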

3. Reasoning & Policy Optimization

3.1 Reasoning & Planning

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Diffusion of Thought: Chain-of-Thought Reasoning in dLLMs | 2024.02 | NeurIPS | <7B, CoT Foundation |
| Beyond Autoregression: Discrete Diffusion for Complex Reasoning | 2024.10 | ICLR | <7B |
| Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models | 2024.10 | arXiv | Planning |
| d1: Scaling Reasoning in dLLMs via RL | 2025.04 | NeurIPS | >7B, Reasoning scaling |
| Reinforcing the Diffusion Chain of Lateral Thought | 2025.05 | NeurIPS | >7B |
| Thinking Inside the Mask: In-Place Prompting in dLLMs | 2025.08 | arXiv | >7B |
| Reinforced Context Order Recovery for Adaptive Reasoning | 2025.08 | arXiv | <7B, Planning |
| d2: Improved Techniques for Training Reasoning dLLMs | 2025.09 | arXiv | >7B |
| LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning | 2025.10 | arXiv | >7B |
| Beyond Surface Reasoning: Unveiling Long CoT Capacity | 2025.10 | arXiv | >7B |
| Coevolutionary Continuous Discrete Diffusion: Latent Reasoner | 2025.10 | arXiv | >7B |
| On the Reasoning Abilities of Masked Diffusion Language Models | 2025.10 | arXiv | >7B |
| Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning | 2025.10 | arXiv | Collaboration |
| Diffuse Thinking: Exploring Diffusion Language Models as Efficient Thought Proposers for Reasoning | 2025.10 | arXiv | >7B |

3.2 Alignment & Reinforcement Learning

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Preference-Based Alignment of Discrete Diffusion Models | 2025.03 | arXiv | >7B |
| DiFFPO: Training dLLMs to Reason Fast and Furious via RL | 2025.05 | arXiv | >7B, Direct Preference |
| LLaDA 1.5: Variance-Reduced Preference Optimization | 2025.05 | arXiv | >7B |
| wd1: Weighted Policy Optimization for Reasoning | 2025.07 | arXiv | >7B |
| Jailbreaking Large Language Diffusion Models: Revealing Hidden Safety Flaws in Diffusion-Based Text Generation | 2025.07 | arXiv | Safety |
| The Devil behind the mask: An emergent safety vulnerability | 2025.07 | arXiv | Safety |
| Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position | 2025.08 | arXiv | >7B, Safety |
| MDPO: Overcoming the Training-Inference Divide | 2025.08 | arXiv | >7B |
| Reward-Weighted Sampling: Enhancing Non-Autoregressive Characteristics in Masked Diffusion LLMs | 2025.08 | EMNLP | >7B |
| Inpainting-Guided Policy Optimization for dLLMs | 2025.09 | arXiv | >7B |
| Taming Masked Diffusion via Consistency Trajectory RL | 2025.09 | arXiv | >7B |
| TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning | 2025.09 | arXiv | >7B |
| Revolutionizing RL Framework for Diffusion Large Language Models | 2025.09 | arXiv | >7B |
| A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models | 2025.09 | arXiv | Safety |
| DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models | 2025.09 | arXiv | Safety |
| RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance | 2025.09 | arXiv | >7B |
| AGRPO: Simple Policy Gradients for Reasoning with Diffusion Language Models | 2025.10 | arXiv | >7B |
| Improving Reasoning via Group Diffusion Policy Optimization (GDPO) | 2025.10 | arXiv | >7B |
| Step-Aware Policy Optimization for Reasoning | 2025.10 | arXiv | >7B |
| MRO: Enhancing Reasoning via Multi-Reward Optimization | 2025.10 | arXiv | >7B |
| Enhancing Reasoning via Distribution Matching Policy Optimization | 2025.10 | arXiv | >7B |
| Boundary-Guided Policy Optimization for Memory-efficient RL | 2025.10 | arXiv | >7B |
| SPG: Sandwiched Policy Gradient for Masked Diffusion | 2025.10 | arXiv | >7B |
| Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies | 2025.10 | arXiv | >7B |
| Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States | 2025.10 | arXiv | >7B |
| Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective | 2025.12 | arXiv | >7B |
| d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models | 2025.12 | arXiv | >7B |
| DiRL: An Efficient Post-Training Framework for Diffusion Language Models | 2025.12 | arXiv | Post-training |
| Agents of Diffusion: Enhancing Diffusion Language Models with Multi-Agent Reinforcement Learning for Structured Data Generation | 2026.01 | arXiv | Multi-agent RL |
| Efficient and Stable Reinforcement Learning for Diffusion Language Models | 2026.02 | arXiv | Variance reduction |
| DARE: Diffusion Large Language Models Alignment and Reinforcement Executor | 2026.04 | arXiv | Unified RL framework |

4. Token Ordering & Generation Strategies

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| SSD-LM: Semi-autoregressive Simplex-based Diffusion for Modular Control | 2022.10 | ACL | <7B, Blockwise |
| AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation | 2023.05 | NeurIPS | <7B, AR-like noise |
| Train for the Worst, Plan for the Best: Understanding Token Ordering | 2025.02 | ICML | <7B, Ordering Analysis |
| Block Diffusion: Interpolating Between Autoregressive and Diffusion LMs | 2025.03 | ICLR | <7B, Interpolation |
| Review, Remask, Refine (R3): Process-Guided Block Diffusion | 2025.07 | ICML MOSS | >7B, Block-wise |
| Any-Order Flexible Length Masked Diffusion | 2025.09 | arXiv | <7B, Order Flexibility |
| Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models | 2025.09 | arXiv | >7B, Remasking |
| Don't Let It Fade: Preserving Edits via Token Timestep Allocation | 2025.10 | NeurIPS | <7B, Edit preservation |
| Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models | 2025.10 | arXiv | >7B, Unmasking |
| Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies | 2025.10 | arXiv | >7B, Unmasking |
| Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing | 2025.10 | arXiv | >7B, Unmasking |
| Diffusion Language Model Inference with Monte Carlo Tree Search | 2025.12 | arXiv | >7B, MCTS |
| Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty | 2025.12 | arXiv | >7B, Unmasking |
| Adaptation to Intrinsic Dependence in Diffusion Language Models | 2026.02 | arXiv | Distribution-agnostic schedule |
| Efficient Self-Evaluation for Diffusion Language Models via Sequence Regeneration | 2026.03 | ACL | Self-evaluation, Flexible length |
| D5P4: Partition Determinantal Point Process for Diversity in Parallel Discrete Diffusion Decoding | 2026.03 | arXiv | Diversity-aware decoding |

5. System Efficiency & Acceleration

5.1 Caching & Memory Strategy

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| dKV-Cache: The Cache for Diffusion Language Models | 2025.05 | NeurIPS | >7B |
| FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion | 2025.05 | arXiv | >7B |
| Fast-dLLM: Training-free Acceleration via KV Cache + Parallel Decoding | 2025.05 | arXiv | NVIDIA; KV cache + parallel |
| dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching | 2025.06 | arXiv | >7B |
| d^2Cache: Accelerating via Dual Adaptive Caching | 2025.09 | arXiv | >7B |
| Attention Is All You Need for KV Cache in dLLMs | 2025.10 | arXiv | >7B |
| Attention Sinks in Diffusion Language Models | 2025.10 | arXiv | >7B |
| WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference | 2025.12 | arXiv | >7B, Causal cache |
| Mosaic: Unlocking Long-Context Inference for Diffusion LLMs via Global Memory Planning and Dynamic Peak Taming | 2026.01 | arXiv | Long-context memory |
| Residual Context Diffusion Language Models | 2026.01 | arXiv | Recycle discarded tokens |
| Stop the Flip-Flop: Context-Preserving Verification for Fast Revocable Diffusion Decoding (COVER) | 2026.02 | arXiv | KV-override verification |
| Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing | 2026.02 | arXiv | Long-context sparsity |

5.2 Decoding & Sampling

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion | 2024.08 | NAACL | <7B, Speculative Decoding |
| Accelerating Diffusion LLMs via Adaptive Parallel Decoding (APD) | 2025.05 | NeurIPS | >7B |
| DLM-One: Diffusion Language Models for One-Step Generation | 2025.06 | arXiv | <7B |
| Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles | 2025.06 | arXiv | >7B |
| Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models | 2025.06 | arXiv | >7B |
| Wide-In, Narrow-Out: Revokable Decoding for Effective dLLMs | 2025.07 | arXiv | >7B |
| Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models | 2025.08 | arXiv | >7B |
| DPad: Efficient Diffusion Language Models with Suffix Dropout | 2025.08 | arXiv | >7B |
| Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning | 2025.09 | NeurIPS | >7B |
| AdaBlock-dLLM: Semantic-Aware Inference via Adaptive Block Size | 2025.09 | arXiv | >7B |
| dParallel: Learnable Parallel Decoding for dLLMs | 2025.09 | arXiv | >7B |
| Learning to Parallel: Accelerating dLLMs via Learnable Parallel Decoding | 2025.09 | arXiv | >7B |
| Spiffy: Multiplying Acceleration via Lossless Speculative Decoding | 2025.09 | arXiv | >7B, Speculative Decoding |
| DiffuSpec: Unlocking dLLMs for Speculative Decoding | 2025.09 | arXiv | >7B, Speculative Decoding |
| Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall | 2025.10 | arXiv | >7B |
| Saber: Efficient Sampling with Backtracking Enhanced Remasking | 2025.10 | arXiv | >7B |
| CreditDecoding: Parallel Decoding with Trace Credits | 2025.10 | arXiv | >7B |
| Accelerating dLLM Inference via Local Determinism Propagation | 2025.10 | arXiv | >7B |
| Self Speculative Decoding for Diffusion Large Language Models | 2025.10 | arXiv | >7B, Speculative Decoding |
| SpecDiff-2: Scaling Diffusion Drafter Alignment | 2025.11 | arXiv | >7B, Speculative Decoding |
| Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models | 2025.11 | arXiv | >7B |
| Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models | 2025.11 | arXiv | >7B |
| Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs | 2025.12 | arXiv | >7B, Speculative Decoding |
| Fast-Decoding via Progress-Aware Confidence Schedules | 2025.12 | arXiv | >7B |
| ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding | 2025.12 | arXiv | >7B |
| Context-Aware Initialization for Reducing Generative Path Length in Diffusion Language Models | 2025.12 | arXiv | >7B |
| DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference | 2026.01 | arXiv | Speculative drafting |
| DFlash: Block Diffusion for Flash Speculative Decoding | 2026.02 | arXiv | Block + speculative |
| Swordsman: Entropy-Driven Adaptive Block Partition for Efficient Diffusion Language Models | 2026.02 | arXiv | Entropy-adaptive blocks |

5.3 Distillation, Quantization & Sparsity

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | 2024.10 | ICLR | <7B, Distillation |
| Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction | 2025.08 | arXiv | >7B, Sparsity |
| DLLMQuant: Quantizing Diffusion-based Large Language Models | 2025.08 | arXiv | >7B, Quantization |
| Quantization Meets dLLMs: Post-training Quantization Study | 2025.08 | arXiv | >7B, Quantization |
| FS-DFM: Few-Step Diffusion Language Model | 2025.09 | arXiv | >7B |
| SparseD: Sparse Attention for Diffusion Language Models | 2025.09 | arXiv | >7B, Sparsity |
| LLaDA-MoE: A Sparse MoE Diffusion Language Model | 2025.09 | arXiv | >7B, MoE |
| Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct | 2025.10 | arXiv | >7B, Distillation |
| CDLM: Consistency Diffusion Language Models For Faster Sampling | 2025.11 | arXiv | >7B, Consistency |

5.4 Inference Frameworks & Systems

New section: production-grade frameworks and runtime engineering for dLLMs.

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Mercury: Ultra-Fast Language Models Based on Diffusion | 2025.06 | arXiv | Inception Labs commercial dLLM |
| Seed Diffusion: Large-Scale dLLM with High-Speed Inference | 2025.08 | arXiv | ByteDance code-focused dLLM |
| dInfer: An Efficient Inference Framework for Diffusion Language Models | 2025.10 | arXiv | Modular framework, >1100 TPS |
| JetEngine (SDAR) | 2025.10 | Repo | Lightweight engine for SDAR (3700+ TPS on H200) |

6. Multi-modal & Physical AI

6.1 Multi-modal dLLMs

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Dual Diffusion for Unified Image Generation and Understanding | 2025.01 | arXiv | Unified Task |
| Unified Multimodal Discrete Diffusion (UniDisc) | 2025.03 | arXiv | Unified Diffusion |
| LaViDa: A Large Diffusion LLM for Multimodal Understanding | 2025.05 | NeurIPS | Spotlight; Understanding |
| MMaDA: Multimodal Large Diffusion Language Models | 2025.05 | NeurIPS | Native Multimodal |
| Dimple: Discrete Diffusion Multimodal LLM with Parallel Decoding | 2025.05 | arXiv | Parallel Multimodal |
| Muddit: Liberating Generation Beyond Text-to-Image | 2025.05 | arXiv | Multi-modal |
| LLaDA-V: Diffusion LLMs with Visual Instruction Tuning | 2025.06 | arXiv | Visual Tuning |
| Show-o2: Improved Native Unified Multimodal Models | 2025.06 | arXiv | Unified Generation |
| Diffuse Everything: Multimodal Diffusion on Arbitrary Spaces | 2025.06 | ICML | Arbitrary Spaces |
| TBAC-UniImage: Unified Understanding and Generation by Ladder-Side Diffusion Tuning | 2025.08 | arXiv | Tencent ladder-side tuning |
| Lumina-DiMOO: Omni Diffusion LLM for Generation | 2025.10 | arXiv | Omni-generation |
| MMaDA-Parallel: Thinking-Aware Editing and Generation | 2025.11 | arXiv | Parallel Multimodal |
| DiffusionVL: Translating AR Models into Diffusion VL Models | 2025.12 | arXiv | VL Adaptation |
| SDAR-VL: Stable and Efficient Block-wise Diffusion for Vision-Language Understanding | 2025.12 | arXiv | Block-diffusion VL |
| Dream-VL: Open Vision-Language Model with Diffusion Backbone | 2025.12 | arXiv | dVLM from Dream-7B |
| LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models | 2026.02 | arXiv | Unified RL post-training |
| Analyzing Diffusion and Autoregressive VLMs in Multimodal Embedding Space | 2026.02 | arXiv | Embedding analysis |
| Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion | 2026.03 | arXiv | Any-to-any (text/speech/image) |
| Dynin-Omni: Omnimodal Unified Large Diffusion Language Model | 2026.03 | arXiv | Omnimodal |
| LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation | 2026.04 | arXiv | SigLIP-VQ + block diffusion |
| Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM | 2026.04 | arXiv | AR-VLM → block-diffusion VLM |

6.2 Vision-Language-Action (VLA)

Scope note: this section covers VLA models that use a diffusion/masked-diffusion language model as the backbone (dVLM-based VLA) or apply discrete diffusion as the action-decoding mechanism (not continuous diffusion action heads grafted onto an AR VLM). Pure continuous-diffusion-policy VLAs such as DiVLA (Wen et al., 2024), HybridVLA, and ProgressVLA are intentionally excluded because their language model is autoregressive — only the action head is diffusion-based.

(a) dVLM-backbone VLA — language backbone itself is a diffusion language model.

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| LLaDA-VLA: Vision Language Diffusion Action Models | 2025.06 | arXiv | First LLaDA(d-VLM)-based VLA |
| dVLA: Diffusion VLA with Multimodal Chain-of-Thought | 2025.09 | arXiv | dLLM backbone + multimodal CoT |
| Dream-VLA: Open Vision-Language-Action Model with Diffusion Backbone | 2025.12 | arXiv | dVLA from Dream-7B; first dLLM pretrained VLA |
| MMaDA-VLA: Large Diffusion VLA with Unified Multi-Modal Instruction and Generation | 2026.03 | arXiv | Native discrete-diffusion VLA from MMaDA |

(b) Discrete-diffusion action decoding — language backbone may still be AR-VLM, but action chunks are decoded via discrete diffusion. Closely tied to dLLM literature for inference techniques.

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Discrete Diffusion VLA: Action Decoding in VLA Policies | 2025.08 | arXiv | Unified-transformer + discrete-diffusion actions |
| E0: Enhancing Generalization and Fine-Grained Control in VLA Models via Tweedie Discrete Diffusion | 2025.11 | arXiv | AR-VLM backbone + Tweedie discrete diffusion on action tokens |

6.3 Autonomous Driving / World Models

Scope note: works that apply discrete diffusion / masked-diffusion language modeling to driving trajectories, action codebooks, or tokenized world states. Continuous trajectory-diffusion planners (e.g., classical Diffusion Policy applied to driving) are out of scope.

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion | 2023.11 | ICLR | Discrete diffusion on tokenized point-cloud world model |
| ReflectDrive: Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving | 2025.09 | arXiv | dLLM finetuned on discretized 2D driving space |
| Efficient and Explainable End-to-End Autonomous Driving via Masked Vision-Language-Action Diffusion | 2026.02 | arXiv | Discrete action codebook + masked diffusion |

7. Agentic & Tool-Use dLLMs

New section on an emerging line of work: how dLLMs behave as agents (planning, multi-turn interaction, tool calling). Critical for connecting dLLMs to robotics and physical-AI agent stacks.

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check | 2026.01 | arXiv | Embodied + tool-call eval |
| Agents of Diffusion: Enhancing Diffusion Language Models with Multi-Agent Reinforcement Learning for Structured Data Generation | 2026.01 | arXiv | Multi-agent RL |
| DLLM Agent: See Farther, Run Faster | 2026.02 | arXiv | dLLM-as-agent comparison |

8. Theory, Guidance & Applications

8.1 Theory & Analysis

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Can Diffusion Model Achieve Better Performance in Text Generation? Bridging the Gap between Training and Inference! | 2023.05 | ACL Findings | <7B |
| TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings | 2024.02 | AAAI | <7B |
| Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling | 2024.10 | ICLR | <7B |
| Theoretical Benefit and Limitation of Diffusion Language Model | 2025.02 | NeurIPS | TER vs SER analysis |
| Generalized Interpolating Discrete Diffusion (GIDD) | 2025.03 | ICML | Noising |
| Understanding the Quality-Diversity Trade-off in Diffusion Language Models | 2025.03 | ICML | Quality-Diversity Trade-off |
| Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes | 2025.05 | ACL | <7B |
| The Diffusion Duality | 2025.06 | ICML | <7B, Theoretical Duality |
| Your Absorbing Discrete Diffusion Secretly Models the Bayesian Posterior | 2025.07 | arXiv | <7B |
| Time Is a Feature: Exploiting Temporal Dynamics in dLLMs | 2025.08 | arXiv | Temporal focus |
| Diffusion LLMs Know the Answer Before Decoding | 2025.08 | arXiv | Semantic focus |
| What Makes Diffusion Language Models Super Data Learners? | 2025.10 | arXiv | Data efficiency |
| Why mask diffusion does not work | 2025.10 | arXiv | Failure analysis |
| Empirical Analysis of Decoding Biases in Masked Diffusion Models | 2025.10 | arXiv | Decoding Bias |
| Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models | 2025.10 | arXiv | Speed Analysis |
| ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs | 2025.10 | ICLR | Benchmark |
| Diffusion Language Models are Super Data Learners | 2025.11 | arXiv | Data learner analysis |
| On the Role of Discreteness in Diffusion LLMs | 2025.12 | arXiv | Discreteness analysis |
| Adaptation to Intrinsic Dependence in Diffusion Language Models | 2026.02 | arXiv | Distribution-agnostic schedule theory |
| Efficient Self-Evaluation for Diffusion Language Models via Sequence Regeneration | 2026.03 | ACL | Self-evaluation, Generalization analysis |
| Confidence-Based Decoding is Provably Efficient for Diffusion Language Models | 2026.03 | arXiv | First theory of confidence-based decoding |

8.2 Guidance & Downstream Applications

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation | 2023.06 | ACL | Dialogue |
| DiffuDetox: A Mixed Diffusion Model for Text Detoxification | 2023.06 | ACL Findings | Detoxification |
| PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in Poetry Generation | 2023.06 | AAAI | Poetry Generation |
| ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer | 2023.08 | AAAI | Text Style Transfer |
| P^3SUM: Preserving Author's Perspective in News Summarization with Diffusion Language Models | 2023.11 | NAACL | Summarization |
| DiffuCOMET: Contextual Commonsense Knowledge Diffusion | 2024.02 | ACL | Commonsense |
| DiffusionDialog: A Diffusion Model for Diverse Dialog Generation with Latent Space | 2024.04 | LREC-COLING | Dialogue |
| Diffusion Guided Language Modeling | 2024.08 | ACL Findings | Control |
| Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models | 2024.10 | arXiv | Control |
| DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models | 2024.11 | ACL Findings | Data Synthesis |
| Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models | 2024.12 | ACL | Text Segmentation |
| EdiText: Controllable Coarse-to-Fine Text Editing with Diffusion Language Models | 2025.02 | ACL | Text Editing |
| Constrained Discrete Diffusion | 2025.03 | NeurIPS | Constraint |
| Planning with Diffusion Models for Target-Oriented Dialogue Systems | 2025.04 | ACL | Dialogue |
| CtrlDiff: Boosting dLLMs with Dynamic Block Prediction | 2025.05 | arXiv | Control |
| Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective | 2025.05 | arXiv | Embedding |
| DINGO: Constrained Inference for Diffusion LLMs | 2025.05 | arXiv | Constrained Decoding |
| Inference-Time Scaling of Discrete Diffusion Models via Importance Weighting and Optimal Proposal Design | 2025.05 | ICLR | SMC test-time scaling |
| Mercury: Ultra-Fast Language Models Based on Diffusion | 2025.06 | arXiv | Code |
| DiffuCoder: Improving Masked Diffusion for Code Generation | 2025.06 | arXiv | Code |
| Unveiling the Potential of Diffusion Large Language Model in Controllable Generation | 2025.07 | arXiv | Control |
| Arg-LLaDA: Argument Summarization via Large Language Diffusion Models and Sufficiency-Aware Refinement | 2025.07 | arXiv | Summarization |
| Improving Text Style Transfer using Masked Diffusion Language Models with Inference-time Scaling | 2025.08 | arXiv | Text Style Transfer |
| Seed Diffusion: Large-Scale dLLM with High-Speed Inference | 2025.08 | arXiv | Code |
| TreeDiff: AST-Guided Code Generation with Diffusion LLMs | 2025.08 | arXiv | Code (syntax-aware) |
| Beyond Autoregression: Empirical Study for Code Generation | 2025.09 | arXiv | Code |
| Syntax-Guided Diffusion Language Models with User-Integrated Personalization | 2025.10 | arXiv | Personalization |
| TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models | 2025.10 | arXiv | Hallucination |
| Don't Let It Fade: Preserving Edits via Token Timestep Allocation | 2025.10 | NeurIPS | Control |
| Diffusion Language Models for Speech Recognition | 2026.04 | arXiv | ASR rescoring (MDLM/USDM) |
| CAGenMol: Condition-Aware Diffusion Language Model for Goal-Directed Molecular Generation | 2026.04 | arXiv | Molecular generation |

9. Seminal Diffusion Papers

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Deep Unsupervised Learning using Nonequilibrium Thermodynamics | 2015.03 | ICML | Formulation |
| Denoising Diffusion Probabilistic Models (DDPM) | 2020.06 | NeurIPS | - |
| Denoising Diffusion Implicit Models (DDIM) | 2020.10 | ICLR | - |
| Score-Based Generative Modeling through SDEs | 2020.11 | ICLR | - |
| Diffusion Models Beat GANs on Image Synthesis | 2021.05 | NeurIPS | CG |
| Structured Denoising Diffusion in Discrete State-Spaces (D3PM) | 2021.07 | NeurIPS | Discrete |
| Vector Quantized Diffusion Model (VQ-Diffusion) | 2021.11 | CVPR | VQ |
| High-Resolution Image Synthesis with Latent Diffusion (LDM) | 2021.12 | CVPR | - |
| Progressive Distillation for Fast Sampling | 2022.02 | ICLR | Distillation |
| DPM-Solver: Fast ODE Solver for Sampling | 2022.06 | NeurIPS | - |
| Classifier-Free Diffusion Guidance | 2022.07 | NeurIPS | CFG |
| Analog Bits: Generating Discrete Data using Diffusion | 2022.08 | ICLR | Self-conditioning |
| Scalable Diffusion Models with Transformers (DiT) | 2022.12 | ICCV | Scalable focus |
| Consistency Models | 2023.03 | ICML | - |

🤝 Contact
