CDLM: Consistency Diffusion Language Models for Faster Sampling

License: MIT | Paper: arXiv | HuggingFace: CDLM-Dream | HuggingFace: CDLM-LLaDA

📘 Introduction

Diffusion Language Models (DLMs) offer a promising parallel generation paradigm but suffer from slow inference due to numerous refinement steps and the inability to use standard KV caching. We introduce CDLM (Consistency Diffusion Language Models), a training-based acceleration method that simultaneously tackles both bottlenecks. CDLM integrates consistency modeling to drastically reduce the number of required sampling steps by enabling multi-token finalization. Furthermore, we enforce a block-wise causal attention mask during fine-tuning, making the model fully compatible with KV caching. Experiments show that CDLM achieves 3.6×–12.8× lower latency while maintaining competitive accuracy on math and coding tasks.
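
To make the block-wise causal idea concrete, here is a small PyTorch sketch written for this README (it is not code from the CDLM repository): tokens attend bidirectionally within their own block and causally to all earlier blocks, which is what lets finished blocks be served from the KV cache. The block size and sequence length below are arbitrary placeholders.

import torch

def block_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
    """Boolean mask of shape (seq_len, seq_len); True means position i may attend to j."""
    block_ids = torch.arange(seq_len) // block_size  # block index of each position
    # i may attend to j iff j's block does not come after i's block:
    # full attention inside a block, causal attention across blocks.
    return block_ids.unsqueeze(1) >= block_ids.unsqueeze(0)

mask = block_causal_mask(seq_len=8, block_size=4)
print(mask.int())  # positions 0-3 see only block 0; positions 4-7 see blocks 0 and 1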

Video: Visualization of inference (main_video.mp4).

Figure: Overview of CDLM training.

🔥 Updates

  • [25.11.24] arXiv paper is available.
  • [25.11.18] CDLM-Dream and CDLM-LLaDA are now available on Hugging Face.

🚀 User Guide

1. Installation

Clone the repository and set up the environment.

Environment Configuration

# Create and activate a conda environment
conda create -n cdlm python=3.12
conda activate cdlm

# Install dependencies
pip install -r requirements.txt
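
If you only want to try the released checkpoints rather than train, they can be loaded from Hugging Face. The snippet below is a minimal sketch, assuming the checkpoints ship with custom model code (as Dream and LLaDA do), so trust_remote_code is required; the model ID shown is a placeholder, so take the exact ID from the Hugging Face links above.

from transformers import AutoModel, AutoTokenizer

model_id = "SqueezeAILab/CDLM-LLaDA"  # placeholder; use the ID from the model card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True,
                                  torch_dtype="auto", device_map="auto")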

2. Training CDLM

Navigate to the training directory.

cd CDLM-train

The pipeline consists of two phases: offline trajectory extraction (preprocessing) followed by training. For the preprocessing phase, update config/acc_config_preproc, config/dream_eagle.yaml, and config/llada.yaml to match your environment.

Start preprocessing:

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --config_file config/acc_config_preproc --num_processes 4 --main_process_port 29577 preprocess.py --config config/llada.yaml

For the training phase, update config/acc_config, config/dream_eagle.yaml, and config/llada.yaml to match your environment.

Start training:

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --config_file config/acc_config --num_processes 4 --main_process_port 29578 train.py --config config/llada.yaml

Alternatively, you can use train.sh.

3. Evaluation

Evaluation scripts for CDLM and naive LLaDA are available in the CDLM-eval directory.

Example:

cd CDLM-eval
./eval_cdlm_dream.sh gsm8k # <gsm8k|humaneval|mbpp|math>
./eval_cdlm_llada.sh humaneval # <gsm8k|humaneval|mbpp|math>

Evaluation scripts for dLLM-cache are available in the dLLM-cache-eval/scripts directory.

Example:

cd dLLM-cache-eval/scripts
./run_Dream_gsm8k_Instruct.sh

Evaluation scripts for naive Dream and Fast-dLLM are available in the FastdLLM-eval directory.

Example:

cd FastdLLM-eval/dream
./eval_gsm8k.sh

❗️ Important Notice for HumanEval

The HumanEval and HumanEval-Instruct benchmarks require a post-processing step to sanitize the generated code and calculate the final pass@1 score. After the evaluation script finishes, run:

python postprocess_code.py {path/to/your/samples_humaneval_xxx.jsonl}

Replace the placeholder with the actual path to your generated samples file in your configured output_path.
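
For intuition, this sanitization usually amounts to pulling the code body out of the model's raw completion (for example, stripping a markdown code fence) before pass@1 is computed. The sketch below only illustrates that idea; it is not the contents of postprocess_code.py.

import re

def extract_code(completion: str) -> str:
    """Return the body of the first ```python fence if present, else the raw completion."""
    match = re.search(r"```(?:python)?\s*\n(.*?)```", completion, re.DOTALL)
    return match.group(1) if match else completion

print(extract_code("```python\ndef add(a, b):\n    return a + b\n```"))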

❗️ Important Notice for MBPP

The MBPP-Instruct benchmark requires a post-processing step to sanitize the generated code and calculate the final pass@1 score. After the evaluation script finishes, run:

python postprocess_code_mbpp.py {path/to/your/samples_mbpp_xxx.jsonl}

Replace the placeholder with the actual path to your generated samples file in your configured output_path.

🙏 Acknowledgements

Our work and codebase are deeply inspired by open-source projects, especially D2F. We also acknowledge Dream, LLaDA, Fast-dLLM, and dLLM-Cache, from which we leveraged components for evaluation.

📚 Citation

@article{kim2025cdlm,
  title   = {CDLM: Consistency Diffusion Language Models for Faster Sampling},
  author  = {Kim, Minseo and Xu, Chenfeng and Hooper, Coleman and Singh, Harman 
             and Athiwaratkun, Ben and Zhang, Ce and Keutzer, Kurt and Gholami, Amir},
  journal = {arXiv preprint arXiv:2511.19269},
  year    = {2025},
  url     = {https://arxiv.org/abs/2511.19269}
}
