Diffusion Language Models (DLMs) offer a promising parallel generation paradigm but suffer from slow inference due to numerous refinement steps and the inability to use standard KV caching. We introduce CDLM (Consistency Diffusion Language Models), a training-based acceleration method that simultaneously tackles both bottlenecks. CDLM integrates consistency modeling to drastically reduce the number of required sampling steps by enabling multi-token finalization. Furthermore, we enforce a block-wise causal attention mask during fine-tuning, making the model fully compatible with KV caching. Experiments show that CDLM achieves 3.6×–12.8× lower latency while maintaining competitive accuracy on math and coding tasks.
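To make the caching idea concrete, below is a minimal sketch (illustrative only, not the training code from this repo) of the kind of block-wise causal attention mask described above: each token attends bidirectionally within its own block and causally to all earlier blocks, so the KV cache of a finalized block never has to be recomputed.

```python
import torch

def block_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
    """Boolean mask where mask[q, k] = True means query q may attend to key k.

    Attention is bidirectional inside a block and causal across blocks,
    which is what makes finalized blocks compatible with KV caching.
    """
    # Block index of each position, e.g. block_size=4 -> [0, 0, 0, 0, 1, 1, ...]
    block_ids = torch.arange(seq_len) // block_size
    # A query may attend to any key whose block is the same or earlier.
    return block_ids[:, None] >= block_ids[None, :]

# Example: 8 positions in blocks of 4 -> two all-ones 4x4 diagonal tiles,
# an all-ones lower-left tile, and an all-zeros upper-right tile.
print(block_causal_mask(8, 4).int())
```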
Visualization of inference (main_video.mp4).
Overview of CDLM training.
- [25.11.24] The arXiv paper is available.
- [25.11.18] CDLM-Dream and CDLM-LLaDA are now available on Hugging Face (see the loading sketch below).
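For a quick start with the released checkpoints, the snippet below is a minimal loading sketch using the `transformers` Auto classes. The model id is a placeholder, not the actual Hugging Face repo id, and `trust_remote_code=True` is assumed since the Dream and LLaDA base models ship custom modeling code.

```python
# Minimal loading sketch. "ORG/CDLM-LLaDA" is a placeholder id; use the
# actual repo id from the Hugging Face release announced above.
from transformers import AutoModel, AutoTokenizer

model_id = "ORG/CDLM-LLaDA"  # placeholder, not the real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
```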
Clone the repository and set up the environment.
```bash
# Create and activate a conda environment
conda create -n cdlm python=3.12
conda activate cdlm

# Install dependencies
pip install -r requirements.txt
```

Navigate to the training directory.
```bash
cd CDLM-train
```

Training consists of two phases: offline trajectory extraction and training.
For the first phase, update `config/acc_config_preproc`, `config/dream_eagle.yaml`, and `config/llada.yaml` to match your environment.
Start preprocessing:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --config_file config/acc_config_preproc --num_processes 4 --main_process_port 29577 preprocess.py --config config/llada.yaml
```

For the actual training, update `config/acc_config`, `config/dream_eagle.yaml`, and `config/llada.yaml` to match your environment.
Start training:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --config_file config/acc_config --num_processes 4 --main_process_port 29578 train.py --config config/llada.yaml
```

Alternatively, you can use `train.sh`.
Evaluation scripts for CDLM and naive LLaDA are available in the `CDLM-eval` directory.
Example:
```bash
cd CDLM-eval
./eval_cdlm_dream.sh gsm8k      # <gsm8k|humaneval|mbpp|math>
./eval_cdlm_llada.sh humaneval  # <gsm8k|humaneval|mbpp|math>
```

Evaluation scripts for dLLM-Cache are available in the `dLLM-cache-eval/scripts` directory.
Example:
```bash
cd dLLM-cache-eval/scripts
./run_Dream_gsm8k_Instruct.sh
```

Evaluation scripts for naive Dream and Fast-dLLM are available in the `FastdLLM-eval` directory.
Example:
```bash
cd FastdLLM-eval/dream
./eval_gsm8k.sh
```

The `HumanEval` and `HumanEval-Instruct` benchmarks require a post-processing step to sanitize the generated code and calculate the final `pass@1` score. After the evaluation script finishes, run:

```bash
python postprocess_code.py {path/to/your/samples_humaneval_xxx.jsonl}
```

Replace the placeholder with the actual path to your generated samples file in your configured `output_path`.
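As context for what this step does, the sketch below is a hypothetical `sanitize` function (it is not `postprocess_code.py` itself): it extracts the executable code portion of each completion before `pass@1` is computed. The same idea applies to the MBPP post-processing below.

```python
import re

FENCE = chr(96) * 3  # the three-backtick markdown code-fence marker

def sanitize(completion: str) -> str:
    """Hypothetical cleanup: keep only the executable code in a completion."""
    # Prefer the contents of a fenced code block, if the model emitted one.
    pattern = FENCE + r"(?:python)?\s*\n(.*?)" + FENCE
    match = re.search(pattern, completion, re.DOTALL)
    if match:
        return match.group(1)
    # Otherwise truncate at markers that usually follow the solution body.
    for stop in ("\nif __name__", "\nassert ", "\nprint("):
        cut = completion.find(stop)
        if cut != -1:
            completion = completion[:cut]
    return completion
```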
The `MBPP-Instruct` benchmark requires a post-processing step to sanitize the generated code and calculate the final `pass@1` score. After the evaluation script finishes, run:

```bash
python postprocess_code_mbpp.py {path/to/your/samples_mbpp_xxx.jsonl}
```

Replace the placeholder with the actual path to your generated samples file in your configured `output_path`.
Our work and codebase are deeply inspired by open-source projects, especially D2F. We also acknowledge Dream, LLaDA, Fast-dLLM, and dLLM-Cache, from which we leveraged components for evaluation.
```bibtex
@article{kim2025cdlm,
  title   = {CDLM: Consistency Diffusion Language Models for Faster Sampling},
  author  = {Kim, Minseo and Xu, Chenfeng and Hooper, Coleman and Singh, Harman and Athiwaratkun, Ben and Zhang, Ce and Keutzer, Kurt and Gholami, Amir},
  journal = {arXiv preprint arXiv:2511.19269},
  year    = {2025},
  url     = {https://arxiv.org/abs/2511.19269}
}
```