arXiv submission date: 2026-04-09
📄 Abstract - DMax: Aggressive Parallel Decoding for dLLMs

We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked dLLMs that decode through a binary mask-to-token transition, DMax reformulates decoding as a progressive self-refinement from mask embeddings to token embeddings. At the core of our approach is On-Policy Uniform Training, a novel training strategy that efficiently unifies masked and uniform dLLMs, equipping the model to recover clean tokens from both masked inputs and its own erroneous predictions. Building on this foundation, we further propose Soft Parallel Decoding. We represent each intermediate decoding state as an interpolation between the predicted token embedding and the mask embedding, enabling iterative self-revising in embedding space. Extensive experiments across a variety of benchmarks demonstrate the effectiveness of DMax. Compared with the original LLaDA-2.0-mini, our method improves TPF on GSM8K from 2.04 to 5.47 while preserving accuracy. On MBPP, it increases TPF from 2.71 to 5.86 while maintaining comparable performance. On two H200 GPUs, our model achieves an average of 1,338 TPS at batch size 1. Code is available at: this https URL
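The Soft Parallel Decoding idea from the abstract — representing each intermediate state as an interpolation between the mask embedding and the predicted token embedding, then iteratively re-predicting — can be sketched as below. This is a minimal toy illustration, not the paper's implementation: the linear `alpha` schedule, function names, and plain-list embeddings are all assumptions for clarity.

```python
# Hedged sketch of Soft Parallel Decoding as described in the abstract:
# each position's state is a blend of the mask embedding (alpha = 0) and
# the model's current predicted token embedding (alpha = 1), refined over
# a few parallel steps. The schedule and names are illustrative assumptions.

def soft_state(mask_emb, pred_emb, alpha):
    """Interpolate from the mask embedding toward the predicted embedding."""
    return [(1.0 - alpha) * m + alpha * p for m, p in zip(mask_emb, pred_emb)]

def soft_parallel_decode(model, mask_emb, seq_len, steps=4):
    """Toy refinement loop: all positions start at the mask embedding,
    then are re-predicted and blended in parallel, with alpha rising to 1."""
    state = [list(mask_emb) for _ in range(seq_len)]   # fully masked start
    for t in range(1, steps + 1):
        alpha = t / steps                              # assumed linear schedule
        preds = model(state)                           # one embedding per position
        state = [soft_state(mask_emb, p, alpha) for p in preds]
    return state
```

Because the final step uses `alpha = 1`, the last iteration commits fully to the model's predictions; earlier iterations keep part of the mask embedding, which is what lets the model revise its own erroneous intermediate guesses in embedding space.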

Top tags: llm model training natural language processing
Detailed tags: parallel decoding diffusion language models efficient inference self-refinement training strategy

DMax: Aggressive Parallel Decoding for dLLMs


1️⃣ One-sentence summary

This paper proposes DMax, a method that reformulates decoding as progressive self-refinement from mask embeddings to token embeddings and pairs it with a novel training strategy, enabling diffusion language models to decode aggressively in parallel: inference speed improves substantially while error accumulation is controlled and generation quality is preserved.

Source: arXiv 2604.08302