arXiv submission date: 2026-01-12
📄 Abstract - Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models

Diffusion Language Models (DLMs) offer a promising alternative for language modeling by enabling parallel decoding through iterative refinement. However, most DLMs rely on hard binary masking and discrete token assignments, which hinder the revision of early decisions and underutilize intermediate probabilistic representations. In this paper, we propose EvoToken-DLM, a novel diffusion-based language modeling approach that replaces hard binary masks with evolving soft token distributions. EvoToken-DLM enables a progressive transition from masked states to discrete outputs, supporting revisable decoding. To effectively support this evolution, we introduce continuous trajectory supervision, which aligns training objectives with iterative probabilistic updates. Extensive experiments across multiple benchmarks show that EvoToken-DLM consistently achieves superior performance, outperforming strong diffusion-based and masked DLM baselines. Project webpage: this https URL.
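
To make the core idea concrete, here is a minimal decoding-step sketch: each position carries a full probability distribution over the vocabulary (initialized at the mask state) and is gradually sharpened toward a discrete token, so early predictions stay revisable. The `model` interface, the linear interpolation schedule, and the sharpening temperature are illustrative assumptions, not the paper's actual update rule.

```python
import torch
import torch.nn.functional as F

def evolve_step(model, soft_tokens, step, total_steps, temperature=1.0):
    """One refinement step over evolving soft token distributions.

    soft_tokens: (batch, seq_len, vocab) per-position distributions,
    initialized to the mask-token one-hot (or uniform) before step 0.
    Hypothetical sketch of the abstract's idea, not the paper's rule.
    """
    logits = model(soft_tokens)                      # model reads soft inputs
    preds = F.softmax(logits / temperature, dim=-1)  # updated per-position beliefs
    alpha = (step + 1) / total_steps                 # progress toward discreteness
    # Soft interpolation: early steps keep prior beliefs revisable;
    # later steps commit increasingly to the current predictions.
    return (1 - alpha) * soft_tokens + alpha * preds

def decode(model, soft_tokens, total_steps=8):
    """Run all refinement steps, discretizing only at the very end."""
    for step in range(total_steps):
        soft_tokens = evolve_step(model, soft_tokens, step, total_steps)
    return soft_tokens.argmax(dim=-1)  # discrete tokens emerge last
```

By contrast, a hard-mask DLM would argmax and freeze a subset of positions at each step; keeping distributions until the final step is what allows earlier decisions to be revised.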

Top tags: natural language processing · model training · machine learning
Detailed tags: diffusion language models · progressive decoding · soft token distributions · iterative refinement · parallel decoding

Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models


1️⃣ One-Sentence Summary

This paper proposes EvoToken-DLM, a method that replaces the hard binary masks of conventional diffusion language models with evolvable soft token distributions, enabling a revisable decoding process and achieving superior performance across multiple benchmarks.
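
The abstract's other ingredient, continuous trajectory supervision, can likewise be sketched: rather than supervising only the final discrete output, every intermediate soft distribution along the decoding trajectory receives a training signal. The step-weighted cross-entropy below is a hedged guess at what such an objective might look like; the paper's exact loss is not given here.

```python
import torch
import torch.nn.functional as F

def trajectory_loss(step_distributions, target_ids):
    """Step-weighted cross-entropy over a decoding trajectory.

    step_distributions: list of T tensors, each (batch, seq_len, vocab)
    target_ids:         (batch, seq_len) ground-truth token ids
    Both the plain cross-entropy and the linear step weighting are
    illustrative assumptions, not the paper's exact objective.
    """
    total_steps = len(step_distributions)
    loss = 0.0
    for t, dist in enumerate(step_distributions):
        log_probs = torch.log(dist.clamp_min(1e-9))    # safe log of soft dist
        step_ce = F.nll_loss(log_probs.flatten(0, 1),  # (batch*seq_len, vocab)
                             target_ids.flatten())     # (batch*seq_len,)
        loss = loss + (t + 1) / total_steps * step_ce  # later steps weigh more
    return loss / total_steps
```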

Source: arXiv:2601.07351