📄 Abstract - Diffusion Language Models are Super Data Learners

Under strictly controlled pre-training settings, we observe a Crossover: when unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The crossover shifts later with more or higher-quality data, earlier with larger models, and persists across dense and sparse architectures. We attribute the gains to three compounding factors: (1) any-order modeling, (2) super-dense compute from iterative bidirectional denoising, and (3) built-in Monte Carlo augmentation; input or parameter noise improves AR under data constraint but cannot close the gap. At scale, a 1.7B DLM trained with a ~1.5T-token compute budget on 10B unique Python tokens overtakes an AR coder trained with strictly matched settings. In addition, a 1B-parameter DLM achieves > 56% accuracy on HellaSwag and > 33% on MMLU using only 1B tokens, without any special tricks, just by repeating standard pre-training data. We also show that rising validation cross-entropy does not imply degraded downstream performance in this regime.
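The "built-in Monte Carlo augmentation" factor can be illustrated with a minimal sketch. It assumes a masked-diffusion style corruption in which a mask ratio is drawn uniformly and each token is masked independently. The abstract does not specify the noise process, so the `MASK` token, the uniform ratio, and the helper names here are hypothetical. The sketch also works out the rough number of passes implied by a ~1.5T-token budget over 10B unique tokens.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from the paper


def corrupt(tokens, rng):
    """Draw a mask ratio t ~ U(0, 1), then mask each token independently with probability t.

    This is an assumed corruption scheme for illustration only.
    """
    t = rng.random()
    noisy = [MASK if rng.random() < t else tok for tok in tokens]
    return t, noisy


def views_across_epochs(tokens, num_epochs, seed=0):
    """Return one corrupted view of the same sequence per epoch."""
    rng = random.Random(seed)
    return [corrupt(tokens, rng)[1] for _ in range(num_epochs)]


tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
views = views_across_epochs(tokens, num_epochs=5)
for view in views:
    # Each repeat of the sequence can present a different denoising problem,
    # whereas an AR model sees the same left-to-right targets every epoch.
    print(" ".join(view))

# Rough passes over the unique data implied by the abstract's numbers:
# ~1.5T training tokens over 10B unique tokens.
approx_epochs = 1_500_000_000_000 // 10_000_000_000
print(approx_epochs)
```

Under these assumptions, repeating the same 10B tokens roughly 150 times does not repeat the same training examples, because each pass draws fresh masks.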

Top-level tags: natural language processing, model training, machine learning

📄 Paper Summary

Diffusion Language Models are Super Data Learners

1️⃣ One-Sentence Summary

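This study finds that when unique data is limited, diffusion language models consistently outperform autoregressive models by training for more epochs. The advantage stems from any-order modeling, dense compute, and built-in data augmentation, and it yields strong downstream performance even with small amounts of data.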
