The Diffusion Duality, Chapter II: $Ψ$-Samplers and Efficient Curriculum
1️⃣ One-sentence summary
This paper introduces a new family of Predictor-Corrector samplers and an efficient training method that significantly improve the performance of discrete diffusion models on text and image generation, enabling sampling quality to keep improving as the number of steps grows while substantially reducing the time and memory required for training.
Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video tutorial at: this https URL
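The abstract does not include pseudocode, but the general shape of a Predictor-Corrector loop over a uniform-state discrete diffusion process can be sketched as below. This is a minimal toy illustration, not the paper's actual method: the denoiser, the noise schedule, and the corrector rule (re-noise at a reduced level, then re-denoise, which is what lets the model revise earlier mistakes) are all placeholder assumptions.

```python
import random

VOCAB = 8    # toy vocabulary size (assumption for illustration)
SEQ_LEN = 6  # toy sequence length

def toy_denoiser(x, t):
    # Stand-in for a learned denoising network: returns per-position
    # probabilities over the vocabulary. Here it simply favors token 0,
    # mixed with uniform noise proportional to the noise level t.
    probs = []
    for _ in x:
        p = [(1 - t) if v == 0 else t / (VOCAB - 1) for v in range(VOCAB)]
        probs.append(p)
    return probs

def sample_from(p):
    # Draw one token index from a categorical distribution.
    return random.choices(range(len(p)), weights=p, k=1)[0]

def forward_noise(x, t):
    # Uniform-state forward process: each token is resampled uniformly
    # at random with probability t, otherwise kept.
    return [random.randrange(VOCAB) if random.random() < t else v for v in x]

def pc_sample(steps=16, corrector_steps=1):
    # Start from pure uniform noise (t = 1).
    x = [random.randrange(VOCAB) for _ in range(SEQ_LEN)]
    for i in range(steps, 0, -1):
        t = i / steps
        # Predictor: one denoising step from the model's posterior.
        x = [sample_from(p) for p in toy_denoiser(x, t)]
        # Corrector: partially re-noise at the current level and
        # re-denoise, allowing self-correction of earlier choices.
        for _ in range(corrector_steps):
            x_noised = forward_noise(x, t * 0.5)
            x = [sample_from(p) for p in toy_denoiser(x_noised, t)]
    return x

print(pc_sample())
```

With a real model, the corrector's extra passes are what allow quality to keep improving as step counts grow, in contrast to a purely ancestral (predictor-only) loop.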
Source: arXiv:2602.21185