arXiv submission date: 2026-06-11
📄 Abstract - Accelerating Speculative Diffusions via Block Verification

Speculative decoding speeds up LLM inference by using a draft model to generate tokens, with an acceptance-rejection scheme that ensures that the output matches the target distribution. Adapting this to continuous diffusions is difficult because speculative sampling requires drawing from a residual distribution. While straightforward in discrete spaces, efficiently sampling this residual in continuous space is non-trivial. Consequently, existing diffusion adaptations either use computationally inefficient sampling techniques or rely on an alternative scheme. In this work, we introduce a novel scheme that efficiently implements the original speculative sampling mechanism for diffusion models. Our approach offers a critical advantage over current methods: it enables us to adapt block verification from LLMs to diffusions -- which provably improves the acceptance rate of drafts. Furthermore, we formalize and analyze the Free Drafter, a heuristic self-speculative drafter for diffusions that requires no training. By enabling block verification, our Free Drafter yields up to a 6.3% speedup over existing speculative methods with no additional training and negligible overhead beyond the existing parallel verification pass.
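For readers unfamiliar with the residual distribution mentioned in the abstract, here is a minimal sketch of the standard token-level speculative sampling step over a finite vocabulary. It illustrates only the discrete case that the abstract calls straightforward; it is not the paper's continuous-space scheme or its block verification. The function name `speculative_step` and the toy distributions `p` (target) and `q` (draft) are assumptions made for illustration.

```python
import random

def speculative_step(p, q, rng):
    """One discrete speculative sampling step (illustrative sketch).

    p: target probabilities, q: draft probabilities over the same finite vocabulary.
    Returns (token, accepted). The returned token is distributed according to p.
    """
    n = len(p)
    # Draft proposal: sample a token from the draft distribution q.
    x = rng.choices(range(n), weights=q)[0]
    # Accept with probability min(1, p[x] / q[x]); q[x] > 0 because x was drawn from q.
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # On rejection, resample from the residual distribution, proportional to max(0, p - q).
    # A rejection implies q[x] > p[x], so the residual has positive total mass.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    y = rng.choices(range(n), weights=residual)[0]
    return y, False

if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.3, 0.2]   # hypothetical target distribution
    q = [0.2, 0.5, 0.3]   # hypothetical draft distribution
    counts = [0, 0, 0]
    trials = 20000
    for _ in range(trials):
        token, _accepted = speculative_step(p, q, rng)
        counts[token] += 1
    # Empirical frequencies should be close to p, not q.
    print([c / trials for c in counts])
```

In a finite vocabulary the residual is just a clipped, renormalized difference of two probability vectors, which is why the resampling step is cheap here. In continuous space the same residual is a density with no such closed-form sampler, which is the difficulty the abstract describes.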

Top-level tags: llm model training
Detailed tags: speculative decoding, diffusion models, block verification, inference acceleration, self-speculative

Accelerating Speculative Diffusions via Block Verification


1️⃣ One-sentence summary

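This paper proposes a new speculative sampling scheme for diffusion models that makes it possible to adapt block verification from large language models, which provably improves the draft acceptance rate; building on this, it introduces the training-free "Free Drafter", which achieves up to a 6.3% speedup over existing speculative methods with negligible overhead beyond the existing parallel verification pass.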

Source: arXiv: 2606.13426