Accelerating Speculative Diffusions via Block Verification

📄 Abstract - Accelerating Speculative Diffusions via Block Verification

Speculative decoding speeds up LLM inference by using a draft model to generate tokens, with an acceptance-rejection scheme that ensures that the output matches the target distribution. Adapting this to continuous diffusions is difficult because speculative sampling requires drawing from a residual distribution. While straightforward in discrete spaces, efficiently sampling this residual in continuous space is non-trivial. Consequently, existing diffusion adaptations either use computationally inefficient sampling techniques or rely on an alternative scheme. In this work, we introduce a novel scheme that efficiently implements the original speculative sampling mechanism for diffusion models. Our approach offers a critical advantage over current methods: it enables us to adapt block verification from LLMs to diffusions -- which provably improves the acceptance rate of drafts. Furthermore, we formalize and analyze the Free Drafter, a heuristic self-speculative drafter for diffusions that requires no training. By enabling block verification, our Free Drafter yields up to a 6.3% speedup over existing speculative methods with no additional training and negligible overhead beyond the existing parallel verification pass.

通过区块验证加速推测性扩散 / Accelerating Speculative Diffusions via Block Verification

1️⃣ 一句话总结

本文针对扩散模型提出了一种新的推测性采样机制，通过借鉴大语言模型中的区块验证方法，显著提高了草稿的接受率，并在此基础上引入无需额外训练的“自由草稿器”，在不增加计算负担的前提下实现了最高6.3%的速度提升。

← 返回列表

菜单

AI 帮我研读全文

1️⃣ 一句话总结

密码管理

设置密码

修改密码

移除密码

菜单

AI 帮我研读全文

1️⃣ 一句话总结

获取最新论文摘要