Spec-AUF: Accept-Until-Fail Training under Train-Inference Misalignment for Masked Block Drafters

📄 Abstract

Speculative decoding accelerates autoregressive generation by drafting a block of tokens that the target model verifies left-to-right, committing only the longest accepted prefix. Block (DLM-style) drafters predict the whole block in parallel, which is fast, but they are trained with a full-block cross-entropy that supervises every position against the gold continuation, even though inference discards every token after the first rejection. Recent acceptance-aware objectives patch this by reweighting the full-block loss; we instead use teacher-forced learning as a motivation for how supervision should concentrate on the accepted prefix. A mask-only block drafter has no input-side channel for gold-prefix conditioning, so AUF approximates that prefix-sensitive supervision on the loss side by keeping the cross-entropy support only through the drafter's first predicted failure. AUF is a single, detached change to the CE support: no auxiliary objective, no verifier rollouts, and no change to the inference pipeline or the exactness contract. Within fixed drafter backbones and serving settings on Qwen3-8B, AUF raises the DFlash drafter's average emitted length $\tau$, averaged over six benchmarks, from 2.40 to 2.61, with a gain on every benchmark, and transfers to Domino's two-branch head (2.56 to 2.68). Two findings sharpen the picture: the decay-only baseline reaches higher token accuracy on the shared block mask yet decodes worse, and on DFlash, once AUF truncates the support, the standard exponential position-decay weighting becomes empirically inert.
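The loss-side truncation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `auf_mask` and `auf_cross_entropy`, the use of the drafter's argmax as its "prediction", the inclusion of the first failing position in the support, and the mean over kept positions are all assumptions made for illustration.

```python
import math


def auf_mask(pred_tokens, gold_tokens):
    """Return a 0/1 mask over block positions.

    Positions are kept up to and including the first position where the
    drafter's prediction differs from the gold token; later positions are
    dropped. (Including the failing position is an assumption.)
    """
    mask = []
    failed = False
    for pred, gold in zip(pred_tokens, gold_tokens):
        mask.append(0 if failed else 1)
        if pred != gold:
            failed = True
    return mask


def auf_cross_entropy(probs, gold_tokens):
    """Cross-entropy restricted to the accept-until-fail support.

    probs[i][v] is the drafter's probability of token v at block position i.
    The mask is derived from argmax predictions and treated as a constant,
    which mirrors the "detached" change to the CE support. Averaging over
    kept positions is an assumed normalization.
    """
    pred_tokens = [max(range(len(p)), key=p.__getitem__) for p in probs]
    mask = auf_mask(pred_tokens, gold_tokens)
    kept = [-math.log(p[g]) for p, g, m in zip(probs, gold_tokens, mask) if m]
    return sum(kept) / len(kept), mask


if __name__ == "__main__":
    # Hypothetical 3-position block over a 2-token vocabulary.
    probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    gold = [0, 0, 1]
    loss, mask = auf_cross_entropy(probs, gold)
    print(mask, round(loss, 4))
```

In this toy block the drafter is right at position 0 and wrong at position 1, so only the first two positions contribute to the loss. Position 2 is excluded, which matches the fact that inference would discard it after the first rejection.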

1️⃣ One-sentence summary

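This paper proposes a simple training method called AUF that keeps the cross-entropy loss only through the first mispredicted position in the draft block. This resolves the mismatch in which block drafters are supervised equally at every position during training while inference keeps only the accepted prefix, and it notably raises the average emitted length on several benchmarks without changing the inference pipeline.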
