Abstract - Spec-AUF: Accept-Until-Fail Training under Train-Inference Misalignment for Masked Block Drafters
Speculative decoding accelerates autoregressive generation by drafting a block of tokens that the target model verifies left-to-right, committing only the longest accepted prefix. Block (DLM-style) drafters predict the whole block in parallel, which is fast but trained with a full-block cross-entropy that supervises every position against the gold continuation -- even though inference discards every token after the first rejection. Recent acceptance-aware objectives patch this by reweighting the full-block loss; we instead use teacher-forced learning as a motivation for how supervision should concentrate on the accepted prefix. A mask-only block drafter has no input-side channel for gold-prefix conditioning, so AUF approximates that prefix-sensitive supervision on the loss side by keeping the cross-entropy support only through the drafter's first predicted failure. AUF is a single, detached change to the CE support -- no auxiliary objective, no verifier rollouts, and no change to the inference pipeline or the exactness contract. Within fixed drafter backbones and serving settings on Qwen3-8B, AUF raises the DFlash drafter's average emitted length $\tau$, averaged over six benchmarks, from 2.40 to 2.61, with a gain on every benchmark, and transfers to Domino's two-branch head (2.56 to 2.68). Two findings sharpen the picture: the decay-only baseline reaches higher token accuracy on the shared block mask yet decodes worse, and on DFlash, once AUF truncates the support, the standard exponential position-decay weighting becomes empirically inert.
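The loss-side truncation described in the abstract can be illustrated with a minimal, framework-free sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`auf_support`, `masked_cross_entropy`), the toy logits, the choice to include the failing position itself in the support (our reading of "through the first predicted failure"), and the mean normalization over kept positions are all hypothetical. The mask is built from discrete argmax predictions, so in an autograd framework it would carry no gradient, which matches the "detached" description.

```python
import math

def auf_support(pred_tokens, gold_tokens):
    """0/1 mask keeping positions up to and including the first mismatch.

    Assumption: the failing position stays in the support; later ones are dropped.
    """
    mask = []
    failed = False
    for pred, gold in zip(pred_tokens, gold_tokens):
        mask.append(0 if failed else 1)
        if pred != gold:
            failed = True
    return mask

def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_z for x in logits]

def masked_cross_entropy(log_probs, gold_tokens, mask):
    """Mean negative log-likelihood of the gold tokens over kept positions.

    Assumption: normalize by the number of kept positions.
    """
    total = sum(-lp[g] * m for lp, g, m in zip(log_probs, gold_tokens, mask))
    return total / max(sum(mask), 1)

# Toy block: 4 drafted positions over a 3-token vocabulary (made-up logits).
block_logits = [
    [2.0, 0.1, 0.1],
    [0.1, 2.0, 0.1],
    [2.0, 0.1, 0.1],  # argmax is token 0, gold is token 2: first failure
    [0.1, 0.1, 2.0],
]
gold = [0, 1, 2, 2]

log_probs = [log_softmax(row) for row in block_logits]
pred = [max(range(len(row)), key=row.__getitem__) for row in block_logits]

mask = auf_support(pred, gold)  # positions after the first failure are dropped
full_ce = masked_cross_entropy(log_probs, gold, [1] * len(gold))
auf_ce = masked_cross_entropy(log_probs, gold, mask)
print(mask, round(full_ce, 4), round(auf_ce, 4))
```

In this toy block the drafter's first wrong prediction is at the third position, so the fourth position contributes to the full-block loss but not to the truncated one, mirroring how inference discards every token after the first rejection.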
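1️⃣ One-Sentence Summary
The paper proposes AUF, a simple training method that keeps the cross-entropy loss only up to the first mispredicted position in the draft block. This addresses the mismatch in which block drafters are supervised equally at every position during training while inference keeps only the accepted prefix, and it significantly raises the average emitted length across multiple benchmarks without changing the inference pipeline.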