arXiv submission date: 2026-06-04
📄 Abstract - AdaPLD: Adaptive Retrieval and Reuse for Efficient Model-Free Speculative Decoding

Speculative decoding accelerates generation by verifying multiple drafted tokens in a single target-model forward pass, reducing sequential decoding iterations. Model-free variants avoid auxiliary draft models by reusing text and model states already available during generation, but their speedup depends on the reliability of the constructed drafts. We identify two limitations of existing reuse-based methods: lexically anchored retrieval has limited recall under surface-form variation, and deterministic span copying can be brittle when the retrieved context does not uniquely determine the continuation. We propose AdaPLD, a training-free method that adaptively improves both retrieval and draft construction. AdaPLD preserves high-precision lexical reuse while using semantic similarity to recover additional reuse opportunities when lexical matching fails. It further constructs branched reuse hypotheses to account for continuation uncertainty, rather than relying on a single copied span. Across diverse benchmarks, AdaPLD reduces target-model forward passes and achieves up to 3.10× decoding speedup.
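To make the baseline mentioned in the abstract concrete, here is a minimal sketch of lexically anchored reuse with deterministic span copying, followed by greedy verification. It is not the AdaPLD method: it has no semantic retrieval and no branching. The names `lexical_draft`, `verify_greedy`, and `next_token` are hypothetical, and the toy `next_token` callable stands in for a real target model, which would score all drafted positions in one forward pass rather than one call per token.

```python
from typing import Callable, List, Sequence


def lexical_draft(tokens: Sequence[int], ngram: int = 2, max_draft: int = 4) -> List[int]:
    """Copy the span that followed the most recent earlier occurrence
    of the last `ngram` tokens. Returns [] when there is no exact match."""
    if len(tokens) <= ngram:
        return []
    suffix = list(tokens[-ngram:])
    # Scan right to left over earlier positions, skipping the suffix itself.
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if list(tokens[start:start + ngram]) == suffix:
            follow = start + ngram
            return list(tokens[follow:follow + max_draft])
    return []


def verify_greedy(
    tokens: Sequence[int],
    draft: Sequence[int],
    next_token: Callable[[List[int]], int],
) -> List[int]:
    """Accept the longest draft prefix that matches the target's greedy
    choice, then append one token chosen by the target itself.

    Assumption: `next_token` is called once per position here for clarity;
    a real implementation would batch these checks into a single pass.
    """
    out = list(tokens)
    for tok in draft:
        expected = next_token(out)
        if tok != expected:
            out.append(expected)  # replace the first wrong draft token
            return out
        out.append(tok)
    out.append(next_token(out))  # bonus token when the whole draft is accepted
    return out


# Toy target "model": cycles through 1, 2, 3, 4, 1, 2, ...
def toy_next(seq: List[int]) -> int:
    return (seq[-1] % 4) + 1


history = [1, 2, 3, 4, 1, 2]
draft = lexical_draft(history)
extended = verify_greedy(history, draft, toy_next)
```

In this toy run the copied span is fully accepted, so one verification step extends the sequence by five tokens. When the suffix has no exact earlier match, `lexical_draft` returns an empty draft and decoding falls back to one token per step, which is the recall limitation the abstract attributes to lexical matching.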

Top-level tags: llm, natural language processing, model evaluation
Detailed tags: speculative decoding, draft construction, retrieval augmentation, inference acceleration, reuse-based methods

AdaPLD: Adaptive Retrieval and Reuse for Efficient Model-Free Speculative Decoding


1️⃣ One-sentence summary

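This paper proposes AdaPLD, a training-free inference acceleration method that combines semantic-similarity retrieval with branched reuse hypotheses to reuse already-generated text and model states when constructing draft token sequences, speeding up large language model decoding by up to 3.1× without any auxiliary draft model.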

Source: arXiv: 2606.05742