arXiv submission date: 2026-06-10
📄 Abstract - VIA-SD: Verification via Intra-Model Routing for Speculative Decoding

Speculative decoding (SD) addresses the high inference costs of LLMs by having lightweight drafters generate candidates for large verifiers to validate in parallel. Existing draft-verify methods use binary decisions: accept or fully recompute. Yet we find that many rejected tokens can be verified correctly by a slim submodel derived from the full verifier via intra-model routing, instead of the full verifier. This motivates our slim-verifier to handle tokens requiring moderate verification resources, reducing expensive large-model calls. We propose Verification via Intra-Model Routing for Speculative Decoding (VIA-SD), a multi-tier framework using a routed slim-verifier. Draft tokens are processed hierarchically: direct acceptance for high-confidence cases, slim-verifier regeneration for medium-confidence cases, and full-model verification for uncertain cases. Across four representative tasks and multiple model families, VIA-SD reduces rejection rates by 0.10-0.22 and delivers 10-20% speedups over strong SD baselines, while achieving 2.5-3x acceleration over non-drafting decoding. Moreover, VIA-SD is compatible with existing SD frameworks without modifying their training procedures. Our results suggest multi-tier SD as a general paradigm for scalable and efficient LLM inference. Project page: this https URL
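The three-tier routing described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the function name `route_draft_tokens`, the thresholds `HIGH_CONF` and `MID_CONF`, the use of a single per-token confidence score, and the rule that later draft tokens are discarded once a verifier changes a token (borrowed from standard speculative decoding). The two verifiers are passed in as plain callables, so no real model API is implied.

```python
from typing import Callable, List, Tuple

# Hypothetical thresholds: the abstract does not say how confidence is
# measured or where the tier boundaries lie.
HIGH_CONF = 0.9
MID_CONF = 0.5

Verifier = Callable[[int, str], str]  # (position, draft token) -> verified token


def route_draft_tokens(
    draft: List[Tuple[str, float]],
    slim_verify: Verifier,
    full_verify: Verifier,
) -> Tuple[List[str], List[str]]:
    """Route each draft token to one of three tiers.

    Returns the emitted tokens and the tier used for each of them.
    """
    tokens: List[str] = []
    tiers: List[str] = []
    for pos, (token, conf) in enumerate(draft):
        if conf >= HIGH_CONF:
            # Tier 1: high confidence, accept the draft token directly.
            tokens.append(token)
            tiers.append("accept")
            continue
        if conf >= MID_CONF:
            # Tier 2: medium confidence, regenerate with the slim verifier.
            verified, tier = slim_verify(pos, token), "slim"
        else:
            # Tier 3: low confidence, fall back to the full verifier.
            verified, tier = full_verify(pos, token), "full"
        tokens.append(verified)
        tiers.append(tier)
        if verified != token:
            # Assumption: once a token is corrected, the remaining draft
            # tokens were conditioned on a wrong prefix and are dropped.
            break
    return tokens, tiers


# Toy usage with stand-in verifiers (no real models involved).
draft = [("the", 0.95), ("cat", 0.7), ("sat", 0.2), ("down", 0.99)]
slim = lambda pos, tok: tok      # slim verifier confirms the draft token
full = lambda pos, tok: "sits"   # full verifier corrects the draft token
tokens, tiers = route_draft_tokens(draft, slim, full)
print(tokens, tiers)
```

In this toy run the second token is handled by the slim tier alone, which is the case the abstract targets: a token that would otherwise trigger a full-model call is resolved by the cheaper submodel.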

Top-level tags: llm, model training, systems
Detailed tags: speculative decoding, inference acceleration, intra-model routing, verification

VIA-SD: Verification via Intra-Model Routing for Speculative Decoding


1️⃣ One-sentence summary

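This paper proposes VIA-SD, a multi-tier verification framework that derives a lightweight submodel from within the full verifier to handle medium-confidence draft tokens, replacing the traditional binary "accept or fully recompute" strategy; it lowers the rejection rate in speculative decoding and delivers 10-20% speedups over strong SD baselines.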

Source: arXiv: 2606.12243