arXiv submission date: 2026-03-19
📄 Abstract - SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding

Large language models incur high inference latency due to sequential autoregressive decoding. Speculative decoding alleviates this bottleneck by using a lightweight draft model to propose multiple tokens for batched verification. However, its adoption has been limited by the lack of high-quality draft models and scalable training infrastructure. We introduce SpecForge, an open-source, production-oriented framework for training speculative decoding models with full support for EAGLE-3. SpecForge incorporates target-draft decoupling, hybrid parallelism, optimized training kernels, and integration with production-grade inference engines, enabling up to 9.9x faster EAGLE-3 training for Qwen3-235B-A22B. In addition, we release SpecBundle, a suite of production-grade EAGLE-3 draft models trained with SpecForge for mainstream open-source LLMs. Through a systematic study of speculative decoding training recipes, SpecBundle addresses the scarcity of high-quality drafts in the community, and our draft models achieve up to 4.48x end-to-end inference speedup on SGLang, establishing SpecForge as a practical foundation for real-world speculative decoding deployment.
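The abstract's core mechanism can be sketched in a few lines: a cheap draft model proposes k tokens autoregressively, and the target model verifies them together, accepting the longest matching prefix. The sketch below is a minimal illustration with greedy acceptance and toy next-token functions; it is not SpecForge's implementation (real EAGLE-3 drafts are neural networks, and verification is a single batched forward pass in the inference engine).

```python
# Minimal sketch of speculative decoding with greedy acceptance.
# `target_next` / `draft_next` are toy next-token functions standing in
# for the target LLM and the lightweight draft model.

def speculative_decode(target_next, draft_next, prompt, k, steps):
    """Extend `prompt` for `steps` rounds; each round the draft proposes
    k tokens that the target verifies (here sequentially for clarity;
    a real engine scores all k positions in one batched pass)."""
    tokens = list(prompt)
    for _ in range(steps):
        # 1) Draft proposes k tokens autoregressively (cheap model).
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target verifies: accept the longest prefix it agrees with.
        ctx = list(tokens)
        for t in proposal:
            if target_next(ctx) != t:
                break
            tokens.append(t)
            ctx.append(t)
        # 3) The target always contributes one token of its own, so
        #    every round makes progress even if the draft is rejected.
        tokens.append(target_next(tokens))
    return tokens
```

With a perfect draft, each round emits k+1 tokens for one round of verification; with a useless draft, decoding degrades gracefully to one token per round, which is why draft quality (the focus of SpecBundle) drives the achievable speedup.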

Top-level tags: llm systems model training
Detailed tags: speculative decoding, training framework, inference acceleration, eagle-3, open-source

SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding


1️⃣ One-sentence summary

This paper presents SpecForge, an open-source framework that tackles slow LLM inference by optimizing the speculative-decoding training pipeline and releasing high-quality pretrained draft models, yielding substantial end-to-end text-generation speedups.

Source: arXiv 2603.18567