Distribution Corrected Offline Data Distillation for Large Language Models
1️⃣ One-sentence summary
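This paper proposes an offline reasoning distillation method that adaptively emphasizes teacher supervision that is more consistent with the student model's self-generated distribution. This corrects the teacher-student distribution mismatch of conventional offline distillation and improves the accuracy and stability of small models on mathematical reasoning tasks without relying on online sampling.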
Distilling reasoning traces from strong large language models into smaller ones is a promising route to improving capability in resource-constrained settings. Existing approaches face a fundamental trade-off. Offline distillation from teacher-generated traces provides high-quality, sample-efficient supervision but suffers from distributional drift: during training the student conditions on teacher-generated prefixes, whereas at inference it generates autoregressively from its own prefixes, so errors compound over long reasoning trajectories. On-policy and self-distillation methods better match the student's inference-time distribution, but they require costly online sampling and often produce low-quality traces early in training. We propose a principled offline reasoning distillation framework that preserves the efficiency and supervision quality of offline teacher-generated data while correcting teacher-student distribution drift. It adaptively emphasizes teacher supervision that is better aligned with the student's on-policy distribution. Evaluations on the mathematical reasoning benchmarks GSM8K, MATH, and MATH500, and on harder held-out competition-style tasks including AMC, AIME, and OlympiadBench, show that our method improves reasoning accuracy over prior offline distillation algorithms and yields more stable reasoning traces while preserving instruction-following capabilities. Our work shows that lightweight, distribution-correction-aware training can substantially strengthen offline reasoning distillation without online rollouts.
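The abstract does not state the weighting rule, so the following is only an illustrative sketch of what "emphasizing teacher supervision aligned with the student's distribution" could look like, not the paper's algorithm. It assumes a per-token weight proportional to the student's own probability of each teacher token, normalized to mean 1 and clipped. The function names `alignment_weights` and `weighted_nll` and the parameters `temperature` and `max_weight` are hypothetical.

```python
import math

def alignment_weights(student_logprobs, temperature=1.0, max_weight=2.0):
    """Hypothetical weighting rule (an assumption, not taken from the paper).

    student_logprobs: log-probabilities the student assigns to each token
    of a teacher-generated trace. Tokens the student already finds
    plausible receive larger weights.
    """
    # Raw weight is p ** (1 / temperature) for each teacher token.
    raw = [math.exp(lp / temperature) for lp in student_logprobs]
    mean = sum(raw) / len(raw)
    # Normalize to mean 1, then clip so no single token dominates.
    return [min(r / mean, max_weight) for r in raw]

def weighted_nll(student_logprobs, weights):
    """Weighted negative log-likelihood over one teacher trace."""
    total = sum(w * -lp for w, lp in zip(weights, student_logprobs))
    return total / sum(weights)

# Toy trace: the student gives the teacher's tokens probabilities 0.9, 0.5, 0.1.
logprobs = [math.log(0.9), math.log(0.5), math.log(0.1)]
weights = alignment_weights(logprobs)
plain_loss = -sum(logprobs) / len(logprobs)
corrected_loss = weighted_nll(logprobs, weights)
```

In this toy trace the token with probability 0.1 is down-weighted, so the weighted loss is smaller than the plain average. In an actual training loop the weights would presumably be treated as constants (no gradient flowing through them), which is a further assumption of this sketch.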
Source: arXiv: 2605.14071