Masking Teacher and Reinforcing Student for Distilling Vision-Language Models
1️⃣ One-Sentence Summary
This paper proposes a new method called Masters, which progressively masks the non-critical parts of a large model (the teacher) and combines this with reinforcement-learning rewards to compress the knowledge of a large vision-language model into a small model (the student) more stably and efficiently, addressing the poor knowledge-distillation results caused by the large size gap between teacher and student.
Large-scale vision-language models (VLMs) have recently achieved remarkable multimodal understanding, but their massive size makes them impractical for deployment on mobile or edge devices. This raises the need for compact yet capable VLMs that can efficiently learn from powerful large teachers. However, distilling knowledge from a large teacher to a small student remains challenging due to their large size gap: the student often fails to reproduce the teacher's complex, high-dimensional representations, leading to unstable learning and degraded performance. To address this, we propose Masters (Masking Teacher and Reinforcing Student), a mask-progressive reinforcement learning (RL) distillation framework. Masters first masks non-dominant weights of the teacher to reduce unnecessary complexity, then progressively restores the teacher by gradually increasing its capacity during training. This strategy allows the student to learn richer representations from the teacher in a smooth and stable manner. To further refine knowledge transfer, Masters integrates an offline RL stage with two complementary rewards: an accuracy reward that measures the correctness of the generated responses, and a distillation reward that quantifies the ease of transferring responses from teacher to student. Unlike online think-answer RL paradigms that are computationally expensive and generate lengthy responses, our offline RL leverages pre-generated responses from masked teachers. These provide rich yet efficient guidance, enabling students to achieve strong performance without requiring the think-answer process.
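To make the two core ideas in the abstract concrete, the sketch below illustrates (a) progressively restoring a masked teacher by shrinking the masked fraction of its weights over training, and (b) an offline-RL reward that mixes an accuracy term with a distillation term. This is a minimal illustration under assumptions: the abstract does not specify the masking criterion, schedule, or reward weighting, so the magnitude-based masking, the linear schedule, and all names here (`mask_ratio_at`, `mask_nondominant_`, `combined_reward`, `alpha`) are hypothetical, not the authors' implementation.

```python
# Minimal sketch of mask-progressive teacher restoration and a combined
# offline-RL reward, as described at a high level in the abstract.
# All function names, the magnitude criterion, the linear schedule, and the
# mixing weight `alpha` are illustrative assumptions, not the paper's code.
import torch


def mask_ratio_at(step: int, total_steps: int, initial_ratio: float = 0.5) -> float:
    """Linearly decay the fraction of masked teacher weights to zero,
    so the teacher's capacity is progressively restored during training."""
    return initial_ratio * max(0.0, 1.0 - step / total_steps)


@torch.no_grad()
def mask_nondominant_(weight: torch.Tensor, ratio: float) -> torch.Tensor:
    """Zero out the `ratio` fraction of smallest-magnitude entries of a weight
    tensor (one plausible reading of masking 'non-dominant weights')."""
    if ratio <= 0.0:
        return weight
    k = int(weight.numel() * ratio)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    weight.mul_((weight.abs() > threshold).to(weight.dtype))
    return weight


def combined_reward(accuracy_reward: float, distillation_reward: float,
                    alpha: float = 0.5) -> float:
    """Offline-RL reward: correctness of a pre-generated response plus how
    easily it transfers from teacher to student, mixed by a hypothetical alpha."""
    return alpha * accuracy_reward + (1.0 - alpha) * distillation_reward


# Example: the teacher starts half-masked and is fully restored by the end.
w = torch.randn(4, 4)
for step in (0, 500, 1000):
    ratio = mask_ratio_at(step, total_steps=1000)
    zeros = int((mask_nondominant_(w.clone(), ratio) == 0).sum())
    print(f"step={step} ratio={ratio:.2f} masked_entries={zeros}")
```

The pre-generated responses from masked teachers would be scored with something like `combined_reward` offline, avoiding the lengthy think-answer rollouts that the abstract contrasts against.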
Source: arXiv:2512.22238