arXiv submission date: 2025-12-15
📄 Abstract - GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training

Multi-turn reinforcement learning (RL) for multi-modal agents built upon vision-language models (VLMs) is hampered by sparse rewards and long-horizon credit assignment. Recent methods densify the reward by querying a teacher that provides step-level feedback, e.g., Guided Thought Reinforcement (GTR) and On-Policy Distillation, but rely on costly, often privileged models as the teacher, limiting practicality and reproducibility. We introduce GTR-Turbo, a highly efficient upgrade to GTR, which matches the performance without training or querying an expensive teacher model. Specifically, GTR-Turbo merges the weights of checkpoints produced during the ongoing RL training, and then uses this merged model as a "free" teacher to guide the subsequent RL via supervised fine-tuning or soft logit distillation. This design removes dependence on privileged VLMs (e.g., GPT or Gemini), mitigates the "entropy collapse" observed in prior work, and keeps training stable. Across diverse visual agentic tasks, GTR-Turbo improves the accuracy of the baseline model by 10-30% while reducing wall-clock training time by 50% and compute cost by 60% relative to GTR.
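The abstract describes the core mechanism: average the weights of checkpoints saved during the ongoing RL run, then use that merged model as a teacher via supervised fine-tuning or soft logit distillation. Below is a minimal sketch of those two ingredients, assuming uniform weight averaging and a standard temperature-scaled KL distillation loss; the function names (`merge_checkpoints`, `distill_loss`) are illustrative and not taken from the paper's code.

```python
# Minimal sketch (not the authors' implementation): build a "free" teacher by
# averaging RL checkpoints, then distill its soft logits into the student policy.
import torch
import torch.nn.functional as F


def merge_checkpoints(state_dicts):
    """Uniformly average the parameters of several training checkpoints."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged


def distill_loss(student_logits, teacher_logits, temperature=1.0):
    """Soft-logit distillation: KL(teacher || student) over token distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)
```

In practice the merged teacher would be refreshed periodically as new checkpoints arrive, and the distillation term would be added to the RL objective as a dense, step-level signal; the exact schedule and weighting are design choices not specified in the abstract.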

Top tags: agents, model training, multi-modal
Detailed tags: vision-language models, reinforcement learning, knowledge distillation, checkpoint merging, agent training

GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training


1️⃣ One-Sentence Summary

This paper proposes GTR-Turbo, an efficient training method that merges model checkpoints produced during training into a "free" teacher model, substantially improving the performance of visual agents without relying on expensive external models, while greatly reducing training time and compute cost.

Source: arXiv: 2512.13043