📄 Paper Summary
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
1️⃣ One-Sentence Summary
This paper proposes a new method, MobileVLA-R1, which combines chain-of-thought data with reinforcement learning to substantially improve the stability and generalization of quadruped robots executing continuous actions from language instructions.
2️⃣ Abstract
Grounding natural-language instructions into continuous control for quadruped robots remains a fundamental challenge in vision-language-action (VLA) research. Existing methods struggle to bridge high-level semantic reasoning and low-level actuation, leading to unstable grounding and weak generalization in the real world. To address these issues, we present MobileVLA-R1, a unified vision-language-action framework that enables explicit reasoning and continuous control for quadruped robots. We construct MobileVLA-CoT, a large-scale dataset of multi-granularity chain-of-thought (CoT) annotations for embodied trajectories, providing structured reasoning supervision for alignment. Building on this foundation, we introduce a two-stage training paradigm that combines supervised CoT alignment with GRPO reinforcement learning to enhance reasoning consistency, control stability, and long-horizon execution. Extensive evaluations on VLN and VLA tasks demonstrate superior performance over strong baselines, with an improvement of approximately 5%. Real-world deployment on a quadruped robot validates robust performance in complex environments. Code: this https URL. Website: this https URL.
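The abstract names GRPO (Group Relative Policy Optimization) as the reinforcement-learning stage but does not spell out the objective here. Below is a minimal sketch of the standard GRPO formulation as commonly described in the literature: advantages are computed by normalizing each rollout's reward against the other rollouts in its group (no learned critic), then plugged into a PPO-style clipped surrogate. All function and variable names are illustrative assumptions, not taken from the paper.

```python
import torch


def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages (illustrative, standard GRPO recipe).

    rewards: shape (num_groups, group_size) -- one group of sampled
    rollouts per prompt/instruction. Each reward is normalized by the
    mean and std of its own group, so no value network is needed.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)


def grpo_loss(
    logp_new: torch.Tensor,   # log-probs of rollouts under current policy
    logp_old: torch.Tensor,   # log-probs under the policy that sampled them
    advantages: torch.Tensor, # output of grpo_advantages, same shape
    clip_eps: float = 0.2,
) -> torch.Tensor:
    """PPO-style clipped surrogate using group-relative advantages."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()


if __name__ == "__main__":
    # Toy example: 2 prompts, 4 sampled rollouts each.
    rewards = torch.tensor([[1.0, 0.2, 0.5, 0.0],
                            [0.3, 0.3, 0.9, 0.1]])
    adv = grpo_advantages(rewards)
    logp_old = torch.randn(2, 4)
    logp_new = logp_old + 0.05 * torch.randn(2, 4)
    print(grpo_loss(logp_new, logp_old, adv))
```

In the paper's two-stage paradigm, a reward signal of this form would be applied after supervised CoT alignment; the specific reward design and rollout grouping for quadruped control are detailed in the paper itself, not in this sketch.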