📄 Paper Summary
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
1️⃣ One-Sentence Summary
This paper proposes a new method, MobileVLA-R1, which combines chain-of-thought data with reinforcement learning to substantially improve the stability and generalization of quadruped robots executing continuous actions from language instructions.
2️⃣ Abstract
Grounding natural-language instructions into continuous control for quadruped robots remains a fundamental challenge in vision-language-action (VLA) research. Existing methods struggle to bridge high-level semantic reasoning and low-level actuation, leading to unstable grounding and weak generalization in the real world. To address these issues, we present MobileVLA-R1, a unified vision-language-action framework that enables explicit reasoning and continuous control for quadruped robots. We construct MobileVLA-CoT, a large-scale dataset of multi-granularity chain-of-thought (CoT) annotations for embodied trajectories, providing structured reasoning supervision for alignment. Building on this foundation, we introduce a two-stage training paradigm that combines supervised CoT alignment with GRPO reinforcement learning to enhance reasoning consistency, control stability, and long-horizon execution. Extensive evaluations on VLN and VLA tasks demonstrate superior performance over strong baselines, with an improvement of approximately 5%. Real-world deployment on a quadruped robot validates robust performance in complex environments. Code: this https URL. Website: this https URL.
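The abstract names GRPO (Group Relative Policy Optimization) as the reinforcement-learning stage but does not spell out the objective here. Below is a minimal sketch of the standard GRPO formulation as commonly described in the literature: advantages are computed by normalizing each rollout's reward against the other rollouts in its group (no learned critic), then plugged into a PPO-style clipped surrogate. All function and variable names are illustrative assumptions, not taken from the paper.

```python
import torch


def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages (illustrative, standard GRPO recipe).

    rewards: shape (num_groups, group_size) -- one group of sampled
    rollouts per prompt/instruction. Each reward is normalized by the
    mean and std of its own group, so no value network is needed.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)


def grpo_loss(
    logp_new: torch.Tensor,   # log-probs of rollouts under current policy
    logp_old: torch.Tensor,   # log-probs under the policy that sampled them
    advantages: torch.Tensor, # output of grpo_advantages, same shape
    clip_eps: float = 0.2,
) -> torch.Tensor:
    """PPO-style clipped surrogate using group-relative advantages."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()


if __name__ == "__main__":
    # Toy example: 2 prompts, 4 sampled rollouts each.
    rewards = torch.tensor([[1.0, 0.2, 0.5, 0.0],
                            [0.3, 0.3, 0.9, 0.1]])
    adv = grpo_advantages(rewards)
    logp_old = torch.randn(2, 4)
    logp_new = logp_old + 0.05 * torch.randn(2, 4)
    print(grpo_loss(logp_new, logp_old, adv))
```

In the paper's two-stage paradigm, a reward signal of this form would be applied after supervised CoT alignment; the specific reward design and rollout grouping for quadruped control are detailed in the paper itself, not in this sketch.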