Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
1️⃣ One-Sentence Summary
This paper presents Being-H0.5, a general-purpose robot foundation model that treats human manipulation data as a "universal language" for training, allowing robots with different embodiments and capabilities to learn skills from one another and achieving strong cross-platform adaptation and task execution on a range of real and simulated robot platforms.
We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms. While existing VLAs often struggle with morphological heterogeneity and data scarcity, we propose a human-centric learning paradigm that treats human interaction traces as a universal "mother tongue" for physical interaction. To support this, we present UniHand-2.0, the largest embodied pre-training recipe to date, comprising over 35,000 hours of multimodal data across 30 distinct robotic embodiments. Our approach introduces a Unified Action Space that maps heterogeneous robot controls into semantically aligned slots, enabling low-resource robots to bootstrap skills from human data and high-resource platforms. Built upon this human-centric foundation, we design a unified sequential modeling and multi-task pre-training paradigm to bridge human demonstrations and robotic execution. Architecturally, Being-H0.5 utilizes a Mixture-of-Transformers design featuring a novel Mixture-of-Flow (MoF) framework to decouple shared motor primitives from specialized embodiment-specific experts. Finally, to make cross-embodiment policies stable in the real world, we introduce Manifold-Preserving Gating for robustness under sensory shift and Universal Async Chunking to universalize chunked control across embodiments with different latency and control profiles. We empirically demonstrate that Being-H0.5 achieves state-of-the-art results on simulated benchmarks, such as LIBERO (98.9%) and RoboCasa (53.9%), while also exhibiting strong cross-embodiment capabilities on five robotic platforms.
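To make the abstract's "Unified Action Space" idea more concrete, here is a minimal sketch of how heterogeneous robot controls might be mapped into semantically aligned slots so that human hand traces and different robot action vectors share one representation. The slot layout, embodiment names, dimensions, and routing below are illustrative assumptions for exposition, not the paper's actual specification.

```python
# Hypothetical illustration of a unified action space: each embodiment's
# native action vector is routed into a fixed set of semantically aligned
# slots (zero-padded where a slot does not apply). All names and dims here
# are assumptions, not Being-H0.5's real layout.
import numpy as np

# Fixed slot layout shared by every embodiment (assumed for illustration).
SLOTS = {
    "eef_position": 3,   # end-effector / wrist translation
    "eef_rotation": 3,   # end-effector / wrist orientation (axis-angle)
    "gripper": 1,        # parallel-gripper opening or grasp aperture
    "hand_joints": 21,   # dexterous-hand joints (zero-padded if absent)
}
UNIFIED_DIM = sum(SLOTS.values())  # 28 in this sketch

# Per-embodiment routing: which native action dims feed which slot.
EMBODIMENT_LAYOUTS = {
    "franka_gripper": {"eef_position": slice(0, 3),
                       "eef_rotation": slice(3, 6),
                       "gripper": slice(6, 7)},
    "human_hand_mocap": {"eef_position": slice(0, 3),
                         "eef_rotation": slice(3, 6),
                         "hand_joints": slice(6, 27)},
}

def to_unified(action: np.ndarray, embodiment: str) -> np.ndarray:
    """Place a native action vector into the shared slot layout."""
    unified = np.zeros(UNIFIED_DIM, dtype=np.float32)
    offsets, offset = {}, 0
    for name, dim in SLOTS.items():
        offsets[name] = slice(offset, offset + dim)
        offset += dim
    for slot, native in EMBODIMENT_LAYOUTS[embodiment].items():
        unified[offsets[slot]] = action[native]
    return unified

# A 7-DoF gripper action and a 27-DoF human hand trace now live in the
# same 28-dim space, so a single policy head can consume both.
print(to_unified(np.arange(7, dtype=np.float32), "franka_gripper").shape)
print(to_unified(np.arange(27, dtype=np.float32), "human_hand_mocap").shape)
```

Under this kind of alignment, a low-resource robot can reuse gradients learned from human data or from better-covered platforms because the shared slots carry the same semantics across embodiments, which is the bootstrapping effect the abstract describes.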
Source: arXiv: 2601.12993