arXiv submission date: 2026-02-12
📄 Abstract - ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation

Embodied navigation has long been fragmented by task-specific architectures. We introduce ABot-N0, a unified Vision-Language-Action (VLA) foundation model that achieves a "Grand Unification" across 5 core tasks: Point-Goal, Object-Goal, Instruction-Following, POI-Goal, and Person-Following. ABot-N0 utilizes a hierarchical "Brain-Action" architecture, pairing an LLM-based Cognitive Brain for semantic reasoning with a Flow Matching-based Action Expert for precise, continuous trajectory generation. To support large-scale learning, we developed the ABot-N0 Data Engine, curating 16.9M expert trajectories and 5.0M reasoning samples across 7,802 high-fidelity 3D scenes (10.7 $\text{km}^2$). ABot-N0 achieves new SOTA performance across 7 benchmarks, significantly outperforming specialized models. Furthermore, our Agentic Navigation System integrates a planner with hierarchical topological memory, enabling robust, long-horizon missions in dynamic real-world environments.
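
The abstract does not say how the Action Expert is implemented, so the following is only a generic illustration of how flow matching samplers typically turn noise into a continuous trajectory: start from a Gaussian sample and integrate a velocity field from t = 0 to t = 1. Everything named here (`toy_velocity`, `sample_trajectory`, the waypoint values) is hypothetical, and the analytic velocity stands in for what would be a learned, observation-conditioned network in a real system.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For the linear path x_t = (1 - t) * x0 + t * x1 with a single known
    target x1, the exact conditional velocity is (x1 - x_t) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_trajectory(target, num_steps=10, seed=0):
    """Euler-integrate dx/dt = v(x, t) from Gaussian noise at t=0 toward t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # initial noise sample
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Illustrative flattened 2D waypoints (x0, y0, x1, y1, x2, y2); made-up values.
target_waypoints = [0.5, 0.0, 1.0, 0.1, 1.5, 0.3]
result = sample_trajectory(target_waypoints)
```

With this toy field the Euler steps follow the straight line from the noise sample to the target, so `result` matches `target_waypoints` up to floating-point error. A trained model would replace `toy_velocity` with a network conditioned on observations and the task goal.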

Top-level tags: robotics, multi-modal, agents
Detailed tags: embodied navigation, vision-language-action, foundation model, trajectory generation, hierarchical architecture

ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation


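1️⃣ One-sentence summary

This paper presents ABot-N0, a unified foundation model that pairs a language model for semantic understanding with a new action model for generating continuous trajectories. It integrates several distinct robot navigation tasks into a single framework and, after training on a large-scale dataset, achieves leading performance on multiple standard benchmarks.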

Source: arXiv:2602.11598