RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots

📄 Abstract

Current approaches to 3D scene graph generation rely on dedicated depth sensors, such as LiDAR or RGB-D cameras, for metric 3D reconstruction. This limits deployment to specialized robotic platforms and excludes settings where only RGB cameras are available, such as fixed external infrastructure. Existing pipelines also typically operate on passively collected observation trajectories, rather than selecting viewpoints based on the partially built scene representation, and therefore fail to effectively exploit the semantic and spatial information encoded within the graph during exploration. This paper presents a fully visual framework for the active, incremental construction of 3D scene graphs from RGB input only, addressing both limitations. The proposed approach unifies perception and planning around a shared structured representation that captures object semantics, 3D geometry, relational context, and information from multiple viewpoints. Because the framework is hardware-agnostic and relies only on RGB observations, it can incorporate inputs from both onboard robot cameras and fixed external cameras within the same representation. Experiments on the Replica dataset show that the RGB-only pipeline achieves F1-score parity with baselines using ground-truth depth. Active exploration experiments on ReplicaCAD further show that semantic-driven viewpoint selection detects more than twice as many objects as a geometric frontier-based baseline under the same exploration budget. Finally, the external-camera setting demonstrates that complementary RGB views can effectively bootstrap the scene graph and improve contextual understanding at no additional exploration cost.
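
The Replica comparison above is reported as an F1-score. As a reminder of what that metric measures, here is a minimal sketch that computes F1 over sets of detected object labels. The function name `f1_score`, the set-of-labels matching rule, and the example labels are illustrative assumptions; the abstract does not specify how detections are matched to ground truth, so this should not be read as the paper's evaluation protocol.

```python
def f1_score(predicted: set[str], ground_truth: set[str]) -> float:
    """Harmonic mean of precision and recall over two label sets.

    Assumption: a prediction counts as correct when the same label
    appears in the ground-truth set (hypothetical matching rule).
    """
    true_positives = len(predicted & ground_truth)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(ground_truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 2 of 3 predictions are correct, 2 of 4 objects are found.
predicted_objects = {"chair", "table", "lamp"}
ground_truth_objects = {"chair", "table", "sofa", "plant"}
score = f1_score(predicted_objects, ground_truth_objects)
```

In this made-up example precision is 2/3 and recall is 1/2, so F1 is 4/7. "Parity" in the abstract means the RGB-only pipeline reaches a comparable value of this metric to baselines that use ground-truth depth.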

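1️⃣ One-Sentence Summary

This paper proposes a method that lets an indoor robot actively explore and incrementally build a 3D scene graph using only ordinary RGB cameras, with no depth sensor required. It unifies perception and planning within a single semantic-geometric structure and selects viewpoints intelligently, so that the robot finds more than twice as many objects in the same exploration time. It can also fuse views from fixed cameras to improve scene understanding.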
