Towards Learning a Generalizable 3D Scene Representation from 2D Observations
1️⃣ One-Sentence Summary
This paper proposes a new generalizable neural radiance field method that predicts 3D occupancy in a global coordinate frame directly from the robot's egocentric 2D images, with no additional training required for new scenes, thereby better supporting robotic tasks such as grasping.
We introduce a Generalizable Neural Radiance Field approach for predicting 3D workspace occupancy from egocentric robot observations. Unlike prior methods operating in camera-centric coordinates, our model constructs occupancy representations in a global workspace frame, making it directly applicable to robotic manipulation. The model integrates flexible source views and generalizes to unseen object arrangements without scene-specific finetuning. We demonstrate the approach on a humanoid robot and evaluate predicted geometry against 3D sensor ground truth. Trained on 40 real scenes, our model achieves a 26 mm reconstruction error, including occluded regions, validating its ability to infer complete 3D occupancy beyond traditional stereo vision methods.
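The core idea the abstract describes (querying occupancy at workspace-frame points by aggregating features from a flexible number of source views) can be sketched as follows. This is a minimal illustration in the style of generalizable NeRF pipelines, not the paper's actual implementation: the projection model, nearest-neighbour feature lookup, mean-pooling fusion, and the `mlp` decoder are all assumptions for clarity.

```python
import numpy as np

def project_points(points_w, K, T_cw):
    """Project workspace-frame 3D points into one camera.
    points_w: (N, 3); K: (3, 3) intrinsics; T_cw: (4, 4) world->camera.
    Returns pixel coordinates (N, 2) and camera-frame depth (N,)."""
    pts_h = np.concatenate([points_w, np.ones((len(points_w), 1))], axis=1)
    pts_c = (T_cw @ pts_h.T).T[:, :3]
    z = pts_c[:, 2]
    uv = (K @ pts_c.T).T[:, :2] / np.clip(z[:, None], 1e-6, None)
    return uv, z

def sample_features(feat_map, uv):
    """Nearest-neighbour lookup in a per-view feature map (H, W, C)."""
    h, w, _ = feat_map.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return feat_map[v, u]

def predict_occupancy(points_w, views, mlp):
    """For each query point, gather features from every source view,
    fuse them, and decode occupancy directly in the workspace frame.
    views: list of (feat_map, K, T_cw) triples; any number of views."""
    per_view = []
    for feat_map, K, T_cw in views:
        uv, z = project_points(points_w, K, T_cw)
        f = sample_features(feat_map, uv)
        f[z <= 0] = 0.0            # points behind the camera carry no signal
        per_view.append(f)
    fused = np.mean(per_view, axis=0)  # order-invariant over source views
    return mlp(fused)                  # (N,) occupancy probabilities
```

Because the fusion is a mean over views, the predictor accepts however many egocentric observations are available, matching the "flexible source views" property; a learned decoder would replace the stand-in `mlp`.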
Source: arXiv: 2602.10943