arXiv submission date: 2026-03-17
📄 Abstract - MessyKitchens: Contact-rich object-level 3D scene reconstruction

Monocular 3D scene reconstruction has recently seen significant progress. Powered by modern neural architectures and large-scale data, recent methods achieve high performance in depth estimation from a single image. Meanwhile, reconstructing and decomposing common scenes into individual 3D objects remains a hard challenge due to the large variety of objects, frequent occlusions, and complex object relations. Notably, beyond shape and pose estimation of individual objects, applications in robotics and animation require physically plausible scene reconstruction where objects obey the physical principles of non-penetration and realistic contact. In this work we advance object-level scene reconstruction along two directions. First, we introduce MessyKitchens, a new dataset of real-world cluttered scenes that provides high-fidelity object-level ground truth in terms of 3D object shapes, poses, and accurate object contacts. Second, we build on the recent SAM 3D approach for single-object reconstruction and extend it with a Multi-Object Decoder (MOD) for joint object-level scene reconstruction. To validate our contributions, we show that MessyKitchens significantly improves over previous datasets in registration accuracy and inter-object penetration. We also compare our multi-object reconstruction approach on three datasets and demonstrate consistent and significant improvements of MOD over the state of the art. Our new benchmark, code and pre-trained models will become publicly available on our project website: this https URL.
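The abstract evaluates reconstructions partly by inter-object penetration. As a rough illustration of what such a metric can look like (this is not the paper's implementation; the function name, sampling count, and use of the trimesh library are assumptions), here is a minimal sketch that estimates how deeply one reconstructed mesh intrudes into another:

```python
# Hypothetical sketch of an inter-object penetration measure between two
# reconstructed meshes. Names and parameters are illustrative only.
import trimesh

def penetration_depth(mesh_a: trimesh.Trimesh, mesh_b: trimesh.Trimesh,
                      n_samples: int = 2000) -> float:
    """Approximate how deeply mesh_b intrudes into mesh_a (in scene units)."""
    # Sample points on the surface of mesh_b.
    points, _ = trimesh.sample.sample_surface(mesh_b, n_samples)
    # Signed distance to mesh_a: positive values indicate points inside mesh_a.
    sdf = trimesh.proximity.signed_distance(mesh_a, points)
    inside = sdf[sdf > 0.0]
    return float(inside.max()) if inside.size else 0.0

# Example usage with two watertight meshes (paths are placeholders):
# depth = penetration_depth(trimesh.load("cup.obj"), trimesh.load("plate.obj"))
```

A scene-level score could aggregate this quantity over all object pairs; the paper's exact definition may differ.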

Top-level tags: computer vision, robotics, data
Detailed tags: 3D scene reconstruction, object-level reconstruction, dataset, contact modeling, monocular depth estimation

MessyKitchens: Contact-rich object-level 3D scene reconstruction


1️⃣ One-sentence summary

By releasing a high-quality dataset of real-world cluttered scenes with accurate object contacts and developing a new method for joint multi-object reconstruction, this paper significantly advances physically plausible, object-level 3D scene reconstruction from a single image.

Source: arXiv: 2603.16868