Every9D-21M: Large-Scale Real-World 9D Canonicalization of Everyday Objects
1️⃣ One-sentence summary
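This work builds a large-scale 9D pose dataset of 21.8 million real-world images spanning 700 everyday object categories. Using multi-view point cloud reconstruction and cross-instance alignment, it needs manual annotation for only a very small number of reference objects, reaches a scale two orders of magnitude beyond the previous largest dataset, and substantially improves the generalization of object pose estimation models.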
Estimating the 9D pose of everyday objects from a single real-world image remains challenging, largely because large-scale supervision is lacking. Most existing datasets either rely heavily on synthetic renderings or provide limited coverage of real-world objects: the largest real-world 9D pose dataset to date contains only 17K annotated objects across 9 categories. We address this gap with Every9D-21M, a dataset of 9D pose annotations for 21.8M real-world images from 109K object-centric videos spanning 700 everyday object categories, two orders of magnitude larger than prior real-world 9D pose benchmarks in both image and category count. To achieve this scale, we leverage object-centric videos by reconstructing object-level point clouds via multi-view geometry and aligning similar instances into a shared canonical coordinate frame. Canonical poses are manually annotated for only a small set of reference objects (fewer than 0.01% of all images) and propagated to the remaining instances via cross-instance alignment. All propagated canonical poses are then verified from multiple viewpoints. We further introduce cross-category orientation rules that induce category-level symmetries, enabling symmetry-aware evaluation. Beyond establishing dedicated training and evaluation splits as a benchmark for 9D pose foundation models, we show that training on Every9D-21M improves performance on ImageNet3D and PASCAL3D+, and generalizes to HANDAL substantially better than training on ImageNet3D. Data and code are available at this https URL.
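The two mechanisms named in the abstract, propagating a canonical pose through a cross-instance alignment and scoring orientation in a symmetry-aware way, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a canonical pose can be written as a 4x4 similarity transform with uniform scale (a simplification of a full 9D pose), that the alignment between two instances is also such a transform, and that a category's symmetries are given as a finite list of rotation matrices. All function names here are hypothetical.

```python
import numpy as np


def rot_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def similarity(scale, R, t):
    """4x4 transform x -> scale * R @ x + t (uniform scale is an assumption)."""
    T = np.eye(4)
    T[:3, :3] = scale * R
    T[:3, 3] = t
    return T


def propagate_canonical_pose(T_canon_to_ref, A_ref_to_inst):
    """Compose canonical->reference with reference->instance.

    The result maps canonical coordinates into the new instance's frame,
    so the reference annotation is reused without labeling the instance.
    """
    return A_ref_to_inst @ T_canon_to_ref


def geodesic_deg(Ra, Rb):
    """Angle in degrees of the relative rotation between Ra and Rb."""
    cos = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return float(np.rad2deg(np.arccos(np.clip(cos, -1.0, 1.0))))


def symmetry_aware_error_deg(R_pred, R_gt, symmetries):
    """Smallest rotation error over all symmetry-equivalent ground truths."""
    return min(geodesic_deg(R_pred, R_gt @ S) for S in symmetries)


# Hypothetical example: an object with a 2-fold symmetry about its z axis.
symmetries = [np.eye(3), rot_z(180)]
R_gt, R_pred = rot_z(30), rot_z(210)
plain = geodesic_deg(R_pred, R_gt)                          # about 180 degrees
aware = symmetry_aware_error_deg(R_pred, R_gt, symmetries)  # about 0 degrees
```

Without the symmetry set, the prediction above would be penalized by roughly 180 degrees even though it is indistinguishable from the ground truth for a 2-fold symmetric object; taking the minimum over symmetry-equivalent orientations removes that spurious penalty.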
Source: arXiv: 2605.28270