DeMo-Pose: Depth-Monocular Modality Fusion for Object Pose Estimation
1️⃣ One-Sentence Summary
This paper proposes a new method, DeMo-Pose, that fuses the semantic information from a monocular RGB camera with the geometric information from a depth camera to achieve CAD-model-free, real-time, and more accurate 3D object pose estimation, significantly outperforming the previous state of the art on standard benchmarks.
Object pose estimation is a fundamental task in 3D vision with applications in robotics, AR/VR, and scene understanding. We address the challenge of category-level 9-DoF pose estimation (6D pose + 3D size) from RGB-D input, without relying on CAD models during inference. Existing depth-only methods achieve strong results but ignore semantic cues from RGB, while many RGB-D fusion models underperform due to suboptimal cross-modal fusion that fails to align semantic RGB cues with 3D geometric representations. We propose DeMo-Pose, a hybrid architecture that fuses monocular semantic features with depth-based graph convolutional representations via a novel multimodal fusion strategy. To further improve geometric reasoning, we introduce a novel Mesh-Point Loss (MPL) that leverages mesh structure during training without adding inference overhead. Our approach achieves real-time inference and significantly improves over state-of-the-art methods across object categories, outperforming the strong GPV-Pose baseline by 3.2% on 3D IoU and 11.1% on pose accuracy on the REAL275 benchmark. The results highlight the effectiveness of depth-RGB fusion and geometry-aware learning, enabling robust category-level 3D pose estimation for real-world applications.
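The paper's fusion module itself is not detailed in the abstract. As a rough, hypothetical illustration of the alignment problem it addresses (pairing per-pixel RGB semantics with 3D geometry), the sketch below back-projects a depth map into camera-space points and concatenates each point with its pixel's RGB feature vector, producing the kind of per-point input a point/graph network could consume. Function names and the fusion-by-concatenation choice are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

def backproject_depth(depth, K):
    """Back-project a depth map (H, W) into camera-space 3D points (H*W, 3)
    using the pinhole intrinsics K (assumed layout: fx, fy on the diagonal,
    principal point cx, cy in the last column)."""
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def fuse_rgb_depth(rgb_feat, depth, K):
    """Naive cross-modal fusion sketch: concatenate each back-projected 3D
    point with the RGB feature at the same pixel, giving (N, 3 + C) features.
    Pixels with no depth reading (z <= 0) are dropped."""
    H, W, C = rgb_feat.shape
    points = backproject_depth(depth, K)    # (H*W, 3) geometric cues
    semantics = rgb_feat.reshape(H * W, C)  # (H*W, C) semantic cues
    valid = points[:, 2] > 0
    return np.concatenate([points, semantics], axis=1)[valid]
```

Because depth and RGB are captured on the same pixel grid, this per-pixel pairing is the simplest way to keep the two modalities spatially aligned; the paper's contribution lies in a learned fusion strategy rather than plain concatenation.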
Source: arXiv: 2603.27533