arXiv submission date: 2026-03-19
📄 Abstract - Generalized Hand-Object Pose Estimation with Occlusion Awareness

Generalized 3D hand-object pose estimation from a single RGB image remains challenging due to the large variations in object appearances and interaction patterns, especially under heavy occlusion. We propose GenHOI, a framework for generalized hand-object pose estimation with occlusion awareness. GenHOI integrates hierarchical semantic knowledge with hand priors to enhance model generalization under challenging occlusion conditions. Specifically, we introduce a hierarchical semantic prompt that encodes object states, hand configurations, and interaction patterns via textual descriptions. This enables the model to learn abstract high-level representations of hand-object interactions for generalization to unseen objects and novel interactions while compensating for missing or ambiguous visual cues. To enable robust occlusion reasoning, we adopt a multi-modal masked modeling strategy over RGB images, predicted point clouds, and textual descriptions. Moreover, we leverage hand priors as stable spatial references to extract implicit interaction constraints. This allows reliable pose inference even under significant variations in object shapes and interaction patterns. Extensive experiments on the challenging DexYCB and HO3Dv2 benchmarks demonstrate that our method achieves state-of-the-art performance in hand-object pose estimation.
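The masked-modeling strategy described above can be illustrated with a minimal sketch. The paper does not publish its masking procedure here, so the function below is a hypothetical, simplified version: it randomly replaces a fraction of tokens from one modality (RGB patches, point-cloud patches, or text tokens) with a mask placeholder, which a model would then be trained to reconstruct from the surviving cross-modal context. All names and the mask ratio are illustrative assumptions, not values from the paper.

```python
import random

def mask_tokens(tokens, mask_ratio=0.4, mask_value="[MASK]", seed=0):
    """Randomly mask a fraction of a token sequence (illustrative only).

    In masked modeling, the training objective is to reconstruct the
    masked entries from the remaining context; with multiple modalities,
    occluded cues in one stream can be recovered from the others.
    """
    rng = random.Random(seed)
    n_mask = int(round(len(tokens) * mask_ratio))
    masked_idx = set(rng.sample(range(len(tokens)), n_mask))
    out = [mask_value if i in masked_idx else t for i, t in enumerate(tokens)]
    return out, masked_idx

# Hypothetical token streams for the three modalities used by GenHOI.
rgb_patches = [f"rgb_{i}" for i in range(16)]
pc_patches = [f"pc_{i}" for i in range(16)]
text_tokens = "grasping a mug by the handle".split()

masked_rgb, _ = mask_tokens(rgb_patches)
masked_pc, _ = mask_tokens(pc_patches)
masked_txt, _ = mask_tokens(text_tokens)
```

In a real pipeline the masked sequences from all three modalities would be concatenated and fed to a transformer whose reconstruction loss is computed only at the masked positions.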

Top-level tags: computer vision, multi-modal, model training
Detailed tags: pose estimation, hand-object interaction, occlusion reasoning, semantic prompting, 3D reconstruction

Generalized Hand-Object Pose Estimation with Occlusion Awareness


1️⃣ One-sentence summary

This paper proposes a new framework named GenHOI, which combines text-based semantic prompts with hand priors to estimate the 3D poses of occluded hands and objects more accurately from a single RGB image, and it performs well even on unseen objects and novel interaction patterns.

Source: arXiv 2603.19013