EgoPoseFormer v2: Accurate Egocentric Human Motion Estimation for AR/VR
1️⃣ One-sentence summary
This paper presents EgoPoseFormer v2, a new method that combines a transformer-based model with an auto-labeling system to significantly improve the accuracy and smoothness of egocentric human pose estimation for AR/VR applications.
Egocentric human motion estimation is essential for AR/VR experiences, yet remains challenging due to limited body coverage from the egocentric viewpoint, frequent occlusions, and scarce labeled data. We present EgoPoseFormer v2, a method that addresses these challenges through two key contributions: (1) a transformer-based model for temporally consistent and spatially grounded body pose estimation, and (2) an auto-labeling system that enables the use of large unlabeled datasets for training. Our model is fully differentiable; it introduces identity-conditioned queries, multi-view spatial refinement, and causal temporal attention, and it supports both keypoint and parametric body representations under a constant compute budget. The auto-labeling system scales learning to tens of millions of unlabeled frames via uncertainty-aware semi-supervised training. It follows a teacher-student schema to generate pseudo-labels and guide training with uncertainty distillation, enabling the model to generalize to different environments. On the EgoBody3M benchmark, with a 0.8 ms latency on GPU, our model outperforms two state-of-the-art methods by 12.2% and 19.4% in accuracy, and reduces temporal jitter by 22.2% and 51.7%. Furthermore, our auto-labeling system improves wrist MPJPE by 13.1%.
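The causal temporal attention mentioned in the abstract restricts each frame to attend only to current and past frames, which is what allows the model to run online at low latency without waiting for future observations. Below is a minimal, dependency-free sketch of single-head causal attention over a frame sequence; the function name, the single-head setup, and the plain-list tensor representation are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def causal_temporal_attention(queries, keys, values):
    """Single-head scaled dot-product attention over a time sequence.

    Frame t may only attend to frames s <= t (causal mask), so the
    output for each frame depends on past and current input only.
    queries/keys/values: lists of equal-length float vectors, one per frame.
    """
    d = len(queries[0])
    out = []
    for t, q in enumerate(queries):
        # Scores against past and current frames only (the causal mask).
        scores = [sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        ctx = [sum(w * values[s][j] for s, w in enumerate(weights))
               for j in range(d)]
        out.append(ctx)
    return out
```

Because frame 0 can only see itself, its output is exactly `values[0]`; later frames blend progressively more history. In practice this masking is implemented inside a transformer layer rather than as an explicit loop.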
Source: arXiv: 2603.04090