arXiv submission date: 2026-04-29
📄 Abstract - SnapPose3D: Diffusion-Based Single-Frame 2D-to-3D Lifting of Human Poses

Depth ambiguity and joint uncertainty are the two main obstacles to accurate human pose prediction with the 2D-to-3D lifting methods proposed in the literature. Both issues arise because a 2D joint location can map to multiple 3D positions, which yields multiple plausible final poses. Motivated by this, we propose leveraging the generative capability of diffusion-based models to predict multiple hypotheses and aggregate them into a final accurate pose. To this end, we introduce SnapPose3D, a pose-lifting framework trained deterministically to denoise 3D poses conditioned on both visual context and 2D pose features. At inference, SnapPose3D adopts a probabilistic approach, generating multiple hypotheses through random sampling from a unit Gaussian distribution. Unlike most previous methods, which address pose ambiguity by processing temporal sequences, SnapPose3D takes single frames as input, which avoids tracking, limits computational cost and data-acquisition complexity, and meets the need for online, real-time applications. We extensively evaluate SnapPose3D on well-known benchmarks for 3D human pose estimation, showing its ability to generate and aggregate accurate hypotheses that lead to state-of-the-art results.
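
The sample-then-aggregate inference described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: `toy_denoiser` is a hypothetical hand-written stand-in for the trained network (it ignores visual context entirely), the 17-joint layout and step count are arbitrary, and averaging is just one simple aggregation choice, since the abstract does not say how hypotheses are combined.

```python
import numpy as np


def toy_denoiser(noisy_poses, pose_2d, step, num_steps):
    """Hypothetical stand-in for a trained, conditioned denoiser.

    Pulls x, y toward the 2D keypoints and shrinks the depth noise a
    little at every step. A real model would be a learned network.
    """
    blend = (step + 1) / num_steps
    out = noisy_poses.copy()
    out[..., :2] = (1 - blend) * noisy_poses[..., :2] + blend * pose_2d
    out[..., 2] = noisy_poses[..., 2] * (1 - 0.5 * blend)
    return out


def sample_hypotheses(pose_2d, num_hypotheses=8, num_steps=4, seed=0):
    """Draw several 3D pose hypotheses, each starting from unit Gaussian noise."""
    rng = np.random.default_rng(seed)
    num_joints = pose_2d.shape[0]
    poses = rng.standard_normal((num_hypotheses, num_joints, 3))
    for step in range(num_steps):
        poses = toy_denoiser(poses, pose_2d, step, num_steps)
    return poses


def aggregate(hypotheses):
    """Assumed aggregation rule: per-joint mean over all hypotheses."""
    return hypotheses.mean(axis=0)


# Usage: 17 joints with made-up 2D coordinates from a single frame.
pose_2d = np.linspace(-1.0, 1.0, 34).reshape(17, 2)
hypotheses = sample_hypotheses(pose_2d)
final_pose = aggregate(hypotheses)
print(hypotheses.shape, final_pose.shape)
```

In this toy setup all hypotheses agree on x and y but differ in depth, which mirrors the depth ambiguity the abstract describes: the 2D evidence constrains the image-plane coordinates, while the depth of each joint remains uncertain until the hypotheses are combined.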

Top-level tags: computer vision, machine learning
Detailed tags: 3d human pose estimation, diffusion models, 2d-to-3d lifting, single-frame, pose ambiguity

SnapPose3D: Diffusion-Based Single-Frame 2D-to-3D Lifting of Human Poses


1️⃣ One-Sentence Summary

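This paper proposes SnapPose3D, a new method that uses a diffusion model to start from the 2D human pose in a single image, generate multiple plausible 3D pose hypotheses, and aggregate them into a final accurate result. It addresses the inaccurate pose predictions that depth ambiguity and joint uncertainty cause in conventional methods, and because it does not rely on video sequences, it lowers computation and data-acquisition costs.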

Source: arXiv:2604.26620