arXiv submission date: 2026-04-07
📄 Abstract - Human Interaction-Aware 3D Reconstruction from a Single Image

Reconstructing textured 3D human models from a single image is fundamental for AR/VR and digital human applications. However, existing methods mostly focus on single individuals and thus fail in multi-human scenes, where naive composition of individual reconstructions often leads to artifacts such as unrealistic overlaps, missing geometry in occluded regions, and distorted interactions. These limitations highlight the need for approaches that incorporate group-level context and interaction priors. We introduce a holistic method that explicitly models both group- and instance-level information. To mitigate perspective-induced geometric distortions, we first transform the input into a canonical orthographic space. Our primary component, Human Group-Instance Multi-View Diffusion (HUG-MVD), then generates complete multi-view normals and images by jointly modeling individuals and group context to resolve occlusions and proximity. Subsequently, the Human Group-Instance Geometric Reconstruction (HUG-GR) module optimizes the geometry by leveraging explicit, physics-based interaction priors to enforce physical plausibility and accurately model inter-human contact. Finally, the multi-view images are fused into a high-fidelity texture. Together, these components form our complete framework, HUG3D. Extensive experiments show that HUG3D significantly outperforms both single-human and existing multi-human methods, producing physically plausible, high-fidelity 3D reconstructions of interacting people from a single image. Project page: this https URL
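The abstract mentions that the HUG-GR module uses explicit, physics-based interaction priors to prevent implausible contact between people, but gives no equations. As a rough, hypothetical illustration (not the paper's actual formulation), one common form of such a prior is a non-penetration penalty: approximate one person's body by a set of spheres and penalize any surface point of the other person that falls inside them.

```python
import numpy as np

def penetration_penalty(points_a, centers_b, radii_b):
    """Toy non-penetration prior (illustrative, not from the paper).

    points_a  : (N, 3) surface points of person A
    centers_b : (M, 3) sphere centers approximating person B's body
    radii_b   : (M,)   corresponding sphere radii
    Returns the sum of squared penetration depths; zero when the
    two bodies do not interpenetrate.
    """
    # Pairwise distances from each point of A to each sphere center of B: (N, M)
    d = np.linalg.norm(points_a[:, None, :] - centers_b[None, :, :], axis=-1)
    # Penetration depth: positive only where a point lies inside a sphere.
    depth = np.maximum(radii_b[None, :] - d, 0.0)
    # Squared-hinge penalty, differentiable almost everywhere,
    # so it can be minimized during geometry optimization.
    return float(np.sum(depth ** 2))
```

A penalty like this would be added to the reconstruction loss so that gradient-based optimization pushes overlapping geometry apart while leaving non-contacting regions untouched.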

Top-level tags: computer vision, multi-modal, model training
Detailed tags: 3d reconstruction, human interaction, multi-view diffusion, physics-based priors, textured models

Human Interaction-Aware 3D Reconstruction from a Single Image


1️⃣ One-Sentence Summary

This paper proposes a new method called HUG3D that reconstructs high-quality 3D models of multiple interacting people from a single ordinary photograph, addressing the unrealistic overlaps, missing occluded geometry, and distorted interaction poses that previous methods produce in multi-person scenes.

Source: arXiv 2604.05436