Abstract - DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies
Human mesh recovery from multi-view images faces a fundamental challenge: real-world datasets contain imperfect ground-truth annotations that bias model training, while synthetic data with precise supervision suffers from a domain gap. In this paper, we propose DiffProxy, a novel framework that generates multi-view consistent human proxies for mesh recovery. Central to DiffProxy is the use of diffusion-based generative priors to bridge synthetic training and real-world generalization. Its key innovations include: (1) a multi-conditional mechanism for generating multi-view consistent, pixel-aligned human proxies; (2) a hand refinement module that incorporates flexible visual prompts to enhance local details; and (3) an uncertainty-aware test-time scaling method that increases robustness to challenging cases during optimization. These designs ensure that the mesh recovery process benefits from both the precise synthetic ground truth and the generative advantages of the diffusion-based pipeline. Trained entirely on synthetic data, DiffProxy achieves state-of-the-art performance across five real-world benchmarks, demonstrating strong zero-shot generalization, particularly in challenging scenarios with occlusions and partial views. Project page: this https URL
DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies
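1️⃣ One-sentence summary
This paper proposes DiffProxy, a new framework that uses the generative capability of diffusion models to create multi-view consistent human proxies, combining the precise annotations of synthetic data with the need to generalize to real data; trained only on synthetic data, it significantly improves the accuracy and robustness of 3D human reconstruction in complex real-world scenes such as occlusions and partial views.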