AnyLift: Scaling Motion Reconstruction from Internet Videos via 2D Diffusion
1️⃣ One-Sentence Summary
This paper proposes a two-stage framework that leverages 2D diffusion models to reconstruct 3D human motion and human-object interactions from Internet videos. It is particularly effective for complex motions underrepresented in conventional motion-capture data (e.g., gymnastics) and for interaction behaviors in natural scenes, substantially improving the realism and global consistency of the reconstructed results.
Reconstructing 3D human motion and human-object interactions (HOI) from Internet videos is a fundamental step toward building large-scale datasets of human behavior. Existing methods struggle to recover globally consistent 3D motion under dynamic cameras, especially for motion types underrepresented in current motion-capture datasets, and face additional difficulty recovering coherent human-object interactions in 3D. We introduce a two-stage framework leveraging 2D diffusion that reconstructs 3D human motion and HOI from Internet videos. In the first stage, we synthesize multi-view 2D motion data for each domain, leveraging 2D keypoints extracted from Internet videos to incorporate human motions that rarely appear in existing MoCap datasets. In the second stage, a camera-conditioned multi-view 2D motion diffusion model is trained on the domain-specific synthetic data to recover 3D human motion and 3D HOI in the world space. We demonstrate the effectiveness of our method on Internet videos featuring challenging motions such as gymnastics, as well as in-the-wild HOI videos, and show that it outperforms prior work in producing realistic human motion and human-object interaction.
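The second stage recovers world-space 3D motion from camera-conditioned multi-view 2D motion. The paper's actual lifting mechanism is the diffusion model itself, but the underlying geometric principle — recovering a 3D point from its 2D observations in several calibrated views — can be illustrated with standard direct linear transform (DLT) triangulation. The sketch below is an assumption-laden toy illustration of that principle, not the authors' method; the function name `triangulate` and the two-camera setup are hypothetical.

```python
import numpy as np

def triangulate(points_2d, projections):
    """Recover one 3D point (world space) from multi-view 2D observations.

    points_2d:   iterable of (x, y) pixel/normalized coordinates, one per view
    projections: iterable of 3x4 camera projection matrices, one per view

    Uses the standard DLT formulation: each view contributes two linear
    constraints on the homogeneous 3D point; the solution is the right
    singular vector of the stacked constraint matrix.
    """
    A = []
    for (x, y), P in zip(points_2d, projections):
        A.append(x * P[2] - P[0])  # constraint from the x-coordinate
        A.append(y * P[2] - P[1])  # constraint from the y-coordinate
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                     # null-space direction = homogeneous point
    return X[:3] / X[3]            # dehomogenize to world coordinates
```

Applying this per joint and per frame to consistent multi-view 2D motion yields globally consistent 3D trajectories, which is why the diffusion model is trained to produce camera-conditioned multi-view 2D motion in the first place.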
Source: arXiv: 2604.17818