TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos

📄 Abstract - TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos

Reconstructing humans and their surrounding environments in a globally consistent 4D space is essential for comprehensive perception. However, prior work typically assumes single-view inputs or decouples humans, scenes, and cameras, and so cannot recover coherent geometry, stable motion, and physically aligned trajectories. These limitations motivate us to introduce a new task: unified human-scene-camera reconstruction from multi-view videos, which aims to jointly estimate dynamic humans, static scenes, and camera poses in one global coordinate frame. We propose TROPHIES (Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos), a unified framework tailored for this task. TROPHIES features a Human Branch that models humans through temporal and spatial reasoning, and a Scene Branch that reconstructs static geometry with human-aware attention. A global alignment and optimization module couples both branches by enforcing scale consistency, contact priors, and cross-view temporal coherence. Experiments on EgoHuman and EgoExo4D demonstrate that TROPHIES achieves globally aligned, physically plausible 4D reconstructions and consistently outperforms existing paradigms in both global fidelity and human-scene consistency.


1️⃣ One-sentence summary

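This paper proposes TROPHIES, a unified framework that jointly reconstructs dynamic humans, static scenes, and camera motion from multi-view videos. It addresses the lack of coordination among human, scene, and camera reconstruction in earlier methods, and produces 4D (3D space plus time) models that are physically consistent and temporally coherent in a global coordinate frame.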
