arXiv submission date: 2026-06-01
📄 Abstract - TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos

Reconstructing humans and their surrounding environments in a globally consistent 4D space is essential for comprehensive perception. However, prior works typically assume single-view inputs or decouple humans, scenes, and cameras, making them unable to recover coherent geometry, stable motion, and physically aligned trajectories. These limitations motivate us to introduce a new task: unified human-scene-camera reconstruction from multi-view videos, which aims to jointly estimate dynamic humans, static scenes, and camera poses in one global coordinate frame. We propose TROPHIES (Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos), a unified framework tailored for this task. TROPHIES features a Human Branch that models humans through temporal and spatial reasoning, and a Scene Branch that reconstructs static geometry with human-aware attention. A global alignment and optimization module couples both branches by enforcing scale consistency, contact priors, and cross-view temporal coherence. Experiments on EgoHuman and EgoExo4D demonstrate that TROPHIES achieves globally aligned, physically plausible 4D reconstructions and consistently outperforms existing paradigms in both global fidelity and human-scene consistency.

Top-level tags: computer vision, multi-modal, model training
Detailed tags: 4D reconstruction, human-scene-camera, multi-view video, temporal coherence, spatial reasoning

TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos


1️⃣ One-sentence summary

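This paper proposes TROPHIES, a unified framework that jointly reconstructs dynamic humans, static scenes, and camera motion from multi-view videos. It addresses the lack of coordination among human, scene, and camera reconstruction in earlier methods, and produces 4D (3D space plus time) models that are physically consistent and temporally coherent in a global coordinate frame.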

Source: arXiv: 2606.02350