MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer
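1️⃣ One-sentence summary
This paper proposes MoRe, an efficient feed-forward neural network that quickly reconstructs dynamic 3D scenes from monocular video. Its core idea is an attention mechanism that separates dynamic objects from the static background, which addresses the inaccurate camera pose estimation that moving objects cause in conventional methods, while keeping reconstruction fast and high in quality.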
Reconstructing dynamic 4D scenes remains challenging due to the presence of moving objects that corrupt camera pose estimation. Existing optimization methods alleviate this issue with additional supervision, but they are mostly computationally expensive and impractical for real-time applications. To address these limitations, we propose MoRe, a feed-forward 4D reconstruction network that efficiently recovers dynamic 3D scenes from monocular videos. Built upon a strong static reconstruction backbone, MoRe employs an attention-forcing strategy to disentangle dynamic motion from static structure. To further enhance robustness, we fine-tune the model on large-scale, diverse datasets encompassing both dynamic and static scenes. Moreover, our grouped causal attention captures temporal dependencies and adapts to varying token lengths across frames, ensuring temporally coherent geometry reconstruction. Extensive experiments on multiple benchmarks demonstrate that MoRe achieves high-quality dynamic reconstructions with exceptional efficiency.
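The abstract names grouped causal attention but does not define it here. As a minimal sketch of one plausible reading, the snippet below treats each frame as a group of tokens, lets a token attend to every token in its own frame and in earlier frames, and allows a different token count per frame. The function names `grouped_causal_mask` and `masked_attention` are hypothetical, and nothing here should be taken as MoRe's actual implementation.

```python
import math


def grouped_causal_mask(tokens_per_frame):
    """Boolean mask: mask[q][k] is True if query token q may attend to key token k.

    Assumption (not from the paper): tokens are grouped by frame, and a token
    sees its own frame plus all earlier frames. Frames may differ in length.
    """
    # Frame index of every token, e.g. [2, 3, 1] -> [0, 0, 1, 1, 1, 2]
    frame_ids = [f for f, n in enumerate(tokens_per_frame) for _ in range(n)]
    return [[kf <= qf for kf in frame_ids] for qf in frame_ids]


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention over lists of vectors, honoring the mask."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi, row in zip(q, mask):
        # Disallowed positions get -inf so they receive zero weight after softmax.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) if ok else -math.inf
                  for kj, ok in zip(k, row)]
        top = max(scores)  # finite: every token can at least see its own frame
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) / total
                    for d in range(len(v[0]))])
    return out


# Three frames with 2, 3, and 1 tokens respectively.
for row in grouped_causal_mask([2, 3, 1]):
    print("".join("1" if ok else "0" for ok in row))
# prints:
# 110000
# 110000
# 111110
# 111110
# 111110
# 111111
```

Because the mask is built from per-frame token counts rather than a fixed block size, the same code handles frames that contribute different numbers of tokens, which is the property the abstract highlights.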
Source: arXiv: 2603.05078