Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
1️⃣ One-Sentence Summary
This paper proposes D4RT, a new feedforward model that uses a unified Transformer architecture to jointly and efficiently infer depth, spatio-temporal correspondence, and full camera parameters from a single video, achieving state-of-the-art performance on 4D reconstruction of dynamic scenes.
Understanding and reconstructing the complex geometry and motion of dynamic scenes from video remains a formidable challenge in computer vision. This paper introduces D4RT, a simple yet powerful feedforward model designed to efficiently solve this task. D4RT utilizes a unified transformer architecture to jointly infer depth, spatio-temporal correspondence, and full camera parameters from a single video. Its core innovation is a novel querying mechanism that sidesteps the heavy computation of dense, per-frame decoding and the complexity of managing multiple, task-specific decoders. Our decoding interface allows the model to independently and flexibly probe the 3D position of any point in space and time. The result is a lightweight and highly scalable method that enables remarkably efficient training and inference. We demonstrate that our approach sets a new state of the art, outperforming previous methods across a wide spectrum of 4D reconstruction tasks. We refer to the project webpage for animated results: this https URL.
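To make the querying idea concrete, below is a minimal, hypothetical PyTorch sketch of a point-query decoding interface: a small set of (pixel, source-time, target-time) queries cross-attends against video tokens from a shared encoder, so each point can be decoded independently rather than densely per frame. All module names, dimensions, and the query layout are illustrative assumptions, not the paper's actual D4RT implementation.

```python
# A minimal, hypothetical sketch of a query-based 4D decoding interface.
# The query layout (u, v, t_src, t_tgt), head structure, and dimensions are
# assumptions for illustration; they are not taken from the D4RT paper.
import torch
import torch.nn as nn


class PointQueryDecoder(nn.Module):
    """Cross-attends a small set of point queries against video tokens,
    so each query is decoded independently of every other pixel."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # Embed a query (u, v, t_src, t_tgt) into the token dimension.
        self.query_embed = nn.Sequential(
            nn.Linear(4, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Per-query outputs: a 3D point position and a depth value.
        self.point_head = nn.Linear(dim, 3)
        self.depth_head = nn.Linear(dim, 1)

    def forward(self, video_tokens: torch.Tensor, queries: torch.Tensor):
        # video_tokens: (B, N_tokens, dim) from a shared video encoder.
        # queries:      (B, N_queries, 4), normalized (u, v, t_src, t_tgt).
        q = self.query_embed(queries)
        attended, _ = self.cross_attn(q, video_tokens, video_tokens)
        return self.point_head(attended), self.depth_head(attended)


if __name__ == "__main__":
    decoder = PointQueryDecoder()
    tokens = torch.randn(2, 1024, 256)   # placeholder encoded video tokens
    queries = torch.rand(2, 16, 4)       # 16 arbitrary space-time probes
    points, depth = decoder(tokens, queries)
    print(points.shape, depth.shape)     # (2, 16, 3) (2, 16, 1)
```

The key property this sketch illustrates is that the number of decoded outputs scales with the number of queries, not with the number of pixels times frames, which is what makes the per-point probing cheap relative to dense per-frame decoding.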
Source: arXiv: 2512.08924