📄 Abstract - MagicWorld: Interactive Geometry-driven Video World Exploration

Recent interactive video world model methods generate scene evolution conditioned on user instructions. Although they achieve impressive results, two key limitations remain. First, they fail to fully exploit the correspondence between instruction-driven scene motion and the underlying 3D geometry, which results in structural instability under viewpoint changes. Second, they easily forget historical information during multi-step interaction, resulting in error accumulation and progressive drift in scene semantics and structure. To address these issues, we propose MagicWorld, an interactive video world model that integrates 3D geometric priors and historical retrieval. MagicWorld starts from a single scene image, employs user actions to drive dynamic scene evolution, and autoregressively synthesizes continuous scenes. We introduce the Action-Guided 3D Geometry Module (AG3D), which constructs a point cloud from the first frame of each interaction and the corresponding action, providing explicit geometric constraints for viewpoint transitions and thereby improving structural consistency. We further propose a History Cache Retrieval (HCR) mechanism, which retrieves relevant historical frames during generation and injects them as conditioning signals, helping the model utilize past scene information and mitigate error accumulation. Experimental results demonstrate that MagicWorld achieves notable improvements in scene stability and continuity across interaction iterations.
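The abstract says AG3D builds a point cloud from the first frame of each interaction and the corresponding action to constrain viewpoint transitions. One way to picture that idea is a minimal sketch that assumes a per-pixel depth map for the first frame (for example from an off-the-shelf monocular depth estimator), known pinhole intrinsics `K`, and a hypothetical mapping from discrete actions to camera motion. The sketch unprojects the frame to a point cloud, moves the camera according to the action, and reprojects with a z-buffer. The paper's actual action space, depth source, and conditioning format are not given here; `ACTION_TO_MOTION`, `unproject`, `action_to_pose`, and `render_points` are illustrative names, not the authors' code.

```python
import numpy as np

# Hypothetical action vocabulary: each action maps to (yaw in radians, translation).
# Camera convention (OpenCV): x right, y down, z forward; positive yaw turns toward +x.
ACTION_TO_MOTION = {
    "forward": (0.0, np.array([0.0, 0.0, 0.5])),
    "backward": (0.0, np.array([0.0, 0.0, -0.5])),
    "turn_left": (np.deg2rad(-15.0), np.zeros(3)),
    "turn_right": (np.deg2rad(15.0), np.zeros(3)),
}


def unproject(depth, K):
    """Lift an HxW depth map to an (H*W, 3) point cloud in the first frame's camera."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pixels @ np.linalg.inv(K).T  # K^-1 [u, v, 1]^T for every pixel
    return rays * depth.reshape(-1, 1)  # scale each ray by its depth


def action_to_pose(action):
    """Pose (R, t) of the new camera in the old camera frame: X_old = R X_new + t."""
    yaw, t = ACTION_TO_MOTION[action]
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return R, t


def render_points(points, colors, K, R, t, h, w):
    """Project the point cloud into the new view; return a sparse image and a hit mask."""
    cam = (points - t) @ R  # row-wise X_new = R^T (X_old - t)
    front = cam[:, 2] > 1e-6  # drop points behind the new camera
    cam, col = cam[front], colors[front]
    uvw = cam @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, col = u[inside], v[inside], cam[inside, 2], col[inside]
    # Z-buffer: sort by pixel, then by depth, and keep the nearest point per pixel.
    pix = v * w + u
    order = np.lexsort((z, pix))
    _, first = np.unique(pix[order], return_index=True)
    keep = order[first]
    image = np.zeros((h * w, colors.shape[1]))
    mask = np.zeros(h * w, dtype=bool)
    image[pix[keep]] = col[keep]
    mask[pix[keep]] = True
    return image.reshape(h, w, -1), mask.reshape(h, w)


if __name__ == "__main__":
    h, w = 8, 8
    K = np.array([[4.0, 0.0, 3.5], [0.0, 4.0, 3.5], [0.0, 0.0, 1.0]])
    depth = np.full((h, w), 2.0)  # stand-in for an estimated depth map
    frame = np.random.default_rng(0).random((h, w, 3))
    points = unproject(depth, K)
    for action in ACTION_TO_MOTION:
        R, t = action_to_pose(action)
        _, mask = render_points(points, frame.reshape(-1, 3), K, R, t, h, w)
        print(f"{action}: {mask.mean():.0%} of target pixels covered")
```

In a setup like this, pixels hit by a reprojected point pin down structure already observed in the first frame, while holes in the mask (for example, after moving forward or turning) mark regions the generator would have to synthesize.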

Top-level tags: computer vision, video generation, multi-modal
Detailed tags: interactive video, 3D geometry, scene evolution, autoregressive synthesis, error accumulation

📄 Paper Summary

MagicWorld: Interactive Geometry-driven Video World Exploration


1️⃣ One-Sentence Summary

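This paper proposes MagicWorld, which introduces 3D geometric constraints and a history retrieval mechanism to address two problems of existing interactive video generation methods: structural instability under viewpoint changes and a tendency to forget historical information over repeated interactions. The result is a marked improvement in the stability and continuity of the generated scenes.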
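The history retrieval mechanism (HCR in the abstract) retrieves relevant historical frames during generation and injects them as conditioning signals. The summary does not say how relevance is measured, how large the cache is, or how retrieved frames are injected, so the sketch below assumes cosine similarity over per-frame feature vectors (for example an image embedding or a flattened camera pose), first-in-first-out eviction, and injection by prepending retrieved frames to the current context along the time axis. `HistoryCache` and `build_condition` are hypothetical names, not the authors' implementation.

```python
import numpy as np


class HistoryCache:
    """Hypothetical history cache: stores past frames with feature vectors and
    retrieves the k most similar ones by cosine similarity."""

    def __init__(self, max_size=64):
        self.max_size = max_size
        self.keys = []    # unit-normalized feature vectors
        self.frames = []  # the corresponding frames (any payload)

    def add(self, key, frame):
        key = np.asarray(key, dtype=float)
        self.keys.append(key / (np.linalg.norm(key) + 1e-8))
        self.frames.append(frame)
        if len(self.keys) > self.max_size:  # evict the oldest entry (FIFO)
            self.keys.pop(0)
            self.frames.pop(0)

    def retrieve(self, query, k=2):
        """Return up to k cached frames ranked by cosine similarity to the query."""
        if not self.keys:
            return [], np.array([])
        q = np.asarray(query, dtype=float)
        q = q / (np.linalg.norm(q) + 1e-8)
        scores = np.stack(self.keys) @ q
        top = np.argsort(-scores)[:k]
        return [self.frames[i] for i in top], scores[top]


def build_condition(current_frames, cache, query, k=2):
    """Prepend retrieved history frames to the current (T, H, W, C) context along time."""
    retrieved, _ = cache.retrieve(query, k)
    if not retrieved:
        return current_frames
    return np.concatenate([np.stack(retrieved), current_frames], axis=0)
```

Keeping retrieval separate from generation means the scoring rule can be swapped (for instance, ranking by camera-pose overlap instead of feature similarity) without touching how the conditioning tensor is assembled.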

