arXiv submission date: 2026-03-26
📄 Abstract - Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Video world models have shown immense potential in simulating the physical world, yet existing memory mechanisms primarily treat environments as static canvases. When dynamic subjects hide out of sight and later re-emerge, current methods often struggle, leading to frozen, distorted, or vanishing subjects. To address this, we introduce Hybrid Memory, a novel paradigm requiring models to simultaneously act as precise archivists for static backgrounds and vigilant trackers for dynamic subjects, ensuring motion continuity during out-of-view intervals. To facilitate research in this direction, we construct HM-World, the first large-scale video dataset dedicated to hybrid memory. It features 59K high-fidelity clips with decoupled camera and subject trajectories, encompassing 17 diverse scenes, 49 distinct subjects, and meticulously designed exit-entry events to rigorously evaluate hybrid coherence. Furthermore, we propose HyDRA, a specialized memory architecture that compresses memory into tokens and utilizes a spatiotemporal relevance-driven retrieval mechanism. By selectively attending to relevant motion cues, HyDRA effectively preserves the identity and motion of hidden subjects. Extensive experiments on HM-World demonstrate that our method significantly outperforms state-of-the-art approaches in both dynamic subject consistency and overall generation quality.
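The abstract states that HyDRA compresses memory into tokens and retrieves them via a spatiotemporal relevance-driven mechanism. The paper's actual implementation is not shown here; as a rough, hypothetical illustration of what relevance-driven retrieval over compressed memory tokens might look like, the sketch below scores each stored token against the current-frame feature and keeps only the top-k most relevant ones. The function name, the cosine-similarity scoring, and all dimensions are assumptions, not the paper's method.

```python
import numpy as np

def retrieve_memory_tokens(query, memory_tokens, top_k=4):
    """Hypothetical sketch: rank compressed memory tokens by
    relevance to the current query and return the top-k."""
    # Cosine similarity as a stand-in relevance score (assumption).
    q = query / np.linalg.norm(query)
    m = memory_tokens / np.linalg.norm(memory_tokens, axis=1, keepdims=True)
    scores = m @ q
    # Indices of the top-k tokens, highest relevance first.
    idx = np.argsort(scores)[::-1][:top_k]
    return memory_tokens[idx], scores[idx]

rng = np.random.default_rng(0)
mem = rng.standard_normal((16, 8))   # 16 compressed memory tokens, dim 8
query = rng.standard_normal(8)       # current-frame feature
selected, scores = retrieve_memory_tokens(query, mem, top_k=4)
print(selected.shape)  # (4, 8)
```

The downstream model would then attend only to `selected`, which is how such a mechanism could limit attention to the motion cues most relevant to a hidden subject.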

Top-level tags: video generation, model training, computer vision
Detailed tags: world models, video synthesis, memory architecture, dynamic scenes, dataset

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models


1️⃣ One-Sentence Summary

This paper proposes a new paradigm called Hybrid Memory. By constructing the dedicated HM-World dataset and designing the HyDRA memory architecture, it effectively addresses a key limitation of existing video world models: their inability to preserve the motion continuity and identity consistency of dynamic subjects (such as people or vehicles) after they leave the field of view and later reappear.

From arXiv: 2603.25716