WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
1️⃣ One-Sentence Summary
This paper presents WorldPlay, a real-time video generation model that resolves the trade-off between speed and long-term geometric consistency in existing methods through three innovations: dual action control, dynamically reconstituted memory, and context-forcing distillation. It generates coherent, high-quality long videos at 24 frames per second.
This paper presents WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between speed and memory that limits current methods. WorldPlay builds on three key innovations. 1) We use a Dual Action Representation to enable robust action control in response to the user's keyboard and mouse inputs. 2) To enforce long-term consistency, our Reconstituted Context Memory dynamically rebuilds context from past frames and uses temporal reframing to keep geometrically important but long-past frames accessible, effectively alleviating memory attenuation. 3) We also propose Context Forcing, a novel distillation method designed for memory-aware models. Aligning memory context between the teacher and student preserves the student's capacity to use long-range information, enabling real-time speeds while preventing error drift. Taken together, WorldPlay generates long-horizon streaming 720p video at 24 FPS with superior consistency, comparing favorably with existing techniques and showing strong generalization across diverse scenes. Project page and online demo can be found at: this https URL and this https URL.
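The Reconstituted Context Memory idea, as described above, amounts to rebuilding a fixed-size context from the most recent frames plus geometrically important long-past frames, then remapping ("reframing") their timesteps so distant frames stay within the model's attention window. The following is a minimal sketch of that selection-and-remapping policy; the `Frame` structure, the importance scoring, and all function names are hypothetical, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    index: int         # original timestep in the generated stream
    importance: float  # hypothetical geometric-importance score

def reconstitute_context(history, recent_k=4, memory_k=2):
    """Rebuild a fixed-size context: the recent_k newest frames plus the
    memory_k most geometrically important older frames (hypothetical policy)."""
    recent = history[-recent_k:]
    older = history[:-recent_k]
    # Keep the most important long-past frames so they remain accessible.
    keepers = sorted(older, key=lambda f: f.importance, reverse=True)[:memory_k]
    context = sorted(keepers + recent, key=lambda f: f.index)
    # Temporal reframing: remap original timesteps onto compact context
    # positions so long-past frames fit inside the attention window.
    reframed = {f.index: pos for pos, f in enumerate(context)}
    return context, reframed

# Toy stream: every 5th frame is marked geometrically important.
history = [Frame(i, importance=1.0 if i % 5 == 0 else 0.1) for i in range(12)]
ctx, remap = reconstitute_context(history)
print([f.index for f in ctx])  # → [0, 5, 8, 9, 10, 11]
```

The context stays constant-size regardless of how long generation runs, which is what lets memory cost stay bounded while long-range geometric anchors remain visible to the model.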
Source: arXiv:2512.14614