Yume-1.5: A Text-Controlled Interactive World Generation Model
1️⃣ One-Sentence Summary
This paper proposes a new model, Yume-1.5, that can quickly generate a realistic, interactive, and continuously expanding virtual world from a single image or a text description, and lets users explore it in real time with the keyboard.
2️⃣ Abstract

Recent approaches have demonstrated the promise of using diffusion models to generate interactive and explorable worlds. However, most of these methods face critical challenges, such as excessively large parameter counts, reliance on lengthy inference procedures, and rapidly growing historical context, which severely limit real-time performance; they also lack text-controlled generation capabilities. To address these challenges, we propose Yume-1.5, a novel framework designed to generate realistic, interactive, and continuous worlds from a single image or text prompt. Yume-1.5 achieves this through a carefully designed framework that supports keyboard-based exploration of the generated worlds. The framework comprises three core components: (1) a long-video generation framework integrating unified context compression with linear attention; (2) a real-time streaming acceleration strategy powered by bidirectional attention distillation and an enhanced text embedding scheme; (3) a text-controlled method for generating world events. We have provided the codebase in the supplementary material.
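The abstract names linear attention as part of the long-video context-compression component. As background only, here is a minimal NumPy sketch of generic (non-causal) linear attention; the feature map, shapes, and normalization are illustrative assumptions, not the paper's specific variant:

```python
import numpy as np

def linear_attention(q, k, v):
    """Linear attention: replace softmax(Q K^T) V with phi(Q) (phi(K)^T V),
    reducing cost from O(n^2 d) to O(n d^2) in sequence length n.
    phi is a positive feature map; elu(x)+1 is a common choice (assumption)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, strictly positive
    q, k = phi(q), phi(k)
    kv = k.T @ v                   # (d, d_v): fixed-size summary of the whole context
    z = q @ k.sum(axis=0)          # (n,): per-query normalizer
    return (q @ kv) / z[:, None]   # (n, d_v)

# Toy usage: 6 tokens, 4-dimensional heads
rng = np.random.default_rng(0)
n, d = 6, 4
out = linear_attention(rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
print(out.shape)  # (6, 4)
```

The key point for long video is that `kv` and the normalizer have sizes independent of sequence length, so the attention state does not grow with the history, which is what makes compressing long generation context tractable.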
Source: arXiv:2512.22096