arXiv submission date: 2025-12-04
📄 Abstract - VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

Autoregressive (AR) diffusion enables streaming, interactive long-video generation by producing frames causally, yet maintaining coherence over minute-scale horizons remains challenging due to accumulated errors, motion drift, and content repetition. We approach this problem from a memory perspective, treating video synthesis as a recurrent dynamical process that requires coordinated short- and long-term context. We propose VideoSSM, a Long Video Model that unifies AR diffusion with a hybrid state-space memory. The state-space model (SSM) serves as an evolving global memory of scene dynamics across the entire sequence, while a context window provides local memory for motion cues and fine details. This hybrid design preserves global consistency without frozen, repetitive patterns, supports prompt-adaptive interaction, and scales in linear time with sequence length. Experiments on short- and long-range benchmarks demonstrate state-of-the-art temporal consistency and motion stability among autoregressive video generators, especially at minute-scale horizons, while enabling content diversity and interactive prompt-based control, thereby establishing a scalable, memory-aware framework for long video generation.
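
To make the hybrid-memory idea concrete, here is a minimal PyTorch sketch of how a linear SSM state (global memory) and a sliding window of recent frame features (local memory) could be fused into a conditioning context for an autoregressive generator. This is an illustrative sketch under assumed dimensions and a diagonal SSM parameterization; the class and parameter names (`HybridMemory`, `feat_dim`, `window`) are hypothetical and not taken from the paper's implementation.

```python
import torch
import torch.nn as nn
from collections import deque

class HybridMemory(nn.Module):
    """Conceptual sketch of a hybrid state-space memory (assumed design, not the paper's code).

    - A diagonal linear SSM keeps an evolving global state of scene dynamics.
    - A fixed-size window of recent frame features serves as local memory.
    """

    def __init__(self, feat_dim: int = 256, state_dim: int = 512, window: int = 8):
        super().__init__()
        # Diagonal transition kept in (0, 1) via sigmoid for a stable recurrence.
        self.log_a = nn.Parameter(torch.zeros(state_dim))
        self.in_proj = nn.Linear(feat_dim, state_dim)    # B: frame feature -> state update
        self.out_proj = nn.Linear(state_dim, feat_dim)   # C: state -> global memory readout
        self.window = window
        self.local = deque(maxlen=window)                # local memory: last `window` frame features
        self.register_buffer("state", torch.zeros(state_dim))

    def step(self, frame_feat: torch.Tensor) -> torch.Tensor:
        """Update both memories with one frame feature and return the fused context.

        frame_feat: (feat_dim,) feature of the latest generated frame.
        Returns a context vector that would condition the next AR diffusion step.
        """
        a = torch.sigmoid(self.log_a)                            # per-channel decay in (0, 1)
        self.state = a * self.state + self.in_proj(frame_feat)   # s_t = A s_{t-1} + B x_t
        global_mem = self.out_proj(self.state)                   # y_t = C s_t

        self.local.append(frame_feat.detach())
        # Pad with zeros until the window is full so the context size stays constant.
        pad = [torch.zeros_like(frame_feat)] * (self.window - len(self.local))
        local_mem = torch.cat(list(self.local) + pad, dim=-1)

        return torch.cat([global_mem, local_mem], dim=-1)


if __name__ == "__main__":
    mem = HybridMemory()
    for t in range(100):                     # simulate 100 autoregressive frame steps
        frame_feat = torch.randn(256)        # stand-in for an encoded frame latent
        context = mem.step(frame_feat)       # constant cost per step -> linear in sequence length
    print(context.shape)                     # torch.Size([2304])
```

The per-step cost of updating both memories does not grow with the number of generated frames, which is what gives the abstract's claimed linear scaling with sequence length, in contrast to attending over all past frames.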

Top-level tags: video generation, AIGC, model training
Detailed tags: autoregressive diffusion, state-space model, long video generation, temporal consistency, memory architecture

VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory


1️⃣ One-Sentence Summary

This paper proposes VideoSSM, a model that combines autoregressive diffusion with a hybrid state-space memory mechanism to address frame incoherence, motion drift, and content repetition in long video generation, enabling stable generation of diverse, high-quality videos up to several minutes long.


Source: arXiv 2512.04519