OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory
1️⃣ One-Sentence Summary
This paper proposes OneStory, a new method that builds an adaptive global memory to capture and integrate semantic relations across shots, enabling the generation of coherent, controllable long-form multi-shot narrative videos and addressing the poor coherence of existing methods under complex narratives.
Storytelling in real-world videos often unfolds through multiple shots -- discontinuous yet semantically connected clips that together convey a coherent narrative. However, existing multi-shot video generation (MSV) methods struggle to effectively model long-range cross-shot context, as they rely on limited temporal windows or single-keyframe conditioning, leading to degraded performance under complex narratives. In this work, we propose OneStory, enabling global yet compact cross-shot context modeling for consistent and scalable narrative generation. OneStory reformulates MSV as a next-shot generation task, enabling autoregressive shot synthesis while leveraging pretrained image-to-video (I2V) models for strong visual conditioning. We introduce two key modules: a Frame Selection module that constructs a semantically relevant global memory based on informative frames from prior shots, and an Adaptive Conditioner that performs importance-guided patchification to generate compact context for direct conditioning. We further curate a high-quality multi-shot dataset with referential captions to mirror real-world storytelling patterns, and design effective training strategies under the next-shot paradigm. Finetuned from a pretrained I2V model on our curated 60K dataset, OneStory achieves state-of-the-art narrative coherence across diverse and complex scenes in both text- and image-conditioned settings, enabling controllable and immersive long-form video storytelling.
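To make the two modules named in the abstract concrete, here is a minimal PyTorch sketch of one plausible reading: Frame Selection ranks prior-shot frames by cosine similarity between frame embeddings and the next-shot caption embedding, and the Adaptive Conditioner allocates a fixed token budget across the selected frames in proportion to their relevance, pooling important frames to finer grids. The function names, the similarity-based scoring, and the softmax budget allocation are illustrative assumptions, not the paper's actual implementation.

```python
import math
import torch
import torch.nn.functional as F

def select_memory_frames(frame_embs, prompt_emb, k=8):
    # Frame Selection (sketch): rank all frames from prior shots by cosine
    # similarity to the next-shot caption embedding and keep the top-k as
    # the semantically relevant global memory.
    scores = F.cosine_similarity(frame_embs, prompt_emb.unsqueeze(0), dim=-1)
    top = torch.topk(scores, k=min(k, frame_embs.size(0)))
    return top.indices, top.values

def adaptive_patchify(frames, scores, token_budget=256):
    # Adaptive Conditioner (sketch): importance-guided patchification.
    # Each selected frame receives a share of the token budget proportional
    # to its relevance; more important frames are pooled to a finer grid
    # (more tokens), less important ones to a coarser grid, so the total
    # cross-shot context stays compact.
    weights = torch.softmax(scores, dim=0)
    tokens = []
    for frame, w in zip(frames, weights):
        n = max(1, int((w * token_budget).item()))
        g = max(1, int(math.sqrt(n)))              # pool to a g x g grid
        pooled = F.adaptive_avg_pool2d(frame.unsqueeze(0), (g, g))
        tokens.append(pooled.squeeze(0).permute(1, 2, 0).reshape(g * g, -1))
    # A real model would linearly project these tokens to the generator's
    # hidden size before conditioning the next-shot synthesis on them.
    return torch.cat(tokens, dim=0)

if __name__ == "__main__":
    torch.manual_seed(0)
    frame_embs = torch.randn(40, 512)    # pooled embeddings of prior-shot frames
    prompt_emb = torch.randn(512)        # embedding of the next-shot caption
    frames = torch.randn(40, 3, 64, 64)  # the prior-shot frames themselves
    idx, rel = select_memory_frames(frame_embs, prompt_emb, k=8)
    context = adaptive_patchify(frames[idx], rel, token_budget=256)
    print(context.shape)  # compact context tokens for next-shot conditioning
```

Under the next-shot paradigm described in the abstract, this selection-and-compaction step would be repeated once per shot: each newly generated shot contributes frames back into the memory pool, and the I2V backbone is conditioned on the compact context when synthesizing the following shot.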
Source: arXiv:2512.07802