MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
1️⃣ One-Sentence Summary
This paper proposes a new framework called MultiShotMaster, which tackles the difficulty of AI-generating multi-shot narrative videos by extending an existing single-shot model with novel positional encoding techniques, enabling flexible control over shot count, shot duration, content, and cross-shot coherence.
Current video generation techniques excel at single-shot clips but struggle to produce narrative multi-shot videos, which require flexible shot arrangement, coherent narrative, and controllability beyond text prompts. To tackle these challenges, we propose MultiShotMaster, a framework for highly controllable multi-shot video generation. We extend a pretrained single-shot model by integrating two novel variants of Rotary Position Embedding (RoPE). First, we introduce Multi-Shot Narrative RoPE, which applies an explicit phase shift at each shot transition, enabling flexible shot arrangement while preserving the temporal narrative order. Second, we design Spatiotemporal Position-Aware RoPE to incorporate reference tokens and grounding signals, enabling spatiotemporally grounded reference injection. In addition, to overcome data scarcity, we establish an automated annotation pipeline that extracts multi-shot videos, captions, cross-shot grounding signals, and reference images. Our framework leverages the intrinsic architectural properties to support multi-shot video generation, featuring text-driven inter-shot consistency, subject customization with motion control, and background-driven scene customization. Both shot count and shot duration are flexibly configurable. Extensive experiments demonstrate the superior performance and outstanding controllability of our framework.
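The abstract does not spell out how the Multi-Shot Narrative RoPE phase shift is implemented, so the following is only a minimal sketch of the general idea: standard 1D RoPE over temporal positions, with an extra constant offset added at every shot boundary so transitions are explicitly marked while the global narrative order stays monotonic. All function names and the `phase_shift` value are illustrative assumptions, not the authors' code.

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard 1D RoPE angles: one rotation frequency per pair of channels."""
    freqs = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    return positions[:, None].float() * freqs[None, :]            # (seq_len, dim // 2)

def multi_shot_narrative_angles(shot_lengths, dim, phase_shift=64.0):
    """Temporal positions increase monotonically across shots (narrative order),
    but each shot boundary adds an extra offset, i.e. an explicit phase shift.
    `phase_shift` is a hypothetical hyperparameter chosen for illustration."""
    positions, offset = [], 0.0
    for i, length in enumerate(shot_lengths):
        if i > 0:
            offset += phase_shift                                 # jump at each shot transition
        positions.append(torch.arange(length, dtype=torch.float32) + offset)
        offset += length
    return rope_angles(torch.cat(positions), dim)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate channel pairs of x (seq_len, dim) by the given angles (seq_len, dim // 2)."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Example: three shots of 16, 24, and 16 latent frames, 64-dim temporal attention head.
angles = multi_shot_narrative_angles([16, 24, 16], dim=64)
q = torch.randn(16 + 24 + 16, 64)
q_rot = apply_rope(q, angles)
print(q_rot.shape)  # torch.Size([56, 64])
```

Under this reading, frames within a shot keep consecutive positions, so attention behaves as in the pretrained single-shot model, while the fixed offset at each boundary gives the model an unambiguous cue that a cut occurred without breaking the overall temporal ordering.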