arXiv submission date: 2025-12-24
📄 Abstract - DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation

The "one-shot" technique represents a distinct and sophisticated aesthetic in filmmaking. However, its practical realization is often hindered by prohibitive costs and complex real-world constraints. Although emerging video generation models offer a virtual alternative, existing approaches typically rely on naive clip concatenation, which frequently fails to maintain visual smoothness and temporal coherence. In this paper, we introduce DreaMontage, a comprehensive framework designed for arbitrary frame-guided generation, capable of synthesizing seamless, expressive, and long-duration one-shot videos from diverse user-provided inputs. To achieve this, we address the challenge through three primary dimensions. (i) We integrate a lightweight intermediate-conditioning mechanism into the DiT architecture. By employing an Adaptive Tuning strategy that effectively leverages base training data, we unlock robust arbitrary-frame control capabilities. (ii) To enhance visual fidelity and cinematic expressiveness, we curate a high-quality dataset and implement a Visual Expression SFT stage. In addressing critical issues such as subject motion rationality and transition smoothness, we apply a Tailored DPO scheme, which significantly improves the success rate and usability of the generated content. (iii) To facilitate the production of extended sequences, we design a Segment-wise Auto-Regressive (SAR) inference strategy that operates in a memory-efficient manner. Extensive experiments demonstrate that our approach achieves visually striking and seamlessly coherent one-shot effects while maintaining computational efficiency, empowering users to transform fragmented visual materials into vivid, cohesive one-shot cinematic experiences.

Top tags: video generation, aigc, model training
Detailed tags: one-shot video generation, frame-guided generation, diffusion transformer, long video synthesis, preference optimization

DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation


1️⃣ One-Sentence Summary

This paper proposes DreaMontage, a framework that generates seamless, coherent, high-quality long videos from arbitrary user-provided keyframes or video clips, addressing the limitations of existing methods in visual smoothness, temporal coherence, and computational efficiency.


2️⃣ Key Contributions

1. Lightweight intermediate-conditioning mechanism with an Adaptive Tuning strategy

2. Progressive training pipeline with visual-expression optimization (Visual Expression SFT)

3. Segment-wise Auto-Regressive (SAR) inference strategy (see the sketch after this list)

4. Tailored Direct Preference Optimization (DPO) targeting specific artifacts
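The contribution list names the Segment-wise Auto-Regressive (SAR) inference strategy without algorithmic detail. As a hypothetical sketch only, the loop below shows how segment-wise auto-regressive generation is commonly organized: each new segment is conditioned on a short tail of the previous one, so peak memory scales with the segment length rather than the full video. `generate_segment`, the overlap convention, and the parameter names are assumptions, not the paper's actual procedure.

```python
import torch

@torch.no_grad()
def sar_inference(generate_segment, first_segment, num_segments, overlap=8):
    """Hypothetical segment-wise auto-regressive (SAR) inference loop.

    generate_segment(prefix): assumed callable returning one segment of
    shape (T, C, H, W) whose first `overlap` frames continue `prefix`;
    the real DreaMontage interface is not described in this summary.
    """
    segments = [first_segment]             # frames kept so far, segment by segment
    prefix = first_segment[-overlap:]      # only a short tail is carried forward
    for _ in range(num_segments):
        seg = generate_segment(prefix)     # condition on the tail, not the whole history
        segments.append(seg[overlap:])     # drop frames duplicated by the overlap
        prefix = seg[-overlap:]            # memory stays bounded by the overlap size
    return torch.cat(segments, dim=0)      # assemble the full long video
```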


3️⃣ Main Results and Value

Result highlights

Extensive experiments demonstrate visually striking and seamlessly coherent one-shot effects while maintaining computational efficiency, and the Tailored DPO scheme is reported to significantly improve the success rate and usability of the generated content.

Practical value

Lets users turn fragmented visual materials (arbitrary keyframes or clips) into vivid, cohesive, long-duration one-shot cinematic videos, with memory-efficient inference for extended sequences.


4️⃣ Glossary

- One-shot (long take): a filmmaking technique in which a scene appears to be captured in a single continuous shot.
- DiT: Diffusion Transformer, the backbone architecture for video generation.
- SFT: supervised fine-tuning.
- DPO: Direct Preference Optimization, a preference-based fine-tuning method.
- SAR: Segment-wise Auto-Regressive inference, generating long videos segment by segment.

Source: arXiv:2512.21252