arXiv submission date: 2026-03-03
📄 Abstract - NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing

Recent video editing models have achieved impressive results, but most still require large-scale paired datasets. Collecting such naturally aligned pairs at scale remains highly challenging and constitutes a critical bottleneck, especially for local video editing data. Existing workarounds achieve pair-free video editing by transferring image edits to video through global motion control, but such designs struggle with background and temporal consistency. In this paper, we propose NOVA: Sparse Control & Dense Synthesis, a new framework for unpaired video editing. Specifically, the sparse branch provides semantic guidance through user-edited keyframes distributed across the video, and the dense branch continuously incorporates motion and texture information from the original video to maintain high fidelity and coherence. Moreover, we introduce a degradation-simulation training strategy that enables the model to learn motion reconstruction and temporal consistency by training on artificially degraded videos, thus eliminating the need for paired data. Our extensive experiments demonstrate that NOVA outperforms existing approaches in edit fidelity, motion preservation, and temporal coherence.
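The degradation-simulation idea above can be sketched as a self-supervised sample constructor: a few clean keyframes feed the sparse branch, an artificially degraded copy of the clip feeds the dense branch, and the clean clip itself is the reconstruction target. The function and parameter names below are illustrative assumptions, not the paper's exact recipe (the paper does not specify the degradation operator; additive noise here is a stand-in).

```python
import numpy as np

def make_degraded_sample(video, num_keyframes=3, noise_std=0.1, seed=0):
    """Build one training sample in the spirit of NOVA's
    degradation-simulation strategy (names and the noise-based
    degradation are assumptions for illustration).

    video: float array of shape (T, H, W, C) with values in [0, 1].
    Returns (sparse_keyframes, dense_degraded, target).
    """
    rng = np.random.default_rng(seed)
    num_frames = video.shape[0]

    # Sparse-branch input: a few clean keyframes spread across the clip;
    # at training time these stand in for the user-edited frames.
    idx = np.linspace(0, num_frames - 1, num_keyframes).astype(int)
    keyframes = video[idx]

    # Dense-branch input: the whole clip, artificially degraded so the
    # model must recover motion and texture rather than copy pixels.
    degraded = np.clip(
        video + rng.normal(0.0, noise_std, video.shape), 0.0, 1.0
    )

    # Reconstruction target: the original, clean video — no paired
    # (source, edited) data is needed.
    return keyframes, degraded, video
```

In an actual pipeline the degradation and keyframe selection would be randomized per batch; the point of the sketch is only that supervision comes from the original video itself.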

Top-level tags: video generation aigc computer vision
Detailed tags: video editing motion consistency unpaired training temporal coherence sparse control

NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing


1️⃣ One-Sentence Summary

This paper proposes a new framework called NOVA: the user edits only a few keyframes of the video to provide semantic guidance, while the framework densely synthesizes using the motion and texture information of the original video, achieving high-quality, temporally coherent video editing without requiring large amounts of paired training data.

Source: arXiv: 2603.02802