arXiv submission date: 2026-02-16
📄 Abstract - EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing

High-fidelity generative video editing has seen significant quality improvements by leveraging pre-trained video foundation models. However, their computational cost is a major bottleneck: they are often designed to process the full video context regardless of the inpainting mask's size, even for sparse, localized edits. In this paper, we introduce EditCtrl, an efficient video inpainting control framework that focuses computation only where it is needed. Our approach features a novel local video context module that operates solely on masked tokens, yielding a computational cost proportional to the edit size. This local-first generation is guided by a lightweight temporal global context embedder that ensures video-wide consistency with minimal overhead. Not only is EditCtrl 10 times more compute-efficient than state-of-the-art generative editing methods, it also improves editing quality compared to methods designed with full attention. Finally, we showcase how EditCtrl unlocks new capabilities, including multi-region editing with text prompts and autoregressive content propagation.
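The abstract's core idea can be illustrated with a toy sketch: restrict self-attention to the masked tokens of each frame (so cost scales with edit size, not video size) and condition it on a cheap per-frame global context vector. Everything below — function names, the mean-pool projection, and the way the global signal is injected — is a hypothetical illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def global_context(tokens, d_ctx):
    """Lightweight temporal global embedder (sketch): mean-pool each
    frame's tokens, then project to a small context vector."""
    T, N, D = tokens.shape
    frame_means = tokens.mean(axis=1)           # (T, D)
    W = np.ones((D, d_ctx)) / D                 # fixed projection, for the sketch only
    return frame_means @ W                      # (T, d_ctx)

def local_edit(tokens, mask, ctx):
    """Local module (sketch): self-attention restricted to masked tokens,
    conditioned on the global context. Cost grows with the number of
    masked tokens M per frame, not the full token count N."""
    T, N, D = tokens.shape
    out = tokens.copy()
    for t in range(T):
        idx = np.flatnonzero(mask[t])
        if idx.size == 0:
            continue                            # untouched frames cost nothing
        q = tokens[t, idx]                      # (M, D)
        scores = q @ q.T / np.sqrt(D)           # attention only among masked tokens
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        out[t, idx] = attn @ q + ctx[t].mean()  # toy injection of the global signal
    return out
```

Note that unmasked tokens are copied through unchanged, which is what makes the per-frame cost proportional to the edit size.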

Top tags: video generation, model training, computer vision
Detailed tags: video inpainting, computational efficiency, local-global control, generative editing, real-time editing

EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing


1️⃣ One-Sentence Summary

This paper proposes EditCtrl, an efficient video editing framework that concentrates computation on the local regions being modified, guided by a lightweight global-consistency module; it achieves high-quality video edits while improving compute efficiency by 10×, and supports new capabilities such as multi-region editing.

Source: arXiv:2602.15031