
arXiv submission date: 2026-03-10
📄 Abstract - When to Lock Attention: Training-Free KV Control in Video Diffusion

Maintaining background consistency while enhancing foreground quality remains a core challenge in video editing. Injecting full-image information often leads to background artifacts, whereas rigid background locking severely constrains the model's capacity for foreground generation. To address this issue, we propose KV-Lock, a training-free framework tailored for DiT-based video diffusion models. Our core insight is that the hallucination metric (the variance of denoising predictions) directly quantifies generation diversity, which is inherently linked to the classifier-free guidance (CFG) scale. Building on this, KV-Lock leverages diffusion hallucination detection to dynamically schedule two key components: the fusion ratio between cached background key-values (KVs) and newly generated KVs, and the CFG scale. When hallucination risk is detected, KV-Lock strengthens background KV locking and simultaneously amplifies conditional guidance for foreground generation, thereby mitigating artifacts and improving generation fidelity. As a training-free, plug-and-play module, KV-Lock can be easily integrated into any pre-trained DiT-based model. Extensive experiments validate that our method outperforms existing approaches, achieving improved foreground quality with high background fidelity across various video editing tasks.
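The scheduling logic described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, thresholds, and the linear ramp mapping the hallucination score to the fusion ratio and CFG scale are all assumptions for clarity.

```python
import numpy as np

def hallucination_score(preds):
    """Hallucination metric (assumed form): variance across repeated
    denoising predictions, averaged over all elements. Higher variance
    means more generation diversity, i.e. higher hallucination risk."""
    return float(np.var(np.stack(preds), axis=0).mean())

def schedule_kv_and_cfg(score, threshold=0.5, base_fusion=0.5,
                        base_cfg=7.5, max_fusion=0.9, max_cfg=12.0):
    """Map the hallucination score to (background-KV fusion ratio, CFG scale).
    All constants here are illustrative placeholders, not values from the paper.
    Below the threshold, use the defaults; above it, linearly ramp both
    controls toward their maxima (stronger KV locking, stronger guidance)."""
    if score <= threshold:
        return base_fusion, base_cfg
    t = min((score - threshold) / threshold, 1.0)  # clip ramp to [0, 1]
    fusion = base_fusion + t * (max_fusion - base_fusion)
    cfg = base_cfg + t * (max_cfg - base_cfg)
    return fusion, cfg

def fuse_kv(cached_kv, new_kv, fusion):
    """Blend cached background KVs with newly generated KVs.
    fusion=1.0 fully locks the background; fusion=0.0 ignores the cache."""
    return fusion * cached_kv + (1.0 - fusion) * new_kv
```

For example, a low-variance step keeps the default `(0.5, 7.5)` controls, while a high-variance step pushes the fusion ratio toward `0.9` and the CFG scale toward `12.0`, locking the background harder exactly when artifacts are most likely.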

Top-level tags: video generation model training computer vision
Detailed tags: video diffusion attention control kv caching background consistency training-free

When to Lock Attention: Training-Free KV Control in Video Diffusion


1️⃣ One-sentence summary

This paper proposes a training-free method called KV-Lock that intelligently decides when to lock the video background and when to strengthen foreground generation, achieving both high-quality foregrounds and stable backgrounds in video editing.

Source: arXiv 2603.09657