arXiv submission date: 2025-12-26
📄 Abstract - ProEdit: Inversion-based Editing From Prompts Done Right

Inversion-based visual editing provides an effective and training-free way to edit an image or a video based on user instructions. Existing methods typically inject source image information during the sampling process to maintain editing consistency. However, this sampling strategy overly relies on source information, which negatively affects the edits in the target image (e.g., failing to change the subject's attributes like pose, number, or color as instructed). In this work, we propose ProEdit to address this issue in both the attention and the latent aspects. In the attention aspect, we introduce KV-mix, which mixes KV features of the source and the target in the edited region, mitigating the influence of the source image on the editing region while maintaining background consistency. In the latent aspect, we propose Latents-Shift, which perturbs the edited region of the source latent, eliminating the influence of the inverted latent on the sampling. Extensive experiments on several image and video editing benchmarks demonstrate that our method achieves SOTA performance. In addition, our design is plug-and-play and can be seamlessly integrated into existing inversion and editing methods, such as RF-Solver, FireFlow, and UniEdit.
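The two mechanisms described above can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: the blend weight `alpha`, the noise scale `sigma`, the linear mixing rule, and all tensor shapes are assumptions made for the example.

```python
import numpy as np

def kv_mix(k_src, v_src, k_tgt, v_tgt, edit_mask, alpha=0.5):
    """Blend source and target key/value features inside the edited
    region (edit_mask == 1); keep pure source KV outside it so the
    background stays consistent. Shapes: (tokens, dim) for KV,
    (tokens,) for the mask. alpha and the blend rule are assumptions."""
    m = edit_mask[..., None].astype(bool)  # broadcast over feature dim
    k = np.where(m, alpha * k_tgt + (1 - alpha) * k_src, k_src)
    v = np.where(m, alpha * v_tgt + (1 - alpha) * v_src, v_src)
    return k, v

def latents_shift(z_src, edit_mask, sigma=0.1, seed=0):
    """Perturb only the edited region of the inverted source latent
    with Gaussian noise, so sampling there is less anchored to the
    source image. Shapes: (H, W, C) latent, (H, W) mask."""
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal(z_src.shape)
    m = edit_mask[..., None].astype(bool)
    return np.where(m, z_src + noise, z_src)
```

The key property both functions share is locality: positions outside the edit mask pass through untouched, which is what preserves the background while freeing the edited region from the source signal.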

Top-level tags: computer vision, model training, AIGC
Detailed tags: image editing, video editing, inversion-based editing, attention mechanism, latent space manipulation

ProEdit: Inversion-based Editing From Prompts Done Right


1️⃣ One-sentence summary

This paper proposes ProEdit, a method that improves the attention mechanism and latent-feature handling used in image and video editing. It addresses a common failure of existing AI editing tools: when modifying an image from a text instruction, they often cannot fully change a subject's attributes (such as pose, number, or color). ProEdit thereby achieves more accurate and flexible edits.

Source: arXiv 2512.22118