arXiv submission date: 2026-04-22
📄 Abstract - Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing

Instruction-based image editing (IIE) aims to modify images according to textual instructions while preserving irrelevant content. Despite recent advances in diffusion transformers, existing methods often suffer from over-editing, introducing unintended changes to regions unrelated to the desired edit. We identify that this limitation arises from the lack of an explicit mechanism for edit localization. In particular, different editing operations (e.g., addition, removal, and replacement) induce distinct spatial patterns, yet current IIE models typically treat localization in a task-agnostic manner. To address this limitation, we propose a training-free, task-aware edit localization framework that exploits the intrinsic source and target image streams within IIE models. For each image stream, we first obtain attention-based edit cues, and then construct feature centroids based on these attentive cues to partition tokens into edit and non-edit regions. Based on the observation that optimal localization is inherently task-dependent, we further introduce a unified mask construction strategy that selectively leverages source and target image streams for different editing tasks. We provide a systematic analysis of our proposed insights and approaches. Extensive experiments on EdiVal-Bench demonstrate that our framework consistently improves non-edit region consistency while maintaining strong instruction-following performance on top of powerful recent image editing backbones, including Step1X-Edit and Qwen-Image-Edit.
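
The pipeline described in the abstract (attention cues, then feature centroids, then a task-dependent mask) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `partition_tokens` and `build_mask`, the `seed_frac` parameter, the use of cosine similarity, and in particular the mapping from task to image stream (source for removal, target for addition, union for replacement) are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np


def partition_tokens(features, attention, seed_frac=0.2):
    """Split tokens into edit (True) and non-edit (False) regions.

    features:  (n_tokens, dim) array of token features from one image stream.
    attention: (n_tokens,) array of attention-based edit cues (higher = more
               likely to be edited).
    seed_frac: hypothetical parameter; fraction of tokens used as seeds for
               each centroid.
    """
    n_tokens = features.shape[0]
    k = max(1, int(round(seed_frac * n_tokens)))

    # Tokens with the lowest / highest attention act as seeds.
    order = np.argsort(attention)
    keep_seeds, edit_seeds = order[:k], order[-k:]

    # L2-normalise so that dot products are cosine similarities.
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    edit_centroid = unit[edit_seeds].mean(axis=0)
    keep_centroid = unit[keep_seeds].mean(axis=0)

    # Assign each token to the centroid it is more similar to.
    return unit @ edit_centroid > unit @ keep_centroid


def build_mask(task, source_mask, target_mask):
    """Combine per-stream masks by task type (assumed mapping, see lead-in)."""
    if task == "removal":
        return source_mask                # assumed: object is visible in the source
    if task == "addition":
        return target_mask                # assumed: object is visible in the target
    if task == "replacement":
        return source_mask | target_mask  # assumed: cover old and new object
    raise ValueError(f"unknown task: {task}")
```

In this sketch `partition_tokens` would be called once per image stream, and the two resulting boolean masks passed to `build_mask` together with the task type inferred from the instruction.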

Top-level tags: computer vision, image editing
Detailed tags: instruction-based editing, edit localization, task-aware, diffusion transformers, attention mechanism

Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing


1️⃣ One-sentence summary

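This paper proposes a training-free method that automatically determines which image regions need to be modified according to the type of editing task (e.g., addition, removal, or replacement). It avoids the unintended changes to unrelated regions seen in earlier methods, so it better preserves the parts of the image that should stay untouched while maintaining edit quality.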

Source: arXiv: 2604.20258