FineEdit: Fine-Grained Image Edit with Bounding Box Guidance
1️⃣ One-Sentence Summary
This paper proposes a new method called FineEdit, which lets users draw a bounding box on an image to precisely specify the region to be modified, so that during editing the model accurately changes the target object while keeping the background fully intact.
Diffusion-based image editing models have achieved significant progress in real-world applications. However, conventional models typically rely on natural language prompts, which often lack the precision required to localize target objects. Consequently, these models struggle to maintain background consistency due to their global image regeneration paradigm. Recognizing that visual cues provide an intuitive means for users to highlight specific areas of interest, we utilize bounding boxes as guidance to explicitly define the editing target. This approach ensures that the diffusion model can accurately localize the target while preserving background consistency. To achieve this, we propose FineEdit, a multi-level bounding box injection method that enables the model to utilize spatial conditions more effectively. To support this high-precision guidance, we present FineEdit-1.2M, a large-scale, fine-grained dataset comprising 1.2 million image editing pairs with precise bounding box annotations. Furthermore, we construct a comprehensive benchmark, termed FineEdit-Bench, which includes 1,000 images across 10 subjects to effectively evaluate region-based editing capabilities. Evaluations on FineEdit-Bench demonstrate that our model significantly outperforms state-of-the-art open-source models (e.g., Qwen-Image-Edit and LongCat-Image-Edit) in instruction compliance and background preservation. Further assessments on open benchmarks (GEdit and ImgEdit-Bench) confirm its superior generalization and robustness.
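The abstract does not specify how the multi-level bounding box injection works internally. A common way to feed a box to a diffusion model as a spatial condition is to rasterize it into a binary mask and concatenate that mask to the latent as an extra channel. The sketch below illustrates only this generic idea; the function names `bbox_to_mask` and `inject_condition`, the channel-concatenation form, and the 8× latent downsampling factor are all assumptions for illustration, not the paper's actual mechanism.

```python
import numpy as np

def bbox_to_mask(h, w, bbox):
    """Rasterize a pixel-space bounding box (x0, y0, x1, y1) into a binary mask."""
    mask = np.zeros((h, w), dtype=np.float32)
    x0, y0, x1, y1 = bbox
    mask[y0:y1, x0:x1] = 1.0
    return mask

def inject_condition(latent, bbox, scale=8):
    """Concatenate a downsampled box mask to the latent as an extra channel.
    This is one plausible form of spatial conditioning (hypothetical here);
    FineEdit's multi-level injection is not detailed in the abstract."""
    c, h, w = latent.shape
    mask = bbox_to_mask(h * scale, w * scale, bbox)
    # Average-pool the pixel-space mask down to the latent resolution.
    mask_lat = mask.reshape(h, scale, w, scale).mean(axis=(1, 3))
    return np.concatenate([latent, mask_lat[None]], axis=0)

latent = np.random.randn(4, 32, 32).astype(np.float32)  # toy 4-channel latent
cond = inject_condition(latent, bbox=(64, 64, 192, 192))
print(cond.shape)  # → (5, 32, 32): original channels plus the mask channel
```

A conditioned denoiser would then take the extra channel as input, letting it localize edits to the boxed region while leaving the rest of the latent untouched.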
Source: arXiv: 2604.10954