Omni IIE Bench: Benchmarking the Practical Capabilities of Image Editing Models
1️⃣ One-Sentence Summary
This paper introduces a new benchmark, Omni IIE Bench, designed to diagnose how consistently instruction-based image editing models perform across tasks of varying semantic complexity. It finds that nearly all mainstream models suffer a significant performance drop on tasks of high semantic complexity.
While Instruction-based Image Editing (IIE) has achieved significant progress, existing benchmarks pursue task breadth via mixed evaluations. This paradigm obscures a failure mode that is critical in professional applications: the inconsistent performance of models across tasks of varying semantic scales. To address this gap, we introduce Omni IIE Bench, a high-quality, human-annotated benchmark specifically designed to diagnose the editing consistency of IIE models in practical application scenarios. Omni IIE Bench features an innovative dual-track diagnostic design: (1) Single-turn Consistency, comprising shared-context task pairs of attribute modification and entity replacement; and (2) Multi-turn Coordination, involving continuous dialogue tasks that traverse semantic scales. The benchmark is constructed via a rigorous multi-stage human filtering process, combining quality standards enforced by computer vision graduate students with an industry-relevance review conducted by professional designers. We perform a comprehensive evaluation of 8 mainstream IIE models using Omni IIE Bench. Our analysis quantifies, for the first time, a prevalent performance gap: nearly all models exhibit significant performance degradation when transitioning from low-semantic-scale to high-semantic-scale tasks. Omni IIE Bench provides critical diagnostic tools and insights for the development of next-generation, more reliable, and stable IIE models.
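The "performance gap" the abstract describes can be made concrete with a small sketch. The paper does not release its scoring code, so the function, model names, and score values below are illustrative assumptions: each shared-context task pair holds a model's score on the low-semantic-scale task (attribute modification) and the high-semantic-scale task (entity replacement), and the gap is the mean drop between the two.

```python
# Hypothetical sketch of the low-to-high semantic-scale gap metric.
# All names and numbers are illustrative, not from the paper.
from statistics import mean

# Assumed per-model scores on shared-context task pairs (0-1 scale):
# each tuple is (attribute_modification, entity_replacement).
scores = {
    "model_a": [(0.82, 0.61), (0.79, 0.58)],
    "model_b": [(0.75, 0.70), (0.77, 0.69)],
}

def consistency_gap(pairs):
    """Mean score drop from low- to high-semantic-scale tasks.

    A larger gap means less consistent behavior across semantic scales.
    """
    return mean(low - high for low, high in pairs)

for model, pairs in scores.items():
    print(f"{model}: gap = {consistency_gap(pairs):.3f}")
```

Under this toy metric, a perfectly consistent model would score a gap near zero; the paper's finding is that nearly all evaluated models show a clearly positive gap.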
Source: arXiv: 2603.16944