arXiv submission date: 2025-12-28
📄 Abstract - M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models

Text-to-image diffusion models may generate harmful or copyrighted content, motivating research on concept erasure. However, existing approaches primarily focus on erasing concepts from text prompts, overlooking other input modalities that are increasingly critical in real-world applications such as image editing and personalized generation. These modalities can become attack surfaces, where erased concepts re-emerge despite defenses. To bridge this gap, we introduce M-ErasureBench, a novel multimodal evaluation framework that systematically benchmarks concept erasure methods across three input modalities: text prompts, learned embeddings, and inverted latents. For the latter two, we evaluate both white-box and black-box access, yielding five evaluation scenarios. Our analysis shows that existing methods achieve strong erasure performance against text prompts but largely fail under learned embeddings and inverted latents, with Concept Reproduction Rate (CRR) exceeding 90% in the white-box setting. To address these vulnerabilities, we propose IRECE (Inference-time Robustness Enhancement for Concept Erasure), a plug-and-play module that localizes target concepts via cross-attention and perturbs the associated latents during denoising. Experiments demonstrate that IRECE consistently restores robustness, reducing CRR by up to 40% under the most challenging white-box latent inversion scenario, while preserving visual quality. To the best of our knowledge, M-ErasureBench provides the first comprehensive benchmark of concept erasure beyond text prompts. Together with IRECE, our benchmark offers practical safeguards for building more reliable protective generative models.
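As a rough illustration of the evaluation setup described in the abstract, the sketch below enumerates the five scenarios (text prompts, plus learned embeddings and inverted latents under white-box and black-box access) and computes a reproduction rate. The abstract does not define CRR precisely, so `concept_reproduction_rate` is an assumed definition (the percentage of generated images in which a concept detector still finds the erased concept), and both function names are hypothetical rather than taken from the paper.

```python
from itertools import product

# Modalities and access levels as listed in the abstract.
ACCESS_DEPENDENT = ["learned embeddings", "inverted latents"]
ACCESS_LEVELS = ["white-box", "black-box"]


def evaluation_scenarios():
    """Enumerate the scenarios: text prompts plus 2 modalities x 2 access levels."""
    scenarios = [("text prompts", None)]
    scenarios += list(product(ACCESS_DEPENDENT, ACCESS_LEVELS))
    return scenarios


def concept_reproduction_rate(detections):
    """Assumed definition (not from the paper): percentage of generated images
    in which a hypothetical concept detector still finds the erased concept."""
    detections = list(detections)
    if not detections:
        raise ValueError("need at least one detection result")
    return 100.0 * sum(bool(d) for d in detections) / len(detections)
```

For example, under this assumed definition, if a detector flags the erased concept in three of four generated images, `concept_reproduction_rate([True, True, False, True])` gives 75.0.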

Top-level tags: model evaluation benchmark multi-modal
Detailed tags: concept erasure diffusion models multimodal evaluation robustness text-to-image

M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models


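1️⃣ One-sentence summary

This paper introduces M-ErasureBench, the first multimodal evaluation benchmark for concept erasure that goes beyond text prompts, and designs a plug-and-play module called IRECE that makes diffusion models more resistant, in real-world scenarios such as image editing, to the regeneration of harmful or protected concepts through learned embeddings or inverted latents.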

Source: arXiv: 2512.22877