Erasure or Erosion? Evaluating Compositional Degradation in Unlearned Text-To-Image Diffusion Models
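1️⃣ One-sentence summary
Through systematic experiments, this paper finds that current techniques for making large models "unlearn" a specific concept (such as nudity) commonly face a dilemma: either they erase the concept effectively but severely damage the model's ability to generate compositional images, or they preserve compositional ability but erase the concept poorly.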
Post-hoc unlearning has emerged as a practical mechanism for removing undesirable concepts from large text-to-image diffusion models. However, prior work primarily evaluates unlearning through erasure success; its impact on broader generative capabilities remains poorly understood. In this work, we conduct a systematic empirical study of concept unlearning through the lens of compositional text-to-image generation. Focusing on nudity removal in Stable Diffusion 1.4, we evaluate a diverse set of state-of-the-art unlearning methods using T2I-CompBench++ and GenEval, alongside established unlearning benchmarks. Our results reveal a consistent trade-off between unlearning effectiveness and compositional integrity: methods that achieve strong erasure frequently incur substantial degradation in attribute binding, spatial reasoning, and counting. Conversely, approaches that preserve compositional structure often fail to provide robust erasure. These findings highlight limitations of current evaluation practices and underscore the need for unlearning objectives that explicitly account for semantic preservation beyond targeted suppression.
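The trade-off described in the abstract can be made concrete with a small bookkeeping sketch. This is not the paper's evaluation code: the `MethodScores` record, the `retention` and `pareto_front` helpers, the method names, and every number below are hypothetical placeholders. The sketch only assumes that each unlearning method has been reduced to one erasure score and one compositional score, both scaled so that higher is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodScores:
    """Hypothetical per-method summary; both scores assumed in [0, 1], higher is better."""
    name: str
    erasure: float      # how reliably the target concept is suppressed
    composition: float  # aggregate compositional score (e.g. from a benchmark)


def retention(unlearned: float, base: float) -> float:
    """Fraction of the base model's compositional score that survives unlearning."""
    if base <= 0:
        raise ValueError("base score must be positive")
    return unlearned / base


def pareto_front(methods: list[MethodScores]) -> list[MethodScores]:
    """Keep the methods that no other method beats on both axes at once."""
    front = []
    for m in methods:
        dominated = any(
            o.erasure >= m.erasure
            and o.composition >= m.composition
            and (o.erasure > m.erasure or o.composition > m.composition)
            for o in methods
        )
        if not dominated:
            front.append(m)
    return front


# Placeholder values for illustration only; these are NOT results from the paper.
BASE_COMPOSITION = 0.6
methods = [
    MethodScores("method_a", erasure=0.90, composition=0.30),  # strong erasure, eroded composition
    MethodScores("method_b", erasure=0.40, composition=0.60),  # intact composition, weak erasure
    MethodScores("method_c", erasure=0.35, composition=0.50),  # worse than method_b on both axes
]

for m in pareto_front(methods):
    kept = retention(m.composition, BASE_COMPOSITION)
    print(f"{m.name}: erasure={m.erasure:.2f}, composition retained={kept:.0%}")
```

Under these assumptions, a method that sits on the front only because of one axis (like the placeholder `method_a` or `method_b`) is exactly the kind of case the abstract warns about: reporting erasure success alone would hide the loss on the other axis.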
Source: arXiv:2604.04575