arXiv submission date: 2026-05-07
📄 Abstract - Uncovering Entity Identity Confusion in Multimodal Knowledge Editing

Multimodal knowledge editing (MKE) aims to correct the internal knowledge of large vision-language models after deployment, yet the behavioral patterns of post-edit models remain underexplored. In this paper, we identify a systemic failure mode in edited models, termed Entity Identity Confusion (EIC): edited models exhibit an absurd behavior where text-only queries about the original entity's identity unexpectedly return information about the new entity. To rigorously investigate EIC, we construct EC-Bench, a diagnostic benchmark that directly probes how image-entity bindings shift before and after editing. Our analysis reveals that EIC stems from existing methods failing to distinguish between Image-Entity (I-E) binding and Entity-Entity (E-E) relational knowledge in the model, causing models to overfit E-E associations as a shortcut: the image is still perceived as the original entity, with the new entity's name serving only as a spurious identity label. We further explore potential mitigation strategies, showing that constraining edits to the model's I-E processing stage encourages edits to act more faithfully on I-E binding, thereby substantially reducing EIC. Based on these findings, we discuss principled desiderata for faithful MKE and provide methodological guidance for future research.
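As a rough illustration of the EIC symptom described above, the following sketch flags a case when the answer to a text-only query about the original entity mentions the new entity. It is not the EC-Bench protocol: the `EditCase` type, the function names, and the substring-matching rule are all assumptions made for illustration, and the model answers are expected to be supplied by the caller.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EditCase:
    """Hypothetical record of one edit: an image of `original` is re-labelled as `new`."""
    original: str
    new: str


def shows_eic(case: EditCase, text_only_answer: str) -> bool:
    """Return True if the answer to a text-only query about the original
    entity mentions the new entity (assumed check: case-insensitive substring)."""
    return case.new.lower() in text_only_answer.lower()


def eic_rate(cases: Sequence[EditCase], answers: Sequence[str]) -> float:
    """Fraction of cases flagged by `shows_eic`; 0.0 when there are no cases."""
    if len(cases) != len(answers):
        raise ValueError("cases and answers must have the same length")
    if not cases:
        return 0.0
    flagged = sum(shows_eic(c, a) for c, a in zip(cases, answers))
    return flagged / len(cases)
```

A stricter check would normalize entity aliases before matching, since a plain substring test misses paraphrased names and can misfire when one entity name is contained in another.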

Top-level tags: multi-modal model editing
Detailed tags: entity identity confusion, multimodal knowledge editing, benchmark, vision-language models, knowledge editing

Uncovering Entity Identity Confusion in Multimodal Knowledge Editing


1️⃣ One-sentence summary

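This paper finds that current multimodal knowledge editing methods cause large vision-language models to exhibit "Entity Identity Confusion" (EIC), in which the edited model treats the new entity's name as a mere identity label for the old entity while still perceiving the image as the old entity; the authors build a diagnostic benchmark, EC-Bench, trace the root cause to existing methods failing to distinguish the image-entity binding from entity-entity relational knowledge, and show that constraining edits to the model's image-entity processing stage substantially reduces the problem.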

Source: arXiv: 2605.06096