What is Missing? Explaining Neurons Activated by Absent Concepts
1️⃣ One-Sentence Summary
This paper finds that in deep neural networks, some neurons are activated not because a feature is present in the input, but because that feature is absent. Mainstream explainable-AI methods struggle to surface this "absence encoding" phenomenon, so the authors propose two simple extensions to existing techniques that reveal it.
Explainable artificial intelligence (XAI) aims to provide human-interpretable insights into the behavior of deep neural networks (DNNs), typically by estimating a simplified causal structure of the model. In existing work, this causal structure often includes relationships where the presence of a concept is associated with a strong activation of a neuron. For example, attribution methods primarily identify input pixels that contribute most to a prediction, and feature visualization methods reveal inputs that cause high activation of a target neuron; the former implicitly assumes that the relevant information resides in the input, and the latter that neurons encode the presence of concepts. However, a largely overlooked type of causal relationship is that of encoded absences, where the absence of a concept increases neural activation. In this work, we show that such missing but relevant concepts are common and that mainstream XAI methods struggle to reveal them when applied in their standard form. To address this, we propose two simple extensions to attribution and feature visualization techniques that uncover encoded absences. Across experiments, we show how mainstream XAI methods can be used to reveal and explain encoded absences, how ImageNet models exploit them, and that debiasing can be improved when considering them.
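The core difficulty described above can be illustrated with a toy sketch (this is a hypothetical construction for intuition, not the paper's actual models or methods): a neuron that fires when a concept is *absent* receives zero relevance from a standard gradient-times-input attribution, because the input value at the absent feature is zero. A simple attribution over the feature's complement, one possible reading of the paper's "extension" idea, does credit the absence.

```python
import numpy as np

# Toy "absence-coded" neuron: activation is high when the concept
# feature x (presence strength in [0, 1]) is missing.
# Hypothetical illustration, not the paper's architecture.
def neuron(x):
    return max(0.0, 1.0 - x)  # ReLU(1 - x): fires on absence

def numeric_grad(f, x, eps=1e-4):
    # Central-difference gradient of f at x.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def grad_times_input(f, x):
    # Standard attribution: relevance = gradient * input value.
    return numeric_grad(f, x) * x

def absence_attribution(f, x):
    # Hypothetical extension: attribute the complement (x - 1),
    # i.e. treat the missing amount of the concept as the signal.
    return numeric_grad(f, x) * (x - 1.0)

x_absent = 0.0
activation = neuron(x_absent)                    # neuron fires strongly
standard = grad_times_input(neuron, x_absent)    # ~0: absence gets no credit
absence = absence_attribution(neuron, x_absent)  # nonzero: absence is credited
print(activation, standard, absence)
```

The sketch shows why standard attribution in its "presence" form is blind here: the gradient at the absent feature is nonzero (the neuron is sensitive to it), but multiplying by an input value of zero erases the evidence.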
Source: arXiv:2603.09787