From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models
1️⃣ One-sentence summary
This paper proposes a new method called HONES, which analyzes the relationships between a model's internal attention heads and its neurons to more accurately identify and steer the neurons critical to different vision-language tasks, improving both performance and interpretability in multi-task settings.
Recent work has increasingly explored neuron-level interpretation in vision-language models (VLMs) to identify neurons critical to final predictions. However, existing neuron analyses generally focus on single tasks, limiting the comparability of neuron importance across tasks. Moreover, ranking strategies tend to score neurons in isolation, overlooking how task-dependent information pathways shape the write-in effects of feed-forward network (FFN) neurons. This oversight can exacerbate neuron polysemanticity in multi-task settings, introducing noise into the identification and intervention of task-critical neurons. In this study, we propose HONES (Head-Oriented Neuron Explanation & Steering), a gradient-free framework for task-aware neuron attribution and steering in multi-task VLMs. HONES ranks FFN neurons by their causal write-in contributions conditioned on task-relevant attention heads, and further modulates salient neurons via lightweight scaling. Experiments on four diverse multimodal tasks and two popular VLMs show that HONES outperforms existing methods in identifying task-critical neurons and improves model performance after steering. Our source code is released at: this https URL.
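The "lightweight scaling" the abstract describes can be pictured as multiplying the hidden activations of selected FFN neurons before they are written back into the residual stream. The sketch below is a minimal toy illustration of that idea only; the FFN shapes, the chosen neuron indices, and the 1.5× factor are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def ffn_forward(x, W_in, W_out, neuron_scale=None):
    """Toy FFN block: h = relu(x @ W_in); out = h @ W_out.

    neuron_scale: optional per-neuron multipliers applied to the hidden
    activations -- the kind of lightweight scaling a HONES-style steering
    step could use (illustrative sketch, not the paper's exact method).
    """
    h = np.maximum(x @ W_in, 0.0)
    if neuron_scale is not None:
        h = h * neuron_scale  # amplify or suppress selected neurons
    return h @ W_out

rng = np.random.default_rng(0)
d_model, d_ff = 4, 8
W_in = rng.standard_normal((d_model, d_ff))
W_out = rng.standard_normal((d_ff, d_model))
x = rng.standard_normal(d_model)

# Suppose neurons 2 and 5 were ranked task-critical; boost them by 1.5x
# (hypothetical indices and factor, chosen for illustration).
scale = np.ones(d_ff)
scale[[2, 5]] = 1.5
baseline = ffn_forward(x, W_in, W_out)
steered = ffn_forward(x, W_in, W_out, scale)
```

Because the FFN output is linear in the hidden activations, scaling a neuron by 1.5× shifts the output by exactly 0.5× that neuron's original write-in contribution, which is why such interventions are cheap to apply and easy to attribute.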
Source: arXiv:2604.17941