arXiv submission date: 2026-03-11
📄 Abstract - Attention Gathers, MLPs Compose: A Causal Analysis of an Action-Outcome Circuit in VideoViT

The paper explores how video models trained for classification represent nuanced, hidden semantic information that may not affect the final output, a key challenge for trustworthy AI. Using mechanistic interpretability techniques from explainable AI, the internal circuit responsible for representing an action's outcome is reverse-engineered in a pre-trained video vision transformer, revealing that the "Success vs. Failure" signal is computed through a distinct amplification cascade: although low-level differences are observable from layer 0, the abstract, semantic representation of the outcome is progressively amplified from layers 5 through 11. Causal analysis, primarily activation patching supported by ablation results, reveals a clear division of labor: attention heads act as "evidence gatherers", providing the low-level information needed for partial signal recovery, while MLP blocks function as robust "concept composers", each serving as a primary driver of the "success" signal. This distributed, redundant circuit explains the model's resilience to simple ablations and demonstrates a core computational pattern for processing human-action outcomes. Crucially, the existence of such a sophisticated circuit for representing complex outcomes, even in a model trained only for simple classification, shows that models can develop forms of "hidden knowledge" beyond their explicit task, underscoring the need for mechanistic oversight when building genuinely explainable and trustworthy AI systems intended for deployment.
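The causal method the abstract leans on, activation patching, can be illustrated with a minimal sketch: run the model on a clean and a corrupted input, cache the clean activations, splice one cached activation into the corrupted run, and measure how much of the clean output is recovered. The toy residual model and inputs below are hypothetical stand-ins, not VideoViT or the paper's actual setup.

```python
# Hypothetical toy model: three residual blocks acting on a scalar "stream".
# Illustrative sketch of activation patching only, not the paper's code.

def run(blocks, x, cache=None, patch=None):
    """Run the residual model; optionally record or splice block outputs."""
    stream = x
    for i, block in enumerate(blocks):
        out = block(stream)
        if patch is not None and i == patch[0]:
            out = patch[1]          # splice in the cached clean activation
        if cache is not None:
            cache.append(out)
        stream = stream + out       # residual connection
    return stream

blocks = [lambda s: 2 * s, lambda s: s + 1, lambda s: 0.5 * s]

clean_cache = []
clean_out = run(blocks, 1.0, cache=clean_cache)   # stand-in "success" clip
corrupt_out = run(blocks, -1.0)                   # stand-in "failure" clip

# Patch each block's clean activation into the corrupted run; a recovery of
# 1.0 means the clean output is fully restored, 0.0 means no causal effect.
recoveries = []
for i in range(len(blocks)):
    patched = run(blocks, -1.0, patch=(i, clean_cache[i]))
    recoveries.append((patched - corrupt_out) / (clean_out - corrupt_out))
print([round(r, 3) for r in recoveries])  # → [0.667, 0.5, 0.333]
```

Each block only partially restores the clean output, which mirrors the paper's finding that the outcome signal is distributed across components rather than localized in one; ablation is the complementary intervention (zeroing a component instead of splicing in a clean activation).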

Top-level tags: computer vision, model evaluation, theory
Detailed tags: mechanistic interpretability, vision transformer, causal analysis, attention, mlp

Attention Gathers, MLPs Compose: A Causal Analysis of an Action-Outcome Circuit in VideoViT


1️⃣ One-Sentence Summary

Through causal analysis, this paper reveals a hidden circuit inside a video classification model dedicated to processing whether an action "succeeds or fails": attention heads gather low-level evidence while MLPs robustly compose concepts. This explains how the model can form complex "hidden knowledge" beyond its training task, and highlights the importance of mechanistic oversight when building trustworthy AI systems.

Source: arXiv 2603.11142