arXiv submission date: 2026-02-18
📄 Abstract - Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models

Transformer models achieve state-of-the-art performance across domains and tasks, yet their deeply layered representations make their predictions difficult to interpret. Existing explainability methods rely on final-layer attributions, capture either local token-level attributions or global attention patterns without unifying the two, and lack context-awareness of inter-token dependencies. They also fail to capture how relevance evolves across layers and how structural components shape decision-making. To address these limitations, we propose Context-Aware Layer-wise Integrated Gradients (CA-LIG), a unified hierarchical attribution framework that computes layer-wise Integrated Gradients within each Transformer block and fuses these token-level attributions with class-specific attention gradients. This integration yields signed, context-sensitive attribution maps that capture both supportive and opposing evidence while tracing the hierarchical flow of relevance through the Transformer layers. We evaluate CA-LIG across diverse tasks, domains, and Transformer model families: sentiment analysis and long, multi-class document classification with BERT, hate speech detection in a low-resource language setting with XLM-R and AfroLM, and image classification with a Masked Autoencoder Vision Transformer. Across all tasks and architectures, CA-LIG provides more faithful attributions, shows stronger sensitivity to contextual dependencies, and produces clearer, more semantically coherent visualizations than established explainability methods. These results indicate that CA-LIG offers a more comprehensive, context-aware, and reliable account of Transformer decision-making, advancing both the practical interpretability and conceptual understanding of deep neural models.
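To make the two-step recipe in the abstract concrete, below is a minimal PyTorch sketch of the idea: Integrated Gradients computed over one block's hidden states, then fused with class-specific attention gradients. The function names, the zero baseline, the step count, and the fusion rule are all illustrative assumptions; the paper's exact equations are not reproduced here.

```python
# Minimal sketch of the CA-LIG idea described in the abstract.
# All names and the fusion rule are assumptions, not the paper's formulation.
import torch

def layerwise_integrated_gradients(hidden, forward_from_layer, target_class,
                                   baseline=None, steps=20):
    """Integrated Gradients for one layer's hidden states w.r.t. a class logit.

    hidden:             (batch, seq_len, dim) hidden states entering the block
    forward_from_layer: callable mapping hidden states -> logits (batch, classes)
    """
    if baseline is None:
        baseline = torch.zeros_like(hidden)        # zero baseline (a common choice)
    total_grads = torch.zeros_like(hidden)
    for alpha in torch.linspace(0.0, 1.0, steps):  # Riemann sum along the straight path
        point = (baseline + alpha * (hidden - baseline)).detach().requires_grad_(True)
        logit = forward_from_layer(point)[:, target_class].sum()
        total_grads += torch.autograd.grad(logit, point)[0]
    avg_grads = total_grads / steps
    # Signed attribution: keeps supportive (+) vs. opposing (-) evidence per token.
    return ((hidden - baseline) * avg_grads).sum(dim=-1)   # (batch, seq_len)

def fuse_with_attention(token_attr, attn_weights, attn_grads):
    """Fuse token-level IG scores with class-specific attention gradients.

    An illustrative fusion: weight each token by the attention it receives,
    scaled by the attention gradient w.r.t. the target logit, averaged over
    heads and summed over query positions.
    """
    # attn_weights, attn_grads: (batch, heads, seq_len, seq_len)
    relevance = (attn_weights * attn_grads).clamp(min=0).mean(dim=1)  # (batch, q, k)
    context_weight = relevance.sum(dim=1)                             # (batch, seq_len)
    return token_attr * (1.0 + context_weight)  # context-aware rescaling (assumed form)
```

Note that the sketch deliberately keeps the sign of the IG attribution rather than taking absolute values; this is what lets the resulting map separate supportive from opposing evidence, one of the properties the abstract emphasizes.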

Top-level tags: natural language processing, model evaluation, machine learning
Detailed tags: explainable ai, transformer interpretability, integrated gradients, attention mechanisms, model attribution

Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models


1️⃣ One-Sentence Summary

This paper proposes a new method, CA-LIG, which combines layer-wise gradient computation with attention mechanisms to provide a more comprehensive, context-aware, and easy-to-understand visual explanation of the decision process of Transformer models such as BERT.

Source: arXiv:2602.16608