arXiv submission date: 2026-02-05
📄 Abstract - Towards Worst-Case Guarantees with Scale-Aware Interpretability

Neural networks organize information according to the hierarchical, multi-scale structure of natural data. Methods to interpret model internals should be similarly scale-aware, explicitly tracking how features compose across resolutions and guaranteeing bounds on the influence of fine-grained structure that is discarded as irrelevant noise. We posit that the renormalisation framework from physics can meet this need by offering technical tools that can overcome limitations of current methods. Moreover, relevant work from adjacent fields has now matured to a point where scattered research threads can be synthesized into practical, theory-informed tools. To combine these threads in an AI safety context, we propose a unifying research agenda -- *scale-aware interpretability* -- to develop formal machinery and interpretability tools that have robustness and faithfulness properties supported by statistical physics.
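To make the coarse-graining idea concrete, the toy sketch below (not taken from the paper, which proposes an agenda rather than a specific procedure) mimics a renormalisation-style step on a layer's activations: block-average away fine-grained detail and measure how much activation energy that discards. The function names `coarse_grain` and `discarded_fraction` and the (tokens x features) activation shape are illustrative assumptions.

```python
# Illustrative sketch only: block-averaging activations as a stand-in for an
# RG-like coarse-graining step, plus a measure of how much fine-scale structure
# is thrown away -- the quantity a worst-case bound would need to control.
import numpy as np

def coarse_grain(acts: np.ndarray, block: int = 2) -> np.ndarray:
    """Average activations over non-overlapping blocks along the first axis."""
    n = (acts.shape[0] // block) * block
    return acts[:n].reshape(-1, block, *acts.shape[1:]).mean(axis=1)

def discarded_fraction(acts: np.ndarray, block: int = 2) -> float:
    """Fraction of activation energy lost when fine-scale detail is averaged out."""
    coarse = coarse_grain(acts, block)
    reconstructed = np.repeat(coarse, block, axis=0)
    residual = acts[: reconstructed.shape[0]] - reconstructed
    return float(np.linalg.norm(residual) ** 2 / np.linalg.norm(acts) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(64, 128))  # hypothetical (tokens x features) activations
    for step in range(3):
        print(f"step {step}: discarded fraction = {discarded_fraction(acts):.3f}")
        acts = coarse_grain(acts)
```

Here each iteration halves the resolution along the token axis; a scale-aware interpretability method, as the abstract describes it, would track how features transform under such steps and bound the influence of the discarded residual rather than simply ignoring it.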

Top-level tags: theory model evaluation machine learning
Detailed tags: interpretability renormalization robustness statistical physics multi-scale analysis

Towards Worst-Case Guarantees with Scale-Aware Interpretability


1️⃣ One-sentence summary

This paper proposes a new research agenda called "scale-aware interpretability", which draws on the renormalisation framework from physics to develop interpretability tools that track how neural networks compose features across scales and bound the influence of discarded fine-grained noise, with the aim of improving the interpretability and safety of AI models.

Source: arXiv: 2602.05184