Abstract - Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics
Generative AI emulators are increasingly used in scientific domains where we already have strong theory, benchmarks, and physical intuition. This raises a central evaluation and interpretability question: when a foundation-style model can reproduce known continuum dynamics, what internal mechanism supports that behavior, is that internal behavior consistent with known physics, and how does it relate to where the emulator succeeds or fails? We investigate a cross-domain foundation model for continuum dynamics, Walrus by Polymathic, using mechanistic interpretability guided by physical principles. We apply a sparse autoencoder (SAE) to probe a selected layer, and address the practical challenge of triaging a large feature set (over 20,000) using enstrophy as a physically grounded metric. As a deliberately simple testbed, we focus on shear flow and compare feature recruitment across multiple shear-flow setups, i.e., different parameter values in the numerical simulation. Across setups we find evidence of piecewise consistency, with subsets of features recurring in similar roles, but this structure is intermittent and does not map cleanly onto standard physical decompositions. In parallel, direct comparisons between numerical simulation and the emulator reveal systematic output-level discrepancies, including regimes where energy or structures become too diffuse or too localized. We connect parts of these discrepancies to changes in specific SAE feature usage. Our work highlights open questions for scientific foundation models: how to robustly prioritize mechanistically meaningful features, how to separate stable structure from analysis artifacts (including single-layer and SAE limitations), and how to use established benchmarks to decide when "different" internal representations are genuinely informative rather than merely effective.
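For readers unfamiliar with the enstrophy metric mentioned in the abstract, the following is a minimal sketch of how a domain-averaged enstrophy, 0.5 times the mean squared vorticity, might be computed for a 2D velocity field. It is an illustration only, not the authors' pipeline: the function names, the `[y, x]` array layout, the doubly periodic boundary assumption, and the central-difference stencil are all assumptions made here.

```python
import numpy as np


def vorticity(u, v, dx, dy):
    """Scalar vorticity w = dv/dx - du/dy on a 2D grid indexed [y, x].

    Assumption: the domain is periodic in both directions, so np.roll
    supplies the wrap-around neighbours for central differences.
    """
    dv_dx = (np.roll(v, -1, axis=1) - np.roll(v, 1, axis=1)) / (2.0 * dx)
    du_dy = (np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0)) / (2.0 * dy)
    return dv_dx - du_dy


def mean_enstrophy(u, v, dx, dy):
    """Domain-averaged enstrophy: 0.5 * mean(w**2)."""
    w = vorticity(u, v, dx, dy)
    return 0.5 * float(np.mean(w ** 2))


# Toy shear profile (hypothetical test case): u = sin(y), v = 0 on [0, 2*pi)^2.
n = 128
length = 2.0 * np.pi
spacing = length / n
coords = np.arange(n) * spacing
y, x = np.meshgrid(coords, coords, indexing="ij")  # axis 0 is y, axis 1 is x

u = np.sin(y)
v = np.zeros_like(u)

z = mean_enstrophy(u, v, spacing, spacing)
print(f"mean enstrophy: {z:.4f}")
```

For this toy profile the vorticity is -cos(y), so the analytic domain-averaged enstrophy is 0.25, and the finite-difference estimate should land close to that value. How such a quantity is actually used to triage SAE features is described in the paper itself, not in this sketch.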
1️⃣ One-sentence summary
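By analyzing the internal workings of Walrus, a foundation AI model that can emulate continuum dynamics, the paper finds that the features the model learns do not fully correspond to classical physical decompositions and that, under certain conditions, the model's energy distribution deviates from the numerical simulation, revealing the challenges of interpreting such models through physical intuition.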