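The Observer Effect in World Models: Invasive Adaptation Corrupts Latent Physics
1️⃣ One-sentence summary
This paper finds that, when evaluating whether a neural network has truly learned physical laws, traditional "invasive" evaluation methods such as fine-tuning or high-capacity probes can corrupt the latent physical structure the model has already learned, whereas the proposed "non-invasive" linear decoding method reveals more accurately whether the model has internalized a physical world model.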
Determining whether neural models internalize physical laws as world models, rather than exploiting statistical shortcuts, remains challenging, especially under out-of-distribution (OOD) shifts. Standard evaluations often test latent capability via downstream adaptation (e.g., fine-tuning or high-capacity probes), but such interventions can change the representations being measured and thus confound what was learned during self-supervised learning (SSL). We propose a non-invasive evaluation protocol, PhyIP. We test whether physical quantities are linearly decodable from frozen representations, motivated by the linear representation hypothesis. Across fluid dynamics and orbital mechanics, we find that when SSL achieves low error, latent structure becomes linearly accessible. PhyIP recovers internal energy and Newtonian inverse-square scaling on OOD tests (e.g., $\rho > 0.90$). In contrast, adaptation-based evaluations can collapse this structure ($\rho \approx 0.05$). These findings suggest that adaptation-based evaluation can obscure latent structures and that low-capacity probes offer a more accurate evaluation of physical world models.
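To make the idea of a non-invasive linear probe concrete, the following is a minimal sketch, not the authors' PhyIP implementation. It assumes the frozen representations are already available as a NumPy array; here they are replaced by synthetic Gaussian features, and the "physical quantity" is a hypothetical linear function of them plus noise. The ridge penalty, the mean shift used to imitate an OOD split, and the use of Pearson correlation for $\rho$ are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def fit_linear_probe(z, y, ridge=1e-3):
    """Fit a ridge-regularized linear map from frozen features z to target y.

    Only the probe weights are estimated; the features themselves are
    never updated, which is what makes the evaluation non-invasive.
    """
    z_mean, y_mean = z.mean(axis=0), y.mean()
    zc = z - z_mean
    gram = zc.T @ zc + ridge * np.eye(z.shape[1])
    w = np.linalg.solve(gram, zc.T @ (y - y_mean))
    b = y_mean - z_mean @ w
    return w, b


def probe_correlation(z, y, w, b):
    """Pearson correlation between probe predictions and the true quantity."""
    pred = z @ w + b
    return float(np.corrcoef(pred, y)[0, 1])


# Synthetic stand-in for frozen SSL representations (assumption: 16-dim,
# Gaussian) and for a physical quantity that is linearly encoded in them.
rng = np.random.default_rng(0)
dim = 16
w_true = rng.normal(size=dim)


def synthetic_split(n, shift):
    """Draw n feature vectors centered at `shift` and their noisy targets."""
    z = rng.normal(loc=shift, scale=1.0, size=(n, dim))
    y = z @ w_true + 0.1 * rng.normal(size=n)
    return z, y


z_train, y_train = synthetic_split(500, shift=0.0)  # in-distribution
z_ood, y_ood = synthetic_split(200, shift=2.0)      # shifted test split

w, b = fit_linear_probe(z_train, y_train)
rho_ood = probe_correlation(z_ood, y_ood, w, b)
print(f"OOD probe correlation: {rho_ood:.3f}")
```

Because the probe is a single linear map fitted on fixed features, a high correlation on the shifted split can only come from structure already present in the representation, which is the rationale the abstract gives for preferring low-capacity probes over fine-tuning.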
Source: arXiv: 2602.12218