Abstract - Towards a data-scale independent regulariser for robust sparse identification of non-linear dynamics
Data normalisation, a common and often necessary preprocessing step in engineering and scientific applications, can severely distort the discovery of governing equations by magnitude-based sparse regression methods. This issue is particularly acute for the Sparse Identification of Nonlinear Dynamics (SINDy) framework, where the core assumption of sparsity is undermined by the interaction between data scaling and measurement noise. The resulting discovered models can be dense, uninterpretable, and physically incorrect. To address this critical vulnerability, we introduce the Sequential Thresholding of Coefficient of Variation (STCV), a novel, computationally efficient sparse regression algorithm that is inherently robust to data scaling. STCV replaces conventional magnitude-based thresholding with a dimensionless statistical metric, the Coefficient Presence (CP), which assesses the statistical validity and consistency of candidate terms in the model library. This shift from magnitude to statistical significance makes the discovery process invariant to arbitrary data scaling. Through comprehensive benchmarking on canonical dynamical systems and practical engineering problems, including a physical mass-spring-damper experiment, we demonstrate that STCV consistently and significantly outperforms standard Sequential Thresholding Least Squares (STLSQ) and Ensemble-SINDy (E-SINDy) on normalised, noisy datasets. The results show that STCV-based methods can successfully identify the correct, sparse physical laws even when other methods fail. By mitigating the distorting effects of normalisation, STCV makes sparse system identification a more reliable and automated tool for real-world applications, thereby enhancing model interpretability and trustworthiness.
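The scale sensitivity described in the abstract can be illustrated with a small sketch. The abstract does not define the Coefficient Presence (CP) metric or the STCV iteration, so the code below is not the authors' method: it uses a plain bootstrap coefficient of variation (standard deviation divided by absolute mean of the refitted coefficients) as a stand-in dimensionless statistic. The synthetic system, the three-term library, the column scale of 100, the threshold of 0.1, and the helper names `bootstrap_coefs` and `coefficient_of_variation` are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_coefs(theta, dxdt, n_boot=200, seed=0):
    """Least-squares coefficients refitted on bootstrap resamples of the rows."""
    rng = np.random.default_rng(seed)
    n_samples, n_terms = theta.shape
    coefs = np.empty((n_boot, n_terms))
    for b in range(n_boot):
        rows = rng.integers(0, n_samples, size=n_samples)
        coefs[b] = np.linalg.lstsq(theta[rows], dxdt[rows], rcond=None)[0]
    return coefs

def coefficient_of_variation(coefs):
    """Dimensionless spread of each coefficient: std / |mean| over resamples."""
    return coefs.std(axis=0) / np.abs(coefs.mean(axis=0))

# Synthetic data (assumed for illustration): dx/dt = -2 x plus noise.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
dxdt = -2.0 * x + 0.05 * rng.standard_normal(x.size)
theta = np.column_stack([np.ones_like(x), x, x**2])  # library: 1, x, x^2

# Rescale the x column by 100, as a change of units or normalisation might.
scales = np.array([1.0, 100.0, 1.0])
theta_scaled = theta * scales

threshold = 0.1  # hypothetical magnitude threshold
coef_raw = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
coef_scaled = np.linalg.lstsq(theta_scaled, dxdt, rcond=None)[0]
keep_raw = np.abs(coef_raw) > threshold
keep_scaled = np.abs(coef_scaled) > threshold

# Same seed, so both runs use identical resamples of the rows.
cv_raw = coefficient_of_variation(bootstrap_coefs(theta, dxdt))
cv_scaled = coefficient_of_variation(bootstrap_coefs(theta_scaled, dxdt))

print("kept by magnitude (raw):   ", keep_raw)
print("kept by magnitude (scaled):", keep_scaled)
print("CV (raw):   ", cv_raw)
print("CV (scaled):", cv_scaled)
```

In this sketch the true `x` term passes the magnitude threshold on the raw library but is discarded once its column is rescaled, because its coefficient shrinks from about -2 to about -0.02. The coefficient of variation is a ratio of two quantities that scale together, so it is the same for both libraries up to rounding error.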
Towards a data-scale independent regulariser for robust sparse identification of non-linear dynamics
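1️⃣ One-sentence summary
This paper proposes a new algorithm, STCV, which selects model terms using a statistical metric that is independent of data scale, thereby addressing the tendency of conventional sparse identification methods to fail under data normalisation and noise, and markedly improving the reliability and automation of discovering true, parsimonious physical laws from complex data.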