Information-Geometric Decomposition of Generalization Error in Unsupervised Learning
1️⃣ One-Sentence Summary
Using principles of information geometry, this paper exactly decomposes the generalization error of unsupervised learning models into three parts: model error, data bias, and variance. Taking regularized principal component analysis as an example, it shows how model selection (e.g., how many principal components to retain) trades off among these three components to find the optimum.
We decompose the Kullback--Leibler generalization error (GE) -- the expected KL divergence from the data distribution to the trained model -- of unsupervised learning into three non-negative components: model error, data bias, and variance. The decomposition is exact for any e-flat model class and follows from two identities of information geometry: the generalized Pythagorean theorem and a dual e-mixture variance identity. As an analytically tractable demonstration, we apply the framework to $\epsilon$-PCA, a regularized principal component analysis in which the empirical covariance is truncated at rank $N_K$ and discarded directions are pinned at a fixed noise floor $\epsilon$. Although rank-constrained $\epsilon$-PCA is not itself e-flat, it admits a technical reformulation with the same total GE on isotropic Gaussian data, under which each component of the decomposition takes closed form. The optimal rank emerges as the cutoff $\lambda_{\mathrm{cut}}^{*} = \epsilon$ -- the model retains exactly those empirical eigenvalues exceeding the noise floor -- with the cutoff reflecting a marginal-rate balance between model-error gain and data-bias cost. A boundary comparison further yields a three-regime phase diagram -- retain-all, interior, and collapse -- separated by the lower Marchenko--Pastur edge and an analytically computable collapse threshold $\epsilon_{*}(\alpha)$, where $\alpha$ is the dimension-to-sample-size ratio. All claims are verified numerically.
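The spectral rule in the abstract — retain exactly those empirical eigenvalues exceeding the noise floor $\epsilon$ and pin the discarded directions at $\epsilon$ — can be sketched as follows. This is a minimal illustration on isotropic Gaussian data; the function name and interface are hypothetical, not from the paper.

```python
import numpy as np

def epsilon_pca_spectrum(X, eps):
    """Illustrative sketch of the eps-PCA spectral rule from the abstract:
    keep empirical eigenvalues above the noise floor eps, pin the rest at eps.
    (Name and signature are assumptions for this demo, not the paper's code.)"""
    n, d = X.shape
    S = (X.T @ X) / n                         # empirical covariance
    evals, evecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    kept = evals > eps                        # optimal cutoff: lambda_cut* = eps
    model_evals = np.where(kept, evals, eps)  # discarded directions pinned at eps
    Sigma_model = (evecs * model_evals) @ evecs.T
    return Sigma_model, int(kept.sum())

# Isotropic Gaussian data with dimension-to-sample ratio alpha = d/n = 0.25;
# the Marchenko--Pastur bulk of S then spans roughly [(1-sqrt(alpha))^2, (1+sqrt(alpha))^2].
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
Sigma_model, rank = epsilon_pca_spectrum(X, eps=0.5)
```

Because `eps=0.5` lies above the lower Marchenko--Pastur edge $(1-\sqrt{0.25})^2 = 0.25$ here, part of the empirical spectrum falls below the floor and is truncated, which is the interior regime described in the phase diagram.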
Source: arXiv:2604.12340