arXiv submission date: 2026-05-18
📄 Abstract - Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models

Self-distillation has emerged as a promising technique for improving model performance in modern machine learning systems. We develop the statistical foundations of self-distillation in spiked covariance models, by introducing and analyzing a broad class of estimators, namely spectral shrinkage estimators. We establish that for spiked covariance matrices with $s$ spikes, $s$-step self-distillation achieves optimal performance among spectral shrinkage estimators, outperforming well-known estimators in statistics and machine learning. Moreover, we show that $s$ steps are necessary for optimality: any $(s-k)$-step distilled estimator is strictly suboptimal for $1 \leq k \leq s$. For the special subclass of isotropic covariances, we show that optimally tuned Ridge regression performs best among spectral shrinkage estimators. We also study a federated approach where multiple data centers share spectral shrinkage estimators and a common server seeks to aggregate them to achieve optimal performance. In this case, we find that the best local rule again takes the form of self-distillation, though it differs from the optimal rule when data are hosted centrally on a single server. Together, our results elucidate why self-distillation improves predictive performance and provide a broader statistical framework connecting it with classical shrinkage-based methods.
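As a rough illustration of why self-distillation can be viewed as a spectral shrinkage estimator, the sketch below assumes ridge regression as the base learner and a single mixing weight `xi` shared across all distillation steps. Both choices, along with the names `ridge`, `self_distill_ridge`, and `spectral_form`, are assumptions made for illustration; the abstract does not specify the paper's exact estimator class or tuning. Under these assumptions, each distillation step only rescales the coefficients along the singular directions of the design matrix, so the iterated fit and a closed-form spectral filter should agree.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge regression coefficients: (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def self_distill_ridge(X, y, lam, xi, steps):
    """Run `steps` rounds of self-distillation on top of a ridge teacher.

    Each student refits ridge on labels that mix the previous model's
    predictions (weight xi) with the original labels (weight 1 - xi).
    This mixing scheme is an illustrative assumption.
    """
    beta = ridge(X, y, lam)
    for _ in range(steps):
        y_mix = xi * (X @ beta) + (1.0 - xi) * y
        beta = ridge(X, y_mix, lam)
    return beta


def spectral_form(X, y, lam, xi, steps):
    """Same estimator written as a filter on the singular values of X.

    Assumes X has full column rank so every singular value is positive.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = s**2 / (s**2 + lam)  # ridge shrinkage of fitted values, per direction
    h = r.copy()             # filter of the teacher's fitted values
    for _ in range(steps):
        h = r * (xi * h + (1.0 - xi))  # one distillation step updates the filter
    # beta = V diag(h / s) U^T y
    return Vt.T @ ((h / s) * (U.T @ y))


# Small synthetic check on random data (not taken from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.5 * rng.normal(size=200)
gap = np.max(np.abs(self_distill_ridge(X, y, 1.0, 0.4, 3)
                    - spectral_form(X, y, 1.0, 0.4, 3)))
print("max abs difference between the two forms:", gap)
```

Setting `xi = 0` recovers plain ridge regression at every step, which is one way to see ridge as a member of the same family. Allowing a different mixing weight at each step would give more freedom in shaping the filter, but that extension is not shown here.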

Top-level tag: machine learning theory
Detailed tags: self-distillation, spectral shrinkage, spiked covariance, optimality, federated learning

Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models


1️⃣ One-sentence summary

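This paper proves that, in covariance models where the data have a small number $s$ of dominant directions (spikes), self-distillation run for exactly $s$ steps is optimal among spectral shrinkage estimators and outperforms well-known estimators from statistics and machine learning. In the federated setting the best local rule is again a form of self-distillation, while in the special case of isotropic covariances optimally tuned Ridge regression performs best.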

Source: arXiv: 2605.17778