Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models

📄 Abstract - Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models

Self-distillation has emerged as a promising technique for improving model performance in modern machine learning systems. We develop the statistical foundations of self-distillation in spiked covariance models by introducing and analyzing a broad class of estimators, namely spectral shrinkage estimators. We establish that for spiked covariance matrices with $s$ spikes, $s$-step self-distillation achieves optimal performance among spectral shrinkage estimators, outperforming well-known estimators in statistics and machine learning. Moreover, we show that $s$ steps are necessary for optimality: any $(s-k)$-step distilled estimator is strictly suboptimal for $1 \leq k \leq s$. For the special subclass of isotropic covariances, we show that optimally tuned Ridge regression performs best among spectral shrinkage estimators. We also study a federated approach where multiple data centers share spectral shrinkage estimators and a common server seeks to aggregate them to achieve optimal performance. In this case, we find that the best local rule again takes the form of self-distillation, though it differs from the optimal rule when data are hosted centrally on a single server. Together, our results elucidate why self-distillation improves predictive performance and provide a broader statistical framework connecting it with classical shrinkage-based methods.
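To make "spectral shrinkage estimator" concrete, here is a minimal NumPy sketch for linear regression. It assumes one common formulation, which is not necessarily the paper's: with sample covariance $\hat\Sigma = X^\top X / n = V \operatorname{diag}(\lambda) V^\top$, a spectral shrinkage estimator is $\hat\beta_f = V \operatorname{diag}(f(\lambda)) V^\top X^\top y / n$ for some scalar filter $f$; ridge uses $f(\lambda) = 1/(\lambda + \rho)$; and each self-distillation step refits ridge on the mixed targets $\xi_t X\hat\beta_{t-1} + (1-\xi_t) y$. Under these assumptions every step stays inside the spectral class, with filter recursion $f_t(\lambda) = \xi_t \frac{\lambda}{\lambda+\rho} f_{t-1}(\lambda) + \frac{1-\xi_t}{\lambda+\rho}$ and $f_0(\lambda) = 1/(\lambda+\rho)$. The helper names `ridge`, `self_distill`, `sd_filter`, `spectral_shrinkage` and the mixing weights `xis` are hypothetical.

```python
import numpy as np

def ridge(X, y, rho):
    """Ridge estimate (X^T X / n + rho I)^{-1} X^T y / n."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + rho * np.eye(p), X.T @ y / n)

def self_distill(X, y, rho, xis):
    """Assumed self-distillation: refit ridge on xi * (teacher predictions) + (1 - xi) * y."""
    beta = ridge(X, y, rho)
    for xi in xis:
        targets = xi * (X @ beta) + (1 - xi) * y
        beta = ridge(X, targets, rho)
    return beta

def sd_filter(lam, rho, xis):
    """Spectral filter f_t(lambda) implied by the self-distillation recursion above."""
    h = 1.0 / (lam + rho)      # ridge filter
    g = lam / (lam + rho)      # ridge "hat" factor applied to the teacher
    f = h.copy()
    for xi in xis:
        f = xi * g * f + (1 - xi) * h
    return f

def spectral_shrinkage(X, y, filt):
    """Generic spectral shrinkage: apply filt to eigenvalues of X^T X / n."""
    n = X.shape[0]
    lam, V = np.linalg.eigh(X.T @ X / n)   # columns of V are eigenvectors
    return V @ (filt(lam) * (V.T @ (X.T @ y / n)))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
y = X @ rng.standard_normal(8) + 0.5 * rng.standard_normal(50)
rho, xis = 0.5, [0.8, 0.5]

beta_sd = self_distill(X, y, rho, xis)
beta_spec = spectral_shrinkage(X, y, lambda lam: sd_filter(lam, rho, xis))
print(np.allclose(beta_sd, beta_spec))  # True
```

In this sketch each extra step contributes one free mixing weight $\xi_t$ to the filter. That offers an informal intuition for why the number of steps might need to grow with the number of spikes, but it is not the paper's proof of the $s$-step result.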


1️⃣ One-Sentence Summary

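Through rigorous proofs, the paper shows that in covariance models where the data have a small number of dominant directions (spikes), running self-distillation for exactly as many steps as there are spikes outperforms other common statistical and machine-learning estimators, and that in the federated setting the best local rule again takes the form of self-distillation.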
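The "spikes" here are a few population eigenvalues that stand out from an otherwise flat spectrum. A minimal simulation sketch, assuming the common additive spiked model $\Sigma = I_p + \sum_{j=1}^{s} \ell_j u_j u_j^\top$ with orthonormal $u_j$ (the paper's exact parametrization may differ), shows the top $s$ sample eigenvalues separating from the noise bulk near the Marchenko-Pastur edge $(1+\sqrt{p/n})^2$. The helpers `spiked_covariance` and `sample_eigenvalues` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def spiked_covariance(p, spikes, rng):
    """Population covariance I_p + sum_j spikes[j] * u_j u_j^T with random orthonormal u_j."""
    U, _ = np.linalg.qr(rng.standard_normal((p, len(spikes))))  # p x s orthonormal columns
    return np.eye(p) + U @ np.diag(spikes) @ U.T

def sample_eigenvalues(n, sigma, rng):
    """Eigenvalues, in descending order, of the sample covariance of n Gaussian draws."""
    p = sigma.shape[0]
    L = np.linalg.cholesky(sigma)
    X = rng.standard_normal((n, p)) @ L.T  # each row is distributed as N(0, sigma)
    return np.linalg.eigvalsh(X.T @ X / n)[::-1]

rng = np.random.default_rng(0)
p, n = 100, 1000
spikes = [24.0, 15.0, 8.0]  # s = 3; population eigenvalues 25, 16, 9, the rest equal 1
eigs = sample_eigenvalues(n, spiked_covariance(p, spikes, rng), rng)
bulk_edge = (1 + np.sqrt(p / n)) ** 2  # Marchenko-Pastur upper edge for the identity part
print(eigs[:5].round(2), round(bulk_edge, 2))
```

With these settings the three spike eigenvalues sit far above the bulk edge (about 1.73), while the fourth eigenvalue lies near it. This only illustrates what $s$ counts; that the optimal number of distillation steps equals $s$ is the paper's result and is not demonstrated by the simulation.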
