PCA of probability measures: Sparse and Dense sampling regimes
1️⃣ One-sentence summary
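This paper studies how to perform principal component analysis on multiple probability distributions, shows how the convergence rate of the analysis changes between the regime where each distribution is observed through only a few samples (sparse) and the regime where it is observed through many samples (dense), and shows that appropriate subsampling reduces computational cost while preserving accuracy.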
A common approach to perform PCA on probability measures is to embed them into a Hilbert space where standard functional PCA techniques apply. While convergence rates for estimating the embedding of a single measure from $m$ samples are well understood, the literature has not addressed the setting involving multiple measures. In this paper, we study PCA in a double asymptotic regime where $n$ probability measures are observed, each through $m$ samples. We derive convergence rates of the form $n^{-1/2} + m^{-\alpha}$ for the empirical covariance operator and the PCA excess risk, where $\alpha>0$ depends on the chosen embedding. This characterizes the relationship between the number $n$ of measures and the number $m$ of samples per measure, revealing a sparse (small $m$) to dense (large $m$) transition in the convergence behavior. Moreover, we prove that the dense-regime rate is minimax optimal for the empirical covariance error. Our numerical experiments validate these theoretical rates and demonstrate that appropriate subsampling preserves PCA accuracy while reducing computational cost.
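The pipeline described in the abstract (embed each measure into a Hilbert space, form the empirical covariance operator, then run PCA) can be illustrated with a small NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the embedding is a random Fourier feature approximation of a Gaussian-kernel mean embedding, the measures are one-dimensional Gaussians $N(\mu_i, 1)$, and the names `rff_embed`, `exact_gaussian_embed`, and `covariance_pca` are hypothetical. The reference covariance uses the closed form $\mathbb{E}[\cos(\omega X + b)] = e^{-\omega^2/2}\cos(\omega\mu + b)$ for $X \sim N(\mu, 1)$, so the reported error isolates the effect of the number $m$ of samples per measure for a fixed set of $n$ measures.

```python
import numpy as np

def rff_embed(samples, freqs, phases):
    """Empirical embedding of each measure from its samples.

    samples: (n, m) array, row i holds m draws from measure i.
    Returns an (n, D) array: the average random Fourier feature per measure.
    """
    D = freqs.shape[0]
    feats = np.sqrt(2.0 / D) * np.cos(samples[..., None] * freqs + phases)
    return feats.mean(axis=1)

def exact_gaussian_embed(means, freqs, phases):
    """Exact embedding of N(mu, 1) under the same feature map (closed form)."""
    D = freqs.shape[0]
    damp = np.exp(-0.5 * freqs ** 2)
    return np.sqrt(2.0 / D) * damp * np.cos(means[:, None] * freqs + phases)

def covariance_pca(embeddings, k):
    """Empirical covariance of the embeddings and its top-k eigenpairs."""
    centered = embeddings - embeddings.mean(axis=0)
    cov = centered.T @ centered / embeddings.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order
    order = np.argsort(eigvals)[::-1][:k]        # keep the k largest
    return cov, eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
n, D, k = 100, 20, 3                             # toy sizes (assumed)
freqs = rng.normal(size=D)                       # Gaussian-kernel frequencies
phases = rng.uniform(0.0, 2.0 * np.pi, size=D)
means = rng.normal(size=n)                       # measure i is N(means[i], 1)

# Covariance of the exactly embedded measures (no within-measure sampling error).
cov_ref, top_vals, top_vecs = covariance_pca(
    exact_gaussian_embed(means, freqs, phases), k
)

# Covariance error as the number m of samples per measure grows.
errors = {}
for m in (5, 50, 500):
    samples = means[:, None] + rng.normal(size=(n, m))
    cov_m, _, _ = covariance_pca(rff_embed(samples, freqs, phases), k)
    errors[m] = np.linalg.norm(cov_m - cov_ref, ord="fro")

for m, err in errors.items():
    print(f"m = {m:4d}   covariance error (Frobenius) = {err:.4f}")
```

In this toy setting the error shrinks as $m$ grows with $n$ held fixed, which is the $m^{-\alpha}$ part of the rate; the $n^{-1/2}$ part would require comparing against the population covariance over the distribution of measures, which this sketch does not attempt.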
Source: arXiv: 2602.02190