Analysis of Nystrom method with sequential ridge leverage scores
1️⃣ One-sentence summary
This paper proposes an algorithm called INK-ESTIMATE that, for large-scale kernel ridge regression, incrementally and efficiently estimates the ridge leverage scores in a single pass over the data with a fixed, small memory footprint, so that Nystrom sampling guarantees the reconstruction quality of the approximate kernel matrix and the prediction performance at every step.
Large-scale kernel ridge regression (KRR) is limited by the need to store a large kernel matrix K_t. To avoid storing the entire matrix, Nystrom methods subsample a subset of its columns and efficiently find an approximate KRR solution on the reconstructed matrix. The chosen subsampling distribution in turn governs the statistical and computational tradeoffs. For KRR problems, recent works show that a sampling distribution proportional to the ridge leverage scores (RLSs) provides strong reconstruction guarantees for the approximation. While exact RLSs are as difficult to compute as a KRR solution, they can be approximated well enough to be useful. In this paper, we study KRR problems in a sequential setting and introduce the INK-ESTIMATE algorithm, which incrementally computes RLS estimates. INK-ESTIMATE maintains a small sketch of K_t that is used at each step to compute an intermediate estimate of the RLSs. First, the sketch update does not require access to previously seen columns, so a single pass over the kernel matrix suffices. Second, the algorithm runs within a fixed, small space budget that depends only on the effective dimension of the kernel matrix. Finally, because all our guarantees hold at any intermediate step, the sketch provides strong approximation guarantees, at any time, on the distance between the true kernel matrix and its approximation, and on the statistical risk of the approximate KRR solution.
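As a concrete illustration of the sampling scheme the abstract describes (not the paper's INK-ESTIMATE algorithm, which works incrementally from a sketch, whereas this batch sketch computes exact RLSs via a matrix inverse), ridge leverage scores are the diagonal entries of K(K + γI)^{-1}, and a Nystrom approximation sampled proportionally to them can be built as follows. The RBF kernel, γ, and the number of landmarks m are illustrative choices, not values from the paper:

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Gaussian (RBF) kernel from pairwise squared distances.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def ridge_leverage_scores(K, gamma):
    # Exact RLSs: l_i = (K (K + gamma I)^{-1})_{ii}, each in (0, 1).
    # This is the O(n^3) batch computation the paper avoids.
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + gamma * np.eye(n)))

def nystrom_from_rls(K, gamma, m, rng):
    # Sample m columns with probability proportional to the RLSs,
    # then form the standard Nystrom reconstruction C W^+ C^T.
    scores = ridge_leverage_scores(K, gamma)
    p = scores / scores.sum()
    idx = rng.choice(K.shape[0], size=m, replace=False, p=p)
    C = K[:, idx]                 # sampled columns
    W = K[np.ix_(idx, idx)]       # intersection block
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
K = rbf_kernel(X, X)
K_approx = nystrom_from_rls(K, gamma=0.1, m=50, rng=rng)
err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

The sum of the RLSs is the effective dimension of the kernel matrix, which is why the paper's space budget can depend on that quantity rather than on n.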
Source: arXiv: 2604.20077