arXiv submission date: 2026-04-08
📄 Abstract - The Theory and Practice of Highly Scalable Gaussian Process Regression with Nearest Neighbours

Gaussian process ($GP$) regression is a widely used non-parametric modeling tool, but its cubic complexity in the training size limits its use on massive data sets. A practical remedy is to predict using only the nearest neighbours of each test point, as in Nearest Neighbour Gaussian Process ($NNGP$) regression for geospatial problems and the related scalable $GPnn$ method for more general machine-learning applications. Despite their strong empirical performance, the large-$n$ theory of $NNGP/GPnn$ remains incomplete. We develop a theoretical framework for $NNGP$ and $GPnn$ regression. Under mild regularity assumptions, we derive almost sure pointwise limits for three key predictive criteria: mean squared error ($MSE$), calibration coefficient ($CAL$), and negative log-likelihood ($NLL$). We then study the $L_2$-risk, prove universal consistency, and show that the risk attains Stone's minimax rate $n^{-2\alpha/(2p+d)}$, where $\alpha$ and $p$ capture regularity of the regression problem. We also prove uniform convergence of $MSE$ over compact hyper-parameter sets and show that its derivatives with respect to lengthscale, kernel scale, and noise variance vanish asymptotically, with explicit rates. This explains the observed robustness of $GPnn$ to hyper-parameter tuning. These results provide a rigorous statistical foundation for $NNGP/GPnn$ as a highly scalable and principled alternative to full $GP$ models.
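To make the core idea concrete, here is a minimal sketch of nearest-neighbour GP prediction: for each test point, keep only its `m` nearest training points and apply the standard GP predictive formulas to that small subset, so the cubic cost is in `m` rather than `n`. This is not the authors' implementation. The squared-exponential kernel, the zero prior mean, the brute-force neighbour search, and all function names (`nn_gp_predict`, `pointwise_scores`) are assumptions made for illustration. The per-point score definitions are likewise assumed to be the commonly used ones: squared error, squared error divided by predictive variance as the calibration ratio, and the Gaussian negative log-density.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rbf(a, b, lengthscale, kernel_scale):
    """Squared-exponential kernel (an assumed choice, not taken from the paper)."""
    return kernel_scale * math.exp(-0.5 * sq_dist(a, b) / lengthscale ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def nn_gp_predict(X, y, x_star, m, lengthscale=1.0, kernel_scale=1.0, noise_var=0.1):
    """Predictive mean and variance of a noisy response at x_star,
    using only the m nearest training points (zero prior mean assumed)."""
    # Brute-force search costs O(n) per query; a spatial index would scale better.
    idx = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x_star))[:m]
    Xn = [X[i] for i in idx]
    yn = [y[i] for i in idx]
    # m x m kernel matrix of the neighbours, with noise variance on the diagonal.
    K = [[rbf(Xn[i], Xn[j], lengthscale, kernel_scale) + (noise_var if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    k_star = [rbf(xi, x_star, lengthscale, kernel_scale) for xi in Xn]
    L = cholesky(K)
    alpha = chol_solve(L, yn)      # K^{-1} y_neighbours
    v = chol_solve(L, k_star)      # K^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = kernel_scale + noise_var - sum(k * w for k, w in zip(k_star, v))
    return mean, var

def pointwise_scores(y_true, mean, var):
    """Squared error, calibration ratio, and Gaussian NLL at one test point
    (assumed standard definitions; MSE, CAL, NLL would average these)."""
    se = (y_true - mean) ** 2
    cal = se / var
    nll = 0.5 * (math.log(2.0 * math.pi * var) + se / var)
    return se, cal, nll

# Toy usage on synthetic data: y = sin(x) + Gaussian noise.
random.seed(0)
X = [(random.uniform(0.0, 6.0),) for _ in range(500)]
y = [math.sin(x[0]) + random.gauss(0.0, 0.1) for x in X]
mean, var = nn_gp_predict(X, y, (1.0,), m=20, noise_var=0.01)
print(mean, var, pointwise_scores(math.sin(1.0), mean, var))
```

Each prediction factorises only an `m` by `m` matrix, so the cost per test point does not grow with `n` apart from the neighbour search. Rerunning the toy example with different `lengthscale`, `kernel_scale`, or `noise_var` values is a quick way to probe the hyper-parameter robustness that the abstract describes.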

Top-level tags: machine learning theory model training
Detailed tags: gaussian processes scalability nearest neighbors regression statistical theory

The Theory and Practice of Highly Scalable Gaussian Process Regression with Nearest Neighbours


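1️⃣ One-sentence summary

This paper builds a complete theoretical framework for nearest-neighbour Gaussian process regression, proving that on large-scale data sets the method is not only computationally efficient but also statistically optimal in predictive accuracy and robust to hyper-parameter choices, which gives practical applications a solid theoretical foundation.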

Source: arXiv: 2604.07267