arXiv submission date: 2026-01-29
📄 Abstract - Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks

Kolmogorov--Arnold Networks (KANs) have recently emerged as a structured alternative to standard MLPs, yet a principled theory for their training dynamics, generalization, and privacy properties remains limited. In this paper, we analyze gradient descent (GD) for training two-layer KANs and derive general bounds that characterize their training dynamics, generalization, and utility under differential privacy (DP). As a concrete instantiation, we specialize our analysis to logistic loss under an NTK-separable assumption, where we show that polylogarithmic network width suffices for GD to achieve an optimization rate of order $1/T$ and a generalization rate of order $1/n$, with $T$ denoting the number of GD iterations and $n$ the sample size. In the private setting, we characterize the noise required for $(\epsilon,\delta)$-DP and obtain a utility bound of order $\sqrt{d}/(n\epsilon)$ (with $d$ the input dimension), matching the classical lower bound for general convex Lipschitz problems. Our results imply that polylogarithmic width is not only sufficient but also necessary under differential privacy, revealing a qualitative gap between non-private (sufficiency only) and private (necessity also emerges) training regimes. Experiments further illustrate how these theoretical insights can guide practical choices, including network width selection and early stopping.
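To make the setting concrete, here is a minimal sketch of a two-layer KAN trained by full-batch gradient descent on the logistic loss, as in the abstract. It assumes a Gaussian-RBF basis expansion on each edge (the paper's construction may instead use B-splines), and all names, hyperparameters, and initialization scales below are illustrative assumptions, not the authors' exact architecture.

```python
import jax
import jax.numpy as jnp

N_BASIS = 8                                  # basis functions per edge (assumed)
CENTERS = jnp.linspace(-2.0, 2.0, N_BASIS)   # fixed grid of basis centers

def phi(x, bandwidth=0.5):
    """Univariate basis expansion; output shape = x.shape + (N_BASIS,)."""
    return jnp.exp(-((x[..., None] - CENTERS) ** 2) / (2.0 * bandwidth ** 2))

def init_params(key, d, width):
    k1, k2 = jax.random.split(key)
    return {
        # first layer: one coefficient vector per (input dim, hidden unit) edge
        "W1": jax.random.normal(k1, (d, width, N_BASIS)) / jnp.sqrt(d * N_BASIS),
        # second layer: one coefficient vector per (hidden unit, output) edge
        "W2": jax.random.normal(k2, (width, N_BASIS)) / jnp.sqrt(width * N_BASIS),
    }

def forward(params, X):
    h = jnp.einsum("ndb,dwb->nw", phi(X), params["W1"])   # sum of learned edge functions
    return jnp.einsum("nwb,wb->n", phi(h), params["W2"])  # scalar output per example

def logistic_loss(params, X, y):
    # labels y in {-1, +1}
    return jnp.mean(jnp.log1p(jnp.exp(-y * forward(params, X))))

@jax.jit
def gd_step(params, X, y, lr=0.1):
    grads = jax.grad(logistic_loss)(params, X, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```

Running `gd_step` for $T$ iterations on $n$ samples is the regime in which the abstract's $1/T$ optimization rate and $1/n$ generalization rate are stated, with `width` only needing to grow polylogarithmically.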

Top-level tags: theory, machine learning, model training
Detailed tags: kolmogorov-arnold networks, gradient descent, differential privacy, generalization bounds, optimization theory

Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks


1️⃣ One-sentence summary

This paper gives the first systematic theoretical analysis of gradient descent for training two-layer Kolmogorov-Arnold Networks (KANs), proving that efficient optimization and generalization are achievable with extremely narrow (polylogarithmic-width) networks, and revealing that differentially private training forces the network to remain narrow, whereas non-private training carries no such restriction.
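For the private regime mentioned above, the sketch below shows noisy full-batch gradient descent (DP-GD), assuming each per-example gradient has L2 norm at most `G` (a Lipschitz-type assumption consistent with the abstract's comparison to convex Lipschitz problems). The noise scale uses one common Gaussian-mechanism / composition-style calibration; the paper's exact noise level and privacy accounting may differ. `logistic_loss` and `params` are reused from the sketch above.

```python
import jax
import jax.numpy as jnp

def dp_noise_scale(G, T, n, epsilon, delta):
    """Per-iteration Gaussian noise std so that releasing T averaged gradients
    (each with sensitivity ~ G/n) is (epsilon, delta)-DP under a standard
    composition-style calibration; constants here are illustrative."""
    return G * jnp.sqrt(2.0 * T * jnp.log(1.25 / delta)) / (n * epsilon)

def dp_gd_step(params, X, y, key, loss, sigma, lr=0.1):
    grads = jax.grad(loss)(params, X, y)          # average gradient over the n samples
    flat, treedef = jax.tree_util.tree_flatten(grads)
    keys = jax.random.split(key, len(flat))
    noisy = [g + sigma * jax.random.normal(k, g.shape) for g, k in zip(flat, keys)]
    return jax.tree_util.tree_map(lambda p, g: p - lr * g,
                                  params, jax.tree_util.tree_unflatten(treedef, noisy))
```

A private run would set `sigma = dp_noise_scale(G, T, n, epsilon, delta)` once and then call `dp_gd_step(params, X, y, jax.random.PRNGKey(t), logistic_loss, sigma)` for t = 0, ..., T-1; the 1/(n·epsilon) dependence of the injected noise mirrors the 1/(n·epsilon) factor in the utility bound quoted in the abstract.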

Source: arXiv:2601.22409