arXiv submission date: 2026-05-20
📄 Abstract - A Sharper Picture of Generalization in Transformers

We study transformers' generalization behavior on boolean domains from the perspective of the Fourier spectra of their target functions. In contrast to prior work (Edelman et al., 2022; Trauger and Tewari, 2024), which derived generalization bounds from Rademacher complexity, we investigate the feasibility of obtaining generalization bounds via PAC-Bayes theory. We show that sparse spectra concentrated on low-degree components enable low-sharpness constructions with good generalization properties. Our idea is to show the existence of flat minima implementing any boolean function of sparsity no greater than the context length, and then apply a PAC-Bayes bound to an idealized low-sharpness learner, resulting in a non-vacuous generalization bound. We evaluate the predictions empirically and conduct a mechanistic interpretability study to support the realism of our theoretical construction in real transformers.
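To make the terms "sparse spectrum" and "low-degree components" concrete, here is a minimal sketch that brute-forces the Fourier (Walsh-Hadamard) expansion of a boolean function on {-1, +1}^n. It is an illustration only, not the paper's construction. It assumes that sparsity means the number of nonzero Fourier coefficients and that degree means the size of the largest subset with a nonzero coefficient; the paper's exact definitions may differ. The names `fourier_spectrum`, `sparsity_and_degree`, `parity_01`, and `majority3` are hypothetical.

```python
from itertools import combinations, product


def fourier_spectrum(f, n):
    """Return {S: coefficient} for f: {-1,+1}^n -> R.

    S is a tuple of coordinate indices; the coefficient is the average of
    f(x) * prod(x[i] for i in S) over all 2^n inputs. Only nonzero
    coefficients are kept. Brute force, so only suitable for small n.
    """
    points = list(product([-1, 1], repeat=n))
    spectrum = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            total = 0
            for x in points:
                chi = 1  # parity character chi_S(x)
                for i in S:
                    chi *= x[i]
                total += f(x) * chi
            coeff = total / len(points)
            if abs(coeff) > 1e-12:
                spectrum[S] = coeff
    return spectrum


def sparsity_and_degree(spectrum):
    """Sparsity = number of nonzero coefficients; degree = largest |S|."""
    sparsity = len(spectrum)
    degree = max((len(S) for S in spectrum), default=0)
    return sparsity, degree


def parity_01(x):
    # Parity of the first two coordinates: a single degree-2 monomial.
    return x[0] * x[1]


def majority3(x):
    # Majority vote over the first three coordinates.
    return 1 if sum(x[:3]) > 0 else -1


if __name__ == "__main__":
    for name, f in [("parity_01", parity_01), ("majority3", majority3)]:
        spec = fourier_spectrum(f, 4)
        print(name, sparsity_and_degree(spec), spec)
```

Under these assumed definitions, both example functions have a sparsity that does not exceed the input length of 4, which is the kind of condition the abstract places on the target function relative to the context length.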

Top-level tag: machine learning theory
Detailed tags: transformers generalization boolean functions pac-bayes fourier analysis

A Sharper Picture of Generalization in Transformers


1️⃣ One-sentence summary

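This paper proposes a theoretical approach based on the Fourier spectrum of boolean target functions: when the spectrum is sparse and concentrated on low-degree components, flat (low-sharpness) minima implementing the function exist in transformers, which yields a non-vacuous PAC-Bayes generalization bound, and experiments together with a mechanistic interpretability study support the theory.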

Source: arXiv: 2605.20988