arXiv submission date: 2026-05-20

📄 Abstract - A Sharper Picture of Generalization in Transformers

We study transformers' generalization behavior on Boolean domains from the perspective of the Fourier spectra of their target functions. In contrast to prior work (Edelman et al., 2022; Trauger and Tewari, 2024), which derived generalization bounds from Rademacher complexity, we investigate the feasibility of obtaining generalization bounds via PAC-Bayes theory. We show that sparse spectra concentrated on low-degree components enable low-sharpness constructions with good generalization properties. Our idea is to show the existence of flat minima implementing any Boolean function of sparsity no greater than the context length, and then apply a PAC-Bayes bound to an idealized low-sharpness learner, resulting in a non-vacuous generalization bound. We evaluate these predictions empirically and conduct a mechanistic interpretability study to support the realism of our theoretical construction in real transformers.
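To make the terms "sparsity" and "degree" of a Fourier spectrum concrete, here is a minimal sketch that computes the Fourier (Walsh) coefficients of a Boolean function on {-1, +1}^n by brute force. It is an illustration of the standard definitions only, not code from the paper; the helper names `fourier_coefficients`, `sparsity_and_degree`, `parity_01`, and `majority3` are hypothetical.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f: {-1,+1}^n -> R, where S is a tuple of indices.

    Uses the standard definition f_hat(S) = E_x[f(x) * prod_{i in S} x_i]
    with x uniform on {-1,+1}^n. Brute force, so only suitable for small n.
    """
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            total = sum(f(x) * prod(x[i] for i in S) for x in points)
            coeffs[S] = total / len(points)
    return coeffs


def sparsity_and_degree(coeffs, tol=1e-12):
    """Sparsity = number of nonzero coefficients; degree = largest |S| among them."""
    support = [S for S, c in coeffs.items() if abs(c) > tol]
    degree = max((len(S) for S in support), default=0)
    return len(support), degree


def parity_01(x):
    # Parity of the first two bits; in the +/-1 encoding this is x0 * x1.
    return x[0] * x[1]


def majority3(x):
    # Majority vote over the first three bits (the sum of three +/-1 values is never 0).
    return 1 if sum(x[:3]) > 0 else -1


n = 4  # illustrative input length, chosen here only to keep the enumeration small
parity_stats = sparsity_and_degree(fourier_coefficients(parity_01, n))
majority_stats = sparsity_and_degree(fourier_coefficients(majority3, n))
print(parity_stats, majority_stats)  # (1, 2) (4, 3)
```

Both example functions have sparsity at most n = 4, so under the assumption that n plays the role of the context length they fall in the "sparsity no greater than the context length" regime the abstract describes.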

Top-level tag: machine learning theory

A Sharper Picture of Generalization in Transformers

1️⃣ One-sentence summary

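This paper proposes a new theoretical approach based on the Fourier spectrum of Boolean functions: it shows that when the target function's spectrum is sparse and concentrated on low-degree components, transformers admit flat minima that implement it, which yields a non-vacuous generalization guarantee via PAC-Bayes theory, and it supports the theory with experiments and a mechanistic interpretability analysis.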


Source: arXiv: 2605.20988
