arXiv submission date: 2026-03-03
📄 Abstract - On the Expressive Power of Transformers for Maxout Networks and Continuous Piecewise Linear Functions

Transformer networks have achieved remarkable empirical success across a wide range of applications, yet their theoretical expressive power remains insufficiently understood. In this paper, we study the expressive capabilities of Transformer architectures. We first establish an explicit approximation of maxout networks by Transformer networks while preserving comparable model complexity. As a consequence, Transformers inherit the universal approximation capability of ReLU networks under similar complexity constraints. Building on this connection, we develop a framework to analyze the approximation of continuous piecewise linear functions by Transformers and quantitatively characterize their expressivity via the number of linear regions, which grows exponentially with depth. Our analysis establishes a theoretical bridge between approximation theory for standard feedforward neural networks and Transformer architectures. It also yields structural insights into Transformers: self-attention layers implement max-type operations, while feedforward layers realize token-wise affine transformations.
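The abstract's closing claim — self-attention layers implement max-type operations while feedforward layers realize token-wise affine maps — can be illustrated with a minimal sketch. This is not the paper's construction; the function names and the temperature parameter `beta` are illustrative assumptions. A maxout unit takes the max over several affine maps, and a single-query softmax attention whose scores are the (scaled) values themselves concentrates on the largest value as the scale grows:

```python
import numpy as np

def maxout_unit(x, W, b):
    # A maxout unit: the maximum over k affine maps of the input.
    # W has shape (k, d), b has shape (k,).
    return np.max(W @ x + b)

def attention_max(v, beta=50.0):
    # Single-query softmax attention where the scores are the values
    # scaled by beta. As beta grows, the softmax weights concentrate
    # on the largest value, so the output approaches max(v) -- the
    # "max-type operation" attributed to self-attention layers.
    scores = beta * v
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w /= w.sum()
    return float(w @ v)

v = np.array([0.1, 0.9, 0.3, 0.7])
print(attention_max(v))  # close to max(v) = 0.9
```

In this picture, stacking such layers composes max operations with token-wise affine transformations, which is exactly the structure of a maxout network; that correspondence is what lets the paper transfer approximation results for maxout/ReLU networks to Transformers.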

Top-level tags: theory model training machine learning
Detailed tags: transformers expressive power approximation theory maxout networks piecewise linear functions

On the Expressive Power of Transformers for Maxout Networks and Continuous Piecewise Linear Functions


1️⃣ One-sentence summary

This paper proves that Transformer networks can effectively approximate maxout networks and continuous piecewise linear functions at comparable model complexity, thereby inheriting the universal approximation capability of ReLU networks, and it quantitatively characterizes their expressive power via the number of linear regions, which grows exponentially with depth.

Source: arXiv: 2603.03084