Abstract - An expressivity analysis of hierarchical modelling in deep transformers via bounded-depth grammars
Deep neural networks are widely believed to derive their expressive power from their ability to form hierarchical representations, capturing progressively more abstract and compositional features across layers. In language modeling, transformers have emerged as the dominant architecture, with early layers capturing local syntactic patterns and later layers encoding more complex clause-level dependencies. While this intuition has shaped model design, there remains a lack of rigorous theoretical work demonstrating how deep transformers represent such hierarchical structures. In this work, we analyze the expressiveness of deep transformer models through the formal lens of bounded-depth, non-recursive context-free grammars. For this class of grammars, we explicitly construct transformers with positional attention whose depth grows linearly with grammar depth, while the neuron count scales with the number of derivation-tree shapes and quadratically with the number of production rules. Our theoretical results support the linear representation hypothesis by demonstrating that these architectures possess the structural capacity to encode abstract grammatical states into low-dimensional, linearly separable subspaces within the residual stream.
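To make the grammar class concrete, here is a minimal Python sketch of a bounded-depth, non-recursive context-free grammar. The toy grammar, the name `grammar_depth`, and the definition of depth as the height of the tallest derivation tree are all illustrative assumptions; the paper's formal definitions may differ.

```python
# Hypothetical toy grammar (not from the paper): each nonterminal maps to a
# list of right-hand sides; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["cat"], ["dog"]],
    "V": [["sees"], ["sleeps"]],
}


def grammar_depth(grammar, start="S"):
    """Return the height of the tallest derivation tree rooted at `start`.

    Assumed definition: terminals have depth 0 and a nonterminal has depth
    1 + the maximum depth of any symbol on any of its right-hand sides.
    Raises ValueError if a nonterminal can reach itself, i.e. the grammar
    is recursive and therefore has no bounded depth.
    """
    on_path = set()  # nonterminals on the current expansion path
    cache = {}       # memoized depths of fully explored nonterminals

    def depth(symbol):
        if symbol not in grammar:
            return 0  # terminal leaf
        if symbol in cache:
            return cache[symbol]
        if symbol in on_path:
            raise ValueError(f"recursive nonterminal: {symbol}")
        on_path.add(symbol)
        deepest_child = max(
            (depth(s) for rhs in grammar[symbol] for s in rhs),
            default=0,
        )
        on_path.remove(symbol)
        cache[symbol] = 1 + deepest_child
        return cache[symbol]

    return depth(start)


print(grammar_depth(GRAMMAR))
```

In this toy grammar the longest chain of expansions is S, VP, NP, then Det or N, then a word. A rule such as `S -> a S` would be rejected as recursive, which is exactly the case the bounded-depth restriction excludes.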
An expressivity analysis of hierarchical modelling in deep transformers via bounded-depth grammars
1️⃣ One-sentence summary
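Using bounded-depth context-free grammars as a theoretical tool, the paper rigorously shows that deep transformers whose depth grows linearly with grammar depth can build up hierarchical linguistic structure layer by layer and encode abstract grammatical states into low-dimensional, linearly separable subspaces of the residual stream, which supports the linear representation hypothesis.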