Abstract - Transformer Geometry Observatory TGO-I: Spectral Geometry Observatory
Despite the widespread adoption of Vision Transformers (ViTs) and their success across numerous computer vision applications, their dimensional and representational geometry remains relatively underexplored. To address this gap, we introduce the Transformer Geometry Observatory (TGO), a systematic framework of experiments and analysis pipelines designed to investigate the representational geometry and dynamics of Vision Transformers. TGO-I, the first installment of the framework, focuses on the spectral geometry of ViT representations. Using a ViT-Small/16 model trained on ImageNet-100, we analyze Effective Rank, Stable Rank, Participation Ratio, Spectral Entropy, Spectral Flatness, Spectral Anisotropy, covariance structure, eigenspectra, and singular value spectra throughout training. Our results reveal a consistent increase in dimensional utilization, accompanied by decreasing anisotropy, increasing spectral entropy, increasing participation ratio, and progressively flatter eigenspectra. Contrary to the common intuition that training should concentrate information into a small number of dominant directions, we observe a progressive redistribution of variance across representational dimensions. This phenomenon is particularly pronounced in the final CLS token representation, which exhibits the highest effective dimensionality and lowest anisotropy within the network.
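The abstract names several spectral metrics without defining them. As a rough illustration only, the sketch below computes them from a matrix of representations using commonly used conventions: effective rank as the exponential of the entropy of normalized singular values, stable rank as the sum of covariance eigenvalues over the largest one, participation ratio as the squared sum of eigenvalues over the sum of their squares, spectral flatness as the ratio of geometric to arithmetic mean, and anisotropy as the variance share of the top eigenvalue. These definitions, the function name `spectral_metrics`, and the `(n_samples, n_features)` layout are assumptions for illustration; the paper may define or normalize the quantities differently.

```python
import numpy as np


def spectral_metrics(X, eps=1e-12):
    """Spectral summaries of a representation matrix X (n_samples, n_features).

    All definitions here are assumed conventions, not taken from the paper.
    """
    X = np.asarray(X, dtype=np.float64)
    Xc = X - X.mean(axis=0, keepdims=True)     # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    lam = s**2 / max(X.shape[0] - 1, 1)        # covariance eigenvalues

    p_sv = s / (s.sum() + eps)                 # normalized singular values
    p_ev = lam / (lam.sum() + eps)             # normalized eigenvalues

    def entropy(p):
        # Shannon entropy in nats; eps guards log(0) for vanishing modes.
        return float(-np.sum(p * np.log(p + eps)))

    return {
        "effective_rank": float(np.exp(entropy(p_sv))),
        "stable_rank": float(lam.sum() / (lam.max() + eps)),
        "participation_ratio": float(lam.sum() ** 2 / (np.sum(lam**2) + eps)),
        "spectral_entropy": entropy(p_ev),
        "spectral_flatness": float(
            np.exp(np.mean(np.log(lam + eps))) / (lam.mean() + eps)
        ),
        "anisotropy": float(lam.max() / (lam.sum() + eps)),
    }


if __name__ == "__main__":
    # Synthetic stand-in for CLS features: variance decays across dimensions.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(512, 64)) * np.linspace(1.0, 0.1, 64)
    for name, value in spectral_metrics(feats).items():
        print(f"{name}: {value:.4f}")
```

Under these conventions, a perfectly isotropic representation in d dimensions gives an effective rank, stable rank, and participation ratio of d, a flatness of 1, and an anisotropy of 1/d, whereas a rank-one representation gives 1 for the three rank-like measures and for anisotropy. The trend reported in the abstract corresponds to moving from the second regime toward the first.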
Transformer Geometry Observatory TGO-I: Spectral Geometry Observatory
1️⃣ One-Sentence Summary
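This paper builds TGO, a systematic analysis framework for probing the geometric structure of the internal representations of Vision Transformers, and finds experimentally that training does not compress information into a few dominant directions but instead makes dimensional utilization more uniform, lowers anisotropy, and raises spectral entropy, with the final classification token (CLS token) representation showing the highest effective dimensionality and the lowest anisotropy, a result that challenges conventional intuition.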