arXiv submission date: 2025-12-08
📄 Abstract - Group Representational Position Encoding

We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\mathrm{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t \in \mathbb{R}$) acts as $\mathbf{G}(n)=\exp(n\,\omega\,\mathbf{L})$ with a rank-2 skew generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes are the canonical coordinate pairs with log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(r d)$ cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project Page: this https URL.
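To make the multiplicative mechanism concrete, here is a minimal NumPy sketch of the position map $\mathbf{G}(n)=\exp(n\,\omega\,\mathbf{L})$ for a single rank-2 skew generator. The helper names (`skew_generator`, `grape_rotation`) are ours, and the closed-form exponential below is the standard Rodrigues-style formula for a rank-2 skew matrix; the paper's learned and mixed generators go beyond this single-plane case.

```python
import numpy as np

def skew_generator(u, v):
    """Rank-2 skew-symmetric generator L = u v^T - v u^T
    for orthonormal u, v; it rotates within span{u, v}."""
    return np.outer(u, v) - np.outer(v, u)

def grape_rotation(n, omega, L):
    """Position map G(n) = exp(n * omega * L). For a rank-2 skew L
    (L^3 = -L), the matrix exponential has the closed form
    exp(theta L) = I + sin(theta) L + (1 - cos(theta)) L @ L."""
    theta = n * omega
    return np.eye(L.shape[0]) + np.sin(theta) * L + (1 - np.cos(theta)) * (L @ L)

d, omega = 4, 0.1
u, v = np.eye(d)[0], np.eye(d)[1]   # canonical coordinate pair -> one RoPE plane
L = skew_generator(u, v)

# Norm preservation: G(n) is orthogonal.
G = grape_rotation(5, omega, L)
assert np.allclose(G.T @ G, np.eye(d))

# Exact relative law: G(m)^T G(n) = G(n - m).
Gm, Gn = grape_rotation(2, omega, L), grape_rotation(5, omega, L)
assert np.allclose(Gm.T @ Gn, grape_rotation(3, omega, L))
```

Stacking $d/2$ such commuting planes on the canonical coordinate pairs with a log-uniform spectrum of frequencies $\omega$ reproduces RoPE exactly, as the abstract states.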

Top tags: natural language processing, model training, theory
Detailed tags: positional encoding, group theory, long-context attention, transformer

Group Representational Position Encoding


1️⃣ One-sentence summary

This paper proposes GRAPE, a unified framework that uses the mathematical theory of group actions to subsume mainstream positional encoding methods such as RoPE and ALiBi, providing a more general and flexible theoretical foundation for representing positional information in long-context models.
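For the additive side, the sketch below illustrates how an ALiBi-style logit bias can arise from rank-1 unipotent actions $U(n) = I + nN$ with $N^2 = 0$, so that $U(m)U(n) = U(m+n)$ gives the exact additive/relative law mentioned in the abstract. The augmented query/key layout and helper names here are our own illustrative assumptions, not necessarily the paper's exact construction.

```python
import numpy as np

d, slope = 8, 0.5
rng = np.random.default_rng(0)
q, k = rng.normal(size=d), rng.normal(size=d)

def N_query(d):
    """Nilpotent rank-1 generator for queries: scaled by position i,
    it writes -i * slope into the last slot of [q; slope; 0]."""
    N = np.zeros((d + 2, d + 2))
    N[d + 1, d] = -1.0
    return N

def N_key(d):
    """Nilpotent rank-1 generator for keys: scaled by position j,
    it writes j into the second-to-last slot of [k; 0; 1]."""
    N = np.zeros((d + 2, d + 2))
    N[d, d + 1] = 1.0
    return N

def unipotent(n, N):
    """U(n) = I + n * N. Since N @ N = 0, U(m) @ U(n) = U(m + n):
    the exact relative law, and positions compose additively."""
    return np.eye(N.shape[0]) + n * N

i, j = 7, 3   # query position i attends to key position j (j <= i, causal)
q_aug = unipotent(i, N_query(d)) @ np.concatenate([q, [slope, 0.0]])
k_aug = unipotent(j, N_key(d)) @ np.concatenate([k, [0.0, 1.0]])

# The plain dot product of the augmented vectors picks up exactly
# the ALiBi bias -slope * (i - j) on top of the content logit q @ k.
assert np.isclose(q_aug @ k_aug, q @ k - slope * (i - j))
```

Because each augmented key depends only on its own position $j$, the transformed keys can be computed once and cached, consistent with the streaming cacheability claimed for Additive GRAPE.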


Source: arXiv:2512.07805