arXiv submission date: 2026-02-11
📄 Abstract - Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers

Rotary positional embeddings (RoPE) are widely used in large language models to encode token positions through multiplicative rotations, yet their behavior at long context lengths remains poorly characterized. In this work, we reinterpret RoPE as phase modulation applied to a bank of complex oscillators, enabling analysis through classical signal processing theory. Under this formulation, we derive principled lower bounds on the RoPE base parameter that are necessary to preserve positional coherence over a target context length. These include a fundamental aliasing bound, analogous to a Nyquist limit, and a DC-component stability bound that constrains phase drift in low-frequency positional modes. We further extend this analysis to deep transformers, showing that repeated rotary modulation across layers compounds angular misalignment, tightening the base requirement as depth increases. Complementing these results, we derive a precision-dependent upper bound on the RoPE base arising from finite floating-point resolution. Beyond this limit, incremental phase updates become numerically indistinguishable, leading to positional erasure even in the absence of aliasing. Together, the lower and upper bounds define a precision- and depth-dependent feasibility region, a "Goldilocks zone", for long-context transformers. We validate the framework through a comprehensive case study of state-of-the-art models, including LLaMA, Mistral, and DeepSeek variants, showing that observed successes, failures, and community retrofits align closely with the predicted bounds. Notably, models that violate the stability bound exhibit attention collapse and long-range degradation, while attempts to scale beyond one million tokens encounter a hard precision wall independent of architecture or training.
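To make the quantities in the abstract concrete, here is a minimal Python sketch of the standard RoPE frequency schedule, theta_i = base^(-2i/d), and of how much phase the slowest oscillator accumulates over a target context length. The abstract does not state the paper's bounds in closed form, so nothing below reproduces them. The function names, the head dimension of 128, and in particular the `step_resolvable` check (which compares the slowest per-token phase step against machine epsilon) are illustrative assumptions, not the authors' method.

```python
import math
from typing import List


def rope_frequencies(base: float, head_dim: int) -> List[float]:
    """Per-pair angular frequencies theta_i = base ** (-2i / head_dim)."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def slowest_pair_phase(base: float, head_dim: int, context_len: int) -> float:
    """Phase in radians accumulated by the lowest-frequency pair over context_len tokens."""
    return context_len * rope_frequencies(base, head_dim)[-1]


def step_resolvable(base: float, head_dim: int, mantissa_bits: int) -> bool:
    """Toy proxy (an assumption, NOT the paper's bound).

    Returns True when the per-token phase step of the slowest pair is larger
    than machine epsilon 2 ** -mantissa_bits, i.e. the float spacing near 1.0.
    """
    machine_eps = 2.0 ** (-mantissa_bits)
    return rope_frequencies(base, head_dim)[-1] > machine_eps


if __name__ == "__main__":
    # Hypothetical settings: head_dim=128, a 131072-token context.
    for base in (1e4, 1e6, 1e8):
        turns = slowest_pair_phase(base, 128, 131072) / (2 * math.pi)
        print(
            f"base={base:.0e}  slowest-pair phase={turns:.3f} turns  "
            f"fp32 step ok={step_resolvable(base, 128, 23)}  "
            f"bf16 step ok={step_resolvable(base, 128, 7)}"
        )
```

Raising the base shrinks the slowest frequency, so the low-frequency pair drifts through less phase over the same context; the same shrinkage eventually makes the per-token step too small to register at a given precision. That tension is the qualitative shape of the lower and upper bounds the abstract describes, though the actual thresholds have to come from the paper itself.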

Top-level tags: llm theory model training
Detailed tags: positional embeddings long-context transformers signal processing numerical stability rope analysis

Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers


1️⃣ One-sentence summary

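This paper reinterprets the rotary positional embeddings widely used in large language models as a form of phase modulation and applies signal processing theory to derive, for the first time, a "Goldilocks zone" for the base parameter that keeps positional information coherent over long texts, explaining why some models fail on very long inputs and offering key theoretical guidance for future model design.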

Source: arXiv: 2602.10959