Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
1️⃣ One-Sentence Summary
This paper proposes an improved rotary position embedding method that re-uses the previously discarded imaginary component to strengthen large language models' understanding of long texts; experiments show the method effectively improves performance on long-context tasks.
Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models (LLMs) by applying rotations to query and key vectors in the complex plane. Standard implementations, however, utilize only the real component of the complex-valued dot product for attention score calculation. This simplification discards the imaginary component, which contains valuable phase information, leading to a potential loss of relational details crucial for modeling long-context dependencies. In this paper, we propose an extension that re-incorporates this discarded imaginary component. Our method leverages the full complex-valued representation to create a dual-component attention score. We theoretically and empirically demonstrate that this approach enhances the modeling of long-context dependencies by preserving more positional information. Furthermore, evaluations on a suite of long-context language modeling benchmarks show that our method consistently improves performance over the standard RoPE, with the benefits becoming more significant as context length increases. The code is available at this https URL.
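To make the idea concrete, below is a minimal NumPy sketch of RoPE viewed as a rotation in the complex plane. It returns both the real part of the complex query-key inner product (what standard RoPE attention uses) and the imaginary part that the paper proposes to re-incorporate. The function names and the way the two components are exposed are illustrative assumptions, not the paper's implementation; in particular, how the dual-component score actually combines the two parts is not specified by the abstract and is left as a placeholder here.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply RoPE by viewing consecutive feature pairs as complex numbers
    and rotating each pair by a position- and frequency-dependent angle."""
    d = x.shape[-1]
    # One rotation frequency per complex pair (standard RoPE schedule).
    freqs = base ** (-np.arange(0, d, 2) / d)       # (d/2,)
    angles = positions[:, None] * freqs[None, :]    # (seq, d/2)
    x_complex = x[..., 0::2] + 1j * x[..., 1::2]    # (seq, d/2)
    return x_complex * np.exp(1j * angles)          # rotated complex pairs

def attention_scores(q, k, positions, keep_imaginary=True):
    """Standard RoPE attention keeps only the real part of the complex
    query-key inner product; the imaginary part (relative-phase
    information) is discarded. Here both components are returned so a
    dual-component score can be formed downstream."""
    qc = rope_rotate(q, positions)
    kc = rope_rotate(k, positions)
    # Complex inner product between every query/key pair.
    scores = qc @ kc.conj().T                       # (seq, seq), complex
    real_part = scores.real   # what vanilla RoPE uses
    imag_part = scores.imag   # what the paper proposes to re-incorporate
    if not keep_imaginary:
        return real_part
    # NOTE: returning the two parts separately is a placeholder assumption;
    # the paper's dual-component score may weight or transform them differently.
    return real_part, imag_part

# Toy usage: 4 tokens, head dimension 8.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
pos = np.arange(4)
real_s, imag_s = attention_scores(q, k, pos)
```

Because the rotation angle of the product depends only on the position difference m - n, both the real and imaginary parts encode relative position; keeping only the real part throws away half of that phase information, which is the gap the proposed extension targets.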
Source: arXiv: 2512.07525