arXiv submission date: 2026-04-23
📄 Abstract - Hyperloop Transformers

LLM architecture research generally aims to maximize model quality subject to fixed compute/latency budgets. However, many applications of interest, such as edge and on-device deployment, are further constrained by the model's memory footprint, which motivates parameter-efficient architectures for language modeling. This paper describes a simple architecture that improves the parameter efficiency of LLMs. Our architecture uses looped Transformers as a core primitive: they reuse Transformer layers across depth and are thus more parameter-efficient than ordinary (depth-matched) Transformers. We organize the looped Transformer into three blocks (begin, middle, and end), where each block itself consists of multiple Transformer layers, and only the middle block is applied recurrently across depth. We augment the looped middle block with hyper-connections (Xie et al., 2026), which expand the residual stream into matrix-valued residual streams. Hyper-connections are applied only after each loop, and therefore add minimal new parameters and compute cost. Across various model scales, we find that our Hyper-Connected Looped Transformer (Hyperloop Transformer) outperforms depth-matched Transformer and mHC Transformer baselines despite using approximately 50% fewer parameters. The outperformance persists through post-training weight quantization, positioning Hyperloop Transformers as an attractive architecture for memory-efficient language modeling.
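The block structure described above can be sketched in plain Python. This is a minimal structural illustration only: `layer` is a stand-in for a real Transformer layer, the hyper-connection weights (`alpha`, `beta`, `mix`) are fixed here though learned in the paper, and all function and parameter names (`hyperloop_forward`, `n_streams`, `n_loops`, etc.) are assumptions, not the paper's actual API.

```python
def layer(x, scale):
    # Placeholder for one Transformer layer: a simple residual-style update.
    return [v + scale * v for v in x]

def block(x, scales):
    # A block is several layers applied in sequence.
    for s in scales:
        x = layer(x, s)
    return x

def hyperloop_forward(x, begin, middle, end, n_streams=2, n_loops=4):
    x = block(x, begin)                              # begin block, run once
    # Expand the single residual stream into n_streams residual streams
    # (matrix-valued residual stream).
    streams = [list(x) for _ in range(n_streams)]
    d = len(x)
    # Illustrative fixed values standing in for learned hyper-connection weights.
    alpha = [1.0 / n_streams] * n_streams            # read weights
    beta = [1.0] * n_streams                         # write weights
    mix = [[1.0 if i == j else 0.0 for j in range(n_streams)]
           for i in range(n_streams)]                # stream-mixing matrix
    for _ in range(n_loops):                         # shared middle block, looped
        # Read: collapse the streams into one input for the middle block.
        h = [sum(alpha[i] * streams[i][k] for i in range(n_streams))
             for k in range(d)]
        h = block(h, middle)                         # same weights every loop
        # Write: hyper-connection applied once per loop, not once per layer.
        streams = [[sum(mix[i][j] * streams[j][k] for j in range(n_streams))
                    + beta[i] * h[k] for k in range(d)]
                   for i in range(n_streams)]
    # Collapse the streams back into a single stream before the end block.
    x = [sum(s[k] for s in streams) / n_streams for k in range(d)]
    return block(x, end)                             # end block, run once
```

Because the middle block's weights are shared across all `n_loops` iterations, parameter count grows with the number of distinct blocks rather than with effective depth, while the per-loop hyper-connection adds only the small `alpha`/`beta`/`mix` weights.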

Top-level tags: llm, model training, model evaluation
Detailed tags: parameter efficiency, looped transformer, hyper-connections, memory-efficient architecture

Hyperloop Transformers


1️⃣ One-sentence summary

This paper proposes a new language-model architecture called the Hyperloop Transformer, which reuses a single set of middle layers in a loop and combines them with hyper-connections. With roughly 50% fewer parameters, it still outperforms conventional Transformer baselines, making it especially well suited to memory-constrained on-device deployment.

Source: arXiv 2604.21254