arXiv submission date: 2026-05-25
📄 Abstract - Language Models Need Sleep

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration $N$ for our models improves performance, with the largest gains on examples that require deeper reasoning.
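The sleep phase described in the abstract can be sketched as a toy loop. This is only an illustration under stated assumptions, not the paper's method: the learned local rule is replaced here by a plain delta rule, the SSM blocks are reduced to a single fast-weight matrix, and the names `sleep_consolidate`, `recall_error`, and `lr` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def sleep_consolidate(
    fast_weights: Matrix,
    kv_cache: List[Tuple[Vector, Vector]],
    n_passes: int,
    lr: float = 0.1,
) -> Matrix:
    """Write the cached (key, value) pairs into fast_weights, then clear the cache.

    Assumption: a delta rule stands in for the paper's learned local rule.
    """
    for _ in range(n_passes):  # N offline passes over the accumulated context
        for key, value in kv_cache:
            pred = matvec(fast_weights, key)
            err = [v - p for v, p in zip(value, pred)]
            # Local update: each weight changes using only its own input and error.
            for i, e in enumerate(err):
                for j, k in enumerate(key):
                    fast_weights[i][j] += lr * e * k
    kv_cache.clear()  # after sleep, the context lives only in the fast weights
    return fast_weights


def recall_error(fast_weights: Matrix, pairs: List[Tuple[Vector, Vector]]) -> float:
    """Mean squared error when recalling each value from its key."""
    total = 0.0
    for key, value in pairs:
        pred = matvec(fast_weights, key)
        total += sum((v - p) ** 2 for v, p in zip(value, pred))
    return total / len(pairs)


if __name__ == "__main__":
    context = [([1.0, 0.0], [1.0, 2.0]), ([0.0, 1.0], [3.0, -1.0])]
    for n in (1, 4, 16):
        weights = [[0.0, 0.0], [0.0, 0.0]]
        sleep_consolidate(weights, list(context), n_passes=n)
        print(n, recall_error(weights, context))
```

In this toy setting, more passes leave less recall error, and wake-time recall stays a single matrix-vector product regardless of `n_passes`. That loosely mirrors the abstract's claim that extra computation moves to sleep while wake-time latency is preserved.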

Top-level tags: llm, machine learning, systems
Detailed tags: memory consolidation, fast weights, state-space model, context length, reasoning

Language Models Need Sleep


1️⃣ One-sentence summary

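This paper proposes a mechanism analogous to animal sleep: when handling long-sequence tasks, a large language model uses an offline "sleep" phase to convert the information it has accumulated into persistent fast weights, significantly improving its performance on tasks that require deep reasoning without increasing inference latency.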

Source: arXiv: 2605.26099