Fast-weight Product Key Memory
1️⃣ One-sentence summary
This paper proposes FwPKM, a novel neural-network architecture that dynamically updates its internal parameters, enabling the model to rapidly memorize and recall new information at both training and inference time — substantially improving its ability to handle very long texts while remaining computationally efficient.
Sequence modeling layers in modern language models typically face a trade-off between storage capacity and computational efficiency. While Softmax attention offers unbounded storage at prohibitive quadratic costs, linear variants provide efficiency but suffer from limited, fixed-size storage. We propose Fast-weight Product Key Memory (FwPKM), a novel architecture that resolves this tension by transforming the sparse Product Key Memory (PKM) from a static module into a dynamic, "fast-weight" episodic memory. Unlike PKM, FwPKM updates its parameters dynamically at both training and inference time via local chunk-level gradient descent, allowing the model to rapidly memorize and retrieve new key-value pairs from input sequences. Experiments reveal that FwPKM functions as an effective episodic memory that complements the semantic memory of standard modules, yielding significant perplexity reductions on long-context datasets. Notably, in Needle in a Haystack evaluations, FwPKM generalizes to 128K-token contexts despite being trained on only 4K-token sequences.
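The core "fast-weight" idea from the abstract — writing key-value pairs into a memory by running a few local gradient-descent steps per chunk, then reading by querying the updated weights — can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification: it uses a single dense memory matrix and a plain squared-error write loss, and omits FwPKM's sparse product-key addressing entirely; all names (`chunk_write`, `W`, learning rate, step counts) are hypothetical, not from the paper.

```python
import numpy as np

def chunk_write(W, K, V, lr=0.1, steps=500):
    """Fast-weight write: a few local gradient-descent steps per chunk,
    minimizing the reconstruction loss ||K @ W - V||^2 over the chunk.
    (Illustrative stand-in for FwPKM's chunk-level update, not the real method.)
    """
    n = len(K)
    for _ in range(steps):
        grad = K.T @ (K @ W - V) / n  # gradient of mean squared write loss
        W = W - lr * grad
    return W

rng = np.random.default_rng(0)
d_k, d_v, n_pairs = 16, 4, 4          # tiny dimensions for illustration

W = np.zeros((d_k, d_v))              # memory starts empty
K = rng.standard_normal((n_pairs, d_k))   # keys from one input chunk
V = rng.standard_normal((n_pairs, d_v))   # values to be memorized

W = chunk_write(W, K, V)

# Read: querying with a stored key approximately retrieves its value.
err = np.linalg.norm(K @ W - V) / np.linalg.norm(V)
```

The same `chunk_write` call runs identically at training and inference time, which mirrors the abstract's point that the memory keeps absorbing new key-value pairs from the input sequence after training ends.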
Source: arXiv: 2601.00671