arXiv submission date: 2026-05-22
📄 Abstract - Preisach Attention: A Hysteretic Model of Sequential Memory

We introduce the Preisach Attention Layer (PAL), a novel sequence-modelling architecture grounded in the classical Preisach hysteresis operator from mathematical physics. PAL replaces the softmax attention mechanism with a binary relay operator parameterised by learned activation and deactivation thresholds, and maintains a stack of local extrema as its internal state. First, we show that a single-layer PAL-Transformer, which has O(1) depth, is Turing-complete under arbitrary-precision arithmetic, by simulating a two-stack pushdown automaton; in contrast, standard hard-attention transformers require O(log n) depth. Second, we prove that the function classes computable by PAL and by the transformer are incomparable: PAL computes historical range statistics in O(1) layers that require O(log n) layers for transformers, while transformers support random-access retrieval that PAL cannot perform without auxiliary state. The separating property is rate-independence: PAL responds only to the sequence of local extrema, not to absolute token positions or temporal spacing. Third, we show that the extremum stack constitutes a minimal sufficient statistic of the input history for all rate-independent functionals, providing a formal analogue of the wiping property in classical hysteresis theory. PAL is thus an efficient architecture for tasks with long episodic memory and weak positional dependence, with O(n log n) total inference cost versus O(n^2) for standard attention.
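The abstract does not give PAL's update rule, so the sketch below only implements the classical Preisach building blocks it builds on: a binary relay with activation threshold `alpha` and deactivation threshold `beta`, a discrete Preisach operator as a weighted sum of relays, and an alternating extremum stack updated with the wiping rule. It is not the authors' PAL layer. The function names, the `-inf` anchor (an assumption standing in for an all-off initial state), and the random relay bank are hypothetical choices for illustration. The two checks illustrate properties the abstract relies on: the stack alone reproduces the relay-based output of the full history, and holding each sample for several steps (changing only the temporal spacing) leaves the stack unchanged.

```python
import math
import random
from typing import List, Sequence, Tuple

Relay = Tuple[float, float, float]  # hypothetical (alpha, beta, weight) triple, alpha >= beta


def relay_step(state: int, u: float, alpha: float, beta: float) -> int:
    """Binary relay (hysteron): on when u >= alpha, off when u <= beta, otherwise hold."""
    if u >= alpha:
        return 1
    if u <= beta:
        return 0
    return state


def run_relay(inputs: Sequence[float], alpha: float, beta: float) -> int:
    """Final state of one relay after reading the inputs, starting switched off."""
    state = 0
    for u in inputs:
        state = relay_step(state, u, alpha, beta)
    return state


def preisach_output(inputs: Sequence[float], relays: Sequence[Relay]) -> float:
    """Discrete Preisach operator: weighted sum of relay states."""
    return sum(w * run_relay(inputs, a, b) for a, b, w in relays)


def push_extremum(stack: List[float], u: float) -> None:
    """Add input u to the extremum stack, applying the wiping rule in place.

    stack[0] is a -inf anchor standing in for the all-off initial state; the
    entries alternate between maxima and minima with strictly shrinking swings,
    and stack[-1] is always the latest input.
    """
    while True:
        # stack[-1] lies on a monotone run ending at u, so it is no longer an extremum.
        if len(stack) >= 2 and (stack[-2] <= stack[-1] <= u or stack[-2] >= stack[-1] >= u):
            stack.pop()
            continue
        # Wiping: the new swing reaches stack[-2], so the enclosed pair is forgotten.
        if len(stack) >= 3 and abs(u - stack[-1]) >= abs(stack[-2] - stack[-1]):
            del stack[-2:]
            continue
        break
    stack.append(u)


def extremum_stack(inputs: Sequence[float]) -> List[float]:
    """Reduce a whole input history to its extremum stack."""
    stack = [-math.inf]
    for u in inputs:
        push_extremum(stack, u)
    return stack


if __name__ == "__main__":
    random.seed(0)
    xs = [random.uniform(-1.0, 1.0) for _ in range(1000)]
    relays: List[Relay] = []
    for _ in range(200):
        hi, lo = sorted((random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)), reverse=True)
        relays.append((hi, lo, random.random()))

    stack = extremum_stack(xs)
    # Sufficiency: the stack alone reproduces the operator's output on the full history.
    assert preisach_output(stack, relays) == preisach_output(xs, relays)
    # Rate-independence: holding every sample for 3 steps leaves the stack unchanged.
    assert extremum_stack([u for u in xs for _ in range(3)]) == stack
    print(f"{len(xs)} inputs -> {len(stack) - 1} stack entries (excluding the anchor)")
```

Because every relay can be evaluated from the stack alone, the wiped inputs can be discarded. This sketch re-runs each relay over the stack, which is enough to check the property but is not an efficient evaluation scheme, and it says nothing about how PAL maps the stack to token-level outputs.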

Top-level tag: machine learning theory
Detailed tags: sequence modelling, attention mechanism, hysteresis, transformer, computational complexity

Preisach Attention: A Hysteretic Model of Sequential Memory


1️⃣ One-sentence summary

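This paper proposes the Preisach Attention Layer, a new neural network layer that replaces the Transformer's attention mechanism with the hysteresis operator from physics and computes efficiently by remembering only the local extrema of the input sequence, giving it lower inference cost (O(n log n) versus O(n^2)) and theoretical advantages over the standard Transformer on long sequences and position-insensitive tasks.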

From arXiv: 2605.23603