arXiv submission date: 2026-04-07
📄 Abstract - The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Model

State space models (SSMs) have been shown to possess the theoretical capacity to model both star-free sequential tasks and bounded hierarchical structures (Sarrof et al., 2024). However, formal expressivity results do not guarantee that gradient-based optimisation will reliably discover the corresponding solutions. Existing benchmarks probe either monotonic state tracking, as in the standard Flip-Flop task, or structural nesting, as in the Dyck languages, but neither isolates reversible semantic state retrieval. We introduce the UNDO Flip-Flop task to fill this gap. By extending the standard Flip-Flop with an UNDO operation, the task requires a model to maintain an implicit bounded stack and recover historical states under non-monotonic update sequences. We evaluate one-layer and two-layer Mamba-2 under this framework. Both variants fail to acquire the provably expressible stack-based rollback mechanism, converging instead on a local toggle heuristic that inverts the current state rather than retrieving stored history. Under an adversarial retraction pressure test held within the training length distribution, the two-layer model collapses to 41.10% accuracy, below random chance, which indicates a systematic rather than incidental failure. Causal ablation shows that the bottleneck lies in retrieval, not storage. These results draw a clear line between what an architecture can in principle represent and what gradient descent reliably learns, a distinction that theoretical expressivity analyses alone cannot capture.
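The abstract describes the task only in prose, so the Python sketch below is one hypothetical reading of it, not the authors' generator or evaluation code. It assumes standard Flip-Flop-style tokens (write a bit, read, ignore), adds an UNDO that retracts the most recent write, and models the reference semantics as a stack where writes push and UNDOs pop. It then contrasts that with the local "toggle" shortcut the abstract reports, which flips the current bit on UNDO. The token names, sampling probabilities, `max_depth`, the initial state of 0, and the helper names are all illustrative assumptions.

```python
import random

# Hypothetical token format (assumed, not from the paper):
# ("W", b) writes bit b, ("R", None) reads the state,
# ("I", None) is an ignored distractor, ("U", None) undoes the latest write.

def stack_rollback(ops, init=0):
    """Reference semantics: WRITE pushes, UNDO pops; READ reports the top."""
    stack = [init]  # bottom element is the assumed initial state
    outputs = []
    for op, bit in ops:
        if op == "W":
            stack.append(bit)
        elif op == "U" and len(stack) > 1:  # UNDO with no history: assumed no-op
            stack.pop()
        elif op == "R":
            outputs.append(stack[-1])
    return outputs

def toggle_heuristic(ops, init=0):
    """Local shortcut: UNDO inverts the current bit instead of retrieving history."""
    state = init
    outputs = []
    for op, bit in ops:
        if op == "W":
            state = bit
        elif op == "U":
            state ^= 1
        elif op == "R":
            outputs.append(state)
    return outputs

def sample_sequence(length, p_undo=0.3, p_write=0.3, max_depth=4, rng=None):
    """Hypothetical generator: UNDO only when a write is pending, and WRITE only
    while fewer than max_depth writes are pending, so the stack stays bounded."""
    rng = rng or random.Random(0)
    ops, depth = [], 0
    for _ in range(length):
        r = rng.random()
        if r < p_undo and depth > 0:
            ops.append(("U", None))
            depth -= 1
        elif r < p_undo + p_write and depth < max_depth:
            ops.append(("W", rng.randint(0, 1)))
            depth += 1
        else:
            ops.append((rng.choice("RI"), None))
    return ops

def accuracy(pred, gold):
    return sum(p == g for p, g in zip(pred, gold)) / max(len(gold), 1)

# Minimal disagreement: after W1, W1, UNDO the stored history says 1,
# but toggling the current bit gives 0.
example = [("W", 1), ("W", 1), ("U", None), ("R", None)]
print(stack_rollback(example), toggle_heuristic(example))  # [1] [0]

# Random sequences: how often the shortcut happens to agree with rollback.
rng = random.Random(0)
gold, tog = [], []
for _ in range(1000):
    seq = sample_sequence(32, rng=rng)
    gold += stack_rollback(seq)
    tog += toggle_heuristic(seq)
print(f"toggle accuracy on random sequences: {accuracy(tog, gold):.3f}")

# Hypothetical "redundant retraction" chunks: every UNDO retracts a write that
# repeated the current bit, so toggling is wrong on every READ.
def redundant_retraction(rng):
    b = rng.randint(0, 1)
    return [("W", b), ("W", b), ("U", None), ("R", None), ("U", None)]

adv = [op for _ in range(200) for op in redundant_retraction(rng)]
print(f"toggle accuracy on redundant retractions: "
      f"{accuracy(toggle_heuristic(adv), stack_rollback(adv)):.3f}")  # 0.000
```

In this sketch the toggle shortcut is correct whenever the retracted write actually changed the bit, which is why it can look competent on random data yet score below chance once retractions target redundant writes. This only illustrates how such a shortcut can fall below chance; the abstract does not specify how the paper's adversarial retraction test is constructed.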

Top-level tags: theory model evaluation systems
Detailed tags: state space models expressivity benchmark gradient descent semantic state

The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Model


1️⃣ One-sentence summary

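By designing a new task that requires a model to remember and undo historical states, this paper finds that state space models such as Mamba-2, although theoretically capable of reversible state management, fail to learn it reliably in practice and instead rely on a simple local strategy, exposing a key gap between a model's theoretical expressivity and what it actually learns.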

Source: arXiv 2604.05923