
arXiv submission date: 2026-03-03
📄 Abstract - Retrievit: In-context Retrieval Capabilities of Transformers, State Space Models, and Hybrid Architectures

Transformers excel at in-context retrieval but suffer from quadratic complexity with sequence length, while State Space Models (SSMs) offer efficient linear-time processing but have limited retrieval capabilities. We investigate whether hybrid architectures combining Transformers and SSMs can achieve the best of both worlds on two synthetic in-context retrieval tasks. The first task, n-gram retrieval, requires the model to identify and reproduce an n-gram that succeeds the query within the input sequence. The second task, position retrieval, presents the model with a single query token and requires it to perform a two-hop associative lookup: first locating the corresponding element in the sequence, and then outputting its positional index. Under controlled experimental conditions, we assess data efficiency, length generalization, robustness to out-of-domain training examples, and learned representations across Transformers, SSMs, and hybrid architectures. We find that hybrid models outperform SSMs and match or exceed Transformers in data efficiency and extrapolation for information-dense context retrieval. However, Transformers maintain superiority in position retrieval tasks. Through representation analysis, we discover that SSM-based models develop locality-aware embeddings in which tokens representing adjacent positions become neighbors in embedding space, forming interpretable structures. This emergent property, absent in Transformers, explains both the strengths and limitations of SSMs and hybrids for different retrieval tasks. Our findings provide principled guidance for architecture selection based on task requirements and reveal fundamental differences in how Transformers, SSMs, and hybrid models learn positional associations.
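To make the two synthetic tasks concrete, they can be sketched as simple data generators. This is a minimal illustration, not the paper's actual data pipeline: the function names, vocabulary sizes, and the distinct-token assumption in position retrieval are our own choices.

```python
import random

def make_ngram_retrieval_example(vocab_size=32, seq_len=24, n=3, seed=0):
    """Sketch of the n-gram retrieval task: given a context and a query
    token that occurs in it, the target is the n-gram that immediately
    succeeds that occurrence of the query."""
    rng = random.Random(seed)
    context = [rng.randrange(vocab_size) for _ in range(seq_len)]
    start = rng.randrange(seq_len - n)            # position of the query token
    query = context[start]                        # may recur elsewhere; real data
                                                  # would need to control for that
    target = context[start + 1 : start + 1 + n]   # n-gram succeeding the query
    return context, query, target

def make_position_retrieval_example(vocab_size=32, seq_len=16, seed=0):
    """Sketch of the position retrieval task (two-hop lookup): a sequence of
    distinct tokens plus a single query token; the target is the query
    token's positional index in the sequence."""
    rng = random.Random(seed)
    tokens = rng.sample(range(vocab_size), seq_len)  # distinct, so the answer is unique
    pos = rng.randrange(seq_len)
    query = tokens[pos]
    return tokens, query, pos
```

Note the two-hop structure of the second task: the model must first match the query token against the sequence, then map that match to its index, which is where locality-aware positional representations matter.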

Top-level tags: model training, model evaluation, theory
Detailed tags: in-context retrieval, hybrid architectures, state space models, positional encoding, length generalization

Retrievit: In-context Retrieval Capabilities of Transformers, State Space Models, and Hybrid Architectures


1️⃣ One-sentence summary

Using two synthetic retrieval tasks, this paper finds that hybrid architectures combining Transformers and State Space Models match or even surpass Transformers in data efficiency and information-dense retrieval, while Transformers remain superior on position retrieval, and it reveals fundamental differences in how the architectures learn positional associations.

Source: arXiv:2603.02874