Mechanistic Interpretability of Large-Scale Counting in LLMs through a System-2 Strategy
1️⃣ One-Sentence Summary
This paper proposes a simple method that mimics human deliberate thinking (System 2): it decomposes large-scale counting tasks, which LLMs handle poorly, into multiple small sub-tasks that are solved separately and then aggregated. This bypasses the models' architectural limitations, substantially improves counting accuracy, and reveals the underlying internal mechanism.
Large language models (LLMs), despite strong performance on complex mathematical problems, exhibit systematic limitations in counting tasks. This issue arises from architectural limits of transformers, where counting is performed across layers, leading to degraded precision for larger counting problems due to depth constraints. To address this limitation, we propose a simple test-time strategy inspired by System-2 cognitive processes that decomposes large counting tasks into smaller, independent sub-problems that the model can reliably solve. We evaluate this approach using observational and causal mediation analyses to understand the underlying mechanism of this System-2-like strategy. Our mechanistic analysis identifies key components: latent counts are computed and stored in the final item representations of each part, transferred to intermediate steps via dedicated attention heads, and aggregated in the final stage to produce the total count. Experimental results demonstrate that this strategy enables LLMs to surpass architectural limitations and achieve high accuracy on large-scale counting tasks. This work provides mechanistic insight into System-2 counting in LLMs and presents a generalizable approach for improving and understanding their reasoning behavior.
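The decompose-and-aggregate strategy described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual prompting setup: `count_small` stands in for a small sub-problem the model is assumed to solve reliably, and is implemented here in plain Python.

```python
def count_small(items, target):
    """Count occurrences of `target` in a small chunk.

    Placeholder for a sub-problem the LLM is assumed to solve reliably.
    """
    return sum(1 for x in items if x == target)


def system2_count(items, target, chunk_size=10):
    """Decompose a large counting task into small chunks, count each
    independently, then aggregate the partial counts into a total."""
    partial_counts = [
        count_small(items[i:i + chunk_size], target)
        for i in range(0, len(items), chunk_size)
    ]
    return sum(partial_counts)


words = ["apple"] * 37 + ["pear"] * 63
print(system2_count(words, "apple"))  # → 37
```

In the paper's setting, each chunk's latent count is stored in the final item representation of that part and later aggregated; the sketch mirrors only the task decomposition, not the model internals.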
Source: arXiv: 2601.02989