arXiv submission date: 2026-05-20
📄 Abstract - The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

Chain-of-thought (CoT) prompting is necessary for arithmetic in small language models, yet shuffling its steps preserves most performance. What does CoT contribute if not logical sequencing? In three 1-3B instruction-tuned LMs on GSM8K, we isolate the answer-readout stage via prefix completion and identify a positional shortcut: the model copies whichever number occupies the trailing position before the answer delimiter, regardless of intermediate reasoning. Gold-answer presence accounts for 54-92 pp of accuracy (89-92% of each model's teacher-forcing ceiling); even on incorrect items, the final answer matches the last CoT number 95-96% of the time. The copy channel takes precedence over retained-context completion: replacing the trailing number with a wrong value collapses accuracy to near-zero despite correct intermediates, yet removing it recovers 5-32 pp above that floor--even single-step arithmetic the model can otherwise perform is suppressed when a copyable number is present. Qwen and Llama copy novel distractors 87-95% of the time; Gemma gates selectively. Head-level ablation implicates architecture-specific head sets; the effect replicates on GSM-Symbolic. On non-arithmetic BBH tasks, shuffle retention drops sharply; at 7-8B, content-selective gating emerges. Step-level faithfulness evaluations risk conflating positional answer transport with genuine computation--a failure mode for CoT-based oversight.
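The "final answer matches the last CoT number" statistic can be illustrated with a minimal sketch. This is not the authors' evaluation code: the `####` delimiter (the GSM8K convention), the number regex, and the helper names `last_number` and `copy_rate` are all assumptions made for illustration.

```python
import re

# Assumed number pattern: optional sign, digits with optional thousands
# separators, optional decimal part. The lookbehind keeps "3-4" from
# being read as "3" followed by "-4".
NUMBER = re.compile(r"(?<!\d)-?\d[\d,]*(?:\.\d+)?")


def last_number(text):
    """Return the last number in `text` as a string, or None if there is none."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def copy_rate(samples, delimiter="####"):
    """Fraction of samples whose final answer equals the trailing CoT number.

    Each sample is a model output of the form "<CoT text> #### <answer>".
    The delimiter is an assumption (GSM8K style), not taken from the paper.
    Samples without a delimiter or without numbers on both sides are skipped.
    """
    hits = 0
    total = 0
    for sample in samples:
        cot, sep, answer = sample.partition(delimiter)
        if not sep:
            continue
        trailing = last_number(cot)
        final = last_number(answer)
        if trailing is None or final is None:
            continue
        total += 1
        if float(trailing) == float(final):
            hits += 1
    return hits / total if total else 0.0
```

Under these assumptions, a high `copy_rate` on items the model answers incorrectly would be consistent with positional copying rather than recomputation, which is the pattern the abstract reports.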

Top-level tags: llm, model evaluation
Detailed tags: chain-of-thought, arithmetic reasoning, positional shortcut, answer copying, faithfulness

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models


1️⃣ One-sentence summary

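The study finds that when small language models solve arithmetic problems, the chain-of-thought "reasoning" is largely not what produces the final answer at readout: the model takes a positional shortcut, copying the last number that appears before the answer delimiter. This copying determines correctness in the large majority of cases, while the logical content of the intermediate reasoning steps contributes very little.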

Source: arXiv: 2605.22870