$A^3$-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation
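1️⃣ One-sentence summary
This paper proposes $A^3$-Bench, a new benchmark that evaluates a model's memory-driven reasoning by measuring how well it activates and uses prior knowledge (anchors) and experiential structures (attractors) during scientific reasoning, rather than only checking whether the final answer is correct.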
Scientific reasoning relies not only on logical inference but also on activating prior knowledge and experiential structures. Memory enables efficient reuse of knowledge and improves the consistency and stability of reasoning. However, existing benchmarks mainly evaluate final answers or step-by-step coherence, overlooking the memory-driven mechanisms that underlie human reasoning, which involve activating anchors and attractors and then integrating them into multi-step inference. To address this gap, we propose $A^3$-Bench, a benchmark designed to evaluate scientific reasoning through dual-scale memory-driven activation, grounded in Anchor and Attractor Activation. First, we annotate 2,198 science reasoning problems across domains using the SAPM process (subject, anchor & attractor, problem, and memory developing). Second, we introduce a dual-scale memory evaluation framework based on anchors and attractors, along with the AAUI (Anchor–Attractor Utilization Index) metric to measure memory activation rates. Finally, through experiments with various base models and paradigms, we validate $A^3$-Bench and analyze how memory activation affects reasoning performance, providing insights into memory-driven scientific reasoning.
Source: arXiv: 2601.09274