arXiv submission date: 2026-05-25
📄 Abstract - Explaining Too Much? Understanding How Large Language Model Reasoning Traces Influence Performance and Metacognition

Large Language Model interfaces are increasingly verbose, exposing intermediate reasoning traces alongside final answers. Traces are framed as transparency mechanisms, yet it is unclear how people use them to solve problems. We report a preregistered between-subjects study (N = 559) in which participants solved ten LSAT-style reasoning problems under one of three conditions: an Answer-only baseline, a Full-trace revealed before the answer, and a Summary-trace presented alongside the answer. Summaries preserved task performance at the no-trace baseline while significantly elevating trust and hedonic appeal, establishing that trace exposure shifts subjective appraisal of the interaction without bringing performance benefits. Under an open-weight reasoning model exposing verbose intermediate output, full traces additionally impaired performance relative to the answer-only baseline. Across all conditions, participants substantially overestimated their performance, and no trace format supported calibrated self-evaluation. Further analysis indicates that hedonic appeal, not trust, carries the indirect path to overestimation, consistent with a processing-fluency account. Reasoning traces are best understood as user-facing interface artifacts rather than transparent windows into model cognition, and calibration is unlikely to emerge from the traces themselves and may best be scaffolded by interactions that elicit users' own reasoning first.

Top-level tags: llm behavior general
Detailed tags: reasoning traces metacognition trust user study performance calibration

Explaining Too Much? Understanding How Large Language Model Reasoning Traces Influence Performance and Metacognition


1️⃣ One-sentence summary

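Through a controlled experiment, this paper finds that the detailed reasoning traces displayed by large language models do not improve users' problem-solving ability and may even lower their performance, while users substantially overestimate their own scores. Such traces are therefore better seen as an interface design element that shapes user satisfaction than as something that genuinely helps people understand the model's thinking process.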

Source: arXiv: 2605.25856