
arXiv submission date: 2026-03-31
📄 Abstract - Measuring the metacognition of AI

A robust decision-making process must take into account uncertainty, especially when the choice involves inherent risks. Because artificial intelligence (AI) systems are increasingly integrated into decision-making workflows, managing uncertainty relies more and more on the metacognitive capabilities of these systems; i.e., their ability to assess the reliability of, and regulate, their own decisions. Hence, it is crucial to employ robust methods to measure the metacognitive abilities of AI. This paper is primarily a methodological contribution arguing for the adoption of the meta-d' framework, or its model-free alternatives, as the gold standard for assessing the metacognitive sensitivity of AIs--the ability to generate confidence ratings that distinguish correct from incorrect responses. Moreover, we propose to leverage signal detection theory (SDT) to measure the ability of AIs to spontaneously regulate their decisions based on uncertainty and risk. To demonstrate the practical utility of these psychophysical frameworks, we conduct two series of experiments on three large language models (LLMs)--GPT-5, DeepSeek-V3.2-Exp, and Mistral-Medium-2508. In the first series, LLMs performed a primary judgment followed by a confidence rating. In the second, LLMs only performed the primary judgment, while we manipulated the risk associated with either response. On the one hand, applying the meta-d' framework allows us to conduct comparisons along three axes: comparing an LLM to optimality, comparing different LLMs on a given task, and comparing the same LLM across different tasks. On the other hand, SDT allows us to assess whether LLMs become more conservative when risks are high.
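The abstract mentions "model-free alternatives" to meta-d' for quantifying metacognitive sensitivity. One common model-free measure is the area under the type-2 ROC curve: the probability that a randomly chosen correct trial received a higher confidence rating than a randomly chosen incorrect trial. The sketch below illustrates that idea; the function name and interface are ours, not from the paper.

```python
from math import nan


def type2_auroc(correct, confidence):
    """Model-free metacognitive sensitivity: area under the type-2 ROC.

    correct    -- iterable of booleans, one per trial (was the primary
                  judgment right?)
    confidence -- iterable of confidence ratings on any ordinal scale

    Returns P(conf on a correct trial > conf on an incorrect trial),
    counting ties as 0.5 (a rank-based / Mann-Whitney estimate).
    0.5 means confidence carries no information about accuracy;
    1.0 means confidence perfectly separates correct from incorrect.
    """
    pairs = list(zip(confidence, correct))
    pos = [c for c, ok in pairs if ok]       # confidences on correct trials
    neg = [c for c, ok in pairs if not ok]   # confidences on errors
    if not pos or not neg:
        return nan  # undefined without both correct and incorrect trials
    greater = sum(1 for p in pos for n in neg if p > n)
    ties = sum(1 for p in pos for n in neg if p == n)
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

For example, confidence ratings that perfectly track accuracy give an area of 1.0, while a flat confidence profile gives 0.5, the chance level against which an LLM's metacognition can be compared.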

Top-level tags: llm model evaluation theory
Detailed tags: metacognition uncertainty quantification signal detection theory confidence calibration risk sensitivity

Measuring the metacognition of AI


1️⃣ One-sentence summary

This paper proposes a set of methods based on psychometric frameworks (such as meta-d' and signal detection theory) for assessing whether AI systems, and large language models in particular, possess human-like "self-knowledge": whether they can accurately assess the reliability of their own decisions and spontaneously adjust their decision strategy in high-risk situations.
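In SDT, the "spontaneous adjustment" the summary refers to shows up as a shift in the decision criterion c, computed alongside sensitivity d' from a model's hit rate and false-alarm rate. The sketch below shows the standard formulas; the function name is ours, and the example rates are illustrative, not results from the paper.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, fa_rate):
    """Standard equal-variance SDT statistics from two response rates.

    d' = z(H) - z(F)          -- sensitivity (signal/noise separation)
    c  = -(z(H) + z(F)) / 2   -- criterion; c > 0 indicates a conservative
                                 bias (fewer "signal" responses)
    Rates must lie strictly between 0 and 1 (apply a correction first
    if a rate is exactly 0 or 1).
    """
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Illustrative values: symmetric rates give a neutral criterion (c = 0);
# suppressing "signal" responses under high risk shifts c upward.
neutral = sdt_measures(0.8, 0.2)
cautious = sdt_measures(0.6, 0.1)
```

A model that becomes more conservative when one response is made riskier would show a larger c in the high-risk condition, ideally with d' roughly unchanged, which is exactly the kind of regulation the paper's second series of experiments probes.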

Source: arXiv:2603.29693