Indications of Belief-Guided Agency and Meta-Cognitive Monitoring in Large Language Models
1️⃣ One-Sentence Summary
Through a series of experiments, this study finds that large language models not only form internal "beliefs" in response to external inputs, and that these beliefs systematically shape their action choices, but also that the models can monitor and report their own belief states, offering preliminary evidence that AI systems may possess some form of "agency" and "self-awareness."
Rapid advancements in large language models (LLMs) have sparked the question of whether these models possess some form of consciousness. To tackle this challenge, Butlin et al. (2023) introduced a list of indicators for consciousness in artificial systems based on neuroscientific theories. In this work, we evaluate a key indicator from this list, called HOT-3, which tests for agency guided by a general belief-formation and action selection system that updates beliefs based on meta-cognitive monitoring. We view beliefs as representations in the model's latent space that emerge in response to a given input, and introduce a metric to quantify their dominance during generation. Analyzing the dynamics between competing beliefs across models and tasks reveals three key findings: (1) external manipulations systematically modulate internal belief formation, (2) belief formation causally drives the model's action selection, and (3) models can monitor and report their own belief states. Together, these results provide empirical support for the existence of belief-guided agency and meta-cognitive monitoring in LLMs. More broadly, our work lays methodological groundwork for investigating the emergence of agency, beliefs, and meta-cognition in LLMs.
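To make the idea of a "belief dominance" metric concrete, below is a minimal sketch of how one could, in principle, compare two competing belief representations in a model's latent space. This is not the paper's actual metric: the function `belief_dominance`, the projection-based scoring, and the toy direction vectors are all illustrative assumptions; the paper's excerpt only states that beliefs are treated as latent-space representations and that a dominance metric is introduced.

```python
# Illustrative sketch (assumed, not the paper's method): quantify which of two
# competing "beliefs" dominates by projecting per-token hidden states onto
# direction vectors associated with each belief (e.g., obtained by contrasting
# mean activations under prompts that induce belief A vs. belief B).

import numpy as np


def belief_dominance(hidden_states: np.ndarray,
                     belief_a: np.ndarray,
                     belief_b: np.ndarray) -> float:
    """Return a score in [-1, 1]: positive if belief A dominates, negative if B does.

    hidden_states: (num_tokens, hidden_dim) activations from one layer.
    belief_a, belief_b: (hidden_dim,) direction vectors for the two beliefs.
    """
    a = belief_a / np.linalg.norm(belief_a)
    b = belief_b / np.linalg.norm(belief_b)
    # Project each token's hidden state onto both belief directions.
    proj_a = hidden_states @ a
    proj_b = hidden_states @ b
    # Average the per-token preference for A over B, squashed into [-1, 1].
    return float(np.tanh((proj_a - proj_b).mean()))


# Toy usage: random hidden states biased toward belief A's direction.
rng = np.random.default_rng(0)
dim, tokens = 64, 20
belief_a = rng.normal(size=dim)
belief_b = rng.normal(size=dim)
states = rng.normal(size=(tokens, dim)) + 0.5 * belief_a
print(f"dominance score: {belief_dominance(states, belief_a, belief_b):+.3f}")
```

Under this reading, tracking the score across generation steps would reveal how external manipulations shift the balance between competing beliefs, which is the kind of dynamic the paper analyzes.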
Source: arXiv: 2602.02467