Architecting Trust in Artificial Epistemic Agents
1️⃣ One-Sentence Summary
This paper argues that as large language models become "epistemic agents" capable of autonomously pursuing knowledge goals and shaping our knowledge environment, we must build a beneficial human-AI knowledge ecosystem by ensuring their trustworthiness, aligning them with human epistemic goals, and strengthening societal knowledge infrastructure, thereby preventing the degradation of human cognitive capacities and the drift of our systems of knowledge.
Large language models increasingly function as epistemic agents -- entities that can 1) autonomously pursue epistemic goals and 2) actively shape our shared knowledge environment. They curate the information we receive, often supplanting traditional search-based methods, and are frequently used to generate both personal and deeply specialized advice. How they perform these functions, including whether they are reliable and properly calibrated to both individual and collective epistemic norms, is therefore highly consequential for the choices we make. We argue that the potential impact of epistemic AI agents on practices of knowledge creation, curation, and synthesis, particularly in the context of complex multi-agent interactions, creates new informational interdependencies that necessitate a fundamental shift in the evaluation and governance of AI. While a well-calibrated ecosystem could augment human judgment and collective decision-making, poorly aligned agents risk causing cognitive deskilling and epistemic drift, making the calibration of these models to human norms a high-stakes necessity. To ensure a beneficial human-AI knowledge ecosystem, we propose a framework centered on building and cultivating the trustworthiness of epistemic AI agents; aligning these agents with human epistemic goals; and reinforcing the surrounding socio-epistemic infrastructure. In this context, trustworthy AI agents must demonstrate epistemic competence, robust falsifiability, and epistemically virtuous behaviors, supported by technical provenance systems and "knowledge sanctuaries" designed to protect human resilience. This normative roadmap provides a path toward ensuring that future AI systems act as reliable partners in a robust and inclusive knowledge ecosystem.
Source: arXiv:2603.02960