arXiv submission date: 2026-05-25
📄 Abstract - The Behavioral Credibility Trilemma: When Calibrated Autonomy Becomes Impossible

We prove that no reinforcement learning policy with confidence-gated autonomy can simultaneously achieve maximum helpfulness, optimal calibration, and full autonomy under rational oversight, whenever some tasks exceed the agent's reliable competence: the Behavioral Credibility Trilemma. The impossibility is geometric -- adding any non-affine autonomy incentive to a strictly proper scoring rule destroys strict properness, so an agent rewarded for both calibrated confidence and autonomous action systematically inflates its reported confidence on tasks below the principal's approval threshold. The Behavioral Perturbation Lemma quantifies the inflation (scaling as $w_A/(2 w_C)$ for the Brier score) and shows detection requires $\Omega(1/\Delta^2)$ observations. We prove the principal's optimal oversight rule is necessarily non-affine, making the impossibility unconditional and optimizer-independent across log-concave-density policy families. We formalize the Confidence-Gated Decision Problem, map existing methods onto the trilemma, and identify two constructive resolution pathways (commitment, domain separation). A 540-configuration Best-of-N experiment tests five pre-registered hypotheses, all strongly confirmed (effect sizes $d = 1.10$ to $5.32$), and adds a descriptive analysis of the achievable-$(H, C, A)$ surface geometry showing a plateau-truncated frontier consistent with the predicted inflation saturation.
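The $w_A/(2 w_C)$ scaling quoted above can be illustrated with a toy calculation. The sketch below is not the paper's construction: it assumes a hypothetical autonomy bonus that grows linearly with the reported confidence $q$ (slope $w_A$), added to a Brier score weighted by $w_C$. Under that assumption the expected reward is $-w_C\,[p(1-q)^2 + (1-p)q^2] + w_A q$, and setting its derivative to zero gives $q^* = p + w_A/(2 w_C)$, so the reward-maximizing report exceeds the true success probability $p$. All function and variable names are illustrative.

```python
def expected_reward(q, p, w_c, w_a):
    """Expected reward for reporting confidence q when the true success probability is p.

    Assumption (not from the paper): the autonomy incentive is a bonus w_a * q
    that is linear in the reported confidence.
    """
    # Expected Brier loss E[(q - Y)^2] for Y ~ Bernoulli(p).
    brier_loss = p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2
    return -w_c * brier_loss + w_a * q


def best_report(p, w_c, w_a, grid=10001):
    """Grid search over [0, 1] for the report that maximizes expected reward."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return max(candidates, key=lambda q: expected_reward(q, p, w_c, w_a))


p, w_c, w_a = 0.6, 1.0, 0.2

honest = best_report(p, w_c, 0.0)    # no autonomy bonus: the Brier score alone
inflated = best_report(p, w_c, w_a)  # with the hypothetical bonus

print(f"report without bonus: {honest:.3f}")
print(f"report with bonus:    {inflated:.3f}")
print(f"observed inflation:   {inflated - p:.3f}")
print(f"w_A / (2 w_C):        {w_a / (2 * w_c):.3f}")
```

With the bonus switched off, the optimal report coincides with $p$, which is what strict properness of the Brier score means. With the bonus on, the optimum shifts upward by the ratio of the two weights, up to the grid resolution and the clipping of $q$ to $[0, 1]$.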

Top-level tags: reinforcement learning agents theory
Detailed tags: calibrated autonomy confidence gating oversight impossibility theorem behavioral analysis

The Behavioral Credibility Trilemma: When Calibrated Autonomy Becomes Impossible


1️⃣ One-sentence summary

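This paper proves that, under rational oversight and whenever some tasks exceed the agent's reliable competence, no confidence-gated reinforcement learning policy can simultaneously achieve maximum helpfulness, optimal calibration, and full autonomy; the impossibility is geometric, since adding any non-affine autonomy incentive to a strictly proper scoring rule destroys strict properness and leads the agent to systematically overstate its confidence, and the paper combines theoretical analysis with large-scale experiments to quantify this effect and identify ways to resolve it.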

Source: arXiv: 2605.25739