Relative Principals, Pluralistic Alignment, and the Structural Value Alignment Problem
1️⃣ One-sentence summary
This paper proposes that AI "value alignment" should not be viewed merely as a technical or ethical problem but approached from a governance perspective: by analyzing three interacting dimensions (objectives, information, and principals), alignment failures in real-world systems can be diagnosed and addressed. It stresses that alignment is, at its core, a structural process requiring multi-party negotiation and ongoing management.
The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes focused on hypothetical future systems. I argue that the problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but whether it is aligned enough, for whom, and at what cost. Drawing on the principal-agent framework from economics, this paper reconceptualises misalignment as arising along three interacting axes: objectives, information, and principals. The three-axis framework provides a systematic way of diagnosing why misalignment arises in real-world systems and clarifies that alignment cannot be treated as a single technical property of models but must instead be understood as an outcome shaped by how objectives are specified, how information is distributed, and whose interests count in practice. The core contribution of this paper is to show that the three-axis decomposition implies that alignment is fundamentally a problem of governance rather than engineering alone. From this perspective, alignment is inherently pluralistic and context-dependent, and resolving misalignment involves trade-offs among competing values. Because misalignment can occur along each axis and affect stakeholders differently, the structural description shows that alignment cannot be "solved" through technical design alone, but must be managed through ongoing institutional processes that determine how objectives are set, how systems are evaluated, and how affected communities can contest or reshape those decisions.
Source: arXiv: 2604.20805