Mind the Gap: Structure-Aware Consistency in Preference Learning
1️⃣ One-Sentence Summary
This paper shows that mainstream methods such as Direct Preference Optimization (DPO) suffer from a theoretical consistency flaw, and proposes a structure-aware preference-learning objective (SA-DPO) that dynamically adjusts the margin based on semantic distance, yielding more reliable alignment under bounded model capacity.
Preference learning has become the foundation of aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Optimization (DPO), minimize surrogate losses as proxies for the intractable pairwise ranking loss. However, we demonstrate that for the equicontinuous hypothesis sets typical of neural networks, these standard surrogates are theoretically inconsistent, yielding vacuous generalization guarantees. To resolve this, we formulate LLM alignment within a margin-shifted ranking framework. We derive rigorous $H$-consistency bounds that depend on enforcing a separation margin $\gamma$. Crucially, we extend this to Structure-Aware $H$-consistency, introducing a novel objective (SA-DPO) that adapts the margin based on the semantic distance between responses to handle synonyms and hard pairs. Finally, we analyze the trade-off between consistency and model limitations via the Margin-Capacity Profile, proving that heavy-tailed surrogates (such as the Polynomial Hinge family) offer superior consistency guarantees for capacity-bounded models compared to the standard logistic loss used in DPO.
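To make the margin-shifted idea concrete, here is a minimal per-example sketch of what an SA-DPO-style loss could look like. This is an illustrative reconstruction from the abstract, not the paper's actual formulation: the DPO-style log-ratio reward margin is standard, but the linear scaling of the margin γ by semantic distance (`gamma0 * sem_dist`) and the specific form of the polynomial hinge are assumptions.

```python
import math

def sa_dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected,
                sem_dist, beta=0.1, gamma0=1.0):
    """Margin-shifted, structure-aware DPO-style loss for one pair (sketch).

    The implicit reward margin follows DPO's log-ratio form; the target
    separation gamma is scaled by the semantic distance between the two
    responses, so near-synonymous pairs (small sem_dist) are not forced
    far apart while genuinely different pairs get a larger margin.
    """
    reward_margin = beta * ((logp_chosen - ref_chosen)
                            - (logp_rejected - ref_rejected))
    gamma = gamma0 * sem_dist        # structure-aware adaptive margin
    z = reward_margin - gamma
    # Standard logistic surrogate (as in DPO), shifted by the margin:
    return math.log1p(math.exp(-z))

def poly_hinge(z, p=2.0):
    # One illustrative member of a "polynomial hinge" family, used here
    # only to contrast with the logistic loss; the paper's exact family
    # is not reproduced from the abstract.
    return max(0.0, 1.0 - z) ** p
```

With a fixed reward margin, increasing `sem_dist` raises the required separation γ and therefore the loss, which is the intended behavior: hard or dissimilar pairs are pushed apart more aggressively than synonyms.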
Source: arXiv:2604.27733