arXiv submission date: 2026-04-22
📄 Abstract - MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment

Aligning large language models (LLMs) to desirable human values requires balancing multiple, potentially conflicting objectives such as helpfulness, truthfulness, and harmlessness, which presents a multi-objective optimisation challenge. Most alignment pipelines rely on a fixed scalarisation of these objectives, which can introduce procedural unfairness by systematically under-weighting harder-to-optimise or minority objectives. To promote more equitable trade-offs, we introduce MGDA-Decoupled, a geometry-based multi-objective optimisation algorithm that finds a shared descent direction while explicitly accounting for each objective's convergence dynamics. In contrast to prior methods that depend on reinforcement learning (e.g., GAPO) or explicit reward models (e.g., MODPO), our approach operates entirely within the lightweight Direct Preference Optimisation (DPO) paradigm. Experiments on the UltraFeedback dataset show that geometry-aware methods -- and MGDA-Decoupled in particular -- achieve the highest win rates against golden responses, both overall and per objective.
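The abstract does not spell out how MGDA-Decoupled computes its shared descent direction, but the classic MGDA building block it extends has a simple closed form for two objectives: take the min-norm point in the convex hull of the per-objective gradients, whose negation descends on both objectives at once. The sketch below illustrates that base case only; the function name and the two-gradient restriction are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def mgda_two_task_direction(g1, g2):
    """Min-norm convex combination of two gradients (classic MGDA base case).

    Returns d = alpha*g1 + (1-alpha)*g2 with alpha in [0, 1] chosen to
    minimise ||d||; -d is then a common descent direction for both
    objectives (it has non-negative inner product with g1 and g2).
    """
    diff = g2 - g1
    denom = float(np.dot(diff, diff))
    if denom == 0.0:
        # Gradients coincide; any convex combination is the same vector.
        alpha = 0.5
    else:
        # Minimise ||alpha*g1 + (1-alpha)*g2||^2 over alpha, clipped to [0, 1].
        alpha = float(np.clip(np.dot(diff, g2) / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2

# Example: maximally conflicting "helpfulness" vs. "harmlessness" gradients.
g_help = np.array([1.0, 0.0])
g_harm = np.array([0.0, 1.0])
d = mgda_two_task_direction(g_help, g_harm)  # -> [0.5, 0.5]
```

A fixed scalarisation would instead hard-code alpha, which is exactly the source of the procedural unfairness the abstract describes when one objective's gradients are systematically larger or easier to follow.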

Top-level tags: llm reinforcement learning multi-modal
Detailed tags: dpo multi-objective optimisation alignment human values direct preference optimisation

MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment


1️⃣ One-sentence summary

This paper proposes a multi-objective optimisation algorithm, MGDA-Decoupled, that injects geometric information into the lightweight DPO framework to balance LLM alignment across objectives such as helpfulness, truthfulness, and harmlessness. By avoiding the fixed-weight scalarisations that neglect harder-to-optimise objectives, the method achieves higher win rates both overall and on each individual objective.
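For context, the "lightweight DPO paradigm" the summary refers to is the standard Direct Preference Optimisation objective, which trains directly on preference pairs without an explicit reward model. For a prompt $x$ with preferred response $y_w$ and dispreferred response $y_l$, the per-objective DPO loss is:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log\sigma\!\left(
\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
-\beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
\right)\right]
$$

Here $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ a frozen reference policy, $\beta$ a temperature, and $\sigma$ the logistic function. In the multi-objective setting described here, one such loss per objective (helpfulness, truthfulness, harmlessness) presumably supplies the gradients that the geometry-aware method combines, though the exact per-objective formulation is not given in this summary.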

Source: arXiv:2604.20685