SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front
1️⃣ One-sentence summary
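This paper proposes SURF, a method that analyzes why the Pareto front is traversed at a non-uniform speed as the scalarization weight varies and, building on that analysis, designs an algorithm that automatically adapts its weight-sampling rule to the objectives at hand, so that the solutions produced by multi-objective optimization cover the whole Pareto front more uniformly; it achieves better results on multi-objective bandits, reinforcement learning, and large language model (LLM) alignment tasks.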
Scalarization is widely used in multi-objective optimization owing to its simplicity and scalability. In many applications, the goal is to generate solutions that represent diverse user preferences, ideally with uniform coverage of the Pareto front (PF). However, uniformly sampling scalarization weights usually induces non-uniform coverage of the PF. We explain this mismatch through a geometric analysis of the scalarization path. As the scalarization weight varies, the corresponding solutions trace the PF with a generally non-uniform traversal speed. This speed induces an arc-length cumulative distribution function (CDF); inverting this CDF map yields a principled rule for selecting weights that produce uniform PF coverage. Building on this insight, we propose SURF (Sampling Uniformly along the PaReto Front). For structured problems, including bi-objective bandits, we derive closed-form expressions for this CDF map and the resulting PF-aware weight sampling rule. For general problems, SURF alternates between CDF reconstruction and weight sampling. Theoretically, we show that under provable conditions, SURF converges linearly to an unavoidable finite-sampling floor. Empirically, experiments on bandits, multi-objective-gymnasium, and multi-objective LLM alignment demonstrate that SURF efficiently achieves more uniform PF coverage than baselines.
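The arc-length CDF inversion described above can be illustrated numerically. The sketch below assumes a hypothetical toy bi-objective problem, f1(x) = x^2 and f2(x) = (x - 1)^2 under linear scalarization w·f1 + (1 - w)·f2, whose minimizer x* = 1 - w is known in closed form, so each weight maps directly to a Pareto-front point. It approximates the arc-length CDF F(w) on a dense weight grid and then picks w_j = F^{-1}(j / (k - 1)) by interpolation. This is not the authors' SURF implementation: the helper names (`pareto_point`, `arc_length_cdf`, `pf_uniform_weights`, `spacing_cv`) are hypothetical, and in a general problem each front point would come from actually solving the scalarized problem rather than from a formula.

```python
import numpy as np


def pareto_point(w):
    """Hypothetical toy problem: f1(x) = x**2, f2(x) = (x - 1)**2.
    Minimizing w*f1 + (1 - w)*f2 gives x* = 1 - w in closed form,
    so each weight maps to the PF point (f1(x*), f2(x*))."""
    x = 1.0 - w
    return np.stack([x**2, (x - 1.0) ** 2], axis=-1)


def arc_length_cdf(point_fn, grid_size=10_001):
    """Approximate the arc-length CDF F(w) on a dense weight grid by
    summing Euclidean lengths of the polyline through the PF points."""
    w_grid = np.linspace(0.0, 1.0, grid_size)
    pts = point_fn(w_grid)                           # shape (grid_size, 2)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])      # cumulative arc length
    return w_grid, s / s[-1]                         # normalized to [0, 1]


def pf_uniform_weights(point_fn, k, grid_size=10_001):
    """Invert the CDF: w_j = F^{-1}(j / (k - 1)) via linear interpolation.
    np.interp needs an increasing xp, which holds while the speed is > 0."""
    w_grid, cdf = arc_length_cdf(point_fn, grid_size)
    targets = np.linspace(0.0, 1.0, k)
    return np.interp(targets, cdf, w_grid)


def spacing_cv(weights):
    """Coefficient of variation of gaps between consecutive PF points
    (0 means perfectly even spacing along the front)."""
    pts = pareto_point(weights)
    gaps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    return gaps.std() / gaps.mean()


k = 11
w_uniform = np.linspace(0.0, 1.0, k)             # naive uniform weights
w_surf = pf_uniform_weights(pareto_point, k)      # inverted arc-length CDF
print(spacing_cv(w_uniform), spacing_cv(w_surf))
```

With uniform weights, the gaps between consecutive front points shrink toward the middle of the front, because the traversal speed 2·sqrt((1 - w)^2 + w^2) is largest at the endpoints; the inverted-CDF weights instead place the points at nearly equal spacing along the front.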
Source: arXiv:2605.20619