MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference
1️⃣ One-Sentence Summary
This paper proposes a new method called MAVRL that, much like a detective synthesizing multiple clues, unifies different forms of human feedback such as demonstrations, comparisons, ratings, and stops to automatically learn a more accurate and more robust reward function, helping AI agents better understand tasks and make decisions.
Reward learning typically relies on a single feedback type or combines multiple feedback types using manually weighted loss terms. It remains unclear how to jointly learn reward functions from heterogeneous feedback types, such as demonstrations, comparisons, ratings, and stops, that provide qualitatively different signals. We address this challenge by formulating reward learning from multiple feedback types as Bayesian inference over a shared latent reward function, where each feedback type contributes information through an explicit likelihood. We introduce a scalable amortized variational inference approach that learns a shared reward encoder and feedback-specific likelihood decoders and is trained by optimizing a single evidence lower bound. Our approach avoids reducing feedback to a common intermediate representation and eliminates the need for manual loss balancing. Across discrete and continuous-control benchmarks, we show that jointly inferred reward posteriors outperform single-type baselines, exploit complementary information across feedback types, and yield policies that are more robust to environment perturbations. The inferred reward uncertainty further provides interpretable signals for analyzing model confidence and consistency across feedback types.
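To make the training setup more concrete, below is a minimal, illustrative PyTorch sketch of this style of variational reward learning. It is not the authors' implementation: only two of the four feedback types (comparisons and ratings) are shown, the learned feedback-specific likelihood decoders are replaced by fixed-form Bradley-Terry and Gaussian likelihoods, and the posterior is a simple per-state-action Gaussian rather than whatever parameterization MAVRL actually uses. All module names, shapes, and hyperparameters are assumptions for illustration.

```python
# Simplified sketch (not the MAVRL code): a shared reward encoder produces a
# Gaussian posterior over rewards, per-feedback-type likelihood terms consume
# reparameterized reward samples, and everything is trained on one ELBO-style
# objective. Feedback types, shapes, and toy data below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedRewardEncoder(nn.Module):
    """Maps state-action features to a Gaussian posterior over per-step reward."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, 1)       # posterior mean
        self.log_std = nn.Linear(hidden, 1)  # posterior log std

    def forward(self, feats):
        h = self.body(feats)
        return self.mu(h), self.log_std(h).clamp(-5, 2)

def sample_reward(mu, log_std):
    # Reparameterized sample so gradients flow through the posterior.
    return mu + log_std.exp() * torch.randn_like(mu)

def comparison_loglik(r_a, r_b, prefs):
    # Bradley-Terry style likelihood for "trajectory A preferred over B".
    logits = r_a.sum(dim=1) - r_b.sum(dim=1)
    return -F.binary_cross_entropy_with_logits(logits, prefs, reduction="sum")

def rating_loglik(r, ratings):
    # Gaussian likelihood for scalar ratings of a trajectory's return.
    return -F.mse_loss(r.sum(dim=1), ratings, reduction="sum")

def elbo(encoder, feats_a, feats_b, prefs, feats_rated, ratings, kl_weight=1e-3):
    mu_a, ls_a = encoder(feats_a)
    mu_b, ls_b = encoder(feats_b)
    mu_r, ls_r = encoder(feats_rated)
    r_a, r_b, r_r = (sample_reward(m, s).squeeze(-1)
                     for m, s in [(mu_a, ls_a), (mu_b, ls_b), (mu_r, ls_r)])
    # One likelihood term per feedback type, plus a KL to a standard-normal prior.
    loglik = comparison_loglik(r_a, r_b, prefs) + rating_loglik(r_r, ratings)
    mu_all = torch.cat([mu_a, mu_b, mu_r])
    ls_all = torch.cat([ls_a, ls_b, ls_r])
    kl = 0.5 * (mu_all.pow(2) + (2 * ls_all).exp() - 2 * ls_all - 1).sum()
    # Down-weighting the KL is a common practical tweak; the exact ELBO uses weight 1.
    return loglik - kl_weight * kl

# Toy usage on random data: batches of two compared trajectories and one rated
# trajectory, each of length 10 with 8-dimensional state-action features.
torch.manual_seed(0)
enc = SharedRewardEncoder(feat_dim=8)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
feats_a, feats_b, feats_r = (torch.randn(16, 10, 8) for _ in range(3))
prefs, ratings = torch.randint(0, 2, (16,)).float(), torch.randn(16)
for _ in range(100):
    loss = -elbo(enc, feats_a, feats_b, prefs, feats_r, ratings)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the paper's formulation, the likelihood decoders for each feedback type are themselves learned and the demonstration and stop signals contribute their own likelihood terms; the sketch above only conveys the overall structure of one shared posterior feeding multiple feedback-specific likelihoods under a single objective.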
Source: arXiv: 2602.15206