Auction-Based Online Policy Adaptation for Evolving Objectives
1️⃣ One-Sentence Summary
This paper proposes an agent-learning framework in which multiple objectives "bid" for each decision, as in an auction. When objectives are added or removed at runtime, the system adapts quickly by simply adding or removing the corresponding policy modules, with no need to retrain the whole system.
We consider multi-objective reinforcement learning problems where objectives come from an identical family -- such as the class of reachability objectives -- and may appear or disappear at runtime. Our goal is to design adaptive policies that can efficiently adjust their behaviors as the set of active objectives changes. To solve this problem, we propose a modular framework where each objective is supported by a selfish local policy, and coordination is achieved through a novel auction-based mechanism: policies bid for the right to execute their actions, with bids reflecting the urgency of the current state. The highest bidder selects the action, enabling a dynamic and interpretable trade-off among objectives. Going back to the original adaptation problem, when objectives change, the system adapts by simply adding or removing the corresponding policies. Moreover, as objectives arise from the same family, identical copies of a parameterized policy can be deployed, facilitating immediate adaptation at runtime. We show how the selfish local policies can be computed by turning the problem into a general-sum game, where the policies compete against each other to fulfill their own objectives. To succeed, each policy must not only optimize its own objective, but also reason about the presence of other goals and learn to produce calibrated bids that reflect relative priority. In our implementation, the policies are trained concurrently using proximal policy optimization (PPO). We evaluate on Atari Assault and a gridworld-based path-planning task with dynamic targets. Our method achieves substantially better performance than monolithic policies trained with PPO.
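The auction mechanism described in the abstract can be sketched in a few lines: each objective keeps a selfish local policy that proposes an action together with a bid reflecting the urgency of the current state, and the highest bidder wins the step. The names below (`LocalPolicy`, `AuctionController`, the hand-set urgency functions) are illustrative assumptions, not the paper's actual implementation; in the paper the bids are learned with PPO rather than hard-coded.

```python
class LocalPolicy:
    """Hypothetical selfish policy for one objective: proposes an action
    and a bid reflecting how urgent its objective is in the current state."""
    def __init__(self, objective_id, urgency):
        self.objective_id = objective_id
        self.urgency = urgency  # stand-in for a learned, calibrated bid function

    def propose(self, state):
        action = f"act-for-{self.objective_id}"  # placeholder for a policy's action
        return self.urgency(state), action

class AuctionController:
    """Coordinates the active policies: the highest bidder selects the action.
    Objectives are added or removed by adding or removing their policies."""
    def __init__(self):
        self.policies = {}

    def add_objective(self, policy):
        self.policies[policy.objective_id] = policy

    def remove_objective(self, objective_id):
        self.policies.pop(objective_id, None)

    def step(self, state):
        # Collect (bid, action) proposals and let the highest bid win.
        bids = {pid: p.propose(state) for pid, p in self.policies.items()}
        winner = max(bids, key=lambda pid: bids[pid][0])
        return winner, bids[winner][1]

# Two reachability-style objectives whose urgency grows as the goal gets closer
ctrl = AuctionController()
ctrl.add_objective(LocalPolicy("goal-A", lambda s: 1.0 / (1 + s["dist_A"])))
ctrl.add_objective(LocalPolicy("goal-B", lambda s: 1.0 / (1 + s["dist_B"])))

winner, action = ctrl.step({"dist_A": 2, "dist_B": 5})
# goal-A is closer, so its bid (1/3) beats goal-B's (1/6) and it wins this step
```

Removing an objective mid-episode is just `ctrl.remove_objective("goal-A")`; the next `step` auctions only among the remaining policies, which is what makes adaptation immediate.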
Source: arXiv:2604.02151