arXiv submission date: 2026-02-19
📄 Abstract - Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often converging to suboptimal policies. To address this limitation, we propose Successive Sub-value Q-learning (S2Q), which learns multiple sub-value functions to retain alternative high-value actions. Incorporating these sub-value functions into a Softmax-based behavior policy, S2Q encourages persistent exploration and enables $Q^{\text{tot}}$ to adjust quickly to the changing optima. Experiments on challenging MARL benchmarks confirm that S2Q consistently outperforms various MARL algorithms, demonstrating improved adaptability and overall performance. Our code is available at this https URL.
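The abstract describes combining several learned sub-value functions into a Softmax-based behavior policy so that alternative high-value actions keep being explored. The sketch below illustrates one way such a policy could be sampled; the function name `softmax_behavior_policy`, the max-aggregation over sub-values, and the `temperature` parameter are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax_behavior_policy(sub_q_values, temperature=1.0):
    """Sample an action from a Softmax policy built on several sub-value estimates.

    sub_q_values: array of shape (num_sub_values, num_actions), one row per
    sub-value function, each retaining a different high-value action.
    Returns the sampled action index.
    """
    # Aggregate the sub-value functions; the element-wise maximum keeps every
    # action that is best under at least one sub-value function attractive.
    combined_q = sub_q_values.max(axis=0)

    # Temperature-scaled softmax over the aggregated action values.
    logits = combined_q / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()

    return np.random.choice(len(probs), p=probs)

# Usage: two sub-value functions that disagree on the best action still give
# both candidate actions a meaningful sampling probability.
sub_q = np.array([[1.0, 0.2, 0.9],
                  [0.1, 1.1, 0.8]])
action = softmax_behavior_policy(sub_q, temperature=0.5)
print(action)
```

With a low temperature the policy concentrates on the aggregated maximum; with a higher temperature it keeps sampling the runner-up actions, which is the mechanism the abstract credits for letting $Q^{\text{tot}}$ follow shifting optima.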

Top-level tags: multi-agent reinforcement learning, model training
Detailed tags: value decomposition, exploration, multi-agent Q-learning, suboptimal actions, softmax policy

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning


1️⃣ One-sentence summary

This paper proposes a new method, S2Q, which lets agents retain multiple high-value alternative actions during learning. This addresses the tendency of conventional cooperative multi-agent algorithms to get stuck in suboptimal policies when the underlying value function shifts during training, improving the system's adaptability and overall performance.

Source: arXiv: 2602.17062