arXiv submission date: 2026-03-12
📄 Abstract - Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives

Reinforcement learning (RL) has achieved remarkable success in a wide range of control and decision-making tasks. However, RL agents often exhibit unstable or degraded performance when deployed in environments subject to unexpected external disturbances and model uncertainties. Consequently, ensuring reliable performance under such conditions remains a critical challenge. In this paper, we propose minimax deep deterministic policy gradient (MMDDPG), a framework for learning disturbance-resilient policies in continuous control tasks. The training process is formulated as a minimax optimization problem between a user policy and an adversarial disturbance policy. In this problem, the user learns a robust policy that minimizes the objective function, while the adversary generates disturbances that maximize it. To stabilize this interaction, we introduce a fractional objective that balances task performance and disturbance magnitude. This objective prevents excessively aggressive disturbances and promotes robust learning. Experimental evaluations in MuJoCo environments demonstrate that the proposed MMDDPG achieves significantly improved robustness against both external force perturbations and model parameter variations.
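The balancing idea described above can be sketched as a toy fractional objective. The exact functional form, the `eps` regularizer, and the function name below are illustrative assumptions, not the paper's definition:

```python
import numpy as np

def fractional_objective(task_cost: float, disturbance: np.ndarray,
                         eps: float = 1.0) -> float:
    """Toy fractional objective: task cost scaled down by disturbance
    magnitude. The adversary maximizes this value, so a disturbance that
    grows without a proportional rise in task cost yields diminishing
    returns, discouraging excessively aggressive attacks."""
    return task_cost / (eps + np.linalg.norm(disturbance))

# Same task cost, increasingly aggressive disturbance: the adversary's
# payoff shrinks unless the disturbance actually degrades the task.
print(fractional_objective(10.0, np.zeros(3)))       # 10.0
print(fractional_objective(10.0, np.full(3, 3.0)))   # ~1.61
```

In this sketch the user policy would minimize the numerator (task cost) while the adversary maximizes the whole ratio, so the denominator acts as a built-in brake on disturbance magnitude during minimax training.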

Top-level tags: reinforcement learning, agents, model training
Detailed tags: robust RL, adversarial training, continuous control, minimax optimization, policy gradient

Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives


1️⃣ One-sentence summary

This paper proposes a new reinforcement learning method that introduces a fractional objective balancing task performance against disturbance magnitude, so that through adversarial training against a simulated "opponent," the agent learns control policies that are more stable and more resilient to environmental disturbances and model uncertainties.

Source: arXiv:2603.12110