Learning in Markov Decision Processes with Exogenous Dynamics
1️⃣ One-sentence summary
This paper proposes a reinforcement learning method for MDPs with a specific structure, in which some state variables are not controlled by the agent and evolve independently. By exploiting this structure, the method significantly improves learning efficiency, and its advantages are validated both theoretically and empirically.
Reinforcement learning algorithms are typically designed for generic Markov Decision Processes (MDPs), where any state-action pair can lead to an arbitrary transition distribution. In many practical systems, however, only a subset of the state variables is directly influenced by the agent's actions, while the remaining components evolve according to exogenous dynamics and account for most of the stochasticity. In this work, we study a structured class of MDPs characterized by exogenous state components whose transitions are independent of the agent's actions. We show that exploiting this structure yields significantly improved learning guarantees, with only the size of the exogenous state space appearing in the leading terms of the regret bounds. We further establish a matching lower bound, showing that this dependence is information-theoretically optimal. Finally, we empirically validate our approach across classical toy settings and real-world-inspired environments, demonstrating substantial gains in sample efficiency compared to standard reinforcement learning methods.
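The structural assumption above — a controlled (endogenous) state component plus an exogenous component whose transitions ignore the action — can be illustrated with a minimal simulation sketch. All names and dynamics here are illustrative assumptions, not details from the paper:

```python
# Sketch of a factored MDP: state = (s, x), where x is exogenous.
# The key property is that x' is sampled without reference to the action.
import random

def step_exogenous(x):
    """Exogenous component: a toy 2-state Markov chain, independent of the action."""
    return 1 - x if random.random() < 0.3 else x  # switch with prob. 0.3

def step_endogenous(s, x, a):
    """Controlled component: may depend on both state parts and the action."""
    return (s + a + x) % 5  # toy deterministic dynamics over 5 states

def step(state, action):
    s, x = state
    # Note: step_exogenous never sees `action` -- this is the structure
    # the paper exploits to obtain tighter regret bounds.
    return (step_endogenous(s, x, action), step_exogenous(x))

random.seed(0)
state = (0, 0)
for t in range(3):
    state = step(state, action=1)
```

Because only the exogenous factor carries stochasticity here, an agent that knows the factorization needs to estimate far fewer transition parameters than one treating the full product state space as unstructured.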
Source: arXiv: 2603.02862