arXiv submission date: 2026-02-23
📄 Abstract - $κ$-Explorer: A Unified Framework for Active Model Estimation in MDPs

In tabular Markov decision processes (MDPs) with perfect state observability, each trajectory provides active samples from the transition distributions conditioned on state-action pairs. Consequently, accurate model estimation depends on how the exploration policy allocates visitation frequencies in accordance with the intrinsic complexity of each transition distribution. Building on recent work on coverage-based exploration, we introduce a parameterized family of decomposable and concave objective functions $U_\kappa$ that explicitly incorporate both intrinsic estimation complexity and extrinsic visitation frequency. Moreover, the curvature parameter $\kappa$ provides a unified treatment of various global objectives, such as the average-case and worst-case estimation error objectives. Using the closed-form characterization of the gradient of $U_\kappa$, we propose $\kappa$-Explorer, an active exploration algorithm that performs Frank-Wolfe-style optimization over state-action occupancy measures. The diminishing-returns structure of $U_\kappa$ naturally prioritizes underexplored and high-variance transitions, while preserving smoothness properties that enable efficient optimization. We establish tight regret guarantees for $\kappa$-Explorer and further introduce a fully online and computationally efficient surrogate algorithm for practical use. Experiments on benchmark MDPs demonstrate that $\kappa$-Explorer outperforms existing exploration strategies.
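The abstract's core mechanism can be sketched in a few lines. This is an illustrative sketch only: the paper's exact form of $U_\kappa$ is not given in the abstract, so the `U_kappa` below assumes a hypothetical $\kappa$-power family over per-state-action visit counts, weighted by an assumed intrinsic-complexity term `c`. What it does show faithfully is the structure the abstract describes: a decomposable concave objective with diminishing returns, a closed-form gradient, and a Frank-Wolfe step whose linear maximization over the visitation simplex reduces to picking the single state-action pair with the largest marginal gain.

```python
import numpy as np

def U_kappa(n, c, kappa):
    """Decomposable, concave utility over visit counts n (shape [S*A]).

    Hypothetical form: -c * n**(-kappa) per (s,a) for kappa > 0, and a
    log-utility at kappa = 0. Each term has diminishing returns in n,
    so extra visits to an already well-covered pair are worth less.
    """
    if kappa > 0:
        return np.sum(-c * n ** (-kappa))
    return np.sum(c * np.log(n))

def grad_U_kappa(n, c, kappa):
    """Closed-form gradient: the marginal value of one more visit to each
    (s,a). Large for underexplored (small n) or high-complexity (large c)
    transitions, matching the prioritization described in the abstract."""
    if kappa > 0:
        return kappa * c * n ** (-kappa - 1)
    return c / n

def frank_wolfe_step(n, c, kappa):
    """One Frank-Wolfe-style step: maximize the linearized objective over
    the simplex of visitation frequencies. The maximizer of a linear
    function over a simplex is a vertex, i.e. a single (s,a) index."""
    g = grad_U_kappa(n, c, kappa)
    return int(np.argmax(g))
```

In a full algorithm this chosen index would steer the next exploration policy toward that state-action pair; here the counts `n` are treated as directly controllable, which abstracts away the MDP dynamics the actual $\kappa$-Explorer must plan through.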

Top-level tags: reinforcement learning · theory · model training
Detailed tags: active exploration · markov decision processes · model estimation · regret analysis · frank-wolfe optimization

$κ$-Explorer: A Unified Framework for Active Model Estimation in MDPs


1️⃣ One-sentence summary

This paper proposes a unified algorithmic framework called κ-Explorer, which learns the environment model of a Markov decision process efficiently by allocating exploration resources intelligently, outperforming existing methods at balancing the exploration of unknown regions against the reduction of model estimation error.

Source: arXiv 2602.20404