arXiv submission date: 2026-05-27
📄 Abstract - Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation

Reinforcement learning with multinomial logistic (MNL) function approximation has become an important framework due to its flexibility and broad applicability. While existing studies have established regret guarantees under worst-case analysis, they do not capture how performance depends on the variability of the interaction between the learner and the environment. In this paper, we develop a new theoretical analysis for MNL-based Markov decision processes that yields explicit variance-adaptive regret bounds. Our algorithm is computationally efficient and achieves the instance-wise optimal rate of regret, narrowing the gap between upper and lower bounds. Our numerical experiments validate that our method learns optimal policies more efficiently than conventional approaches.
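As background, prior work on MNL-based MDPs models the transition kernel as a softmax over the reachable next states: P(s'|s,a) = exp(φ(s,a,s')ᵀθ) / Σ_{s''} exp(φ(s,a,s'')ᵀθ). This entry does not spell out the paper's exact parameterization, so treat that form as an assumption. One natural reading of "variability of the interaction" is the variance of the next-state value, Var_{s'∼P(·|s,a)}[V(s')], which a variance-adaptive bound can exploit instead of the worst-case range of V. The sketch below computes both quantities. The function names, the dictionary-of-features layout, and `theta` are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Dict, Sequence


def mnl_transition_probs(
    features: Dict[str, Sequence[float]], theta: Sequence[float]
) -> Dict[str, float]:
    """Assumed MNL transition model: P(s'|s,a) proportional to exp(phi(s,a,s')^T theta).

    `features` maps each reachable next state s' to its feature vector phi(s,a,s')
    for one fixed (s, a) pair (hypothetical layout chosen for illustration).
    """
    # Utility of each next state: inner product of its features with theta.
    logits = {s: sum(f * t for f, t in zip(phi, theta)) for s, phi in features.items()}
    # Subtract the max logit before exponentiating to avoid overflow.
    m = max(logits.values())
    exps = {s: math.exp(z - m) for s, z in logits.items()}
    total = sum(exps.values())
    return {s: e / total for s, e in exps.items()}


def next_value_variance(probs: Dict[str, float], values: Dict[str, float]) -> float:
    """Var_{s'~P}[V(s')] = E[V(s')^2] - (E[V(s')])^2 under the given probabilities."""
    mean = sum(p * values[s] for s, p in probs.items())
    second_moment = sum(p * values[s] ** 2 for s, p in probs.items())
    # Clip tiny negative values caused by floating-point rounding.
    return max(second_moment - mean ** 2, 0.0)


if __name__ == "__main__":
    feats = {"s1": [1.0, 0.0], "s2": [0.0, 1.0]}
    vals = {"s1": 0.0, "s2": 1.0}
    for theta in ([0.0, 0.0], [20.0, 0.0]):
        p = mnl_transition_probs(feats, theta)
        print(theta, p, next_value_variance(p, vals))
```

With θ = (0, 0) both successors are equally likely and the variance of V(s') is 0.25; as θ concentrates the probability mass on one successor, the variance shrinks toward 0. That low-variance regime is where a variance-adaptive bound can be tighter than a worst-case one.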

Top-level tags: reinforcement learning, machine learning
Detailed tags: multinomial logit variance-adaptive regret bounds function approximation optimal algorithm

Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation


1️⃣ One-Sentence Summary

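The paper proposes a computationally efficient reinforcement learning algorithm for MDPs with multinomial logit function approximation whose regret bound adapts to the variability of the learner's interaction with the environment, achieves the instance-wise optimal regret rate, and is shown in experiments to learn optimal policies more efficiently than conventional approaches.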

From arXiv: 2605.28364