Cross-fitted Proximal Learning for Model-Based Reinforcement Learning
1️⃣ One-sentence summary
This paper proposes a new cross-fitted estimation method for more accurately estimating key model components in offline reinforcement learning with hidden confounders, thereby improving model-based decision-making.
Model-based reinforcement learning is attractive for sequential decision-making because it explicitly estimates reward and transition models and then supports planning through simulated rollouts. In offline settings with hidden confounding, however, models learned directly from observational data may be biased. This challenge is especially pronounced in partially observable systems, where latent factors may jointly affect actions, rewards, and future observations. Recent work has shown that policy evaluation in such confounded partially observable Markov decision processes (POMDPs) can be reduced to estimating reward-emission and observation-transition bridge functions satisfying conditional moment restrictions (CMRs). In this paper, we study the statistical estimation of these bridge functions. We formulate bridge learning as a CMR problem with nuisance objects given by a conditional mean embedding and a conditional density. We then develop a $K$-fold cross-fitted extension of the existing two-stage bridge estimator. The proposed procedure preserves the original bridge-based identification strategy while using the available data more efficiently than a single sample split. We also derive an oracle-comparator bound for the cross-fitted estimator and decompose the resulting error into a Stage I term induced by nuisance estimation and a Stage II term induced by empirical averaging.
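The abstract's central procedure is $K$-fold cross-fitting: Stage I nuisances (here, the conditional mean embedding and conditional density) are fit on $K-1$ folds, and the Stage II empirical objective is evaluated on the held-out fold, so every observation contributes to both stages without being reused for its own nuisance. Below is a minimal generic sketch of this pattern; the function names, the callback interface, and the toy nuisance in the usage example are illustrative placeholders, not the paper's actual bridge estimator.

```python
import numpy as np

def cross_fitted_estimate(data, fit_nuisance, stage_two, K=5, seed=0):
    """Generic K-fold cross-fitting.

    For each fold k: fit the Stage I nuisance on the other K-1 folds,
    evaluate the Stage II empirical objective on fold k, then average
    the K fold-level estimates.
    """
    n = len(data)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    estimates = []
    for k in range(K):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        nuisance = fit_nuisance(data[train_idx])          # Stage I on K-1 folds
        estimates.append(stage_two(data[test_idx], nuisance))  # Stage II on held-out fold
    return float(np.mean(estimates))
```

As a sanity check, plugging in a trivial nuisance (the training mean) and a Stage II objective that just averages the held-out fold recovers the overall sample mean, since equal-sized folds make the average of fold means equal the global mean. In the paper's setting, `fit_nuisance` would instead estimate the conditional mean embedding and conditional density, and `stage_two` would solve the bridge-function CMR objective on the held-out fold.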
Source: arXiv: 2604.05185