arXiv submission date: 2026-03-03
📄 Abstract - A Covering Framework for Offline POMDPs Learning using Belief Space Metric

In off-policy evaluation (OPE) for partially observable Markov decision processes (POMDPs), an agent must infer hidden states from past observations, which exacerbates both the curse of horizon and the curse of memory in existing OPE methods. This paper introduces a novel covering analysis framework that exploits the intrinsic metric structure of the belief space (distributions over latent states) to relax traditional coverage assumptions. By assuming that value-relevant functions are Lipschitz continuous in the belief space, we derive error bounds that mitigate exponential blow-ups in horizon and memory length. Our unified analysis technique applies to a broad class of OPE algorithms, yielding concrete error bounds and coverage requirements expressed in terms of belief-space metrics rather than raw history coverage. We illustrate the improved sample efficiency of this framework via case studies: the double-sampling Bellman error minimization algorithm, and memory-based future-dependent value functions (FDVF). In both cases, our coverage definition based on the belief-space metric yields tighter bounds.
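The central structural assumption can be written out explicitly. The display below is an illustrative formalization, not a formula copied from the paper: here f stands for a value-relevant function, \Delta(\mathcal{S}) for the belief simplex over latent states, d for a metric on beliefs (total variation is one common choice), and L for an assumed Lipschitz constant.

```latex
% Illustrative sketch of the Lipschitz-in-belief assumption (all symbols assumed):
% f : value-relevant function, \Delta(\mathcal{S}) : belief simplex over latent states,
% d : metric on beliefs (e.g., total variation), L : Lipschitz constant.
\[
  |f(b) - f(b')| \;\le\; L \, d(b, b'), \qquad \forall\, b, b' \in \Delta(\mathcal{S}).
\]
```

Under such an assumption, covering the belief space at a given metric resolution controls the value-estimation error, which is how coverage requirements can be stated in belief-space metrics rather than over raw histories.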

Top-level tags: theory · reinforcement learning · machine learning
Detailed tags: off-policy evaluation · pomdp · belief space · covering analysis · sample efficiency

A Covering Framework for Offline POMDPs Learning using Belief Space Metric


1️⃣ One-sentence summary

This paper proposes a new framework that exploits the geometric structure of the belief space (the inferred distribution over hidden states) to analyze and improve offline policy evaluation. By working under weaker assumptions, it substantially mitigates the error blow-up that long decision horizons and heavy memory requirements cause in traditional methods, thereby improving sample efficiency.
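To make the objects in the summary concrete, the sketch below implements one step of Bayesian belief filtering in a toy two-state POMDP and measures distances between beliefs with the total-variation metric, one natural choice of belief-space metric. The transition matrix, observation likelihoods, and beliefs here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def belief_update(b, T_mat, obs_probs):
    """One step of Bayesian filtering: predict the next-state distribution
    via the transition matrix, then condition on the observation likelihoods
    obs_probs (one entry per latent state) and renormalize."""
    predicted = T_mat.T @ b            # predicted next-state distribution
    unnorm = predicted * obs_probs     # weight by observation likelihood
    return unnorm / unnorm.sum()       # normalize back to a belief

def tv_distance(b1, b2):
    """Total-variation distance, a natural metric on the belief space."""
    return 0.5 * np.abs(b1 - b2).sum()

# Made-up two-state POMDP with a mildly informative observation model.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # T[s, s'] = P(s' | s)
obs_probs = np.array([0.7, 0.4])      # P(o | s) for the received observation o

b1 = np.array([0.5, 0.5])
b2 = np.array([0.6, 0.4])
d_before = tv_distance(b1, b2)
d_after = tv_distance(belief_update(b1, T, obs_probs),
                      belief_update(b2, T, obs_probs))
print(d_before, d_after)
```

In this toy run the filtering step does not expand the distance between the two beliefs, which is the kind of metric regularity a belief-space covering argument can exploit.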

Source: arXiv:2603.03191