综合POMDP策略:通过学习结合采样与模型检测 / Synthesizing POMDP Policies: Sampling Meets Model-checking via Learning
1️⃣ 一句话总结
本文提出一种新框架,通过结合采样、自动机学习和模型检测技术,为部分可观测马尔可夫决策过程合成带有形式化正确性保证的有限状态控制器,在特定条件下比纯形式化方法更具可扩展性。
Partially Observable Markov Decision Processes (POMDPs) are the standard framework for decision-making under uncertainty. While sampling-based methods scale well, they lack formal correctness guarantees, making them unsuitable for safety-critical applications. Conversely, formal synthesis techniques provide correctness-by-construction but often struggle with scalability, as general POMDP synthesis is undecidable. To bridge this gap, we propose a synthesis framework that integrates sampling, automata learning, and model-checking. Inspired by Angluin's $L^*$ algorithm, our approach utilizes sampling as a membership oracle and model-checking as an equivalence oracle. This enables the synthesis of finite-state controllers with formal guarantees, provided the sampling-induced policy is regular. We establish a relative completeness result for this framework. Experimental results from our prototypical implementation demonstrate that this method successfully solves threshold-safety problems that remain challenging for existing formal synthesis tools. We believe our algorithm serves as a valuable component in a portfolio approach to tackling the inherent difficulty of POMDP synthesis problems.
综合POMDP策略:通过学习结合采样与模型检测 / Synthesizing POMDP Policies: Sampling Meets Model-checking via Learning
本文提出一种新框架,通过结合采样、自动机学习和模型检测技术,为部分可观测马尔可夫决策过程合成带有形式化正确性保证的有限状态控制器,在特定条件下比纯形式化方法更具可扩展性。
源自 arXiv: 2605.14440