Interval POMDP Shielding for Imperfect-Perception Agents
1️⃣ One-sentence summary
This paper proposes a runtime safety shielding method based on Interval Partially Observable Markov Decision Processes (Interval POMDPs). It estimates confidence intervals for perception uncertainty from finite labeled data, giving the perception-driven system a safety decision barrier with probabilistic guarantees; experiments show the method effectively reduces the probability of unsafe actions.
Autonomous systems that rely on learned perception can make unsafe decisions when sensor readings are misclassified. We study shielding for this setting: a shield intercepts each proposed action and blocks those that could violate safety. We consider the common case where system dynamics are known but perception uncertainty must be estimated from finite labeled data. From these data we build confidence intervals for the probabilities of perception outcomes and use them to model the system as a finite Interval Partially Observable Markov Decision Process with discrete states and actions. We then propose an algorithm to compute a conservative set of beliefs over the underlying state that is consistent with the observations seen so far. This enables us to construct a runtime shield that comes with a finite-horizon guarantee: with high probability over the training data, if the true perception uncertainty rates lie within the learned intervals, then every action admitted by the shield satisfies a stated lower bound on safety. Experiments on four case studies show that our shielding approach (and variants derived from it) improves the safety of the system over state-of-the-art baselines.
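The pipeline the abstract outlines — learn confidence intervals for perception-outcome probabilities from labeled samples, then admit an action only if safety holds under every belief consistent with those intervals — can be sketched on a toy two-state example. Everything below (the Hoeffding-style intervals, the sample counts, the thresholds, and the function names) is an illustrative assumption, not the paper's actual algorithm:

```python
import math

def hoeffding_interval(k, n, delta):
    """Two-sided Hoeffding interval for a Bernoulli rate: with probability
    >= 1 - delta over the n labeled samples, the true rate lies in [lo, hi]."""
    p_hat = k / n
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)

def worst_case_hazard_posterior(prior_hazard, like_clear_given_hazard,
                                like_clear_given_safe):
    """Upper bound on P(hazard | obs = 'clear') over all likelihoods inside
    the learned intervals. The Bayes posterior is monotone increasing in the
    hazard likelihood and decreasing in the safe likelihood, so the extreme
    point uses the hazard interval's upper end and the safe interval's lower
    end -- a conservative belief in the spirit of the paper's belief sets."""
    lh_hi = like_clear_given_hazard[1]
    ls_lo = like_clear_given_safe[0]
    num = lh_hi * prior_hazard
    den = num + ls_lo * (1.0 - prior_hazard)
    return 1.0 if den == 0.0 else num / den

# Learn intervals from labeled perception data (counts are made up).
iv_hazard = hoeffding_interval(k=12, n=200, delta=0.05)   # P('clear' | hazard)
iv_safe   = hoeffding_interval(k=190, n=200, delta=0.05)  # P('clear' | safe)

# Shield check: after observing 'clear', admit action 'go' only if even the
# worst-case belief keeps the hazard probability below the stated bound.
hazard_ub = worst_case_hazard_posterior(0.1, iv_hazard, iv_safe)
admit_go = hazard_ub <= 0.05
```

The key design point this sketch mirrors is conservatism: rather than trusting a single estimated misclassification rate, the shield quantifies over the whole interval, so the safety bound survives any true rates inside it.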
Source: arXiv:2604.20728