Quantifying Potential Observation Missingness in Inverse Reinforcement Learning
1️⃣ One-Sentence Summary
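This paper proposes a new method for detecting and quantifying observations that may be missing from behavioral data. The method helps inverse reinforcement learning models recover decision-makers' true intentions more accurately in real-world settings such as healthcare, and it avoids misleading conclusions drawn from incomplete data.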
Inverse reinforcement learning (IRL), which infers reward functions from demonstrations, is a valuable tool for modeling and understanding decision-making behavior. Many variants of IRL have been developed to capture the complexities of human decision-making, such as subjective beliefs, imperfect planning, and dynamic goals. However, real-world behavioral datasets have an often-overlooked issue: the recorded data may be missing observations that were available to the original decision-maker. In use-inspired settings such as healthcare, this can make expert actions appear suboptimal, even when they were near-optimal given the information available at the time. As a result, the rewards learned by standard IRL may be misleading. In this paper, we identify the minimal perturbations to the recorded observations needed for the expert's actions to appear optimal. We develop a practical algorithm for this problem and demonstrate its utility for quantifying the possible extent of missing observations in behavioral datasets through extensive experiments on synthetic navigation tasks, a cancer treatment simulator, and ICU treatment data.
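The core question can be illustrated with a toy case: how far must a recorded observation move before the expert's action looks optimal? The following is a minimal sketch built on simplifying assumptions that do not come from the paper:

- The expert chooses between only two actions.
- Each action's value is a known linear function of the observation, Q(x, a) = w_a · x + b_a.
- Perturbations are measured in Euclidean (L2) norm.

Under these assumptions, the smallest such change is a closed-form projection onto a half-space. The function names `min_perturbation` and `summarize_missingness` and the toy weights are hypothetical illustrations. This is not the authors' algorithm.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def min_perturbation(
    x: Sequence[float],
    w_expert: Sequence[float], b_expert: float,
    w_alt: Sequence[float], b_alt: float,
) -> Optional[Vector]:
    """Smallest L2 change delta so that Q(x + delta, expert) >= Q(x + delta, alt).

    Hypothetical model (an assumption, not from the paper):
    Q(x, a) = w_a . x + b_a with known weights for two actions.
    Returns None when no change to x can alter the ranking of the actions.
    """
    # g is the direction in observation space that raises the expert's advantage.
    g = [we - wa for we, wa in zip(w_expert, w_alt)]
    margin = sum(gi * xi for gi, xi in zip(g, x)) + (b_expert - b_alt)
    if margin >= 0:
        # The recorded observation already makes the expert action look optimal.
        return [0.0] * len(x)
    g_sq = sum(gi * gi for gi in g)
    if g_sq == 0:
        # The advantage does not depend on x, so no observation change helps.
        return None
    # Euclidean projection of x onto the half-space {z : g . z + margin_offset >= 0}.
    scale = -margin / g_sq
    return [scale * gi for gi in g]


def summarize_missingness(steps) -> Tuple[float, float]:
    """Fraction of steps that need a perturbation, and the mean perturbation norm.

    Each step is a tuple (x, w_expert, b_expert, w_alt, b_alt); infeasible
    steps contribute an infinite norm.
    """
    norms = []
    for x, w_e, b_e, w_a, b_a in steps:
        delta = min_perturbation(x, w_e, b_e, w_a, b_a)
        if delta is None:
            norms.append(math.inf)
        else:
            norms.append(math.sqrt(sum(d * d for d in delta)))
    flagged = sum(1 for n in norms if n > 0)
    return flagged / len(norms), sum(norms) / len(norms)


if __name__ == "__main__":
    # Toy, hypothetical weights for a "treat" (expert) vs. "wait" (alternative) choice.
    w_treat, w_wait = [1.0, 0.0], [0.0, 1.0]
    steps = [
        ([0.0, 1.0], w_treat, 0.0, w_wait, 0.0),  # expert action looks suboptimal
        ([2.0, 1.0], w_treat, 0.0, w_wait, 0.0),  # expert action already optimal
    ]
    print(min_perturbation(*steps[0]))  # [0.5, -0.5]
    print(summarize_missingness(steps))
```

A large perturbation at a step suggests that the recorded data may lack information the decision-maker had at that point. Averaging these perturbations over a dataset gives one rough measure of how much could be missing.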
Source: arXiv: 2605.12831