Biased Dreams: Limitations to Epistemic Uncertainty Quantification in Latent Space Models
1️⃣ One-sentence summary
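This paper shows that reinforcement learning algorithms built on latent space models (such as the Dreamer family) drift toward regions they already know well when predicting future states, overlooking actual changes in the environment. As a result, their estimates of their own predictive uncertainty become unreliable and they overestimate the rewards they expect to obtain.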
Model-Based Reinforcement Learning distinguishes between physical dynamics models operating on proprioceptive inputs and latent dynamics models operating on high-dimensional image observations. A prominent latent approach is the Recurrent State Space Model used in the Dreamer family. While epistemic uncertainty quantification to inform exploration and mitigate model exploitation is well established for physical dynamics models, its transfer to latent dynamics models has received limited scrutiny. We empirically demonstrate that latent transitions are biased toward well-represented regions of latent space, exhibiting an attractor behavior that can deviate from true environment dynamics. As a result, discrepancies in environment dynamics may not manifest in latent space, undermining the reliability of epistemic uncertainty estimates. Because these attractors often lie in high-reward regions, latent rollouts systematically overestimate predicted rewards. Our findings highlight key limitations of epistemic uncertainty estimation in latent dynamics models and motivate more critical evaluation of this method.
Source: arXiv:2604.25416