Stability and Sensitivity Analysis of Relative Temporal-Difference Learning: Extended Version
1️⃣ One-sentence summary
This paper shows that, under linear function approximation, a careful choice of baseline distribution makes the relative temporal-difference learning algorithm stable for any discount factor, and that the bias and covariance of its estimates remain bounded even as the discount factor approaches 1.
Relative temporal-difference (TD) learning was introduced to mitigate the slow convergence of TD methods when the discount factor approaches one by subtracting a baseline from the temporal-difference update. While this idea has been studied in the tabular setting, stability guarantees with function approximation remain poorly understood. This paper analyzes relative TD learning with linear function approximation. We establish stability conditions for the algorithm and show that the choice of baseline distribution plays a central role. In particular, when the baseline is chosen as the empirical distribution of the state-action process, the algorithm is stable for any non-negative baseline weight and any discount factor. We also provide a sensitivity analysis of the resulting parameter estimates, characterizing both asymptotic bias and covariance. The asymptotic covariance and asymptotic bias are shown to remain uniformly bounded as the discount factor approaches one.
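The core idea — an ordinary TD update with a baseline term subtracted, where the baseline is the value averaged under the empirical distribution of visited states — can be sketched as follows. This is a minimal illustrative sketch on a toy Markov chain, not the paper's exact algorithm: the names `kappa` (baseline weight) and `feat_avg` (a running empirical feature average standing in for the baseline distribution) are assumptions, and the precise form of the update is reconstructed from the general relative-TD idea rather than from the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Markov reward process: n states under a fixed policy,
# linear value approximation V(s) = theta @ phi(s) (here one-hot features).
n, d = 5, 5
P = rng.dirichlet(np.ones(n), size=n)   # row-stochastic transition matrix
r = rng.standard_normal(n)              # per-state rewards
phi = np.eye(n)                         # feature map (tabular / one-hot)

gamma = 0.99   # discount factor, close to 1
kappa = 1.0    # baseline weight (hypothetical name; any kappa >= 0 in the paper's result)
alpha0 = 0.5   # step-size scale

theta = np.zeros(d)
feat_avg = np.zeros(d)  # empirical average of features, i.e. the baseline distribution

s = 0
for t in range(1, 50_000):
    s_next = rng.choice(n, p=P[s])
    # Running empirical average of visited features.
    feat_avg += (phi[s] - feat_avg) / t
    # Relative TD error: standard TD error minus kappa times the baseline value.
    delta = (r[s] + gamma * theta @ phi[s_next]
             - theta @ phi[s]
             - kappa * theta @ feat_avg)
    theta += (alpha0 / t) * delta * phi[s]
    s = s_next
```

The point of the baseline term is the stability claim above: subtracting `kappa * theta @ feat_avg` keeps the iteration well-conditioned even when `gamma` is close to 1, at the cost of estimating the value function only up to the offset induced by the baseline.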
Source: arXiv:2603.27874