Understanding and Improving Hyperbolic Deep Reinforcement Learning
1️⃣ One-Sentence Summary
This paper analyzes the root causes of training instability when hyperbolic-space models are used in reinforcement learning, and proposes a new method called Hyper++ that, through improvements to the value function, feature regularization, and network-layer design, achieves hyperbolic deep RL agents that are more stable, more efficient, and stronger in performance.
The performance of reinforcement learning (RL) agents depends critically on the quality of the underlying feature representations. Hyperbolic feature spaces are well-suited for this purpose, as they naturally capture hierarchical and relational structure often present in complex RL environments. However, leveraging these spaces commonly faces optimization challenges due to the nonstationarity of RL. In this work, we identify key factors that determine the success and failure of training hyperbolic deep RL agents. By analyzing the gradients of core operations in the Poincaré Ball and Hyperboloid models of hyperbolic geometry, we show that large-norm embeddings destabilize gradient-based training, leading to trust-region violations in proximal policy optimization (PPO). Based on these insights, we introduce Hyper++, a new hyperbolic PPO agent that consists of three components: (i) stable critic training through a categorical value loss instead of regression; (ii) feature regularization guaranteeing bounded norms while avoiding the curse of dimensionality from clipping; and (iii) using a more optimization-friendly formulation of hyperbolic network layers. In experiments on ProcGen, we show that Hyper++ guarantees stable learning, outperforms prior hyperbolic agents, and reduces wall-clock time by approximately 30%. On Atari-5 with Double DQN, Hyper++ strongly outperforms Euclidean and hyperbolic baselines. We release our code at this https URL.
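To make the norm-related instability concrete, here is a minimal sketch (not the authors' released code) of mapping Euclidean encoder features onto the Poincaré ball via the exponential map at the origin, together with a soft norm regularizer; `norm_penalty` and the bound `max_norm` are illustrative assumptions, not the exact Hyper++ formulation.

```python
# Minimal sketch: Euclidean features -> Poincaré ball embedding, with a soft
# norm penalty. The penalty is an illustrative assumption, not Hyper++'s exact
# regularizer.
import torch


def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Exponential map at the origin of the Poincaré ball with curvature c."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def norm_penalty(features: torch.Tensor, max_norm: float = 5.0) -> torch.Tensor:
    """Soft penalty discouraging large Euclidean feature norms before expmap0.

    The paper identifies large-norm embeddings as a source of unstable
    gradients; a soft penalty (rather than hard clipping) is one way to keep
    norms bounded without pushing all features onto a clipping surface.
    """
    norms = features.norm(dim=-1)
    return torch.relu(norms - max_norm).pow(2).mean()


# Usage: encoder features -> norm regularizer -> hyperbolic embedding
feats = torch.randn(32, 256, requires_grad=True)  # batch of encoder outputs
reg_loss = norm_penalty(feats, max_norm=5.0)      # added to the RL objective
hyp_emb = expmap0(feats, c=1.0)                   # points lie inside the unit ball
```

The soft penalty here is only one plausible way to bound norms; the abstract's point is that whatever mechanism is used must keep embedding norms controlled without the curse-of-dimensionality issues that hard clipping introduces.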
Source: arXiv: 2512.14202