Hybrid Belief Reinforcement Learning for Efficient Coordinated Spatial Exploration
1️⃣ One-sentence summary
This paper proposes a hybrid belief reinforcement learning framework that combines the structured learning of probabilistic models with the adaptive decision-making of reinforcement learning, enabling multiple agents (such as UAVs) to explore unknown spaces and provide services more efficiently and in better coordination, achieving higher task reward and faster training than traditional methods.
Coordinating multiple autonomous agents to explore and serve spatially heterogeneous demand requires jointly learning unknown spatial patterns and planning trajectories that maximize task performance. Pure model-based approaches provide structured uncertainty estimates but lack adaptive policy learning, while deep reinforcement learning often suffers from poor sample efficiency when spatial priors are absent. This paper presents a hybrid belief reinforcement learning (HBRL) framework to address this gap. In the first phase, agents construct spatial beliefs using a Log-Gaussian Cox Process (LGCP) and execute information-driven trajectories guided by a Pathwise Mutual Information (PathMI) planner with multi-step lookahead. In the second phase, trajectory control is transferred to a Soft Actor-Critic (SAC) agent, warm-started through dual-channel knowledge transfer: belief state initialization supplies spatial uncertainty, and replay buffer seeding provides demonstration trajectories generated during LGCP exploration. A variance-normalized overlap penalty enables coordinated coverage through a shared belief state, permitting cooperative sensing in high-uncertainty regions while discouraging redundant coverage in well-explored areas. The framework is evaluated on a multi-UAV wireless service provisioning task. Results show 10.8% higher cumulative reward and 38% faster convergence than baselines, with ablation studies confirming that dual-channel transfer outperforms either channel alone.
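The variance-normalized overlap penalty described above can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the grid-based coverage representation, and the specific normalization (scaling posterior variance to [0, 1] and down-weighting overlap where uncertainty is high) are assumptions chosen to match the stated behavior: redundant coverage is penalized in well-explored (low-variance) cells but tolerated in high-uncertainty cells.

```python
import numpy as np

def overlap_penalty(coverage_counts, belief_var, lam=1.0, eps=1e-6):
    """Hypothetical variance-normalized overlap penalty.

    coverage_counts: (H, W) array, number of agents covering each grid cell.
    belief_var:      (H, W) posterior variance of the LGCP belief per cell.
    lam:             penalty weight (assumed hyperparameter).

    Returns a non-positive scalar: overlap in low-variance (well-explored)
    cells is penalized strongly, overlap in high-variance cells weakly.
    """
    # Redundant coverage: any agent beyond the first in a cell counts as overlap.
    overlap = np.maximum(coverage_counts - 1, 0)
    # Normalize variance to [0, 1] so the weighting is scale-free.
    norm_var = belief_var / (belief_var.max() + eps)
    # Weight is large where uncertainty is low -> discourage redundancy there.
    weight = 1.0 - norm_var
    return -lam * float((overlap * weight).sum())
```

Under this sketch, two agents stacked on a fully explored cell incur nearly the full penalty, while the same overlap on a maximally uncertain cell incurs almost none, which matches the abstract's notion of permitting cooperative sensing only where it reduces uncertainty.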
Source: arXiv:2603.03595