
arXiv submission date: 2026-05-04
📄 Abstract - Beyond Specialization: Robust Reinforcement Learning Navigation via Procedural Map Generators

Deep reinforcement learning (DRL) navigation policies often overfit to the structure of their training environments, as environmental diversity is typically constrained by the manual effort required to design diverse scenarios. While procedural map generation offers scalable diversity, no prior work systematically compares how different generator types affect policy generalization. We integrate four generators (sparse, maze, graph, and Wave Function Collapse) with guaranteed navigability into MuRoSim, a 2D simulator focusing on training efficiency for LiDAR-based navigation. We cross-evaluate five navigation policies on 1000 seeded maps per generator across three training seeds. Results show a strongly asymmetric cross-generator transfer: a specialist trained on sparse layouts falls to 3.3% success on mazes, whereas a policy trained on the combined generator set achieves 91.5 +/- 1.1% mean success. We further demonstrate that A* path-planner subgoal inputs are the dominant factor for robustness, raising success from the 90.2 +/- 1.4% feedforward baseline to 98.9 +/- 0.4% and outperforming GRU recurrence, which only improves the reactive baseline. The DRL policies outperform a classical Carrot+A* controller, which matches their success only at low speeds (1.0 m/s) but collapses to 24.9% at 2.0 m/s. This highlights learned speed adaptation as the decisive advantage of the learned approach. Real-world experiments on a RoboMaster confirm sim-to-real transfer in a cluttered arena, while a maze-like layout exposes remaining failure modes that recurrence helps mitigate.
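The abstract notes that all four generators come with guaranteed navigability. One way to enforce such a guarantee is to carve a maze with a randomized depth-first search (which yields a spanning tree, so every free cell is reachable) and then confirm start-to-goal reachability with a breadth-first search. The sketch below is an illustrative assumption, not the paper's MuRoSim code:

```python
from collections import deque
import random

def generate_maze(w, h, seed=0):
    """Randomized-DFS maze on a (2h+1)x(2w+1) occupancy grid: 1 = wall, 0 = free.
    Cell (x, y) of the w-by-h maze maps to grid position (2y+1, 2x+1)."""
    rng = random.Random(seed)
    grid = [[1] * (2 * w + 1) for _ in range(2 * h + 1)]
    grid[1][1] = 0
    stack, visited = [(0, 0)], {(0, 0)}
    while stack:
        x, y = stack[-1]
        nbrs = [(x + dx, y + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < w and 0 <= y + dy < h
                and (x + dx, y + dy) not in visited]
        if not nbrs:
            stack.pop()
            continue
        nx, ny = rng.choice(nbrs)
        grid[y + ny + 1][x + nx + 1] = 0  # knock down the wall between the cells
        grid[2 * ny + 1][2 * nx + 1] = 0  # open the neighbor cell itself
        visited.add((nx, ny))
        stack.append((nx, ny))
    return grid

def navigable(grid, start, goal):
    """BFS reachability check on (x, y) grid coordinates."""
    q, seen = deque([start]), {start}
    while q:
        x, y = q.popleft()
        if (x, y) == goal:
            return True
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= ny < len(grid) and 0 <= nx < len(grid[0])
                    and grid[ny][nx] == 0 and (nx, ny) not in seen):
                seen.add((nx, ny))
                q.append((nx, ny))
    return False
```

Because DFS carving produces a spanning tree over all cells, `navigable` always succeeds between any two free cells; in practice the check still matters for generators (e.g. sparse or WFC layouts) that do not carry a connectivity guarantee by construction.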

Top-level tags: reinforcement learning, robotics, agents
Detailed tags: navigation, procedural generation, transfer learning, sim-to-real, path planning

Beyond Specialization: Robust Reinforcement Learning Navigation via Procedural Map Generators


1️⃣ One-Sentence Summary

This paper trains reinforcement learning navigation policies on a combination of four stylistically distinct procedural map generators (sparse, maze, graph, and Wave Function Collapse). It finds that mixed-generator training substantially improves generalization, shows experimentally that high-level path-planner subgoal inputs and speed adaptation are the keys to policy robustness, and validates both sim-to-real feasibility and remaining failure modes on a real robot.
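The A* subgoal input identified as the dominant robustness factor can be illustrated with a small sketch: plan a path on the occupancy grid with A*, then feed the policy a waypoint a fixed number of steps ahead of the robot. The grid layout, `lookahead` parameter, and function names here are illustrative assumptions, not the paper's implementation:

```python
import heapq
import itertools

def astar(grid, start, goal):
    """4-connected A* on a grid of 0 = free, 1 = wall; cells are (x, y).
    Returns the path from start to goal, or None if unreachable."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    tie = itertools.count()  # tiebreaker so heap never compares cells/parents
    openq = [(h(start), next(tie), start, None)]
    gscore, parent = {start: 0}, {}
    while openq:
        _, _, cur, prev = heapq.heappop(openq)
        if cur in parent:          # already expanded (closed set)
            continue
        parent[cur] = prev
        if cur == goal:            # reconstruct path by walking parents back
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        x, y = cur
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]) and grid[ny][nx] == 0:
                ng = gscore[cur] + 1
                if (nx, ny) not in parent and ng < gscore.get((nx, ny), float("inf")):
                    gscore[(nx, ny)] = ng
                    heapq.heappush(openq, (ng + h((nx, ny)), next(tie), (nx, ny), cur))
    return None

def subgoal(path, robot_idx, lookahead=5):
    """Waypoint a fixed number of steps ahead of the robot, clamped to the goal.
    This (relative to the robot pose) is the kind of input the policy receives."""
    return path[min(robot_idx + lookahead, len(path) - 1)]
```

Feeding the policy only a nearby subgoal, rather than the full path, keeps the observation compact while still injecting global map knowledge the local LiDAR scan cannot provide.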

Source: arXiv: 2605.02528