Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation
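1️⃣ One-sentence summary
By analyzing a pretrained multi-task reinforcement learning model for underwater navigation, this study finds that the network uses only about 1.5% of its weights to distinguish between tasks (such as tracking different species), and that these key weights mainly connect the context variables in the input layer to the next layer. The finding offers useful clues for designing more efficient and interpretable control policies for underwater robots.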
Autonomous underwater vehicles must perform multiple tasks adaptively and explainably under dynamic, uncertain conditions and with limited sensing, challenges that classical controllers struggle to address. Reliable long-term monitoring therefore demands control policies that are robust, generalizable, and inherently interpretable. Reinforcement learning, particularly multi-task RL, addresses these limitations by leveraging shared representations to adapt efficiently across tasks and environments. However, while such policies show promising results in simulation and controlled experiments, they remain opaque and offer limited insight into the agent's internal decision-making, creating gaps in transparency, trust, and safety that hinder real-world deployment. In particular, the internal policy structure and the degree of task-specific specialization remain poorly understood. To address these gaps, we analyze the internal structure of a pretrained multi-task reinforcement learning network for underwater navigation in the HoloOcean simulator, identifying and comparing the task-specific subnetworks responsible for navigating toward different species. We find that in a contextual multi-task reinforcement learning setting with related tasks, the network uses only about 1.5% of its weights to differentiate between tasks. Of these, approximately 85% connect the context-variable nodes in the input layer to the next hidden layer, highlighting the importance of context variables in such settings. Our approach provides insights into shared and specialized network components, which are useful for efficient model editing, transfer learning, and continual learning in underwater monitoring with contextual multi-task reinforcement learning.
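The abstract does not say how the task-specific subnetworks are extracted, so the following is only a minimal sketch of how the two reported statistics (the share of weights that differ between tasks, and the share of those that touch context-variable inputs) could be computed. It assumes each task's subnetwork is available as one boolean mask per layer with shape (outputs, inputs). The function names, the toy layer sizes, and the context column indices are all hypothetical and are not taken from the paper.

```python
import numpy as np

def task_specific_fraction(masks_a, masks_b):
    """Return per-layer difference masks and the fraction of all weights
    that belong to exactly one of the two task subnetworks."""
    differing = [a ^ b for a, b in zip(masks_a, masks_b)]  # elementwise XOR
    total = sum(m.size for m in masks_a)
    n_diff = sum(int(d.sum()) for d in differing)
    return differing, n_diff / total

def context_share(differing, context_cols):
    """Fraction of the differing weights that sit in the first layer and
    read from a context-variable input column."""
    n_all = sum(int(d.sum()) for d in differing)
    n_ctx = int(differing[0][:, context_cols].sum())
    return n_ctx / n_all if n_all else 0.0

# Toy network (assumed sizes): 6 inputs (4 observations + 2 context
# variables), 4 hidden units, 2 outputs.
masks_a = [np.ones((4, 6), dtype=bool), np.ones((2, 4), dtype=bool)]
masks_b = [m.copy() for m in masks_a]
masks_b[0][0, 4] = False  # weight reading context input 4
masks_b[0][1, 5] = False  # weight reading context input 5
masks_b[0][2, 4] = False  # weight reading context input 4
masks_b[1][0, 0] = False  # weight in the second layer

differing, frac = task_specific_fraction(masks_a, masks_b)
share = context_share(differing, context_cols=[4, 5])
print(f"task-specific fraction: {frac:.3f}")
print(f"share on context inputs: {share:.2f}")
```

In this toy setup 4 of 32 weights differ between the two masks, and 3 of those 4 read from context inputs. The numbers only illustrate the bookkeeping and have no relation to the 1.5% and 85% figures reported in the abstract.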
Source: arXiv: 2604.21640