AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning
1️⃣ One-sentence summary
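This paper proposes AudioRouter, a reinforcement learning framework that lets large audio language models learn to decide when and how to use external audio tools to assist reasoning. This substantially improves their understanding of fine-grained acoustic detail with very little training data and avoids the massive data requirements of conventional training approaches.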
Large Audio Language Models (LALMs) have demonstrated strong capabilities in audio understanding and reasoning. However, their performance on fine-grained auditory perception remains unreliable, and existing approaches largely rely on data-intensive training to internalize perceptual abilities. We propose AudioRouter, a reinforcement learning framework that enables LALMs to improve audio understanding by learning when and how to use external audio tools. Rather than tightly coupling tool usage with audio reasoning, AudioRouter formulates tool use as an explicit decision-making problem and optimizes a lightweight routing policy while keeping the underlying reasoning model frozen. Experimental results show that AudioRouter achieves substantial improvements on standard audio understanding benchmarks while requiring up to 600x less training data to learn tool usage compared with conventional training paradigms. These findings suggest that learning effective tool usage offers a data-efficient and scalable alternative to internalizing perceptual abilities in LALMs.
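To make the idea of a lightweight routing policy trained against a frozen reasoning model more concrete, here is a minimal toy sketch. It is not the paper's implementation: the abstract does not specify the tools, the reward, or the RL algorithm, so the action names (`no_tool`, `pitch_tool`, `event_tool`), the question types, the stub `frozen_reasoner_reward`, and the choice of plain REINFORCE over a tabular softmax policy are all assumptions made purely for illustration.

```python
import math
import random

# Hypothetical routing actions; the paper's actual tool set is not given here.
ACTIONS = ["no_tool", "pitch_tool", "event_tool"]
QUESTION_TYPES = ["pitch", "event", "general"]  # hypothetical question categories


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frozen_reasoner_reward(question_type, action):
    """Stub for the frozen reasoning model (assumption).

    Returns 1.0 when the chosen route would lead to a correct answer,
    else 0.0. The stub itself is never updated during training.
    """
    best = {"pitch": "pitch_tool", "event": "event_tool", "general": "no_tool"}
    return 1.0 if best[question_type] == action else 0.0


def train_router(steps=3000, lr=0.5, seed=0):
    """Train a tabular softmax routing policy with REINFORCE."""
    rng = random.Random(seed)
    # One row of logits per question type: this is the only trainable state.
    logits = {t: [0.0] * len(ACTIONS) for t in QUESTION_TYPES}
    for _ in range(steps):
        t = rng.choice(QUESTION_TYPES)
        probs = softmax(logits[t])
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        reward = frozen_reasoner_reward(t, ACTIONS[a])
        # Gradient of log pi(a) w.r.t. the logits is onehot(a) - probs.
        for i in range(len(ACTIONS)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[t][i] += lr * reward * grad
    return logits


def route(logits, question_type):
    """Pick the most probable action for a question type."""
    probs = softmax(logits[question_type])
    return ACTIONS[probs.index(max(probs))]


if __name__ == "__main__":
    policy = train_router()
    for t in QUESTION_TYPES:
        print(t, "->", route(policy, t))
```

The point of the sketch is the separation of concerns: only the small table of routing logits is updated, while the reasoner is treated as a fixed black box that supplies a reward. A real system would presumably condition the policy on the audio and question rather than on a hand-assigned question type.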
Source: arXiv: 2602.10439