📄 Abstract - Thinking While Listening: Simple Test Time Scaling For Audio Classification

We propose a framework that enables neural models to "think while listening" to everyday sounds, thereby enhancing audio classification performance. Motivated by recent advances in the reasoning capabilities of large language models, we address two central questions: (i) how can thinking be incorporated into existing audio classification pipelines to enable reasoning in the category space and improve performance, and (ii) can a new architecture be designed from the ground up to support both thinking and test-time scaling? We demonstrate that in both settings, our models exhibit improved classification accuracy. Leveraging test-time scaling, we observe consistent gains as the number of sampled traces increases. Furthermore, we evaluate two open-source reasoning models, GPT-OSS-20B and Qwen3-14B, showing that while such models are capable of zero-shot reasoning, a lightweight approach--retraining only the embedding matrix of a frozen, smaller model like GPT-2--can surpass the performance of billion-parameter text-based reasoning models.
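The abstract does not say how the sampled traces are combined, so the following is only a minimal sketch of one common way to do test-time scaling: sample several reasoning traces per clip and take a majority vote over their final labels. The function names (`sample_trace`, `majority_vote`, `classify_with_scaling`) and the toy class distribution are hypothetical stand-ins, not the authors' implementation. In a real system `sample_trace` would run the model's stochastic decoding on an audio clip.

```python
import random
from collections import Counter


def sample_trace(class_probs, rng):
    """Hypothetical stand-in for one sampled reasoning trace.

    A real model would decode a trace from the audio and end in a label;
    here we simply draw a label from a fixed toy distribution.
    """
    labels = list(class_probs)
    weights = [class_probs[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


def majority_vote(predictions):
    """Return the most frequent label (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]


def classify_with_scaling(class_probs, num_traces, seed=0):
    """Sample `num_traces` traces and aggregate them by majority vote.

    Majority voting is an assumption made for illustration; the paper
    may aggregate traces differently.
    """
    rng = random.Random(seed)
    votes = [sample_trace(class_probs, rng) for _ in range(num_traces)]
    return majority_vote(votes)


if __name__ == "__main__":
    toy_probs = {"dog_bark": 0.5, "siren": 0.3, "rain": 0.2}
    for n in (1, 5, 25):
        print(n, classify_with_scaling(toy_probs, num_traces=n))
```

Under this assumption, increasing `num_traces` makes the vote more likely to settle on the label the model favors most, which is one plausible mechanism behind the gains the abstract reports as the number of sampled traces grows.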

Top-level tags: audio, model training, model evaluation
Detailed tags: audio classification, test-time scaling, reasoning models, neural networks, embedding retraining

📄 Paper Summary

Thinking While Listening: Simple Test Time Scaling For Audio Classification


1️⃣ One-Sentence Summary

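This paper proposes a method that lets neural networks "think while listening" when recognizing everyday sounds. By combining test-time scaling with a reasoning mechanism, it improves audio classification accuracy, and a lightweight model even surpasses the zero-shot reasoning performance of large language models.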

