arXiv submission date: 2026-02-18
📄 Abstract - BAT: Better Audio Transformer Guided by Convex Gated Probing

Probing is widely adopted in computer vision to faithfully evaluate self-supervised learning (SSL) embeddings, as fine-tuning may misrepresent their inherent quality. In contrast, audio SSL models still rely on fine-tuning because simple probing fails to unlock their full potential and alters their rankings when competing for SOTA on AudioSet. Hence, a robust and efficient probing mechanism is required to guide the trajectory of audio SSL towards reliable and reproducible methods. We introduce Convex Gated Probing (CGP), a prototype-based method that drastically closes the gap between fine-tuning and probing in audio. CGP efficiently utilizes all frozen layers via a gating mechanism and exposes the location of latent task-relevant information. Guided by CGP, we rework the entire SSL pipeline of current SOTA audio models that use legacy implementations of prior SSL methods. By refining data preprocessing, model architecture, and pre-training recipe, we introduce Better Audio Transformer (BAT), and establish new SOTA on audio benchmarks.
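The abstract does not give CGP's formulation, so the following is only a minimal sketch of what a convex gate over frozen layers combined with prototype scoring could look like. It assumes one pooled embedding per layer, a softmax gate (non-negative weights that sum to 1, hence a convex combination), and cosine similarity to class prototypes. The names `convex_gate` and `prototype_logits`, the shapes, and the random inputs are all hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def convex_gate(layer_feats, gate_logits):
    """Mix per-layer embeddings with convex weights.

    layer_feats: (L, D) array, one pooled embedding per frozen layer (assumed).
    gate_logits: (L,) array of learnable gate parameters (assumed).
    Returns the mixed (D,) embedding and the (L,) gate weights.
    """
    weights = softmax(gate_logits)      # non-negative, sums to 1
    return weights @ layer_feats, weights

def prototype_logits(x, prototypes):
    """Cosine similarity between an embedding (D,) and class prototypes (C, D)."""
    x = x / (np.linalg.norm(x) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    return p @ x

# Toy inputs standing in for frozen-backbone outputs (hypothetical sizes).
rng = np.random.default_rng(0)
layer_feats = rng.normal(size=(12, 768))   # 12 layers, 768-dim embeddings
gate_logits = np.zeros(12)                 # uniform gate before any training
prototypes = rng.normal(size=(5, 768))     # 5 classes

mixed, weights = convex_gate(layer_feats, gate_logits)
scores = prototype_logits(mixed, prototypes)
```

In a sketch like this, inspecting `weights` after training the gate would indicate which layers contribute most, which is one way to read the abstract's remark that probing "exposes the location of latent task-relevant information".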

Top-level tags: audio model training model evaluation
Detailed tags: self-supervised learning probing methods audio representation transformer audio classification

BAT: Better Audio Transformer Guided by Convex Gated Probing


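1️⃣ One-sentence summary

This paper proposes Convex Gated Probing, a new method that evaluates the true capability of audio self-supervised learning models more accurately, and uses it to guide improvements to data processing, model architecture, and the training recipe, yielding a better-performing audio Transformer that sets new best results on multiple audio benchmarks.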

Source: arXiv: 2602.16305