SAM Audio Judge: A Unified Multimodal Framework for Perceptual Evaluation of Audio Separation

📄 Abstract - SAM Audio Judge: A Unified Multimodal Framework for Perceptual Evaluation of Audio Separation

Performance evaluation remains a complex challenge in audio separation: existing evaluation metrics are often misaligned with human perception, coarse-grained, and reliant on ground-truth signals. Subjective listening tests, on the other hand, remain the gold standard for real-world evaluation, but they are expensive, time-consuming, and difficult to scale. This paper addresses the growing need for automated systems capable of evaluating audio separation without human intervention. The proposed evaluation metric, SAM Audio Judge (SAJ), is a multimodal, fine-grained, reference-free objective metric that shows high alignment with human perception. SAJ supports three audio domains (speech, music, and general sound events) and three prompt inputs (text, visual, and span), covering four evaluation dimensions (recall, precision, faithfulness, and overall). SAM Audio Judge also shows potential for data filtering, pseudo-labeling of large datasets, and reranking in audio separation models. We release our code and pre-trained models at: this https URL.


1️⃣ One-Sentence Summary

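This paper proposes SAM Audio Judge (SAJ), a new automated evaluation system that, without human involvement or a reference to the original audio, judges the quality of audio separation in a fine-grained way along multiple dimensions, much as a human listener would. It addresses the high cost, low efficiency, and poor agreement with subjective listening that limit traditional methods.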
