
arXiv submission date: 2026-01-20

📄 Abstract - PRiSM: Benchmarking Phone Realization in Speech Models

Phone recognition (PR) serves as the atomic interface for language-agnostic modeling for cross-lingual speech processing and phonetic analysis. Despite prolonged efforts in developing PR systems, current evaluations only measure surface-level transcription accuracy. We introduce PRiSM, the first open-source benchmark designed to expose blind spots in phonetic perception through intrinsic and extrinsic evaluation of PR systems. PRiSM standardizes transcription-based evaluation and assesses downstream utility in clinical, educational, and multilingual settings with transcription and representation probes. We find that diverse language exposure during training is key to PR performance, encoder-CTC models are the most stable, and specialized PR models still outperform Large Audio Language Models. PRiSM releases code, recipes, and datasets to move the field toward multilingual speech models with robust phonetic ability: this https URL.

Top tags: audio benchmark model evaluation

PRiSM: Benchmarking Phone Realization in Speech Models

1️⃣ One-sentence summary

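This paper introduces PRiSM, the first open-source benchmark to evaluate phone recognition systems comprehensively through both intrinsic and extrinsic evaluation. It finds that exposure to diverse languages during training is key to performance, and that specialized PR models still outperform Large Audio Language Models in phonetic perception.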


Source: arXiv: 2601.14046
