
arXiv submission date: 2026-04-12

📄 Abstract - BlasBench: An Open Benchmark for Irish Speech Recognition

Existing multilingual benchmarks include Irish among dozens of languages but apply no Irish-aware text normalisation, which makes reliable, reproducible ASR comparison impossible. We introduce BlasBench, an open evaluation harness that provides a standalone Irish-aware normaliser preserving fadas, lenition, and eclipsis; a reproducible scoring harness; and per-utterance predictions released for all evaluated runs. We pilot it by benchmarking 12 systems across four architecture families on Common Voice ga-IE and FLEURS ga-IE. All Whisper variants exceed 100% WER through insertion-driven hallucination. Microsoft Azure reaches 22.2% WER on Common Voice and 57.5% on FLEURS; the best open model, Omnilingual ASR 7B, reaches 30.65% and 39.09% respectively. Models fine-tuned on Common Voice degrade 33-43 points moving to FLEURS, while massively multilingual models degrade only 7-10, a generalisation gap that single-dataset evaluation misses.
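To make two of the abstract's points concrete, here is a minimal Python sketch, not BlasBench's actual code. It assumes a simple word-level WER (edit distance divided by reference length) and uses two hypothetical normalisers, `normalise_keep_fadas` and `normalise_ascii_fold`, to show how stripping fadas can hide a real error: "féar" (grass) and "fear" (man) are different Irish words. It also shows how insertions alone can push WER above 100%. Lenition and eclipsis handling are not modelled here.

```python
import re
import unicodedata


def normalise_keep_fadas(text: str) -> str:
    """Hypothetical normaliser: lowercase, drop punctuation, keep fadas."""
    text = unicodedata.normalize("NFC", text).lower()
    # \w is Unicode-aware in Python 3, so á, é, í, ó, ú survive this step.
    text = re.sub(r"[^\w\s'-]", " ", text)
    return " ".join(text.split())


def normalise_ascii_fold(text: str) -> str:
    """Hypothetical naive normaliser that also strips diacritics (for contrast)."""
    decomposed = unicodedata.normalize("NFD", normalise_keep_fadas(text))
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "Tá an féar glas."
hypothesis = "ta an fear glas"

# Fada-preserving normalisation counts the two missing fadas as errors.
strict = wer(normalise_keep_fadas(reference), normalise_keep_fadas(hypothesis))
# ASCII folding erases the difference and reports a perfect score.
folded = wer(normalise_ascii_fold(reference), normalise_ascii_fold(hypothesis))
# Four inserted words against a two-word reference give a WER of 200%.
hallucinated = wer("dia duit", "dia duit dia duit dia duit")

print(strict, folded, hallucinated)
```

Because the denominator is the reference length, a hypothesis that keeps repeating or inventing words has no upper bound on its WER, which is consistent with the insertion-driven scores above 100% reported for the Whisper variants.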

Top tags: audio, benchmark, natural language processing

BlasBench: An Open Benchmark for Irish Speech Recognition

1️⃣ One-sentence summary

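This paper presents BlasBench, an open benchmarking tool built specifically for evaluating Irish speech recognition. By introducing Irish-specific text normalisation and a reproducible scoring framework, it exposes the performance differences among existing models on Irish and their cross-dataset generalisation problems.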


Source: arXiv: 2604.10736
