arXiv submission date: 2026-04-21
📄 Abstract - Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India

Existing Indic ASR benchmarks often use scripted, clean speech and leaderboard-driven evaluation that encourages dataset-specific overfitting. In addition, strict single-reference WER penalizes natural spelling variation in Indian languages, including non-standardized spellings of code-mixed English-origin words. To address these limitations, we introduce Voice of India, a closed-source benchmark built from unscripted telephonic conversations covering 15 major Indian languages across 139 regional clusters. The dataset contains 306,230 utterances, totaling 536 hours of speech from 36,691 speakers, with transcripts that account for spelling variations. We also analyze performance geographically at the district level, revealing disparities. Finally, we provide detailed analysis across factors such as audio quality, speaking rate, gender, and device type, highlighting where current ASR systems struggle and offering insights for improving real-world Indic ASR systems.
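To make the point about single-reference WER concrete, here is a minimal Python sketch. It assumes each reference word is stored as a set of accepted spellings; this is a hypothetical format, since the abstract does not say how Voice of India encodes or scores its spelling variants. The function runs the usual Levenshtein alignment over words but counts a hypothesis word as correct if it matches any accepted spelling at that position.

```python
from typing import Sequence, Set


def wer_with_variants(reference: Sequence[Set[str]], hypothesis: Sequence[str]) -> float:
    """Word error rate where each reference position accepts several spellings.

    Hypothetical input format (an assumption, not the benchmark's documented one):
    reference  -- one set of accepted spellings per reference word
    hypothesis -- the ASR output, already split into words
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits aligning the first i reference slots
    # with the first j hypothesis words
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # all deletions
    for j in range(m + 1):
        dp[0][j] = j  # all insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Free match if the hypothesis word is any accepted spelling
            sub_cost = 0 if hypothesis[j - 1] in reference[i - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
                dp[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dp[n][m] / n


hyp = ["मेरा", "फोन", "खराब", "है"]
single_ref = [{"मेरा"}, {"फ़ोन"}, {"खराब"}, {"है"}]
multi_ref = [{"मेरा"}, {"फ़ोन", "फोन"}, {"खराब"}, {"है"}]
print(wer_with_variants(single_ref, hyp))  # 0.25
print(wer_with_variants(multi_ref, hyp))   # 0.0
```

In the usage lines, the English-origin word "phone" is written with and without a nukta. A single reference charges that as one substitution (WER 0.25), while the variant-aware reference scores the same hypothesis as error-free.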

Top-level tags: machine learning, benchmark, audio
Detailed tags: automatic speech recognition, Indic languages, evaluation, real-world, spelling variation

Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India


1️⃣ One-sentence summary

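This paper builds a large-scale speech recognition benchmark covering 15 Indian languages, drawn from real telephone conversations, and reveals where existing models fall short on spelling variants, regional differences, and audio quality, providing a key reference for improving real-world speech recognition systems for India.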
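Breakdowns by district, device type, or audio quality are typically reported as corpus-level WER per group. The sketch below is an assumption about how such a breakdown could be computed, not the paper's stated procedure: it pools word errors and reference word counts within each group before dividing, so longer utterances carry more weight than in a plain average of per-utterance WERs. The group labels in the usage lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pooled_wer_by_group(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Corpus-level WER per group (e.g. district, device type, gender).

    Each record is (group, word_errors, reference_word_count) for one utterance.
    Errors and reference words are summed per group, then divided once.
    """
    errors: Dict[str, int] = defaultdict(int)
    words: Dict[str, int] = defaultdict(int)
    for group, err, n_ref in records:
        errors[group] += err
        words[group] += n_ref
    # Skip groups with no reference words to avoid division by zero
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Hypothetical district labels and counts, for illustration only
records = [("district_A", 1, 4), ("district_A", 0, 16), ("district_B", 3, 10)]
print(pooled_wer_by_group(records))  # {'district_A': 0.05, 'district_B': 0.3}
```

Here district_A pools to 1/20 = 0.05, whereas averaging its two per-utterance WERs (0.25 and 0.0) would give 0.125.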

Source: arXiv 2604.19151