arXiv submission date: 2026-01-25
📄 Abstract - RIR-Mega-Speech: A Reverberant Speech Corpus with Comprehensive Acoustic Metadata and Reproducible Evaluation

Despite decades of research on reverberant speech, comparing methods remains difficult because most corpora lack per-file acoustic annotations or provide limited documentation for reproduction. We present RIR-Mega-Speech, a corpus of approximately 117.5 hours created by convolving LibriSpeech utterances with roughly 5,000 simulated room impulse responses from the RIR-Mega collection. Every file includes RT60, direct-to-reverberant ratio (DRR), and clarity index ($C_{50}$) computed from the source RIR using clearly defined, reproducible procedures. We also provide scripts to rebuild the dataset and reproduce all evaluation results. Using Whisper small on 1,500 paired utterances, we measure 5.20% WER (95% CI: 4.69--5.78) on clean speech and 7.70% (7.04--8.35) on reverberant versions, corresponding to a paired increase of 2.50 percentage points (2.06--2.98). This represents a 48% relative degradation. WER increases monotonically with RT60 and decreases with DRR, consistent with prior perceptual studies. While the core finding that reverberation harms recognition is well established, we aim to provide the community with a standardized resource where acoustic conditions are transparent and results can be verified independently. The repository includes one-command rebuild instructions for both Windows and Linux environments.
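The per-file annotations (RT60, DRR, $C_{50}$) are all derived from the source RIR. A minimal sketch of how such metrics are commonly computed — a short direct-sound window for DRR, a 50 ms early/late split for $C_{50}$, and Schroeder backward integration with a T30 fit for RT60 — is shown below; the window lengths here are conventional defaults, and the paper's exact procedure may differ:

```python
import numpy as np

def acoustic_metrics(h, fs, direct_ms=2.5, clarity_ms=50.0):
    """Estimate DRR and C50 (in dB) from a room impulse response h.

    direct_ms: half-width of the direct-sound window around the peak
    clarity_ms: early/late boundary for the clarity index
    """
    h = np.asarray(h, dtype=float)
    t0 = int(np.argmax(np.abs(h)))            # direct-path arrival sample
    d = int(round(direct_ms * 1e-3 * fs))     # direct window (samples)
    c = int(round(clarity_ms * 1e-3 * fs))    # clarity boundary (samples)
    e = h ** 2
    direct = e[max(t0 - d, 0):t0 + d].sum()   # direct-sound energy
    tail = e[t0 + d:].sum()                   # reverberant tail energy
    early = e[t0:t0 + c].sum()                # first 50 ms
    late = e[t0 + c:].sum()                   # everything after 50 ms
    drr = 10 * np.log10(direct / max(tail, 1e-12))
    c50 = 10 * np.log10(early / max(late, 1e-12))
    return drr, c50

def rt60_schroeder(h, fs):
    """RT60 via Schroeder backward integration: fit the energy decay
    curve between -5 and -35 dB, extrapolate to 60 dB (T30 method)."""
    e = np.asarray(h, dtype=float) ** 2
    edc = np.cumsum(e[::-1])[::-1]            # backward-integrated energy
    edc_db = 10 * np.log10(edc / edc[0] + 1e-12)
    idx = np.where((edc_db <= -5) & (edc_db >= -35))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # dB per second
    return -60.0 / slope
```

As a sanity check, an exponentially decaying white-noise RIR with a nominal RT60 of 0.3 s should yield an estimate close to 0.3 s, and its DRR should fall below its $C_{50}$ since the direct window is a small subset of the 50 ms early window.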

Top-level tags: audio data benchmark
Detailed tags: speech corpus, reverberation, acoustic metadata, speech recognition, reproducible evaluation

RIR-Mega-Speech: A Reverberant Speech Corpus with Comprehensive Acoustic Metadata and Reproducible Evaluation


1️⃣ One-sentence summary

This paper introduces RIR-Mega-Speech, a new reverberant speech dataset that addresses the unclear annotations and poor reproducibility of earlier work by providing precise acoustic parameters (such as reverberation time) and complete rebuild scripts for every audio file, aiming to give the speech processing community a transparent, verifiable standard evaluation resource.

Source: arXiv:2601.19949