
arXiv submission date: 2026-06-02

📄 Abstract - BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language

We present BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan, with no prior publicly available ASR resources. The corpus contains 10,060 validated utterances in native Nastaliq script, derived from Mozilla Common Voice recordings. We fine-tune OpenAI Whisper-small on this corpus and report a Word Error Rate (WER) of 30.07% on a held-out validation set of 538 utterances, down from a measured zero-shot baseline of 182.18% for Whisper-small on Balti. The dataset, fine-tuned model, and a live transcription demo are publicly available on HuggingFace.
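
A zero-shot WER of 182.18% can look surprising, but WER is not capped at 100%. It counts substitutions (S), deletions (D) and insertions (I) against the N reference words, WER = (S + D + I) / N. A model that outputs more words than the reference, such as repeated or otherwise spurious output, can therefore score above 100%. Below is a minimal sketch, assuming this standard definition computed corpus-wide over whitespace-separated words. The abstract does not say which scoring tool or text normalization the authors used. The function names are hypothetical, and the Latin placeholder tokens stand in for Nastaliq-script Balti words.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from the empty ref prefix to hyp[:j]
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus WER = total edits / total reference words (whitespace tokens)."""
    edits = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        edits += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return edits / ref_words


if __name__ == "__main__":
    # One substitution in four reference words -> 0.25 (25% WER).
    print(corpus_wer([("a b c d", "a x c d")]))
    # Four hypothesis words, none matching a two-word reference:
    # 2 substitutions + 2 insertions = 4 edits -> 2.0 (200% WER).
    print(corpus_wer([("a b", "x y z w")]))
```

This sketch pools edits and reference words across all utterances instead of averaging per-utterance WERs, so very short utterances do not dominate the score. The abstract does not state whether the reported 30.07% on the 538 validation utterances was computed this way.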

Top-level tags: audio, natural language processing, machine learning

BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language

1️⃣ One-sentence summary

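This paper builds the first publicly available Balti speech corpus (16.8 hours) and, by fine-tuning Whisper-small on it, reduces the word error rate for Balti speech recognition from a zero-shot 182.18% to 30.07% on a held-out validation set, providing an open-source speech recognition resource (dataset, model, and demo) for this low-resource Tibetic language.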

Source: arXiv 2606.03504