arXiv submission date: 2026-01-19
📄 Abstract - Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition

Large encoder-decoder models like Whisper achieve strong offline transcription but remain impractical for streaming applications due to high latency. However, due to the accessibility of pre-trained checkpoints, the open Thai ASR landscape remains dominated by these offline architectures, leaving a critical gap in efficient streaming solutions. We present Typhoon ASR Real-time, a 115M-parameter FastConformer-Transducer model for low-latency Thai speech recognition. We demonstrate that rigorous text normalization can match the impact of model scaling: our compact model achieves a 45x reduction in computational cost compared to Whisper Large-v3 while delivering comparable accuracy. Our normalization pipeline resolves systemic ambiguities in Thai transcription, including context-dependent number verbalization and repetition markers (mai yamok), creating consistent training targets. We further introduce a two-stage curriculum learning approach for Isan (north-eastern) dialect adaptation that preserves Central Thai performance. To address reproducibility challenges in Thai ASR, we release the Typhoon ASR Benchmark, a gold-standard human-labeled dataset with transcriptions following established Thai linguistic conventions, providing standardized evaluation protocols for the research community.
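To illustrate the kind of rule a normalization pipeline like this might contain, here is a minimal Python sketch that expands the mai yamok repetition marker (ๆ, U+0E46) into an explicit repeat of the preceding word, so that a written form and its spoken form map to one training target. It is not the paper's implementation. The function name `expand_mai_yamok`, the assumption that the input is already word-tokenized, and the choice to expand rather than keep the marker are all hypothetical.

```python
MAI_YAMOK = "\u0e46"  # Thai repetition marker "ๆ"


def expand_mai_yamok(tokens):
    """Replace each mai yamok with a repeat of the preceding word.

    Assumes `tokens` is a list of already word-segmented Thai strings
    (hypothetical input format; Thai text has no spaces between words,
    so a real pipeline would need a tokenizer first).
    """
    out = []
    for tok in tokens:
        tok = tok.strip()
        if not tok:
            continue
        # Split trailing markers off a token such as "เด็กๆ".
        base = tok.rstrip(MAI_YAMOK)
        repeats = len(tok) - len(base)
        if base:
            out.append(base)
        # Each marker repeats the most recent word, if there is one.
        for _ in range(repeats):
            if out:
                out.append(out[-1])
    return out


if __name__ == "__main__":
    print(expand_mai_yamok(["เด็ก", "ๆ", "เล่น"]))
```

A marker with no preceding word is dropped in this sketch. A production pipeline would more likely flag it for review, and it would also have to decide how much of the preceding phrase the marker repeats, which this sketch does not attempt.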

Top-level tags: audio, natural language processing, model training
Detailed tags: speech recognition, low-latency, Thai language, text normalization, benchmark dataset

Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition


1️⃣ One-sentence summary

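This paper presents Typhoon ASR Real-time, a lightweight real-time Thai speech recognition model that uses a novel text normalization pipeline and curriculum learning to substantially reduce computational cost and latency while maintaining high accuracy, and it releases a standardized Thai ASR evaluation dataset to advance research in the field.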

Source: arXiv: 2601.13044