arXiv submission date: 2026-02-26
📄 Abstract - Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing

Taiwanese Hakka is a low-resource, endangered language that poses significant challenges for automatic speech recognition (ASR), including high dialectal variability and the presence of two distinct writing systems (Hanzi and Pinyin). Traditional ASR models often encounter difficulties in this context, as they tend to conflate essential linguistic content with dialect-specific variations across both phonological and lexical dimensions. To address these challenges, we propose a unified framework grounded in the Recurrent Neural Network Transducers (RNN-T). Central to our approach is the introduction of dialect-aware modeling strategies designed to disentangle dialectal "style" from linguistic "content", which enhances the model's capacity to learn robust and generalized representations. Additionally, the framework employs parameter-efficient prediction networks to concurrently model ASR (Hanzi and Pinyin). We demonstrate that these tasks create a powerful synergy, wherein the cross-script objective serves as a mutual regularizer to improve the primary ASR tasks. Experiments conducted on the HAT corpus reveal that our model achieves 57.00% and 40.41% relative error rate reduction on Hanzi and Pinyin ASR, respectively. To our knowledge, this is the first systematic investigation into the impact of Hakka dialectal variations on ASR and the first single model capable of jointly addressing these tasks.
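The reductions quoted in the abstract are relative rather than absolute. As a minimal sketch, assuming the conventional definition (baseline error minus new error, divided by baseline error), the calculation looks like the following. The function name and the sample numbers are illustrative assumptions and are not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative error rate reduction as a percentage.

    Assumes the conventional definition:
        (baseline_error - new_error) / baseline_error * 100
    Both arguments are error rates on the same scale (e.g. CER in percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical error rates for illustration only (not results from the paper):
# a baseline of 10.0% improved to 7.5% is a 25% relative reduction,
# even though the absolute drop is only 2.5 points.
example = relative_error_reduction(baseline_error=10.0, new_error=7.5)
print(f"{example:.2f}% relative reduction")
```

Under this definition, a large relative reduction can correspond to a small absolute change when the baseline error is already low, so the two figures in the abstract are best read alongside the absolute error rates reported in the paper.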

Top-level tags: audio, natural language processing, model training
Detailed tags: speech recognition, low-resource language, dialect modeling, multitask learning, rnn transducer

Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing


1️⃣ One-sentence summary

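This study proposes a new dialect-aware speech recognition framework that effectively separates dialectal features from linguistic content in Taiwanese Hakka. It is also the first to handle both writing systems, Hanzi and Pinyin, with a single model, significantly improving recognition accuracy for this low-resource, endangered language.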

Source: arXiv: 2602.22522