arXiv submission date: 2026-04-15
📄 Abstract - Diffusion Language Models for Speech Recognition

Diffusion language models have recently emerged as a leading alternative to standard language models, owing to their bidirectional attention and parallel text generation. In this work, we explore variants for their use in speech recognition. Specifically, we provide a comprehensive guide to incorporating masked diffusion language models (MDLMs) and uniform-state diffusion models (USDMs) for rescoring ASR hypotheses. Additionally, we design a new joint-decoding method that combines CTC and USDM: at each decoding step, it integrates the framewise probability distributions derived from CTC with the labelwise probability distributions computed by USDM, generating new candidates that combine strong language knowledge from USDM with acoustic information from CTC. Our findings reveal that both USDM and MDLM can significantly improve the accuracy of recognized text. We publish all our code and recipes.
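
To make the rescoring idea concrete, here is a minimal sketch of N-best rescoring using a log-linear combination of an acoustic score and a language-model score. It is not the paper's recipe: the `Hypothesis` class, the `rescore` function, the `lm_weight` and `length_bonus` parameters, and all numeric scores are hypothetical and chosen only for illustration. In practice the language-model score would come from a diffusion language model such as MDLM or USDM, however that score is computed.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    am_logp: float  # acoustic log-score, e.g. from a CTC model (made-up values below)
    lm_logp: float  # language-model log-score, e.g. from a diffusion LM (made-up values below)


def rescore(nbest, lm_weight=0.5, length_bonus=0.0):
    """Return the hypotheses sorted by combined score, best first.

    Combined score = am_logp + lm_weight * lm_logp + length_bonus * word_count.
    The weights are assumptions; they would normally be tuned on a dev set.
    """
    def combined(h):
        n_words = len(h.text.split())
        return h.am_logp + lm_weight * h.lm_logp + length_bonus * n_words

    return sorted(nbest, key=combined, reverse=True)


# Toy N-best list: the acoustically preferred hypothesis is linguistically less plausible.
nbest = [
    Hypothesis("i scream for ice cream", am_logp=-4.0, lm_logp=-9.0),
    Hypothesis("ice cream for ice cream", am_logp=-3.8, lm_logp=-14.0),
]

best = rescore(nbest, lm_weight=0.5)[0]
print(best.text)
```

With `lm_weight=0.0` the ranking falls back to the acoustic scores alone, which is a convenient sanity check when tuning the weight.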

Top-level tags: natural language processing, audio, model training
Detailed tags: speech recognition, diffusion language models, ASR rescoring, CTC decoding, joint decoding

Diffusion Language Models for Speech Recognition


1️⃣ One-sentence summary

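This paper explores how to apply diffusion language models to speech recognition: by introducing new rescoring and joint-decoding methods, it combines the models' strong language knowledge with acoustic information and significantly improves recognition accuracy.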

Source: arXiv: 2604.14001