← 返回列表

arXiv 提交日期: 2026-04-15

📄 Abstract - Diffusion Language Models for Speech Recognition

Diffusion language models have recently emerged as a leading alternative to standard language models, due to their ability for bidirectional attention and parallel text generation. In this work, we explore variants for their use in speech recognition. Specifically, we introduce a comprehensive guide to incorporating masked diffusion language models (MDLM) and uniform-state diffusion models (USDMs) for rescoring ASR hypotheses. Additionally, we design a new joint-decoding method that combines CTC and USDM by integrating the framewise probability distributions derived from CTC with the labelwise probability distributions computed by USDM at each decoding step, thereby generating new candidates that combine strong language knowledge from USDM and acoustic information from CTC. Our findings reveal that USDM, as well as MDLM, can significantly improve the accuracy of recognized text. We publish all our code and recipes.

顶级标签: natural language processing audio model training

用于语音识别的扩散语言模型 / Diffusion Language Models for Speech Recognition

1️⃣ 一句话总结

这篇论文探索了如何将扩散语言模型应用于语音识别，通过引入新的重打分和联合解码方法，有效结合了模型的强大语言知识与声学信息，显著提升了语音识别的准确率。

👋 没兴趣 ☆ 感兴趣 📌 待读

打开原文 PDF

源自 arXiv: 2604.14001

← 返回列表

菜单

AI 帮我研读全文

1️⃣ 一句话总结

密码管理

设置密码

修改密码

移除密码

菜单

AI 帮我研读全文

1️⃣ 一句话总结

获取最新论文摘要