Text-only adaptation in LLM-based ASR through text denoising

📄 Abstract - Text-only adaptation in LLM-based ASR through text denoising

Adapting automatic speech recognition (ASR) systems based on large language models (LLMs) to new domains using text-only data is a significant yet underexplored challenge. Standard fine-tuning of the LLM on target-domain text often disrupts the critical alignment between speech and text modalities learned by the projector, degrading performance. We introduce a novel text-only adaptation method that emulates the audio projection task by treating it as a text denoising task. Our approach thus trains the LLM to recover clean transcripts from noisy inputs. This process effectively adapts the model to a target domain while preserving cross-modal alignment. Our solution is lightweight, requiring no architectural changes or additional parameters. Extensive evaluation on two datasets demonstrates up to 22.1% relative improvement, outperforming recent state-of-the-art text-only adaptation methods.

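1️⃣ One-sentence summary

This paper proposes a novel text-only adaptation method that emulates the audio projection task as a text denoising task, training the large language model to recover clean transcripts from noisy text. Without modifying the model architecture or adding parameters, it adapts effectively to new domains while preserving the alignment between speech and text, significantly improving speech recognition performance.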
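
The denoising idea can be illustrated with a minimal sketch of how (noisy input, clean target) training pairs might be built from target-domain text. The abstract does not specify the noise process, so the character-level deletions, substitutions, and insertions used here are an assumption made purely for illustration, and the names `add_noise`, `make_denoising_pairs`, and `rate` are hypothetical rather than taken from the paper.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def add_noise(text: str, rate: float, rng: random.Random) -> str:
    """Corrupt text with random character deletions, substitutions, and insertions.

    Assumption: a simple character-level noise model; the paper's actual
    noise process is not described in the abstract.
    """
    out = []
    for ch in text:
        # Keep spaces, and keep each other character with probability 1 - rate.
        if ch == " " or rng.random() >= rate:
            out.append(ch)
            continue
        op = rng.choice(("delete", "substitute", "insert"))
        if op == "substitute":
            out.append(rng.choice(ALPHABET))
        elif op == "insert":
            out.append(ch)
            out.append(rng.choice(ALPHABET))
        # "delete": append nothing, so the character is dropped.
    return "".join(out)


def make_denoising_pairs(transcripts, rate=0.15, seed=0):
    """Return (noisy_input, clean_target) pairs for text-only fine-tuning."""
    rng = random.Random(seed)
    return [(add_noise(t, rate, rng), t) for t in transcripts]


if __name__ == "__main__":
    domain_text = ["the patient was given ten milligrams", "schedule a follow up visit"]
    for noisy, clean in make_denoising_pairs(domain_text):
        print(repr(noisy), "->", repr(clean))
```

In a setup like this, each noisy string would stand in for the projector's output and the clean transcript would be the training target, so the LLM sees target-domain text while still practising the recovery of a clean transcript from an imperfect input.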
