Prototype-Based Disentanglement for Controllable Dysarthric Speech Synthesis

📄 Abstract - Prototype-Based Disentanglement for Controllable Dysarthric Speech Synthesis

Dysarthric speech exhibits high variability and limited labeled data, posing major challenges for both automatic speech recognition (ASR) and assistive speech technologies. Existing approaches rely on synthetic data augmentation or speech reconstruction, yet often entangle speaker identity with pathological articulation, limiting controllability and robustness. In this paper, we propose ProtoDisent-TTS, a prototype-based disentanglement TTS framework built on a pre-trained text-to-speech backbone that factorizes speaker timbre and dysarthric articulation within a unified latent space. A pathology prototype codebook provides interpretable and controllable representations of healthy and dysarthric speech patterns, while a dual-classifier objective with a gradient reversal layer enforces invariance of speaker embeddings to pathological attributes. Experiments on the TORGO dataset demonstrate that this design enables bidirectional transformation between healthy and dysarthric speech, leading to consistent ASR performance gains and robust, speaker-aware speech reconstruction.
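The abstract does not spell out how the gradient reversal layer is wired in, so the following is only a minimal, dependency-free sketch of the general technique, not the authors' implementation. It assumes a binary (healthy vs. dysarthric) logistic pathology classifier applied to a speaker embedding. The function names (`grl_forward`, `grl_backward`, `pathology_loss_grads`), the scaling factor `lam`, and the toy numbers are all hypothetical. A gradient reversal layer is the identity in the forward pass and multiplies the gradient by `-lam` in the backward pass, so the classifier is trained normally while the speaker encoder is pushed to discard pathology cues.

```python
import math


def grl_forward(x):
    """Forward pass of a gradient reversal layer: the identity."""
    return list(x)


def grl_backward(grad_output, lam=1.0):
    """Backward pass: flip the sign of the gradient and scale it by lam."""
    return [-lam * g for g in grad_output]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pathology_loss_grads(embedding, weights, label, lam=1.0):
    """Cross-entropy of a logistic pathology classifier fed through a GRL.

    Returns the loss, the gradient for the classifier weights (unchanged),
    and the gradient reaching the speaker embedding (reversed).
    Everything here is an illustrative assumption, not the paper's model.
    """
    h = grl_forward(embedding)
    logit = sum(w * x for w, x in zip(weights, h))
    p = sigmoid(logit)
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))

    d_logit = p - label                        # dLoss/dLogit for sigmoid + cross-entropy
    grad_weights = [d_logit * x for x in h]    # classifier learns to detect pathology
    grad_h = [d_logit * w for w in weights]    # gradient w.r.t. the GRL output
    grad_embedding = grl_backward(grad_h, lam) # encoder receives the reversed signal
    return loss, grad_weights, grad_embedding


# Toy example: a 2-dimensional speaker embedding labeled as dysarthric (label = 1).
loss, grad_weights, grad_embedding = pathology_loss_grads(
    embedding=[1.0, -2.0], weights=[0.5, 0.25], label=1, lam=1.0
)
print(loss, grad_weights, grad_embedding)
```

In this sketch, a gradient-descent step on `grad_weights` lowers the pathology classifier's loss, while the same step on `grad_embedding` raises it. That opposition is what drives the speaker embedding toward invariance to pathological attributes. The second classifier in a dual-classifier setup (for example, a speaker classifier without reversal) is omitted here.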


1️⃣ One-Sentence Summary

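This paper proposes ProtoDisent-TTS, a method that separates a speaker's voice characteristics from the articulation patterns of dysarthria, so that healthy and dysarthric speech can be flexibly generated or converted into one another, improving the performance and controllability of speech recognition and assistive technologies.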
