Towards a Phonology-Informed Evaluation of Multilingual TTS

📄 Abstract

Neural TTS systems can sound natural across languages, but naturalness does not guarantee that they preserve the sound contrasts that distinguish words and their grammatical forms. Standard metrics such as MOS do not test for this. We propose a classifier-based framework that audits TTS output against language-specific phonological patterns, using human speech as a benchmark. Testing Assamese advanced tongue root (ATR) vowel harmony with Meta's MMS TTS, we show that a classifier trained on human speech transfers to synthesized speech with minimal loss. The faithfulness audit reveals that [+ATR] mid vowels are realized as [-ATR] in one third of tokens despite an underlying [+ATR] specification, a bias absent in human speech. At the word level, predicted ATR labels classify harmony more accurately than transcription labels, indicating a gap between intended and produced phonology. The framework offers task-specific diagnostics and generalizes to other phonological contrasts that have measurable acoustic cues.


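1️⃣ One-Sentence Summary

This paper proposes a classifier-based evaluation framework that compares TTS output against language-specific phonological patterns in human speech (such as vowel harmony) to test whether multilingual text-to-speech systems accurately reproduce the key sound contrasts a language uses to distinguish word meanings and grammatical forms, filling the gap left by conventional naturalness ratings (MOS), which cannot capture such phonological errors.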
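The audit idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-centroid classifier, the (F1, F2) formant features, the function names, and all numeric values are assumptions invented for illustration only. The sketch fits a classifier on labelled human vowel tokens, applies it to TTS tokens, and reports, per intended ATR class, the fraction of tokens whose predicted label disagrees with the transcription.

```python
from statistics import mean


def fit_centroids(features, labels):
    """Nearest-centroid 'training': one mean feature vector per class."""
    centroids = {}
    for cls in set(labels):
        rows = [f for f, y in zip(features, labels) if y == cls]
        centroids[cls] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, feature):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(feature, centroids[c])),
    )


def faithfulness_audit(centroids, tts_features, intended_labels):
    """Per intended class, the fraction of tokens predicted as another class."""
    totals, mismatches = {}, {}
    for feat, intended in zip(tts_features, intended_labels):
        totals[intended] = totals.get(intended, 0) + 1
        if predict(centroids, feat) != intended:
            mismatches[intended] = mismatches.get(intended, 0) + 1
    return {cls: mismatches.get(cls, 0) / n for cls, n in totals.items()}


# Toy (F1, F2) values in Hz, invented for illustration; not data from the paper.
human_features = [(400, 2100), (420, 2000), (380, 2200),
                  (600, 1900), (620, 1800), (580, 1850)]
human_labels = ["+ATR", "+ATR", "+ATR", "-ATR", "-ATR", "-ATR"]

# Hypothetical TTS tokens with the ATR labels implied by the transcription.
tts_features = [(410, 2050), (590, 1880), (395, 2120),
                (610, 1820), (600, 1840), (405, 2080)]
intended = ["+ATR", "+ATR", "+ATR", "-ATR", "-ATR", "+ATR"]

centroids = fit_centroids(human_features, human_labels)
audit = faithfulness_audit(centroids, tts_features, intended)
print(audit)
```

In this toy run the second TTS token is transcribed as [+ATR] but lies closer to the [-ATR] centroid, so it counts as a mismatch for the [+ATR] class. A real audit would presumably use normalized acoustic features and a stronger classifier, and would first confirm on held-out human speech that the classifier itself is reliable.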
