arXiv submission date: 2026-07-02
📄 Abstract - Towards a Phonology-Informed Evaluation of Multilingual TTS

Neural TTS systems can sound natural across languages, but naturalness does not guarantee the preservation of sound contrasts that distinguish words from their grammatical forms. Standard metrics like MOS do not test for this. We propose a classifier-based framework that audits TTS output against language-specific phonological patterns using human speech as a benchmark. Testing Assamese advanced tongue root (ATR) vowel harmony with Meta's MMS TTS, we show that a classifier trained on human speech transfers to synthesized speech with minimal loss. The faithfulness audit reveals that [+ATR] mid vowels are realized as [-ATR] in 1/3 tokens despite an underlying [+ATR] specification, a bias absent in human speech. At the word level, predicted ATR labels classify harmony more accurately than transcription labels, indicating a gap between intended and produced phonology. The framework offers task-specific diagnostics and generalizes to other phonological contrasts with measurable acoustic cues.
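
The faithfulness audit described in the abstract can be illustrated with a minimal sketch: fit a classifier on human speech, apply it to synthesized tokens, and count how often a token specified as [+ATR] is classified as [-ATR]. Everything below is an assumption made for illustration, not the paper's implementation: the nearest-centroid classifier, the use of (F1, F2) formant values as features, the helper names, and the toy numbers, which are invented and do not reproduce any reported result.

```python
from statistics import mean


def fit_centroids(features, labels):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, feature):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feature, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def mismatch_rate(centroids, features, intended, target="+ATR"):
    """Share of tokens intended as `target` that the classifier labels otherwise."""
    preds = [predict(centroids, f) for f, l in zip(features, intended) if l == target]
    if not preds:
        return 0.0
    return sum(p != target for p in preds) / len(preds)


# Hypothetical (F1, F2) values in Hz; invented for this sketch only.
human_feats = [(400, 2100), (420, 2050), (380, 2150),
               (560, 1850), (580, 1800), (600, 1900)]
human_labels = ["+ATR", "+ATR", "+ATR", "-ATR", "-ATR", "-ATR"]

# Hypothetical TTS tokens, all specified as [+ATR] in the transcription.
tts_feats = [(410, 2080), (570, 1840), (395, 2120), (405, 2090)]
tts_intended = ["+ATR", "+ATR", "+ATR", "+ATR"]

# Train on human speech, then audit the synthesized tokens.
centroids = fit_centroids(human_feats, human_labels)
rate = mismatch_rate(centroids, tts_feats, tts_intended)
print(f"[+ATR] tokens classified as [-ATR]: {rate:.2f}")
```

Running the same `mismatch_rate` on held-out human tokens would give the baseline against which a TTS bias can be judged, which is the role human speech plays as a benchmark in the abstract.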

Top-level tags: audio, model evaluation, natural language processing
Detailed tags: multilingual tts, phonology, classifier framework, faithfulness audit, vowel harmony

Towards a Phonology-Informed Evaluation of Multilingual TTS


1️⃣ One-sentence summary

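This paper proposes a classifier-based evaluation framework that compares TTS output against language-specific phonological patterns in human speech (such as vowel harmony) to test whether multilingual text-to-speech systems accurately reproduce the sound contrasts that distinguish word meanings and grammatical forms, filling a gap left by conventional naturalness ratings (MOS), which cannot capture such phonological errors.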

Source: arXiv: 2607.01965