arXiv submission date: 2026-04-02
📄 Abstract - Acoustic and perceptual differences between standard and accented Chinese speech and their voice clones

Voice cloning is often evaluated in terms of overall quality, but less is known about accent preservation and its perceptual consequences. We compare standard and heavily accented Mandarin speech and their voice clones using a combined computational and perceptual design. Embedding-based analyses show no reliable accented-standard difference in original-clone distances across systems. In the perception study, clones are rated as more similar to their originals for standard than for accented speakers, and intelligibility increases from original to clone, with a larger gain for accented speech. These results show that accent variation can shape perceived identity match and intelligibility in voice cloning even when it is not reflected in an off-the-shelf speaker-embedding distance, and they motivate evaluating speaker identity preservation and accent preservation as separable dimensions.
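The embedding-based comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not name the speaker-embedding model or the distance metric, so cosine distance, the function names, and the toy vectors below are all assumptions made for illustration, not the authors' pipeline. The sketch computes an original-clone distance for each speaker and averages it within the standard and accented groups.

```python
import math
from statistics import mean


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_original_clone_distance(pairs):
    """Average the original-clone distance over (original, clone) pairs."""
    return mean(cosine_distance(orig, clone) for orig, clone in pairs)


# Hypothetical toy embeddings; real ones would come from a speaker-embedding
# model applied to each original recording and its voice clone.
standard_pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # identical vectors: distance 0
    ([0.0, 1.0, 0.0], [0.0, 0.0, 1.0]),  # orthogonal vectors: distance 1
]
accented_pairs = [
    ([1.0, 1.0, 0.0], [1.0, 1.0, 0.0]),  # identical vectors: distance 0
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal vectors: distance 1
]

standard_mean = mean_original_clone_distance(standard_pairs)
accented_mean = mean_original_clone_distance(accented_pairs)
print(f"standard: {standard_mean:.3f}, accented: {accented_mean:.3f}")
```

In a real analysis the two sets of per-speaker distances would be compared with a statistical test. Equal group means, as in this toy data, would correspond to the abstract's finding of no reliable accented-standard difference, which can hold even when listeners judge the two groups differently.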

Top-level tags: audio, natural language processing, model evaluation
Detailed tags: voice cloning, accent preservation, speaker similarity, intelligibility, perceptual evaluation

Acoustic and perceptual differences between standard and accented Chinese speech and their voice clones


1️⃣ One-sentence summary

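This study finds that when voice cloning is applied to accented Mandarin, speaker-embedding distances between originals and clones show no reliable difference from standard speech, yet listeners rate accented clones as less similar to their originals, and the intelligibility gain from original to clone is larger for accented speech than for standard speech.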

Source: arXiv: 2604.01562