Cross-lingual Matryoshka Representation Learning across Speech and Text
1️⃣ One-sentence summary
This work develops a novel bilingual speech-text embedding model that lets speakers of Wolof (a primarily oral language) retrieve French text directly from Wolof speech, bypassing the traditionally costly and complex speech-recognition-plus-translation pipeline and breaking down both the information barrier and the modality barrier for under-resourced language communities.
Speakers of under-represented languages face both a language barrier, as most online knowledge is in a few dominant languages, and a modality barrier, since information is largely text-based while many languages are primarily oral. We address this for French-Wolof by training the first bilingual speech-text Matryoshka embedding model, enabling efficient retrieval of French text from Wolof speech queries without relying on a costly ASR-translation pipeline. We introduce large-scale data curation pipelines and new benchmarks, compare modeling strategies, and show that modality fusion within a frozen text Matryoshka model performs best. Although trained only for retrieval, the model generalizes well to other tasks, such as speech intent detection, indicating that it learns general semantic representations. Finally, we analyze cost-accuracy trade-offs across Matryoshka dimensions and ranks, showing that information is concentrated in only a few components, which suggests potential for efficiency improvements.
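The cost-accuracy trade-off the abstract mentions comes from the Matryoshka property: the leading components of an embedding already carry most of the semantic signal, so queries and documents can be truncated to a smaller prefix and re-normalized before similarity search. A minimal sketch of this idea, using random vectors as stand-ins for actual Wolof speech and French text embeddings (the model itself is not public API here, so everything below is illustrative):

```python
import numpy as np

def truncate_and_normalize(emb, dim):
    """Keep the first `dim` Matryoshka components and re-normalize to unit length."""
    prefix = emb[..., :dim]
    return prefix / np.linalg.norm(prefix, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
speech_query = rng.normal(size=768)      # hypothetical Wolof speech embedding
text_corpus = rng.normal(size=(5, 768))  # hypothetical French text embeddings

# Retrieval at several Matryoshka dimensions: smaller dims are cheaper,
# and (for a trained model) lose accuracy only gradually.
for dim in (64, 256, 768):
    q = truncate_and_normalize(speech_query, dim)
    docs = truncate_and_normalize(text_corpus, dim)
    scores = docs @ q                    # cosine similarity after normalization
    print(dim, int(np.argmax(scores)))
```

In a real deployment, the index would store full-dimensional embeddings once and serve truncated views, trading retrieval quality against memory and dot-product cost.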
Source: arXiv: 2602.19991