arXiv submission date: 2026-03-18

📄 Abstract - Modeling Overlapped Speech with Shuffles

We propose to model parallel streams of data, such as overlapped speech, using shuffles. Specifically, this paper shows how the shuffle product and partial order finite-state automata (FSAs) can be used for alignment and speaker-attributed transcription of overlapped speech. We train using the total score on these FSAs as a loss function, marginalizing over all possible serializations of overlapping sequences at subword, word, and phrase levels. To reduce graph size, we impose temporal constraints by constructing partial order FSAs. We address speaker attribution by modeling (token, speaker) tuples directly. Viterbi alignment through the shuffle product FSA directly enables one-pass alignment. We evaluate performance on synthetic LibriSpeech overlaps. To our knowledge, this is the first algorithm that enables single-pass alignment of multi-talker recordings. All algorithms are implemented using k2 / Icefall.
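The shuffle of two sequences is the set of all interleavings that preserve each sequence's internal order. The snippet below is a minimal illustrative sketch of that idea in plain Python, enumerating the serializations of two short speaker-tagged word sequences. It is not the paper's k2 / Icefall implementation, which builds FSAs rather than enumerating sequences; the function name `shuffle_product` and the example utterances are assumptions made up for illustration.

```python
from itertools import combinations
from math import comb


def shuffle_product(a, b):
    """Yield every interleaving of a and b that keeps each sequence's own order."""
    n, m = len(a), len(b)
    # Choose which of the n + m output slots are filled from `a`;
    # the remaining slots are filled from `b`, both in original order.
    for positions in combinations(range(n + m), n):
        slots_for_a = set(positions)
        ia, ib = iter(a), iter(b)
        yield tuple(
            next(ia) if i in slots_for_a else next(ib) for i in range(n + m)
        )


# Hypothetical (token, speaker) tuples for two overlapping talkers.
spk1 = [("hello", 1), ("world", 1)]
spk2 = [("good", 2), ("morning", 2)]

serializations = list(shuffle_product(spk1, spk2))

# Two sequences of lengths n and m have C(n + m, n) interleavings.
assert len(serializations) == comb(len(spk1) + len(spk2), len(spk1))

for s in serializations:
    print(" ".join(f"{tok}/{spk}" for tok, spk in s))
```

The number of interleavings grows combinatorially with sequence length, which is why explicit enumeration like this only works for toy inputs and why a compact graph representation with temporal constraints matters in practice.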

Top-level tags: audio natural language processing systems

Modeling Overlapped Speech with Shuffles

1️⃣ One-sentence summary

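This paper proposes a new method based on the shuffle product and partial order finite-state automata for alignment and speaker-attributed transcription of overlapped speech; according to the authors, it is the first algorithm to enable single-pass alignment of multi-talker recordings.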

Source: arXiv: 2603.17769
