VorTEX: Various overlap ratio for Target speech EXtraction

📄 Abstract - VorTEX: Various overlap ratio for Target speech EXtraction

Target speech extraction (TSE) aims to recover a target speaker's voice from a mixture. While recent text-prompted approaches have shown promise, most assume fully overlapped mixtures, which limits insight into how models behave across realistic overlap ratios. We introduce VorTEX (Various overlap ratio for Target speech EXtraction), a text-prompted TSE architecture with a Decoupled Adaptive Multi-branch (DAM) Fusion block that separates primary extraction from auxiliary regularization pathways. To enable controlled analysis, we construct PORTE, a two-speaker dataset spanning overlap ratios from 0% to 100%. We further propose Suppression Ratio on Energy (SuRE), a diagnostic metric that detects suppression behavior not captured by conventional measures. Experiments show that existing models exhibit suppression or residual interference under overlap, whereas VorTEX achieves the highest separation fidelity across 20-100% overlap (e.g., 5.50 dB at 20% and 2.04 dB at 100%) while maintaining zero SuRE, indicating robust extraction without suppression-driven artifacts.
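To make the notion of a controlled overlap ratio concrete, the following is a minimal sketch of how a two-speaker mixture at a chosen overlap could be assembled. It is not the PORTE construction procedure, which the abstract does not describe. The function name `mix_with_overlap`, the restriction to equal-length utterances, and the definition of overlap ratio as the fraction of the target's duration during which the interferer is also active are all assumptions made for illustration. Plain Python lists are used to keep the sketch dependency-free; real audio pipelines would typically use arrays.

```python
def mix_with_overlap(target, interferer, overlap_ratio):
    """Mix two equal-length mono signals at a given overlap ratio.

    Assumed definition (hypothetical, not taken from the paper):
    overlap_ratio is the fraction of the target's samples during which
    the interferer is also active. The interferer is delayed so that
    only the tail of the target overlaps with it.
    """
    if not 0.0 <= overlap_ratio <= 1.0:
        raise ValueError("overlap_ratio must be in [0, 1]")
    if len(target) != len(interferer):
        raise ValueError("this sketch expects equal-length signals")

    n = len(target)
    # Delay of the interferer in samples: 0 means fully overlapped,
    # n means the two utterances are played back to back.
    offset = round((1.0 - overlap_ratio) * n)

    mixture = [0.0] * (n + offset)
    for i, sample in enumerate(target):
        mixture[i] += sample
    for i, sample in enumerate(interferer):
        mixture[offset + i] += sample
    return mixture, offset


# Toy signals: constant values make the overlapped region easy to see.
target = [1.0] * 10
interferer = [2.0] * 10
mixture, offset = mix_with_overlap(target, interferer, 0.2)
print(offset, len(mixture))
```

Under this assumed definition, an overlap ratio of 100% reduces to the fully overlapped setting that the abstract says most prior work assumes, and 0% places the two speakers back to back.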

1️⃣ One-sentence summary

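This paper proposes VorTEX, a new model that uses a decoupled multi-branch fusion technique to extract target speech more robustly across realistic scenarios in which speaker overlap ranges from 20% to 100%, while avoiding the suppression or residual interference that existing methods can produce.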
