
arXiv submission date: 2026-03-16
📄 Abstract - VorTEX: Various overlap ratio for Target speech EXtraction

Target speech extraction (TSE) aims to recover a target speaker's voice from a mixture. While recent text-prompted approaches have shown promise, most approaches assume fully overlapped mixtures, limiting insight into behavior across realistic overlap ratios. We introduce VorTEX (Various overlap ratio for Target speech EXtraction), a text-prompted TSE architecture with a Decoupled Adaptive Multi-branch (DAM) Fusion block that separates primary extraction from auxiliary regularization pathways. To enable controlled analysis, we construct PORTE, a two-speaker dataset spanning overlap ratios from 0% to 100%. We further propose Suppression Ratio on Energy (SuRE), a diagnostic metric that detects suppression behavior not captured by conventional measures. Experiments show that existing models exhibit suppression or residual interference under overlap, whereas VorTEX achieves the highest separation fidelity across 20-100% overlap (e.g., 5.50 dB at 20% and 2.04 dB at 100%) while maintaining zero SuRE, indicating robust extraction without suppression-driven artifacts.
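The abstract says that PORTE spans overlap ratios from 0% to 100%, but it does not say how a two-speaker mixture at a given ratio is built. The sketch below shows one plausible construction. It assumes that the overlap ratio is the number of overlapping samples divided by the length of the shorter utterance, and that the two sources are summed without any loudness rescaling. Both of these are our assumptions and are not taken from the paper. The function name `mix_with_overlap` is hypothetical.

```python
import numpy as np


def mix_with_overlap(s1: np.ndarray, s2: np.ndarray, overlap_ratio: float):
    """Hypothetical PORTE-style mixing: start s2 partway through s1.

    Assumed definition (not from the paper):
        overlap_ratio = overlapping samples / len(shorter utterance)
    Returns (mixture, s1_aligned, s2_aligned), all the same length, so each
    source can serve as a reference aligned with the mixture.
    """
    if not 0.0 <= overlap_ratio <= 1.0:
        raise ValueError("overlap_ratio must be in [0, 1]")
    # Number of samples in which both speakers are active.
    k = int(round(overlap_ratio * min(len(s1), len(s2))))
    # s2 starts k samples before s1 ends (ratio 0 -> back-to-back, no overlap).
    offset = len(s1) - k
    total = max(len(s1), offset + len(s2))
    s1_aligned = np.zeros(total, dtype=np.float64)
    s2_aligned = np.zeros(total, dtype=np.float64)
    s1_aligned[: len(s1)] = s1
    s2_aligned[offset : offset + len(s2)] = s2
    # Plain sum; real datasets usually also rescale the sources to a target SNR.
    return s1_aligned + s2_aligned, s1_aligned, s2_aligned


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(16000)  # 1 s of noise at 16 kHz, standing in for speech
    b = rng.standard_normal(12000)
    for r in (0.0, 0.2, 1.0):
        mix, a_al, b_al = mix_with_overlap(a, b, r)
        both_active = np.count_nonzero((a_al != 0) & (b_al != 0))
        print(f"ratio={r:.1f} mix_len={len(mix)} overlap_samples={both_active}")
```

At ratio 1.0 under this definition, the shorter utterance lies entirely inside the longer one. A dataset could instead define the ratio relative to the total mixture length, which would change the offsets.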

Top-level tags: audio, multi-modal, model evaluation
Detailed tags: speech extraction, overlap ratio, dataset, evaluation metric, speech separation

VorTEX: Various overlap ratio for Target speech EXtraction


1️⃣ One-sentence summary

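This paper proposes VorTEX, a text-prompted target speech extraction model whose decoupled multi-branch fusion block lets it extract the target speaker more robustly across speaker overlap ratios from 20% to 100%, while avoiding the suppression and residual interference that existing methods can exhibit.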

Source: arXiv: 2603.14803