Feature-Aligned Speech Watermarking for Robustness to Reconstruction Distortions
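1️⃣ One-sentence summary
This paper proposes a new speech watermarking method that aligns the watermark's features with those of the original speech, which lets it raise watermark energy while remaining imperceptible to the human ear and thereby substantially improves robustness against attacks by speech reconstruction models.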
Audio watermarking aims to embed identifiable information into audio while remaining imperceptible. Existing methods adopt high-fidelity, low-energy designs to preserve perceptual quality, but the resulting watermarks are not robust to suppression by speech reconstruction models. Improving robustness is challenging because of the robustness-fidelity trade-off inherent in existing designs: increasing watermark energy improves robustness but reduces fidelity. To address this problem, we propose a feature-aligned watermarking method that aligns the watermark with the feature distribution of the original speech, allowing higher watermark energy to improve robustness while preserving imperceptibility. We use a pretrained speech codec to generate a pseudo-speech watermark and fuse it into the spectrogram of the input audio, with a voice activity detection (VAD) loss and perceptual losses guiding the embedding toward voiced regions. Experiments show that our method maintains imperceptibility comparable to existing approaches while substantially improving robustness under both seen and unseen speech reconstruction models.
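To make the robustness-fidelity trade-off and the idea of embedding only within voiced regions concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the random `wm_mag` array stands in for the codec-generated pseudo-speech watermark, the hard energy-threshold mask in `energy_vad` stands in for what the abstract describes as a VAD loss used during training, and the function names, the additive fusion rule, the strength parameter `alpha`, and the -30 dB threshold are all hypothetical.

```python
import numpy as np


def energy_vad(mag, threshold_db=-30.0):
    """Hypothetical energy-based VAD (assumption, not the paper's VAD loss).

    mag: magnitude spectrogram of shape (freq_bins, frames).
    A frame counts as voiced if its energy is within `threshold_db`
    of the loudest frame.
    """
    energy = np.sum(mag ** 2, axis=0)
    rel_db = 10.0 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    return rel_db > threshold_db


def fuse_watermark(mag, wm_mag, alpha, voiced):
    """Add a scaled watermark spectrogram to voiced frames only.

    The additive rule and `alpha` are illustrative assumptions.
    """
    return mag + alpha * wm_mag * voiced[np.newaxis, :]


def snr_db(clean, marked):
    """Signal-to-watermark ratio in dB, used here as a crude fidelity proxy."""
    noise = marked - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / (np.sum(noise ** 2) + 1e-12))


# Toy input: 5 near-silent frames followed by 5 loud frames.
rng = np.random.default_rng(0)
mag = rng.random((64, 10))
mag[:, :5] *= 1e-4
wm_mag = rng.random((64, 10))  # stand-in for a pseudo-speech watermark

voiced = energy_vad(mag)
for alpha in (0.05, 0.2):
    marked = fuse_watermark(mag, wm_mag, alpha, voiced)
    print(f"alpha={alpha}: SNR = {snr_db(mag, marked):.1f} dB")
```

Raising `alpha` injects more watermark energy, which lowers the SNR. That is the fidelity cost the abstract says existing low-energy designs avoid at the expense of robustness. The method described above aims to tolerate higher energy by making the watermark resemble speech, which this sketch does not attempt to model.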
Source: arXiv:2606.11828