Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection
1️⃣ One-sentence summary
This study reveals a new class of security threat against large audio-language models: by generating a malicious audio clip that is nearly imperceptible to the human ear and blends into ambient background sound, an attacker can hijack an intelligent voice assistant and make it execute unauthorized instructions without the user's knowledge.
Modern large audio-language models (LALMs) power intelligent voice interactions by tightly integrating audio and text. This integration, however, expands the attack surface beyond text and introduces vulnerabilities in the continuous, high-dimensional audio channel. While prior work has studied audio jailbreaks, the security risks of malicious audio injection and downstream behavior manipulation remain underexamined. In this work, we reveal a previously overlooked threat, auditory prompt injection, under realistic constraints of audio-data-only access and strong perceptual stealth. To systematically analyze this threat, we propose *AudioHijack*, a general framework that generates context-agnostic and imperceptible adversarial audio to hijack LALMs. *AudioHijack* employs sampling-based gradient estimation for end-to-end optimization across diverse models, bypassing non-differentiable audio tokenization. Through attention supervision and multi-context training, it steers model attention toward the adversarial audio and generalizes to unseen user contexts. We also design a convolutional blending method that modulates perturbations into natural reverberation, making them highly imperceptible to users. Extensive experiments on 13 state-of-the-art LALMs show consistent hijacking across 6 misbehavior categories, achieving average success rates of 79%–96% on unseen user contexts with high acoustic fidelity. Real-world studies demonstrate that commercial voice agents from Mistral AI and Microsoft Azure can be induced to execute unauthorized actions on behalf of users. These findings expose critical vulnerabilities in LALMs and highlight the urgent need for dedicated defenses.
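The abstract mentions "sampling-based gradient estimation" as the way to optimize adversarial audio through non-differentiable steps such as audio tokenization, but gives no implementation details. A minimal zeroth-order (NES-style, antithetic-sampling) sketch of that general idea is below; the function names, the Gaussian sampling scheme, and the toy quadratic loss are all illustrative assumptions, not the paper's actual method:

```python
import numpy as np

def estimate_gradient(loss_fn, x, sigma=1e-3, n_samples=200, seed=0):
    """Zeroth-order gradient estimate of a black-box loss at x.

    For each random Gaussian direction u, evaluate the loss at
    x + sigma*u and x - sigma*u and accumulate the u-weighted finite
    difference. Only forward evaluations of loss_fn are needed, so no
    backpropagation through non-differentiable stages is required.
    """
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        grad += u * (loss_fn(x + sigma * u) - loss_fn(x - sigma * u))
    return grad / (2.0 * sigma * n_samples)

# Toy sanity check: for a quadratic loss, the true gradient at x is
# 2 * (x - target), and the estimate should land close to it.
target = np.array([0.5, -0.2, 0.1])
loss = lambda x: float(np.sum((x - target) ** 2))
x = np.zeros(3)
g = estimate_gradient(loss, x)
```

In an attack setting, `x` would be the adversarial audio waveform (or the perturbation added to it) and `loss_fn` would score how close the model's output is to the attacker's target behavior; the estimated gradient then drives an ordinary iterative update such as signed gradient descent.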
Source: arXiv: 2604.14604