LivePI:针对间接提示注入的智能体更逼真基准测试 / LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injectio
1️⃣ 一句话总结
该论文提出了一个名为LivePI的结构化基准测试框架,用于在真实虚拟机环境中评估AI智能体(如OpenClaw)应对间接提示注入攻击的风险,覆盖多种输入渠道和攻击目标,并验证了一种两层防御机制的有效性。
AI agents such as OpenClaw are increasingly deployed in local workflows with access to external tools. This creates indirect prompt-injection (IPI) risk: an agent may execute harmful instructions embedded in untrusted inputs such as email, downloaded files, webpages, repositories, or group-chat messages. Existing evaluations are often small, purely simulated, or focused on a narrow set of channels. We introduce LivePI (Live Prompt Injection), a structured benchmark for IPI risk in a production-like but test-controlled environment. LivePI covers seven input surfaces, twelve attack/rendering families, and five malicious goals, including protected-information exfiltration, unauthorized security-control changes, unsafe code retrieval or execution, inbox-summary exfiltration, and cryptocurrency transfer. We run LivePI on a real virtual machine with live but test-controlled email, chat, web, local-file, repository, and wallet interfaces. Across GPT-5.3-Codex, Claude Opus 4.6, Gemini 3.1 Pro, Kimi K2.5, and GLM-5, total attack success rates range from 10.7% to 29.6%. Group-chat injection is uniformly successful across the evaluated backbones in our deployment, and repository-link attacks produce high-severity failures despite a small denominator. We also evaluate a two-layer defense consisting of prompt-level filtering and pre-execution tool-call authorization. In the GPT-5.3-Codex setting, the defense intercepts all tested malicious-goal completions in LivePI before execution while preserving benign utility on PinchBench-derived workloads.
LivePI:针对间接提示注入的智能体更逼真基准测试 / LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injectio
该论文提出了一个名为LivePI的结构化基准测试框架,用于在真实虚拟机环境中评估AI智能体(如OpenClaw)应对间接提示注入攻击的风险,覆盖多种输入渠道和攻击目标,并验证了一种两层防御机制的有效性。
源自 arXiv: 2605.17986