菜单

关于 🐙 GitHub
arXiv 提交日期: 2026-02-03
📄 Abstract - WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents

Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing such attacks achieve limited effectiveness, as their underlying assumptions often do not hold in the web-agent setting. In this work, we propose WebSentinel, a two-step approach for detecting and localizing prompt injection attacks in webpages. Given a webpage, Step I extracts \emph{segments of interest} that may be contaminated, and Step II evaluates each segment by checking its consistency with the webpage content as context. We show that WebSentinel is highly effective, substantially outperforming baseline methods across multiple datasets of both contaminated and clean webpages that we collected. Our code is available at: this https URL.

顶级标签: llm agents systems
详细标签: prompt injection web agents adversarial attacks security detection 或 搜索:

WebSentinel:针对网络代理的提示注入攻击检测与定位 / WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents


1️⃣ 一句话总结

这篇论文提出了一种名为WebSentinel的两阶段方法,能有效检测并定位网页中旨在操控网络代理执行恶意任务的提示注入攻击,其性能显著优于现有基线方法。

源自 arXiv: 2602.03792