AegisUI: Behavioral Anomaly Detection for Structured User Interface Protocols in AI Agent Systems
1️⃣ One-Sentence Summary
This paper proposes a framework called AegisUI for detecting malicious behavior hidden in user interfaces dynamically generated by AI agents, such as a button whose label does not match its actual action, and shows experimentally that a supervised learning approach is the most effective at identifying this new class of security threats.
AI agents that build user interfaces on the fly, assembling buttons, forms, and data displays from structured protocol payloads, are becoming common in production systems. The trouble is that a payload can pass every schema check and still trick a user: a button might say "View invoice" while its hidden action wipes an account, or a display widget might quietly bind to an internal salary field. Current defenses stop at syntax; they were never built to catch this kind of behavioral mismatch.

We built AegisUI to study exactly this gap. The framework generates structured UI payloads, injects realistic attacks into them, extracts numeric features, and benchmarks anomaly detectors end-to-end. We produced 4000 labeled payloads (3000 benign, 1000 malicious) spanning five application domains and five attack families: phishing interfaces, data leakage, layout abuse, manipulative UI, and workflow anomalies. From each payload we extracted 18 features covering structural, semantic, binding, and session dimensions, then compared three detectors: Isolation Forest (unsupervised), a benign-trained autoencoder (semi-supervised), and Random Forest (supervised).

On a stratified 80/20 split, Random Forest scored best overall (accuracy 0.931, precision 0.980, recall 0.740, F1 0.843, ROC-AUC 0.952). The autoencoder came second (F1 0.762, ROC-AUC 0.863) and needs no malicious labels at training time, which matters when deploying a new system that lacks attack history. Per-attack-type analysis showed that layout abuse is the easiest to catch while manipulative UI payloads are the hardest. All code, data, and configurations are released for full reproducibility.
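To make the feature-extraction step concrete, here is a minimal sketch of what a payload-to-features function might look like. The payload schema (`components`, `label`, `action`, `binding`, `session`) and every feature below are illustrative stand-ins, not the paper's actual 18-feature set, which is defined in the released code.

```python
# Hypothetical sketch of a UI-payload feature extractor. The payload schema
# and the features below are illustrative stand-ins, NOT the paper's actual
# 18 features; they only mirror the four dimensions the abstract names
# (structural, semantic, binding, session).
from difflib import SequenceMatcher

SENSITIVE_FIELDS = {"salary", "ssn", "internal_id"}  # assumed example list


def extract_features(payload: dict) -> list[float]:
    components = payload.get("components", [])

    # Structural: how large and how deeply nested the generated UI is.
    n_components = len(components)
    max_depth = max((c.get("depth", 1) for c in components), default=0)

    # Semantic: mismatch between a visible label and its bound action name,
    # e.g. a "View invoice" button wired to a delete_account action.
    label_action_sim = [
        SequenceMatcher(
            None,
            c.get("label", "").lower(),
            c.get("action", "").replace("_", " ").lower(),
        ).ratio()
        for c in components
        if c.get("action")
    ]
    min_label_action_sim = min(label_action_sim, default=1.0)

    # Binding: does any widget read from a sensitive internal field?
    n_sensitive_bindings = sum(
        1
        for c in components
        if c.get("binding", "").split(".")[-1] in SENSITIVE_FIELDS
    )

    # Session: how much activity this payload's session has already seen.
    actions_this_session = payload.get("session", {}).get("action_count", 0)

    return [
        float(n_components),
        float(max_depth),
        min_label_action_sim,
        float(n_sensitive_bindings),
        float(actions_this_session),
    ]
```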
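The benchmark itself maps directly onto standard scikit-learn primitives. The sketch below reproduces the evaluation setup the abstract describes (stratified 80/20 split, three detectors, F1 and ROC-AUC); `X` and `y` are assumed NumPy arrays from the extraction step, an `MLPRegressor` that reconstructs its input stands in for the paper's unspecified autoencoder architecture, and all hyperparameters and the score threshold are illustrative choices, not the paper's configuration.

```python
# Minimal sketch of the three-detector benchmark. X (n_samples x n_features)
# and y (0 = benign, 1 = malicious) are assumed NumPy arrays; hyperparameters
# are illustrative defaults, not the paper's configuration.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)  # stratified 80/20

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Supervised: Random Forest trained on both classes.
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train_s, y_train)
rf_scores = rf.predict_proba(X_test_s)[:, 1]

# Unsupervised: Isolation Forest; negate decision_function so that a
# higher score means more anomalous.
iso = IsolationForest(random_state=42).fit(X_train_s)
iso_scores = -iso.decision_function(X_test_s)

# Semi-supervised: "autoencoder" trained on benign payloads only; the
# per-sample reconstruction error is the anomaly score.
ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=42)
ae.fit(X_train_s[y_train == 0], X_train_s[y_train == 0])
ae_scores = ((ae.predict(X_test_s) - X_test_s) ** 2).mean(axis=1)

for name, scores in [("RandomForest", rf_scores),
                     ("IsolationForest", iso_scores),
                     ("Autoencoder", ae_scores)]:
    # Flag the top 25% of scores as malicious, matching the dataset's
    # 1:3 malicious:benign ratio (a simple illustrative threshold).
    preds = (scores >= np.quantile(scores, 0.75)).astype(int)
    print(f"{name}: F1={f1_score(y_test, preds):.3f} "
          f"ROC-AUC={roc_auc_score(y_test, scores):.3f}")
```

ROC-AUC is computed from raw scores and is threshold-free, so it compares the three detectors fairly even though each produces scores on a different scale; F1 depends on the chosen threshold, which in a real deployment would be tuned on validation data rather than fixed at a quantile.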
Source: arXiv: 2603.05031