Risky-Bench: Probing Agentic Safety Risks under Real-World Deployment
1️⃣ One-Sentence Summary
This paper proposes an evaluation framework called Risky-Bench that systematically probes the safety risks faced by large language models operating as agents in realistic, complex environments. It addresses the limited coverage and poor adaptability of existing evaluation methods, and in life-assist scenarios it uncovers significant safety hazards in current state-of-the-art agents.
Large Language Models (LLMs) are increasingly deployed as agents that operate in real-world environments, introducing safety risks beyond linguistic harm. Existing agent safety evaluations rely on risk-oriented tasks tailored to specific agent settings, yielding limited coverage of the safety risk space and failing to assess agent safety behavior during long-horizon, interactive task execution in complex real-world deployments. Moreover, their specialization to particular agent settings limits adaptability across diverse agent configurations. To address these limitations, we propose Risky-Bench, a framework that enables systematic agent safety evaluation grounded in real-world deployment. Risky-Bench organizes evaluation around domain-agnostic safety principles to derive context-aware safety rubrics that delineate the safety space, and systematically evaluates safety risks across this space through realistic task execution under varying threat assumptions. When applied to life-assist agent settings, Risky-Bench uncovers substantial safety risks in state-of-the-art agents under realistic execution conditions. Moreover, as a well-structured evaluation pipeline, Risky-Bench is not confined to life-assist scenarios and can be adapted to other deployment settings to construct environment-specific safety evaluations, providing an extensible methodology for agent safety assessment.
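The abstract describes a pipeline: domain-agnostic safety principles are instantiated as context-aware rubrics, which are then checked against agent behavior during task execution. A minimal sketch of that structure is below; every name and the trivial keyword "check" are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical sketch of the pipeline shape described in the abstract.
# None of these names come from the paper; the keyword-matching "check"
# is a stand-in for whatever judge the real framework uses.

@dataclass
class SafetyRubric:
    principle: str                 # domain-agnostic principle, e.g. "no unattended heat source"
    context: str                   # deployment context the rubric is instantiated for
    check: Callable[[str], bool]   # True if an agent action violates the rubric

def derive_rubrics(principles: List[str], context: str) -> List[SafetyRubric]:
    """Instantiate each domain-agnostic principle as a context-aware rubric."""
    return [
        SafetyRubric(p, context, check=lambda action, p=p: p.lower() in action.lower())
        for p in principles
    ]

def evaluate(agent_trace: List[str], rubrics: List[SafetyRubric]) -> List[Tuple[int, str]]:
    """Scan a long-horizon action trace and record (step, principle) violations."""
    violations = []
    for step, action in enumerate(agent_trace):
        for r in rubrics:
            if r.check(action):
                violations.append((step, r.principle))
    return violations

# Toy usage in a life-assist context: step 1 trips the rubric.
rubrics = derive_rubrics(["unattended stove"], context="life-assist")
trace = ["turn on stove", "leave room with unattended stove on"]
print(evaluate(trace, rubrics))  # -> [(1, 'unattended stove')]
```

The point of the sketch is the separation of concerns: rubrics are derived once per deployment context, while evaluation replays realistic traces against them, which is what lets the same pipeline transfer to settings beyond life-assist.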
Source: arXiv: 2602.03100