菜单

关于 🐙 GitHub
arXiv 提交日期: 2026-03-12
📄 Abstract - HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios

The rapid evolution of embodied agents has accelerated the deployment of household robots in real-world environments. However, unlike structured industrial settings, household spaces introduce unpredictable safety risks, where system limitations such as perception latency and lack of common sense knowledge can lead to dangerous errors. Current safety evaluations, often restricted to static images, text, or general hazards, fail to adequately benchmark dynamic unsafe action detection in these specific contexts. To bridge this gap, we introduce HomeSafe-Bench, a challenging benchmark designed to evaluate Vision-Language Models (VLMs) on unsafe action detection in household scenarios. HomeSafe-Bench is contrusted via a hybrid pipeline combining physical simulation with advanced video generation and features 438 diverse cases across six functional areas with fine-grained multidimensional annotations. Beyond benchmarking, we propose Hierarchical Dual-Brain Guard for Household Safety (HD-Guard), a hierarchical streaming architecture for real-time safety monitoring. HD-Guard coordinates a lightweight FastBrain for continuous high-frequency screening with an asynchronous large-scale SlowBrain for deep multimodal reasoning, effectively balancing inference efficiency with detection accuracy. Evaluations demonstrate that HD-Guard achieves a superior trade-off between latency and performance, while our analysis identifies critical bottlenecks in current VLM-based safety detection.

顶级标签: multi-modal agents benchmark
详细标签: vision-language models safety evaluation embodied agents household robotics unsafe action detection 或 搜索:

HomeSafe-Bench:评估视觉语言模型在家庭场景具身智能体不安全动作检测中的表现 / HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios


1️⃣ 一句话总结

这篇论文提出了一个专门用于测试AI家庭机器人安全性的新标准(HomeSafe-Bench),并设计了一个名为HD-Guard的双层智能监控系统,它通过‘快脑’快速筛查和‘慢脑’深度分析相结合的方式,在保证实时响应的同时,更准确地识别家庭环境中的危险动作。

源自 arXiv: 2603.11975