
arXiv submission date: 2025-12-29
📄 Abstract - ProGuard: Towards Proactive Multimodal Safeguard

The rapid evolution of generative models has led to a continuous emergence of multimodal safety risks, exposing the limitations of existing defense methods. To address these challenges, we propose ProGuard, a vision-language proactive guard that identifies and describes out-of-distribution (OOD) safety risks without the model adjustments required by traditional reactive approaches. We first construct a modality-balanced dataset of 87K samples, each annotated with both a binary safety label and a risk category under a hierarchical multimodal safety taxonomy, effectively mitigating modality bias and ensuring consistent moderation across text, image, and text-image inputs. Based on this dataset, we train our vision-language base model purely through reinforcement learning (RL) to achieve efficient and concise reasoning. To approximate proactive safety scenarios in a controlled setting, we further introduce an OOD safety category inference task and augment the RL objective with a synonym-bank-based similarity reward that encourages the model to generate concise descriptions for unseen unsafe categories. Experimental results show that ProGuard achieves performance comparable to closed-source large models on binary safety classification and substantially outperforms existing open-source guard models on unsafe content categorization. Most notably, ProGuard delivers strong proactive moderation ability, improving OOD risk detection by 52.6% and OOD risk description by 64.8%.
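The synonym-bank-based similarity reward can be illustrated with a minimal sketch. The abstract does not specify the similarity function, the contents of the synonym bank, or how this term is combined with the rest of the RL objective, so everything below is an assumption: `SYNONYM_BANK`, `synonym_similarity_reward`, and the use of token-level Jaccard similarity are hypothetical stand-ins chosen for illustration, not ProGuard's actual implementation.

```python
import re

# HYPOTHETICAL synonym bank: each unseen risk category maps to phrasings
# that should all count as a correct description of that category.
SYNONYM_BANK: dict[str, list[str]] = {
    "animal cruelty": ["animal cruelty", "animal abuse", "harming animals"],
    "self-harm": ["self-harm", "self injury", "suicide"],
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set Jaccard similarity; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def synonym_similarity_reward(
    prediction: str,
    gold_category: str,
    bank: dict[str, list[str]] = SYNONYM_BANK,
) -> float:
    """Reward = best Jaccard similarity between the model's predicted category
    description and any synonym of the gold category. Categories missing from
    the bank fall back to matching the category name itself.
    (Assumed formulation, not taken from the paper.)"""
    pred_tokens = tokenize(prediction)
    candidates = bank.get(gold_category, [gold_category])
    return max(jaccard(pred_tokens, tokenize(c)) for c in candidates)
```

Under these assumptions, the prediction "animal abuse" earns the full reward of 1.0 for the gold category "animal cruelty" because it matches a listed synonym exactly, while the verbose "this image depicts animal abuse in detail" earns only 2/7. Jaccard similarity was picked here because extra tokens enlarge the union and lower the score, which loosely mirrors the abstract's goal of concise descriptions; the real reward may use a different (e.g. embedding-based) similarity.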

Top-level tags: multi-modal, model training, model evaluation
Detailed tags: safety, out-of-distribution detection, reinforcement learning, vision-language model, content moderation

ProGuard: Towards Proactive Multimodal Safeguard


1️⃣ One-sentence summary

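This paper proposes ProGuard, a proactive multimodal safety guard trained with reinforcement learning that can identify and describe previously unseen safety risks without adjusting existing models, delivering substantial gains in risk detection and risk description over traditional reactive approaches.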

Source: arXiv 2512.23573