
arXiv submission date: 2025-12-19
📄 Abstract - Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Multimodal Large Language Models struggle to maintain reliable performance under extreme real-world visual degradations, which impede their practical robustness. Existing robust MLLMs predominantly rely on implicit training/adaptation that focuses solely on visual encoder generalization, suffering from limited interpretability and isolated optimization. To overcome these limitations, we propose Robust-R1, a novel framework that explicitly models visual degradations through structured reasoning chains. Our approach integrates: (i) supervised fine-tuning for degradation-aware reasoning foundations, (ii) reward-driven alignment for accurately perceiving degradation parameters, and (iii) dynamic reasoning depth scaling adapted to degradation intensity. To facilitate this approach, we introduce a specialized 11K dataset featuring realistic degradations synthesized across four critical real-world visual processing stages, each annotated with structured chains connecting degradation parameters, perceptual influence, pristine semantic reasoning chain, and conclusion. Comprehensive evaluations demonstrate state-of-the-art robustness: Robust-R1 outperforms all general and robust baselines on the real-world degradation benchmark R-Bench, while maintaining superior anti-degradation performance under multi-intensity adversarial degradations on MMMB, MMStar, and RealWorldQA.
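The abstract mentions a dataset of realistic degradations synthesized across four real-world visual processing stages. As an illustration only (the paper's actual stage taxonomy and parameters are not given here), a minimal NumPy sketch of such a multi-stage degradation pipeline might look like this, with the stage names and severity mappings being our assumptions:

```python
import numpy as np

# Hypothetical degradation ops for four processing stages. Stage names and
# parameter scalings are illustrative assumptions, not the paper's taxonomy.
def add_gaussian_noise(img, sigma=0.1, rng=None):
    """Sensor-stage: additive Gaussian noise, clipped back to [0, 1]."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def motion_blur(img, k=5):
    """Capture-stage: horizontal box blur as a crude motion-blur stand-in."""
    kernel = np.ones(k) / k
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, img)

def quantize(img, levels=8):
    """Compression-stage: coarse quantization approximating codec loss."""
    return np.round(img * (levels - 1)) / (levels - 1)

def darken(img, gamma=2.2):
    """Display/lighting-stage: gamma darkening simulating low light."""
    return img ** gamma

def synthesize(img, intensity=0.5, rng=None):
    """Apply all four stages; `intensity` in [0, 1] scales severity."""
    out = add_gaussian_noise(img, sigma=0.2 * intensity, rng=rng)
    out = motion_blur(out, k=1 + 2 * int(4 * intensity))
    out = quantize(out, levels=max(2, int(32 * (1 - intensity)) + 2))
    return darken(out, gamma=1.0 + 1.5 * intensity)
```

Chaining the stages (rather than sampling one in isolation) mirrors how real degradations compound along the capture-to-display path, which is presumably why the dataset spans all four stages.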

Top-level tags: multi-modal, model training, model evaluation
Detailed tags: robustness, degradation-aware reasoning, multimodal llm, structured reasoning, synthetic data

Robust-R1: Enhancing the Visual Robustness of Multimodal LLMs via Explicit Degradation-Aware Reasoning / Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding


1️⃣ One-Sentence Summary

This paper proposes Robust-R1, a new framework that explicitly models visual degradations through structured degradation-aware reasoning chains, significantly improving the robustness and interpretability of multimodal large language models under real-world visual degradation.
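The abstract describes each training sample as a structured chain connecting degradation parameters, perceptual influence, a pristine semantic reasoning chain, and a conclusion. A minimal sketch of that four-part structure as a data type (field names and the tag-based serialization are our paraphrase, not the paper's actual token schema) could be:

```python
from dataclasses import dataclass, field
from typing import List

# Sketch of the four-part chain from the abstract. Field names and the
# [DEG]/[EFFECT]/[REASON]/[ANSWER] tags are hypothetical.
@dataclass
class DegradationChain:
    degradation_params: dict           # e.g. {"type": "motion_blur", "kernel": 9}
    perceptual_influence: str          # how the degradation distorts perception
    semantic_reasoning: List[str] = field(default_factory=list)
    conclusion: str = ""

    def render(self) -> str:
        """Serialize the chain into a single training-target string."""
        steps = " -> ".join(self.semantic_reasoning)
        return (f"[DEG]{self.degradation_params}[/DEG] "
                f"[EFFECT]{self.perceptual_influence}[/EFFECT] "
                f"[REASON]{steps}[/REASON] "
                f"[ANSWER]{self.conclusion}[/ANSWER]")
```

Serializing the chain into tagged segments makes each stage of the reasoning separately supervisable, which is one plausible way to realize the "structured reasoning-chain tokenization" the summary lists as a contribution.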


2️⃣ Key Contributions

1. Explicit degradation-aware reasoning framework

2. Structured reasoning-chain tokenization

3. Two-stage training strategy (SFT + RL)

4. Construction of a specialized synthetic dataset

5. Dynamic reasoning-chain length scaling with degradation intensity
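Two of these contributions lend themselves to small sketches: scaling reasoning depth with degradation intensity (item 5) and reward-driven alignment of predicted degradation parameters (part of item 3). The thresholds, budgets, and reward shape below are illustrative assumptions, not values from the paper:

```python
import math

# Hypothetical mapping from degradation intensity to a reasoning-token budget.
def reasoning_budget(intensity: float,
                     min_tokens: int = 64,
                     max_tokens: int = 512) -> int:
    """Scale chain length linearly with degradation intensity in [0, 1].

    Mildly degraded inputs get short chains; severe degradations get the
    full budget so the model can reason about the corruption explicitly.
    """
    intensity = max(0.0, min(1.0, intensity))
    return int(min_tokens + (max_tokens - min_tokens) * intensity)

# Hypothetical reward for degradation-parameter perception: decays
# exponentially with the error between predicted and true parameters.
def param_reward(pred: float, true: float, tol: float = 0.05) -> float:
    """Return a reward in (0, 1] that is 1.0 at an exact match."""
    return math.exp(-abs(pred - true) / tol)
```

A dense, smoothly decaying reward like `param_reward` (rather than an exact-match indicator) gives the RL stage a usable gradient signal even when the model's parameter estimate is only approximately right.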


3️⃣ Main Results and Value

Result highlights: Robust-R1 outperforms all general and robust baselines on the real-world degradation benchmark R-Bench, and maintains superior anti-degradation performance under multi-intensity adversarial degradations on MMMB, MMStar, and RealWorldQA.

Practical value: explicitly modeling degradations yields interpretable reasoning chains and more reliable visual understanding under the extreme degradations encountered in real-world deployment.


4️⃣ Glossary

Source: arXiv: 2512.17532