IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting

📄 Abstract

Recent advances in multimodal large language models (MLLMs) have led to impressive progress across various benchmarks. However, their capability in understanding infrared images remains unexplored. To address this gap, we introduce IF-Bench, the first high-quality benchmark designed for evaluating multimodal understanding of infrared images. IF-Bench consists of 499 images sourced from 23 infrared datasets and 680 carefully curated visual question-answer pairs, covering 10 essential dimensions of image understanding. Based on this benchmark, we systematically evaluate over 40 open-source and closed-source MLLMs, employing cyclic evaluation, bilingual assessment, and hybrid judgment strategies to enhance the reliability of the results. Our analysis reveals how model scale, architecture, and inference paradigms affect infrared image comprehension, providing valuable insights for this area. Furthermore, we propose a training-free generative visual prompting (GenViP) method, which leverages advanced image editing models to translate infrared images into semantically and spatially aligned RGB counterparts, thereby mitigating domain distribution shifts. Extensive experiments demonstrate that our method consistently yields significant performance improvements across a wide range of MLLMs. The benchmark and code are available at this https URL.
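The abstract names cyclic evaluation as one of its reliability strategies but does not define it here. Below is a minimal sketch, assuming it works like the circular evaluation popularized by MMBench: each multiple-choice question is posed once per cyclic rotation of its options, and it counts as correct only if the model picks the ground-truth option under every rotation. This penalizes positional bias, such as always answering "A". The function names, the `Answerer` signature, and the toy models are hypothetical illustrations, not IF-Bench's actual code.

```python
from typing import Callable, Sequence

# Hypothetical interface: a model receives a question and a list of option
# strings and returns the index of the option it selects.
Answerer = Callable[[str, Sequence[str]], int]


def rotate(options: Sequence[str], shift: int) -> list[str]:
    """Cyclically shift the option list left by `shift` positions."""
    n = len(options)
    return [options[(i + shift) % n] for i in range(n)]


def cyclic_correct(model: Answerer, question: str,
                   options: Sequence[str], answer_idx: int) -> bool:
    """True only if the model picks the ground truth under every rotation."""
    n = len(options)
    for shift in range(n):
        shifted = rotate(options, shift)
        # After a left shift, the ground-truth option moves to this index.
        target = (answer_idx - shift) % n
        if model(question, shifted) != target:
            return False
    return True


def cyclic_accuracy(model: Answerer,
                    items: Sequence[tuple[str, Sequence[str], int]]) -> float:
    """Fraction of (question, options, answer_idx) items passed cyclically."""
    passed = sum(cyclic_correct(model, q, opts, a) for q, opts, a in items)
    return passed / len(items)


# Toy models for illustration only.
def content_model(question: str, options: Sequence[str]) -> int:
    return list(options).index("infrared")  # tracks the answer text


def biased_model(question: str, options: Sequence[str]) -> int:
    return 0  # always picks the first option, regardless of content


items = [("Which sensor captured this image?",
          ["RGB", "infrared", "depth", "event"], 1)]
print(cyclic_accuracy(content_model, items))  # 1.0
print(cyclic_accuracy(biased_model, items))   # 0.0
```

A single-pass evaluation would credit `biased_model` whenever the correct answer happened to sit in position 0. Requiring agreement across all rotations removes that chance, at the cost of N model calls per N-option question.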


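1️⃣ One-Sentence Summary

This paper introduces IF-Bench, the first benchmark for evaluating how well multimodal large language models understand infrared images, and proposes a general, training-free method that significantly improves model performance by translating infrared images into semantically aligned RGB images.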
