QualiRAG: Retrieval-Augmented Generation for Visual Quality Understanding
1️⃣ One-Sentence Summary
This paper proposes QualiRAG, a training-free framework that dynamically generates and retrieves four complementary types of auxiliary knowledge, significantly improving large multimodal models' interpretable understanding of image and video quality without any task-specific training.
Visual quality assessment (VQA) is increasingly shifting from scalar score prediction toward interpretable quality understanding, a paradigm that demands fine-grained spatiotemporal perception and auxiliary contextual information. Current approaches rely on supervised fine-tuning or reinforcement learning on curated instruction datasets, which involve labor-intensive annotation and are prone to dataset-specific biases. To address these challenges, we propose QualiRAG, a training-free Retrieval-Augmented Generation (RAG) framework that systematically leverages the latent perceptual knowledge of large multimodal models (LMMs) for visual quality perception. Unlike conventional RAG, which retrieves from static corpora, QualiRAG dynamically generates auxiliary knowledge by decomposing questions into structured requests and constructing four complementary knowledge sources: visual metadata, subject localization, global quality summaries, and local quality descriptions, followed by relevance-aware retrieval for evidence-grounded reasoning. Extensive experiments show that QualiRAG achieves substantial improvements over open-source general-purpose LMMs and VQA-finetuned LMMs on visual quality understanding tasks, and delivers competitive performance on visual quality comparison tasks, demonstrating robust quality assessment capabilities without any task-specific training. The code will be publicly available at this https URL.
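To make the described pipeline concrete, below is a minimal Python sketch of the flow the abstract outlines: question decomposition, dynamic generation of the four knowledge sources, relevance-aware retrieval, and evidence-grounded answering. The `lmm.generate` interface, the prompts, and the `score` relevance function are hypothetical placeholders for illustration only; the paper's actual prompts, models, and retrieval scoring may differ.

```python
# Hypothetical sketch of a QualiRAG-style pipeline; interfaces are assumed,
# not taken from the paper's released code.
from dataclasses import dataclass


@dataclass
class KnowledgeItem:
    source: str   # one of the four knowledge types
    content: str  # dynamically generated auxiliary text


def decompose_question(lmm, image, question: str) -> list[str]:
    """Break a quality question into structured information requests."""
    reply = lmm.generate(image, f"Decompose into sub-requests, one per line: {question}")
    return [line for line in reply.splitlines() if line.strip()]


def build_knowledge(lmm, image) -> list[KnowledgeItem]:
    """Generate the four complementary knowledge sources on the fly."""
    prompts = {
        "visual_metadata": "Describe resolution, sharpness, noise, and compression cues.",
        "subject_localization": "Locate the main subjects and their image regions.",
        "global_quality_summary": "Summarize the overall perceptual quality.",
        "local_quality_description": "Describe quality issues in local regions.",
    }
    return [KnowledgeItem(name, lmm.generate(image, p)) for name, p in prompts.items()]


def retrieve(requests: list[str], knowledge: list[KnowledgeItem],
             score, top_k: int = 3) -> list[KnowledgeItem]:
    """Relevance-aware retrieval: keep the items most relevant to any request."""
    ranked = sorted(knowledge,
                    key=lambda k: max(score(r, k.content) for r in requests),
                    reverse=True)
    return ranked[:top_k]


def answer(lmm, image, question: str, evidence: list[KnowledgeItem]) -> str:
    """Evidence-grounded reasoning over the retrieved auxiliary knowledge."""
    context = "\n".join(f"[{e.source}] {e.content}" for e in evidence)
    return lmm.generate(image, f"Evidence:\n{context}\n\nQuestion: {question}")
```

Because every stage is a plain LMM call over generated text, the sketch stays training-free: no component is fine-tuned, and the retrieval step only ranks the model's own outputs.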
Source: arXiv:2601.18195