R4-CGQA: Retrieval-based Vision Language Models for Computer Graphics Image Quality Assessment
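1️⃣ One-sentence summary
This study builds a dataset of computer graphics (CG) images paired with detailed quality descriptions and designs a two-stream, retrieval-augmented framework that substantially improves the accuracy and explainability of existing vision language models when assessing fine-grained CG image quality.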
Immersive computer graphics (CG) rendering has become ubiquitous in modern daily life. However, comprehensively evaluating CG quality remains challenging for two reasons: first, existing CG datasets lack systematic descriptions of rendering quality; and second, existing CG quality assessment methods cannot provide reasonable text-based explanations. To address these issues, we first identify six key perceptual dimensions of CG quality from the user's perspective and construct a dataset of 3,500 CG images with corresponding quality descriptions. Each description covers the CG style, content, and perceived quality along the selected dimensions. Using a subset of the dataset, we further build several question-answering benchmarks from these descriptions to evaluate the responses of existing Vision Language Models (VLMs). We find that current VLMs are not sufficiently accurate in judging fine-grained CG quality, but that descriptions of visually similar images can significantly improve a VLM's understanding of a given CG image. Motivated by this observation, we adopt retrieval-augmented generation and propose a two-stream retrieval framework that effectively enhances the CG quality assessment capabilities of VLMs. Experiments on several representative VLMs demonstrate that our method substantially improves their performance on CG quality assessment.
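The core observation above (descriptions of visually similar images help a VLM judge a query image) can be illustrated with a minimal retrieval-augmented prompt builder. This is a sketch under stated assumptions, not the paper's method: it assumes images have already been encoded into embedding vectors by some unspecified image encoder, uses plain cosine similarity for nearest-neighbour search, and simply prepends the retrieved quality descriptions to the question as in-context text. It is a generic single-stream retrieval step; the abstract does not specify what the two streams of the authors' framework are. The names `CGEntry`, `retrieve_top_k`, and `build_prompt`, as well as the toy embeddings and descriptions, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CGEntry:
    """Hypothetical database record: an image embedding plus its quality description."""
    image_id: str
    embedding: list[float]
    description: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: list[float], database: list[CGEntry], k: int = 3) -> list[CGEntry]:
    """Return the k entries whose embeddings are most cosine-similar to the query."""
    ranked = sorted(database, key=lambda e: cosine(query, e.embedding), reverse=True)
    return ranked[:k]


def build_prompt(question: str, neighbours: list[CGEntry]) -> str:
    """Prepend retrieved descriptions as reference context for the VLM (assumed prompt format)."""
    lines = ["Descriptions of visually similar CG images:"]
    for i, entry in enumerate(neighbours, start=1):
        lines.append(f"[{i}] {entry.description}")
    lines.append(f"Question about the query image: {question}")
    return "\n".join(lines)


# Toy database with made-up 3-dimensional embeddings (illustrative only).
db = [
    CGEntry("a", [1.0, 0.0, 0.0], "Stylized game scene; sharp textures, mild aliasing on edges."),
    CGEntry("b", [0.0, 1.0, 0.0], "Photorealistic interior; noisy shadows from a low sample count."),
    CGEntry("c", [0.9, 0.1, 0.0], "Cartoon character; clean shading, slight texture blur."),
]

neighbours = retrieve_top_k([1.0, 0.0, 0.0], db, k=2)
prompt = build_prompt("Is there visible aliasing?", neighbours)
print(prompt)
```

The prompt string would then be passed, together with the query image, to whichever VLM is being evaluated. Exhaustive cosine ranking keeps the sketch self-contained; a dataset of a few thousand images could equally be served by an approximate nearest-neighbour index.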
From arXiv: 2603.10578