Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision-Language Models
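1️⃣ One-sentence summary
This paper proposes a new visual question answering framework for hotel and facility images. It defines "informativeness" to measure how much an image-question pair helps a user's decision, builds a dedicated evaluation dataset, and finds that current state-of-the-art vision-language models need domain-specific finetuning before they can effectively use key visual information to support decisions.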
Recent advances in Vision-Language Models (VLMs) have demonstrated impressive multimodal understanding in general domains. However, their applicability to decision-oriented domains such as hospitality remains largely unexplored. In this work, we investigate how well VLMs can perform visual question answering (VQA) about hotel and facility images that are central to consumer decision-making. While many existing VQA benchmarks focus on factual correctness, they rarely capture what information users actually find useful. To address this, we first introduce Informativeness as a formal framework to quantify how much hospitality-relevant information an image-question pair provides. Guided by this framework, we construct a new hospitality-specific VQA dataset that covers various facility types, where questions are specifically designed to reflect key user information needs. Using this benchmark, we conduct experiments with several state-of-the-art VLMs, revealing that VLMs are not intrinsically decision-aware: key visual signals remain underutilized, and reliable informativeness reasoning emerges only after modest domain-specific finetuning.
Source: arXiv: 2603.07868