
arXiv submission date: 2026-04-06
📄 Abstract - Less Detail, Better Answers: Degradation-Driven Prompting for VQA

Recent advancements in Vision-Language Models (VLMs) have significantly pushed the boundaries of Visual Question Answering (VQA). However, high-resolution details can sometimes become noise that leads to hallucinations or reasoning errors. In this paper, we propose Degradation-Driven Prompting (DDP), a novel framework that improves VQA performance by strategically reducing image fidelity to force models to focus on essential structural information. We evaluate DDP across two distinct tasks. Physical attributes targets images prone to human misjudgment, where DDP employs a combination of 80p downsampling, structural visual aids (white background masks and orthometric lines), and In-Context Learning (ICL) to calibrate the model's focus. Perceptual phenomena addresses various machine-susceptible visual anomalies and illusions, including Visual Anomaly (VA), Color (CI), Motion (MI), Gestalt (GI), Geometric (GSI), and Visual Illusions (VI). For this task, DDP integrates a task-classification stage with specialized tools such as blur masks and contrast enhancement alongside downsampling. Our experimental results demonstrate that less is more: by intentionally degrading visual inputs and providing targeted structural prompts, DDP enables VLMs to bypass distracting textures and achieve superior reasoning accuracy on challenging visual benchmarks.
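The abstract names two of DDP's degradation tools concretely: downsampling (to 80p) and contrast enhancement. The paper's actual implementation is not given here, so the sketch below is only an illustrative assumption of what such a preprocessing step could look like; the helper names (`downsample`, `enhance_contrast`) and the block-average/contrast-stretch choices are hypothetical, not taken from the paper.

```python
import numpy as np

def downsample(img: np.ndarray, target_h: int = 80) -> np.ndarray:
    """Block-average downsampling to roughly target_h rows.

    A stand-in for the paper's 80p resize; assumes an HxWxC float array.
    """
    h, w = img.shape[:2]
    f = max(1, h // target_h)  # integer shrink factor
    # crop so both dimensions divide evenly, then average each f x f block
    img = img[: (h // f) * f, : (w // f) * f]
    return img.reshape(h // f, f, w // f, f, -1).mean(axis=(1, 3))

def enhance_contrast(img: np.ndarray, factor: float = 1.5) -> np.ndarray:
    """Simple contrast stretch: scale pixel deviations from the global mean."""
    mean = img.mean()
    return np.clip(mean + factor * (img - mean), 0.0, 255.0)

# Demo: degrade a synthetic 480x640 RGB image to ~80 rows, then boost contrast.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(480, 640, 3)).astype(float)
degraded = enhance_contrast(downsample(img))
print(degraded.shape)  # (80, 106, 3)
```

In the DDP framing, a degraded image like `degraded` would then be passed to the VLM together with structural prompts (e.g. the white background masks or orthometric lines mentioned above).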

Top-level tags: multi-modal model evaluation natural language processing
Detailed tags: vision-language models visual question answering prompt engineering image degradation hallucination reduction

Less Detail, Better Answers: Degradation-Driven Prompting for VQA


1️⃣ One-sentence summary

This paper proposes a new method called "Degradation-Driven Prompting," which strategically lowers the clarity of input images and adds structural cues, helping vision-language models ignore distracting details and focus on core structural information, thereby producing more accurate answers on challenging visual question answering tasks.

Source: arXiv 2604.04838