Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment
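1️⃣ One-sentence summary
This paper builds a large, domain-specific pavement dataset and trains a model called PaveGPT on it, turning a general-purpose vision-language model into a specialized pavement assessment assistant that understands engineering terminology, performs structured reasoning, and produces outputs conforming to industry standards, so that a single conversational tool can replace multiple traditional specialized systems.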
General-purpose vision-language models demonstrate strong performance in everyday domains but struggle with specialized technical fields requiring precise terminology, structured reasoning, and adherence to engineering standards. This work addresses whether domain-specific instruction tuning can enable comprehensive pavement condition assessment through vision-language models. PaveInstruct, a dataset containing 278,889 image-instruction-response pairs spanning 32 task types, was created by unifying annotations from nine heterogeneous pavement datasets. PaveGPT, a pavement foundation model trained on this dataset, was evaluated against state-of-the-art vision-language models across perception, understanding, and reasoning tasks. Instruction tuning transformed model capabilities, achieving improvements exceeding 20% in spatial grounding, reasoning, and generation tasks while producing ASTM D6433-compliant outputs. These results enable transportation agencies to deploy unified conversational assessment tools that replace multiple specialized systems, simplifying workflows and reducing technical expertise requirements. The approach establishes a pathway for developing instruction-driven AI systems across infrastructure domains including bridge inspection, railway maintenance, and building condition assessment.
Source: arXiv: 2604.08212