Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment

Abstract

General-purpose vision-language models demonstrate strong performance in everyday domains but struggle with specialized technical fields requiring precise terminology, structured reasoning, and adherence to engineering standards. This work addresses whether domain-specific instruction tuning can enable comprehensive pavement condition assessment through vision-language models. PaveInstruct, a dataset containing 278,889 image-instruction-response pairs spanning 32 task types, was created by unifying annotations from nine heterogeneous pavement datasets. PaveGPT, a pavement foundation model trained on this dataset, was evaluated against state-of-the-art vision-language models across perception, understanding, and reasoning tasks. Instruction tuning transformed model capabilities, achieving improvements exceeding 20% in spatial grounding, reasoning, and generation tasks while producing ASTM D6433-compliant outputs. These results enable transportation agencies to deploy unified conversational assessment tools that replace multiple specialized systems, simplifying workflows and reducing technical expertise requirements. The approach establishes a pathway for developing instruction-driven AI systems across infrastructure domains including bridge inspection, railway maintenance, and building condition assessment.

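One-sentence summary

This paper builds a large specialized pavement dataset and trains a model called PaveGPT, turning a general-purpose vision-language model into a specialized pavement assessment assistant that understands engineering terminology, performs structured reasoning, and produces output that conforms to industry standards, so that a single conversational tool replaces multiple traditional specialized systems.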
