菜单

关于 🐙 GitHub
arXiv 提交日期: 2026-04-09
📄 Abstract - Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment

General-purpose vision-language models demonstrate strong performance in everyday domains but struggle with specialized technical fields requiring precise terminology, structured reasoning, and adherence to engineering standards. This work addresses whether domain-specific instruction tuning can enable comprehensive pavement condition assessment through vision-language models. PaveInstruct, a dataset containing 278,889 image-instruction-response pairs spanning 32 task types, was created by unifying annotations from nine heterogeneous pavement datasets. PaveGPT, a pavement foundation model trained on this dataset, was evaluated against state-of-the-art vision-language models across perception, understanding, and reasoning tasks. Instruction tuning transformed model capabilities, achieving improvements exceeding 20% in spatial grounding, reasoning, and generation tasks while producing ASTM D6433-compliant outputs. These results enable transportation agencies to deploy unified conversational assessment tools that replace multiple specialized systems, simplifying workflows and reducing technical expertise requirements. The approach establishes a pathway for developing instruction-driven AI systems across infrastructure domains including bridge inspection, railway maintenance, and building condition assessment.

顶级标签: computer vision multi-modal systems
详细标签: vision-language models infrastructure inspection domain adaptation instruction tuning pavement assessment 或 搜索:

用于全面自动化路面状况评估的视觉-语言基础模型 / Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment


1️⃣ 一句话总结

这篇论文通过构建一个大型专业路面数据集并训练一个名为PaveGPT的模型,成功地将通用视觉语言模型改造为能理解工程术语、进行结构化推理并输出符合行业标准的专业路面评估助手,从而用一个对话式工具替代了多个传统专业系统。

源自 arXiv: 2604.08212