arXiv submission date: 2026-05-26
📄 Abstract - A Hybrid Vision-Language Architecture for Automated Defect Reasoning and Report Generation in Industrial Inspection

Automated industrial inspection requires both precise defect localization and structured maintenance report generation. In current practice these tasks are handled separately, with linguistic interpretation left to human experts. This paper describes a decoupled, edge-deployable pipeline for wind turbine blade inspection built from three components, each handling a distinct sub-task. The Eyes, a YOLO26-x-obb oriented bounding-box detector, localizes defects at dataset-native resolution. The Bridge, a deterministic, parameter-free encoding module, maps each detected bounding box to grid-referenced spatial tokens embedded in a structured prompt. The Brain, a 4-bit quantized Qwen-2.5-1.5B model adapted with Quantized Low-Rank Adaptation (QLoRA) on 947 synthetically generated maintenance reports, generates a structured JSON report from that prompt. Retrieval-Augmented Fine-Tuning (RAFT) further grounds each recommendation in indexed maintenance procedures. Five ablation experiments, scored by BLEU-4, ROUGE-L, Hallucination Rate (HR), and an LLM-as-a-Judge rubric, compare the pipeline against a monolithic vision-language model (VLM) baseline and against partial configurations in which one component is removed. The complete system achieves BLEU-4 of 0.41, HR of 4%, and an Expert Score of 8.6/10, compared with 0.07, 65%, and 3.3/10 for the zero-shot VLM baseline. Given identical detection evidence, the QLoRA-adapted 1.5B model generates higher-quality reports than a 671B-parameter generalist API model, while running at 47 tokens per second on a single T4-class GPU. The results show that a purpose-built decoupled architecture trained on a small domain-specific corpus outperforms a generalist end-to-end model on this structured generation task.
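Of the three components, the Bridge is the only one the abstract describes as fully deterministic, so it is the easiest to illustrate. The sketch below is a minimal Python interpretation, assuming the detector emits each oriented box as centre, width, height and rotation angle in pixels. The 4x4 grid, the `R{row}C{col}` token format, the `OrientedBox` type, the helper names and the prompt wording are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class OrientedBox:
    """Hypothetical detector output: centre (px), size (px), rotation (rad), label, score."""
    cx: float
    cy: float
    w: float
    h: float
    theta: float
    label: str
    score: float


def corners(box: OrientedBox) -> list[tuple[float, float]]:
    """Rotate the four half-extent offsets by theta and translate them to the centre."""
    c, s = math.cos(box.theta), math.sin(box.theta)
    offsets = [(-box.w / 2, -box.h / 2), (box.w / 2, -box.h / 2),
               (box.w / 2, box.h / 2), (-box.w / 2, box.h / 2)]
    return [(box.cx + dx * c - dy * s, box.cy + dx * s + dy * c) for dx, dy in offsets]


def cell(x: float, y: float, img_w: int, img_h: int, rows: int, cols: int) -> tuple[int, int]:
    """Map a pixel to a (row, col) grid cell, clamping points that lie outside the image."""
    col = min(max(int(x / img_w * cols), 0), cols - 1)
    row = min(max(int(y / img_h * rows), 0), rows - 1)
    return row, col


def encode_box(box: OrientedBox, img_w: int, img_h: int, rows: int = 4, cols: int = 4) -> str:
    """Deterministic box -> token string: centre cell plus the cell range the corners cover."""
    r, c = cell(box.cx, box.cy, img_w, img_h, rows, cols)
    cells = [cell(x, y, img_w, img_h, rows, cols) for x, y in corners(box)]
    r0, r1 = min(p[0] for p in cells), max(p[0] for p in cells)
    c0, c1 = min(p[1] for p in cells), max(p[1] for p in cells)
    return (f"<defect type={box.label} conf={box.score:.2f} "
            f"center=R{r}C{c} span=R{r0}C{c0}-R{r1}C{c1}>")


def build_prompt(boxes: list[OrientedBox], img_w: int, img_h: int) -> str:
    """Embed one token line per detection in a (hypothetical) structured prompt."""
    body = "\n".join(encode_box(b, img_w, img_h) for b in boxes)
    return f"Detected defects:\n{body}\nWrite the maintenance report as JSON."
```

For example, under these assumptions a 200x50 px unrotated box labelled `erosion` with score 0.91, centred at (500, 250) in a 1000x1000 image, encodes as `<defect type=erosion conf=0.91 center=R1C2 span=R0C1-R1C2>`. Keeping this step rule-based means the language model only sees discrete, reproducible location tokens, which is consistent with the abstract's description of the Bridge as parameter-free.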

Top-level tags: computer vision, natural language processing, systems
Detailed tags: industrial inspection, defect localization, report generation, vision-language model, edge deployment

A Hybrid Vision-Language Architecture for Automated Defect Reasoning and Report Generation in Industrial Inspection


1️⃣ One-sentence summary

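This paper proposes a decoupled industrial inspection pipeline built from object detection, spatial encoding, and a small language model; applied to structured maintenance report generation for wind turbine blade defects, it reaches roughly 6 times the BLEU-4 score of a zero-shot VLM baseline (0.41 vs. 0.07), cuts the hallucination rate by more than 60 percentage points (65% to 4%), and outperforms a 671B-parameter generalist model while using less than 1% of its parameters.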
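The headline figures in this summary can be checked directly against the abstract. The short Python calculation below uses only the numbers reported there (model sizes are the nominal ones from the model names). It shows that the "about 6x" gain is the BLEU-4 ratio against the zero-shot VLM baseline, that the hallucination rate falls by 61 percentage points (about 94% in relative terms), and that the "under 1%" parameter share is measured against the separate 671B-parameter API model, not the VLM baseline.

```python
# Numbers copied from the abstract; nothing here is new data.
bleu_full, bleu_vlm = 0.41, 0.07           # BLEU-4: full pipeline vs zero-shot VLM
hr_full, hr_vlm = 4.0, 65.0                # hallucination rate in percent
params_small, params_large = 1.5e9, 671e9  # nominal sizes: 1.5B model vs 671B API model

bleu_ratio = bleu_full / bleu_vlm                # how many times higher BLEU-4 is
hr_drop_pp = hr_vlm - hr_full                    # absolute drop, percentage points
hr_drop_rel = 100 * hr_drop_pp / hr_vlm          # relative drop, percent
param_share = 100 * params_small / params_large  # small model size as % of the large one

print(f"BLEU-4 ratio: {bleu_ratio:.2f}x")                             # BLEU-4 ratio: 5.86x
print(f"HR drop: {hr_drop_pp:.0f} pp ({hr_drop_rel:.1f}% relative)")  # HR drop: 61 pp (93.8% relative)
print(f"Parameter share: {param_share:.2f}%")                         # Parameter share: 0.22%
```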

Source: arXiv 2605.26533