📄 Abstract - Structured Extraction from Business Process Diagrams Using Vision-Language Models

Business Process Model and Notation (BPMN) is a widely adopted standard for representing complex business workflows. While BPMN diagrams are often exchanged as visual images, existing methods primarily rely on XML representations for computational analysis. In this work, we present a pipeline that leverages Vision-Language Models (VLMs) to extract structured JSON representations of BPMN diagrams directly from images, without requiring source model files or textual annotations. We also incorporate optical character recognition (OCR) for textual enrichment and evaluate the generated element lists against ground-truth data derived from the source XML files. Our approach enables robust component extraction in scenarios where the original source files are unavailable. We benchmark multiple VLMs and observe performance improvements in several models when OCR is used for text enrichment. In addition, we conduct extensive statistical analyses of OCR-based enrichment methods and prompt ablation studies, providing a clearer understanding of their impact on model performance.
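The evaluation step described in the abstract (comparing a generated element list against ground truth derived from the source XML) can be illustrated with a minimal sketch. The abstract does not specify the JSON schema, the set of scored element types, or the metrics, so everything below is an assumption for illustration: the prediction format (`{"elements": [{"type": ..., "name": ...}]}`), the `ELEMENT_TYPES` subset, the case-insensitive exact match on names, and the precision/recall/F1 scoring are all hypothetical and may differ from what the authors actually use. Only the BPMN 2.0 model namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard BPMN 2.0 model namespace.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"

# Hypothetical subset of element types to score (an assumption, not the paper's list).
ELEMENT_TYPES = {
    "startEvent", "endEvent", "task", "userTask", "serviceTask",
    "exclusiveGateway", "parallelGateway",
}


def normalize(name):
    """Lowercase and trim a label so small formatting differences do not count as errors."""
    return (name or "").strip().lower()


def ground_truth_elements(bpmn_xml: str) -> Counter:
    """Collect (type, name) pairs from a BPMN XML string as a multiset."""
    root = ET.fromstring(bpmn_xml)
    prefix = "{" + BPMN_NS + "}"
    elements = Counter()
    for node in root.iter():
        if not node.tag.startswith(prefix):
            continue
        local_name = node.tag[len(prefix):]
        if local_name in ELEMENT_TYPES:
            elements[(local_name, normalize(node.get("name")))] += 1
    return elements


def predicted_elements(prediction: dict) -> Counter:
    """Collect (type, name) pairs from a model's JSON output (assumed schema)."""
    return Counter(
        (item["type"], normalize(item.get("name")))
        for item in prediction.get("elements", [])
    )


def score(pred: Counter, truth: Counter) -> dict:
    """Compute multiset precision, recall, and F1 over exact (type, name) matches."""
    true_positives = sum((pred & truth).values())
    n_pred = sum(pred.values())
    n_truth = sum(truth.values())
    precision = true_positives / n_pred if n_pred else 0.0
    recall = true_positives / n_truth if n_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy ground truth: three elements in one process.
SAMPLE_XML = """
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="p1">
    <startEvent id="s1" name="Start"/>
    <task id="t1" name="Review order"/>
    <endEvent id="e1" name="End"/>
  </process>
</definitions>
"""

# Toy model output: two correct elements, one spurious task, one missed end event.
SAMPLE_PREDICTION = {
    "elements": [
        {"type": "startEvent", "name": "Start"},
        {"type": "task", "name": "review order "},
        {"type": "task", "name": "Ship"},
    ]
}

if __name__ == "__main__":
    result = score(
        predicted_elements(SAMPLE_PREDICTION),
        ground_truth_elements(SAMPLE_XML),
    )
    print(result)
```

Using a multiset (`Counter`) rather than a plain set means duplicate elements, such as two unnamed gateways of the same type, are counted individually. A real evaluation would likely need fuzzier label matching than exact string equality, particularly when labels come from OCR.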

Top-level tags: computer vision, natural language processing, multi-modal
Detailed tags: vision-language models, document understanding, optical character recognition, business process modeling, structured data extraction

Structured Extraction from Business Process Diagrams Using Vision-Language Models


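1️⃣ One-sentence summary

This paper proposes a new method that uses vision-language models to automatically extract structured information directly from images of business process diagrams, accurately identifying the elements and text in a diagram even when the original source files are unavailable.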

