(How) Do Large Language Models Understand High-Level Message Sequence Charts?
1️⃣ One-sentence summary
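This paper tests how well three mainstream large language models understand the semantics of High-Level Message Sequence Charts (a visual model used in software architecture design). The models handle the basic concepts reasonably well (about 88% accuracy) but perform poorly on complex semantic reasoning tasks such as abstraction and composition (about 36% accuracy). Overall accuracy is only about 52%, which indicates that current LLMs' understanding of this kind of formal specification remains very limited.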
Large Language Models (LLMs) are being employed widely to automate tasks across the software development life-cycle. It is, however, unclear whether these tasks are performed consistently with respect to the semantics of the artefacts being handled. This question is particularly under-researched concerning architectural design specification. In this paper, we address this question for High-Level Message Sequence Charts (HMSCs). These are visual models with a rigorous formal semantics that have been used for various purposes, including as a foundation for Sequence Diagrams in the Unified Modelling Language (UML). We examine whether LLMs "understand" the semantics of HMSCs by examining three LLMs (Gemini-3, GPT-5.4, and Qwen-3.6) on how they perform 129 semantic tasks ranging from querying basic semantic constructs in HMSCs (i.e., events and their ordering) to semantic-preserving abstractions and compositions, and calculating the set of traces and trace-equivalent labelled transition systems. The results show that LLMs only have a modest understanding of the formal semantics of HMSCs (ca. 52% overall accuracy), with great variability across different semantic concepts: while LLMs seem to understand the basic semantic concepts of MSCs (ca. 88% accuracy), they struggle with semantic reasoning in tasks involving abstraction and composition (ca. 36% accuracy) and traces and LTSs (ca. 42% accuracy). In particular, all three LLMs struggle with the notions of co-region and explicit causal dependencies and never employed them in semantic-preserving transformations.
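To make "events and their ordering" and "the set of traces" concrete, here is a minimal sketch in Python. It is not taken from the paper. It assumes the usual partial-order reading of a basic MSC without co-regions: events on one instance are ordered top to bottom, every send precedes its matching receive, and a trace is any linearization of that order. The function name `traces`, the event labels, and the two-message example are all hypothetical.

```python
from itertools import permutations


def traces(instances, messages):
    """Enumerate the traces of a basic MSC by brute force.

    instances: dict mapping an instance name to its events, top to bottom.
    messages:  list of (send_event, receive_event) pairs.
    Assumption: no co-regions, so each instance line is totally ordered.
    """
    events = [e for evs in instances.values() for e in evs]

    # Precedence pairs: consecutive events on each instance line,
    # plus send-before-receive for every message.
    before = set()
    for evs in instances.values():
        before.update(zip(evs, evs[1:]))
    before.update(messages)

    result = []
    for perm in permutations(events):
        pos = {e: i for i, e in enumerate(perm)}
        # Keep the permutation only if it respects every precedence pair.
        if all(pos[a] < pos[b] for a, b in before):
            result.append(perm)
    return result


# Hypothetical example: instance A sends m1 and then m2 to instance B.
example_instances = {"A": ["A!m1", "A!m2"], "B": ["B?m1", "B?m2"]}
example_messages = [("A!m1", "B?m1"), ("A!m2", "B?m2")]
example_traces = traces(example_instances, example_messages)

for t in example_traces:
    print(" . ".join(t))
```

Under these assumptions the example has exactly two traces: both sends before both receives, or send and receive of m1 followed by send and receive of m2. Brute-force enumeration is only practical for very small charts, and the sketch does not cover co-regions, explicit causal dependencies, or the composition of charts in an HMSC.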
Source: arXiv: 2605.13773