arXiv submission date: 2026-07-02
📄 Abstract - Rethinking Complexity Metrics for LLM-Integrated Applications: Beyond Source Code

LLM-integrated applications blend natural language prompts with program code, and much of their runtime behavior originates in the prompt layer rather than in the code itself. Existing complexity metrics, however, operate solely at the code level and therefore overlook this behavioral logic entirely. We present HECATE, the first tool designed to assess complexity in both the prompt and code layers of such applications. Central to HECATE is Prompt-as-Specification, a Hoare-logic-inspired formalism that interprets every prompt as a specification of intended behavior. Grounded in 25 complexity dimensions identified across published taxonomies, the tool generates 52 candidate metrics. We assess each metric against 118 components collected from 18 open-source repositories, relying on maintenance activity derived from version history as an empirical proxy for complexity, and discard any metric that loses significance once code size is accounted for. Only ten metrics withstand this test. Seven belong to our newly introduced set; rather than measuring sheer volume, each tallies structurally distinct elements, such as LLM call sites, memory attributes, and prompt templates, an attribute we call structural breadth. Of the three surviving conventional metrics, RFC exhibits a similar breadth-oriented character, while Halstead N and V survive only as a residual effect of size; our top-performing metrics exceed all three. Crucially, the prompt-layer metrics retain significance even when the strongest code-level metric is added as a covariate, establishing prompt complexity as a dimension in its own right. A final validation on 20 components spanning six held-out repositories shows that the two best-performing metrics continue to predict maintenance effort, supporting their generalizability beyond the training set.

Top-level tags: llm systems model evaluation
Detailed tags: complexity metrics prompt engineering llm-integrated applications maintenance effort hoare logic

Rethinking Complexity Metrics for LLM-Integrated Applications: Beyond Source Code


1️⃣ One-sentence summary

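This paper presents HECATE, a new tool that measures the complexity of LLM-integrated applications across both the code and the natural language prompts; empirical screening leaves ten effective metrics, seven of them from the newly introduced set, showing that prompt complexity, beyond code, is an important factor in an application's maintenance effort.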

Source: arXiv: 2607.01903