arXiv submission date: 2026-04-21
📄 Abstract - Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

Executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. In current practice, however, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write prompts for each step, and repeatedly revise the logic as requirements evolve, making development costly, time-consuming, and error-prone. To study whether large language models can automate this multi-round interaction process, we introduce Chat2Workflow, a benchmark for generating executable visual workflows directly from natural language, and propose a robust agentic framework to mitigate recurrent execution errors. Chat2Workflow is built from a large collection of real-world business workflows, with each instance designed so that the generated workflow can be transformed and directly deployed to practical workflow platforms such as Dify and Coze. Experimental results show that while state-of-the-art language models can often capture high-level intent, they struggle to generate correct, stable, and executable workflows, especially under complex or changing requirements. Although our agentic framework yields resolve-rate gains of up to 5.34%, the remaining real-world gap positions Chat2Workflow as a foundation for advancing industrial-grade automation. Code is available at this https URL.

Top-level tags: llm agents benchmark
Detailed tags: visual workflows execution natural language agentic framework industrial automation

Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language


1️⃣ One-sentence summary

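This paper introduces Chat2Workflow, a benchmark for evaluating whether large language models can automatically generate directly deployable visual workflows from natural-language descriptions, and designs an agentic framework to reduce common errors; experiments show that although current models can grasp high-level intent, generating stable, executable workflows remains a major challenge.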

Source: arXiv: 2604.19667