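AutoRPA: Efficient GUI Automation through LLM-Driven Code Synthesis from Interactions
1️⃣ One-sentence summary
This paper proposes AutoRPA, a framework that converts the flexible decision-making of large language models into reusable, efficient automation scripts. The scripts run repetitive tasks as quickly as traditional RPA while being learned automatically from interactions, which significantly reduces computational overhead and manual effort.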
Large Language Model (LLM) based agents have demonstrated proficiency in multi-step interactions with graphical user interfaces (GUIs). While most research focuses on improving single-task performance, practical scenarios often involve repetitive GUI tasks, for which invoking LLM reasoning at every step (the ReAct paradigm) is inefficient. Traditional Robotic Process Automation (RPA), which predates LLMs, offers runtime efficiency but demands significant manual effort to develop and maintain. To bridge this gap, we propose AutoRPA, a framework that automatically distills the decision logic of ReAct-style agents into robust RPA functions. AutoRPA introduces two core innovations: (1) a translator-builder pipeline, in which a translator agent converts hard-coded ReAct actions into soft-coded procedures and a builder agent synthesizes robust RPA functions via retrieval-augmented generation over multiple trajectories; and (2) a hybrid repair strategy during code verification, which combines RPA execution with a ReAct-based fallback for iterative refinement. Experiments across multiple GUI environments demonstrate that RPA functions generated by AutoRPA successfully solve similar tasks while reducing token usage by 82% to 96%, significantly improving runtime efficiency and reusability.
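The idea of running a synthesized RPA function first and falling back to ReAct-style reasoning only when it fails can be illustrated with a minimal sketch. This is not the paper's implementation: `HybridRunner`, `fill_search_box`, `stub_react_agent`, the dictionary-based GUI state, and the token cost of 1500 are all hypothetical names and values chosen for illustration, and the "agent" is a stub that makes no LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: GUI state is modeled as a plain dict; a real system would wrap a GUI driver.
Env = Dict[str, str]


@dataclass
class HybridRunner:
    """Try a cheap RPA function first; fall back to a ReAct-style agent on failure."""
    rpa_function: Callable[[Env, str], bool]   # deterministic script, no LLM calls
    react_fallback: Callable[[Env, str], int]  # returns the tokens it consumed
    tokens_used: int = 0
    fallback_log: List[str] = field(default_factory=list)

    def run(self, env: Env, task_arg: str) -> bool:
        try:
            ok = self.rpa_function(env, task_arg)
            reason = "rpa returned failure"
        except KeyError as exc:  # the script referenced an element that is missing
            ok = False
            reason = f"missing element {exc}"
        if ok:
            return True
        # Record why the script failed; such logs could later drive a repair step.
        self.fallback_log.append(f"{task_arg}: {reason}")
        self.tokens_used += self.react_fallback(env, task_arg)
        return env.get("status") == "done"


def fill_search_box(env: Env, query: str) -> bool:
    """Soft-coded step: the query is a parameter rather than a hard-coded literal."""
    env[env["search_box_id"]] = query  # raises KeyError if the element is absent
    env["status"] = "done"
    return True


def stub_react_agent(env: Env, query: str) -> int:
    """Stand-in for an LLM-driven agent; the token cost is a made-up constant."""
    env["fallback_input"] = query
    env["status"] = "done"
    return 1500


runner = HybridRunner(fill_search_box, stub_react_agent)
env_ok: Env = {"search_box_id": "q"}
env_changed: Env = {}  # simulated layout change: the search box is gone
results = [runner.run(env_ok, "laptops"), runner.run(env_changed, "phones")]
```

In this sketch the first task completes with no token cost, while the second triggers the fallback and records a log entry. That mirrors, in a very simplified form, why reusing scripts on repetitive tasks can cut token usage.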
Source: arXiv: 2605.21082