arXiv submission date: 2026-05-27
📄 Abstract - GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection

Despite the rapid progress of multimodal large language models in building Graphical User Interface (GUI) agents, their real-world task completion is fundamentally bottlenecked by a lack of world knowledge about GUI operations. Existing solutions typically rely on expensive multi-agent scaffolding or on conventional post-training paradigms such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). However, post-training only lets agents absorb world knowledge implicitly, through action annotations or reward signals, which leads to inefficient trajectory memorization rather than genuine comprehension. An approach that enables explicit learning of this knowledge is therefore needed. To this end, we propose GUI-CIDER, a mid-training method that explicitly internalizes GUI world knowledge through Causal Internalization and Density-aware Exemplar Reselection. GUI-CIDER operates in three stages: (1) data synthesis, which distills static planning knowledge and dynamic causal knowledge from GUI trajectories into text; (2) exemplar reselection, which filters the corpus by rewarding causal structure and penalizing semantic redundancy; and (3) mid-training, in which the refined data is used to embed the acquired knowledge into the model. Extensive experiments on two GUI knowledge benchmarks and three task completion benchmarks show that GUI-CIDER consistently improves both the agent's understanding of GUI operations and its task success. The code is available at this https URL.
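The abstract does not say how causal structure is scored or how redundancy is measured in stage (2). As a rough illustration only, here is a minimal Python sketch under loudly stated assumptions: every candidate text already carries an embedding and a scalar `causal_score` (both hypothetical inputs), and redundancy is penalized with a greedy, maximal-marginal-relevance (MMR) style rule. MMR is a swapped-in technique, not GUI-CIDER's actual density-aware criterion, and the names `Exemplar`, `cosine`, and `reselect` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Exemplar:
    text: str
    embedding: list[float]  # hypothetical: precomputed semantic embedding of the text
    causal_score: float     # hypothetical: how much action -> effect structure the text carries


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def reselect(pool: list[Exemplar], budget: int, lam: float = 1.0) -> list[Exemplar]:
    """Greedily pick up to `budget` exemplars (MMR-style stand-in, not the paper's rule).

    Each step takes the remaining candidate that maximizes
        causal_score - lam * (max cosine similarity to any already-selected exemplar),
    so causal content is rewarded and near-duplicates of selected items are penalized.
    """
    selected: list[Exemplar] = []
    remaining = list(pool)
    while remaining and len(selected) < budget:
        def gain(e: Exemplar) -> float:
            redundancy = max((cosine(e.embedding, s.embedding) for s in selected), default=0.0)
            return e.causal_score - lam * redundancy

        best_idx = max(range(len(remaining)), key=lambda i: gain(remaining[i]))
        selected.append(remaining.pop(best_idx))
    return selected


# Toy usage: the second item nearly duplicates the first, so it loses to the third.
pool = [
    Exemplar("Tapping 'Save' writes the draft and shows a toast.", [1.0, 0.0], 0.9),
    Exemplar("Tapping 'Save' stores the draft and shows a toast.", [0.99, 0.05], 0.85),
    Exemplar("Swiping left on an email reveals the Archive button.", [0.0, 1.0], 0.7),
]
picked = reselect(pool, budget=2, lam=1.0)
```

With `lam=1.0`, the near-duplicate "stores the draft" sentence has a similarity of about 0.999 to the first pick, so its gain drops below that of the unrelated swipe example. Raising `lam` trades causal score for diversity more aggressively; a true density-aware variant might instead penalize each candidate by its similarity to its nearest neighbors in the whole corpus.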

Top-level tags: agents, multi-modal
Detailed tags: gui agents, mid-training, causal internalization, exemplar reselection, knowledge distillation



1️⃣ One-Sentence Summary

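This paper proposes GUI-CIDER, a mid-training method that explicitly distills GUI operation knowledge into text and refines the training data by rewarding causal structure and filtering out redundancy, so that GUI agents genuinely understand the logic of operations rather than merely memorizing them, substantially improving task completion.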

Source: arXiv: 2605.28534