arXiv submission date: 2026-02-25
📄 Abstract - ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices

Multimodal large language models (MLLMs) have made significant progress in mobile agent development, yet their capabilities are predominantly confined to a reactive paradigm, where they merely execute explicit user commands. The emerging paradigm of proactive intelligence, where agents autonomously anticipate needs and initiate actions, represents the next frontier for mobile agents. However, its development is critically bottlenecked by the lack of benchmarks that can address real-world complexity and enable objective, executable evaluation. To overcome these challenges, we introduce ProactiveMobile, a comprehensive benchmark designed to systematically advance research in this domain. ProactiveMobile formalizes the proactive task as inferring latent user intent across four dimensions of on-device contextual signals and generating an executable function sequence from a comprehensive function pool of 63 APIs. The benchmark features over 3,660 instances of 14 scenarios that embrace real-world complexity through multi-answer annotations. To ensure quality, a team of 30 experts conducts a final audit of the benchmark, verifying factual accuracy, logical consistency, and action feasibility, and correcting any non-compliant entries. Extensive experiments demonstrate that our fine-tuned Qwen2.5-VL-7B-Instruct achieves a success rate of 19.15%, outperforming o1 (15.71%) and GPT-5 (7.39%). This result indicates that proactivity is a critical competency widely lacking in current MLLMs, yet it is learnable, emphasizing the importance of the proposed benchmark for proactivity evaluation.
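The abstract describes the task as producing an executable function sequence drawn from a fixed function pool, scored against multi-answer annotations. The sketch below illustrates one way such scoring could work. It is not the benchmark's actual evaluator: the exact-match criterion, the `Call` type, the helper names, and the three-function pool are all assumptions made for illustration (the real pool contains 63 APIs, which the abstract does not list).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One function call: a name plus its arguments as a sorted tuple of pairs."""
    name: str
    args: tuple = ()


def call(name, **kwargs):
    """Build a Call whose arguments compare equal regardless of keyword order."""
    return Call(name, tuple(sorted(kwargs.items())))


# Hypothetical stand-in for the function pool (the real benchmark has 63 APIs).
FUNCTION_POOL = {"set_alarm", "open_navigation", "send_message"}


def is_executable(sequence, pool=FUNCTION_POOL):
    """A sequence is treated as executable if every call names a pool function."""
    return all(c.name in pool for c in sequence)


def is_success(predicted, references, pool=FUNCTION_POOL):
    """Assumed criterion: executable and identical to at least one reference sequence."""
    return is_executable(predicted, pool) and any(
        list(predicted) == list(ref) for ref in references
    )


def success_rate(predictions, annotations):
    """Fraction of instances whose prediction matches any of its annotated answers."""
    if not predictions:
        return 0.0
    hits = sum(is_success(p, refs) for p, refs in zip(predictions, annotations))
    return hits / len(predictions)


# Usage: one instance with two acceptable answers (multi-answer annotation).
refs = [
    [call("set_alarm", time="07:00")],
    [call("send_message", to="Alex", text="Running late")],
]
pred_ok = [call("set_alarm", time="07:00")]
pred_bad = [call("order_taxi")]  # not in the pool, so not executable

print(success_rate([pred_ok, pred_bad], [refs, refs]))  # 0.5
```

Accepting a match against any reference reflects the multi-answer idea that several different actions can be valid for the same context. A real evaluator might use a looser comparison than exact equality, for example tolerating differences in free-text arguments.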

Top-level tags: agents, benchmark, multi-modal
Detailed tags: mobile agents, proactive intelligence, multimodal llm, evaluation benchmark, function calling

ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices


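1️⃣ One-sentence summary

This paper introduces ProactiveMobile, a comprehensive benchmark intended to move mobile agents from passively executing instructions toward proactively anticipating user needs and acting on them. Its experiments show that current mainstream models generally lack this capability, but that it can be improved through training.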

Source: arXiv: 2602.21858