arXiv submission date: 2025-10-22
📄 Abstract - DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents

Mobile Phone Agents (MPAs) have emerged as a promising research direction due to their broad applicability across diverse scenarios. While Multimodal Large Language Models (MLLMs) serve as the foundation for MPAs, their effectiveness in handling multiple mobile phone tasks simultaneously remains limited. Although multitask supervised fine-tuning (SFT) is widely adopted for multitask learning, existing approaches struggle to determine the optimal training data composition for peak performance. To address this challenge, we propose DaMo (Data Mixture Optimizer) - a novel solution employing a trainable network that predicts optimal data mixtures by forecasting downstream task performance for any given dataset ratio. To support comprehensive evaluation, we introduce PhoneAgentBench, the first specialized benchmark for evaluating MLLMs on multimodal mobile phone tasks, comprising 1235 QA pairs spanning diverse real-world industrial mobile application scenarios. Demonstrating strong predictive capability (R^2=0.81) in small-scale pilot experiments, DaMo efficiently extrapolates optimal data mixing configurations. Our results show DaMo achieves a 3.38% performance improvement on PhoneAgentBench compared to alternative methods. Furthermore, extensive experiments across established benchmarks including BFCL-v3, MME-Reasoning, MME-Perception, and OCRBench reveal DaMo's superior generalization, outperforming other approaches by 2.57% in terms of average score. When used solely for MLLM optimization on the BFCL-v3 task, DaMo improves metrics by 12.47% over other methods. Notably, DaMo maintains robust scalability, preserving its effectiveness when applied to other model architectures. The code and dataset are available at this https URL

Top-level tags: multi-modal, model training, agents
Detailed tags: data mixing, fine-tuning, optimization, mobile phone agents, benchmark, performance prediction

DaMo: Optimizing Data Mixing for Multimodal LLMs via Downstream-Task Performance Prediction / DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents


1️⃣ One-sentence summary

This paper proposes DaMo, a data mixing optimization method that uses a trainable neural network to predict downstream task performance under different data mixture ratios, automatically finding the optimal training data configuration. It also introduces PhoneAgentBench, the first dedicated benchmark for evaluating the multimodal capabilities of mobile phone agents.
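The core idea can be sketched as a two-step loop: fit a surrogate on small-scale pilot runs that maps a data-mixture ratio vector to a predicted benchmark score, then search the mixture simplex for the ratio that maximizes the prediction. The quadratic surrogate form, the coarse grid search, and all function names below are illustrative assumptions for this sketch, not the paper's actual trainable network:

```python
import numpy as np

def features(w):
    # Quadratic surrogate features: intercept, per-task ratios, and
    # pairwise interaction terms (a hypothetical stand-in for DaMo's network).
    w = np.asarray(w, float)
    pairs = [w[i] * w[j] for i in range(len(w)) for j in range(i + 1, len(w))]
    return np.concatenate([[1.0], w, pairs])

def fit_surrogate(mixtures, scores):
    # Least-squares fit of the surrogate on (mixture ratio, pilot score) pairs.
    X = np.stack([features(w) for w in mixtures])
    coef, *_ = np.linalg.lstsq(X, np.asarray(scores, float), rcond=None)
    return coef

def simplex_grid(n_tasks, step=0.1):
    # Enumerate all ratio vectors on a coarse grid that sum to 1.
    ticks = int(round(1 / step))
    def rec(prefix, remaining, depth):
        if depth == n_tasks - 1:
            yield prefix + [remaining / ticks]
            return
        for k in range(remaining + 1):
            yield from rec(prefix + [k / ticks], remaining - k, depth + 1)
    yield from rec([], ticks, 0)

def best_mixture(coef, n_tasks, step=0.1):
    # Extrapolate: score every candidate mixture and return the argmax.
    cands = list(simplex_grid(n_tasks, step))
    preds = [features(w) @ coef for w in cands]
    return cands[int(np.argmax(preds))]
```

In this toy form, a handful of pilot fine-tuning runs supplies the training pairs, and the cheap grid search replaces re-training the full model at every candidate ratio.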


2️⃣ Key contributions

1. DaMo (Data Mixing Optimizer): a trainable network that forecasts downstream task performance for any given dataset ratio, enabling efficient extrapolation of the optimal data mixture from small-scale pilot experiments (R^2=0.81).

2. PhoneAgentBench: the first specialized benchmark for evaluating MLLMs on multimodal mobile phone tasks, comprising 1235 QA pairs spanning diverse real-world industrial mobile application scenarios.

3. Model-agnostic linear mapping extension: DaMo preserves its effectiveness when applied to other model architectures, maintaining robust scalability.
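Under the simplest reading of "linear mapping", transferring the optimizer across architectures could amount to fitting an affine map from pilot-model scores to target-model scores on a few shared calibration runs. The single-feature affine form and all names below are a hypothetical sketch, not the paper's stated formulation:

```python
import numpy as np

def fit_transfer_map(pilot_scores, target_scores):
    # Least-squares fit of target = a * pilot + b (assumed affine form).
    A = np.stack([np.asarray(pilot_scores, float),
                  np.ones(len(pilot_scores))], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(target_scores, float), rcond=None)
    return a, b

def transfer(pilot_score, a, b):
    # Map a pilot-model prediction onto the target model's score scale.
    return a * pilot_score + b
```

If such a map holds even approximately, the surrogate fitted on one (cheap) model can rank data mixtures for another architecture without repeating the pilot sweep.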


3️⃣ Main results and value

Result highlights

- DaMo achieves a 3.38% performance improvement on PhoneAgentBench over alternative methods.
- Across established benchmarks (BFCL-v3, MME-Reasoning, MME-Perception, OCRBench), DaMo outperforms other approaches by 2.57% in average score, showing superior generalization.
- When used solely for MLLM optimization on the BFCL-v3 task, DaMo improves metrics by 12.47% over other methods.
- The predictor shows strong predictive capability (R^2=0.81) in small-scale pilot experiments.

Practical value

- Replaces manual trial-and-error in choosing multitask SFT data compositions with efficient extrapolation from small pilot runs.
- Remains effective when applied to other model architectures, so the optimization effort transfers rather than restarting per model.


4️⃣ Glossary

- MPA (Mobile Phone Agent): an agent that operates mobile phone applications across diverse scenarios.
- MLLM (Multimodal Large Language Model): the foundation model underlying MPAs.
- SFT (Supervised Fine-Tuning): the multitask training paradigm whose data composition DaMo optimizes.
- DaMo (Data Mixing Optimizer): a trainable network that predicts downstream task performance for any given dataset ratio.
- PhoneAgentBench: a benchmark of 1235 QA pairs for evaluating MLLMs on multimodal mobile phone tasks.

Source: arXiv:2510.19336