BAMI: Training-Free Bias Mitigation in GUI Grounding
1️⃣ One-Sentence Summary
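This paper proposes BAMI, a method that requires no additional training. It applies two manipulations, coarse-to-fine focus and candidate selection, to mitigate the grounding biases caused by high-resolution images and complex interface elements. This significantly improves the click and drag accuracy of GUI agents in complex scenarios.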
GUI grounding is a critical capability that lets GUI agents execute tasks such as clicking and dragging. However, in complex scenarios such as the ScreenSpot-Pro benchmark, existing models often perform poorly. Using the proposed Masked Prediction Distribution (MPD) attribution method, we identify two primary sources of error: high image resolution, which leads to precision bias, and intricate interface elements, which lead to ambiguity bias. To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which applies two key manipulations, coarse-to-fine focus and candidate selection, to mitigate these biases. Extensive experiments show that BAMI significantly improves the accuracy of various GUI grounding models in a training-free setting. For instance, applying our method to the TianXi-Action-7B model raises its accuracy on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. Ablation studies further confirm that BAMI remains robust and stable across diverse parameter configurations. Code is available at this https URL.
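The abstract names the two manipulations but does not describe their internals. Below is a minimal, hypothetical Python sketch of one plausible reading, offered only to make the idea concrete. It assumes that coarse-to-fine focus means cropping a window around a coarse prediction on the full screenshot and re-grounding inside that crop, with the fine prediction mapped back to full-screenshot pixels. It also assumes that candidate selection means picking the candidate point that the most other candidates agree with, within a radius. The function names (`crop_box`, `to_global`, `select_candidate`, `bami_sketch`), the crop size, the voting radius, and the stub predictors are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in full-image pixels


def crop_box(center: Point, img_w: int, img_h: int, crop_w: int, crop_h: int) -> Box:
    """Hypothetical focus window: a crop_w x crop_h box centred on `center`,
    shifted (not shrunk) so it stays inside the image."""
    cx, cy = center
    crop_w = min(crop_w, img_w)
    crop_h = min(crop_h, img_h)
    left = min(max(cx - crop_w / 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h / 2, 0), img_h - crop_h)
    return (left, top, left + crop_w, top + crop_h)


def to_global(local: Point, box: Box) -> Point:
    """Map a point predicted in the crop's own pixel frame back to the full screenshot."""
    return (box[0] + local[0], box[1] + local[1])


def select_candidate(cands: List[Point], radius: float) -> Point:
    """Hypothetical selection rule: return the candidate with the most candidates
    (itself included) within `radius`; ties go to the earliest candidate."""
    if not cands:
        raise ValueError("need at least one candidate")

    def support(p: Point) -> int:
        return sum(1 for q in cands
                   if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= radius ** 2)

    return max(cands, key=support)


def bami_sketch(coarse_predict: Callable[[], Point],
                fine_predict: Callable[[Box, int], Point],
                img_w: int, img_h: int, crop_w: int, crop_h: int,
                n_candidates: int, radius: float) -> Point:
    """Coarse pass on the full image, fine passes inside the crop, then selection."""
    coarse = coarse_predict()
    box = crop_box(coarse, img_w, img_h, crop_w, crop_h)
    cands = [to_global(fine_predict(box, i), box) for i in range(n_candidates)]
    return select_candidate(cands, radius)


if __name__ == "__main__":
    # Stub "models": a fixed coarse guess and three fine samples in crop coordinates.
    fine_samples = [(100, 50), (102, 52), (600, 300)]
    result = bami_sketch(lambda: (3000, 1500), lambda b, i: fine_samples[i],
                         3840, 2160, 1280, 720, n_candidates=3, radius=10)
    print(result)  # -> (2460.0, 1190.0)
```

In this toy run, the two nearby fine samples outvote the outlier, and the crop offset (2360, 1140) maps the winning local point back onto the 3840x2160 screenshot. A real system would replace the stubs with calls to a grounding model on the resized crop and would need to rescale coordinates accordingly.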
Source: arXiv:2605.06664