📄
Abstract - How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions
AI coding agents increasingly act directly within software environments, yet existing analyses of their failures rely on benchmark trajectories that miss how developers actually experience misalignment. We present an observational study of 20,574 coding-agent sessions from 1,639 repositories across IDE and CLI workflows. We operationalize misalignment as a breakdown made visible through developer pushback, and annotate each episode along four axes: form, cause, cost, and resolution. We identify seven recurring forms, spanning how agents read projects, interpret developer intent, follow rules, bound their actions, implement and execute code, and report progress. 90.50\% of episodes impose effort and trust costs rather than irreversible system damage, yet 91.49\% of visible resolutions still require explicit user correction. Misalignment patterns also differ across IDE and CLI settings, persist across adjacent sessions, and shift over time: while overall rates decline, constraint violations and inaccurate self-reporting grow in share. Our findings inform the design of training, evaluation, and interfaces for keeping coding agents aligned with real developer workflows.
编程助手为何让用户失望:基于20,574个真实会话的大规模开发者与智能体错位分析 /
How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions
1️⃣ 一句话总结
本文通过对逾两万个真实编程会话的深入分析,揭示了AI编程助手在理解项目、把握意图、遵循规则、行动边界、代码执行和进度汇报中反复出现的七类典型问题,其中九成故障虽不造成系统损坏,但绝大多数仍需用户手动纠正,为改进训练数据和交互设计提供了实证依据。