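CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing
1️⃣ One-sentence summary
This paper introduces the CutVerse benchmark, designed specifically to evaluate how well AI agents can operate professional media post-production software (e.g., video editing, image editing), and shows that existing agents achieve a task success rate of only 36% on complex, long-horizon workflows.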
While GUI agents have made significant progress in web navigation and basic operating system tasks, their capabilities in professional creative workflows remain largely underexplored. To bridge this gap, we introduce CutVerse, a benchmark designed to systematically evaluate autonomous GUI agents in realistic media post-production environments. We curate expert demonstrations across 7 professional applications (e.g., Premiere Pro, Photoshop), covering 186 complex, long-horizon tasks grounded in authentic editing workflows, involving dense multimodal interfaces and tightly coupled interaction sequences. To support scalable evaluation, we develop a lightweight parser that transforms raw screen recordings and low-level interaction logs into structured, compositional GUI action trajectories with precise grounding. Extensive evaluations reveal that existing agents achieve only 36.0% task success on realistic media editing tasks, underscoring the challenges posed by complex, long-horizon media post-production workflows. While current models demonstrate promising spatial grounding, multimodal alignment, and coordinated action execution, they remain limited in long-horizon reliability and domain-specific planning.
Source: arXiv: 2605.19484