EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
1️⃣ One-sentence summary
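This paper introduces EcomBench, a comprehensive evaluation benchmark built from real user demands on global e-commerce platforms, designed to test agents' core capabilities (deep information retrieval, multi-step reasoning, and cross-source knowledge integration) in complex, dynamic, real-world e-commerce environments.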
Foundation agents have rapidly advanced in their ability to reason and interact with real environments, making the evaluation of their core capabilities increasingly important. While many benchmarks have been developed to assess agent performance, most concentrate on academic settings or artificially designed scenarios while overlooking the challenges that arise in real applications. To address this issue, we focus on a highly practical real-world setting, the e-commerce domain, which involves a large volume of diverse user interactions, dynamic market conditions, and tasks directly tied to real decision-making processes. To this end, we introduce EcomBench, a holistic E-commerce Benchmark designed to evaluate agent performance in realistic e-commerce environments. EcomBench is built from genuine user demands embedded in leading global e-commerce ecosystems and is carefully curated and annotated through human experts to ensure clarity, accuracy, and domain relevance. It covers multiple task categories within e-commerce scenarios and defines three difficulty levels that evaluate agents on key capabilities such as deep information retrieval, multi-step reasoning, and cross-source knowledge integration. By grounding evaluation in real e-commerce contexts, EcomBench provides a rigorous and dynamic testbed for measuring the practical capabilities of agents in modern e-commerce.
Source: arXiv: 2512.08868