DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant
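1️⃣ One-Sentence Summary
This paper presents the first Large Language Model (LLM) testing competition, held at ICSE 2026, in which four testing tools were run against an LLM-based car manual question-answering application to evaluate how well they expose defects, such as omitted safety warnings, and how diverse their test cases are.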
This report summarizes the results of the first edition of the Large Language Model (LLM) Testing competition, held as part of the DeepTest workshop at ICSE 2026. Four tools competed in benchmarking an LLM-based car manual information retrieval application, with the objective of identifying user inputs for which the system fails to appropriately mention warnings contained in the manual. The testing solutions were evaluated based on their effectiveness in exposing failures and the diversity of the discovered failure-revealing tests. We report on the experimental methodology, the competitors, and the results.
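To make the two evaluation criteria concrete, the following is a minimal Python sketch of how failure exposure and test diversity could be measured. It is an illustration only: the abstract does not specify the competition's actual oracle or diversity metric, so the keyword-based check, the token-level Jaccard distance, and every function name below are assumptions, not the organizers' method.

```python
from itertools import combinations


def mentions_warning(response: str, warning_keywords: list[str]) -> bool:
    """Hypothetical oracle: True if the response contains any warning keyword."""
    text = response.lower()
    return any(keyword.lower() in text for keyword in warning_keywords)


def failure_rate(responses: list[str], warning_keywords: list[str]) -> float:
    """Fraction of responses that omit every warning keyword (assumed failure notion)."""
    if not responses:
        return 0.0
    failures = sum(not mentions_warning(r, warning_keywords) for r in responses)
    return failures / len(responses)


def jaccard_distance(a: str, b: str) -> float:
    """Token-level Jaccard distance between two test inputs (assumed diversity proxy)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def mean_pairwise_distance(tests: list[str]) -> float:
    """Average Jaccard distance over all unordered pairs of failure-revealing tests."""
    pairs = list(combinations(tests, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up responses, used only to exercise the functions.
    responses = [
        "Warning: engage the parking brake before using the jack.",
        "Place the jack under the car and lift.",
    ]
    print(failure_rate(responses, ["warning"]))
    print(mean_pairwise_distance(["how do I jack the car", "how do I tow a trailer"]))
```

A keyword match is a crude stand-in for judging whether a warning was "appropriately" mentioned. A realistic oracle would likely need semantic comparison against the manual text.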
Source: arXiv: 2604.12615