arXiv submission date: 2026-04-14

📄 Abstract - DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant

This report summarizes the results of the first edition of the Large Language Model (LLM) Testing competition, held as part of the DeepTest workshop at ICSE 2026. Four tools competed in benchmarking an LLM-based car manual information retrieval application, with the objective of identifying user inputs for which the system fails to appropriately mention warnings contained in the manual. The testing solutions were evaluated based on their effectiveness in exposing failures and the diversity of the discovered failure-revealing tests. We report on the experimental methodology, the competitors, and the results.

Top tags: llm benchmark model evaluation

DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant

1️⃣ One-sentence summary

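This paper presents the first LLM testing competition, held at the DeepTest workshop at ICSE 2026, in which four testing tools were run against an LLM-based car-manual question-answering application and evaluated on how well they uncovered failures, such as the system omitting safety warnings, and on the diversity of their test cases.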
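To make the two evaluation criteria concrete, here is a minimal sketch under stated assumptions. This page does not describe the competition's actual failure oracle or diversity metric. The keyword-matching oracle, the Jaccard-based diversity score, and all function names below (`is_failure`, `diversity`, `score`) are hypothetical illustrations, not the organizers' method. A test counts as failure-revealing if the assistant's response omits any expected warning, and diversity is the mean pairwise word-set distance among the failure-revealing inputs.

```python
from itertools import combinations


def mentions_warning(response: str, warning_keywords: list[str]) -> bool:
    """Hypothetical oracle: a warning counts as mentioned only if every
    one of its keywords appears (case-insensitive substring match)."""
    text = response.lower()
    return all(k.lower() in text for k in warning_keywords)


def is_failure(response: str, expected_warnings: list[list[str]]) -> bool:
    """A test reveals a failure if the response omits at least one
    expected warning (assumed definition for illustration)."""
    return not all(mentions_warning(response, w) for w in expected_warnings)


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lowercase whitespace-split word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def diversity(inputs: list[str]) -> float:
    """Assumed diversity metric: mean pairwise Jaccard distance
    (0 = all inputs identical, 1 = no shared words)."""
    pairs = list(combinations(inputs, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def score(tests: list[tuple[str, str, list[list[str]]]]) -> tuple[int, float]:
    """tests: (user_input, system_response, expected_warnings) triples.
    Returns (number of failure-revealing tests, their diversity)."""
    failing = [inp for inp, resp, warns in tests if is_failure(resp, warns)]
    return len(failing), diversity(failing)
```

For example, given a refueling question whose answer never says to switch the engine off, and a jump-start question whose answer omits eye protection, `score` counts both as failure-revealing and reports a diversity of about 0.57 for that pair. Substring matching is deliberately crude here; a real oracle for an LLM assistant would more likely need semantic matching.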

Source: arXiv:2604.12615
