arXiv submission date: 2026-05-20

📄 Abstract - MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks

Accurate evaluation of conversational retrieval is pivotal for advancing Retrieval-Augmented Generation (RAG) systems. However, existing conversational retrieval benchmarks suffer from costly, sparse human annotation or rigid, unnatural automated heuristics. To address these challenges, we introduce MTR-Suite, a unified framework for auditing, synthesizing, and benchmarking retrieval. It features: (1) MTR-Eval, an LLM-based auditor quantifying alignment gaps in previous benchmarks; (2) MTR-Pipeline, a multi-agent system using greedy traversal clustering to generate high-fidelity dialogues at 1/400th human cost; and (3) MTR-Bench, a rigorous general-domain benchmark. MTR-Bench mimics production-style challenges (hard topic switching, verbosity), offering superior discriminative power. We make our code and data publicly available to facilitate future research at this https URL.

Top tags: llm multi-agents benchmark

MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks

1️⃣ One-sentence summary

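This paper proposes the MTR-Suite framework, which uses an LLM-driven automated auditing tool and a low-cost dialogue generation system to address two problems in existing conversational retrieval benchmarks: expensive human annotation and unnatural automatically generated data. It also builds a more discriminative general-domain benchmark.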


Source: arXiv: 2605.20729
