arXiv submission date: 2026-05-20
📄 Abstract - MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks

Accurate evaluation of conversational retrieval is pivotal for advancing Retrieval-Augmented Generation (RAG) systems. However, existing conversational retrieval benchmarks suffer from costly, sparse human annotation or rigid, unnatural automated heuristics. To address these challenges, we introduce MTR-Suite, a unified framework for auditing, synthesizing, and benchmarking retrieval. It features: (1) MTR-Eval, an LLM-based auditor quantifying alignment gaps in previous benchmarks; (2) MTR-Pipeline, a multi-agent system using greedy traversal clustering to generate high-fidelity dialogues at 1/400th human cost; and (3) MTR-Bench, a rigorous general-domain benchmark. MTR-Bench mimics production-style challenges (hard topic switching, verbosity), offering superior discriminative power. We make our code and data publicly available to facilitate future research at this https URL.

Top-level tags: llm, multi-agents, benchmark
Detailed tags: retrieval-augmented generation, conversational retrieval, evaluation framework, dialogue synthesis

MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks


1️⃣ One-sentence summary

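This paper proposes the MTR-Suite framework, which combines an LLM-driven automated auditing tool with a low-cost dialogue generation system to address two problems in existing conversational retrieval benchmarks: expensive human annotation and unnatural automatically generated data. It also builds a general-domain benchmark with greater discriminative power.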

Source: arXiv: 2605.20729