arXiv submission date: 2026-02-10
📄 Abstract - MILE-RefHumEval: A Reference-Free, Multi-Independent LLM Framework for Human-Aligned Evaluation

We introduce MILE-RefHumEval, a reference-free framework for evaluating Large Language Models (LLMs) without ground-truth annotations or evaluator coordination. It leverages an ensemble of independently prompted evaluators guided by a human-aligned schema and supports both discrete and continuous scoring. With task-specific prompts spanning best-candidate selection, summarization, image captioning, and dialogue, MILE-RefHumEval provides flexible, interpretable, and scalable assessments. Experiments show that it aligns closely with human judgments, outperforms prior methods, and reduces computational overhead, offering an efficient, robust, and human-aligned solution for real-world LLM evaluation.
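As a rough illustration of the idea of independent evaluators with discrete or continuous scoring, the sketch below combines scores from several evaluators that never see each other's output. The abstract does not say how scores are combined, so the aggregation rules here (majority vote for discrete labels, arithmetic mean for continuous scores) and all names (`aggregate_discrete`, `aggregate_continuous`, `evaluate`) are assumptions for illustration only, not the paper's method. The evaluators are plain stub functions standing in for independently prompted LLMs.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Sequence, Union

Score = Union[int, float, str]
# Hypothetical evaluator type: takes a candidate output, returns a score.
Evaluator = Callable[[str], Score]


def aggregate_discrete(votes: Sequence[Score]) -> Score:
    """Majority vote (an assumed rule); ties go to the earliest vote."""
    counts = Counter(votes)
    top = max(counts.values())
    for vote in votes:
        if counts[vote] == top:
            return vote
    raise ValueError("no votes given")


def aggregate_continuous(scores: Sequence[float]) -> float:
    """Arithmetic mean (an assumed rule) of continuous scores."""
    return mean(scores)


def evaluate(output: str, evaluators: Sequence[Evaluator], continuous: bool) -> Score:
    # Each evaluator is called on the output alone, with no access to
    # the other evaluators' scores (no coordination).
    scores = [ev(output) for ev in evaluators]
    return aggregate_continuous(scores) if continuous else aggregate_discrete(scores)


# Stub evaluators standing in for independently prompted LLM judges.
label_judges = [lambda o: "good", lambda o: "bad", lambda o: "good"]
score_judges = [lambda o: 3.0, lambda o: 4.0, lambda o: 5.0]

discrete_result = evaluate("candidate summary", label_judges, continuous=False)
continuous_result = evaluate("candidate summary", score_judges, continuous=True)
```

In a real setup each stub would be replaced by a call to a separately prompted model, and the aggregation rule would follow whatever the paper actually specifies.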

Top-level tags: llm model evaluation benchmark
Detailed tags: reference-free evaluation human alignment ensemble methods scoring framework llm assessment

MILE-RefHumEval: A Reference-Free, Multi-Independent LLM Framework for Human-Aligned Evaluation


1️⃣ One-Sentence Summary

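This paper proposes a new evaluation framework called MILE-RefHumEval. It has multiple independent large language models score outputs against a set of criteria aligned with human preferences, which makes it possible to evaluate other large language models across a variety of tasks efficiently and reliably without reference answers.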

Source: arXiv: 2602.09624