SenseJudge: Human-Centric Preference-Driven Judgment Framework
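1️⃣ One-sentence summary
This paper proposes SenseJudge, a flexible judgment framework that automatically evaluates and ranks AI model responses according to users' personalized preferences, overcoming the limitation of traditional fixed-preference judgment methods, which struggle to adapt to real-world dialogue scenarios.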
Using Large Language Models (LLMs) as judges in scenarios such as assessing model responses is becoming an increasingly accepted paradigm. However, existing judgment approaches often rely on judge models trained on fixed preference data, which tend to overlook diverse user preferences and struggle to adapt to real-world human-AI dialogue scenarios. To address these limitations, we propose SenseJudge, a customizable judgment framework driven by human preferences, and SenseBench, a diverse and challenging instruction-following benchmark derived from real-world multi-turn interactions. We apply the automatic judgment framework and the benchmark to two tasks: (1) LLMs as personalized judges, and (2) model ranking. Extensive experiments demonstrate that the SenseJudge framework surpasses other judgment methods and models on the LLMs-as-personalized-judges task and produces a model ranking that aligns with human perception. Additional analyses of position bias and consistency, together with ablation studies, affirm the robustness of SenseJudge.
Source: arXiv: 2606.03189