Best Arm Identification with LLM Judges and Limited Human
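1️⃣ One-Sentence Summary
This paper proposes a new algorithm for settings where expensive human review is available only in limited amounts. It combines scores from a biased AI judge (such as a large language model) with a small amount of human feedback to identify the best of several options efficiently and accurately, avoiding both the mis-selection and the wasted resources that traditional methods can incur.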
We study fixed-confidence best-arm identification (BAI) where a cheap but potentially biased proxy (e.g., an LLM judge) is available for every sample, while an expensive ground-truth label can only be acquired selectively through human auditing. Unlike classical multi-fidelity BAI, the proxy is biased (in an arm- and context-dependent way) and ground truth is selectively observed. Consequently, standard multi-fidelity methods can mis-select the best arm, and uniform auditing, though accurate, wastes scarce audit resources. We prove that without bias correction and propensity adjustment, the mis-selection probability may not vanish, even with unlimited proxy data. We then develop an estimator for the mean of each arm that combines proxy scores with inverse-propensity-weighted residuals, and we form anytime-valid confidence sequences for that estimator. Based on the estimator and confidence sequences, we propose an algorithm that adaptively selects and audits arms. The algorithm concentrates audits on unreliable contexts and close arms, and we prove that a plug-in Neyman rule achieves near-oracle audit efficiency. Numerical experiments confirm the theoretical guarantees and demonstrate the superior empirical performance of the proposed algorithm.
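To make the bias-correction idea concrete, the following is a minimal sketch of a per-arm mean estimate that adds inverse-propensity-weighted residuals to the proxy scores. It uses the standard textbook form (the average of proxy + audited / propensity × (human − proxy)) and is not necessarily the exact estimator in the paper. The function name `ipw_corrected_mean`, the argument names, and the simulated data are all assumptions made for illustration. Confidence sequences and the adaptive audit rule are not covered.

```python
import numpy as np

def ipw_corrected_mean(proxy, audited, human, propensity):
    """Proxy mean plus inverse-propensity-weighted residual correction.

    Hypothetical helper (not from the paper): assumes the audit
    probabilities are known and lie in (0, 1].
    """
    proxy = np.asarray(proxy, dtype=float)
    audited = np.asarray(audited, dtype=bool)
    propensity = np.asarray(propensity, dtype=float)
    if np.any(propensity <= 0) or np.any(propensity > 1):
        raise ValueError("propensities must lie in (0, 1]")
    # Human labels exist only for audited samples; ignore the rest (may be NaN).
    human_filled = np.where(audited, np.asarray(human, dtype=float), 0.0)
    residual = np.where(audited, human_filled - proxy, 0.0)
    return float(np.mean(proxy + residual / propensity))

# Simulated single arm (illustrative data only): true mean 0.6, proxy biased upward.
rng = np.random.default_rng(0)
n = 20_000
y = rng.binomial(1, 0.6, size=n).astype(float)   # ground truth, mostly unobserved
proxy = 0.8 * y + 0.3                            # biased judge score, mean 0.78
propensity = np.full(n, 0.2)                     # audit 20% of samples at random
audited = rng.random(n) < propensity
human = np.where(audited, y, np.nan)             # labels only where audited

naive = float(proxy.mean())
corrected = ipw_corrected_mean(proxy, audited, human, propensity)
print(f"proxy-only: {naive:.3f}, corrected: {corrected:.3f}")
```

In this simulation the proxy-only average stays near the biased value however many samples are drawn, while the corrected estimate concentrates around the true mean using only the audited labels. This illustrates why the abstract says mis-selection may not vanish without bias correction and propensity adjustment.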
Source: arXiv: 2601.21471