对决赌博机中孔多塞胜者识别的采样复杂度 / The Sampling Complexity of Condorcet Winner Identification in Dueling Bandits
1️⃣ 一句话总结
这篇论文提出了一种新的识别方法,通过利用所有候选选项之间的两两比较信息,而非仅仅关注胜者与其他选项的比较,显著降低了在随机对决赌博机模型中准确找出最优选项所需的最小样本量,并首次给出了该问题的理论最优样本复杂度界限。
We study best-arm identification in stochastic dueling bandits under the sole assumption that a Condorcet winner exists, i.e., an arm that wins each noisy pairwise comparison with probability at least $1/2$. We introduce a new identification procedure that exploits the full gap matrix $\Delta_{i,j}=q_{i,j}-\tfrac12$ (where $q_{i,j}$ is the probability that arm $i$ beats arm $j$), rather than only the gaps between the Condorcet winner and the other arms. We derive high-probability, instance-dependent sample-complexity guarantees that (up to logarithmic factors) improve the best known ones by leveraging informative comparisons beyond those involving the winner. We complement these results with new lower bounds which, to our knowledge, are the first for Condorcet-winner identification in stochastic dueling bandits. Our lower-bound analysis isolates the intrinsic cost of locating informative entries in the gap matrix and estimating them to the required confidence, establishing the optimality of our non-asymptotic bounds. Overall, our results reveal new regimes and trade-offs in the sample complexity that are not captured by asymptotic analyses based only on the expected budget.
对决赌博机中孔多塞胜者识别的采样复杂度 / The Sampling Complexity of Condorcet Winner Identification in Dueling Bandits
这篇论文提出了一种新的识别方法,通过利用所有候选选项之间的两两比较信息,而非仅仅关注胜者与其他选项的比较,显著降低了在随机对决赌博机模型中准确找出最优选项所需的最小样本量,并首次给出了该问题的理论最优样本复杂度界限。
源自 arXiv: 2603.15189