arXiv submission date: 2026-02-05
📄 Abstract - Almost Asymptotically Optimal Active Clustering Through Pairwise Observations

We propose a new analysis framework for clustering $M$ items into an unknown number $K$ of distinct groups using noisy and actively collected responses. At each time step, an agent is allowed to query pairs of items and observe bandit binary feedback. If the pair of items belongs to the same (resp. different) cluster, the observed feedback is $1$ with probability $p>1/2$ (resp. $q<1/2$). Leveraging the ubiquitous change-of-measure technique, we establish a fundamental lower bound on the expected number of queries needed to achieve a desired confidence in the clustering accuracy, formulated as a sup-inf optimization problem. Building on this theoretical foundation, we design an asymptotically optimal algorithm in which the stopping criterion involves an empirical version of the inner infimum -- the Generalized Likelihood Ratio (GLR) statistic -- being compared to a threshold. We develop a computationally feasible variant of the GLR statistic and show that its performance gap to the lower bound can be accurately estimated empirically and remains within a constant multiple of the lower bound.
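The observation model described in the abstract can be sketched in a few lines of Python. This is only an illustration of the noisy pairwise feedback, not the paper's algorithm: the function names `query_pair`, `empirical_mean`, and `bernoulli_kl`, the example labels, and the values of `p` and `q` are all assumptions made for this sketch. The Bernoulli KL divergence is included because it is the standard quantity that appears in change-of-measure arguments; the paper's actual sup-inf bound is not reproduced here.

```python
import math
import random


def query_pair(labels, i, j, p, q, rng):
    """Return noisy binary feedback for the pair (i, j).

    Feedback is 1 with probability p if i and j share a cluster,
    and 1 with probability q otherwise (p > 1/2 > q).
    """
    same_cluster = labels[i] == labels[j]
    prob_one = p if same_cluster else q
    return 1 if rng.random() < prob_one else 0


def empirical_mean(labels, i, j, p, q, n, rng):
    """Average feedback over n repeated queries of the same pair."""
    return sum(query_pair(labels, i, j, p, q, rng) for _ in range(n)) / n


def bernoulli_kl(a, b):
    """KL divergence KL(Ber(a) || Ber(b)) for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [0, 0, 1, 1, 2]  # hypothetical ground truth: M = 5, K = 3
    p, q = 0.8, 0.2           # hypothetical noise levels
    print("same-cluster pair (0, 1):", empirical_mean(labels, 0, 1, p, q, 5000, rng))
    print("cross-cluster pair (0, 2):", empirical_mean(labels, 0, 2, p, q, 5000, rng))
    print("KL(Ber(p) || Ber(q)):", bernoulli_kl(p, q))
```

The larger the divergence between the two feedback distributions, the fewer repeated queries are needed to tell a same-cluster pair from a cross-cluster pair, which is the intuition behind bounds on the expected number of queries.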

Top-level tags: theory, machine learning, model evaluation
Detailed tags: active clustering, pairwise queries, bandit feedback, asymptotic optimality, change-of-measure

Almost Asymptotically Optimal Active Clustering Through Pairwise Observations


1️⃣ One-Sentence Summary

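This paper proposes a new active learning framework that groups items efficiently by adaptively querying whether pairs of items are similar, and it designs an algorithm whose query efficiency is provably close to optimal.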

Source: arXiv: 2602.05690