arXiv submission date: 2026-02-16
📄 Abstract - One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise

We study the $K$-armed Multi-Armed Bandit (MAB) problem with $M$ heterogeneous data sources, each exhibiting an unknown and distinct noise variance $\{\sigma_j^2\}_{j=1}^M$. The learner's objective is standard MAB regret minimization, with the additional complexity of adaptively selecting which data source to query at each round. We propose Source-Optimistic Adaptive Regret minimization (SOAR), a novel algorithm that quickly prunes high-variance sources using sharp variance-concentration bounds, followed by a 'balanced min-max LCB-UCB approach' that seamlessly integrates the parallel tasks of identifying the best arm and the optimal (minimum-variance) data source. Our analysis shows SOAR achieves an instance-dependent regret bound of $\tilde{O}\left({\sigma^*}^2\sum_{i=2}^K \frac{\log T}{\Delta_i} + \sqrt{K \sum_{j=1}^M \sigma_j^2}\right)$, up to preprocessing costs depending only on problem parameters, where ${\sigma^*}^2 := \min_j \sigma_j^2$ is the minimum source variance and $\Delta_i$ denotes the suboptimality gap of the $i$-th arm. This result is surprising: despite lacking prior knowledge of the minimum-variance source among $M$ alternatives, SOAR attains the optimal instance-dependent regret of standard single-source MAB with variance ${\sigma^*}^2$, while incurring only a small (and unavoidable) additive cost of $\tilde O(\sqrt{K \sum_{j=1}^M \sigma_j^2})$ for identifying the optimal (minimum-variance) source. Our theoretical bounds represent a significant improvement over some proposed baselines, e.g. Uniform UCB or Explore-then-Commit UCB, which could potentially suffer regret scaling with $\sigma_{\max}^2$ in place of ${\sigma^*}^2$, a gap that can be arbitrarily large when $\sigma_{\max} \gg \sigma^*$. Experiments on multiple synthetic problem instances and the real-world MovieLens 25M dataset demonstrate the superior performance of SOAR over the baselines.
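To make the two-phase idea concrete, below is a minimal Python sketch of a SOAR-style loop. It is not the paper's exact algorithm: the Gaussian reward model, the pooled variance estimator, the heuristic confidence widths, the warm-up schedule, and all variable names are simplifying assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem instance: K arms, M sources with heterogeneous noise.
K, M, T = 5, 4, 20_000
true_means = np.linspace(0.0, 1.0, K)           # arm means (shared across sources)
source_sigmas = np.array([0.1, 1.0, 2.0, 4.0])  # per-source noise standard deviations

def pull(arm, source):
    """Sample a reward for `arm` from `source` (Gaussian noise, for illustration)."""
    return true_means[arm] + source_sigmas[source] * rng.standard_normal()

# Sufficient statistics per (arm, source): counts, reward sums, squared-reward sums.
n = np.zeros((K, M))
s1 = np.zeros((K, M))
s2 = np.zeros((K, M))

def source_variance_stats(j):
    """Pooled empirical variance of source j with a crude confidence width."""
    nj = n[:, j].sum()
    if nj < 2:
        return np.inf, np.inf
    mean_per_arm = np.divide(s1[:, j], np.maximum(n[:, j], 1))
    var_hat = (s2[:, j].sum() - (n[:, j] * mean_per_arm**2).sum()) / nj
    width = var_hat * np.sqrt(np.log(T) / nj)   # heuristic concentration width
    return var_hat, width

regret, best_mean = 0.0, true_means.max()
active_sources = list(range(M))

for t in range(1, T + 1):
    stats = {s: source_variance_stats(s) for s in active_sources}
    if t <= M * K:
        j = (t - 1) % M                          # warm-up: round-robin over sources
    else:
        # Phase 1: drop sources whose variance LCB exceeds the smallest variance UCB.
        best_ucb = min(stats[s][0] + stats[s][1] for s in active_sources)
        active_sources = [s for s in active_sources
                          if stats[s][0] - stats[s][1] <= best_ucb]
        # Query the source that currently looks least noisy.
        j = min(active_sources, key=lambda s: stats[s][0])

    # Phase 2: UCB arm selection with width scaled by the chosen source's variance.
    counts = np.maximum(n.sum(axis=1), 1)
    means = s1.sum(axis=1) / counts
    sigma2 = stats[j][0] if np.isfinite(stats[j][0]) else 1.0
    ucb = means + np.sqrt(2 * sigma2 * np.log(T) / counts)
    arm = int(np.argmax(ucb)) if t > K else (t - 1) % K

    r = pull(arm, j)
    n[arm, j] += 1; s1[arm, j] += r; s2[arm, j] += r * r
    regret += best_mean - true_means[arm]

print(f"cumulative pseudo-regret after {T} rounds: {regret:.1f}")
print(f"sources still active: {active_sources}")
```

The point of the sketch is the structure rather than the constants: sources whose variance lower bound exceeds the best variance upper bound are eliminated, and the arm-selection confidence width shrinks with the variance of whichever source currently looks cleanest, which is what drives the ${\sigma^*}^2$ (rather than $\sigma_{\max}^2$) scaling in the paper's regret bound.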

Top-level tags: theory, machine learning, reinforcement learning
Detailed tags: multi-armed bandit, regret minimization, heterogeneous noise, adaptive source selection, instance-dependent bounds

One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise


1️⃣ One-Sentence Summary

This paper proposes a new algorithm called SOAR which, given multiple data sources with different noise levels, quickly identifies and primarily exploits the least-noisy "good source". In solving the multi-armed bandit problem it thereby achieves performance nearly matching what would be attainable if the best source were known in advance, significantly outperforming conventional approaches.

Source: arXiv: 2602.14474