Leveraging Similarities in Multi-Armed Bandits

📄 Abstract - Leveraging Similarities in Multi-Armed Bandits

In many online learning and bandit problems, the actions we consider possess inherent similarities--for instance because they share latent traits, tags, or hierarchical structure. We study online learning with a similarity-structured action set, encoded by a rooted tree whose leaves are the actions and whose levels quantify how closely two actions are related. The loss sequence is assumed tree-compatible: losses of similar actions are constrained to be close. We establish an impossibility result showing that usual one-point bandit feedback cannot, in general, leverage range or tree-induced similarity, even under very strong similarity constraints. We then provide a unified set of algorithms which adapt to a wide range of richer feedback models, from semi-bandit feedback down to multi-point bandit protocols, including the minimal two-point feedback setting. We show these algorithms exhibit best-of-both-worlds guarantees and provably exploit action similarities by replacing the number of actions $K$ by a similarity-aware effective number of actions $K_{\mathrm{eff}}$ in the regret bounds. As an application, we show that under two-point feedback, it is possible to achieve $\sqrt{T}$ regret in Lipschitz bandits when $d \leq 2$.

利用多臂老虎机中的相似性 / Leveraging Similarities in Multi-Armed Bandits

1️⃣ 一句话总结

本文针对具有相似性的动作集合（例如共享潜在特征或层级结构）的在线学习问题，证明了传统单点反馈无法利用这些相似性，随后提出了一套能在更丰富的反馈模式下（如半赌博反馈或两点反馈）自动利用相似性、降低遗憾上界的通用算法，并在二维Lipschitz老虎机中实现了√T级别的遗憾。

← 返回列表

菜单

AI 帮我研读全文

1️⃣ 一句话总结

密码管理

设置密码

修改密码

移除密码

菜单

AI 帮我研读全文

1️⃣ 一句话总结

获取最新论文摘要