arXiv submission date: 2026-02-19
📄 Abstract - Flickering Multi-Armed Bandits

We introduce Flickering Multi-Armed Bandits (FMAB), a new MAB framework where the set of available arms (or actions) can change at each round, and the available set at any time may depend on the agent's previously selected arm. We model this constrained, evolving availability using random graph processes, where arms are nodes and the agent's movement is restricted to its local neighborhood. We analyze this problem under two random graph models: an i.i.d. Erdős--Rényi (ER) process and an Edge-Markovian process. We propose and analyze a two-phase algorithm that employs a lazy random walk for exploration to efficiently identify the optimal arm, followed by a navigation and commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds for both graph settings. We show that the exploration cost of our algorithm is near-optimal by establishing a matching information-theoretic lower bound for this problem class, highlighting the fundamental cost of exploration under local-move constraints. We complement our theoretical guarantees with numerical simulations, including a scenario of a robotic ground vehicle scouting a disaster-affected region.
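The abstract's exploration phase can be sketched in code. The following is a minimal, illustrative simulation of a lazy random walk over an i.i.d. Erdős--Rényi graph process, not the paper's actual algorithm: the function name, the `lazy_prob` parameter, and the fresh-neighborhood sampling are all assumptions chosen to mirror the description (at each round a new ER graph is drawn, and the agent may only stay put or move to a current neighbor).

```python
import random

def lazy_random_walk_explore(n_arms, p_edge, n_rounds, lazy_prob=0.5, seed=0):
    """Illustrative Phase-1 sketch (hypothetical, not the paper's algorithm).

    At each round a fresh Erdos-Renyi graph G(n_arms, p_edge) is drawn; the
    agent either stays at its current arm (with probability lazy_prob) or
    moves uniformly to one of its neighbors in the freshly drawn graph.
    Returns per-arm visit counts, which a second (navigation-and-commit)
    phase could use to identify and settle on the empirically best arm.
    """
    rng = random.Random(seed)
    current = 0
    visits = [0] * n_arms
    for _ in range(n_rounds):
        visits[current] += 1
        # Laziness: with probability lazy_prob, do not move this round.
        if rng.random() < lazy_prob:
            continue
        # Sample the current arm's neighborhood in a fresh ER graph:
        # each other node is adjacent independently with probability p_edge.
        neighbors = [v for v in range(n_arms)
                     if v != current and rng.random() < p_edge]
        if neighbors:  # if the arm is isolated this round, stay put
            current = rng.choice(neighbors)
    return visits
```

The laziness parameter matters in this setting: because the graph resets every round, staying put occasionally keeps the walk's distribution well behaved even when the drawn neighborhood is sparse or empty.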

Top-level tags: theory, reinforcement learning, agents
Detailed tags: multi-armed bandits, graph processes, regret analysis, exploration-exploitation, random walk

Flickering Multi-Armed Bandits


1️⃣ One-sentence summary

This paper introduces a new framework called Flickering Multi-Armed Bandits for decision problems where the set of available arms (or actions) changes each round and is constrained by the agent's previous choice; using a two-phase algorithm that combines random-walk exploration with navigation-based exploitation, it achieves near-optimal regret bounds under several random graph models.

Source: arXiv: 2602.17315