Geometry-Aware Offline-to-Online Learning in Linear Contextual Bandits

📄 Abstract - Geometry-Aware Offline-to-Online Learning in Linear Contextual Bandits

We study offline-to-online learning in linear contextual bandits with biased offline regression data: the offline parameter need not match the online one, so history should not be treated as a single warm start. We model directional transfer with a shift certificate $(M_{\mathrm{shift}},\rho)$ and offline ridge estimation, yielding a geometry-aware confidence region for the online parameter rather than an isotropic radius. We propose Ellipsoidal-MINUCB, which combines a standard online branch with an offline-informed pooled branch and uses offline information only when it tightens uncertainty. With high probability, regret is bounded by the minimum of a standard SupLinUCB-style fallback and a pooled term that separates statistical width from a certificate-weighted shift penalty. Under a simple alignment condition, the pooled term further simplifies to a rate governed by an effective dimension induced by the offline geometry. We also show that a purely Euclidean (scalar) shift bound, by itself, does not determine which feature directions are transferable. Beyond the fixed-certificate setting, we show how to learn a certificate from data at finitely many refresh times and establish a high-probability regret bound for Ellipsoidal-MINUCB with epoch-wise learned certificates. Experiments match the main prediction: gains are strongest at intermediate horizons when offline coverage and transferability align, while the method otherwise tracks the safe online baseline.
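To make the "use offline information only when it tightens uncertainty" idea concrete, here is a minimal Python sketch of a min-of-two-upper-bounds index. It is not the paper's Ellipsoidal-MINUCB: the abstract does not give the algorithm, so the second branch below uses the offline ridge estimate directly rather than a pooled estimate. The sketch assumes the certificate means $(\theta_{\mathrm{off}}-\theta_{\mathrm{on}})^\top M_{\mathrm{shift}}(\theta_{\mathrm{off}}-\theta_{\mathrm{on}}) \le \rho^2$, which by Cauchy-Schwarz bounds the bias along a feature $x$ by $\rho\sqrt{x^\top M_{\mathrm{shift}}^{-1}x}$. The function names and the confidence multipliers `beta_on` and `beta_off` are hypothetical.

```python
import numpy as np


def ridge(X, y, lam=1.0):
    """Ridge estimate and regularized Gram matrix V = X^T X + lam * I."""
    d = X.shape[1]
    V = X.T @ X + lam * np.eye(d)
    theta = np.linalg.solve(V, X.T @ y)
    return theta, V


def width(x, A):
    """Ellipsoidal width sqrt(x^T A^{-1} x) for a positive definite A."""
    return float(np.sqrt(x @ np.linalg.solve(A, x)))


def min_ucb(x, theta_on, V_on, theta_off, V_off, M_shift, rho,
            beta_on=1.0, beta_off=1.0):
    """Smaller of an online UCB and an offline UCB inflated by a shift penalty.

    Assumption (not from the source): the certificate bounds the parameter
    shift in the M_shift norm by rho, so the offline branch pays an extra
    rho * width(x, M_shift) along direction x.
    """
    ucb_on = float(x @ theta_on) + beta_on * width(x, V_on)
    ucb_off = (float(x @ theta_off) + beta_off * width(x, V_off)
               + rho * width(x, M_shift))
    return min(ucb_on, ucb_off)
```

With a certificate that is tight along the first coordinate and loose along the second, this index uses the offline branch for the first direction and falls back to the online branch for the second. A scalar (isotropic) shift bound could not make that distinction.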

1️⃣ One-Sentence Summary

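This paper proposes a method that lets an online learner, when the offline parameter does not exactly match the online one, use only the useful part of the offline information to guide decisions, improving learning efficiency while remaining safe. Experiments confirm that the gains are largest when the directions covered by the offline data align with the current task.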
