arXiv submission date: 2026-01-29
📄 Abstract - Partial Feedback Online Learning

We study partial-feedback online learning, where each instance admits a set of correct labels, but the learner only observes one correct label per round; any prediction within the correct set is counted as correct. This model captures settings such as language generation, where multiple responses may be valid but data provide only a single reference. We give a near-complete characterization of minimax regret for both deterministic and randomized learners in the set-realizable regime, i.e., in the regime where sublinear regret is generally attainable. For deterministic learners, we introduce the Partial-Feedback Littlestone dimension (PFLdim) and show it precisely governs learnability and minimax regret; technically, PFLdim cannot be defined via the standard version space, requiring a new collection version space viewpoint and an auxiliary dimension used only in the proof. We further develop the Partial-Feedback Measure Shattering dimension (PMSdim) to obtain tight bounds for randomized learners. We identify broad conditions ensuring inseparability between deterministic and randomized learnability (e.g., finite Helly number or nested-inclusion label structure), and extend the argument to set-valued online learning, resolving an open question of Raman et al. [2024b]. Finally, we show a sharp separation from weaker realistic and agnostic variants: outside set realizability, the problem can become information-theoretically intractable, with linear regret possible even for $|H|=2$. This highlights the need for fundamentally new, noise-sensitive complexity measures to meaningfully characterize learnability beyond set realizability.
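To make the protocol concrete, here is a minimal Python sketch of the interaction the abstract describes. The toy instances, label sets, and naive learner below are illustrative assumptions, not anything from the paper:

```python
import random

def partial_feedback_game(learner, rounds, seed=0):
    """Run the protocol: each round the adversary fixes an instance x and a
    set S of correct labels; the learner predicts a label; any prediction
    inside S counts as correct, but only ONE member of S is revealed."""
    rng = random.Random(seed)
    mistakes = 0
    for x, correct_set in rounds:
        y_hat = learner.predict(x)
        if y_hat not in correct_set:
            mistakes += 1
        # Partial feedback: the learner never sees S itself, only one member.
        revealed = rng.choice(sorted(correct_set))
        learner.observe(x, revealed)
    return mistakes

class FollowRevealed:
    """Toy learner that repeats the last label revealed for an instance.
    Note: a hypothesis that disagrees with a revealed label may still be
    correct (its prediction could be another member of S), which is why,
    per the abstract, the standard version space is the wrong object here
    and a collection version space viewpoint is needed."""
    def __init__(self):
        self.memory = {}
    def predict(self, x):
        return self.memory.get(x, 0)  # default guess before any feedback
    def observe(self, x, y):
        self.memory[x] = y

# Example: instance 0 accepts labels {1, 2}; instance 1 accepts only {3}.
rounds = [(0, {1, 2}), (1, {3}), (0, {1, 2}), (1, {3})]
print(partial_feedback_game(FollowRevealed(), rounds))  # 2 mistakes, then correct
```

The key asymmetry the sketch exposes is that the learner's prediction is scored against the full set `correct_set`, while its information per round is a single element of that set.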

Top-level tags: theory, machine learning
Detailed tags: online learning, partial feedback, regret analysis, minimax, version space

Partial Feedback Online Learning


1️⃣ One-Sentence Summary

This paper studies a learning setting in which, on each round, only one correct answer is revealed even though multiple answers count as correct. It introduces new theoretical frameworks that precisely characterize the best performance bounds achievable by deterministic and randomized learning algorithms under this "partial feedback" setting, and reveals a fundamental gap between this setting and more relaxed learning settings.
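A plausible formalization of the regret notion being bounded, in notation of my own that is consistent with the abstract but not taken from the paper:

```latex
% Round t: the adversary picks an instance x_t and a correct-label set S_t;
% the learner predicts \hat{y}_t, then observes one revealed label y_t \in S_t.
% Mistake count of algorithm \mathcal{A} after T rounds:
\[
  M_T(\mathcal{A}) \;=\; \sum_{t=1}^{T} \mathbf{1}\{\hat{y}_t \notin S_t\}.
\]
% Set realizability: some h^* \in \mathcal{H} satisfies h^*(x_t) \in S_t for
% every t, so the best comparator makes zero mistakes and the regret
\[
  \mathrm{Reg}_T \;=\; M_T(\mathcal{A})
  \;-\; \min_{h \in \mathcal{H}} \sum_{t=1}^{T} \mathbf{1}\{h(x_t) \notin S_t\}
\]
% collapses to the mistake count; the abstract's dimensions (PFLdim, PMSdim)
% characterize its minimax value for deterministic and randomized learners.
```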

Source: arXiv:2601.21462