Partial Feedback Online Learning
1️⃣ One-sentence summary
This paper studies a learning setting in which multiple answers count as correct but only one correct answer is revealed to the learner each round. It proposes a new theoretical framework that precisely characterizes the best achievable performance of deterministic and randomized learning algorithms under this "partial feedback" setting, and reveals fundamental differences from more permissive feedback settings.
We study partial-feedback online learning, where each instance admits a set of correct labels, but the learner only observes one correct label per round; any prediction within the correct set is counted as correct. This model captures settings such as language generation, where multiple responses may be valid but data provide only a single reference. We give a near-complete characterization of minimax regret for both deterministic and randomized learners in the set-realizable regime, i.e., in the regime where sublinear regret is generally attainable. For deterministic learners, we introduce the Partial-Feedback Littlestone dimension (PFLdim) and show it precisely governs learnability and minimax regret; technically, PFLdim cannot be defined via the standard version space, requiring a new collection version space viewpoint and an auxiliary dimension used only in the proof. We further develop the Partial-Feedback Measure Shattering dimension (PMSdim) to obtain tight bounds for randomized learners. We identify broad conditions ensuring inseparability between deterministic and randomized learnability (e.g., finite Helly number or nested-inclusion label structure), and extend the argument to set-valued online learning, resolving an open question of Raman et al. [2024b]. Finally, we show a sharp separation from weaker realistic and agnostic variants: outside set realizability, the problem can become information-theoretically intractable, with linear regret possible even for $|H|=2$. This highlights the need for fundamentally new, noise-sensitive complexity measures to meaningfully characterize learnability beyond set realizability.
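The protocol described above can be made concrete with a toy simulation. Everything below (the instance names, label sets, adversary, and the naive "repeat the last revealed label" learner) is an illustrative assumption for exposition, not the paper's algorithm: each round the learner predicts a label, the prediction counts as correct iff it lies in the hidden correct set, and the learner then observes only one correct label chosen by the adversary.

```python
import random

# Hypothetical sketch of the partial-feedback online protocol.
# Correct sets, adversary, and learner are toy assumptions for illustration.

def run_protocol(instances, correct_sets, reveal, predict, rounds, seed=0):
    """Play `rounds` rounds; return the learner's mistake count."""
    rng = random.Random(seed)
    mistakes = 0
    history = []  # (instance, revealed correct label) pairs the learner sees
    for t in range(rounds):
        x = instances[t % len(instances)]
        y_hat = predict(x, history)
        if y_hat not in correct_sets[x]:   # correct iff inside the set
            mistakes += 1
        y_star = reveal(x, rng)            # only ONE correct label is revealed
        history.append((x, y_star))
    return mistakes

# Toy domain: two instances, each admitting several acceptable labels.
correct_sets = {"x1": {"a", "b"}, "x2": {"b", "c"}}
instances = ["x1", "x2"]

def reveal(x, rng):
    # Adversary discloses an arbitrary correct label (here: random choice).
    return rng.choice(sorted(correct_sets[x]))

def memorize(x, history):
    # Naive learner: repeat the last label revealed for x; default to "a".
    for xi, yi in reversed(history):
        if xi == x:
            return yi
    return "a"

m = run_protocol(instances, correct_sets, reveal, memorize, rounds=100)
print(m)  # → 1 (only the first, uninformed prediction on "x2" misses)
```

Note the key informational subtlety the model captures: when the learner's prediction differs from the revealed label, the learner cannot tell whether its prediction was nonetheless correct, which is why standard version-space arguments break down and the collection version space viewpoint is needed.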
Source: arXiv:2601.21462