arXiv submission date: 2026-06-17
📄 Abstract - Reinforcement Learning Foundation Models Should Already Be A Thing

Foundation models for language and vision are powered by internet-scale data, while structured domains (tabular prediction, time-series forecasting, graph learning, reinforcement learning) are not. The substitute is synthetic data, which shifts the burden from collection to prior design. Such priors already exist for many structured tasks: TabPFN and its successors solve tabular classification with a transformer pretrained on a synthetic Bayesian prior. We make two points. First, reinforcement learning is the conspicuous gap: sampling a synthetic MDP is as feasible as sampling a synthetic tabular dataset, yet no in-context RL work treats prior design as a primary objective. Second, MDPs admit a fixed-size sufficient statistic, independent of the episodes observed and tabular in shape, which makes them directly amenable to the attention-based architectures used for tabular foundation models, with a policy head replacing the supervised target. Together these define the agenda for an RL foundation model. As a proof of concept, we train one model entirely on synthetic MDPs and show that, with no task-specific tuning, it solves held-out tabular benchmarks in context, both online and offline: online, in far fewer episodes than UCB-VI and tabular Q-learning, and offline, competitively with VI-LCB.
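
The two points in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify the prior or the exact statistic, so the Dirichlet/uniform prior, the count-based table layout, and the function names `sample_mdp`, `collect_transitions`, and `sufficient_statistic` are all assumptions made here for illustration. The sketch samples a small synthetic tabular MDP, gathers experience with a random policy, and summarizes it in a table whose shape depends only on the number of states and actions, not on how much experience was observed.

```python
import numpy as np


def sample_mdp(n_states, n_actions, rng):
    """Sample a tabular MDP from a toy prior (an assumption, not the paper's prior)."""
    # One Dirichlet-distributed next-state distribution per (state, action) pair.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # (S, A, S)
    # Deterministic rewards drawn uniformly from [0, 1).
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))  # (S, A)
    return P, R


def collect_transitions(P, R, n_steps, rng):
    """Roll out a uniformly random policy and record (s, a, r, s') tuples."""
    n_states, n_actions, _ = P.shape
    s = 0
    transitions = []
    for _ in range(n_steps):
        a = int(rng.integers(n_actions))
        s_next = int(rng.choice(n_states, p=P[s, a]))
        transitions.append((s, a, R[s, a], s_next))
        s = s_next
    return transitions


def sufficient_statistic(transitions, n_states, n_actions):
    """Summarize experience as one row per (state, action) pair.

    Columns: next-state counts (S columns), visit count, summed reward.
    The shape is (S * A, S + 2) regardless of how many transitions were seen.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next in transitions:
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r
    visits = counts.sum(axis=2)
    return np.concatenate(
        [
            counts.reshape(n_states * n_actions, n_states),
            visits.reshape(-1, 1),
            reward_sum.reshape(-1, 1),
        ],
        axis=1,
    )


rng = np.random.default_rng(0)
P, R = sample_mdp(n_states=4, n_actions=2, rng=rng)
table_small = sufficient_statistic(collect_transitions(P, R, 10, rng), 4, 2)
table_large = sufficient_statistic(collect_transitions(P, R, 500, rng), 4, 2)
print(table_small.shape, table_large.shape)
```

Under these assumptions, each row of the table plays the role of one row in a tabular dataset, which is the sense in which an attention-based tabular architecture could consume it; how the paper actually encodes the statistic and attaches the policy head is not stated in the abstract.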

Top-level tags: reinforcement learning, machine learning, model training
Detailed tags: foundation model, synthetic MDP, in-context learning, tabular reinforcement learning, attention architecture

Reinforcement Learning Foundation Models Should Already Be A Thing


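1️⃣ One-sentence summary

This paper argues that, just as tabular prediction has successfully built foundation models from synthetic data, reinforcement learning can pretrain a general in-context learning model on synthetic Markov decision processes (MDPs), and it shows experimentally that such a model solves both online and offline tasks efficiently without fine-tuning.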

Source: arXiv: 2606.18812