Relational In-Context Learning via Synthetic Pre-training with Structural Prior

Abstract

Relational Databases (RDBs) are the backbone of modern business, yet they lack foundation models comparable to those in text or vision. A key obstacle is that high-quality RDBs are private, scarce, and structurally heterogeneous, making internet-scale pre-training infeasible. To overcome this data scarcity, we introduce $\textbf{RDB-PFN}$, the first relational foundation model trained purely on $\textbf{synthetic data}$. Inspired by Prior-Data Fitted Networks (PFNs), where synthetic data generated from Structural Causal Models (SCMs) enables reasoning on single tables, we design a $\textbf{Relational Prior Generator}$ to create an infinite stream of diverse RDBs from scratch. Pre-trained on $\textbf{over 2 million}$ synthetic single-table and relational tasks, RDB-PFN learns to adapt to any new database instantly via genuine $\textbf{in-context learning}$. Experiments verify that RDB-PFN achieves strong few-shot performance on 19 real-world relational prediction tasks, outperforming graph-based and single-table foundation-model baselines (given the same DFS-linearized inputs), while using a lightweight architecture and offering fast inference. The code is available at this https URL
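The abstract does not describe the internals of the Relational Prior Generator, so the following is only a toy illustration of the general idea: sample a small two-table database from a random linear SCM, then flatten it into one row per parent entity by aggregating child rows, loosely in the spirit of the DFS-linearized inputs mentioned above. Every name, parameter, and distribution here (`sample_toy_rdb`, `flatten`, the linear SCM, the median-threshold label) is a hypothetical choice made for the sketch, not the authors' method.

```python
import numpy as np


def sample_toy_rdb(n_parents=50, max_children=5, n_feats=3, seed=0):
    """Sample a toy two-table database linked by a foreign key (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Parent table: columns follow a random linear SCM.
    # A strictly upper-triangular weight matrix keeps the causal graph acyclic,
    # so column j depends only on columns 0..j-1 plus independent noise.
    weights = np.triu(rng.normal(size=(n_feats, n_feats)), k=1)
    parent = np.zeros((n_parents, n_feats))
    for j in range(n_feats):
        parent[:, j] = parent @ weights[:, j] + rng.normal(size=n_parents)

    # Child table: each parent row owns between 1 and max_children child rows.
    counts = rng.integers(1, max_children + 1, size=n_parents)
    fk = np.repeat(np.arange(n_parents), counts)  # foreign key into the parent table
    child_val = 0.8 * parent[fk, 0] + rng.normal(size=fk.size)

    # Binary label on the parent that depends on both tables (assumed form).
    child_mean = np.bincount(fk, weights=child_val, minlength=n_parents) / counts
    logits = parent[:, -1] + child_mean
    label = (logits > np.median(logits)).astype(int)
    return parent, fk, child_val, label


def flatten(parent, fk, child_val):
    """Aggregate child rows per parent (count and mean) into a single flat table."""
    n = parent.shape[0]
    counts = np.bincount(fk, minlength=n)
    sums = np.bincount(fk, weights=child_val, minlength=n)
    means = sums / np.maximum(counts, 1)  # guard against parents with no children
    return np.column_stack([parent, counts, means])


parent, fk, child_val, label = sample_toy_rdb()
X = flatten(parent, fk, child_val)
print(X.shape, label.mean())
```

In an in-context learning setup, a few rows of `X` with their labels would serve as the context and the remaining rows as queries; drawing a fresh seed (and, in a richer generator, a fresh schema) for each task is what would yield a stream of distinct synthetic tasks.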

1️⃣ One-sentence summary

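This paper proposes RDB-PFN, the first foundation model for relational databases trained entirely on synthetic data. By pre-training on a large number of diverse simulated databases, it can quickly adapt to any new real-world database and handle a variety of prediction tasks from only a few examples, outperforming the graph-based and single-table foundation-model baselines it is compared against.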
