Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees

📄 Abstract - Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees

A fraud scorer needs to answer in under 2 ms. The best tabular foundation models (TFMs) take 151-1,275 ms on GPU. We close this gap by distilling the TFM offline into an XGBoost or CatBoost student that runs natively on CPU. The central obstacle is specific to in-context learning (ICL) teachers: they leak labels when scoring their own training set, because each row being scored also sits, with its label, in the teacher's context. The soft targets therefore collapse to near-one-hot vectors with no inter-class structure left to distill. Stratified out-of-fold (OOF) teacher labeling prevents this by scoring each fold with a teacher whose context excludes that fold. Across 153 classification datasets drawn from TALENT, OpenML-CC18, TabZilla, and TabArena, distilling TabICLv2 into XGBoost gives 0.882 macro-mean AUC (96.5% of teacher AUC) at 1.9 ms on CPU, with a statistically significant edge over a tuned CatBoost baseline (Wilcoxon p = 0.0008; 51% win rate); across teacher-student pairs, the speedup ranges from 38x to 860x. Four further findings: the ranking of teachers carries over exactly to the ranking of their students; gains concentrate on low-dimensional data (+0.011 AUC over CatBoost with fewer than 21 features vs. +0.001 with more than 21); multi-teacher averaging helps MLP students (+0.006, p = 0.003) but adds less than 0.001 for tree students; and on high-dimensional tasks where the teacher itself trails CatBoost, distillation makes things worse rather than better. The full pipeline is open-sourced as part of the TabTune library.
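The paired comparison behind "Wilcoxon p = 0.0008; 51% win rate" can be sketched in plain Python. This is a minimal illustration, not the paper's evaluation code, and it rests on stated assumptions: `paired_comparison` and `_average_ranks` are hypothetical names, ties count as non-wins in the win rate, zero differences are dropped, and the p-value uses the normal approximation with a tie-corrected variance and no continuity correction. The paper's exact test settings are not given in this summary, and `scipy.stats.wilcoxon` is a maintained alternative. The AUC values in the usage block are made-up toy numbers, not results from the paper.

```python
import math


def _average_ranks(values):
    # Ranks 1..n of `values`; tied values share the average of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def paired_comparison(student_auc, baseline_auc):
    """Hypothetical helper: win rate and two-sided Wilcoxon signed-rank p-value.

    Assumptions: strict wins only, zero differences dropped, normal
    approximation with tie-corrected variance, no continuity correction.
    """
    # Round so that float noise (e.g. 0.91 - 0.90) does not break ties.
    diffs = [round(s - b, 12) for s, b in zip(student_auc, baseline_auc)]
    win_rate = sum(d > 0 for d in diffs) / len(diffs)
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return win_rate, 1.0
    magnitudes = [abs(d) for d in nonzero]
    ranks = _average_ranks(magnitudes)
    # W+ = sum of ranks of |difference| over datasets where the student wins.
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    # Tie correction: subtract sum(t^3 - t) / 48 over groups of equal magnitudes.
    counts = {}
    for a in magnitudes:
        counts[a] = counts.get(a, 0) + 1
    var -= sum(t ** 3 - t for t in counts.values()) / 48
    if var <= 0:
        return win_rate, 1.0
    z = (w_plus - mean) / math.sqrt(var)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return win_rate, p


if __name__ == "__main__":
    # Made-up per-dataset AUCs for illustration only (not the paper's results).
    student = [0.91, 0.85, 0.77, 0.93, 0.88, 0.80]
    baseline = [0.90, 0.85, 0.78, 0.90, 0.86, 0.79]
    win_rate, p = paired_comparison(student, baseline)
    print(f"win rate = {win_rate:.2f}, Wilcoxon p ~= {p:.3f}")
```

Because the signed-rank statistic weighs each dataset by the rank of its |ΔAUC|, a win rate near 50% does not by itself rule out a significant result; a plain sign test, by contrast, would look only at the count of wins and losses.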


1️⃣ One-Sentence Summary

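This paper proposes distilling the knowledge of large tabular foundation models (TFMs) into lightweight gradient-boosted trees such as XGBoost and CatBoost. Its core innovation is stratified out-of-fold (OOF) teacher labeling, which fixes the label leakage that occurs when the teacher scores its own training set and would otherwise render the soft labels useless. In the best configuration (TabICLv2 distilled into XGBoost), the student reaches 96.5% of the teacher's AUC with only 1.9 ms of CPU inference, a speedup of tens to hundreds of times (38x to 860x across teacher-student pairs).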
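The leakage problem and the OOF fix can be illustrated with a toy teacher. This sketch is not TabICLv2 or the TabTune pipeline; it is an illustration under explicit assumptions: `toy_icl_teacher` is a hypothetical Gaussian-kernel vote over labeled context rows standing in for an ICL teacher, all function names are hypothetical, and the data are synthetic 2-D points with 20% of labels flipped. Naive in-sample labeling puts every row, with its own label, into the teacher's context; stratified OOF labeling builds k folds with balanced class counts and scores each fold using only the other folds as context.

```python
import math
import random


def toy_icl_teacher(context, query, n_classes, temperature=0.002):
    # Hypothetical stand-in for an ICL teacher: it "reads" labeled context rows
    # and returns class probabilities from a Gaussian-kernel vote.
    scores = [1e-12] * n_classes  # tiny floor avoids division by zero
    for x, label in context:
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        scores[label] += math.exp(-d2 / temperature)
    total = sum(scores)
    return [s / total for s in scores]


def stratified_folds(labels, k, seed=0):
    # Assign each row to one of k folds so every class is spread evenly across folds.
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    fold_of = [0] * len(labels)
    offset = 0
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            fold_of[i] = (offset + j) % k  # consecutive folds -> per-class counts differ by <= 1
        offset += len(idx)
    return fold_of


def soft_targets(X, y, n_classes, k=None, seed=0):
    # k=None: naive in-sample labeling (each row sits in its own context -> leakage).
    # k=int:  stratified OOF labeling (a row's whole fold is excluded from its context).
    if k is None:
        context = list(zip(X, y))
        return [toy_icl_teacher(context, x, n_classes) for x in X]
    fold_of = stratified_folds(y, k, seed)
    targets = [None] * len(X)
    for f in range(k):
        context = [(X[i], y[i]) for i in range(len(X)) if fold_of[i] != f]
        for i in range(len(X)):
            if fold_of[i] == f:
                targets[i] = toy_icl_teacher(context, X[i], n_classes)
    return targets


def mean_prob_on_label(targets, y):
    # Average probability the soft target puts on the row's own (observed) label.
    return sum(t[c] for t, c in zip(targets, y)) / len(y)


def make_toy_data(n=300, noise=0.2, seed=0):
    # 2-D points labeled by side of the diagonal, with a fraction of labels flipped.
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [int(a + b > 1.0) for a, b in X]
    y = [1 - c if rng.random() < noise else c for c in y]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    leaky = soft_targets(X, y, n_classes=2)
    oof = soft_targets(X, y, n_classes=2, k=5)
    print("in-sample mean p(label):", round(mean_prob_on_label(leaky, y), 3))
    print("OOF       mean p(label):", round(mean_prob_on_label(oof, y), 3))
```

On this toy, the in-sample targets should put noticeably more mass on the observed (sometimes flipped) label than the OOF targets, a miniature version of soft labels collapsing toward the hard labels. In a real pipeline the OOF soft targets would then become the training targets for the XGBoost or CatBoost student; how the paper feeds them to the trees is not specified in this summary.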
