TabKD: Tabular Knowledge Distillation through Interaction Diversity of Learned Feature Bins
1️⃣ One-sentence summary
This paper proposes a new method, TabKD, which automatically learns groupings of data features and generates synthetic data that broadly covers different feature combinations, thereby effectively compressing the knowledge of a large tabular prediction model into a small model without access to the original, privacy-sensitive data.
Data-free knowledge distillation enables model compression without the original training data, which is critical in privacy-sensitive tabular domains. However, existing methods do not perform well on tabular data because they fail to explicitly address feature interactions, the fundamental way tabular models encode predictive knowledge. We identify interaction diversity, the systematic coverage of feature combinations, as an essential requirement for effective tabular distillation. To operationalize this insight, we propose TabKD, which learns adaptive feature bins aligned with the teacher's decision boundaries, then generates synthetic queries that maximize pairwise interaction coverage. Across 4 benchmark datasets and 4 teacher architectures, TabKD achieves the highest student-teacher agreement in 14 out of 16 configurations, outperforming 5 state-of-the-art baselines. We further show that interaction coverage correlates strongly with distillation quality, validating our core hypothesis. Our work establishes interaction-focused exploration as a principled framework for tabular model extraction.
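To make the "pairwise interaction coverage" objective concrete, here is a minimal sketch of how such a metric could be computed. This is not the paper's implementation: the bin edges below are simple quantiles, whereas TabKD learns bins aligned with the teacher's decision boundaries; the function names and `n_bins` value are illustrative assumptions.

```python
# Hedged sketch (not TabKD's actual code): measuring pairwise interaction
# coverage of a query set over per-feature bins. TabKD learns its bins from
# teacher decision boundaries; here we use plain quantile bins as a stand-in.
import itertools
import numpy as np

def bin_features(X, n_bins=4):
    """Assign each feature value to a quantile-based bin index (0..n_bins-1)."""
    bins = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        # interior quantile edges: n_bins - 1 cut points per feature
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        bins[:, j] = np.digitize(X[:, j], edges)
    return bins

def pairwise_interaction_coverage(bin_ids, n_bins=4):
    """Fraction of (feature pair, bin pair) combinations present in the data."""
    n, d = bin_ids.shape
    covered, total = 0, 0
    for j, k in itertools.combinations(range(d), 2):
        seen = {(a, b) for a, b in zip(bin_ids[:, j], bin_ids[:, k])}
        covered += len(seen)
        total += n_bins * n_bins
    return covered / total

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))  # stand-in for a batch of synthetic queries
cov = pairwise_interaction_coverage(bin_features(X), n_bins=4)
print(f"pairwise interaction coverage: {cov:.2f}")
```

A query generator in this spirit would propose candidate inputs and prefer those that raise this coverage score, pushing the synthetic set toward unseen bin combinations.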
Source: arXiv: 2603.15481