TabClustPFN: A Prior-Fitted Network for Tabular Data Clustering

📄 Abstract - TabClustPFN: A Prior-Fitted Network for Tabular Data Clustering

Clustering tabular data is a fundamental yet challenging problem due to heterogeneous feature types, diverse data-generating mechanisms, and the absence of transferable inductive biases across datasets. Prior-fitted networks (PFNs) have recently demonstrated strong generalization in supervised tabular learning by amortizing Bayesian inference under a broad synthetic prior. Extending this paradigm to clustering is nontrivial: clustering is unsupervised, admits a combinatorial and permutation-invariant output space, and requires inferring the number of clusters. We introduce TabClustPFN, a prior-fitted network for tabular data clustering that performs amortized Bayesian inference over both cluster assignments and cluster cardinality. Pretrained on synthetic datasets drawn from a flexible clustering prior, TabClustPFN clusters unseen datasets in a single forward pass, without dataset-specific retraining or hyperparameter tuning. The model naturally handles heterogeneous numerical and categorical features and adapts to a wide range of clustering structures. Experiments on synthetic data and curated real-world tabular benchmarks show that TabClustPFN outperforms classical, deep, and amortized clustering baselines, while exhibiting strong robustness in out-of-the-box exploratory settings. Code is available at this https URL.
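To make the "permutation-invariant output space" point concrete, here is a minimal sketch. It is not taken from the paper and is not TabClustPFN's method. The helper `canonical_partition` is a hypothetical name introduced only for illustration. It shows that two label vectors which differ only in how cluster IDs are named describe the same clustering, which is why a clustering model cannot be trained or scored by naively comparing raw label IDs.

```python
def canonical_partition(labels):
    """Relabel clusters by order of first appearance, so that any two
    labelings describing the same partition map to the same list.
    (Hypothetical helper for illustration; not from the paper.)"""
    mapping = {}
    canonical = []
    for label in labels:
        if label not in mapping:
            # Assign the next unused integer to a cluster ID seen for the first time.
            mapping[label] = len(mapping)
        canonical.append(mapping[label])
    return canonical


a = [0, 0, 1, 1, 2]
b = [2, 2, 0, 0, 1]  # same grouping as `a`, cluster IDs permuted
c = [0, 1, 1, 1, 2]  # a genuinely different grouping

same_ab = canonical_partition(a) == canonical_partition(b)
same_ac = canonical_partition(a) == canonical_partition(c)

# The number of clusters is also a property of the partition, not of the IDs.
num_clusters = len(set(canonical_partition(a)))
```

Here `a` and `b` compare equal after canonicalization while `a` and `c` do not, even though `a` and `b` disagree on every raw label.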


1️⃣ One-Sentence Summary

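This paper proposes TabClustPFN, a new AI model that learns general clustering patterns in advance from large amounts of synthetic data. It can then cluster new tabular data quickly, accurately, and automatically, without retraining or parameter tuning for each new dataset, which addresses the difficulty of clustering tabular data whose features are complex and varied.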
