TabClustPFN: A Prior-Fitted Network for Tabular Data Clustering

Abstract: Clustering tabular data is a fundamental yet challenging problem due to heterogeneous feature types, diverse data-generating mechanisms, and the absence of transferable inductive biases across datasets. Prior-fitted networks (PFNs) have recently demonstrated strong generalization in supervised tabular learning by amortizing Bayesian inference under a broad synthetic prior. Extending this paradigm to clustering is nontrivial: clustering is unsupervised, admits a combinatorial and permutation-invariant output space, and requires inferring the number of clusters. We introduce TabClustPFN, a prior-fitted network for tabular data clustering that performs amortized Bayesian inference over both cluster assignments and cluster cardinality. Pretrained on synthetic datasets drawn from a flexible clustering prior, TabClustPFN clusters unseen datasets in a single forward pass, without dataset-specific retraining or hyperparameter tuning. The model naturally handles heterogeneous numerical and categorical features and adapts to a wide range of clustering structures. Experiments on synthetic data and curated real-world tabular benchmarks show that TabClustPFN outperforms classical, deep, and amortized clustering baselines, while exhibiting strong robustness in out-of-the-box exploratory settings. Code is available at this https URL.
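
As a rough illustration of two ideas in the abstract, the sketch below draws one synthetic dataset from a toy clustering prior in which the number of clusters is itself random, and compares two labelings up to a renaming of the cluster labels (permutation invariance). It is a minimal sketch under stated assumptions, not the authors' method: the Gaussian-mixture prior, the function names `sample_clustering_task` and `same_partition`, and all parameter values are hypothetical and do not come from the paper.

```python
import random

def sample_clustering_task(rng, max_clusters=5, n_points=60, n_features=2):
    """Draw one synthetic dataset from a toy Gaussian-mixture prior.

    Hypothetical stand-in for a clustering prior; not the prior used in the paper.
    """
    # The cluster cardinality is sampled, so it must be inferred from the data.
    k = rng.randint(2, max_clusters)
    centers = [[rng.uniform(-10, 10) for _ in range(n_features)] for _ in range(k)]
    # Give every cluster at least one point, then assign the rest at random.
    labels = list(range(k)) + [rng.randrange(k) for _ in range(n_points - k)]
    rng.shuffle(labels)
    # Each point is its cluster center plus unit-variance Gaussian noise.
    X = [[rng.gauss(centers[z][j], 1.0) for j in range(n_features)] for z in labels]
    return X, labels, k

def same_partition(a, b):
    """Return True if two labelings induce the same partition, ignoring label names."""
    if len(a) != len(b):
        return False
    fwd, bwd = {}, {}
    for x, y in zip(a, b):
        # The label mapping must be consistent in both directions (a bijection).
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return False
    return True

rng = random.Random(0)
X, labels, k = sample_clustering_task(rng)
print(len(X), "points,", k, "clusters")
print(same_partition([0, 0, 1, 2], [2, 2, 0, 1]))  # same grouping, renamed labels
```

In a PFN-style setup, many such (dataset, labels, cardinality) triples would serve as pretraining examples, and any training loss or evaluation metric would need to treat relabeled but otherwise identical partitions as equivalent, as `same_partition` does.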

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2601.21656 [cs.LG]
	(or arXiv:2601.21656v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2601.21656 arXiv-issued DOI via DataCite

Submission history

From: Qiong Zhang
[v1] Thu, 29 Jan 2026 12:56:41 UTC (5,185 KB)
[v2] Fri, 30 Jan 2026 07:18:19 UTC (10,372 KB)
[v3] Thu, 14 May 2026 04:07:25 UTC (5,268 KB)