Abstract: Clustering is widely used for unsupervised structure discovery, yet it offers limited insight into how reliable each individual assignment is. Diagnostics such as convergence behavior or objective values may reflect global quality, but they do not indicate whether particular instances are assigned confidently, especially for initialization-sensitive algorithms like k-means. This assignment-level instability can undermine both accuracy and robustness. Ensemble approaches improve global consistency by aggregating multiple runs, but they typically lack tools for quantifying pointwise confidence in a way that combines cross-run agreement with geometric support from the learned cluster structure. This work introduces CAKE (Confidence in Assignments via K-partition Ensembles), a framework that evaluates each point using two complementary statistics computed over a clustering ensemble: assignment stability and consistency of local geometric fit. These are combined into a single, interpretable score in [0,1]. Theoretical analysis shows that CAKE remains effective under noise and separates stable from unstable points. Experiments on synthetic and real-world datasets indicate that CAKE effectively highlights ambiguous points and stable core members, providing a confidence ranking over instances that can be used for selection or prioritization in downstream clustering workflows.
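The abstract does not state CAKE's formulas, so the block below is only a hypothetical sketch of the general idea (per-point confidence from a k-means ensemble), not the authors' method. All names (`kmeans`, `confidence_scores`) and every statistic are assumptions chosen for illustration: stability is taken as the fraction of runs that agree with a point's most frequent label after aligning each run to the first one by brute-force permutation search (practical only for small k); the geometric term is a stand-in "relative margin" (d2 - d1) / d2 between the distances to the nearest and second-nearest centroid, averaged over runs; and the two are combined by a simple product so the score stays in [0,1].

```python
import itertools

import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means with random-point initialization (illustrative stand-in)."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distances from every point to every center, shape (n, k).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), np.sqrt(d)


def confidence_scores(X, k, n_runs=20, seed=0):
    """Hypothetical per-point confidence: stability * mean relative margin."""
    rng = np.random.default_rng(seed)
    ref = None
    aligned, margins = [], []
    for _ in range(n_runs):
        labels, dist = kmeans(X, k, rng)
        if ref is None:
            ref = labels  # the first run serves as the labeling reference
        else:
            # Relabel this run with the permutation that best matches the reference.
            best = max(itertools.permutations(range(k)),
                       key=lambda p: np.sum(np.asarray(p)[labels] == ref))
            labels = np.asarray(best)[labels]
        aligned.append(labels)
        # Relative margin between nearest and second-nearest centroid, in [0, 1].
        s = np.sort(dist, axis=1)
        margin = (s[:, 1] - s[:, 0]) / np.maximum(s[:, 1], 1e-12)
        margins.append(np.where(s[:, 1] > 0, margin, 0.0))
    A = np.stack(aligned)  # shape (n_runs, n)
    # Stability: share of runs that agree with each point's most frequent label.
    counts = np.stack([(A == j).sum(axis=0) for j in range(k)])  # shape (k, n)
    stability = counts.max(axis=0) / n_runs
    geometric = np.mean(margins, axis=0)
    return stability * geometric
```

For example, with two well-separated blobs plus one point placed midway between them, the midway point should receive a much lower score than the blob members, since it sits almost equidistant from both centroids in every run. Sorting the scores (for instance with `np.argsort(-scores)`) then gives the kind of instance ranking the abstract describes for selection or prioritization, under the assumptions stated above.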
| Comments: | 37 pages, including appendix |
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2602.18435 [cs.LG] (or arXiv:2602.18435v2 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2602.18435 (arXiv-issued DOI via DataCite) |
| Journal reference: | Machine Learning with Applications, Volume 24, 2026, Article 100915 |
| Related DOI: | https://doi.org/10.1016/j.mlwa.2026.100915 |
Submission history
From: Aggelos Semoglou
[v1] Fri, 20 Feb 2026 18:59:53 UTC (3,859 KB)
[v2] Thu, 14 May 2026 14:58:46 UTC (4,207 KB)
