Abstract: Clinical dietary assessment can generate detailed but high-dimensional nutrient and food-group information that is difficult to translate quickly into counselling priorities. This paper proposes an explainable unsupervised-to-supervised machine learning framework for discovering, reproducing and interpreting dietary patterns using public UK National Diet and Nutrition Survey (NDNS) data. Adult participants aged 19 years and above from NDNS Years 12-15 were represented using 25 energy-adjusted nutrient and food-group features. K-means, Gaussian Mixture Models and Agglomerative Clustering were compared across k = 2-8 clusters, with stability and dietetic interpretability used alongside internal validation metrics to select a solution. The selected K-means solution (k = 4) identified four interpretable dietary patterns: (1) high fat/meat and sodium; (2) higher-fibre fruit-vegetable micronutrient; (3) high free-sugar snacks and sugary drinks; and (4) dairy/cereal calcium-rich saturated-fat. A supervised surrogate classifier reproduced held-out cluster membership with high test performance (macro-F1 = 0.963), but it was interpreted only as an explanatory surrogate, not as an independent clinical prediction model. SHAP analysis linked predictions to dietetically meaningful drivers, suggesting potential value for dietitian-in-the-loop assessment, counselling prioritisation and follow-up monitoring.
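For readers unfamiliar with the macro-F1 figure quoted in the abstract, the following is a minimal sketch of the standard definition: the unweighted mean of per-class F1 scores, so that each of the four clusters counts equally regardless of its size. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper, which does not state how the metric was computed (scikit-learn's `f1_score` with `average="macro"` is one common choice).

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct assignment to cluster t
        else:
            fp[p] += 1          # predicted cluster p, but it was wrong
            fn[t] += 1          # true cluster t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical held-out cluster labels (0-3) and surrogate predictions.
true_clusters = [0, 0, 0, 1, 1, 2, 2, 3]
pred_clusters = [0, 0, 1, 1, 1, 2, 2, 3]
print(round(macro_f1(true_clusters, pred_clusters), 3))  # 0.9
```

In this toy case one participant from cluster 0 is assigned to cluster 1, giving per-class F1 scores of 0.8, 0.8, 1.0 and 1.0. Because the average is unweighted, an error in a small cluster lowers the score as much as an error in a large one, which is why macro-F1 suits cluster sizes that may be imbalanced.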
| Comments: | 12 pages, 6 figures, 9 tables. Accepted by the 14th International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA 2026) |
| Subjects: | Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG) |
| ACM classes: | I.2.6; I.5.3; J.3 |
| Cite as: | arXiv:2605.08242 [q-bio.QM] |
| | (or arXiv:2605.08242v1 [q-bio.QM] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.08242 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Chun Yin Chiu
[v1] Thu, 7 May 2026 09:05:14 UTC (779 KB)
