Learning with Shallow Neural Networks on Cluster-Structured Features

Abstract: The success of deep learning in high-dimensional settings is often attributed to the presence of low-dimensional structure in real-world data. While standard theoretical models typically assume that this structure lies in the target function, projecting unstructured inputs onto a low-dimensional subspace, data such as images, text or genomic sequences exhibit strong spatial correlations within the input space itself. In this paper, we propose a tractable model to study how these correlations affect the sample complexity of learning with gradient descent on shallow neural networks. Specifically, we consider targets that depend on a small number of latent Boolean variables, and input features grouped into clusters and correlated with the latent variables. Under an identifiability assumption, we show that for a layerwise gradient-descent variant, the sample complexity scales with the number of hidden variables and, when the signal-to-noise ratio is sufficiently high, is independent of the input dimension, up to logarithmic terms. We empirically test our theoretical findings on both synthetic and real data.
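
The abstract does not specify the exact data distribution, so the following is only an illustrative sketch of what cluster-structured features driven by latent Boolean variables could look like. Everything concrete here is an assumption made for illustration, not the paper's construction: the function name `sample_clustered_data`, the ±1 encoding, the additive Gaussian noise, the `snr` scaling, equal-sized clusters, and the parity target.

```python
import numpy as np

def sample_clustered_data(n, k, cluster_size, snr, rng):
    """Draw n samples with k latent +/-1 variables and k feature clusters.

    Hypothetical generator: each cluster holds noisy copies of one latent variable.
    """
    # Latent Boolean variables, encoded as +/-1 (assumed uniform and independent).
    z = rng.choice([-1.0, 1.0], size=(n, k))
    # Copy each latent variable across its own block of cluster_size features.
    signal = np.repeat(z, cluster_size, axis=1)
    # Assumed noise model: independent standard Gaussian noise on every feature.
    x = snr * signal + rng.standard_normal(signal.shape)
    # Assumed target: parity of the first two latent variables.
    y = z[:, 0] * z[:, 1]
    return x, y, z

rng = np.random.default_rng(0)
k, cluster_size = 4, 25
x, y, z = sample_clustered_data(n=2000, k=k, cluster_size=cluster_size, snr=1.0, rng=rng)

# Averaging within a cluster shrinks the noise, so the sign of each
# cluster mean recovers the corresponding latent variable most of the time.
cluster_means = x.reshape(len(x), k, cluster_size).mean(axis=2)
z_hat = np.sign(cluster_means)
recovery_rate = (z_hat == z).mean()
print(f"input dimension: {x.shape[1]}, latent recovery rate: {recovery_rate:.3f}")
```

In this toy setting the target depends on the 100-dimensional input only through the 4 latent variables, which is the kind of structure the abstract ties to a sample complexity governed by the number of hidden variables rather than the input dimension.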

Comments:	10 pages main body, 2 figures
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2605.14927 [cs.LG]
	(or arXiv:2605.14927v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2605.14927 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Elisabetta Cornacchia
[v1] Thu, 14 May 2026 15:02:24 UTC (828 KB)