Abstract: In many business settings, task-specific labeled data are scarce or costly to obtain, which limits supervised learning on a target task. A classical response is transfer learning (TL). Many TL works study how to transfer information from related sources. We instead study, for linear regression and classification, when to transfer via sample sharing: in a multi-source setting, we greedily decide from which sources, and how many samples from each, to incorporate into the target dataset. Our method uses an accept/reject rule based on a data-dependent estimate of the transfer gain, i.e., the marginal decrease in target predictive error, computed conditionally on the observed target samples. We analyze our approach and show that the derived statistical test enforces positive transfer with high probability. Under additional standard conditions, we also study the transfer gain itself and characterize when transfer is beneficial. Experiments on synthetic and real data show consistent gains over classical and recent strong baselines while avoiding negative transfer.
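To make the accept/reject idea concrete, the following is a minimal sketch of greedy sample sharing for linear regression. It is not the paper's procedure: the abstract does not specify the gain estimator or the statistical test, so this sketch assumes a simple stand-in, namely the decrease in mean squared error on held-out target samples, compared against a hypothetical `margin` threshold. All names (`fit_ols`, `mse`, `greedy_sample_sharing`, `batch`, `margin`) and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_ols(X, y, ridge=1e-8):
    """Least-squares fit with a tiny ridge term for numerical stability."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)

def mse(w, X, y):
    """Mean squared prediction error of weights w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

def greedy_sample_sharing(X_tgt, y_tgt, X_val, y_val, sources, batch=20, margin=0.0):
    """Add source samples batch by batch while the estimated gain exceeds margin.

    ASSUMPTION: the gain is estimated as the drop in held-out target MSE.
    This replaces the paper's data-dependent estimate and test, which the
    abstract does not describe in detail.
    """
    X_pool, y_pool = X_tgt, y_tgt
    err = mse(fit_ols(X_pool, y_pool), X_val, y_val)
    taken = {}
    for name, (Xs, ys) in sources.items():
        taken[name] = 0
        for start in range(0, len(ys), batch):
            Xb, yb = Xs[start:start + batch], ys[start:start + batch]
            X_new = np.vstack([X_pool, Xb])
            y_new = np.concatenate([y_pool, yb])
            err_new = mse(fit_ols(X_new, y_new), X_val, y_val)
            if err - err_new <= margin:
                break  # reject this batch and stop drawing from this source
            X_pool, y_pool, err = X_new, y_new, err_new  # accept the batch
            taken[name] += len(yb)
    return fit_ols(X_pool, y_pool), taken, err

# Synthetic illustration: one source shares the target model, one is sign-flipped.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])

def make_data(n, w, noise=0.1):
    X = rng.normal(size=(n, 3))
    return X, X @ w + noise * rng.normal(size=n)

X_tgt, y_tgt = make_data(10, w_true)    # scarce target training data
X_val, y_val = make_data(200, w_true)   # held-out target data for the gain estimate
sources = {
    "similar": make_data(200, w_true),
    "shifted": make_data(200, -w_true),
}

err0 = mse(fit_ols(X_tgt, y_tgt), X_val, y_val)
w_hat, taken, err = greedy_sample_sharing(X_tgt, y_tgt, X_val, y_val, sources)
print(taken, err0, err)
```

In this toy setup the sign-flipped source should contribute no samples, because its first batch raises the held-out error, and the final held-out error can never exceed the target-only error since a batch is accepted only when it lowers that error. A held-out estimate like this one carries no high-probability guarantee of positive transfer; that guarantee is what the paper's statistical test is stated to provide.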
| Subjects: | Machine Learning (stat.ML); Machine Learning (cs.LG); Other Statistics (stat.OT) |
| Cite as: | arXiv:2510.16986 [stat.ML] |
| | (or arXiv:2510.16986v2 [stat.ML] for this version) |
| | https://doi.org/10.48550/arXiv.2510.16986 (arXiv-issued DOI via DataCite) |
Submission history
From: Hamza Cherkaoui PhD
[v1] Sun, 19 Oct 2025 20:03:48 UTC (130 KB)
[v2] Wed, 13 May 2026 08:20:57 UTC (161 KB)
