Abstract: Training deep neural networks at scale can benefit from domain decomposition, where the network is split into subdomains trained in parallel and coupled by a global trust-region mechanism. Building on the Additively Preconditioned Trust-Region Strategy (APTS), we propose a non-monotone variant with a nonlinear additive Schwarz preconditioner that combines parallel subdomain corrections with global coarse-space directions. A windowed acceptance criterion allows controlled objective increases, avoiding needless rejection of effective coarse steps. The resulting non-monotone APTS (NAPTS) preserves accuracy while reducing CPU time by 30% and cutting rejected steps to one third of those in APTS.
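The windowed acceptance idea can be illustrated with a minimal sketch. It assumes the classical non-monotone rule in which the trial objective is compared against the maximum objective value over a sliding window of recent iterates; the abstract does not state the exact NAPTS criterion, so the reference value, the threshold `eta`, the window length, and all function names here are illustrative assumptions rather than the authors' method.

```python
from collections import deque


def nonmonotone_ratio(f_history, f_trial, predicted_reduction):
    """Actual-to-predicted reduction, measured against the window maximum.

    Assumption: the reference value is max(f_history), as in classical
    non-monotone trust-region schemes; NAPTS may use a different rule.
    """
    f_ref = max(f_history)
    return (f_ref - f_trial) / predicted_reduction


def accept_step(f_history, f_trial, predicted_reduction, eta=0.1):
    """Accept the trial step if the non-monotone ratio reaches eta (hypothetical threshold)."""
    return nonmonotone_ratio(f_history, f_trial, predicted_reduction) >= eta


# Sliding window holding the last three accepted objective values.
window = deque([1.00, 0.80, 0.90], maxlen=3)
f_trial = 0.95   # slightly worse than the current value 0.90
pred = 0.05      # reduction predicted by the trust-region model (must be > 0)

# A monotone test compares against the current value only and rejects the step.
monotone_ok = (window[-1] - f_trial) / pred >= 0.1

# The windowed test compares against the window maximum 1.00 and accepts it.
nonmonotone_ok = accept_step(window, f_trial, pred)

if nonmonotone_ok:
    window.append(f_trial)  # the oldest value drops out automatically
```

In this toy setting the monotone test rejects a step that raises the objective from 0.90 to 0.95, while the windowed test accepts it because the value still lies below the window maximum. That is the kind of controlled increase the abstract refers to.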
| Comments: | 7 pages, 2 figures |
| Subjects: | Optimization and Control (math.OC); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.14860 [math.OC] (or arXiv:2605.14860v1 [math.OC] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.14860 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Bindi Capriqi
[v1] Thu, 14 May 2026 14:06:51 UTC (166 KB)
