Abstract: Decision-focused learning trains predictive models end-to-end against downstream decision loss, but online settings suffer delayed feedback: outcomes may not arrive for many environment interactions. We identify \emph{staleness amplification}, a failure mode unique to bilevel optimization under delay, in which gradient staleness couples with inner-solver sensitivity to inflate regret beyond single-level delay theory. We prove that any black-box delayed optimizer incurs an irreducible regret cost from inner-solver approximation error, and that gradient staleness contributes a quadratically growing transport error without bilevel-aware correction. Our algorithm, \textbf{IGT-OMD}, applies Implicit Gradient Transport to hypergradients within Online Mirror Descent, re-evaluating stale gradients at the current parameters using stored inner solutions. This method reduces transport error from a quadratic to a linear dependence on delay and achieves the first sublinear regret bound for delayed bilevel optimization with queue-length-adaptive step sizes. Controlled experiments provide a \emph{mechanistic fingerprint}: transport benefit is exactly $0.0\%$ ($p=1.00$) at unit delay and grows monotonically to $9.5\%$ at fifty rounds ($p<0.001$), isolating the correction's effect. On Linear Quadratic Regulator, Warcraft shortest-path, and Sinkhorn optimal transport, IGT-OMD reduces decision loss by $17$--$55\%$ relative to single-level baselines, with phase transitions matching the theory.
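The idea of re-evaluating a stale hypergradient at the current parameters from a stored inner solution can be illustrated with a toy sketch. This is not the paper's IGT-OMD implementation: the scalar problem, the Euclidean mirror map (plain gradient steps), the delay convention, the step-size rule `eta / (1 + queue length)`, and all names (`A`, `TARGET`, `inner_solve`, `hypergradient`, `run`) are assumptions made for illustration only.

```python
from collections import deque

A = 0.8        # assumed sensitivity of the toy inner solution: y*(theta) = A * theta
TARGET = 1.0   # assumed outer loss: 0.5 * (y*(theta) - TARGET) ** 2


def inner_solve(theta):
    """Closed-form inner solution of the toy problem (hypothetical)."""
    return A * theta


def hypergradient(y):
    """Gradient of the outer loss w.r.t. theta, using dy/dtheta = A."""
    return A * (y - TARGET)


def run(delay, rounds=200, eta=0.5, transport=True):
    """Delayed online gradient steps on theta.

    Assumed convention: feedback for round t arrives at round t + delay - 1,
    so delay=1 means no staleness.
    """
    theta = 0.0
    queue = deque()  # pending feedback: (arrival_round, theta_then, y_then)
    for t in range(rounds):
        # Play theta and store the inner solution computed for it.
        queue.append((t + delay - 1, theta, inner_solve(theta)))
        # Assumed queue-length-adaptive step size.
        step = eta / (1 + len(queue))
        while queue and queue[0][0] <= t:
            _, theta_old, y_old = queue.popleft()
            if transport:
                # Move the stored inner solution to the current theta
                # (exact here because the toy inner solution is linear).
                y = y_old + A * (theta - theta_old)
            else:
                # Use the stale inner solution as-is.
                y = y_old
            theta -= step * hypergradient(y)
    return theta
```

Calling `run(10, transport=True)` and `run(10, transport=False)` compares the corrected and uncorrected updates under the same delay. In this toy, the two variants coincide at `delay=1` because the stored and current parameters are identical when feedback arrives.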
| Comments: | 9 pages, 4 figures, NeurIPS 2026 conference |
| Subjects: | Machine Learning (cs.LG) |
| MSC classes: | 68Q25, 90C25, 90C47 |
| ACM classes: | I.2.6; G.1.6 |
| Cite as: | arXiv:2605.12693 [cs.LG] |
| | (or arXiv:2605.12693v1 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2605.12693 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Wesley Marrero
[v1] Tue, 12 May 2026 19:43:49 UTC (121 KB)
