Low-Rank Adapters Initialization via Gradient Surgery for Continual Learning

Abstract: LoRA is widely adopted for continual fine-tuning of Large Language Models due to its parameter efficiency, modularity across tasks, and compatibility with replay strategies. However, LoRA-based continual learning remains vulnerable to catastrophic forgetting, whose severity depends on how successive task gradients interact: when consecutive task gradients conflict, standard adapter initializations channel updates into subspaces that overwrite previously learned directions. We propose SLICE, a gradient-surgery-based initialization for LoRA adapters in continual learning. SLICE accumulates gradients from both the current task and a replay buffer of prior tasks, reconciles them through a projection operator, and decomposes the result via truncated SVD to initialize the adapter weights. We evaluate SLICE on the TRACE benchmark and sequences of Super-NI tasks, including a set of adversarial Super-NI sequences that we construct by mining task pairs with maximally opposing gradients. Compared to vanilla LoRA, LoRA-GA, and LoRAM, SLICE consistently achieves a better stability-plasticity trade-off, improving Average Performance, Final Performance, and Forgetting metrics while preserving General Performance and In-Context Performance across both standard and adversarial continual learning sequences.
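The abstract names the ingredients of SLICE but not their exact form, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a PCGrad-style projection as the reconciliation operator (when two gradients have a negative inner product, each has its component along the other removed), splits the square roots of the top-r singular values evenly between the two LoRA factors, and subtracts the initial adapter product from the frozen weight so the layer's output is unchanged at step 0 (also an assumption of this sketch). It also assumes that "maximally opposing gradients" means the most negative cosine similarity between flattened task gradients. All function names (`project_conflicting`, `reconcile`, `svd_init`, `most_conflicting_pair`) are hypothetical.

```python
import numpy as np


def project_conflicting(g_a, g_b):
    """Assumed PCGrad-style surgery: if g_a conflicts with g_b (negative
    inner product), remove g_a's component along g_b."""
    dot = float(np.sum(g_a * g_b))
    if dot < 0.0:
        g_a = g_a - dot / float(np.sum(g_b * g_b)) * g_b
    return g_a


def reconcile(g_cur, g_rep):
    """Combine current-task and replay-buffer gradients after projecting each
    against the *original* other gradient, then summing."""
    return project_conflicting(g_cur, g_rep) + project_conflicting(g_rep, g_cur)


def svd_init(grad, rank, scale=1.0):
    """Hypothetical LoRA init from the top-`rank` singular directions of `grad`.
    Returns B (d_out x rank) and A (rank x d_in) with B @ A equal to
    scale**2 times the best rank-`rank` approximation of `grad`."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    root = np.sqrt(s[:rank])                 # split singular values between factors
    b = scale * u[:, :rank] * root           # scales each left singular vector
    a = scale * root[:, None] * vt[:rank]    # scales each right singular vector
    return b, a


def most_conflicting_pair(task_grads):
    """Return the task pair whose flattened gradients have the most negative
    cosine similarity (assumes no gradient has zero norm)."""
    names = list(task_grads)
    best, best_cos = None, np.inf
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            x = task_grads[names[i]].ravel()
            y = task_grads[names[j]].ravel()
            cos = float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
            if cos < best_cos:
                best, best_cos = (names[i], names[j]), cos
    return best, best_cos


# Toy example: one linear layer with weight w0 of shape (d_out, d_in).
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2
w0 = rng.normal(size=(d_out, d_in))
g_cur = rng.normal(size=(d_out, d_in))                        # current-task gradient
g_rep = -0.5 * g_cur + 0.1 * rng.normal(size=(d_out, d_in))   # deliberately conflicting replay gradient

g = reconcile(g_cur, g_rep)
b, a = svd_init(g, r)
w_base = w0 - b @ a   # offset frozen weight so w_base + b @ a == w0 at initialization
```

With this projection, the reconciled gradient has a non-negative inner product with both the current-task and replay gradients, so the adapter is not initialized in a direction that immediately undoes either one. The square-root split keeps the two factors on a similar scale, which is one common choice rather than anything the abstract specifies.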

Subjects:	Machine Learning (cs.LG)
ACM classes:	I.2.6
Cite as:	arXiv:2605.12752 [cs.LG]
	(or arXiv:2605.12752v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2605.12752 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Lucas Kupssinskü
[v1] Tue, 12 May 2026 21:06:03 UTC (248 KB)