Understanding Catastrophic Forgetting In LoRA via Mean-Field Attention Dynamics

Abstract: Low-Rank Adaptation (LoRA) is the dominant parameter-efficient fine-tuning method due to its favorable compute-performance trade-off, yet it suffers from catastrophic forgetting. We study forgetting through a tractable _mean-field self-attention_ toy model, where tokens evolve as an interacting particle system and LoRA acts as a low-rank perturbation. Using tools from partial differential equations and dynamical systems, we characterize regimes suggesting phase transitions between forgetting and non-forgetting behavior. We show that one transition appears with respect to the norm of the perturbation, and the other with respect to the depth of the Transformer. We further bound the time-to-deviation in terms of the perturbation size and spectral quantities, and corroborate the predicted trends with experiments and exploratory analyses on real models under LoRA fine-tuning.
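
The setting described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it assumes a common form of softmax self-attention dynamics on the unit sphere, discretized with explicit Euler steps (one step per layer), and applies a random low-rank perturbation to the value matrix only. All names (`attention_velocity`, `simulate`, `lora_perturb`, `deviation`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def attention_velocity(X, Q, K, V, beta=1.0):
    """Softmax self-attention drift for n tokens stored as rows of X (n, d)."""
    # scores[i, j] = beta * <Q x_i, K x_j>
    scores = beta * (X @ Q.T) @ (X @ K.T).T
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Row i is sum_j weights[i, j] * V x_j
    return weights @ (X @ V.T)


def simulate(X0, Q, K, V, depth, dt=0.1, beta=1.0):
    """Evolve tokens on the unit sphere for `depth` Euler steps (assumed dynamics)."""
    X = X0.copy()
    for _ in range(depth):
        drift = attention_velocity(X, Q, K, V, beta)
        # Keep only the component of the drift tangent to the sphere at each token.
        drift -= np.sum(drift * X, axis=1, keepdims=True) * X
        X = X + dt * drift
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # renormalize onto the sphere
    return X


def lora_perturb(W, rank, scale, rng):
    """Return W + B @ A, a random rank-`rank` update with spectral norm `scale`."""
    d = W.shape[0]
    B = rng.standard_normal((d, rank))
    A = rng.standard_normal((rank, d))
    delta = B @ A
    delta *= scale / np.linalg.norm(delta, 2)
    return W + delta


def deviation(n=32, d=8, rank=1, scale=0.1, depth=50, seed=0):
    """Root-mean-square distance between base and perturbed token positions."""
    rng = np.random.default_rng(seed)
    X0 = rng.standard_normal((n, d))
    X0 /= np.linalg.norm(X0, axis=1, keepdims=True)
    Q = K = V = np.eye(d)  # assumption: identity base weights
    V_pert = lora_perturb(V, rank, scale, rng)
    base = simulate(X0, Q, K, V, depth)
    pert = simulate(X0, Q, K, V_pert, depth)
    return np.linalg.norm(base - pert) / np.sqrt(n)


if __name__ == "__main__":
    # Sweep the perturbation norm and the depth, the two axes named in the abstract.
    for scale in (0.0, 0.1, 0.5, 1.0):
        for depth in (10, 50, 200):
            print(f"scale={scale:<4} depth={depth:<4} deviation={deviation(scale=scale, depth=depth):.4f}")
```

Sweeping `scale` and `depth` gives a rough view of how far the perturbed tokens drift from the unperturbed ones. Whether any sharp transition shows up in this simplified setup is not guaranteed, and the sketch makes no claim to reproduce the paper's results.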

Comments:	New version accepted at ICML 2026, with new results and without previous results
Subjects:	Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
Cite as:	arXiv:2402.15415 [cs.LG]
	(or arXiv:2402.15415v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.15415 arXiv-issued DOI via DataCite

Submission history

From: Hugo Koubbi
[v1] Fri, 23 Feb 2024 16:26:01 UTC (16,763 KB)
[v2] Wed, 13 May 2026 16:26:02 UTC (6,719 KB)