Abstract: Continual learning in multimodal large language models (MLLMs) aims to acquire knowledge sequentially while mitigating catastrophic forgetting, yet existing methods face inherent limitations. Architecture-based approaches incur additional computational overhead and often generalize poorly to new tasks; rehearsal-based methods rely on storing historical data, raising privacy and storage concerns; and conventional regularization-based strategies alone are insufficient to fully prevent parameter interference. We propose Octopus, a two-stage continual learning framework based on History-Free Gradient Orthogonalization (HiFGO), which enforces gradient-level orthogonality without historical task data. Its two-stage finetuning strategy decouples task adaptation from regularization, achieving a principled balance between plasticity and stability. Experiments on UCIT show that Octopus establishes state-of-the-art performance, surpassing the prior SOTA by 2.14% and 6.82% on the Avg and Last metrics, respectively.
| Subjects: | Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.14938 [cs.LG] |
| | (or arXiv:2605.14938v1 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2605.14938 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Yuehao Liu
[v1] Thu, 14 May 2026 15:13:24 UTC (1,636 KB)
