Abstract:Parameter-efficient fine-tuning (PEFT) has become a practical route for adapting large language models to downstream tasks, with LoRA-style methods being particularly attractive because they are inexpensive to train and easy to deploy. Most LoRA variants, however, revise the update rule within the weight space of each layer and leave the intermediate representations formed by deeper layers largely unused. We propose Echo-LoRA, a cross-layer representation injection method for parameter-efficient fine-tuning. During training, Echo-LoRA collects boundary hidden states from deeper source layers, aggregates them into a sample-level echo representation, and uses lightweight projection and gating networks to inject the resulting signal into shallow LoRA or DoRA modules. Answer-only masking, masked distillation, and stochastic routing are used to keep this auxiliary path stable and to reduce the gap between training and inference. On eight commonsense reasoning benchmarks, Echo-LoRA exceeds the reported LoRA baselines by 5.7 percentage points on average across LLaMA-7B, LLaMA2-7B, and LLaMA3-8B. Under reproduced LoRA baselines in our unified implementation, the average gain is 3.0 points; when combined with DoRA, the gain is 2.7 points. The Echo path is discarded after training, so the deployed model keeps the original low-rank LoRA/DoRA form and adds neither inference-time parameters nor inference computation.
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.08177 [cs.LG] |
| (or arXiv:2605.08177v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2605.08177 arXiv-issued DOI via DataCite |
Submission history
From: Peng Jin [view email]
[v1]
Tue, 5 May 2026 08:26:11 UTC (2,809 KB)
