Abstract: Context: Retrieval-augmented code generation relies on cross-file repository context, but retrieved snippets may come from obsolete project states.
Objectives: We study whether temporally stale repository snippets act as harmless noise or actively induce current-state-incompatible code.
Methods: We conduct a controlled diagnostic study on a curated 17-sample set of production-helper signature changes from five Python repositories. For each sample, we compare current-only, stale-only, no-retrieval, and mixed current/stale retrieval conditions under prompts that hide commit freshness and expected current signatures.
Results: Under neutralized prompts, stale-only retrieval induces stale helper references on 15/17 Qwen2.5-Coder-7B-Instruct samples and 13/17 gpt-4.1-mini samples, corresponding to 88.2 and 76.5 percentage-point increases over current-only retrieval. No retrieval produces zero stale references but only 1/17 passing completions. The two models share 75.0% Jaccard overlap among stale-triggering samples, and mixed conditions show that adding valid current evidence largely rescues stale-only failures.
Conclusion: Temporal validity of retrieved repository context is a distinct diagnostic variable for Code RAG robustness: stale context can actively bias models toward obsolete repository state rather than merely removing useful evidence.
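The arithmetic behind the reported rates and the Jaccard overlap can be checked with a short sketch. Only the counts (15/17, 13/17, and the 75.0% overlap) come from the abstract. The sample IDs below are hypothetical placeholders, chosen so that the two sets have the reported sizes. The 12-sample intersection is an inference, not a reported figure: with set sizes 15 and 13, it is the only intersection size that yields a Jaccard index of exactly 0.75. Likewise, a current-only stale-reference rate of 0/17 is inferred from the fact that the reported percentage-point increases equal the stale-only rates.

```python
N_SAMPLES = 17  # size of the curated sample set, as stated in the abstract


def rate_pct(hits: int, total: int = N_SAMPLES) -> float:
    """Return hits/total as a percentage rounded to one decimal place."""
    return round(100 * hits / total, 1)


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sample IDs (not from the paper): only the set sizes are grounded.
qwen_stale = set(range(0, 15))  # 15 stale-triggering samples
gpt_stale = set(range(3, 16))   # 13 stale-triggering samples

# Inferred baseline: 0 stale references under current-only retrieval.
current_only_hits = 0

qwen_increase = rate_pct(len(qwen_stale)) - rate_pct(current_only_hits)
gpt_increase = rate_pct(len(gpt_stale)) - rate_pct(current_only_hits)
overlap = jaccard(qwen_stale, gpt_stale)

print(f"Qwen increase: {qwen_increase} percentage points")
print(f"gpt-4.1-mini increase: {gpt_increase} percentage points")
print(f"Jaccard overlap: {overlap:.1%}")
```

Under these assumptions, the two models would agree on 12 stale-triggering samples out of 16 that trigger stale references in at least one model.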
| Comments: | 31 pages, 2 tables. Submitted to Information and Software Technology (Elsevier) |
| Subjects: | Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL) |
| ACM classes: | D.2.5; D.2.7; I.2.7 |
| Cite as: | arXiv:2605.14478 [cs.SE] (or arXiv:2605.14478v1 [cs.SE] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.14478 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Haojun Weng
[v1] Thu, 14 May 2026 07:18:30 UTC (22 KB)
