When Knowledge Is Not Free: Cost-Aware Evidence Selection in Retrieval-Augmented Generation

Abstract: Retrieval-Augmented Generation (RAG) typically assumes that external knowledge is free, but many high-quality sources are paywalled, licensed, restricted, or otherwise costly to access. We introduce cost-aware RAG, a setting where retrieved evidence is assigned access-cost tiers and systems must answer under an explicit evidence-access budget. We instantiate this setting by augmenting MS MARCO v2.1 with access-friction tiers and evaluate budgeted evidence selection across general-domain and domain-specific QA benchmarks. Our results show that static selection is brittle: no fixed selector uniformly dominates, and larger budgets do not reliably improve answer quality, even when costly evidence is domain-matched. We then study agentic cost-aware RAG, where an LLM decides when to retrieve, which tier to access, and when to stop. Agents show strong promise as adaptive evidence-acquisition controllers, but their behavior remains highly model- and task-dependent. These findings suggest that cost-aware evidence acquisition is a central challenge for the next generation of RAG systems. All code and data are available at this https URL.
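
As an illustration of the budgeted evidence selection setting described in the abstract, the following is a minimal sketch of one possible static selector: a greedy pass over passages ranked by retriever score that skips any passage whose tier cost would exceed the remaining budget. It is not the paper's method. The tier names, the costs in `TIER_COST`, the `Passage` fields, and the function `select_under_budget` are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    pid: str      # passage identifier (hypothetical)
    score: float  # retriever relevance score (hypothetical)
    tier: str     # access-cost tier label (hypothetical)


# Assumed cost per tier; the abstract gives no actual values.
TIER_COST = {"free": 0, "licensed": 2, "paywalled": 5}


def select_under_budget(passages, budget, k=3):
    """Greedy static selector: take up to k passages in descending
    score order, skipping any whose tier cost does not fit the budget."""
    chosen, spent = [], 0
    for p in sorted(passages, key=lambda p: p.score, reverse=True):
        cost = TIER_COST[p.tier]
        if len(chosen) < k and spent + cost <= budget:
            chosen.append(p)
            spent += cost
    return chosen, spent


# Toy candidate pool (made-up scores and tiers, for illustration only).
demo = [
    Passage("a", 0.9, "paywalled"),
    Passage("b", 0.8, "free"),
    Passage("c", 0.7, "licensed"),
    Passage("d", 0.6, "free"),
]

for budget in (0, 2, 5):
    chosen, spent = select_under_budget(demo, budget)
    print(budget, [p.pid for p in chosen], spent)
```

In this toy pool, raising the budget from 2 to 5 swaps the licensed passage for the paywalled one instead of adding evidence, which shows how a fixed selector's choices can shift with the budget. Whether such a shift helps answer quality is an empirical question that the sketch does not address.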

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2606.02245 [cs.CL]
	(or arXiv:2606.02245v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2606.02245 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Mingyan Wu
[v1] Mon, 1 Jun 2026 13:39:39 UTC (3,473 KB)