RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search


Abstract: In commercial web search, aligning content freshness with user intent remains challenging because the useful lifespan of information varies widely. Traditional industrial approaches rely on static time-window filtering, which produces "one-size-fits-all" rankings in which content may be chronologically recent but semantically expired. To address this limitation, we present a novel Large Language Model (LLM)-based Query-Aware Dynamic Content Expiration Prediction Framework, deployed in Baidu search, that reformulates timeliness as a dynamic validity inference task. Our framework extracts fine-grained temporal contexts from documents and uses LLMs to deduce a query-specific "validity horizon": a semantic boundary, determined by user intent, that defines when information becomes obsolete. The framework incorporates robust hallucination mitigation strategies to ensure reliability and has been evaluated through offline experiments and online A/B testing on live production traffic. Results show significant improvements in search freshness and user experience metrics, validating the effectiveness of LLM-driven reasoning for addressing semantic expiration at industrial scale.
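To make the contrast between a static time window and a query-specific validity horizon concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the `Doc` record, its `valid_until` field, and both filter functions are hypothetical names introduced for illustration. The LLM inference step that would produce `valid_until` from the query and the document's temporal context is stubbed out by supplying the value directly, and hard filtering is used only for simplicity, since the abstract does not say how the production system consumes the predicted horizon (for example, as a ranking signal).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical search result record, for illustration only."""
    doc_id: str
    published: datetime
    # Assumed field: end of the query-specific validity horizon. In the
    # described framework an LLM would infer this from the query and the
    # document's temporal context; here it is simply supplied.
    # None means no expiry was predicted.
    valid_until: Optional[datetime]


def static_window_filter(docs: List[Doc], now: datetime, window: timedelta) -> List[Doc]:
    """Baseline: keep every document published within a fixed time window."""
    return [d for d in docs if now - d.published <= window]


def validity_horizon_filter(docs: List[Doc], now: datetime) -> List[Doc]:
    """Query-aware variant: keep a document only while its horizon has not passed."""
    return [d for d in docs if d.valid_until is None or now < d.valid_until]


if __name__ == "__main__":
    now = datetime(2026, 5, 13)
    docs = [
        # Recent, but about an event that ended yesterday.
        Doc("event-notice", now - timedelta(days=2), now - timedelta(days=1)),
        # Old, but still valid for this query.
        Doc("evergreen-guide", now - timedelta(days=365), None),
        # Recent and still valid.
        Doc("fresh-update", now - timedelta(hours=3), now + timedelta(days=30)),
    ]
    print([d.doc_id for d in static_window_filter(docs, now, timedelta(days=7))])
    # ['event-notice', 'fresh-update']
    print([d.doc_id for d in validity_horizon_filter(docs, now)])
    # ['evergreen-guide', 'fresh-update']
```

With a 7-day window, the static filter keeps a two-day-old notice about an event that has already passed and drops a year-old guide that is still valid; the horizon-based filter reverses both decisions, which is the "chronologically recent but semantically expired" case described above.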

Comments:	Accepted at SIGIR 2026. Final version: this https URL
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2605.13052 [cs.IR]
	(or arXiv:2605.13052v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2605.13052 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Li Gao
[v1] Wed, 13 May 2026 06:20:28 UTC (1,934 KB)