Abstract: In commercial web search, aligning content freshness with user intent remains challenging because the lifespan of information varies widely. Traditional industrial approaches rely on static time-window filtering, which produces "one-size-fits-all" rankings in which content may be chronologically recent but semantically expired. To address this limitation, we present a novel Query-Aware Dynamic Content Expiration Prediction Framework based on Large Language Models (LLMs) and deployed in Baidu search, which reformulates timeliness as a dynamic validity inference task. Our framework extracts fine-grained temporal contexts from documents and leverages LLMs to deduce a query-specific "validity horizon", a semantic boundary that defines when information becomes obsolete with respect to user intent. The approach is integrated with hallucination mitigation strategies to ensure reliability, and has been evaluated through offline experiments and online A/B testing on live production traffic. Results demonstrate significant improvements in search freshness and user experience metrics, validating the effectiveness of LLM-driven reasoning for solving semantic expiration at industrial scale.
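The contrast the abstract draws between static time-window filtering and a query-specific validity horizon can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Document` type, the 30-day `STATIC_WINDOW`, the example queries, and the hard-coded `ASSUMED_HORIZONS` table, which merely stands in for the horizon an LLM would infer from the query and the document's temporal context.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Document:
    doc_id: str
    # Temporal context extracted from the document (hypothetical field).
    event_time: datetime


# Assumed value for a static "one-size-fits-all" freshness window.
STATIC_WINDOW = timedelta(days=30)

# Hypothetical stand-in for LLM-inferred, query-specific validity horizons.
ASSUMED_HORIZONS = {
    "flash sale today": timedelta(days=1),
    "how to renew a passport": timedelta(days=365),
}


def passes_static_window(doc: Document, now: datetime) -> bool:
    """Static filtering: keep any document newer than a fixed window."""
    return now - doc.event_time <= STATIC_WINDOW


def is_within_validity_horizon(doc: Document, query: str, now: datetime) -> bool:
    """Query-aware check: keep the document only if its age does not
    exceed the validity horizon assumed for this query."""
    horizon = ASSUMED_HORIZONS[query]
    return now - doc.event_time <= horizon


now = datetime(2026, 5, 13)
sale_page = Document("sale", datetime(2026, 5, 8))        # 5 days old
passport_guide = Document("guide", datetime(2025, 10, 25))  # 200 days old

# Recent by the static window, yet expired for the sale query.
sale_static = passes_static_window(sale_page, now)
sale_dynamic = is_within_validity_horizon(sale_page, "flash sale today", now)

# Too old for the static window, yet still valid for the passport query.
guide_static = passes_static_window(passport_guide, now)
guide_dynamic = is_within_validity_horizon(
    passport_guide, "how to renew a passport", now
)
```

In this toy setup the two filters disagree in both directions, which is the "chronologically recent but semantically expired" mismatch the abstract describes. In a real system the horizon would presumably come from model inference rather than a lookup table.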
| Comments: | Accepted at SIGIR 2026. Final version: this https URL |
| Subjects: | Information Retrieval (cs.IR); Computation and Language (cs.CL) |
| Cite as: | arXiv:2605.13052 [cs.IR] |
| | (or arXiv:2605.13052v1 [cs.IR] for this version) |
| | https://doi.org/10.48550/arXiv.2605.13052 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Li Gao
[v1] Wed, 13 May 2026 06:20:28 UTC (1,934 KB)
