Abstract: In many real-world settings, machine learning models and interactive systems have access to both structured knowledge, e.g., knowledge graphs or tables, and unstructured content, e.g., natural language documents. Yet most rely on only one of the two. Semi-Structured Knowledge Bases (SKBs) bridge this gap by linking unstructured content to nodes within structured data. In this work, we present Autofocus-Retriever (AF-Retriever), a modular framework for SKB-based, multi-hop question answering. It combines structural and textual retrieval through novel integration steps and optimizations, achieving the best zero- and one-shot results across all three STaRK QA benchmarks, which span diverse domains and evaluation metrics. AF-Retriever's average first-hit rate surpasses the second-best method by 32.1%. Its performance is driven by (1) exchangeable large language models (LLMs) that extract entity attributes and relational constraints during parsing and rerank the top-k answers, (2) vector similarity search for ranking both extracted entities and final answers, (3) a novel incremental scope expansion procedure that prepares a configurable number of candidates for reranking, selecting those that best fulfill the given constraints, and (4) a hybrid retrieval strategy that reduces error susceptibility. In summary, while constantly adjusting its focus like an optical autofocus, AF-Retriever delivers a configurable number of answer candidates in four constraint-driven retrieval steps, which are then supplemented and ranked through four additional processing steps. An ablation study and a detailed error analysis, including a comparison of three different LLM reranking strategies, provide component-level insights. The source code is available at this https URL.
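The idea behind incremental scope expansion, item (3) above, can be illustrated with a minimal sketch. This is not the authors' implementation: the function `expand_scope`, the dictionary-based candidates, and the rule of tolerating one more unmet constraint per round are all assumptions made for illustration. The sketch starts with candidates that satisfy every constraint and widens the scope until at least `k` candidates are available for a later reranking step.

```python
from typing import Callable, Dict, List

# Hypothetical types: a candidate is a plain dict, a constraint is a predicate on it.
Candidate = Dict[str, object]
Constraint = Callable[[Candidate], bool]


def expand_scope(
    candidates: List[Candidate], constraints: List[Constraint], k: int
) -> List[Candidate]:
    """Collect candidates in order of satisfied constraints until at least k are found.

    Assumption for this sketch: the scope widens by tolerating one more
    unmet constraint per round, starting from "all constraints satisfied".
    """
    # Count how many constraints each candidate satisfies.
    scored = [(sum(1 for check in constraints if check(c)), c) for c in candidates]

    selected: List[Candidate] = []
    for required in range(len(constraints), -1, -1):
        # Add every candidate that satisfies exactly `required` constraints.
        selected.extend(c for n, c in scored if n == required)
        if len(selected) >= k:
            break  # enough candidates for the reranking step
    return selected


# Toy data, invented for this example.
papers = [
    {"id": "p1", "year": 2021, "topic": "retrieval", "venue": "TMLR"},
    {"id": "p2", "year": 2021, "topic": "retrieval", "venue": "ACL"},
    {"id": "p3", "year": 2019, "topic": "vision", "venue": "CVPR"},
    {"id": "p4", "year": 2021, "topic": "vision", "venue": "ACL"},
]
constraints = [
    lambda c: c["year"] == 2021,
    lambda c: c["topic"] == "retrieval",
    lambda c: c["venue"] == "TMLR",
]

top = expand_scope(papers, constraints, k=2)
print([c["id"] for c in top])
```

With `k=2`, only one paper satisfies all three constraints, so the sketch relaxes the requirement by one constraint and stops once a second candidate is found. The returned list may contain more than `k` items, since a whole tier is added at once; a downstream reranker would be expected to order and truncate it.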
| Subjects: | Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL) |
| Cite as: | arXiv:2505.09246 [cs.IR] (or arXiv:2505.09246v4 [cs.IR] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2505.09246 (arXiv-issued DOI via DataCite) |
| Journal reference: | Transactions on Machine Learning Research 2026 |
Submission history
From: Derian Boer
[v1] Wed, 14 May 2025 09:35:56 UTC (406 KB)
[v2] Mon, 12 Jan 2026 20:38:50 UTC (604 KB)
[v3] Wed, 14 Jan 2026 14:49:17 UTC (604 KB)
[v4] Thu, 14 May 2026 11:26:44 UTC (1,582 KB)
