Abstract: Large Language Models (LLMs) provide flexible natural language processing capabilities, while knowledge graphs (KGs) offer explicit, structured knowledge. Integrating the two in a complementary manner enables the development of reliable and verifiable AI systems. In particular, knowledge graph question answering (KGQA) has attracted attention as a means to reduce LLM hallucinations and to leverage knowledge beyond the training data. However, existing KGQA benchmark datasets are biased toward encyclopedic knowledge, are limited to a single modality, and lack fine-grained spatiotemporal data. These gaps limit their applicability to the real-world scenarios targeted by Embodied AI. We introduce HOME-KGQA, a novel KGQA benchmark dataset built on a multimodal KG of daily household activities. HOME-KGQA consists of complex, multi-hop natural language questions paired with queries written in graph database query languages. Compared to existing benchmarks, it includes more challenging questions that involve multi-level spatiotemporal reasoning, multimodal grounding, and aggregate functions. Experimental results show that the evaluated LLM-based KGQA methods perform worse on HOME-KGQA than they do on existing datasets. This highlights significant challenges that must be addressed before KGQA systems can be deployed in real-world settings. Our dataset is available at this https URL
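To make "multi-hop questions with aggregate functions" concrete, the sketch below answers a question such as "How many times was something picked up in the kitchen?" over a tiny in-memory triple list. Everything here is a hypothetical illustration: the entity names (`event1`, `kitchen`), the predicates (`action`, `object`, `room`), and the helper functions are invented for this example and are not the HOME-KGQA schema or its query language. The question is answered in two hops (action to events, then event to room) followed by a `COUNT`, which is roughly what a SPARQL 1.1 query with `COUNT` and `GROUP BY` would express.

```python
from collections import Counter

# Hypothetical toy triples (subject, predicate, object); NOT the HOME-KGQA schema.
TRIPLES = [
    ("event1", "action", "PickUp"), ("event1", "object", "cup"),   ("event1", "room", "kitchen"),
    ("event2", "action", "PickUp"), ("event2", "object", "book"),  ("event2", "room", "livingroom"),
    ("event3", "action", "PickUp"), ("event3", "object", "plate"), ("event3", "room", "kitchen"),
    ("event4", "action", "Open"),   ("event4", "object", "fridge"), ("event4", "room", "kitchen"),
]


def objects(subject, predicate):
    """Return every o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return every s such that (s, predicate, obj) is in the graph."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def count_actions_in_room(action, room):
    """Multi-hop question with an aggregate (COUNT)."""
    events = subjects("action", action)                           # hop 1: action -> events
    in_room = [e for e in events if room in objects(e, "room")]  # hop 2: event -> room filter
    return len(in_room)                                           # aggregate: COUNT


def pickups_per_room():
    """Same traversal, grouped by room (analogous to GROUP BY room with COUNT)."""
    return Counter(r for e in subjects("action", "PickUp") for r in objects(e, "room"))


if __name__ == "__main__":
    print(count_actions_in_room("PickUp", "kitchen"))  # 2
    print(pickups_per_room())
```

Even this toy case needs a join across two relations plus an aggregation; questions in a real household KG that add timestamps, spatial hierarchies, and links to images or video would require considerably more reasoning steps.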
| Comments: | 12 pages, 4 figures, 7 tables, accepted at LREC2026 |
| Subjects: | Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB); Multimedia (cs.MM) |
| ACM classes: | H.3.3; H.2.8; I.2.4; I.2.7 |
| Cite as: | arXiv:2605.09348 [cs.CL] (or arXiv:2605.09348v1 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2605.09348 arXiv-issued DOI via DataCite (pending registration) |
| Related DOI: | https://doi.org/10.63317/25xhew5rnydb |
Submission history
From: Shusaku Egami
[v1] Sun, 10 May 2026 06:00:29 UTC (496 KB)
