Authors: Jie-Jing Shao, Bo-Wen Zhang, Xiao-Wen Yang, Baizhi Chen, Si-Yu Han, Jinghao Pang, Wen-Da Wei, Guohao Cai, Zhenhua Dong, Lan-Zhe Guo, Yu-Feng Li
Abstract: Travel planning stands out among real-world applications of Language Agents because it couples significant practical demand with a rigorous constraint-satisfaction challenge. However, existing benchmarks primarily operate on a slot-filling paradigm, restricting agents to synthetic queries with pre-defined constraint menus. This fails to capture the open-ended nature of natural language interaction, where user requirements are compositional, diverse, and often implicitly expressed. To address this gap, we introduce ChinaTravel, with four key contributions: 1) a practical sandbox aligned with multi-day, multi-POI travel planning; 2) a compositionally generalizable domain-specific language (DSL) for scalable evaluation, covering feasibility, constraint satisfaction, and preference comparison; 3) an open-ended dataset that integrates diverse travel requirements and implicit intent from 1154 human participants; and 4) a fine-grained analysis that reveals the potential of neuro-symbolic agents in travel planning, which achieve a 37.0% constraint satisfaction rate on human queries, a 10× improvement over purely neural models, while still facing significant challenges in compositional generalization. Overall, ChinaTravel provides a foundation for advancing language agents through compositional constraint validation in complex, real-world planning scenarios.
| Comments: | ICLR 2026. Webpage: this https URL |
| Subjects: | Artificial Intelligence (cs.AI); Computation and Language (cs.CL) |
| Cite as: | arXiv:2412.13682 [cs.AI] (or arXiv:2412.13682v5 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2412.13682 (arXiv-issued DOI via DataCite) |
Submission history
From: Jie-Jing Shao
[v1] Wed, 18 Dec 2024 10:10:12 UTC (8,471 KB)
[v2] Fri, 20 Dec 2024 15:08:25 UTC (8,471 KB)
[v3] Fri, 30 May 2025 13:35:50 UTC (14,181 KB)
[v4] Sat, 6 Sep 2025 01:26:12 UTC (14,181 KB)
[v5] Wed, 29 Apr 2026 16:45:39 UTC (24,128 KB)
