Abstract: Geographic text, or textual data rich in geographic (geo-) information, is a valuable source for various geographic applications, e.g., tourism management. Making such information accessible to speakers of other languages further enhances its utility; thus, accurate machine translation (MT) is essential for equity in multilingual geo-information access. To facilitate in-depth analysis of geographic text, we introduce ATD-Trans, a geographically grounded Japanese-English travelogue translation dataset, which enables evaluation of MT quality at both the overall and geo-entity levels across domestic (within Japan) and overseas regions. Our experiments on existing language models examine two factors: model language focus and geographic region. The results highlight the advantages of Japanese-enhanced models and the greater difficulty of translating domestic-region geo-entities mentioned in travel blogs.
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2605.12933 [cs.CL] |
| | (or arXiv:2605.12933v1 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2605.12933 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Shohei Higashiyama
[v1] Wed, 13 May 2026 03:11:54 UTC (113 KB)
