Abstract: This work investigates the use of large language models (LLMs) for smart-city tasks. The core idea is to use remote sensing imagery to characterize the built environment in support of tasks including design suggestions, constructability assessment, land-use patterns, and risk identification. We use remote sensing imagery at multiple spatial scales as input for multimodal language modeling and evaluate how spatial scale affects built-environment reasoning. In addition, we compare state-of-the-art LLMs, including InternVL and Qwen, in terms of accuracy and reliability when generating built-environment recommendations. The results demonstrate the potential of integrating remote sensing imagery with LLMs to support smart-city applications and decision-making.
| Comments: | Published in the International Conference on Industrialized Construction 2026 |
| Subjects: | Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET) |
| Cite as: | arXiv:2605.08404 [cs.CL] |
| (or arXiv:2605.08404v1 [cs.CL] for this version) | |
| https://doi.org/10.48550/arXiv.2605.08404 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Dongdong Wang
[v1] Fri, 8 May 2026 19:10:30 UTC (259 KB)
