Abstract
Large language models (LLMs) have advanced geospatial artificial intelligence, yet geospatial knowledge-base question answering (GeoKBQA) remains underdeveloped. Prior systems have relied on handcrafted rules and have lacked proper training/validation/test splits, hindering fair evaluation. To address these gaps, we propose a prompt-based multi-agent LLM framework that translates natural-language questions into executable GeoSPARQL. The architecture comprises an intent analyzer; multi-grained retrievers that ground concepts and properties in the OpenStreetMap (OSM) tagging schema and map geospatial relations to the GeoSPARQL/OGC operator inventory; an operator-aware intermediate representation aligned with SPARQL/GeoSPARQL 1.1; and a query generator. Evaluated on the GeoKBQA test set with 20 few-shot exemplars per agent, the framework achieves 85.49 exact match (EM) with GPT-4o, using far less supervision than fine-tuned baselines trained on 3574 instances, and substantially outperforms a single-agent GPT-4o prompt. With GPT-4o-mini, the multi-agent configuration reaches 66.74 EM versus 47.10 EM for a single agent, and the multi-agent gain is larger for the bigger model. These results indicate that structure matters beyond scale: principled agentic decomposition offers a sample-efficient, execution-faithful path beyond template-centric GeoKBQA under a fair, held-out evaluation protocol.
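For concreteness, a question such as "Which restaurants are within Heidelberg?" might be translated into a GeoSPARQL query of the following shape. This is an illustrative sketch rather than an example drawn from the GeoKBQA dataset: the osmkey: vocabulary (an osm2rdf-style mapping of OSM tags to RDF predicates) and the endpoint's data layout are assumptions, whereas geo:hasGeometry, geo:asWKT, and geof:sfWithin are standard GeoSPARQL terms.

  PREFIX geo:    <http://www.opengis.net/ont/geosparql#>
  PREFIX geof:   <http://www.opengis.net/def/function/geosparql/>
  PREFIX osmkey: <https://www.openstreetmap.org/wiki/Key:>

  # Restaurants (OSM tag amenity=restaurant) whose geometry lies
  # within the geometry of the entity named "Heidelberg".
  SELECT ?restaurant WHERE {
    ?restaurant osmkey:amenity "restaurant" ;
                geo:hasGeometry/geo:asWKT ?geom .
    ?city osmkey:name "Heidelberg" ;
          geo:hasGeometry/geo:asWKT ?cityGeom .
    FILTER(geof:sfWithin(?geom, ?cityGeom))
  }

In the framework described above, the retrievers would supply the tag grounding (amenity=restaurant) and the spatial operator (sfWithin), which the query generator then assembles into the final executable form.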