1. Introduction
Urban areas concentrate population, economic activity, and infrastructure, while simultaneously amplifying exposure to climate-related hazards, environmental stressors, and socio-economic inequalities. Rapid urbanization in some parts of the Global South has intensified challenges related to informal settlements, poverty, governance deficits, and uneven access to services. Traditional urban analysis tools often struggle to capture the multidimensional, nonlinear nature of these dynamics.
GeoAI integrates geospatial technologies with machine learning and artificial intelligence, offering new analytical capabilities for processing large volumes of spatial and temporal data, identifying hidden patterns, and generating predictive insights [
1,
2,
3].
Urban systems must be understood as coupled socio-environmental systems in which climate dynamics, demographic change, economic restructuring, and governance processes interact across spatial and temporal scales. In this context, cities are not only drivers of global environmental change but also hotspots of vulnerability, where climate hazards intersect with poverty, inequality, and institutional fragility. The growing frequency of extreme events (heatwaves, floods, droughts, and air pollution episodes) has exposed the limitations of conventional urban planning tools, which often rely on static datasets, linear assumptions, and sectoral approaches. These limitations are evident in rapidly urbanizing regions of the Global South, where informal development, limited data availability, and governance constraints complicate risk assessment and policy implementation. Against this backdrop, GeoAI has emerged as a critical methodological frontier, enabling the integration of heterogeneous data sources, the capture of nonlinear processes, and adaptive, evidence-based urban decision-making.
The integration of GeoAI into urban analytics necessitates deliberate attention to responsible implementation, balancing technological capabilities with ethical imperatives. Recent frameworks emphasize that responsible AI in urban planning transcends abstract ethical principles (beneficence, non-maleficence, autonomy, justice, and explicability) and requires translating these principles into concrete system requirements through iterative evaluation of the fit between tasks, data, users, and technology. The case study of the Building and Establishment Automated Mapper (BEAM) documented critical trade-offs when deploying deep learning for informal settlement mapping: while the system achieved 94% accuracy in detecting building footprints from 2019 aerial imagery, generalizability deteriorated across years due to variations in atmospheric conditions, sensor specifications, and imaging protocols, necessitating costly model retraining [
4]. Shifting the task definition from informal settlement-focused service planning to city-wide building footprint mapping has introduced an algorithmic bias, leading the model to misidentify formal structures and larger buildings in non-informal areas. This process undermines equity in resource allocation decisions [
4].
The aim of this review is threefold: to synthesize the state of the art in GeoAI-driven urban analytics; to critically assess methodological limitations, ethical concerns, and contextual constraints; and to explore how GeoAI can contribute to resilient, inclusive, and livable cities, under climate change and socio-economic stress. This review is framed at the conceptual and analytical levels, examining how GeoAI is reshaping contemporary debates in urban analytics, spatial planning, and policy formulation. The discussion is deliberately limited to applications and perspectives directly relevant to urban decision-making, climate resilience, social inclusion, and governance, with particular emphasis on the uneven conditions that characterize urban contexts in the Global North and the Global South. This review synthesizes representative and influential contributions that highlight dominant trajectories, unresolved tensions, and emerging opportunities at the interface between spatial data science and urban policy. By situating GeoAI within broader institutional, ethical, and socio-spatial frameworks, this review advances a critical understanding of GeoAI as an enabling yet not self-sufficient component of sustainable and equitable urban development, whose societal value ultimately depends on its embedding in planning practices, governance structures, and participatory processes. This review is organized around seven interconnected thematic areas that structure the analysis: urban morphology and building extraction, informal settlement and inequality mapping, urban green infrastructure and environment, climate risk and urban resilience, human mobility and transport systems, socio-economic indicators and livability, and governance (
Table 1).
Recent advances in artificial intelligence include the rapid emergence of large language models (LLMs) and multimodal foundation models that integrate text, imagery, and structured data [
5]. While these approaches are relevant to urban studies (in policy analysis, document processing, and human–AI interaction), they are not the primary focus of this review. This delimitation is intentional. The primary emphasis of this paper is on spatially explicit GeoAI methods that operate directly on georeferenced data and generate map-based or quantitative spatial outputs for urban planning and decision support [
6,
7]. In current urban analytics practice, LLMs mainly function as interpretive or orchestration layers, not as the main engines of spatial inference. They are acknowledged as an emerging direction, not a replacement for convolutional, graph-based, or ensemble learning approaches, and a systematic review of these methods lies beyond the intended scope of this article.
Despite the rapid growth of GeoAI applications in urban studies, existing review papers tend to focus either on methodological innovation in isolation (e.g., algorithmic performance, benchmarking, or architectural comparisons) or on highly specific application domains, such as disaster risk mapping, land use classification, or smart city infrastructure. Recent reviews emphasize technical advances in deep learning for remote sensing or urban form extraction, or survey spatially explicit GeoAI applications in urban geography [
8,
9,
10], yet often remain detached from the realities of spatial planning practice, governance constraints, and decision-making processes. Relatively few reviews address the uneven conditions under which GeoAI is deployed, including data inequality, institutional capacity, and ethical governance.
This review addresses these gaps by adopting a decision-oriented and policy-aware perspective on GeoAI in urban analytics. Rather than providing an exhaustive technical comparison of models, this paper synthesizes how different GeoAI paradigms (deep learning, graph-based learning, and ensemble methods) contribute to spatial decision-making, climate resilience, and social inclusion in urban contexts. The novelty of this work lies in its integrative framing: GeoAI is examined not only as a set of analytical techniques, but as a socio-technical component embedded within planning systems and governance structures. By systematically linking methodological advances to planning relevance and contextual constraints, this review offers a contribution that complements existing technical surveys while directly addressing the needs of urban planners, policymakers, and decision-makers.
The remainder of this paper is structured as follows.
Section 2 presents the review methodology, introduces the thematic synthesis framework (
Table 1), and reviews GeoAI applications across two major methodological streams: deep learning methods (
Section 2.1) and ensemble and tree-based methods (
Section 2.2).
Section 3 examines cross-cutting challenges, including data inequality and the Global South, model interpretability and trust, and ethical and governance concerns.
Section 4 identifies opportunities for inclusive and fair urban development, including addressing poverty and social vulnerability, ensuring fair and transparent urban policies, promoting livability and well-being, supporting human mobility, and enhancing urban resilience.
Section 5 advances future research directions structured around the seven thematic areas identified in
Table 1, grounding each recommendation in the specific gaps documented by the reviewed literature.
Section 6 concludes the paper.
2. GeoAI Applications in Urban Analytics
This study adopts a structured narrative review approach, positioned between a fully systematic review and a conceptual synthesis. The objective is not to exhaustively catalog all GeoAI publications, but to synthesize a representative and influential body of research that illuminates methodological trends and governance implications of GeoAI in urban analytics. This approach is suited to an interdisciplinary field such as GeoAI, where rapid methodological evolution, heterogeneous applications, and diverse disciplinary traditions complicate the pursuit of strict, systematic aggregation. The literature search was conducted using major academic databases, including Web of Science Core Collection, Scopus, and Google Scholar, from October 2025 to February 2026. These databases were selected to ensure coverage of peer-reviewed journals across urban studies, remote sensing, geoinformatics, and planning science. The search strategy combined topic-specific terms (e.g., “GeoAI,” “urban analytics,” “deep learning AND urban,” “graph neural network AND city,” “ensemble learning AND land cover”) with application-oriented keywords (e.g., “spatial planning,” “climate resilience,” “informal settlement mapping”). From the initial search, which covered publications from 2017 to early 2025, 45 publications were selected for detailed synthesis and supplemented by 10 foundational references, for a total of 55 publications.
This review is subject to several intrinsic limitations. First, despite the structured search strategy, the rapidly evolving nature of GeoAI implies that some recent preprints or emerging applications may not be fully captured. Second, this review prioritizes peer-reviewed research published in English, which may underrepresent locally grounded studies, technical reports, or planning documents. Third, assessing planning applicability relies on interpretive judgment, which may vary across institutional and cultural contexts. Finally, this review does not aim to provide a quantitative meta-analysis of model performance, as differences in datasets, metrics, and experimental setups limit direct comparability. These limitations are acknowledged as part of the trade-off between methodological breadth and analytical depth. To provide a structured overview of the reviewed literature,
Table 1 synthesizes the main thematic areas addressed in the selected studies, the dominant GeoAI methods employed, the primary data sources, and the main planning implications.
2.1. Deep Learning Methods
2.1.1. CNN-Based Segmentation for Urban Features
Convolutional Neural Networks (CNNs) have emerged as foundational tools in GeoAI for urban studies, enabling automated mapping of buildings and land cover from high-resolution imagery. Modern approaches predominantly frame these tasks as semantic segmentation problems, wherein each pixel is classified as building, road, vegetation, or other urban features [
8]. Early fully convolutional networks (FCNs) introduced end-to-end segmentation architectures, though they exhibited limited receptive fields. This limitation prompted the development of advanced encoder–decoder architectures such as U-Net, which employs symmetric downsampling and upsampling with skip connections to preserve spatial context [
11].
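The role of skip connections can be illustrated with a minimal NumPy sketch. It is not the published U-Net: learned convolutions are omitted, and the helper names (`downsample`, `upsample`, `skip_merge`) are hypothetical. The sketch only shows how a full-resolution encoder feature map is concatenated with an upsampled coarse map, so that spatial detail lost during downsampling remains available to the decoder.

```python
import numpy as np

def downsample(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling on a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling on a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_merge(encoder_feat: np.ndarray, decoder_feat: np.ndarray) -> np.ndarray:
    """Concatenate encoder features with upsampled decoder features
    along the channel axis, as a U-Net-style skip connection does."""
    return np.concatenate([encoder_feat, upsample(decoder_feat)], axis=0)

# Toy single-channel 4x4 "feature map" (illustrative values only).
x = np.arange(16, dtype=float).reshape(1, 4, 4)
coarse = downsample(x)           # shape (1, 2, 2): fine detail averaged away
merged = skip_merge(x, coarse)   # shape (2, 4, 4): detail + coarse context
```

In a trained network, the merged tensor would be passed through further convolutions; here it simply makes the two sources of information visible side by side.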
U-Net has achieved widespread adoption in urban feature extraction, consistently outperforming traditional machine learning classifiers such as Random Forests. For instance, a U-Net model achieved approximately 22 percentage points higher accuracy than Random Forest in classifying urban land cover from uncrewed aerial vehicle (UAV) imagery, underscoring the importance of deep architectures in capturing spatial context [
12]. CNN-based approaches have enabled accurate delineation of urban features, including buildings and roads, in satellite and aerial imagery, with pixel-level accuracy in multi-class urban segmentation tasks [
13].
Quantitative evaluation of segmentation quality relies on standard metrics, including Intersection over Union (IoU) and F1-score, which are measured against ground-truth maps. Contemporary studies report IoU scores of 0.8–0.9 for building extraction on benchmark datasets, reflecting substantial performance gains achieved by deep learning relative to earlier computational approaches [
13].
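For reference, both metrics can be computed directly from the pixel-level confusion counts: IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN). The following is a minimal sketch for binary masks, assuming NumPy; the function name `iou_and_f1` and the toy masks are illustrative and do not come from the reviewed studies.

```python
import numpy as np

def iou_and_f1(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """Return (IoU, F1) for two binary masks of identical shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()    # building predicted and present
    fp = np.logical_and(pred, ~truth).sum()   # building predicted, absent
    fn = np.logical_and(~pred, truth).sum()   # building missed
    union = tp + fp + fn
    if union == 0:                            # both masks empty: perfect match
        return 1.0, 1.0
    return float(tp / union), float(2 * tp / (2 * tp + fp + fn))

# Toy 4x4 masks: a 2x2 building, predicted with one miss and one false alarm.
truth = np.array([[0, 0, 0, 0],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 0]])
pred = np.array([[0, 0, 0, 0],
                 [0, 1, 1, 1],
                 [0, 1, 0, 0],
                 [0, 0, 0, 0]])
iou, f1 = iou_and_f1(pred, truth)   # TP=3, FP=1, FN=1 -> IoU=0.6, F1=0.75
```

Because F1 = 2·IoU / (1 + IoU) for a single binary class, F1 is always at least as high as IoU on the same prediction, which is worth bearing in mind when comparing studies that report only one of the two.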
Earlier spatial metric approaches to urban analysis provide an important conceptual foundation for contemporary GeoAI-driven urban analytics. A representative example is the functional fragmentation framework developed for the city of Craiova, which operationalized urban morphology using relative and absolute functional fragmentation indices derived from satellite imagery and GIS-based spatial metrics. By decomposing urban space into functional zones and quantifying their spatial configuration, this approach demonstrated how functional imbalance and excessive fragmentation can reduce urban competitiveness and accessibility at both site and city scales. Although these methods relied on classical spatial metrics rather than machine learning, they anticipated several core objectives of GeoAI, namely the systematic extraction of spatial structure from remotely sensed data and the translation of spatial patterns into planning-relevant indicators. In this sense, early fragmentation indices can be interpreted as precursors to current GeoAI models that automate the detection of functional zones, urban form discontinuities, and land use heterogeneity, while extending their analytical capacity through scalability, handling of nonlinearity, and data fusion across multiple spatial resolutions [
14].
Earlier GIS- and machine learning-based frameworks for large-scale spatial assessment already demonstrated the feasibility of semi-automated, indicator-driven decision support in urban and environmental planning contexts. For example, national-scale spatial analyses integrating remote sensing, GIS, and machine learning classifiers were used to prioritize ecosystem services, biodiversity protection, and policy interventions across heterogeneous territorial contexts. While these approaches did not rely on deep neural architectures, they anticipated core objectives of contemporary GeoAI, including multi-source data integration, spatial prioritization, and policy mapping, thereby providing an empirical foundation for current deep learning-based urban analytics [
15].
The GeoAI pipeline for urban analytics is structured into three interrelated layers that reflect the progression from data processing to decision support (
Figure 1). Circular nodes represent core analytical methods and learning paradigms, including convolutional neural networks, graph-based models, and ensemble learning techniques, which constitute the primary engines for spatial data processing and pattern extraction. Square nodes denote potential and commonly used data sources and intermediate representations, including high-resolution optical imagery, LiDAR-derived height models, multispectral indices, and segmented urban features, that provide the empirical foundation for GeoAI models. Triangular nodes indicate decision-oriented outputs and planning-relevant deliverables, including building footprints, risk maps, livability indicators, and policy support tools, which translate analytical results into actionable knowledge for urban governance. The connectivity among nodes emphasizes functional relationships rather than a strictly linear sequence: methods can ingest multiple data sources, data representations may support different analytical approaches, and a single model output can feed several decision-making applications. The central positioning and high connectivity of the “building footprints” deliverable underscore its role as a key integrative product that links diverse GeoAI methods to a wide range of urban planning and management tasks. The figure illustrates GeoAI’s pipeline philosophy as a modular, data-driven ecosystem in which methodological choices, data availability, and policy needs are dynamically interconnected rather than arranged in a strictly hierarchical or sequential order.
2.1.2. Building Footprint Extraction and Boundary Delineation
Building footprint extraction is a central application of GeoAI in urban studies, and recent research emphasizes methodologies to improve boundary accuracy. Standard CNN models, including U-Net, SegNet, and FCN, sometimes produce blobby outputs with smoothed edges, prompting researchers to refine architectures specifically for sharper building boundaries [
16].
One methodological advancement involves incorporating boundary-aware modules into CNN architectures. Li et al. [
16] introduced an Attraction Field Module (AFM) within a CNN to model and preserve building edges. Their quantitative evaluation shows that integrating the AFM improves building boundary delineation by approximately 6–7 percentage points in F1-score and approximately 8–10 percentage points in IoU relative to baseline FCN and SegNet architectures on high-resolution urban imagery. These gains are primarily attributable to improved edge localization and reduced boundary smoothing, a common issue in standard segmentation networks, and they demonstrate the effectiveness of modeling attraction vectors toward building contours.
An alternative innovative direction employs Graph Convolutional Networks (GCNs) to capture spatial relationships beyond the conventional pixel grid. Shi et al. [
17] proposed a Gated Graph CNN (GGCN) with deep structured feature embedding to better model connectivity among urban structures. On the ISPRS Potsdam dataset, this graph-based approach outperformed U-Net and SegNet by several percentage points in F1-score and IoU in dense urban blocks where long-range spatial dependencies are critical, and it generated sharper footprint boundaries by refining weak predictions via gating mechanisms [
17].
These architectural advances directly address the fundamental challenge of precise boundary delineation in dense urban environments. Modern building extraction models now produce vector-ready footprints with accuracy and precision approaching those of manual digitization, a development crucial for applications such as cadastral mapping and three-dimensional city modeling [
16].
In addition to architectural modifications, recent studies emphasize the role of boundary-aware loss functions in improving building footprint delineation. Standard pixel-wise losses, such as cross-entropy, tend to under-penalize boundary errors, leading to smoothed or fragmented contours. To address this limitation, researchers incorporate Dice-based losses, boundary IoU losses, or topology-aware constraints that emphasize edge fidelity during training. These loss formulations improve geometric accuracy without increasing architectural complexity and have been shown to enhance vector-ready footprint extraction in dense urban settings. Contemporary best practices combine boundary-aware network modules with such boundary-sensitive loss functions. For example, AFM-based architectures trained with a combined cross-entropy and boundary IoU loss report improvements of approximately 3–5% in boundary F1-score relative to pixel-wise cross-entropy alone, yielding sharper and more cartographically consistent building outlines in dense urban environments.
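The idea of a combined loss can be sketched as a weighted sum of pixel-wise binary cross-entropy and a soft Dice term. The example below is an illustrative NumPy sketch, not the boundary IoU formulation used in the cited work; the function names and the `dice_weight` value of 0.5 are assumptions chosen for demonstration.

```python
import numpy as np

def bce_loss(prob: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Mean pixel-wise binary cross-entropy; prob holds P(building)."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob)
                          + (1.0 - target) * np.log(1.0 - prob)))

def soft_dice_loss(prob: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """1 - soft Dice coefficient, computed over the whole mask."""
    intersection = np.sum(prob * target)
    denominator = np.sum(prob) + np.sum(target)
    return float(1.0 - (2.0 * intersection + eps) / (denominator + eps))

def combined_loss(prob: np.ndarray, target: np.ndarray,
                  dice_weight: float = 0.5) -> float:
    """Weighted sum of the two terms; dice_weight is a hypothetical setting."""
    return ((1.0 - dice_weight) * bce_loss(prob, target)
            + dice_weight * soft_dice_loss(prob, target))

# Toy 2x2 example: one prediction close to the target, one inverted.
target = np.array([[0.0, 1.0], [1.0, 0.0]])
good = np.array([[0.1, 0.9], [0.9, 0.1]])
bad = np.array([[0.9, 0.1], [0.1, 0.9]])
loss_good = combined_loss(good, target)
loss_bad = combined_loss(bad, target)
```

The cross-entropy term treats every pixel equally, whereas the overlap-based term responds to the agreement between predicted and reference shapes as a whole; boundary IoU and topology-aware losses push this further by restricting or reweighting the comparison near object edges.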
2.1.3. Advanced Segmentation Architectures: U-Net Variants and DeepLabv3+
Research in urban GeoAI has embraced advanced CNN architectures originally developed for general computer vision, adapted specifically for overhead imagery. Encoder–decoder networks remain popular, with numerous variants building upon the U-Net foundation. Nested architectures, such as U-Net++, introduce dense skip connections to refine multi-scale feature fusion and have been successfully tested for building extraction applications [
13].
DeepLabv3+ has gained substantial traction in urban segmentation tasks due to its Atrous Spatial Pyramid Pooling (ASPP) module, which effectively captures multi-scale context. By extending the receptive field without sacrificing resolution, DeepLabv3+ proves effective for complex urban scenes containing objects of varying sizes [
18].
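How dilation enlarges the receptive field can be made concrete with a short calculation. A kernel of size k with dilation rate d spans k + (k − 1)(d − 1) input positions while keeping only k weights. The one-dimensional NumPy sketch below illustrates this; the function names are hypothetical, the dilation rates are illustrative, and the sketch uses "valid" (unpadded) output, whereas segmentation networks typically pad so that resolution is preserved.

```python
import numpy as np

def effective_kernel_size(k: int, dilation: int) -> int:
    """Number of input positions spanned by a dilated kernel."""
    return k + (k - 1) * (dilation - 1)

def dilated_conv1d(signal, kernel, dilation: int = 1) -> np.ndarray:
    """'Valid' 1-D cross-correlation with kernel taps spaced `dilation` apart."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = effective_kernel_size(len(kernel), dilation)
    out_len = len(signal) - span + 1
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel span")
    out = np.zeros(out_len)
    for i in range(out_len):
        taps = signal[i:i + span:dilation]   # every `dilation`-th sample
        out[i] = np.dot(taps, kernel)
    return out

signal = np.arange(10)
dense = dilated_conv1d(signal, [1, 1, 1], dilation=1)    # spans 3 samples
atrous = dilated_conv1d(signal, [1, 1, 1], dilation=2)   # spans 5 samples
spans = [effective_kernel_size(3, d) for d in (1, 6, 12, 18)]
```

Running several such branches with different rates in parallel and concatenating their outputs is the basic idea behind a spatial pyramid of atrous convolutions: each branch sees the same location at a different spatial scale.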
In a study on building footprint mapping using the Inria dataset, a DeepLabv3+ model with a ResNet-50 backbone achieved the best IoU (≈0.90) among compared architectures (U-Net, U-Net++), outperforming them by approximately 2–4 percentage points [
13].
Transfer learning is a critical component of modern urban segmentation pipelines, in which model weights are initialized using parameters pre-trained on large-scale image datasets such as COCO or ImageNet. This approach improves segmentation accuracy and convergence while mitigating the risk of overfitting when training data are limited. Models such as DeepLabv3, pre-trained on large-scale image datasets and subsequently fine-tuned on urban data, not only achieve higher IoU values but also demonstrate improved generalization with limited training samples [
13]. The use of pre-trained encoders, typically based on ResNet or VGG architectures, has become standard practice in urban image segmentation, providing consistent feature representations for artificial structures while reducing dependence on massive labeled datasets.
DeepLab’s use of atrous (dilated) convolutions and skip connections, combined with fine-tuning strategies, has proven effective for segmenting informal settlements with complex urban morphology. In a multi-city study covering Dhaka (Bangladesh), Nairobi (Kenya), and Lagos (Nigeria), DeepLabv3+ achieved pixel-level accuracies of 90–96% for informal settlement mapping, demonstrating strong performance across heterogeneous urban morphologies in the Global South. However, representativeness remains constrained to large metropolitan contexts [
12]. Its capacity to parse multi-scale context, from narrow alleys to large buildings, positioned it as the top-performing architecture for mapping irregular settlement layouts. These results demonstrate how CNN architectures have been adapted to address the multifaceted challenges of urban feature segmentation, delivering high accuracy across applications ranging from slum mapping to urban land use classification.
While DeepLabv3+ remains a strong and widely adopted benchmark architecture for urban semantic segmentation, recent comparative studies indicate that it no longer represents the absolute state of the art across all urban contexts. Architectures incorporating attention mechanisms, high-resolution feature preservation, or hybrid CNN–Transformer designs have demonstrated superior performance on specific datasets, particularly in scenarios involving complex roof geometries or dense informal settlements. In practice, DeepLabv3+ is best positioned as a reference model whose performance can be exceeded through task-specific architectural enhancements, not as the definitive state of the art.
2.1.4. Attention Mechanisms and Enhancement Strategies
To further enhance CNN performance in urban applications, researchers have integrated attention mechanisms and other novel modules into deep networks. Attention modules, such as the Convolutional Block Attention Module (CBAM), enable networks to focus on the most relevant spatial and channel features for specific tasks. Jonnala et al. [
19] developed an attention-augmented U-Net (DSIA U-Net) incorporating deep–shallow feature interaction and CBAM blocks to improve water body segmentation in urban scenes. The inclusion of attention gates yielded notable gains in segmentation accuracy and computational efficiency, specifically enhancing water boundary delineation and reducing false positives in spectrally mixed areas such as urban shadows erroneously classified as water.
While this particular application focuses on water bodies, the underlying technique is broadly generalizable: attention mechanisms allow urban segmentation models to better differentiate target classes (buildings, roads, vegetation) from confounding backgrounds by dynamically weighting relevant feature maps. Similarly, gating mechanisms have demonstrated utility beyond graph networks. Channel and spatial attention mechanisms within CNN blocks can dynamically recalibrate features, providing substantial utility in resolving inter-class confusions common in urban imagery, such as distinguishing buildings from bright concrete surfaces [
19].
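The channel recalibration described above can be sketched in a few lines. The example follows the general pattern of CBAM-style channel attention (average- and max-pooled descriptors passed through a shared two-layer perceptron, summed, and squashed by a sigmoid), but it is an illustrative NumPy sketch only: the weights are random stand-ins for learned parameters, and the function name and tensor sizes are assumptions.

```python
import numpy as np

def channel_attention(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray):
    """Reweight the channels of a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is a
    reduction ratio; both would be learned in a real network.
    """
    avg_desc = feat.mean(axis=(1, 2))          # (C,) average-pooled descriptor
    max_desc = feat.max(axis=(1, 2))           # (C,) max-pooled descriptor

    def shared_mlp(v: np.ndarray) -> np.ndarray:
        return w2 @ np.maximum(w1 @ v, 0.0)    # ReLU hidden layer

    logits = shared_mlp(avg_desc) + shared_mlp(max_desc)
    gate = 1.0 / (1.0 + np.exp(-logits))       # per-channel weight in [0, 1]
    return feat * gate[:, None, None], gate

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))              # 8 channels, 4x4 spatial grid
w1 = rng.normal(size=(2, 8))                   # reduction ratio r = 4
w2 = rng.normal(size=(8, 2))
recalibrated, gate = channel_attention(feat, w1, w2)
```

Channels whose gate is close to zero are suppressed and those close to one pass through largely unchanged, which is the mechanism by which such modules can down-weight feature maps that respond to confounding surfaces.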
Multi-branch architectures represent another innovation, separating the processing of high-frequency details from that of global context before fusing the two representations. These enhancements directly address a fundamental limitation of vanilla CNNs: their difficulty in capturing both fine details and broad context simultaneously. The literature demonstrates that augmenting segmentation models with attention and context modules yields more precise maps of urban elements; for example, in water body segmentation, attention-equipped models such as the DSIA U-Net achieved F1-scores of approximately 95% [
19], with analogous gains expected in urban segmentation tasks. In aggregate, attention mechanisms, gating approaches, and multi-scale fusion strategies represent the state of the art in deep learning methods tailored to the intricate patterns of urban landscapes.
Beyond convolution-centric attention mechanisms, recent GeoAI research has explored Vision Transformer (ViT)-based and hybrid CNN–Transformer architectures for urban feature segmentation. Transformers introduce global self-attention, enabling explicit modeling of long-range spatial dependencies that conventional convolutions capture only implicitly. Architectures such as Swin Transformer, SegFormer, and hybrid models like TransUNet and UNETR have demonstrated strong performance in high-resolution remote sensing tasks by combining local feature extraction with global contextual reasoning. In urban environments characterized by repetitive structures and complex spatial organization, transformer-based models improve discrimination between spectrally similar classes and enhance consistency across large spatial extents. However, empirical evidence indicates that pure Transformer models often require substantial training data and computational resources, whereas hybrid CNN–Transformer architectures offer a more balanced trade-off among accuracy, efficiency, and data requirements. As a result, transformer-enhanced segmentation frameworks are best viewed as a complement to CNN-based urban segmentation pipelines, not as a wholesale replacement for them.
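The global self-attention referred to here can be illustrated with single-head scaled dot-product attention over a handful of image-patch embeddings. This is a minimal NumPy sketch under stated assumptions: the projection matrices are random stand-ins for learned weights, the patch count and dimensions are arbitrary, and multi-head attention, positional encodings, and windowing schemes are omitted.

```python
import numpy as np

def self_attention(tokens: np.ndarray, w_q: np.ndarray,
                   w_k: np.ndarray, w_v: np.ndarray):
    """Single-head scaled dot-product self-attention.

    tokens: (N, D) array of N patch embeddings.
    Returns the attended tokens (N, Dv) and the (N, N) attention weights.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 8))                  # 6 patches, 8-dim embeddings
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
attended, weights = self_attention(patches, w_q, w_k, w_v)
```

Every row of the weight matrix assigns a non-zero weight to every patch, regardless of spatial distance, which is what distinguishes global self-attention from a convolution's fixed local neighbourhood. The N × N size of that matrix also explains the computational cost on large images.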
Recent comparative studies highlight clear trade-offs between pure Transformer-based architectures and hybrid CNN–Transformer models in urban remote sensing and GeoAI applications. Although Vision Transformer-based models benefit from global self-attention and explicit modeling of long-range spatial dependencies, they typically require substantially larger training datasets and greater computational resources than CNN-based baselines to achieve stable performance, reflecting the weaker inductive biases of pure Transformer designs [
9,
20]. By contrast, hierarchical Transformers such as Swin Transformer [
21] and hybrid CNN–Transformer designs such as TransUNet [
22] achieve competitive accuracy with improved data efficiency. Hybrid models in particular, including TransUNet and Swin-based encoder–decoder frameworks, combine convolutional inductive biases with Transformer-based contextual reasoning, which supports competitive segmentation accuracy when training data are limited [
13,
22]. In the context of urban analytics, where the availability of labeled data, computational infrastructure, and scalability remain critical constraints, these findings suggest that hybrid CNN–Transformer architectures currently offer a more favorable balance among predictive performance, data efficiency, and operational feasibility for large-scale or resource-constrained urban applications.
2.1.5. Multimodal Data Fusion: Integration of RGB, Height, and LiDAR Data
Urban mapping applications often benefit from integrating multiple data sources. A prominent methodological trend in contemporary GeoAI research involves multimodal data fusion, in which CNNs ingest not only RGB imagery but also auxiliary layers such as elevation data or multispectral bands. Height information from Digital Surface Models (DSMs) or normalized Digital Surface Models (nDSMs) has proven valuable for distinguishing buildings from other land cover types, as buildings protrude above ground level [
18].
The workflow for generating normalized Digital Surface Models (nDSMs) from stereophotogrammetric imagery and LiDAR point clouds underlies the multimodal fusion mechanisms used in height-aware urban feature extraction (
Figure 2). The sequence comprises image matching and dense point cloud generation, digital surface model (DSM) construction, ground point classification and digital terrain model (DTM) interpolation, and pixel-wise height normalization according to nDSM = DSM − DTM, which shows how elevation information is derived and integrated into GeoAI pipelines for urban mapping and analysis.
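The final normalization step is a per-pixel subtraction and can be sketched as follows, assuming the DSM and DTM are already co-registered rasters of identical shape loaded as NumPy arrays. The function name `compute_ndsm`, the clipping of negative heights to zero, and the toy elevations are illustrative choices and are not prescribed by the reviewed studies.

```python
import numpy as np

def compute_ndsm(dsm: np.ndarray, dtm: np.ndarray,
                 clip_negative: bool = True) -> np.ndarray:
    """Height above ground per pixel: nDSM = DSM - DTM.

    Both rasters must share the same grid. NaN (no-data) cells propagate
    to the output. Small negative values, which typically arise from
    interpolation noise in the DTM, are optionally clipped to zero.
    """
    dsm = np.asarray(dsm, dtype=float)
    dtm = np.asarray(dtm, dtype=float)
    if dsm.shape != dtm.shape:
        raise ValueError("DSM and DTM must be co-registered on the same grid")
    ndsm = dsm - dtm
    if clip_negative:
        ndsm = np.maximum(ndsm, 0.0)   # np.maximum keeps NaN cells as NaN
    return ndsm

# Toy 2x2 rasters, elevations in metres above sea level.
dsm = np.array([[105.0, 112.0],
                [103.0, 101.0]])
dtm = np.array([[100.0, 100.0],
                [103.0, 102.0]])
ndsm = compute_ndsm(dsm, dtm)   # [[5, 12], [0, 0]] metres above ground
```

In this toy case the two upper cells would correspond to elevated objects such as buildings or trees, while the lower cells lie at ground level.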
Recent deep learning models incorporate DSM data through channel extension (e.g., using four-channel RGB + DSM images) or dual-stream architectures. Dabove et al. [
23] demonstrated that adding a DSM channel to a DeepLabv3 model for building footprint extraction in Turin, Italy, improved overall segmentation performance, consistent with the known advantage of elevation data in resolving spectral ambiguities in complex urban scenes.
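Channel extension amounts to appending the height raster as a fourth band after bringing it to a scale comparable with the image bands. The sketch below assumes an 8-bit RGB image and a height raster in metres; the function name `stack_rgb_height` and the `max_height` normalization constant of 50 m are hypothetical and would need to be chosen for the study area.

```python
import numpy as np

def stack_rgb_height(rgb: np.ndarray, height: np.ndarray,
                     max_height: float = 50.0) -> np.ndarray:
    """Build an (H, W, 4) float32 input from RGB (H, W, 3, uint8)
    and a height raster (H, W, metres).

    RGB is scaled to [0, 1]; heights are divided by max_height
    (an assumed cap) and clipped to [0, 1].
    """
    if rgb.shape[:2] != height.shape:
        raise ValueError("RGB and height rasters must share the same grid")
    rgb_scaled = rgb.astype(np.float32) / 255.0
    h_scaled = np.clip(height / max_height, 0.0, 1.0).astype(np.float32)
    return np.concatenate([rgb_scaled, h_scaled[..., None]], axis=-1)

# Toy 2x2 tile: white pixels with heights of 0, 25, 50, and 80 m.
rgb = np.full((2, 2, 3), 255, dtype=np.uint8)
height = np.array([[0.0, 25.0],
                   [50.0, 80.0]])
stacked = stack_rgb_height(rgb, height)   # shape (2, 2, 4)
```

A network consuming this input needs a first convolution with four input channels. When a pre-trained three-channel encoder is reused, the extra channel's weights must be initialized separately, which is one practical reason some studies prefer dual-stream designs.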
Quantitative analysis reveals that multimodal fusion substantially enhances accuracy. In an experiment using 0.1 m aerial orthophotos, introducing an nDSM band yielded improvements of 3.3% in F1-score and 5.9% in IoU compared with RGB-only segmentation. A U-Net/LinkNet model trained on data from Turkey achieved an F1-score of approximately 97% on the validation set when incorporating four channels (R, G, B, nDSM), confirming that height data assist CNN models in resolving building outlines and separating rooftops from adjacent features, including trees and open land [
18].
Beyond raster data, three-dimensional point cloud data from LiDAR technology is combined with imagery in deep learning workflows. Ballouch et al. [
24] presented a methodology for automatic semantic segmentation of airborne LiDAR point clouds using deep models, including fusion of LiDAR data with aerial images. By extending the PointNet architecture, originally designed for point cloud processing, and integrating image features, they achieved effective three-dimensional classification of urban features. The fused approach improved identification of buildings and other urban objects in three-dimensional city models, demonstrating that two-dimensional CNN outputs and three-dimensional point-based networks can be jointly employed to produce richer urban representations. Such fusion improves vertical structure recognition, roof delineation, and object separation in dense urban scenes, consistent with prior work demonstrating improvements in classification accuracy from multimodal fusion.
Multimodal CNN frameworks that process RGB and depth data via dual-stream architectures have become prevalent in urban GeoAI applications, as cities inherently exhibit multidimensional characteristics [
25]. The methodological consensus affirms that data fusion provides a performance advantage for fine-grained tasks such as high-rise building detection, where reliance on optical data alone may prove insufficient [
23]. Additional modalities enable networks to learn more discriminative features, producing higher accuracy and more reliable extraction of urban elements.
Despite the demonstrated performance gains from multimodal data fusion, integrating auxiliary data sources introduces practical challenges that warrant consideration. Height data derived from LiDAR or photogrammetric reconstruction requires precise spatial alignment and consistent preprocessing, while acquisition costs and data availability remain limiting factors in many regions. Naive channel-stacking approaches may underutilize cross-modal complementarities, motivating the adoption of dual-stream or attention-based fusion architectures. Recent research explores the generation of synthetic DSMs from stereo imagery and the use of self-supervised fusion strategies to mitigate these constraints, indicating that while multimodal approaches offer clear advantages, their operational deployment requires careful methodological design.
2.1.6. Evolving Methods for Diverse Urban Challenges
The application of GeoAI in urban studies has undergone rapid evolution, with methodologies tailored to specific challenges, including informal settlement mapping, green space monitoring, and high-resolution urban change detection. In contexts involving informal settlements with irregular building layouts and limited training data, deep learning models have demonstrated surprising effectiveness [
12], with analogous results observed in UAV-based urban scene segmentation under constrained training conditions [
26].
State-of-the-art CNNs, including FCN, U-Net, and DeepLabv3+, have been applied to segment informal versus formal areas across multiple developing cities [
12]. The best-performing model in that work, DeepLabv3+, achieved over 90% accuracy in identifying the extents of informal settlements, a noteworthy result given the heterogeneity of such settlements. This result suggests that modern segmentation networks can generalize across diverse urban morphologies.
Urban environmental applications have similarly benefited from CNN methodologies. Urban green area segmentation, traditionally addressed through manual mapping processes, has been automated using CNN-based semantic segmentation. Pešek et al. [
27] report that CNN-based models (U-Net) achieved an overall accuracy of 88–95% for urban green area segmentation on Sentinel-2 data, compared with 81–93% for Random Forest classifiers. The performance gap narrows for simpler binary vegetation/non-vegetation classification tasks. The resulting high-resolution maps of urban greenery, typically derived from multispectral satellite imagery, enable planners to assess the distribution and equity of green space with unprecedented spatial detail.
An emerging application involves deep network-based urban change detection and time-series analysis. Researchers are employing recurrent CNNs or Transformer-based models to identify urban growth and land use changes across temporal sequences, building on the foundational principles of segmentation models [
13]. Furthermore, deep learning has advanced into urban environmental quality modeling. Bai et al. [
28] applied a ResNet-based CNN to assess the relationship between urban morphology and particulate air pollution, demonstrating that convolutional networks can extract urban-form features associated with PM2.5 exposure levels. This extends GeoAI methods beyond pure mapping into predictive modeling of environmental phenomena such as air quality.
Across these diverse applications, methodological innovations have consistently aimed to improve accuracy and computational efficiency. Techniques such as transfer learning, data augmentation, and ensemble modeling are commonly used to improve performance on limited urban datasets. In comparative evaluations, multiple architectures often achieve similarly high accuracy on very high-resolution imagery, indicating that the field has matured [
13]. In that comparison, U-Net and DeepLabv3+ achieved overall accuracies of 94–97% in building extraction from 0.3 m aerial orthophotos, with F1-scores of 89–91%, suggesting that, with sufficient image detail and training data, even simpler encoder–decoder architectures can perform well. Nevertheless, the consensus is that model selection and configuration should align with specific task requirements. For mapping tasks that demand fine-grained boundary detail or multi-class differentiation, architectures incorporating advanced features, such as attention mechanisms and multi-scale context modules, offer additional advantages. Rigorous evaluation of models on benchmark datasets, including SpaceNet for buildings and ISPRS Vaihingen/Potsdam for urban land cover, has established best practices in model design and training strategy within the GeoAI community.
Deep learning models achieve high accuracy in detecting informal settlements, but recent evidence reveals limitations in cross-city generalization. Models trained on a single urban context often perform poorly when applied to morphologically distinct cities, reflecting their sensitivity to local construction patterns and socio-spatial configurations. Studies demonstrate that training on multi-city datasets substantially improves generalization but does not fully eliminate domain-shift effects. Consequently, informal settlement mapping remains a data-intensive task, and transfer learning and geographically diverse training samples are critical to ensure scalable deployment.
2.2. Ensemble and Tree-Based Methods in Urban GeoAI Applications
Ensemble learning, especially Random Forest (RF) and gradient-boosting variants such as XGBoost and LightGBM, constitutes an additional methodological stream within GeoAI for urban studies, offering solid performance in scenarios where labeled datasets are limited, heterogeneous, or sourced from multiple sensors. In contrast to deep learning architectures, which typically require extensive annotated datasets and substantial computational capacity, RF-based approaches provide interpretable, computationally efficient alternatives that demonstrate strong performance on structured and semi-structured geospatial features, including spectral indices, textural metrics, and multi-temporal composites [
25].
In deep learning-based mapping, several studies report pronounced domain-shift effects when models trained on well-mapped European cities (e.g., Potsdam) are applied to morphologically distinct cities in the Global South, such as Nairobi and Dhaka. These studies report performance degradation of approximately 10–15 percentage points in IoU or F1-score when models are transferred between morphologically distinct urban contexts [
12].
Unlike convolutional neural networks, which operate on regular raster grids, graph convolutional networks (GCNs) represent urban space as an irregular graph, enabling direct modeling of spatial relationships among geographic entities. In urban GeoAI applications, graph construction typically follows two dominant strategies: (i) pixel or superpixel-based graphs, where nodes represent spatial units connected by adjacency relations, and (ii) object or entity-based graphs, where nodes correspond to meaningful urban elements such as buildings, parcels, road segments, or neighborhoods and edges encode spatial proximity, functional connectivity, or similarity in socio-environmental attributes. The latter formulation is relevant for urban planning, as it aligns model structure with decision-relevant geographic entities rather than raster artifacts.
A graph neural network (GNN)-based analytical workflow for urban micromobility applications outlines the sequence of steps that transform heterogeneous spatial and mobility datasets into graph-structured representations suitable for learning (
Figure 3). The workflow delineates the principal methodological stages, including data preprocessing, graph construction through explicit node and edge definition, adjacency matrix formulation, and node–edge feature enrichment. These representations are then integrated into spatio-temporal modeling frameworks to capture both spatial dependencies and the temporal dynamics inherent in urban mobility systems. By visualizing the end-to-end modeling pipeline, the figure elucidates the methodological rationale by which GNN architectures enable predictive tasks, such as micromobility demand forecasting, hotspot detection, and operational planning.
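The graph-construction and adjacency-formulation stages can be sketched as follows, assuming nodes are micromobility stations with planar coordinates and edges link stations within a distance threshold. The symmetric normalization with self-loops is the form commonly used in GCNs and is assumed here rather than taken from the figure. Station coordinates, the radius, and the single demand feature are hypothetical, and learned weights are omitted.

```python
import math


def build_adjacency(coords, radius):
    """Binary adjacency matrix: nodes are connected if within `radius` of each other."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= radius:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize_with_self_loops(adj):
    """Symmetric normalization D^(-1/2) (A + I) D^(-1/2)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def propagate(norm_adj, features):
    """One neighborhood-aggregation step (no learned weights or nonlinearity)."""
    n, width = len(features), len(features[0])
    return [[sum(norm_adj[i][k] * features[k][f] for k in range(n)) for f in range(width)]
            for i in range(n)]


# Hypothetical stations (km) and one node feature (trips per hour).
stations = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
demand = [[10.0], [20.0], [30.0]]
adjacency = build_adjacency(stations, radius=1.5)
smoothed = propagate(normalize_with_self_loops(adjacency), demand)
```

The two nearby stations exchange information and converge toward a shared value, while the isolated station keeps its own feature; a full spatio-temporal model would add learned weights and a temporal component on top of this step.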
The Gated Graph Convolutional Network extends standard GCN formulations by introducing a gating mechanism that adaptively regulates information propagation across graph neighborhoods. Conceptually, the gate serves as a learnable filter that controls the contributions of neighboring nodes based on feature compatibility and structural relevance, mitigating the over-smoothing effects commonly observed in deep graph models. In dense urban environments, where long-range spatial dependencies coexist with sharp functional or morphological discontinuities, gating enables the model to balance contextual aggregation with boundary preservation, explaining its superior performance in refining building footprints and complex urban structures.
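The gating idea can be illustrated with a toy aggregation in which each neighbor's contribution is scaled by a sigmoid gate computed from feature compatibility. This is a simplified sketch, not the formulation of any specific Gated GCN: the dot-product compatibility score, the `weight` and `bias` parameters (which would be learned in practice), and the example features are all assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(h_i, h_j, weight=4.0, bias=0.0):
    """Gate in (0, 1) from the dot-product compatibility of two node feature vectors."""
    compatibility = sum(a * b for a, b in zip(h_i, h_j))
    return sigmoid(weight * compatibility + bias)


def gated_aggregate(features, neighbors, weight=4.0, bias=0.0):
    """Add gated neighbor features to each node's own features."""
    updated = []
    for i, h_i in enumerate(features):
        aggregated = list(h_i)
        for j in neighbors[i]:
            g = gate(h_i, features[j], weight, bias)
            aggregated = [a + g * b for a, b in zip(aggregated, features[j])]
        updated.append(aggregated)
    return updated


# Node 0 has one similar neighbor (node 1) and one dissimilar neighbor (node 2),
# standing in for a sharp morphological boundary.
features = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
neighbors = {0: [1, 2], 1: [0], 2: [0]}
updated = gated_aggregate(features, neighbors)
```

With these values the similar neighbor passes almost fully through the gate while the dissimilar one is largely suppressed, which is the behavior that limits smoothing across boundaries.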
Tree-based ensemble methods, including Random Forest (RF), XGBoost, and LightGBM, exhibit distinct inductive biases when applied to heterogeneous urban datasets comprising spectral metrics, socio-economic indicators, and multi-temporal features. Random Forest aggregates decorrelated decision trees through bootstrap sampling, yielding strong robustness to noise, collinearity, and variable scaling, which is advantageous in data-scarce or mixed-feature urban contexts. However, RF’s averaging mechanism limits its ability to capture higher-order feature interactions.
Gradient-boosting methods such as XGBoost and LightGBM iteratively optimize residuals, enabling more effective modeling of complex nonlinear interactions common in urban socio-environmental systems. XGBoost emphasizes regularization and loss optimization, improving generalization in moderate-sized datasets. In contrast, LightGBM employs histogram-based splitting and leaf-wise growth, offering computational efficiency and scalability for large, high-dimensional urban datasets. As a result, boosting methods often outperform RF in socio-economic classification and predictive modeling tasks, although at the cost of increased sensitivity to hyperparameter tuning and potential overfitting.
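The residual-fitting principle behind boosting can be sketched with one-split decision stumps on a single feature. This is a didactic simplification under squared-error loss; it omits the regularization of XGBoost and the histogram-based, leaf-wise growth of LightGBM. The data and function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error on residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a shrunken correction."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if x <= thr else rm) for thr, lm, rm in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical nonlinear relationship between one predictor and one urban indicator.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.0, 3.2, 2.9, 6.0, 6.1]
model = fit_boosted_stumps(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
```

The training error falls below the mean-only baseline as rounds accumulate; the same mechanism is what makes boosting prone to overfitting when the number of rounds and the learning rate are not tuned.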
2.2.1. Tree-Based Methods for Urban Vegetation and Green Infrastructure Assessment
Tree-based ensemble methods have demonstrated particular effectiveness in assessing urban vegetation and green infrastructure, where spectral metrics serve as primary discriminators. Ramdani & Furqon [
29] found that XGBoost substantially outperformed Random Forest, ANN, and SVM in urban forest classification, achieving the lowest RMSE (1.56 vs. 4.33–7.45 for the other methods), demonstrating the relative advantage of gradient boosting under limited training data.
Random Forest remains widely adopted for urban greenness mapping, with recent work incorporating spectral indices such as the NDVI, SAVI, and NDWI into ensemble-based analytical pipelines for urban green infrastructure assessment (
Figure 4). These methodological approaches offer particular operational appeal to municipalities, as they rely on open multispectral imagery and require only moderate computational resources.
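The spectral indices named above can be computed per pixel from surface reflectance, as in the sketch below. It assumes the green/NIR formulation of NDWI (a SWIR-based variant also exists) and the conventional soil-adjustment factor L = 0.5 for SAVI; the reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index with soil brightness correction factor L."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def ndwi(green, nir):
    """Normalized Difference Water Index, green/NIR formulation (assumed here)."""
    return (green - nir) / (green + nir)


# Hypothetical reflectance values for a vegetated pixel.
pixel = {"green": 0.12, "red": 0.10, "nir": 0.50}
feature_vector = [
    ndvi(pixel["nir"], pixel["red"]),
    savi(pixel["nir"], pixel["red"]),
    ndwi(pixel["green"], pixel["nir"]),
]
```

A feature vector of this kind, computed for every pixel or object, is the typical input to a Random Forest or boosting classifier in such pipelines.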
2.2.2. Multi-Temporal and Multi-Sensor Data Fusion Using Ensemble Methods
In multi-temporal and multi-sensor fusion contexts, ensemble methods have demonstrated substantial capacity to leverage temporal variability for improved class separability. Early studies demonstrated the value of Sentinel-1 SAR data for land cover mapping using machine learning classifiers [
30], and subsequent work integrating multitemporal Sentinel-1 and Sentinel-2 optical data via RF and XGBoost further enhanced the detection of built-up areas by capturing seasonal and phenological differences [
31]. The availability of benchmark datasets such as Sentinel2GlobalLULC further supports these workflows by providing standardized Sentinel-2 imagery for training and validating deep learning models across diverse land cover categories at the global scale [
32].
In a comparative land cover classification study in a forested rural district of northern Vietnam, RF achieved higher overall accuracy (96.3%) than XGBoost (94.8%), demonstrating RF’s robustness to radar-derived backscatter features from Sentinel-1 [
33]. Cloud-based platforms such as Google Earth Engine further extend the accessibility of Landsat-derived analyses, enabling urban heat island assessments using surface temperature data even in contexts with limited local computing infrastructure [
34]. At a finer scale, machine learning-assisted downscaling of land surface temperature has been combined with land cover change analysis to reveal intra-urban thermal dynamics in European cities, demonstrating the practical integration of ensemble classifiers with remote-sensing time series for spatio-temporal urban environmental monitoring [
35].
2.2.3. Ensemble Methods for Socio-Environmental Modeling and Vegetation Classification
Beyond land cover classification, ensemble classifiers have been successfully applied to socio-environmental modeling, including the prediction of the urban-level Human Development Index (HDI) and demographic stratification. Comparative benchmarking of RF, XGBoost, and LightGBM provides updated performance baselines for future investigations [
36].
Additionally, species-level vegetation classification using RF with multi-resolution inputs demonstrates the value of spatial resolution complementarity: high-detail imagery from sensors such as Pléiades provides fine-scale crown delineation, while coarser sensors, including Sentinel-2, supplement contextual gradients, jointly improving classification for urban forestry applications [
37].
2.2.4. Methodological Position of Ensemble Methods in Urban GeoAI
Taken in aggregate, the literature demonstrates that ensemble methods provide a versatile methodological toolkit for urban GeoAI in contexts involving (1) limited training data, (2) structured feature spaces, or (3) multi-temporal and multi-sensor data fusion. While deep learning models dominate semantic segmentation of continuous surfaces, RF and boosting techniques remain methodologically relevant due to their interpretability, stability across input modalities, and strong performance under data scarcity constraints (
Figure 5). Their operational simplicity renders them well suited for urban planning agencies and environmental authorities seeking evidence-based mapping and monitoring without the substantial cost burdens associated with large-scale labeling or GPU-intensive training regimes [
13].
From a methodological perspective, ensemble learning and deep learning should be viewed as complementary rather than competing paradigms within urban GeoAI. While deep neural networks dominate tasks requiring pixel-level semantic interpretation and complex spatial pattern recognition, ensemble methods retain advantages in interpretability, stability, and efficiency for structured feature spaces and multi-temporal analyses. This dual-methodological ecosystem reflects a maturing research field in which model selection is driven by task-specific constraints, data availability, and operational feasibility, rather than by algorithmic novelty alone.
3. Doubts and Difficulties in GeoAI-Driven Urban Planning
Building on the thematic structure presented in
Table 1, this section examines the cross-cutting challenges that emerge across the reviewed GeoAI applications in urban analytics. This aggregation shows how different methodological streams address distinct urban challenges, moving beyond descriptive model comparison toward thematic, policy-relevant insights for planning.
Data inequality in the Global South is a significant limitation of GeoAI. This data scarcity is often described as a digital divide or data desert [
38]. Many cities in the Global South suffer from fragmented, outdated, or inaccessible spatial data, which constrains model reliability and reinforces digital divides. This raises concerns about the transferability of GeoAI solutions developed primarily in data-rich contexts in the Global North. A recent analysis of OpenStreetMap (OSM) building completeness reveals a paradox: while some low-income regions have high coverage due to humanitarian mapping campaigns, many medium-income cities in the Global South remain largely unmapped [
39]. Consequently, models trained on data-rich European cities often suffer from domain shift when applied to these contexts, producing unreliable outputs for local planning. The unequal geography of spatial data production raises fundamental questions of data justice and representation. Urban areas that are poorly mapped or inconsistently monitored often coincide with populations that are politically marginalized or economically disadvantaged. As a result, GeoAI models trained on incomplete or biased datasets risk reinforcing existing power asymmetries by rendering some urban realities invisible. This challenge is acute in informal settlements, where legal ambiguity and infrastructural precarity complicate data collection. Addressing these issues requires not only technical solutions, such as transfer learning and uncertainty quantification, but also institutional strategies that promote open data, community mapping initiatives [
40], and international data-sharing frameworks. Without such efforts, the transformative potential of GeoAI may remain unevenly distributed across regions and social groups.
GIS-based accessibility studies of urban services illustrate how spatial analytics can directly inform planning and policy decisions, particularly in relation to equity and service provision. An illustrative case is the analysis of pharmacy distribution and accessibility in the city of Craiova, which combined spatial proximity measures, population density data, and transport network information to identify service gaps and uneven accessibility across urban districts. The results revealed a strong centralization of pharmacies along major transport corridors, alongside peripheral residential areas with reduced access, despite overall compliance with national demographic regulations. While the study relied on classical GIS techniques such as buffer analysis and Euclidean distance, its logic aligns closely with contemporary GeoAI applications. Recent GeoAI models extend this paradigm by enabling dynamic accessibility assessment, the integration of multi-source data streams, and predictive evaluation of service location scenarios. GIS-based service accessibility analyses thus serve as a bridge between traditional spatial planning tools and GeoAI-driven decision support in urban planning [
41].
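The classical buffer and Euclidean-distance logic described above can be sketched in a few lines, assuming projected coordinates in metres and population attached to block centroids. The facility locations, block populations, and the 500 m buffer radius are hypothetical and are not taken from the Craiova study.

```python
import math


def nearest_facility_distance(point, facilities):
    """Euclidean distance from a point to its closest facility."""
    return min(math.dist(point, facility) for facility in facilities)


def population_within_buffer(blocks, facilities, buffer_radius):
    """Share of total population whose block centroid lies inside any facility buffer."""
    total = sum(population for _, _, population in blocks)
    covered = sum(
        population
        for x, y, population in blocks
        if nearest_facility_distance((x, y), facilities) <= buffer_radius
    )
    return covered / total


# Hypothetical pharmacies and residential blocks: (x, y, population).
pharmacies = [(0.0, 0.0), (1000.0, 0.0)]
blocks = [(100.0, 0.0, 500), (900.0, 300.0, 300), (3000.0, 0.0, 200)]
coverage = population_within_buffer(blocks, pharmacies, buffer_radius=500.0)
```

In this toy case the peripheral block falls outside every buffer, which is the kind of service gap such analyses are designed to expose; network-based travel time would replace straight-line distance in a more realistic assessment.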
Model Interpretability and Trust—Black-box AI models challenge transparency and accountability in planning decisions. For policymakers and local stakeholders, limited interpretability undermines trust and complicates the integration of GeoAI outputs into formal planning processes. Interpretability is not a purely technical preference; it is a prerequisite for planning legitimacy in contexts where model outputs can influence zoning, investment priorities, risk classification, and the distribution of public services. In urban governance, decisions are expected to be justifiable, contestable, and consistent with administrative law and democratic accountability. When GeoAI systems provide predictions without intelligible reasoning, planners face difficulties in defending decisions to elected officials, courts, and affected communities. This challenge is amplified in equity-sensitive domains (informal settlement upgrading, relocation from hazard-prone areas, or targeted climate adaptation) where opaque models are perceived as technocratic instruments that reinforce exclusion instead of reducing vulnerability. Interpretability should be treated as an operational design requirement instead of an add-on to otherwise high-performing models [
42]. Intrinsic interpretability prioritizes model classes that are transparent by construction (e.g., generalized additive models, monotonic constraints, sparse models), which can be beneficial in regulatory settings but may limit predictive performance in highly complex tasks. Post hoc explainability applies model-agnostic methods, such as Shapley value-based explanations, to complex models, including deep neural networks, offering both global and local insights into feature contributions.
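Shapley value-based attribution can be made concrete with an exact computation on a toy model. The sketch assumes that a feature absent from a coalition is replaced by a fixed baseline value, which is one common convention among several. The model, feature meanings, and values are hypothetical, and exact enumeration is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                contributions[i] += weight * (value(coalition | {i}) - value(coalition))
    return contributions


def toy_risk_model(x):
    """Hypothetical score: one additive term plus one interaction term."""
    return 2.0 * x[0] + x[1] * x[2]


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk_model, instance, baseline)
```

The additive feature receives its full effect, the two interacting features split their joint effect equally, and the contributions sum to the difference between the prediction for the instance and for the baseline.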
Applying these methods to geospatial data introduces distinct challenges: explanations must preserve geographic context and account for spatial dependence, scale effects, and the risk of misleading interpretations (e.g., a variable appearing important globally while driving outcomes only in specific neighborhoods). Recent geospatial XAI research indicates that conventional explainability plots often strip away geographic meaning and therefore require map-based visual analytics to maintain spatial interpretability and usability for planning stakeholders [
7]. A critical and often underreported dimension is the communication of uncertainty. Urban decision-making rarely requires only a point prediction; it requires understanding confidence, sensitivity to data gaps, and stability across plausible scenarios. This is important in the Global South, where datasets may be incomplete, outdated, or unevenly distributed across formal and informal areas. In such settings, interpretability must be coupled with explicit disclosure of uncertainty (e.g., confidence intervals, spatially explicit error maps, sensitivity to missing data) to avoid false precision and to support risk-aware policy choices. Good-practice work in interpretable machine learning emphasizes that explanations can be misused if they are treated as causal evidence, if model assumptions are ignored, or if spatial confounding is not addressed; therefore, interpretable GeoAI should be framed as decision support that complements, not replaces, domain expertise and participatory deliberation [
43]. There is a growing shift toward spatially explicit, graph-based explainability in urban analytics, reflecting that many urban systems (transportation, neighborhood interactions, and infrastructure networks) are naturally represented as graphs. Graph neural networks and related models can achieve strong predictive performance. Still, their black-box nature can be partially mitigated through graph-specific explainers that attribute influence to nodes, edges, and neighborhood structure. Such approaches offer a promising direction for interpretable GeoAI in urban systems analysis, when combined with interactive dashboards that allow planners to explore explanations geographically and examine trade-offs across policy scenarios [
44].
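The disclosure of uncertainty discussed above, for example through confidence intervals, can be sketched with a percentile bootstrap over per-tile accuracy scores. The scores, the number of resamples, and the seed are hypothetical; the sketch also treats tiles as independent, which spatial autocorrelation would violate in practice.

```python
import random


def bootstrap_mean_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Hypothetical per-tile IoU scores from a validation set.
tile_iou = [0.82, 0.91, 0.78, 0.88, 0.95, 0.70, 0.85, 0.90, 0.76, 0.87]
mean_iou = sum(tile_iou) / len(tile_iou)
ci_lower, ci_upper = bootstrap_mean_ci(tile_iou)
```

Reporting the interval alongside the point estimate gives planners a sense of how much the headline accuracy could shift with a different sample of tiles.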
Ethical and Governance Concerns—GeoAI applications may inadvertently reproduce social biases embedded in data, leading to exclusionary or unjust outcomes. Issues related to surveillance, privacy, and algorithmic governance are sensitive in marginalized communities and informal settlements. Ethical governance of GeoAI extends beyond data privacy and algorithmic bias to encompass broader questions of accountability, transparency, and public participation [
45]. Urban planning decisions informed by GeoAI (e.g., zoning changes, risk classifications, or infrastructure prioritization) can have profound social consequences, including displacement and unequal access to resources. Ensuring ethical GeoAI, therefore, requires institutional mechanisms for oversight, stakeholder engagement, and grievance redress. In the absence of such mechanisms, there is a risk that algorithmic authority may substitute democratic deliberation in contexts with weak governance structures. Embedding ethical principles into GeoAI workflows is thus essential for maintaining the legitimacy of technology-supported planning. The deployment of AI systems in urban planning introduces ethical challenges that extend beyond technical performance, encompassing bias, privacy violations, and accountability deficits, with profound implications for social equity. A comprehensive review of ethical concerns in AI-driven urban planning identified five interconnected risk areas: (1) bias arising from skewed training data and historical inequities, which perpetuates unequal outcomes disproportionately affecting marginalized communities; (2) privacy violations through extensive data collection via sensors, surveillance cameras, and mobile applications without adequate informed consent; (3) accountability ambiguity regarding responsibility when AI systems produce harmful outcomes; (4) transparency deficits stemming from the “black-box” nature of many machine learning algorithms; and (5) equity erosion when AI-driven planning tools remain inaccessible to low-income populations lacking digital literacy or broadband infrastructure. If historical data used to train urban planning algorithms favor investment in affluent neighborhoods over informal settlements, the AI model will likely recommend resource allocations that perpetuate this bias, systematizing discrimination at scale. 
Mitigating these risks requires urban planners to adopt a human-in-the-loop approach that incorporates human oversight, participatory co-design with affected communities, transparent documentation of algorithmic reasoning, regular algorithmic audits to detect bias, and reliable data security protocols, including encryption and mechanisms for informed consent. Planning organizations must prioritize multilingual public engagement, provide digital literacy training to bridge the digital divide, and embed equity considerations throughout all phases of GeoAI development (from data collection through model validation), ensuring that AI functions as a tool for inclusive urban development rather than a mechanism for perpetuating systemic inequalities [
46].
4. Opportunities for Inclusive and Fair Urban Development
Addressing poverty and social vulnerability with GeoAI offers significant opportunities to identify spatial concentrations of poverty, service deprivation, and environmental injustice. When combined with participatory approaches, GeoAI can support targeted interventions and more equitable allocation of public resources. Recent GeoAI applications have demonstrated the capacity to systematically map informal settlements and intra-urban inequalities across rapidly urbanizing regions [
47]. A representative example of GeoAI applied to socio-economic assessment in data-scarce contexts is provided by Yeh et al. [
48], who estimated spatial patterns of economic well-being across several countries in sub-Saharan Africa, including Nigeria, Uganda, and Tanzania. The analysis was conducted at the level of regular grid cells, with a spatial resolution of 1–5 km, enabling sub-national comparisons in the absence of reliable household survey data. The study relied primarily on multi-temporal Landsat imagery, on which convolutional neural networks were trained to extract latent features of the built and natural environment that correlate with asset-based poverty indicators. By linking satellite-derived features to available survey-based wealth indices, the authors demonstrated that GeoAI models can capture meaningful spatial variation in economic well-being, offering a scalable and cost-effective complement to traditional data collection methods in regions where official socio-economic statistics are sparse or outdated. This approach illustrates the potential of GeoAI to support evidence-based urban and regional policy design under severe data constraints.
Supporting fair and transparent urban policies: explainable AI, open data platforms, and transparent modeling workflows can enhance democratic governance. GeoAI-based decision support systems have the potential to make planning processes more inclusive, evidence-based, and accountable. Evidence from flood risk modeling shows that climate adaptation investments often disproportionately benefit wealthier areas while socially vulnerable communities remain exposed, underscoring the need for equity-sensitive GeoAI frameworks in urban planning [
49].
Enhancing livability and well-being: GeoAI can contribute to livable cities by optimizing the distribution of green space, reducing exposure to environmental hazards, and improving access to services. These applications directly support health, well-being, and quality-of-life objectives. Livability is widely recognized as a multidimensional concept encompassing environmental quality, accessibility, safety, and social cohesion. GeoAI contributes to livability assessment by enabling fine-scale analyses of green space accessibility, heat exposure, noise pollution, and service availability. When combined with health data, these analyses provide evidence for interventions to reduce environmental health disparities. In cities across the Global North and the Global South, GeoAI-driven livability indicators can inform integrated urban policies that align with objectives of climate adaptation, public health, and social inclusion. Translating these analytical insights into tangible improvements requires sustained political commitment and cross-sectoral coordination.
Recent empirical studies demonstrate that GeoAI and closely related spatial AI methods can meaningfully contribute to the assessment of urban livability and well-being by enabling fine-scale, spatially explicit analyses of environmental exposure and access to urban amenities. GeoAI-driven models have been applied to map inequitable exposure to climate-related hazards such as flooding and heat stress, revealing systematic disparities that disproportionately affect socially vulnerable populations [
49]. Image-based deep learning approaches using street-level and aerial imagery have been shown to capture latent characteristics of urban environments, such as greenery, building density, and street morphology, that are strongly associated with perceived livability and neighborhood-level quality-of-life indicators [
50]. At the intra-urban scale, convolutional neural network-based segmentation of green spaces from satellite imagery enables detailed assessments of green infrastructure availability and accessibility, providing evidence for evaluating environmental equity and supporting health-oriented urban planning interventions [
27]. These studies illustrate how GeoAI-based livability metrics can support evidence-informed urban policies by linking environmental conditions, accessibility, and social outcomes within a unified analytical framework.
Human mobility is a rapidly expanding domain within GeoAI-driven urban analytics, with direct implications for spatial decision-making, resilience, and social inclusion. Recent studies increasingly adopt deep spatio-temporal learning frameworks and graph neural networks to analyze large-scale mobility data from mobile phone records, GPS trajectories, public transport smart cards, and traffic sensors, aiming to model dynamic movement patterns across urban systems [
51]. These approaches enable the identification of mobility inequalities, congestion hotspots, and accessibility gaps that are not readily observable through static land use or infrastructure data alone. In the context of climate risk and disaster management, GeoAI-based mobility models have been used to support evacuation planning, emergency response optimization, and population exposure assessment during extreme events, strengthening urban resilience strategies [
51,
52]. Mobility-informed GeoAI analyses contribute to socially inclusive planning by revealing how transport constraints disproportionately affect low-income or peripheral communities, providing evidence to guide equitable investments in public transport and service provision. Human mobility analytics represent a critical extension of GeoAI, moving from static spatial mapping to dynamic, behavior-aware urban support systems.
Urban resilience requires integrated assessments of climate, environmental, social, and economic risks. GeoAI enables the development of composite resilience indicators and scenario-based simulations that inform adaptation strategies. In climate-vulnerable regions, particularly in the Global South, GeoAI can support early warning systems, adaptive infrastructure planning, and nature-based solutions [
52].
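As an illustration of how a composite resilience indicator of the kind mentioned above might be assembled, the following minimal sketch min-max normalizes several indicators and combines them as a weighted sum. The indicator names, weights, and values are hypothetical and are not taken from any of the reviewed studies; the weighting scheme is one common choice among several.

```python
def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(indicators, weights, higher_is_worse):
    """Weighted sum of min-max normalized indicators, one score per spatial unit.

    indicators: dict mapping indicator name to a list of raw values
    weights: dict mapping indicator name to a non-negative weight
             (weights are rescaled here so that they sum to 1)
    higher_is_worse: dict mapping indicator name to a bool; indicators
             marked False are inverted so that 1 always means "worst"
    """
    total_weight = sum(weights.values())
    n_units = len(next(iter(indicators.values())))
    scores = [0.0] * n_units
    for name, raw in indicators.items():
        normalized = min_max(raw)
        if not higher_is_worse[name]:
            normalized = [1.0 - v for v in normalized]
        w = weights[name] / total_weight
        for i, v in enumerate(normalized):
            scores[i] += w * v
    return scores


# Hypothetical values for three neighborhoods (illustrative only).
indicators = {
    "flood_depth_m": [0.0, 0.5, 1.0],
    "heat_days": [10, 20, 30],
    "green_cover_pct": [40, 20, 0],
}
weights = {"flood_depth_m": 0.4, "heat_days": 0.3, "green_cover_pct": 0.3}
higher_is_worse = {
    "flood_depth_m": True,
    "heat_days": True,
    "green_cover_pct": False,  # more green cover is treated as lower risk
}
risk = composite_index(indicators, weights, higher_is_worse)
```

In this toy configuration, a higher score indicates lower resilience. In practice, the choice of weights is a normative decision that would need to be justified and, ideally, validated with local stakeholders.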
5. Future Research Directions
The following research directions are grounded directly in the thematic gaps identified and synthesized across the reviewed literature. Rather than proposing generic methodological advances, each recommendation responds to a specific unresolved challenge documented by the studies reviewed in this paper, structured around the seven thematic areas: urban morphology and building extraction, informal settlement mapping, urban green infrastructure, climate risk and resilience, human mobility, socio-economic indicators and livability, and governance.
5.1. Urban Morphology and Building Extraction: From Accuracy to Generalizability
Research on CNN-based building extraction has demonstrated near-manual accuracy in well-mapped urban contexts [
13,
16,
17,
18,
24]. Yet, limited cross-city generalization and high labeling and computational costs are the dominant remaining gaps. Studies consistently show that models trained on European benchmark datasets such as ISPRS Vaihingen and Potsdam suffer significant performance degradation of approximately 10–15 percentage points in IoU or F1-score when applied to morphologically distinct cities in the Global South [
4,
12]. The responsible AI case study by Tonnarelli and Mora [
4] further documented how shifts in task definition introduce algorithmic focus bias, causing systematic misclassification of formal structures in non-informal areas. Future research should therefore prioritize domain-adaptive and few-shot learning architectures that reduce reliance on large annotated datasets, alongside the construction of benchmark datasets that better represent the morphological diversity of Global South cities. Addressing computational cost requires parallel investment in lightweight encoder–decoder architectures deployable without GPU clusters, making building extraction operationally feasible for municipalities with constrained technical infrastructure.
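For readers less familiar with the metrics cited above, the following sketch shows how IoU and F1-score are computed from binary building masks, and how a cross-city performance drop can be expressed in percentage points. The masks are toy 3 × 3 grids invented for illustration; they do not reproduce results from the reviewed studies.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    for two binary masks given as equally sized lists of rows."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred, truth):
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


def f1_score(pred, truth):
    """F1-score: 2TP / (2TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy reference mask (1 = building pixel) and two hypothetical predictions.
truth = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]
pred_source_city = [[1, 1, 0],
                    [1, 0, 0],
                    [0, 0, 0]]
pred_target_city = [[1, 1, 1],
                    [0, 0, 0],
                    [0, 0, 0]]

iou_source = iou(pred_source_city, truth)
iou_target = iou(pred_target_city, truth)
# Degradation expressed in percentage points, as in the text above.
iou_drop_pp = (iou_source - iou_target) * 100
```

Reporting the difference in percentage points, rather than as a relative percentage, keeps cross-city comparisons unambiguous when baseline accuracies differ.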
5.2. Informal Settlements and Inequality Mapping: Embedding Ethics into Analytical Workflows
Deep learning models, including DeepLabv3+ and transfer learning approaches, have achieved over 90% accuracy in identifying the extents of informal settlements across morphologically diverse developing cities [
12], demonstrating that cross-context transferability is technically feasible even with limited local training data. Nevertheless, domain shift, bias, and ethical risks remain the central unresolved challenges in this thematic area. As Tonnarelli and Mora [
4] demonstrated, task redefinition without ethical oversight can cause GeoAI systems to systematically misclassify and render invisible the very populations they are intended to serve, undermining equitable resource allocation. Future research must embed bias auditing, uncertainty quantification, and ethical review protocols directly into GeoAI workflows for informal settlement contexts. Critically, this requires not only technical solutions, such as geographically diverse training samples and uncertainty-aware predictions, but also institutional mechanisms, including community validation and participatory mapping, that bring local knowledge into the modeling process and ensure that analytical outputs reflect, rather than obscure, the realities of marginalized populations.
5.3. Urban Green Infrastructure and Environment: Integrating Spatial Mapping with Health and Social Outcomes
Tree-based ensemble methods and CNN-based segmentation have achieved strong results in urban green space mapping. XGBoost achieved the lowest RMSE in urban forest classification, substantially outperforming Random Forest, ANN, and SVM [
29], while CNN-based models (U-Net) achieved an overall accuracy of 88–95% for green area segmentation from multispectral Sentinel-2 imagery, outperforming Random Forest classifiers [
27]. Random Forest pipelines incorporating spectral indices such as NDVI, SAVI, and NDWI have further demonstrated strong performance in mapping green cover, water bodies, built-up surfaces, and barren land. Weak integration with health and social data is the critical remaining gap: GeoAI-derived green infrastructure maps are rarely connected to neighborhood-level health records, social vulnerability indices, or environmental equity indicators. Future research should develop integrated frameworks that link spatially explicit green infrastructure assessments to health outcome data and socio-economic stratification, enabling genuinely equity-sensitive evaluations of environmental access. Such integration would shift GeoAI applications in this domain from descriptive coverage mapping toward evidence-based tools for planning interventions that directly address environmental health disparities.
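The spectral indices named above follow standard band-ratio definitions, sketched below for single-pixel reflectance values. Two assumptions are made here: NDWI is taken in its green/near-infrared form (the variant commonly used for open water), and the SAVI soil adjustment factor is set to the conventional default of 0.5. For Sentinel-2, the relevant bands are B3 (green), B4 (red), and B8 (near-infrared). The reflectance values are invented for illustration.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return (nir - red) / denom if denom else 0.0


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index:
    (1 + L) * (NIR - Red) / (NIR + Red + L), with L = soil_factor."""
    denom = nir + red + soil_factor
    return (1.0 + soil_factor) * (nir - red) / denom if denom else 0.0


def ndwi(green, nir):
    """Normalized Difference Water Index (green/NIR form):
    (Green - NIR) / (Green + NIR)."""
    denom = green + nir
    return (green - nir) / denom if denom else 0.0


# Hypothetical surface reflectances for two pixels (illustrative only).
vegetation = {"green": 0.10, "red": 0.10, "nir": 0.50}
water = {"green": 0.06, "red": 0.04, "nir": 0.02}

veg_ndvi = ndvi(vegetation["nir"], vegetation["red"])
veg_savi = savi(vegetation["nir"], vegetation["red"])
water_ndwi = ndwi(water["green"], water["nir"])
```

In a classification pipeline, such indices would typically be computed per pixel over whole rasters and stacked with the raw bands as input features.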
5.4. Climate Risk and Urban Resilience: Advancing Equity-Sensitive and Uncertainty-Aware Frameworks
Spatio-temporal deep learning models and ensemble classifiers have substantially advanced flood and heat risk mapping at fine spatial scales [
34,
35,
49], enabling scenario-based planning and early warning systems in both the Global North and the Global South [
53,
54]. We consider here two interconnected gaps: equity issues in the distribution of climate adaptation benefits and limited communication of model output uncertainty. Wing et al. [
49] demonstrated that the distribution of flood risk and of climate adaptation investments systematically favors wealthier areas, while socially vulnerable communities remain disproportionately exposed. This pattern of inequitable climate risk distribution cannot be addressed by technical accuracy improvements alone. Future research must develop equity-sensitive GeoAI frameworks that incorporate social vulnerability weighting into risk models, ensuring that planning outputs prioritize exposed and marginalized communities rather than reinforcing existing patterns of investment. Equally, model outputs must be accompanied by spatially explicit uncertainty communication that includes confidence intervals, sensitivity analyses, and error maps so that decisions in data-scarce Global South contexts are not driven by false precision. Interdisciplinary collaboration across urban analytics, climate science, and community engagement, as emphasized by Lobo et al. [
53] and Ayadi et al. [
55], is essential to ensure that adaptation strategies are both scientifically grounded and socially inclusive.
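One simple way to make the uncertainty communication called for above concrete is to summarize the spread of an ensemble of model predictions per grid cell. The sketch below is an assumption-laden illustration, not a method from the reviewed studies: it treats the ensemble standard deviation as the uncertainty measure, uses a normal approximation for the band around the mean, and flags cells with an arbitrary, hypothetical threshold.

```python
from statistics import mean, stdev


def summarize_ensemble(predictions, z=1.96, flag_threshold=0.2):
    """Summarize per-cell flood probabilities across ensemble members.

    predictions: list of equally sized lists, one per ensemble member,
                 each holding one probability per grid cell
    z: multiplier for the approximate band (1.96 under a normal assumption)
    flag_threshold: hypothetical cut-off above which a cell is marked uncertain
    """
    summary = []
    for cell_values in zip(*predictions):
        m = mean(cell_values)
        s = stdev(cell_values)
        summary.append({
            "mean": m,
            "std": s,
            # Band clipped to the valid probability range [0, 1].
            "lower": max(0.0, m - z * s),
            "upper": min(1.0, m + z * s),
            "uncertain": s > flag_threshold,
        })
    return summary


# Three hypothetical ensemble members predicting flood probability
# for three grid cells (illustrative values only).
members = [
    [0.9, 0.5, 0.1],
    [0.9, 0.2, 0.1],
    [0.9, 0.8, 0.1],
]
summary = summarize_ensemble(members)
uncertain_cells = [i for i, cell in enumerate(summary) if cell["uncertain"]]
```

Mapping the standard deviation and the flagged cells alongside the mean prediction gives planners a direct view of where model agreement is weak, which is the kind of error map the paragraph above describes.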
5.5. Human Mobility and Transport Systems: Advancing Privacy-Preserving and Interpretable Graph Models
Graph Neural Networks and spatio-temporal deep learning frameworks have demonstrated strong analytical capacity for traffic forecasting, evacuation planning, and the identification of mobility inequalities across urban systems [
51]. By modeling heterogeneous urban data streams, including mobile phone records, GPS trajectories, transport sensors, and smart card data, these approaches capture relational urban dynamics that static land use or infrastructure datasets cannot reveal. Privacy and interpretability challenges are the primary remaining gaps. The collection and processing of granular mobility data raise fundamental concerns about surveillance, informed consent, and the rights of individuals in communities already subject to heightened monitoring. Future research should advance privacy-preserving GNN architectures, including federated learning and differential privacy mechanisms, that enable population-level mobility analysis without requiring access to individual-level trajectory data. The development of graph-specific explainability methods (capable of attributing predictions to specific nodes, edges, and neighborhood structures) would make GNN-based mobility outputs both interpretable to planning practitioners and legally defensible in contexts where algorithmic decisions have direct consequences for infrastructure investment and service provision.
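To illustrate the differential privacy mechanisms mentioned above in their simplest form, the sketch below applies the Laplace mechanism to aggregated origin-destination trip counts before release. It is not a privacy-preserving GNN; it only shows the underlying noise-addition idea. The function name, zone labels, and counts are hypothetical, and the sensitivity of 1 assumes that each individual contributes at most one trip to the table.

```python
import random


def privatize_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Release origin-destination counts with Laplace noise.

    counts: dict mapping (origin, destination) to an integer trip count
    epsilon: privacy budget; smaller values mean more noise
    sensitivity: maximum change in the table caused by one individual
                 (assumed to be 1 here, i.e., one trip per person)
    seed: optional seed, for reproducible demonstrations only
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    rate = epsilon / sensitivity  # Laplace scale is sensitivity / epsilon
    released = {}
    for pair, count in counts.items():
        # The difference of two i.i.d. exponential draws is Laplace distributed.
        noise = rng.expovariate(rate) - rng.expovariate(rate)
        # Rounding and clipping are post-processing and do not weaken the guarantee.
        released[pair] = max(0, round(count + noise))
    return released


# Hypothetical trip counts between three zones (illustrative only).
od_counts = {("A", "B"): 120, ("A", "C"): 3, ("B", "C"): 45}
released = privatize_counts(od_counts, epsilon=0.5, seed=42)
```

Small counts, such as the flow between zones A and C, are perturbed most in relative terms, which is precisely where re-identification risk tends to be highest.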
5.6. Socio-Economic Indicators and Livability: Validating Proxy Inference Against Community Realities
CNN-based proxy modeling of economic well-being from satellite imagery has demonstrated the potential to generate high-resolution socio-economic approximations in data-scarce contexts. Yeh et al. [
48] successfully estimated spatial patterns of economic well-being across Nigeria, Uganda, and Tanzania using Landsat-derived features linked to asset-based poverty indices, providing a scalable complement to conventional survey-based data collection. GeoAI-driven livability analyses from street-view imagery and aerial data have similarly captured neighborhood-level characteristics, including greenery, building density, and street morphology, that correlate strongly with perceived quality of life [
50]. Indirect inference and ethical concerns are the unresolved challenges in this domain. Reliance on proxy indicators derived from satellite imagery introduces systematic risks: such models reflect what is visible from above rather than what is experienced on the ground, and may reproduce historical biases embedded in the training data. As Sanchez et al. [
46] demonstrated, AI-driven planning tools trained on skewed data disproportionately affect marginalized communities by systematizing discrimination at scale. Future research must therefore develop validation frameworks that cross-check satellite-derived socio-economic proxies against community-generated qualitative data, participatory assessments, and locally validated indicators. This ensures that GeoAI-based livability tools complement, rather than substitute for, direct engagement with the populations whose living conditions they are intended to assess.
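A first quantitative step in such a validation framework could be a rank-agreement check between satellite-derived proxy scores and locally collected indicators. The sketch below uses Spearman's rank correlation for this purpose; the choice of statistic is an assumption made here for illustration, and the neighborhood scores are hypothetical. It complements, and does not replace, the qualitative and participatory validation described above.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four neighborhoods (illustrative only).
satellite_proxy = [0.2, 0.5, 0.9, 0.4]   # model-derived well-being proxy
community_survey = [1, 4, 3, 2]          # locally validated indicator
rho = spearman_rho(satellite_proxy, community_survey)
```

A low or unstable correlation across neighborhoods would signal that the proxy captures visible form rather than lived conditions, and would point to where direct community engagement is most needed.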
5.7. Governance Support Systems: Moving from Pilot Applications to Institutional Embedding
Explainable AI methods and hybrid GeoAI–LLM frameworks have demonstrated improvements in the capacity for evidence-based and participatory planning support [
7,
42,
44]. Spatially explicit XAI approaches and interactive dashboards enable planners to explore model explanations at a geographic scale and examine trade-offs across policy scenarios [
44]. At the same time, Shapley value-based methods provide both global and local insights into feature contributions [
43]. However, limited institutional uptake is the most persistent gap across all governance-related applications. This institutional gap is not primarily a technical problem. As Sanchez et al. [
46] identified, five interconnected governance risks (bias, privacy violations, accountability ambiguity, transparency deficits, and equity erosion) require institutional mechanisms rather than algorithmic fixes: human-in-the-loop oversight, participatory co-design with affected communities, transparent algorithmic documentation, regular bias audits, and accessible data security protocols. Xing and Sieber [
42] further argued that interpretability in GeoAI must be treated as an operational design requirement embedded from the outset, not as a post hoc addition to otherwise high-performing models. Future research should therefore focus on co-designing operational GeoAI workflows with planning agencies, embedding outputs into statutory planning instruments and policy evaluation cycles, and developing governance protocols that make algorithmic outputs accountable to democratic deliberation. The growing potential of hybrid GeoAI and LLM architectures for participatory planning, as illustrated in
Figure 6, represents a promising direction for bridging spatial analytics with stakeholder engagement, provided that LLMs are positioned as interpretive and facilitative components rather than substitutes for spatial inference or democratic decision-making.
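The Shapley value-based attribution mentioned in this subsection can be illustrated with an exact computation over a small feature set. The sketch below enumerates all feature coalitions, which is feasible only for a handful of features; practical XAI libraries rely on approximations. The value function and feature names are hypothetical stand-ins for a trained risk model.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all coalitions.

    value_fn: callable taking a frozenset of feature names and
              returning the model output for that coalition
    features: list of feature names
    """
    n = len(features)
    contributions = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * marginal
        contributions[feature] = total
    return contributions


def toy_risk_model(coalition):
    """Hypothetical flood-risk score with one interaction term."""
    score = 0.0
    if "imperviousness" in coalition:
        score += 0.3
    if "elevation" in coalition:
        score += 0.2
    if "imperviousness" in coalition and "elevation" in coalition:
        score += 0.1  # interaction shared equally between the two features
    return score


features = ["imperviousness", "elevation", "green_cover"]
phi = shapley_values(toy_risk_model, features)
```

The attributions sum to the difference between the full-model output and the empty-coalition output, which is the property that makes Shapley values suitable for both local and global explanations.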
An important frontier for future GeoAI research is the integration of spatial learning models with emerging foundation and language models. Rather than replacing spatial inference methods, such hybrid frameworks may facilitate more effective integration of spatial analytics, planning documents, and participatory decision-making processes. Developing transparent, uncertainty-aware interfaces between GeoAI outputs and language-based reasoning systems represents a promising direction for advancing explainable urban intelligence.
The cumulative evolution of spatial knowledge systems has advanced GeoAI, yet emerging spatially oriented large language models (LLMs) remain fundamentally grounded in the principles of cartography and topography (
Figure 7). While GISs introduced computational handling of spatial data and AI enabled pattern recognition and predictive modeling, these advances do not replace foundational spatial concepts such as scale, projection, spatial reference, and terrain representation. Instead, recent developments indicate a layered paradigm in which GeoAI performs spatial inference on georeferenced data, and LLMs operate as interpretive, integrative, and communicative interfaces that support model orchestration, knowledge synthesis, and decision translation. The figure therefore reflects both the present state of the field (characterized by hybrid GIS–GeoAI workflows) and its future trajectory, in which spatial intelligence is progressively augmented by language-based systems while remaining epistemically dependent on classical geographic knowledge for accuracy, validity, and policy relevance.
Across all seven thematic areas, a critical cross-cutting priority is the development of bidirectional knowledge flows between local and global GeoAI research. Models developed using globally available datasets and generalized architectures must be iteratively refined through insights from local applications in data-scarce, rapidly urbanizing regions, thereby avoiding the uncritical transfer of Global North-centric assumptions [
54]. In data-rich cities of the Global North, future GeoAI research can focus on advancing interpretability, communicating uncertainty, and aligning model outputs with statutory planning frameworks. In cities of the Global South, where informal settlements, rapid urban expansion, fragmented governance, and heightened climate exposure prevail, the priority is lightweight, scalable, and transferable GeoAI models that integrate remote sensing data with proxy socio-economic indicators and remain operational under severe data and infrastructure constraints [
52]. Institutionalizing GeoAI as a routine component of urban planning in both contexts requires sustained investment in interdisciplinary education, open-source tools, and knowledge transfer mechanisms that build local technical capacity within planning institutions rather than consolidating expertise among external technology providers.
6. Conclusions
While technological advances are substantial, their societal value depends on addressing data inequality, ethical concerns, and governance challenges. GeoAI has significant potential to enhance climate resilience, promote social inclusion, and support fair and livable cities, provided it is embedded within transparent, inclusive, and policy-relevant frameworks. GeoAI has moved beyond an experimental analytical approach to become a central component of contemporary urban science. Its capacity to integrate heterogeneous spatial data, model complex interactions, and support scenario-based planning makes it well suited to addressing the intertwined challenges of climate change, inequality, and rapid urbanization.
The effectiveness of GeoAI in supporting resilient, inclusive, and livable cities ultimately depends on its embedding within institutional frameworks. In the Global South, GeoAI offers unprecedented opportunities to address data scarcity and support evidence-based planning, but only if issues of representation, bias, and governance are addressed. In the Global North, GeoAI can enhance the sophistication of climate adaptation and sustainability strategies, provided that equity considerations remain central.
Future research must therefore adopt an integrative perspective that aligns methodological innovation with social responsibility and policy relevance. By embedding GeoAI within transparent, inclusive, and climate-resilient planning frameworks, urban science can contribute not only to better cities but to more just and sustainable urban futures.
The most critical gap identified through this review is not algorithmic but institutional. Current GeoAI research disproportionately invests in improving model accuracy by incremental margins, while the question of how analytical outputs actually enter planning workflows, budgeting decisions, and regulatory procedures receives far less attention. Experience with GIS-based spatial analysis in Romanian and European urban and environmental contexts suggests that even well-established geospatial methods struggle to gain traction in municipal planning processes when institutional capacity, data governance, and interagency coordination remain weak. Advanced deep learning models will face the same obstacles at a larger scale unless future research addresses the organizational and procedural conditions under which GeoAI can be operationalized.
A further implication of this review is the need to move from experimental or project-based GeoAI applications to institutionalized, operational urban intelligence systems. Many successful GeoAI use cases remain confined to pilot studies, research projects, or isolated municipal initiatives, limiting their long-term impact. Future progress will depend on the capacity of urban institutions to integrate GeoAI tools into routine planning cycles, monitoring systems, and policy evaluation frameworks. This includes aligning GeoAI outputs with administrative procedures, budgeting mechanisms, and legal mandates to ensure continuity beyond individual projects or political cycles. Without such institutional embedding, GeoAI risks remaining an advanced analytical layer disconnected from actual decision-making.
Another critical dimension relates to capacity building and asymmetries in technical expertise. The effective use of GeoAI requires not only data and algorithms, but also skilled professionals capable of interpreting outputs, communicating uncertainty, and translating analytical results into policy insights. In many Global South contexts, limited technical capacity within planning institutions may constrain the uptake of GeoAI, even when data and tools are available. Conversely, Global North cities face challenges stemming from organizational silos and an over-reliance on external technology providers. Addressing these issues calls for sustained investment in interdisciplinary education, open-source tools, and knowledge-transfer mechanisms that empower local institutions.
A final consideration is the importance of feedback loops between local and global urban knowledge. While GeoAI models are often developed using globally available datasets and generalized architectures, urban realities remain highly context-specific. Future research should therefore promote iterative learning processes in which insights from local applications in data-scarce, rapidly urbanizing regions inform the refinement of global models and methodologies. Such bidirectional knowledge flows can help avoid the uncritical transfer of Global North-centric assumptions and foster more context-aware, adaptable GeoAI frameworks.
Recent advances in GeoAI and the emerging incorporation of spatially oriented large language models remain epistemically dependent on the foundational principles of cartography, topography, and GISs, which provide the spatial reference, scale consistency, and representational logic required for reliable analysis. While GeoAI enhances the capacity to model complex spatial patterns and generate predictive insights, and LLMs contribute to the interpretation, integration, and communication of analytical results, neither can operate meaningfully in isolation from geographic theory and domain expertise. The future of the field lies in hybrid human–AI systems, in which classical geographic knowledge anchors advanced computational intelligence within transparent, context-aware, and policy-relevant decision-making frameworks.
As GeoAI becomes embedded in urban governance, it is important to reflect on its long-term societal and political implications. The growing role of algorithmic systems in shaping spatial priorities, risk classifications, and investment decisions raises fundamental questions about power, accountability, and democratic control. Urban science has a responsibility to critically engage with these dynamics, ensuring that GeoAI acts as an instrument of collective benefit rather than as a driver of technocratic governance. Embedding reflexivity, transparency, and public oversight into GeoAI-enabled planning processes will be essential for maintaining trust and legitimacy as cities navigate the complex challenges of climate change, inequality, and sustainable development.