Article

Towards Trustworthy Urban Land Use Classification: A Synergistic Fusion of Deep Learning and Explainable Machine Learning with a Nanning Case Study

1 School of Resources, Environment and Materials, Guangxi University, Nanning 530004, China
2 Key Laboratory of Environmental Protection (Guangxi University), Education Department of Guangxi Zhuang Autonomous Region, Nanning 530004, China
* Author to whom correspondence should be addressed.
Land 2026, 15(1), 158; https://doi.org/10.3390/land15010158
Submission received: 4 December 2025 / Revised: 10 January 2026 / Accepted: 12 January 2026 / Published: 13 January 2026

Abstract

While artificial intelligence (AI) has advanced urban land use classification, its application in high-stakes decision making, such as urban planning, demands not only high accuracy but also transparency and interpretability. This study evaluates the potential of Google Satellite Embeddings (GSE), a ready-to-use dataset of AI-generated numerical features that capture deep land cover characteristics, for land use classification in the central urban area of Nanning in 2022. A synergistic analytical framework was constructed by integrating the 64 high-dimensional features of GSE data with the feature attribution of Shapley Additive Explanations (SHAP), merging deep learning features with explainable machine learning. The results demonstrate that the XGBoost model (OA = 85.00% ± 2.24%) significantly outperformed the Random Forest (RF) model (OA = 81.87% ± 1.72%). Key abstract features were successfully interpreted as comprehensible geographic semantics, with A51 and A36 corresponding to built-up intensity and vegetation cover, respectively. Moreover, XGBoost enabled more refined decisions than RF, owing to its superior ability to distinguish between functionally distinct classes that have similar physical appearances. This framework provides a scalable and transferable analytical solution for the challenges of feature limitations and insufficient model transparency in urban land use classification.

1. Introduction

Driven by global urbanization and the rise in smart cities, the practice of governing cities, which encompasses urban planning and resource management, is transitioning from traditional, static approaches to data-driven, dynamic, and fine-grained paradigms [1,2,3]. In this context, accurate and continuously updated urban land use information has become the bedrock of scientific decision making and resource allocation [4]. This foundational data underpins critical applications ranging from environmental management, such as modeling urban heat islands, to strategic planning for public infrastructure and services. While artificial intelligence (AI) has significantly enhanced the accuracy of land use classification, the increasing complexity accompanying these performance gains has rendered model decision making opaque, introducing trust deficits in practical applications [5,6]. For high-stakes domains such as urban planning, a model must not only deliver precise results but also articulate the logic and evidence behind its decisions [7,8]. Therefore, enhancing the interpretability of high-performance AI models, especially complex deep learning and ensemble algorithms, is a critical prerequisite for their trustworthy application in planning.
To accurately map complex urban landscapes, current research has increasingly integrated multi-source data and advanced algorithms [6]. On one hand, the fusion of traditional remote sensing features like spectral and texture information with social sensing data from sources such as OpenStreetMap (OSM) and Points of Interest (POI) has shifted the unit of analysis from pixels to urban parcels, significantly improving semantic representation [9,10,11]. This shift from pixels to parcels is a crucial methodological decision grounded in the distinction between land cover and land use. While pixel-based and object-based image analysis excel at classifying physical land cover (surfaces such as asphalt, grass, and concrete), they are often less suited for identifying the socio-economic land use, which involves distinguishing between functions like commercial and residential activities [10,12]. The urban parcel, defined by roads and property boundaries, represents the fundamental semantic unit at which urban functions are organized and managed [13]. Therefore, adopting a parcel-based approach aligns the unit of analysis with the actual phenomenon being studied. The consensus in the literature is that aggregating image features within these parcels creates a more holistic and stable representation of function, which mitigates the salt-and-pepper noise inherent in pixel-based methods and yields spatially coherent outputs directly usable for urban planners [9,11,14]. On the other hand, AI models have been widely adopted to boost the accuracy and automation of land use classification [15,16]. Traditional machine learning models like Random Forest (RF) offer a balance between computational cost and performance [17]. Deep learning, despite its computational demands, has proven effective and robust in identifying urban land use patterns by automatically learning deep semantic features from multi-source data [18,19]. 
However, despite the advances in fine-grained urban understanding driven by the synergy of multi-source data and AI, existing methods still rely heavily on manual feature engineering and expert knowledge, and struggle to comprehensively capture the fine-grained spatial structures of complex urban scenes [20,21]. Moreover, relying entirely on deep learning models trained from scratch often leads to the dual problems of high computational costs and poor generalizability [22,23].
While deep learning methods excel at feature extraction, their internal representations are highly abstract, creating an opacity that limits trustworthy application in high-stakes domains [24]. In response, Explainable Artificial Intelligence (XAI) has emerged to demystify these black boxes, with tools like SHAP enabling the quantification of each feature’s contribution to a prediction [25,26,27]. However, a critical limitation persists in current XAI applications for urban science: their outputs often remain confined to the level of machine logic. They excel at pinpointing what data the model prioritizes but fall short of articulating why in a geographically meaningful way. Pioneering studies exemplify this challenge. Methods like Grad-CAM++ and explainable boosting machines have been successfully used to reveal which image regions and POI types were most influential in a land use model [28]. In a similar vein, feature attribution and counterfactual analysis have been leveraged on graph-based models to show how mobility data contributed to predictions [29]. Despite their value, these explanations stop short of providing a true geographic narrative. They tell a planner that certain pixels or POI counts were weighted heavily, but not that the decision was based on recognizing a pattern of high-density built form, complex vertical structure, or a specific texture of mixed land cover—which is the language of urban planning. Thus, a crucial gap exists in bridging this interpretive divide: moving from identifying important abstract features to decoding the tangible geographic concepts they represent.
The recent emergence of the GSE data offers a novel approach for addressing the shallow feature problem in remote sensing. Generated by Google’s AlphaEarth Foundation (AEF), it transforms global, multi-source satellite imagery into a high-dimensional feature vector through deep learning [30]. These pre-encoded features automatically capture the deep spectral-spatial structures and semantic patterns of land cover, holding the potential to fundamentally bypass the reliance on manually engineered features and provide an unprecedented feature foundation for fine-grained, parcel-level classification [31].
This presents an opportunity to forge a third way that navigates the trade-off between traditional methods and end-to-end deep learning. While traditional approaches using manually engineered features are interpretable but often limited by a shallow feature bottleneck, end-to-end deep learning models offer high accuracy but are computationally expensive black boxes. The proposed framework seeks to synergize these approaches by leveraging the pre-encoded deep features from GSE to achieve high performance, while utilizing explainable machine learning to maintain efficiency and ensure the transparency required for trustworthy application in urban planning.
Therefore, this study applies this synergistic framework to the land use classification of the central urban area of Nanning. It aims to achieve two core objectives.
  • To evaluate the performance of machine learning models using GSE data for urban land use classification.
  • To enhance model interpretability by decoding the geographic semantics of the features and dissecting the model’s decision-making mechanisms.
This study provides a synergistic analytical framework that integrates deep learning features with explainable machine learning, thereby paving the way for more trustworthy and interpretable approaches in smart city planning.

2. Materials and Methods

2.1. Study Area

This study focuses on the central urban area of Nanning, the capital city of the Guangxi Zhuang Autonomous Region in Southern China. It consists of the contiguous urban built-up region inside the Nanning Ring Expressway and the Nanning International Railway Port area, with a total area of approximately 665 km2 (Figure 1). This area spans several major administrative districts, including Xixiangtang, Qingxiu, Xingning, Jiangnan, Yongning, and Liangqing, which represent diverse stages of urban development and functional concentrations.
As a frontier city for China’s open cooperation with the Association of Southeast Asian Nations (ASEAN), Nanning’s development serves as a representative microcosm of the rapid urbanization processes occurring throughout southern China [32]. The central urban district, which hosts the majority of the city’s population and economic activities, has evolved into an urban environment characterized by high functional complexity and significant landscape heterogeneity. A key challenge arises from this complexity: land parcels with distinct functions often exhibit similar physical characteristics in remote sensing imagery [33]. This spectral ambiguity undermines both the accuracy of land use classification and the transparency of model decision making [34,35].

2.2. Data Source and Preprocessing

2.2.1. World Imagery

Esri World Imagery is a global basemap service from Esri that integrates multi-source satellite and aerial imagery with resolutions up to 0.3 m and offers multiple historical versions; it can be directly loaded and utilized within the ArcGIS 10.8 environment. For this study, the 2022 Esri World Imagery was employed to support the visual interpretation of land parcel functions.

2.2.2. OpenStreetMap (OSM) Network Data

As a collaborative project providing free, editable global maps, OSM is a well-established and effective data source for capturing urban functional spatial patterns [36,37]. The 2022 dataset for the central urban area of Nanning was utilized in this study. First, the data was manually verified to eliminate redundant information. Subsequently, the cleaned roads were classified into a hierarchical system (Table 1), which was developed by integrating categories from the Ministry of Housing and Urban-Rural Development (MoHURD) and methodologies from previous research [38,39], and adapting them to reflect the specific local context of the study area. This classified hierarchical network is crucial as it serves as the primary structural framework for delineating urban parcels, which are the fundamental units of our analysis. The detailed methodology for how these road levels are utilized in the parcel generation process is described in Section 2.3.1.
To obtain an accurate river network, which serves as a crucial natural barrier for parcel delineation, a hybrid approach was adopted. The river network data from OSM was used as a baseline. Then, to ensure the highest level of positional accuracy and alignment with the current landscape, a manual visual correction of the OSM data was performed against the 2022 Esri World Imagery.

2.2.3. Points of Interest (POI) Data

POI effectively captures the spatial distribution of urban functions, offering crucial semantic information for land use classification [40,41]. For this study, 2022 POI data for the central urban area of Nanning were acquired from the Amap Web Service Application Programming Interface (API) using Python scripts.
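POI retrieval of this kind is typically paginated over an area of interest. The sketch below, using only the standard library, shows how such a request URL might be assembled; the endpoint path and parameter names are assumptions modeled on common Amap Web Service conventions and should be checked against the official API documentation before use.

```python
from urllib.parse import urlencode

# Hypothetical base endpoint for Amap polygon-bounded place search;
# verify the path and parameters against the current Amap API docs.
BASE_URL = "https://restapi.amap.com/v3/place/polygon"

def build_poi_request(api_key: str, polygon: str, poi_type: str, page: int = 1) -> str:
    """Assemble one page of a POI query URL for a polygonal search area."""
    params = {
        "key": api_key,          # developer key issued by Amap
        "polygon": polygon,      # "lon1,lat1;lon2,lat2;..." boundary string
        "types": poi_type,       # Amap POI type code (illustrative)
        "offset": 25,            # results per page
        "page": page,
        "output": "json",
    }
    return f"{BASE_URL}?{urlencode(params)}"

# Toy call; the coordinates and type code are placeholders, not the study's values.
url = build_poi_request("YOUR_KEY", "108.2,22.9;108.4,22.7", "141200")
print(url)
```

In practice such a function would be called in a loop over `page` until the API reports no further results, with the JSON responses parsed and written to a local file.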
Drawing upon the urban land use classification standards from the Ministry of Housing and Urban-Rural Development (MoHURD) and methodologies from previous research [9,42,43], a simplified, four-category functional area system was developed. Within this framework, commercial and industrial lands, which share similar economic functions, were consolidated into a single class.
This consolidation is methodologically justified by two key factors specific to this study. First, our focus on the central urban area means that the observed industrial parcels are predominantly light industry, logistics centers, and wholesale markets, which are functionally and physically distinct from the heavy industry typically located on the urban periphery. From a remote sensing perspective, these central urban industrial areas share similar physical signatures with old city commercial areas, such as low, metal-roofed buildings. Second, this simplification allows the model to learn a more robust, unified signature for areas of intense economic activity based on their shared physical attributes captured by GSE, which is crucial for the primary goal of this study: evaluating the interpretability of this new data source.
The final system comprises four categories: institution, open space, business, and residence (Table 2). The detailed classification of POI types into these categories is provided in Supplementary Table S1.

2.2.4. Google Satellite Embedding (GSE) Data

GSE data served as the primary feature source for our machine learning models. It is essential to clarify that GSE is not a classification product but a high-dimensional geospatial embedding; its bands are learned feature dimensions rather than traditional spectral bands. It is generated by Google’s AlphaEarth Foundations (AEF) [30], an embedding field model based on a Transformer architecture developed by Google DeepMind.
The AEF model is trained using self-supervised learning, specifically employing contrastive learning techniques. This allows it to learn universal geospatial representations from massive, unlabeled Earth observation data. The process learns and encodes a year’s worth of multi-modal, time-series satellite imagery into a highly condensed semantic space. The result is a 64-dimensional (A00–A63) vector for each 10 m pixel that represents deep semantic features rather than direct physical signals. This encoding process aims to maximize discriminative information useful for downstream tasks while filtering out noise and redundancy [31]. This vector comprehensively reflects the spectral, structural, and contextual properties of land cover, capturing deep patterns that transcend singular physical responses and thereby overcoming the shallow feature bottleneck of traditional remote sensing.
A key strength of the AEF model is its ability to process heterogeneous data. Its architecture includes source-specific encoders for different data types (e.g., optical, radar). Through joint training on a global scale, the model learns to intelligently fuse information, meaning each dimension of a GSE vector can encapsulate complex patterns from multiple sensors. While trained globally, these embeddings provide fine-grained features applicable at local scales, making them suitable for our Nanning study area.
The 2022 annual, 10 m resolution product was acquired from the GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL dataset on the Google Earth Engine (GEE) platform. These 64 dimensions constitute an inseparable feature vector and were therefore all utilized for model training. As a ready-to-use dataset, it has been internally processed to handle cloud cover and data gaps. Furthermore, its vectors are of unit length, obviating the need for normalization. This significantly simplifies the data preprocessing workflow and makes the dataset inherently compatible with tree-based classifiers such as XGBoost and RF.

2.2.5. Sentinel-2 Multispectral Imagery

Sentinel-2 multispectral imagery was selected for its high spatial resolution (10 m) and rich spectral information. Its bands include the essential red, near-infrared (NIR), and short-wave infrared (SWIR) wavelengths required for calculating classic remote sensing indices such as the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Built-up Index (NDBI) [44,45].
Within the GEE platform, the COPERNICUS/S2_SR_HARMONIZED dataset was accessed. An annual mean composite image for the central urban area of Nanning was synthesized to create a data source for subsequent spectral and texture feature extraction. This composite was generated using all available 2022 images with a cloud cover of less than 40%.
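The two classic indices named above follow directly from their band definitions: NDVI = (NIR − Red)/(NIR + Red) and NDBI = (SWIR − NIR)/(SWIR + NIR). A minimal NumPy sketch, using toy reflectance values rather than actual Sentinel-2 pixels:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red + 1e-10)  # epsilon avoids divide-by-zero

def ndbi(swir: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalized Difference Built-up Index: (SWIR - NIR) / (SWIR + NIR)."""
    return (swir - nir) / (swir + nir + 1e-10)

# Toy reflectances (Sentinel-2 B4 = red, B8 = NIR, B11 = SWIR):
# first pixel vegetation-like, second pixel built-up-like.
red  = np.array([0.05, 0.20])
nir  = np.array([0.45, 0.25])
swir = np.array([0.15, 0.35])

print(ndvi(nir, red))   # high for the vegetated pixel, low for built-up
print(ndbi(swir, nir))  # negative for vegetation, positive for built-up
```

On the GEE platform the same arithmetic is applied per pixel to the annual composite via `normalizedDifference`-style band math before aggregating to parcels.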

2.3. Methods

This study proposes a synergistic analytical framework that integrates deep learning features with explainable machine learning (Figure 2). This framework consists of three core steps.
  • First, a high-precision sample library was constructed by fusing multi-source geospatial data with expert visual interpretation within ArcMap 10.8.
  • Second, two parallel feature sets were formulated on the GEE platform: the high-dimensional deep embedding features for model training, and a set of physical reference features for attribution.
  • Finally, the modeling and analysis pipeline was implemented in Python 3.9. We trained and evaluated machine learning models using the scikit-learn (for RF) and xgboost libraries, which were then deconstructed using the shap library to analyze their internal decision-making mechanisms. This involved quantifying the contributions of key abstract features and systematically correlating them with the physical reference features using the pandas and scipy libraries.

2.3.1. Semantically Constrained Urban Parcel Generation

OSM-guided physical parcel delineation. The central urban area of Nanning was first segmented into morphologically intact and non-overlapping initial parcels by adopting an established method for automated parcel characterization [37]. This topological segmentation approach aligns with the framework of Essential Urban Land Use Categories (EULUC), where road networks serve as the fundamental skeleton of urban functional organization [9,11]. The process was executed within the ArcGIS platform by applying multi-level buffer analysis to the pre-processed 2022 OSM network and then overlaying natural barriers such as rivers.
POI-based semantic labeling and creation of a candidate sample pool. Next, the initial physical parcels were spatially associated with four major categories of POI data. The functional composition of each parcel was quantified by calculating the density of each of the four major POI categories. A preliminary label was assigned only to parcels where a single function was clearly dominant, defined as constituting over 70% of the total POI count within that parcel. This quantitative filtering strategy is consistent with recent frequency-density methodologies, which advocate for high thresholds to mitigate the inherent noise in social sensing data and ensure the functional purity of training samples [39,46]. This process was designed to focus the subsequent manual verification effort precisely on those parcels where POI signals were strongest, enabling efficient and targeted expert correction.
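The dominance rule above is simple to state programmatically: a parcel receives a preliminary label only when one POI category exceeds 70% of its POI count. A minimal sketch of this filter, with toy category lists in place of real parcel data:

```python
from collections import Counter
from typing import List, Optional

DOMINANCE_THRESHOLD = 0.70  # per the sampling protocol described above

def preliminary_label(poi_categories: List[str]) -> Optional[str]:
    """Assign a label only when a single POI category exceeds 70% of
    the parcel's total POI count; otherwise leave the parcel unlabeled."""
    if not poi_categories:
        return None
    counts = Counter(poi_categories)
    category, top = counts.most_common(1)[0]
    if top / len(poi_categories) > DOMINANCE_THRESHOLD:
        return category
    return None

print(preliminary_label(["residence"] * 8 + ["business"] * 2))  # "residence"
print(preliminary_label(["residence"] * 5 + ["business"] * 5))  # None: no dominant class
```

Parcels returning `None` are excluded from the candidate pool, concentrating the subsequent expert verification on parcels with the strongest POI signal.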
Visual Interpretation with Multi-Source Information and Final Sample Verification. To ensure definitive label accuracy and mitigate any remaining POI-induced semantic bias, a rigorous, multi-stage verification process was implemented on the candidate sample pool. First, a multi-source interpretation protocol was established. For each candidate parcel, interpreters did not rely solely on the high-resolution imagery but synergistically used it with the specific POI name labels within that parcel. This synthesis directly addresses the challenge of spectral confusion in complex urban environments, where functionally distinct zones often exhibit nearly identical physical morphologies [47]. For example, while both business and institution parcels might appear as high-rise buildings on imagery, reviewing the POI names (e.g., “Nanning No. 2 High School,” “First Affiliated Hospital of Guangxi Medical University”) allowed for unambiguous classification as institution, in strict accordance with the Standard for Urban Land Use Classification and Planning of Construction Land [42]. This approach effectively leveraged POI data as a semantic clue rather than a mere counter, directly addressing the risk of misclassification based on visual morphology alone. By integrating semantic labels with physical textures, this protocol adheres to the multi-view fusion paradigm recommended for fine-grained urban classification [33]. Second, to guarantee label consistency and quality in the absence of multiple annotators, a systematic iterative review was employed. The expert first conducted an initial round of labeling for the entire candidate set. After a deliberate time interval to minimize cognitive bias, a second, independent review round was performed. During this second pass, the expert re-evaluated each label against the established protocol, paying special attention to potentially ambiguous cases. 
Any parcels where the initial label seemed questionable upon re-inspection were subjected to a final, more intensive investigation before a definitive label was assigned. Parcels that remained ambiguous even after this multi-step scrutiny were discarded from the final sample set.
Ultimately, a final dataset of 800 unambiguously labeled, high-precision samples was curated, comprising 200 for each of the four functional categories (Figure 3). The sampling strategy aimed for spatial representativeness rather than absolute spatial uniformity. Urban functional areas are inherently clustered, and a strictly uniform sampling might fail to capture the diversity of these real-world patterns. Therefore, this approach focused on selecting samples for each class from multiple, geographically distinct locations across the study area. For example, samples for the business class were drawn from the old city center, the new CBD, and several secondary commercial hubs. This strategy ensures the model captures the high intra-class variability typical of heterogeneous urban landscapes [39]. This carefully constructed dataset serves as the foundational ground truth for all subsequent analyses.

2.3.2. Feature Engineering

Generation of high-dimensional input features. The core input features for the models were generated by aggregating the pixel-level GSE data. Specifically, the mean value of each of the 64 GSE feature bands was calculated on the GEE platform for each parcel. Aggregating these high-resolution raster features at the parcel level is not a loss of detail; rather, it creates a robust, composite feature signature that holistically represents the typical mixture of surfaces (buildings, vegetation, pavement) associated with a specific function. This process resulted in a single 64-dimensional feature vector per parcel, serving as a comprehensive, high-level representation of its unique deep spectral-spatial patterns.
Extraction of physical reference features. To facilitate model interpretation, a white-box feature set composed of traditional remote sensing characteristics was extracted from the annual Sentinel-2 composite imagery (Table 3). This set is designed to serve as a bridge, connecting the abstract deep learning features to established geographical knowledge. Specifically, it comprises two components:
  • First, a series of spectral features were calculated to quantitatively assess the vegetation, built-up, and water status of the land surface [48,49].
  • Second, texture features, derived from the Gray-Level Co-occurrence Matrix (GLCM), were computed to describe the spatial structure and heterogeneity of the land surface, complementing the spectral signatures which are often insufficient in heterogeneous urban environments [50]. However, utilizing the full set of GLCM features often leads to high data redundancy and computational inefficiency due to strong inter-correlations [51]. Therefore, drawing upon established feature selection strategies for remote sensing classification, a specific subset of metrics was chosen to form a representative and minimally redundant set [39,52]. These metrics targeted three complementary textural dimensions critical for distinguishing urban functions: (1) Spatial Order/Disorder (ASM, Entropy) to characterize the complexity of building arrangements; (2) Local Contrast (Contrast, Dissimilarity) to capture the edge intensity typical of built-up areas; and (3) Homogeneity (IDM) to identify uniform surfaces such as open spaces and water bodies.
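The five GLCM metrics listed above all derive from one normalized co-occurrence matrix. The sketch below builds a horizontal (0°, distance 1) GLCM for a tiny pre-quantized toy image and computes the same metrics; it is an illustrative re-implementation, not the GEE `glcmTexture` routine used in the study.

```python
import numpy as np

def glcm_metrics(img: np.ndarray, levels: int) -> dict:
    """Build a horizontally adjacent (0 deg, distance 1) gray-level
    co-occurrence matrix and derive the five texture metrics used here.
    `img` must already be quantized to integer gray levels [0, levels)."""
    glcm = np.zeros((levels, levels))
    for row in img:
        for a, b in zip(row[:-1], row[1:]):
            glcm[a, b] += 1
    p = glcm / glcm.sum()                          # normalize to probabilities
    i, j = np.indices(p.shape)
    nz = p[p > 0]                                  # avoid log(0) in entropy
    return {
        "ASM": float((p ** 2).sum()),              # angular second moment
        "Entropy": float(-(nz * np.log2(nz)).sum()),
        "Contrast": float((p * (i - j) ** 2).sum()),
        "Dissimilarity": float((p * np.abs(i - j)).sum()),
        "IDM": float((p / (1 + (i - j) ** 2)).sum()),  # homogeneity
    }

uniform = np.zeros((4, 4), dtype=int)              # e.g. a water body
checker = np.indices((4, 4)).sum(axis=0) % 2       # maximally contrasting texture
print(glcm_metrics(uniform, 2)["Contrast"])        # 0.0: perfectly homogeneous
print(glcm_metrics(checker, 2)["Contrast"])        # 1.0: every adjacent pair differs
```

The two toy surfaces bracket the textural dimensions described above: the uniform patch scores maximal IDM and zero Contrast, while the checkerboard scores the opposite, which is exactly the separation that distinguishes open spaces from dense built-up fabric.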

2.3.3. Machine Learning Modeling and Explainability Analysis

Model training and performance evaluation. Two machine learning algorithms, RF and XGBoost, were selected for comparison. These algorithms represent the two dominant paradigms in ensemble learning—Bagging and Boosting, respectively—and are widely utilized for their robustness in modeling complex, non-linear relationships [53,54].
To mitigate the potential risk of overfitting given the high-dimensional feature space (64 features) relative to the sample size (800 parcels), several strategies were implemented to ensure model robustness. First, a 5-fold cross-validation strategy was used for both model training and evaluation, providing a more reliable estimate of generalization performance [55]. Second, hyperparameter tuning was performed using a grid search within the cross-validation loop to identify the optimal parameters that control model complexity (e.g., tree depth, learning rate). Third, an early stopping mechanism was employed for the XGBoost model specifically. This technique halts the training process if performance on a validation set does not improve for a specified number of iterations, effectively preventing the model from fitting to noise in the training data. Model performance was assessed using Overall Accuracy (OA), the Kappa Coefficient, Producer’s Accuracy (PA), User’s Accuracy (UA), and the Macro-F1.
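All five evaluation metrics follow from the confusion matrix. The sketch below derives OA, the Kappa Coefficient, PA, UA, and Macro-F1 from a toy two-class matrix (the values are illustrative, not the study's results):

```python
import numpy as np

def evaluate(cm: np.ndarray) -> dict:
    """Derive the reported metrics from a confusion matrix whose rows are
    reference classes and whose columns are predicted classes."""
    n = cm.sum()
    oa = np.trace(cm) / n                          # overall accuracy
    # Cohen's Kappa: agreement corrected for chance agreement p_e
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    kappa = (oa - pe) / (1 - pe)
    pa = np.diag(cm) / cm.sum(axis=1)              # producer's accuracy (recall)
    ua = np.diag(cm) / cm.sum(axis=0)              # user's accuracy (precision)
    f1 = 2 * pa * ua / (pa + ua)                   # per-class F1
    return {"OA": oa, "Kappa": kappa, "PA": pa, "UA": ua, "MacroF1": f1.mean()}

# Toy 2-class confusion matrix (rows = reference, columns = predicted)
cm = np.array([[45, 5],
               [10, 40]])
m = evaluate(cm)
print(round(m["OA"], 3), round(m["Kappa"], 3))  # 0.85 0.7
```

In the actual pipeline these quantities are computed per cross-validation fold and reported as mean ± standard deviation across the five folds.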
Quantifying feature contributions to model decisions. The TreeSHAP algorithm, an efficient variant optimized for tree-based models, was utilized to quantify feature contributions at both global and local levels.
  • At the global level, the most influential features were identified by calculating the mean absolute SHAP value for each feature across all samples. To ensure stable results, these importance values were averaged across the five folds of the cross-validation, and the final ranking is presented as a percentage contribution.
  • At the local level, stratified SHAP beeswarm plots were employed for visual analysis. These plots intuitively display how a single feature contributes to each individual sample’s prediction, while simultaneously using color to reveal the patterns between the feature’s original value (high or low) and the direction and magnitude of its contribution [56,57].
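The global-level ranking described above reduces to averaging absolute SHAP values per feature and normalizing to percentages. A minimal sketch on a toy SHAP matrix (the feature names echo the paper's, but the values are illustrative, not model outputs):

```python
import numpy as np

def global_importance(shap_values: np.ndarray, names: list) -> list:
    """Rank features by mean absolute SHAP value, expressed as a
    percentage contribution (the global-level summary used here)."""
    mean_abs = np.abs(shap_values).mean(axis=0)    # one score per feature
    pct = 100 * mean_abs / mean_abs.sum()          # normalize to percentages
    order = np.argsort(pct)[::-1]                  # descending importance
    return [(names[i], float(pct[i])) for i in order]

# Toy SHAP matrix: 4 samples x 3 features (signs cancel; magnitudes matter)
shap_vals = np.array([[ 0.4, -0.1, 0.0],
                      [-0.6,  0.1, 0.0],
                      [ 0.5, -0.2, 0.1],
                      [-0.5,  0.2, 0.1]])
ranking = global_importance(shap_vals, ["A51", "A36", "A40"])
print(ranking[0][0])  # "A51": largest mean |SHAP|, hence top contributor
```

Taking the absolute value before averaging is the key step: a feature that pushes predictions strongly in both directions still registers as important, which raw signed means would hide.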
Attributing Physical Meaning to Deep Features. A stratified Spearman correlation analysis was performed to investigate the intrinsic meaning of the abstract deep learning features. This analysis was conducted between the key features and our white-box set of physical reference features. The stratified approach, which calculates correlations separately within each land use category, was designed to uncover how the association patterns between a single deep feature and the physical metrics vary across different geographical contexts.
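The stratification simply means computing one rank correlation per class subset. The sketch below uses a self-contained rank-based implementation (in practice `scipy.stats.spearmanr` would serve the same purpose) on synthetic data where a deep feature tracks a physical index strongly in one class and weakly in another:

```python
import numpy as np

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation (no ties assumed): Pearson on ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

def stratified_spearman(deep: np.ndarray, physical: np.ndarray,
                        labels: np.ndarray) -> dict:
    """Correlate one deep feature with one physical index separately
    within each land use class, mirroring the stratified analysis."""
    return {c: spearman(deep[labels == c], physical[labels == c])
            for c in np.unique(labels)}

rng = np.random.default_rng(1)
labels = np.repeat(np.array(["business", "residence"]), 50)
a51 = rng.normal(size=100)                      # toy stand-in for feature A51
# Toy NDBI: tracks "A51" perfectly in residence, only weakly in business
ndbi = np.where(labels == "residence", a51, 0.2 * a51 + rng.normal(size=100))
print(stratified_spearman(a51, ndbi, labels))   # residence near 1, business weaker
```

A class-dependent divergence like this toy one is precisely the signal the stratified analysis is designed to surface: a single deep feature can encode different physical relationships in different functional contexts.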

3. Results

3.1. Performance Evaluation of Embedding-Based Models

3.1.1. Quantitative Evaluation

The performance of the XGBoost and RF models, trained on the high-dimensional GSE features, was evaluated using a 5-fold cross-validation strategy. The evaluation revealed that the XGBoost model demonstrated superior overall performance compared to the RF model (Table 4). The OA of XGBoost reached 85.00% ± 2.24%, marking a 3.13 percentage point improvement over RF’s accuracy of 81.87% ± 1.72%. Furthermore, its superiority in maintaining a balance between precision and recall was evidenced by a higher Macro-F1 of 84.81% ± 2.32%, compared to RF’s 81.72% ± 1.73%. Similarly, XGBoost achieved a higher Kappa Coefficient of 0.8000 ± 0.0298, compared to RF’s 0.7583 ± 0.0230, indicating a stronger agreement between its classification results and the ground truth, as well as superior model stability.
Performance disparities were particularly pronounced for specific land use categories. For institutions, a category characterized by complex physical morphology, XGBoost achieved a PA of 80.02% and a UA of 74.50%. These figures were substantially higher than those of RF, which were 72.95% and 69.00%, respectively. For the often-confused business and residence categories, XGBoost also demonstrated an accuracy improvement of 2 to 4 percentage points. Both models performed exceptionally well in classifying open space, a category with distinct features, consistently achieving accuracies above 96%.

3.1.2. Qualitative Evaluation

The qualitative assessment visually corroborated and provided context for the quantitative findings by visualizing the classification results (Figure 4).
First, the areas of high model consensus aligned closely with the high-accuracy land use categories. Both models demonstrated precise identification of categories with relatively distinct physical characteristics, such as open space, business, and residence (Figure 4c–e). This observation is consistent with the high OA and PA scores these categories achieved in the quantitative evaluation.
Furthermore, the visualizations offered a clear explanation for the performance disparity observed for the institution category, where the quantitative difference was most significant. This category is challenging because its physical features, such as building density and vegetation cover, often fall between those of business and residence and are therefore inherently ambiguous. This ambiguity led to varying degrees of confusion and misclassification by both models (Figure 4f,g).

3.2. Identification and Physical Interpretation of Key Features

An analysis of feature importance for both models, conducted using SHAP, revealed a strong consensus in identifying key discriminative features (Figure 5). A significant overlap was observed, with 8 of the top 10 most important features being common to both models: A36, A41, A51, A05, A52, A14, A18, and A40. This high degree of agreement indicated that these features are the core drivers for distinguishing land parcel types in the study area. More importantly, it suggested they possess algorithm-agnostic discriminative value, making them robust indicators regardless of the specific model used.
This consensus was further corroborated when examining the contributions of individual features to specific classes. For instance, in both models, A36 and A41 are primary contributors to the identification of open space, suggesting a strong correlation with surface cover information such as vegetation or water bodies. Likewise, A51 consistently emerged as a key contributor to identifying business in both models, implying that it likely captures spectral-spatial information associated with high-density built-up environments.
A stratified Spearman correlation analysis between the key features and a set of traditional spectral features and GLCM texture features revealed complex patterns of association (Figure 6).
  • Features associated with the built-up environment.
Across the institution, residence, and open space classes, A51 functions as a universal indicator of built-up intensity. It exhibits significant positive correlations with both NDBI and NDWI (r > 0.5) and strong negative correlations with vegetation indices such as NDVI (r < −0.5). However, a key anomaly revealed its deeper properties: within the business class, the correlation between A51 and NDBI weakened dramatically (r = 0.18), whereas its correlation with NDWI became highly positive (r = 0.65), suggesting a functional shift from a simple spectral response to capturing the more complex structure of the urban canopy.
A40 behaved differently. Its positive correlations with NDBI (r = 0.56) and the DISS texture metric (r = 0.43) were most pronounced within the residence class. Feature A18, meanwhile, demonstrated a more multifaceted behavior, correlating positively with NDBI (r = 0.42) and NDWI (r = 0.28) in business, but with the DISS (r = 0.28) and Contrast (r = 0.30) texture metrics in residence.
  • Features associated with vegetation.
The correlation analysis revealed that A36 exhibited a uniform yet critical contrast pattern. On one hand, it showed strong positive correlations with various vegetation indices in built-up areas (r = 0.42 to 0.69). On the other hand, it demonstrated strong negative correlations with NDBI and NDWI, which represent built-up and shadow properties, respectively (r = −0.43 to −0.73). Critically, both the positive and negative correlations consistently and substantially weakened specifically within the open space class.
A41 and A14 not only correlated positively with vegetation indices but also negatively with texture metrics such as DISS and Contrast in open space. This suggested they are sensitive to the spatial configuration of vegetation rather than just its presence. A05 displayed a highly context-dependent association pattern. Its correlation with vegetation indices peaked within the residence class, while it also showed a strong negative correlation with NDWI in both institution and residence.
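The stratified (per-class) Spearman analysis described above can be sketched as follows; the feature values, index values, and class labels are synthetic stand-ins, chosen only to illustrate how a correlation that holds within one class can dissolve within another:

```python
import numpy as np
from scipy.stats import spearmanr

def stratified_spearman(feature, index, labels):
    """Spearman correlation between an embedding feature and a physical
    index (e.g., NDVI), computed separately within each land use class."""
    out = {}
    for cls in np.unique(labels):
        mask = labels == cls
        rho, _ = spearmanr(feature[mask], index[mask])
        out[cls] = rho
    return out

# Synthetic data: a feature that tracks NDVI within "residence" parcels
# but is decoupled from it within "open space" parcels.
rng = np.random.default_rng(1)
ndvi = rng.random(200)
labels = np.array(["residence"] * 100 + ["open space"] * 100)
feat = np.where(labels == "residence", ndvi, rng.random(200))
rho = stratified_spearman(feat, ndvi, labels)
```

Stratifying by class is what exposes patterns like the weakening of A36's correlations inside open space, which a pooled, whole-sample correlation would average away.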

3.3. SHAP-Based Insights into Classification Decision

The SHAP beeswarm plots provide an intuitive visualization of the models' decision-making logic (Figure 7). They reveal how each model leverages the identified key features, with strategies that display both consensus and divergence across different classes.
A strong consensus in logic emerged between the two models for spectrally distinct classes. Specifically, both models identified business by leveraging high values of feature A51, a strong built-up signal, as the core positive driver. This single feature constituted 16.57% of XGBoost’s and 8.41% of RF’s decision weight. Conversely, high values in vegetation features like A52 and A36 acted as powerful negative predictors. The reverse held true for open space, where high values in vegetation features (e.g., A36, A41, and A05) provided the decisive positive push for classification.
For spectrally ambiguous classes, a significant divergence in strategy emerged between the two models. In identifying institutions, the RF model exhibited a diffuse decision pattern with no single dominant feature, relying primarily on negative contributions from high values in various vegetation and built-up features. Its logic for residence was similarly indistinct, driven mainly by negative contributions from multiple core features. In stark contrast, XGBoost developed a more sophisticated strategy. It pinpointed A39 as the most crucial discriminator for residential areas; high values of this feature contributed strong negative SHAP values, effectively forming a powerful exclusion rule. Simultaneously, the model leveraged high values of features such as A52 and A28 as positive contributors to construct a more nuanced profile of the residence class.
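The per-feature decision weights cited above (e.g., A51 accounting for 16.57% of XGBoost's weight for business) correspond to each feature's share of the total mean |SHAP| for a class. A minimal sketch with a hypothetical SHAP matrix:

```python
import numpy as np

def shap_share(shap_values):
    """Each feature's share of a class's total decision weight, defined as
    mean(|SHAP|) per feature normalized to sum to 1 across features."""
    mean_abs = np.abs(shap_values).mean(axis=0)  # shape: (n_features,)
    return mean_abs / mean_abs.sum()

# Hypothetical SHAP values for one class: 4 samples x 3 features
sv = np.array([[ 0.4, -0.1,  0.1],
               [ 0.6,  0.1, -0.1],
               [-0.5,  0.2,  0.1],
               [ 0.5, -0.2,  0.1]])
share = shap_share(sv)  # feature 0 dominates this toy class
```

Averaging absolute SHAP values (rather than signed ones) is the convention used here because a feature can contribute strongly in both directions across samples.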

4. Discussion

A core finding of this study, as presented in the results, is the remarkable consensus between the Random Forest and XGBoost models in identifying the most important features. This convergence, in which two algorithmically distinct paradigms, Bagging and Boosting, independently prioritize the same set of GSE features (e.g., A36, A51, A41), is not a trivial observation. According to the principles of explainable machine learning, such cross-model agreement is a strong indicator of feature robustness, suggesting that the identified predictors capture genuine underlying phenomena rather than model-specific artifacts or noise [58]. It also serves as a powerful, algorithm-agnostic validation of the Google Satellite Embeddings themselves: the discriminative power resides in the intrinsic structure of the GSE data rather than being an artifact of a specific modeling choice. This demonstrates that GSE features encode stable and meaningful geospatial patterns that are recognizable across different learning architectures, a characteristic often cited as a hallmark of high-quality representation learning [59]. Consequently, this validation motivates a deeper inquiry into the nature of these embeddings, moving from identifying which features are important to understanding what they represent geographically and how they inform the models' decision-making logic.

4.1. Decoding the Geospatial Semantics of Key Features

4.1.1. Features for Built-Up Area Characterization

As direct literature on the geographic meaning of these specific features is not yet available, the interpretations were grounded in established remote sensing principles. The analysis revealed a clear functional differentiation among the most important features, which were then validated against classic physical indices. A51, A40, and A18 were identified as crucial features for characterizing built-up areas. Among these, A51 functions as a composite indicator that comprehensively quantifies the intensity of the built environment, whereas A40 and A18 are more specialized, exhibiting strong class dependency in identifying distinct urban patterns.
It is established that NDBI effectively indicates impervious surfaces, NDWI captures structural information via building shadows [49], and texture metrics quantify the spatial heterogeneity of built-up areas [60,61,62]. The correlation analysis revealed that A51 is strongly correlated with indices from all three categories. This confirms that A51 is not a single-dimensional metric but a composite feature simultaneously encoding spectral, structural, and textural information, thereby providing a holistic quantification of built-up intensity. A critical insight emerges from the behavior of A51 in complex commercial districts: while its correlation with NDBI weakened, its strong association with NDWI persisted. This indicates that A51 is not merely a proxy for NDBI; rather, it is a versatile feature whose representational focus can shift from surface spectral properties to urban canopy structure. Consequently, A51 can be conceptualized as an adaptive composite indicator capable of shifting its focus with context. This inherent flexibility enables a comprehensive and dynamic quantification of built-up intensity, explaining why XGBoost can extract rich contextual information from this single feature to achieve high accuracy. This interpretation is corroborated by recent research applying GSE to urban air quality, in which A51 was identified as a dominant predictor for SO2 concentrations, explicitly linking this feature to the capture of industrial land-use characteristics [63]. This external validation supports the view that A51 effectively encodes high-intensity urban morphologies.
In contrast to the broad applicability of A51, A40 and A18 exhibit pronounced class dependency. A40 precisely characterizes the spatial patterns unique to residential areas through its positive correlations with both NDBI and texture metrics. Meanwhile, A18 plays a more multifaceted role; it is sensitive to both the building-shadow complexes in business districts and the spatial textures of residential areas. This suggests that A18 is instrumental in identifying transitional built-up patterns between these two distinct types.

4.1.2. Features for Vegetation Characterization

A36, A41, A14, A05, and A52 were identified as key indicators for vegetation characterization. Clear functional differentiation exists within this group: A36 serves as a fundamental metric for overall surface greenness, while the remaining features specialize in identifying specific spatial patterns of green spaces.
Vegetation indices such as NDVI, EVI, and SAVI are the established standards for quantifying surface vegetation cover [48,64], whereas NDBI and NDWI characterize impervious surfaces and building shadows, respectively [49]. Correlation analysis identified A36 as a fundamental feature exhibiting strong positive correlations with vegetation indices and simultaneous strong negative correlations with both NDBI and NDWI. These relationships collectively validate its role as a universal quantitative indicator of surface greenness.
The fundamental nature of A36 is particularly illuminated by its behavior within the open space class, where its correlations with all other indices uniformly weaken, both its positive correlations with vegetation indices and its negative correlations with built-up and shadow indices. This phenomenon aligns with the V-I-S model framework [65], which attributes the weakened correlation to the compositional nature of open space. Residential areas consist of a mixed, continuous gradient of endmembers [68,69], so A36 moves in a predictable, monotonic fashion with physical indices: as a parcel's green coverage increases, its A36 value rises in tandem with NDVI, faithfully tracking the gradual shift in the land cover mix. Open space, in contrast, is not a smooth gradient but an aggregation of spectrally extreme parcels from the edges of the V-I-S feature space, such as dense vegetation and bare soil [66,67]; a single park, for instance, can be a composite of vegetation, bare ground, and plazas [70]. Within this class, the tight relationship between A36 and physical indices like NDVI dissolves: when these spectrally distinct parcels are analyzed as a single group, their data points do not form a continuous trend but cluster at opposite ends of the feature space. This high intra-class heterogeneity, characterized by a lack of intermediate values, violates the monotonicity assumptions underlying standard correlation metrics, statistically diluting the correlation coefficient [71]. Consequently, this observation validates A36 as a fundamental metric for overall surface greenness, distinguishing it from indicators that merely serve as proxies for semantic land use categories.
Specific features capture distinct vegetation configurations. A41 is sensitive to dense, homogeneous vegetation, effectively combining a strong vegetation signal with high spatial uniformity. In contrast, A14 responds primarily to the texture of low, uniform vegetation such as grasslands, thereby directly encoding low spatial heterogeneity. The role of A05 is particularly sophisticated; it encodes the physical signature of high greenness with minimal shadow—a characteristic corroborated by its correlations with vegetation indices and NDWI within residential areas. This capability allows A05 to precisely identify patterns of open green spaces interspersed among buildings. Finally, A52 functions as a subtle indicator of background vegetation, providing the supplementary information required to differentiate parcels that are spectrally similar yet distinct in their fine-grained vegetation details.

4.2. Deconstructing the Model’s Hierarchical Decision Logic

4.2.1. Convergent Strategies for Distinct Classes

For physically distinct classes such as business and open space, the decision logic of both models largely converged, grounded explicitly in the previously decoded physical attributes.
Both models utilized feature A51, which represents high built-up intensity, as the core positive predictor for identifying business. Concurrently, high values in vegetation features, such as A36 and A52, served as strong negative predictors, establishing a clear dual-confirmation mechanism. Conversely, the identification of open space relied heavily on core vegetation features, particularly A36 and A41, with decision logic centered on the direct quantification of vegetation abundance. In particular, XGBoost exhibited exceptional focus, with the combined contribution of these key vegetation features alone reaching 78.54% for this class. This underscores the model’s capacity to isolate core discriminative information, thereby constructing an efficient and reliable decision pathway.

4.2.2. Divergent Strategies for Ambiguous Classes

The models’ decision strategies diverged significantly when distinguishing between spectrally similar classes prone to confusion, specifically residence and institution.
The approach of the RF model appeared comparatively passive and inefficient. This behavior can be interpreted through the lens of bagging algorithms operating on highly collinear data. Because Random Forest builds each tree independently on a bootstrap sample of data and features, correlated predictors can dilute importance, leading to a democracy of many weak features rather than a clear directive from a few strong ones [72,73]. This theoretical trait manifested clearly in our results. Its identification of institution relied primarily on negative contributions from several core features (A36, A41, A05, A51), representing a strategy of elimination lacking strong positive predictors. Similarly, its logic for residence was indistinct, characterized by a diffuse pattern with no single dominant feature. This reliance on a combination of weak, competing signals resulted in ambiguous decision boundaries.
In stark contrast, XGBoost developed a more sophisticated and proactive non-linear strategy, which aligns with the sequential, error-correcting nature of boosting algorithms [74]. By building trees sequentially to correct the errors of the previous ones, XGBoost can learn complex feature interactions and dependencies. For institution, it learned a refined comparative exclusion rule, using feature A40 (residential texture) to effectively exclude residence. However, the model’s core breakthrough lay in its classification of residence. It discovered a specialist feature—A39—which exhibited powerful local discriminative ability despite its low global importance. This discovery provides a powerful illustration of the limitations of global feature importance metrics and highlights the explanatory power of local, instance-based methods like SHAP [75]. While global rankings might dismiss A39 as irrelevant, its targeted, high-impact role in specific decision paths is a crucial mechanism that only local explanations can uncover. Leveraging this, the model learned a potent exclusion rule: when a parcel exhibited the unique pattern of A39, it generated a strong negative SHAP value, leading to decisive exclusion from the residential class.
In summary, XGBoost’s performance advantage stems from its ability to discover highly specific features for complex classification tasks and to construct sophisticated, targeted decision strategies based on them. This observation aligns with comprehensive comparative studies in remote sensing, which indicate that gradient boosting frameworks frequently outperform bagging ensembles when navigating complex, non-linear feature spaces [76]. This strategic evolution from RF’s broad, diffuse logic to XGBoost’s precise, hierarchical targeting provides a deep mechanistic explanation for its superiority in handling high-dimensional, collinear remote sensing data.

4.3. Deconstructing the Mechanisms Behind XGBoost’s Superior Performance

Feature selection in RF models, which typically relies on metrics such as information gain, is susceptible to collinearity—a vulnerability that often dilutes the importance of correlated features [77]. In contrast, XGBoost leverages regularization terms and a gradient boosting mechanism to learn effectively from groups of highly correlated features [54,76,78]. This capability was evident in the superior performance of XGBoost on GSE data, particularly in distinguishing between complex, spectrally similar classes such as institution and residence.
A Spearman correlation analysis of the 64 input features was conducted to investigate whether the algorithm's advantage stems from its capacity to handle high-dimensional multicollinearity. The analysis revealed that 23.9% of all feature pairs exhibited moderate-to-high correlation (absolute coefficient ≥ 0.5) (Figure S1), confirming significant multicollinearity within the dataset. Such multicollinearity is likely an intrinsic property of deep learning embeddings, whose features are learned to be rich yet redundant, capturing overlapping facets of the same underlying geographic patterns. For this reason, the full feature set was deliberately retained without applying dimensionality reduction techniques such as Principal Component Analysis (PCA): the 64 features (A00–A63) constitute a dense, holistic representation in which individual dimensions are highly coupled and not independently meaningful, and reducing them would disrupt their collective semantic structure. This methodological choice aligns with emerging practices in the application of AEF embeddings, which advocate retaining the full feature set to preserve the integrity of the learned representation and to let the model handle feature selection internally [63].
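The share of collinear pairs reported above can be computed by counting feature pairs whose absolute Spearman coefficient meets the threshold. A sketch on synthetic data with deliberately redundant columns (not the GSE features themselves):

```python
import numpy as np
from scipy.stats import spearmanr

def frac_collinear(X, threshold=0.5):
    """Fraction of feature pairs with |Spearman rho| >= threshold."""
    rho, _ = spearmanr(X)                    # (n_features, n_features) matrix
    iu = np.triu_indices_from(rho, k=1)      # unique off-diagonal pairs
    return np.mean(np.abs(rho[iu]) >= threshold)

# Synthetic matrix: three near-duplicate columns plus three independent ones,
# mimicking the redundancy typical of learned embeddings.
rng = np.random.default_rng(2)
base = rng.random((300, 1))
X = np.hstack([base + 0.01 * rng.random((300, 3)),   # redundant block
               rng.random((300, 3))])                 # independent block
frac = frac_collinear(X)  # 3 of the 15 pairs are collinear here
```

Applied to the 64-dimensional GSE matrix, the same count underlies the 23.9% figure.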
Given the characteristics of the dataset, the performance gap between RF and XGBoost can be attributed to their inherently different learning mechanisms. The Bagging-based parallel architecture of RF tends to disperse its discriminative power when encountering highly correlated features (e.g., A36 and A41). In such cases, the model distributes importance across multiple redundant predictors, which weakens the clarity of its decision boundaries. As a result, RF's internal logic, especially for ambiguous classes such as residence, becomes obscured, leading the model to rely on combinations of weak signals for passive exclusion rather than decisive classification. This behavior is consistent with the SHAP interpretability patterns discussed in Section 4.2.
Conversely, XGBoost’s sequential boosting mechanism transforms the challenge of collinearity into an advantage through hierarchical learning [45]. The process is iterative: the model first employs general-purpose features, such as A51 (representing built-up intensity), for a coarse-grained classification of simpler samples. It subsequently focuses on the prediction errors, or residuals, from this initial stage. This enables the identification and prioritization of specialist features, such as A39, which are not globally dominant but are crucial for correcting specific misclassifications. Through this progression from broad classification to precise correction, XGBoost constructs a significantly more sophisticated non-linear strategy than RF [76,78]. This hierarchical learning mechanism provides the fundamental algorithmic basis for its superior performance and precision.

4.4. Major Contributions

First, it establishes a novel synergistic framework that forges a practical third way for trustworthy land use classification. As highlighted in the Introduction, current urban analytics faces a fundamental dilemma: traditional methods suffer from a shallow feature bottleneck limiting their capacity to capture complex urban morphology [20,21]. Conversely, while end-to-end deep learning models offer superior feature learning capabilities, they typically demand massive annotated datasets and substantial computational resources to train from scratch [79]. Moreover, their inherent opacity creates a significant trust deficit that explicitly hinders their adoption in high-stakes policy-making, where explainability is often a legal or ethical requirement [80]. Our framework navigates this trade-off by integrating ready-to-use deep learning embeddings (GSE) with XAI. This strategy aligns with the emerging paradigm of Deep Feature Extraction in remote sensing [63], which leverages the generalization power of large-scale pre-trained foundation models without the computational cost and opacity of full network training. The framework demonstrates that competitive performance (OA = 85.00%) can be achieved using a single, globally consistent embedding source, effectively solving the dual challenges of feature limitation and model opacity. Crucially, the remarkable consensus between the RF and XGBoost models in identifying key discriminative features serves as powerful, algorithm-agnostic validation that GSE data effectively encodes intrinsic patterns for urban land use.
Second, it pioneers a systematic methodology for bridging the gap between machine logic and domain logic in urban AI. A critical epistemic limitation of current XAI applications, as noted earlier, is that their explanations often remain confined to identifying what data is prioritized without articulating why in a geographically meaningful way. Most existing studies applying explainability techniques in urban science have focused primarily on producing feature importance rankings [81,82]. This work advances the state of practice by decoding the tangible geographic semantics of abstract features. For example, the study moves beyond prior interpretations that linked A51 to broad categories such as industrial land-use [63], revealing it instead as a more nuanced, adaptive indicator of built-up intensity. Furthermore, it validates A36 as a biophysical quantifier of surface greenness by explaining its seemingly paradoxical loss of correlation in open space. This translation of abstract signals into a planner-centric narrative directly addresses the interpretive divide, fostering actionable insights beyond what traditional feature ranking can offer.
Third, it delivers deep mechanistic insights into advanced model performance, moving beyond black-box accuracy metrics. Comprehensive remote sensing studies consistently show that gradient boosting frameworks outperform bagging ensembles in complex classification tasks [76]. Our study moves beyond this established finding. We elucidate the underlying algorithmic mechanisms for XGBoost’s success, specifically within the context of high-dimensional geospatial embeddings. This analysis validates theoretical expectations regarding gradient boosting’s handling of multicollinearity [83] but provides novel empirical evidence in the domain of deep learning features. We reveal XGBoost’s superior ability to construct a hierarchical decision strategy—using generalist features for broad classification and then leveraging specialist features (e.g., A39) for fine-grained corrections. This provides a clear mechanistic explanation for its robustness, offering valuable lessons for model selection in the era of geospatial foundation models.

4.5. Limitations

While this study offers valuable contributions, several limitations should be acknowledged as they present avenues for future research.
  • First, a fundamental limitation arises from the trade-off between semantic relevance and spatial detail. The parcel-based approach, which defines the parcel as the de facto Minimum Mapping Unit (MMU), is a deliberate choice to align the analysis with the scale of urban planning. However, this aggregation inevitably entails a loss of intra-parcel heterogeneity. By assigning a single functional label to each parcel, the model cannot resolve fine-grained, mixed-use realities, such as ground-floor retail within a residential building. This limitation is intertwined with the simplified classification scheme and controlled sample size; for instance, merging commercial and industrial land overlooks nuanced functional differences. Future work should aim to develop hybrid models that not only adopt more granular classification hierarchies but also quantify the degree of functional mixture within each parcel, rather than assigning a single hard label.
  • Second, a key limitation relates to the sample size and its implications for model complexity and overfitting. The dataset, consisting of 800 high-quality parcels, is modest relative to the 64-dimensional GSE feature space. This imbalance introduces the risk of the Hughes phenomenon, where classification accuracy can degrade as feature dimensionality increases relative to the number of training samples [84]. While rigorous cross-validation, hyperparameter tuning, and early stopping proactively mitigated this limitation, the limited sample size necessarily constrained the complexity of the classification scheme that could be reliably implemented. This constraint directly led to decisions such as merging commercial and industrial land into a single class. Future research with access to larger, more comprehensive ground-truth datasets would be crucial to validate the findings and explore more granular classification systems without the heightened risk of overfitting.
  • Third, the practical applicability and transferability of the framework face significant barriers. The methodology’s performance is highly dependent on the availability of high-quality auxiliary data, particularly a reliable road network (e.g., from OSM) for accurate parcel delineation, which may be unavailable in many urban contexts. Relatedly, the reliance on GSE data inherently lacks direct socioeconomic information. While effective for capturing physical patterns, remote sensing data alone often struggles to distinguish morphologically similar but functionally distinct zones, a gap that social sensing data is better positioned to fill [10]. Future research could enhance transferability by exploring fusion with more universally available social sensing data. Furthermore, the current multi-step workflow requires considerable technical expertise, posing a challenge for widespread adoption. A critical direction for future work is therefore to encapsulate this entire process into an automated, user-friendly tool or cloud-based platform.
  • Finally, regarding interpretability, several nuances must be acknowledged. It is essential to remember that SHAP analysis elucidates model-specific correlations, not real-world causal mechanisms [75]. Furthermore, the raw outputs of XAI methods like SHAP, while scientifically robust, can be cognitively overwhelming due to the high volume of data they present. They do not automatically generate a holistic image for non-specialists. This aligns with recent critiques in Human–Computer Interaction, which suggest that algorithmic explanations often require a human-in-the-loop to bridge the gap between technical feature importance and domain-specific semantic understanding [85]. Therefore, a critical step, demonstrated in this study, is the researcher’s role in synthesizing these complex outputs into a coherent narrative. Any future user-friendly tool based on this framework must not only provide explainability charts but also include features that guide the user through this interpretive process.

5. Conclusions

This study demonstrates the potential of combining machine learning with GSE data for land use classification in the central urban area of Nanning. Leveraging the 64-dimensional deep features of GSE, the study established a practical workflow that offers an efficient alternative to training custom deep learning models from scratch or relying on traditional hand-crafted feature engineering. The XGBoost model significantly outperformed the RF model across all metrics: its OA reached 85.00% ± 2.24%, and the Kappa coefficient increased from 0.7583 ± 0.0230 (RF) to 0.8000 ± 0.0298 (XGBoost). Notably, a remarkable consensus emerged between the RF and XGBoost models in identifying key discriminative features, providing a powerful, algorithm-agnostic validation of the intrinsic information encoded within GSE data.
A key contribution of this research lies not merely in identifying important features, but in proposing a systematic method to help decode their tangible geographic semantics. Through a SHAP-based analysis, the study seeks to move beyond some limits of current XAI applications by translating abstract deep learning features into a more human-readable, planner-centric narrative. The ability to connect these abstract features to tangible geographic concepts like built-up intensity is a valuable step toward making the model’s decision-making process more transparent and understandable. Furthermore, by deconstructing the model’s decision logic, the study aimed to provide clearer mechanistic insights into algorithm performance. The analysis suggests how and why XGBoost may outperform its counterparts when dealing with highly collinear data, by highlighting its ability to identify specialist features and construct sophisticated rules for ambiguous classes like residence.
The study not only validates the immense potential of GSE as a ready-to-use source of deep semantic features but also provides a scalable solution for the trustworthy application of AI in urban land use. Looking ahead, future work should proceed along two key directions. First, to enhance practical applicability, the entire workflow could be encapsulated into an automated, user-friendly tool. This would empower urban planners and researchers to utilize the framework’s diagnostic capabilities without needing to navigate the underlying technical complexity. Second, to address the critical question of generalizability, the spatial transferability of our framework remains to be validated. As this study was conducted in a single city, future research should involve rigorous inter-city cross-validation to test the robustness of the model and its semantic interpretations across diverse urban environments.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/land15010158/s1, Figure S1: POI classification framework; Table S1: Correlation analysis of the 64 features.

Author Contributions

Conceptualization, Y.Z.; methodology, Y.Z.; software, Y.Z.; validation, Y.Z. and H.Y.; formal analysis, Y.Z.; investigation, Y.Z.; resources, X.H.; data curation, Y.Z. and X.H.; writing—original draft preparation, Y.Z.; writing—review and editing, H.Y.; visualization, Y.Z.; supervision, H.Y.; project administration, H.Y.; funding acquisition, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author.

Acknowledgments

We thank the research team for all the help and support provided while developing this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
GSE: Google Satellite Embedding
XGBoost: Extreme Gradient Boosting
RF: Random Forest
XAI: Explainable Artificial Intelligence
SHAP: Shapley Additive Explanations
OSM: OpenStreetMap
POI: Points of Interest
AEF: AlphaEarth Foundation
ASEAN: Association of Southeast Asian Nations
MoHURD: Ministry of Housing and Urban-Rural Development
GEE: Google Earth Engine
GLCM: Gray-Level Co-occurrence Matrix
NIR: Near-Infrared
SWIR: Short-Wave Infrared
NDVI: Normalized Difference Vegetation Index
EVI: Enhanced Vegetation Index
SAVI: Soil-Adjusted Vegetation Index
NDBI: Normalized Difference Built-up Index
NDWI: Normalized Difference Water Index
ASM: Angular Second Moment
IDM: Inverse Difference Moment
DISS: Dissimilarity
OA: Overall Accuracy
PA: Producer's Accuracy
UA: User's Accuracy
PCA: Principal Component Analysis
MMU: Minimum Mapping Unit

Figure 1. Overview of the study area.
Figure 2. Methodology workflow. (1) Step 1: the sample dataset is established through expert annotation based on Esri World Imagery, POI data, and the classification standard, after delineating land parcels using OSM. (2) Step 2: input features are extracted from the GSE data, and a set of physical reference features is constructed from Sentinel-2. (3) Step 3: the XGBoost and RF models are evaluated, followed by a SHAP analysis; SHAP summary and beeswarm plots are generated for the top 10 features to identify key features. Finally, a correlation analysis is conducted between the key features and the physical reference features to interpret their geographic semantics and elucidate the model's decision-making mechanism.
Figure 3. (a) Spatial distribution of the samples; (b) The representative samples.
Figure 4. Visualization of classification results. (a) Classification results of XGBoost; (b) classification results of RF; (c) a green park, identified by its large, contiguous green texture, was correctly classified as open space; (d) an area of low-rise, metal-roofed buildings was correctly classified as business; (e) a mixture of high-rise buildings and green space was correctly classified as residence; (f) a village, characterized by a dense cluster of small buildings and significant green space, was misclassified as institution; (g) land under development, characterized by bare earth and unfinished buildings, was misclassified as institution.
Figure 5. SHAP Summary Plot for the Top 10 Features. (a) XGBoost; (b) RF.
Figure 6. Correlation analysis of key features with spectral and texture features. (a) Spectral features; (b) Texture features. Significance levels are denoted as follows: * p < 0.05, ** p < 0.01, *** p < 0.001.
Figure 7. SHAP Beeswarm Plot. Each pair of plots compares the XGBoost (left) and the RF (right) for a specific category. (a,b) Institution; (c,d) Open space; (e,f) Business; (g,h) Residence.
Table 1. Road Classification.

Classes | Road Descriptions | Road Widths (m)
Level 1 | motorway, motorway_link, primary, primary_link, trunk, trunk_link, rail | 40
Level 2 | secondary, secondary_link | 20
Level 3 | tertiary, tertiary_link, residential, service, unclassified, unknown, other | 10
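The parcel-delineation step implied by Table 1 (buffer each OSM road by half its class width, then subtract the buffers from the study area) can be sketched with Shapely. The function name and toy geometry are illustrative, and coordinates are assumed to be in a metric CRS:

```python
# Hedged sketch of road buffering for parcel delineation, per Table 1.
from shapely.geometry import LineString, box
from shapely.ops import unary_union

ROAD_WIDTH_M = {  # full widths (m) from Table 1
    **dict.fromkeys(["motorway", "motorway_link", "primary", "primary_link",
                     "trunk", "trunk_link", "rail"], 40),
    **dict.fromkeys(["secondary", "secondary_link"], 20),
}
DEFAULT_WIDTH_M = 10  # Level 3: tertiary, residential, service, etc.

def delineate_parcels(study_area, roads):
    """roads: iterable of (highway_tag, LineString) pairs in a metric CRS."""
    buffers = [geom.buffer(ROAD_WIDTH_M.get(tag, DEFAULT_WIDTH_M) / 2)
               for tag, geom in roads]
    return study_area.difference(unary_union(buffers))

# Toy example: one primary road crossing a 1 km square splits it into two parcels.
area = box(0, 0, 1000, 1000)
parcels = delineate_parcels(
    area, [("primary", LineString([(0, 500), (1000, 500)]))]
)
```

Unknown or unlisted tags fall back to the Level 3 width, matching the "unclassified, unknown, other" row of the table.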
Table 2. Urban Land Use Classification.

Classification | Main Subcategories | Description
Institution | Governmental organization; medical service; culture and education | This class primarily includes land for public administration, public services, and education.
Open space | Tourist attraction | This category encompasses public green spaces and other open areas intended for recreation, ecological functions, or public access.
Business | Enterprises; shopping; accommodation service | This class represents areas dedicated to commercial activities, financial services, corporate functions, and some light industry.
Residence | Commercial house; daily life service | This category includes land primarily used for housing.
Table 3. Physical Reference Features.

Feature Types | Indices | Description
Spectral features | Normalized Difference Vegetation Index (NDVI) | Indicates the presence and density of green vegetation
Spectral features | Enhanced Vegetation Index (EVI) | Quantifies vegetation with reduced sensitivity to soil and atmospheric noise
Spectral features | Soil-Adjusted Vegetation Index (SAVI) | Minimizes soil brightness effects to accurately map sparse vegetation
Spectral features | Normalized Difference Built-up Index (NDBI) | Highlights impervious surfaces and built-up areas
Spectral features | Normalized Difference Water Index (NDWI) | Delineates open water bodies and aids in identifying building shadows
Textural features | Entropy | Measures the disorder or complexity of the spatial texture
Textural features | Contrast | Captures the intensity of local variations and edges
Textural features | Angular Second Moment (ASM) | Measures textural uniformity and order
Textural features | Inverse Difference Moment (IDM) | Quantifies the local homogeneity of the image texture
Textural features | Dissimilarity (DISS) | Describes the distinctiveness of spatial patterns and local contrast
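The five spectral indices in Table 3 follow standard band-ratio definitions; a minimal NumPy sketch is below. The band assignments (Sentinel-2 blue = B2, green = B3, red = B4, NIR = B8, SWIR1 = B11) and reflectance values are illustrative, and note that EVI additionally requires the blue band, which Table 3 leaves implicit:

```python
# Standard spectral index formulas from Sentinel-2 surface reflectance.
import numpy as np

def spectral_indices(blue, green, red, nir, swir1, L=0.5):
    blue, green, red, nir, swir1 = map(np.asarray, (blue, green, red, nir, swir1))
    ndvi = (nir - red) / (nir + red)                            # vegetation density
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    savi = (1 + L) * (nir - red) / (nir + red + L)              # soil-adjusted
    ndbi = (swir1 - nir) / (swir1 + nir)                        # built-up surfaces
    ndwi = (green - nir) / (green + nir)                        # open water
    return {"NDVI": float(ndvi), "EVI": float(evi), "SAVI": float(savi),
            "NDBI": float(ndbi), "NDWI": float(ndwi)}

# Example reflectances typical of dense vegetation (values illustrative):
veg = spectral_indices(blue=0.03, green=0.06, red=0.04, nir=0.45, swir1=0.20)
print({k: round(v, 3) for k, v in veg.items()})
```

For a vegetated pixel the signs behave as expected: NDVI and SAVI are high, while NDBI and NDWI are negative.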
Table 4. Comparison of Model Performance.

Evaluation Metrics | RF | XGBoost
OA | 81.87% ± 1.72% | 85.00% ± 2.24%
Macro F1 | 81.72% ± 1.73% | 84.81% ± 2.32%
Kappa | 0.7583 ± 0.0230 | 0.8000 ± 0.0298
UA (Institution) | 69.00% ± 4.36% | 74.50% ± 5.10%
UA (Open space) | 98.00% ± 1.87% | 99.00% ± 1.22%
UA (Business) | 84.50% ± 6.60% | 87.00% ± 7.81%
UA (Residence) | 76.00% ± 7.84% | 79.50% ± 8.72%
PA (Institution) | 72.95% ± 6.08% | 80.02% ± 4.21%
PA (Open space) | 96.15% ± 2.84% | 96.18% ± 2.42%
PA (Business) | 81.97% ± 5.68% | 84.11% ± 4.50%
PA (Residence) | 76.89% ± 1.20% | 80.32% ± 5.65%
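The aggregate metrics in Table 4 are standard and can be computed with scikit-learn; UA and PA correspond to per-class precision and recall, respectively. A toy prediction vector (illustrative labels, not the study's data) shows the calls:

```python
# Computing Table 4's aggregate metrics on a toy prediction vector.
from sklearn.metrics import accuracy_score, f1_score, cohen_kappa_score

y_true = ["inst", "open", "bus", "res", "inst", "open", "bus", "res"]
y_pred = ["inst", "open", "bus", "res", "res",  "open", "bus", "inst"]

oa = accuracy_score(y_true, y_pred)                   # Overall Accuracy
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of class F1s
kappa = cohen_kappa_score(y_true, y_pred)             # chance-corrected agreement
print(f"OA={oa:.2%}  Macro F1={macro_f1:.2%}  Kappa={kappa:.3f}")
```

In the study, these metrics were averaged over repeated runs, which yields the ± standard deviations reported in Table 4.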