Generalization of LULC Classification in Arid Environments Using Machine Learning and Spectral, Texture, and Topographic Features: Spatial and Seasonal Analyses with Implications for Urban Environmental Monitoring

Aljaddani, Amal H.

doi:10.3390/land15061095

Amal H. Aljaddani

Department of Physical Sciences-Geographic Information Systems Program, College of Science, University of Jeddah, Jeddah 21589, Saudi Arabia

Land 2026, 15(6), 1095; https://doi.org/10.3390/land15061095 (registering DOI)

Submission received: 8 May 2026 / Revised: 10 June 2026 / Accepted: 18 June 2026 / Published: 20 June 2026

Abstract

Accurate land use/land cover (LULC) mapping from remotely sensed data remains challenging in arid regions, particularly for spatial and seasonal generalization. This work proposes a novel exclude-one-city-out (EOCO) framework based on machine learning (ML) to achieve LULC generalization across summer and winter in arid environments. Four cities in Saudi Arabia witnessing rapid urban growth were selected: Riyadh, Madinah, Jeddah, and Dammam. The ML models were trained on three cities and tested on the unseen city. Sentinel-2 surface reflectance data for the visible (Blue, Green, and Red), near-infrared (NIR), and shortwave-infrared (SWIR1 and SWIR2) bands were used. Spectral indices, texture features, and topographical data were used to form five feature sets, which were utilized as inputs for four ML algorithms: random forest, support vector machine, classification and regression trees, and K-nearest neighbors. Statistical tests (Friedman, Kendall’s W, and Wilcoxon signed rank) were conducted to assess differences across ML models, feature sets, and seasons. The random forest model consistently outperformed other models across the five feature sets, while the spectral texture and combined feature sets outperformed other feature combinations. Significant differences in feature importance were observed across cities and seasons for spectral texture during summer and winter (p-values: 1.25 × 10⁻⁴ and 9.2 × 10⁻⁵, respectively), with strong agreement (Kendall’s W = 0.9212 and 0.9424). The findings can support urban environmental monitoring in arid regions, contributing to sustainable urban development.

Keywords:

machine learning; land use/land cover; generalization; urban areas; arid environment; seasonal variability; Saudi Arabia

1. Introduction

Land use and land cover (LULC) mapping is a vital application of remote sensing, supporting various domains such as environmental monitoring, climate change assessment, disaster management, and urban development [1,2,3,4]. Temporal and spatial information regarding LULC is also essential for land use planning and the establishment of effective development strategies [5]. In general, LULC mapping can be applied across diverse climatic and environmental zones, including tropical, arid, semi-arid, temperate, and continental environments [6,7,8,9,10].

Despite the importance of LULC mapping, ensuring accurate surface mapping remains challenging in arid and semi-arid environments, which have experienced continuous LULC changes driven by urbanization and industrialization. Additionally, natural processes (e.g., droughts and flash floods) and anthropogenic activities may contribute to changes in LULC [11,12]. Mapping quality is also influenced by the unique environmental conditions in these regions, including ecological fragility, environmental and land degradation, and climate variability [13]. Notably, field mapping and feature extraction are particularly complex in arid and semi-arid environments, owing to spectral similarities between urban surfaces and barren lands, fragmented and sparse vegetation cover, and seasonal variations. Together, these factors significantly hinder the creation of high-quality maps in arid areas.

LULC mapping can be conducted using field surveys and/or remotely sensed data [7]. When combined with historical and current satellite imagery, these data sources provide up-to-date information, support advanced mapping, and facilitate monitoring of changes on the Earth’s surface. LULC patterns derived from remote-sensing data highlight the importance of LULC accuracy and generalization in environmental contexts [14]. Various satellite image sets with different spatial resolutions have been employed to produce LULC maps across scales ranging from local to global. Coarse-resolution data include Moderate Resolution Imaging Spectroradiometer (MODIS, 250 m to 1 km) and Advanced Very-High-Resolution Radiometer (1 km) [15]. Moderate-resolution data include Landsat (30 m) [16] and Sentinel-2 (10 and 20 m) images [17]. High-resolution data include WorldView-3 (0.3 and 1.2 m) [18] and IKONOS (1 m) [19]. Although high spatial resolutions provide detailed information, medium spatial resolutions, such as the 10 m bands of Sentinel-2, are commonly used for city-scale mapping. This is because such data can capture detailed information and are freely available, facilitating sustainable urban development and environmental studies. These data can contribute to the generation of accurate, high-quality maps, which are critical inputs for environmental and climatic models.

Various machine learning (ML) algorithms have been adopted to improve LULC classification and handle complex patterns, including random forest (RF), support vector machine (SVM), classification and regression trees (CART), and K-nearest neighbors (KNN) [20,21,22,23]. These algorithms have been used in both supervised and unsupervised classification settings, across different spatial and temporal scales, to extract essential information from surface features. However, their performance depends on the characteristics of satellite imagery data, computational processing, and environmental and climatic factors. Therefore, the performance of ML models in LULC mapping must be comprehensively evaluated to better understand their behavior and improve estimation accuracy across different applications [24].

Recent advances in remote sensing have led to increasing interest in deep-learning approaches for urban land cover classification, such as convolutional neural networks (CNNs) [25,26], recurrent neural networks and temporal models [27,28], and Siamese and metric-learning networks [29,30]. These frameworks have demonstrated strong performance in extracting spatial and temporal information for distinguishing spectral patterns in complex areas. In addition, transfer learning and domain adaptation techniques have been used to improve model generalization across different environmental conditions and geographical regions [31,32,33]. However, deep-learning approaches typically require larger datasets, greater computational resources, and extensive model tuning, which can limit their applicability to local-scale regions. Consequently, traditional ML algorithms remain widely used for LULC mapping, owing to their applicability, interpretability, computational efficiency, and ability to achieve comparable performance when different feature sets are implemented. Thus, this study focuses on assessing the generalization capability of four ML algorithms (RF, SVM, CART, and KNN) across multiple urban arid environments using different feature sets.

Advances in remote-sensing and geographic information system technologies have enabled the processing of large datasets with improved computational efficiency. For example, Google Earth Engine (GEE), a geospatial online programming platform, has been used across various domains, such as urban development [4,34,35], forest monitoring [36,37,38], identification of burned areas [39,40,41], natural disaster assessment [42,43,44], and land surface temperature evaluations [2,45,46]. GEE offers free access to different types of data, including remotely sensed data [42], as well as coding, processing, and instant data-visualization tools, rendering it superior to various other online sources [47].

Under this backdrop, LULC dynamics across different environmental conditions, especially in arid and semi-arid contexts, must be examined using classification techniques and remotely sensed data [48]. Numerous studies have examined the potential of various ML models in LULC mapping across different regions, including Brazil, Indonesia, Australia, China, India, Ethiopia, Sweden, Canada, and the United States of America [6,7,9,17,49,50,51,52,53]. However, few studies have focused on the use of supervised classification techniques for arid and semi-arid environments. Notable examples include studies on the Dengkou Oasis, China [7]; the Urmia Lake basin, Iran [54]; and Botswana [8].

Although previous studies have evaluated ML performance and provided valuable insights, research on LULC generalization across spatial and temporal domains remains limited. Notably, generalization performance may degrade, owing to challenges in modeling uncertainty across different sources of remotely sensed data [55,56]. Relevant studies have mainly focused on crop and wetland mapping across diverse climates [57,58,59,60]. For example, Cai et al. [39] attempted to enhance generalization accuracy by integrating segmentation and spectral data. Shafizadeh-Moghadam et al. [37] and Shibuya et al. [40] examined algorithm performance for temporal and spatial generalization, illustrating the strengths and limitations of different ML techniques. In the context of arid environments, Halmy and Gessler [38], Weng et al. [41], and Ali and Johnson [42] explored generalization challenges and the corresponding influence of data and study design. This literature review identifies a critical knowledge gap, i.e., limited direct comparisons of ML techniques in terms of their generalization performance for LULC mapping, specifically using Sentinel-2 imagery across seasons in arid and semi-arid regions.

To address this gap, this study evaluates the generalization performance of LULC classification in arid and semi-arid cities using ML algorithms across five feature sets and two seasons (summer and winter). A novel approach, named exclude-one-city-out (EOCO), is introduced to assess the generalization strength and robustness of ML algorithms. The models are trained on data from three cities and tested on a fourth, unseen city using supervised classification. Although several ML- and deep-learning-based studies have used similar approaches, such as the leave-one-region-out or cross-domain validation strategies [61,62,63], the proposed EOCO framework is novel in various aspects. The models are trained on arid-region cities that vary in urban land cover, spectral response, topography, and environmental conditions, while an entire city is excluded for testing. This creates a strict spatial generalization scenario in which models must transfer knowledge from training cities to a completely unseen city. Moreover, the integration of five feature sets (including spectral features, spectral indices, texture, topographical variables, and their combinations) helps identify the feature set that most effectively improves LULC classification under the spectral confusion observed in arid environments.

Four densely populated cities in Saudi Arabia that are experiencing rapid urban growth are selected: Riyadh, Madinah, Jeddah, and Dammam. Sentinel-2 surface reflectance data from the visible (Blue, Green, and Red), near-infrared (NIR), and shortwave-infrared (SWIR1 and SWIR2) bands are used. Texture variables (based on the gray level co-occurrence matrix, GLCM), spectral indices (modified normalized difference water index, MNDWI; normalized difference built-up index, NDBI; normalized difference vegetation index, NDVI; and soil-adjusted vegetation index, SAVI), and topographic data (elevation and slope) are organized into five feature sets. The following research questions are addressed in this work: (1) How effectively do ML models generalize across spatial and seasonal contexts when trained on multiple cities and tested on an unseen city? (2) Which ML algorithm demonstrates consistently high performance across the five feature sets in summer and winter? (3) Which LULC classes are commonly misclassified across cities, models, feature sets, and seasons in arid and semi-arid environments? (4) How does environmental heterogeneity influence model performance across seasons and the four selected cities? The findings of this work are expected to provide valuable insights for the scientific community and policymakers focused on both urban and environmental monitoring.

The remainder of this paper is organized as follows. Section 2 describes the study area, EOCO approach, data collection, and preprocessing. Section 3 outlines training data and classification, generalization accuracy assessment across cities, and statistical analysis. Section 4 discusses the model performance metrics, spatial generalization, seasonal performance, and feature importance. The implications of the results are presented in Section 5, and Section 6 presents the concluding remarks.

2. Data and Methodology

2.1. Study Area

The analysis focused on four cities in Saudi Arabia: Riyadh, Madinah, Jeddah, and Dammam (Figure 1). Riyadh, the national capital of Saudi Arabia and regional capital of the Riyadh Region, is located on the Najd plateau and surrounded by deserts. Riyadh serves as the financial and political hub of the country [4]. Madinah, the capital of the administrative region of Madinah, is located in the western part of Saudi Arabia and in the northwestern part of the Makkah region. Madinah is characterized by mountains, valleys, and volcanic lava fields, with predominant agricultural activity [64]. Jeddah is located in the western part of Saudi Arabia, on a coastal plain along the Red Sea, with valleys that drain into the sea [65]. It serves as a key seaport, with economic and tourism implications. Dammam, located on the Arabian Gulf in the eastern part of Saudi Arabia, is a center of economic and industrial activity. The city lies on a coastal plain, interspersed with valleys and salt flats [66].

Notably, these cities were selected because they share similar climate, terrain, and construction materials. Southern regions were excluded, owing to monsoonal influences and variability in seasonal vegetation.

2.2. Overview of the EOCO Approach

The EOCO approach was employed to assess the performance of four ML models (RF, SVM, CART, and KNN). The models were trained on three cities (e.g., Madinah, Jeddah, and Dammam) and tested on the unseen city, i.e., Riyadh (Figure 2). This framework ensured that each city was used once as a test set, clarifying how ML models generalize across different feature sets during summer and winter. Moreover, it helped identify conditions under which the models fail to generalize in challenging environments such as arid regions.
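The hold-out scheme described above can be sketched in a few lines of Python. This is only an illustration of the splitting logic, not the implementation used in the study; the function name `eoco_folds` is hypothetical.

```python
# Illustrative sketch of exclude-one-city-out (EOCO) splits.
# The helper name eoco_folds is an assumption made for this example.
CITIES = ["Riyadh", "Madinah", "Jeddah", "Dammam"]


def eoco_folds(cities):
    """Yield (train_cities, test_city) pairs, holding out each city once."""
    for held_out in cities:
        train = [c for c in cities if c != held_out]
        yield train, held_out


folds = list(eoco_folds(CITIES))
for train, test in folds:
    print(f"train on {train} -> test on {test}")
```

Each city appears exactly once as the test city and never in its own training set, which is what makes the evaluation a test of spatial transfer rather than of within-city fit.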

The five feature sets were constructed as follows:

Spectral: Spectral bands, including visible (Blue, Green, and Red), near-infrared (NIR), and shortwave infrared (SWIR1 and SWIR2).
Spectral_Indices: Spectral bands and water-urban-vegetation indices (MNDWI, NDBI, NDVI, and SAVI).
Spectral_Texture: Spectral bands and GLCM-based texture features (contrast, entropy, homogeneity, and variance).
Spectral_Topography: Spectral bands and topographical features (elevation and slope).
All_Features: Combination of spectral, indices, texture, and topographical features.
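As a rough illustration, the five feature sets listed above can be written as plain lists of layer names. The Sentinel-2 band IDs (B2, B3, B4, B8, B11, B12) follow the usual band numbering; the texture and topography layer names, and the experiment count, are assumptions made for this sketch rather than details confirmed by the text.

```python
# Sketch of the five feature sets as lists of layer names.
# Texture/topography layer names are illustrative placeholders.
SPECTRAL = ["B2", "B3", "B4", "B8", "B11", "B12"]  # Blue, Green, Red, NIR, SWIR1, SWIR2
INDICES = ["MNDWI", "NDBI", "NDVI", "SAVI"]
TEXTURE = ["contrast", "entropy", "homogeneity", "variance"]
TOPOGRAPHY = ["elevation", "slope"]

FEATURE_SETS = {
    "Spectral": SPECTRAL,
    "Spectral_Indices": SPECTRAL + INDICES,
    "Spectral_Texture": SPECTRAL + TEXTURE,
    "Spectral_Topography": SPECTRAL + TOPOGRAPHY,
    "All_Features": SPECTRAL + INDICES + TEXTURE + TOPOGRAPHY,
}

# Assuming every model / feature set / held-out city / season
# combination is run exactly once:
MODELS = ["RF", "SVM", "CART", "KNN"]
SEASONS = ["summer", "winter"]
N_CITIES = 4
total_runs = len(MODELS) * len(FEATURE_SETS) * N_CITIES * len(SEASONS)

for name, layers in FEATURE_SETS.items():
    print(f"{name}: {len(layers)} layers")
print("total runs:", total_runs)
```

Under these assumptions the largest set (All_Features) has 16 input layers, against 6 for the purely spectral baseline.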

2.3. Data Collection and Preprocessing

Sentinel-2 Level-2A surface reflectance data were obtained from the COPERNICUS/S2_SR_HARMONIZED dataset and processed using the GEE platform [67]. Sentinel-2 data are typically characterized by a wide swath and medium spatial resolution (10–20 m). The Level-2A images were atmospherically corrected using the Sen2Cor processor and geometrically corrected using orthorectification to derive surface reflectance [68]. The data were georeferenced in UTM/WGS84. Temporal filtering was applied to partition the data into summer (1 June 2025 to 30 September 2025) and winter (1 December 2024 to 30 March 2025) subsets for the four cities of interest. Images with cloud and cloud-shadow coverage above 10% were excluded.

Multispectral bands, including the visible bands (Blue, Green, and Red), NIR, SWIR1, and SWIR2, were used for LULC classification and model assessments (Table 1) [67]. Because Sentinel-2 Level-2A data exhibit mixed spatial resolutions (visible and NIR: 10 m; SWIR1 and SWIR2: 20 m), it was necessary to unify the pixel size by resampling. Accordingly, the SWIR bands were resampled to 10 m using bilinear interpolation, which computes new pixel values as a weighted average of the four nearest neighbors and produces smoother output than other methods, such as nearest-neighbor sampling [69]. Next, median composites were generated for summer and winter. This approach reduces noise and outliers by computing the median of all values at each pixel across the stack of matching bands, thus improving LULC classification accuracy [70]. All Sentinel-2 surface reflectance images were clipped to the administrative boundaries of the respective cities. Topographic data from the NASA SRTM Digital Elevation Model (DEM; 30 m resolution) were incorporated into feature sets 4 and 5 and were also processed on the GEE platform [71]. This DEM was selected, owing to its global coverage, free availability, consistent quality, and widespread use in remote-sensing-based LULC applications. The elevation and slope derived from the DEM were used to assess the contribution of topographic information when integrated with Sentinel-2 spectral bands.
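The two operations described above, bilinear interpolation and per-pixel median compositing, can be illustrated with a small standard-library sketch. It is a simplified stand-in for what a platform such as GEE does internally; the function names and the use of `None` for masked observations are assumptions for this example.

```python
from statistics import median


def bilinear(q00, q10, q01, q11, fx, fy):
    """Bilinear interpolation inside one grid cell.

    q00, q10, q01, q11 are the values at corners (0,0), (1,0), (0,1), (1,1);
    fx and fy are fractional offsets in [0, 1] from the (0,0) corner.
    """
    top = q00 * (1 - fx) + q10 * fx
    bottom = q01 * (1 - fx) + q11 * fx
    return top * (1 - fy) + bottom * fy


def median_composite(stack):
    """Per-pixel median over a time stack of equally sized 2-D grids.

    None marks a masked (e.g. cloudy) observation and is skipped.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            values = [img[r][c] for img in stack if img[r][c] is not None]
            row.append(median(values) if values else None)
        out.append(row)
    return out


# Hypothetical 1 x 2 scenes; 9000 mimics a bright outlier such as a cloud.
scenes = [[[1000, 2000]], [[1200, None]], [[9000, 2200]]]
composite = median_composite(scenes)
center = bilinear(0, 10, 20, 30, 0.5, 0.5)
print(composite, center)
```

In the toy stack, the outlier value 9000 does not affect the composite because the median ignores extreme values, which is the noise-reduction property the paragraph refers to.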

Spectral indices for water, urban areas, and vegetation were considered, including MNDWI, NDBI, NDVI, and SAVI [2,36,73,74]. MNDWI, developed by Xu [52], improves water-surface detection while reducing noise from other features such as urban areas, vegetation, and soil. This index is computed from the Green (B3) and SWIR1 (B11) bands [73]. NDBI, a geospatial monitoring index, is used to distinguish and map materials in urban areas, such as impervious surfaces, buildings, and roads, by leveraging the higher reflectance of built-up materials in the SWIR region compared with the NIR [2]. NDVI is one of the most widely used remote-sensing metrics for distinguishing healthy, dense vegetation based on chlorophyll reflectance. It is computed as the normalized difference between the NIR (high reflectance in healthy vegetation) and Red (strong absorption by chlorophyll) bands. NDVI values range from −1 to 1, with higher values indicating healthier and denser vegetation [36]. SAVI is similar to NDVI, enabling the detection of healthy and dense vegetation, but it additionally minimizes the influence of soil background reflectance. This makes it valuable in arid and semi-arid regions, which are characterized by sparse vegetation coverage. SAVI is computed from the NIR and Red bands with a soil-adjustment factor L. In this study, this factor was set to 0.5, a value associated with moderate vegetation coverage [74]. These spectral indices were incorporated into feature sets (2) and (5) and computed using Equations (1)–(4):

\mathrm{MNDWI} = \frac{\mathrm{Green}(B3) - \mathrm{SWIR1}(B11)}{\mathrm{Green}(B3) + \mathrm{SWIR1}(B11)}

(1)

\mathrm{NDBI} = \frac{\mathrm{SWIR1}(B11) - \mathrm{NIR}(B8)}{\mathrm{SWIR1}(B11) + \mathrm{NIR}(B8)}

(2)

\mathrm{NDVI} = \frac{\mathrm{NIR}(B8) - \mathrm{Red}(B4)}{\mathrm{NIR}(B8) + \mathrm{Red}(B4)}

(3)

\mathrm{SAVI} = (1 + L) \frac{\mathrm{NIR}(B8) - \mathrm{Red}(B4)}{\mathrm{NIR}(B8) + \mathrm{Red}(B4) + L}

(4)

with the standard value L = 0.5.
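Equations (1)–(4) translate directly into code. The sketch below assumes reflectances expressed on a 0–1 scale (Sentinel-2 surface reflectance is commonly stored as scaled integers, so a rescaling step would come first, otherwise L = 0.5 in SAVI has little effect). Returning 0.0 for a zero denominator is a choice made for this example, and the sample pixel values are hypothetical.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b); returns 0.0 when the denominator is zero (assumption)."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def mndwi(green, swir1):
    """Eq. (1): Green (B3) versus SWIR1 (B11)."""
    return normalized_difference(green, swir1)


def ndbi(swir1, nir):
    """Eq. (2): SWIR1 (B11) versus NIR (B8)."""
    return normalized_difference(swir1, nir)


def ndvi(nir, red):
    """Eq. (3): NIR (B8) versus Red (B4)."""
    return normalized_difference(nir, red)


def savi(nir, red, L=0.5):
    """Eq. (4): soil-adjusted vegetation index with adjustment factor L."""
    return (1 + L) * (nir - red) / (nir + red + L)


# Hypothetical vegetation-like pixel (reflectance on a 0-1 scale).
pixel = {"green": 0.08, "red": 0.06, "nir": 0.40, "swir1": 0.20}
print("NDVI ", round(ndvi(pixel["nir"], pixel["red"]), 3))
print("SAVI ", round(savi(pixel["nir"], pixel["red"]), 3))
print("NDBI ", round(ndbi(pixel["swir1"], pixel["nir"]), 3))
print("MNDWI", round(mndwi(pixel["green"], pixel["swir1"]), 3))
```

For the same pixel, SAVI is lower than NDVI because the L term in the denominator damps the index where total reflectance is small, which is the soil-background correction described in the text.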

Statistical texture features (contrast, entropy, homogeneity, and variance) were derived from the GLCM using a window size of 3. These features capture local variation (contrast), the complexity of image texture (entropy), the similarity of pixel values (homogeneity), and the dispersion of pixel values (variance). The textures were computed in GEE from the Red band using the glcmTexture({size: 3}) function [76]. The Red band was chosen because it provides greater contrast between urban areas and other LULC classes, making it useful for distinguishing the structure and homogeneity of urban areas [4,75]. The 3 × 3 window was chosen because it captures local texture differences while preserving spatial resolution and detail, whereas a larger window may produce smoother texture and mix pixels from neighboring LULC classes. The texture features were integrated into feature sets 3 and 5.
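To make the four texture measures concrete, the following sketch computes them for a single small window of integer gray levels. It is a deliberately simplified version: it uses one horizontal offset and textbook Haralick-style definitions, whereas library implementations such as GEE's glcmTexture may average over several directions and use slightly different formulas. All function names are illustrative.

```python
import math
from collections import Counter


def glcm(window, dx=1, dy=0):
    """Symmetric, normalized GLCM of integer gray levels for one offset."""
    counts = Counter()
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = window[r][c], window[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # symmetric counting
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def texture_features(window):
    """Contrast, homogeneity, entropy, and variance of one window."""
    p = glcm(window)
    mean = sum(i * v for (i, _), v in p.items())
    return {
        "contrast": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
        "entropy": -sum(v * math.log(v) for v in p.values()),
        "variance": sum((i - mean) ** 2 * v for (i, _), v in p.items()),
    }


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]      # uniform surface
checker = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # alternating surface
print(texture_features(flat))
print(texture_features(checker))
```

The uniform window gives zero contrast and maximal homogeneity, while the alternating window gives higher contrast and entropy, mirroring the qualitative distinction between smooth and heterogeneous surfaces described in the text.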

3. LULC Classification

3.1. Training Data and Classification

Training samples for both summer and winter were collected using a stratified purposive sampling strategy. Stable LULC classes were selected based on their spatial distribution: urban areas, barren lands, vegetation cover, and water bodies. Samples were identified on medium-resolution Sentinel-2 imagery, verified against higher-resolution imagery in Google Earth Pro, and digitized in ArcGIS Pro 3.6. Training data were collected for each city, ensuring sufficient representation of all classes; to minimize class imbalance, the samples were distributed across cities and covered all class types. Shapefile point features were used instead of polygons, owing to the simplicity of data collection, reduced computational complexity, and potentially enhanced accuracy of LULC classification. Because the points were placed on homogeneous land cover surfaces, they reduced the number of mixed pixels, which benefits both the feature sets and the ML models.

The four LULC classes were defined as follows:

Urban areas: areas with impervious surfaces covering more than 30%, such as residential, institutional, commercial, industrial, and transportation networks.
Barren lands: areas with bare soil, such as mountains, deserts, and rocks, including vacant non-urban areas.
Vegetation cover: public green spaces, crops, shrublands, and sparse vegetation.
Water: surface water features, constituting the smallest class, especially in inland cities such as Riyadh and Madinah.

The total number of training sample points across the four cities was 4696; the counts for urban areas, barren lands, vegetation cover, and water are summarized in Table 2.

The LULC classification process was based on the EOCO approach. Each of the four ML models (RF, SVM, CART, and KNN) was applied using each of the five feature sets across the four cities. These models were selected because they have been widely applied in remote-sensing applications in different environments and exhibit distinct learning mechanisms [20,21,22,23]. All models were implemented in GEE using default settings unless stated otherwise. Model descriptions are provided below.

RF constructs multiple decision trees and combines their outputs to improve accuracy, minimize overfitting, and stabilize predictions [77]. RF was implemented using the smileRandomForest() function in GEE with 150 decision trees and the default parameter mtry. Feature importance was assessed using the Gini-based importance scores generated by the RF model implemented in GEE. These metrics reflect the contribution of each variable to reducing LULC classification impurity across ensemble decision trees. Higher scores indicate greater importance of features in distinguishing LULC classes.

SVM seeks the optimal decision boundary, termed a hyperplane, that separates different LULC classes while maximizing the margin between them. The libsvm() function was used with a radial basis function (RBF) kernel and default parameters, given its effectiveness in remote-sensing and LULC applications [78].

CART was implemented using the smileCart() function, which generates a CART decision tree classifier with Gini impurity as the splitting criterion. The fitted tree assigns each new pixel to a class by applying the decision rules learned from the training data [79].
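Both the Gini-based importance scores of RF and the CART splitting criterion described above rest on the same quantity, Gini impurity. The following sketch shows the calculation for a single candidate split; it illustrates the concept only and does not reproduce the internals of any particular library. The labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Weighted Gini decrease obtained by splitting parent into left/right."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical node with two equally frequent classes.
parent = ["urban"] * 4 + ["barren"] * 4
perfect = impurity_decrease(parent, ["urban"] * 4, ["barren"] * 4)
useless = impurity_decrease(parent, ["urban", "urban", "barren", "barren"],
                            ["urban", "urban", "barren", "barren"])
print(gini(parent), perfect, useless)
```

A feature whose splits repeatedly produce large impurity decreases across the trees of the ensemble accumulates a high importance score, whereas a feature that never separates the classes contributes nothing.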

KNN was implemented using the smileKNN() function. This classifier assigns each pixel to the LULC class that is most common among its K most similar training samples. The number of neighbors, K, was set to 5 to balance noise/outlier removal and stability in LULC classification [59]. The distance metric was Euclidean.
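The KNN decision rule with K = 5 and Euclidean distance can be sketched as follows. This is a minimal brute-force illustration, not the GEE implementation; in particular, how ties are broken there is not stated in the text, so the tie behavior here (nearest samples are counted first) is an assumption. The two-feature training samples are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Label x by majority vote among its k nearest training samples."""
    by_distance = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical samples described by two features, e.g. (NDVI, NDBI).
train_X = [(0.60, -0.30), (0.70, -0.40), (0.65, -0.35),
           (0.05, 0.20), (0.10, 0.25), (0.00, 0.15), (0.08, 0.22)]
train_y = ["vegetation"] * 3 + ["urban"] * 4

print(knn_predict(train_X, train_y, (0.62, -0.33)))
print(knn_predict(train_X, train_y, (0.05, 0.20)))
```

With K = 5, a single mislabeled or atypical neighbor cannot decide the outcome on its own, which is the noise-damping effect mentioned above.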

3.2. Generalization Accuracy Assessment Across Cities

The accuracy of LULC classification was assessed for each iteration across models, feature sets, and seasons using the EOCO cross-validation framework. The evaluation was performed using the GEE platform with the following metrics: overall accuracy (OA), Kappa coefficient (CM), precision, recall, and macro F1 score [80]. OA represents the number of correctly classified pixels divided by the total number of evaluated pixels; it was determined using cm.accuracy(). CM quantifies the agreement between the classified pixels and the reference data beyond that expected by chance. Its value ranges between −1 and 1, with 1 indicating perfect agreement, 0 indicating agreement no better than chance, and −1 indicating complete disagreement. The value was computed using cm.kappa(). User’s accuracy was also computed, indicating the reliability of the mapped LULC class labels from the user’s viewpoint. Precision measures the proportion of pixels assigned to a class that truly belong to it, while recall measures the proportion of reference pixels of a class that are correctly identified. These metrics were calculated using precisionMean and recallMean. The macro F1 score was computed by averaging the F1 score across classes (urban, barren, vegetation, and water). The objective was to assess the overall effectiveness of a model across the four classes, particularly for minority classes such as vegetation and water. The value was determined using macroF1.
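The metrics listed above can all be derived from a confusion matrix. The sketch below assumes rows hold reference classes and columns hold predicted classes, and it returns 0.0 where a ratio is undefined; both are conventions chosen for this example. The 2 × 2 matrix is hypothetical and kept small so the arithmetic is easy to follow, but the function works for any number of classes.

```python
def classification_metrics(cm):
    """OA, kappa, mean precision/recall, and macro F1 from a confusion matrix.

    Assumes cm[i][j] counts samples of reference class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(k)]
    row_sums = [sum(cm[i]) for i in range(k)]
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    oa = sum(diag) / total
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - expected) / (1 - expected) if expected != 1 else 0.0

    precision = [diag[i] / col_sums[i] if col_sums[i] else 0.0 for i in range(k)]
    recall = [diag[i] / row_sums[i] if row_sums[i] else 0.0 for i in range(k)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0
          for p, r in zip(precision, recall)]

    return {
        "overall_accuracy": oa,
        "kappa": kappa,
        "precision_mean": sum(precision) / k,
        "recall_mean": sum(recall) / k,
        "macro_f1": sum(f1) / k,
    }


# Hypothetical two-class matrix: 50 reference samples per class.
metrics = classification_metrics([[40, 10], [5, 45]])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because macro F1 averages the per-class scores with equal weight, a poorly mapped minority class lowers it as much as a poorly mapped majority class would, which OA alone would not reveal.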

Accuracy assessments included accuracy metrics, confusion matrices, and feature-importance lists for each city, each ML classifier, and each of the five feature sets. This analysis provided a detailed perspective on per-city, per-season, and per-class variability in arid and semi-arid environments.

3.3. Statistical Analysis

Statistical analysis helps transform numerical data into meaningful information, clarifying differences between groups or paired conditions. In this study, the Friedman test (\chi_{F}^{2}), Kendall’s coefficient of concordance (W), and the Wilcoxon signed-rank test (W_{i}) were used to assess differences and patterns across ML models, feature sets, seasons, and feature importance [81,82,83]. The statistical tests were performed using the accuracy assessment metrics to evaluate the characteristics of each group from different viewpoints.

The Friedman test, a nonparametric statistical test, was used to assess whether statistically significant differences exist among the four ML classifiers across the five feature sets. It was also applied to assess the importance of bands within Spectral_Texture and All_Features sets.

\chi_{F}^{2} = \frac{12}{n k (k + 1)} \sum_{j = 1}^{k} R_{j}^{2} - 3 n (k + 1)

(5)

where \chi_{F}^{2} is the Friedman test statistic, n is the number of datasets (termed blocks; in this study, city and season combinations), k is the number of groups being compared (here, the ML classifiers, the feature sets, or the feature-importance bands, depending on the test), and R_{j} is the sum of ranks for group j.
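Equation (5) can be evaluated with a short routine that ranks the groups within each block and sums the ranks per group. The sketch below uses average ranks for tied values but, like Equation (5) as written, applies no tie correction to the statistic. The accuracy values are hypothetical and serve only to show the mechanics.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Eq. (5): blocks is a list of n rows, each holding k scores."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Hypothetical OA values: 3 blocks (rows) x 3 groups (columns).
blocks = [
    [0.80, 0.85, 0.90],
    [0.78, 0.84, 0.91],
    [0.70, 0.75, 0.88],
]
stat = friedman_statistic(blocks)
print(round(stat, 6))
```

In this toy example every block ranks the three groups identically, so the rank sums are 3, 6, and 9 and the statistic takes its maximum value for n = 3 and k = 3.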

Kendall’s W was applied only to the Spectral_Texture and All_Features sets for Riyadh using the RF model. This is because these two groups achieved the highest accuracy for Riyadh (used as a representative example). In general, this test aims to measure the effect size of the bands within two groups and to explain how strong the agreement between the rankings is. The results of Kendall’s W can be divided into four categories: 0.00–0.30, no agreement; 0.30–0.50, moderate agreement; 0.50–0.70, strong agreement; and 0.70–1.00, perfect agreement. Lower rank values indicate greater importance within the feature group.

W = \frac{12 S}{p^{2} (n^{3} - n) - p T}

(6)

where W represents the agreement, n is the number of ranked elements, p is the number of judges (blocks; in this study, cities and seasons), S indicates the variation in rank sums, and T is the correction factor for tied ranks.
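A small numerical example may help interpret Equation (6). The sketch below covers only the tie-free case (T = 0), with S computed as the sum of squared deviations of the rank sums from their mean; the rankings are hypothetical.

```python
def kendalls_w(rankings):
    """Eq. (6) without ties (T = 0): rankings holds p rows of n ranks each."""
    p, n = len(rankings), len(rankings[0])
    rank_sums = [sum(row[j] for row in rankings) for j in range(n)]
    mean_sum = sum(rank_sums) / n
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (p ** 2 * (n ** 3 - n))


# Hypothetical rankings of three features by three judges (blocks).
full_agreement = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
partial_agreement = [[1, 2, 3], [1, 3, 2], [2, 1, 3]]
opposed = [[1, 2, 3], [3, 2, 1]]

print(kendalls_w(full_agreement))
print(round(kendalls_w(partial_agreement), 4))
print(kendalls_w(opposed))
```

In the tie-free case, W is linked to the Friedman statistic by \chi_{F}^{2} = p (n - 1) W, where p and n follow the notation of Equation (6), so W can be read as a rescaling of that statistic to the range from 0 to 1.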

The Wilcoxon signed-rank test was used to assess significant differences between the paired groups, summer and winter.

W_{i} = \min \left( \sum R_{i}^{+}, \sum R_{i}^{-} \right)

(7)

where R_{i}^{+} and R_{i}^{-} represent the ranks of positive and negative differences, respectively.
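Equation (7) can be illustrated with paired summer and winter scores. In this sketch, zero differences are dropped and tied absolute differences receive average ranks, which are common conventions but are assumptions here since the text does not specify them. The scores are hypothetical and expressed as integer percentages to avoid floating-point ties.

```python
def wilcoxon_statistic(x, y):
    """Eq. (7): min of positive and negative rank sums of paired differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # average rank for ties
        i = j + 1
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(positive, negative)


# Hypothetical paired OA values in percent (summer, winter).
summer = [91, 90, 85, 84, 80]
winter = [87, 86, 86, 82, 79]
w_i = wilcoxon_statistic(summer, winter)
print(w_i)
```

Here the differences are 4, 4, −1, 2, and 1, giving a positive rank sum of 13.5 and a negative rank sum of 1.5; a small value of the statistic indicates that the differences lean consistently in one direction.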

4. Results

4.1. Model Performance Metrics

Table 3 presents the performance metrics for the five feature sets during the summer season in Riyadh as an example. The OA for the five feature sets and four ML models fell within the moderate-to-high range. Spectral_Texture yielded the highest accuracy with the RF and SVM classifiers (0.91 and 0.90, respectively). By contrast, the Spectral and Spectral_Topography sets yielded the lowest accuracy with the CART model (0.78 and 0.74, respectively). All classifiers incorporating Spectral_Indices showed moderate accuracy, while the All_Features set showed higher accuracy. RF demonstrated the highest accuracy with the Spectral_Texture and All_Features sets (0.91 and 0.90, respectively). According to the Friedman test, no statistically significant differences were observed among the four models (RF, SVM, CART, and KNN), as the differences between the four classifiers were minimal (\chi_{F}^{2} = 3.0, p-value = 0.391). Table A1 presents the mean ± standard deviation (SD) of the five accuracy metrics across experimental runs for the four cities in the summer season. Lower SD values indicate lower variability and greater model stability, whereas higher SD values indicate greater variability. RF consistently attained the highest OA across the five feature sets, especially for Spectral_Texture and All_Features. In terms of stability, RF and CART showed the lowest SD, indicating robust performance across the datasets. In comparison, KNN and SVM showed higher variability, especially for All_Features and Spectral_Texture. Across feature sets, Spectral_Texture provided the best balance between accuracy and stability, whereas Spectral_Topography exhibited lower performance and greater variability.

4.2. Spatial Generalization

Table 4 presents the overall accuracies of the highest-performing features for each city across ML classifiers in summer and winter. The RF classifier performed well across seasons when combined with the Spectral_Texture features. Spectral_Texture features outperformed the All_Features set, indicating their substantial contribution to accuracy. These results may be attributable to the presence of redundant or less informative variables, which may hinder discrimination between LULC classes when high-dimensional feature sets are used. This indicates that the addition of more variables does not necessarily improve urban land cover classification accuracy.

Dammam displayed higher accuracy in summer and winter (RF (0.95, 0.92), SVM (0.97, 0.81), CART (0.94, 0.91), and KNN (0.96, 0.94)), while Madinah displayed good-to-moderate results (RF (0.85, 0.85), SVM (0.97, 0.81), CART (0.94, 0.91), and KNN (0.96, 0.94)). The Friedman statistical test revealed no significant difference between the feature sets (\chi_{F}^{2} = 4.000, p-value = 0.406). The feature ranking, based on the mean of OA, was as follows: R1: Spectral_Texture (>0.8786), R2: All_Features (>0.8556), R3: Spectral_Indices (>0.8402), R4: Spectral (>0.8180), and R5: Spectral_Topography (0.7983). The best feature set was Spectral_Texture.
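The reported Friedman p-values can be related to the test statistics with a short calculation. The sketch below assumes that the p-values come from the large-sample chi-square approximation with k − 1 degrees of freedom (3 for the four classifiers in Section 4.1, 4 for the five feature sets here); the text does not state this explicitly. Closed-form upper-tail expressions are used for these two cases only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable for df = 3 or df = 4."""
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    if df == 4:
        return math.exp(-x / 2) * (1 + x / 2)
    raise ValueError("only df = 3 and df = 4 are implemented in this sketch")


# Statistics reported in the text; degrees of freedom are assumed to be k - 1.
p_models = chi2_sf(3.0, df=3)        # four classifiers
p_feature_sets = chi2_sf(4.0, df=4)  # five feature sets
print(round(p_models, 4), round(p_feature_sets, 4))
```

Under this assumption the computed tail probabilities are close to the reported values of 0.391 and 0.406, and both lie well above the conventional 0.05 threshold, consistent with the conclusion of no significant difference.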

Figure 3 shows the distribution of the OA for each city, model, feature, and season using boxplots and the mean value of each feature. Figure 4 shows the performance of the five feature sets.

4.3. Seasonal Performance

Table 5 summarizes the seasonal variability in model performance across the five feature sets using RF, with results for Riyadh presented as an example. The OA for summer was slightly higher than that for winter, especially for the Spectral_Texture and All_Features sets: (0.91, 0.90) for summer and (0.87, 0.86) for winter. Other feature sets also showed moderate OA across both seasons. The mean OA was moderate to high. The RF model for Riyadh maintained stability across seasons and feature sets.

Figure 5 presents the mean of OA and seasonal variability, demonstrating that Spectral_Texture displays the highest performance.

4.4. Feature Importance

Table 6 presents the ranking of feature importance associated with the RF model for Riyadh across summer and winter for two feature sets: Spectral_Texture and All_Features. In the Spectral_Texture case, the spectral bands, particularly B8 (NIR) and B2 (Blue), exhibited the highest importance. The NIR band contributes strongly to land classification, owing to its sensitivity to vegetation and the large contrast between the high reflectance of vegetation and the lower reflectance of urban areas. The Blue band helps discriminate between land surface features such as dry and wet soil, rocky surfaces, dry vegetation, and small water bodies. Among the texture features, contrast and homogeneity exhibited slightly higher importance than entropy and variance. Contrast helps differentiate land surface features (urban areas show higher contrast, especially at edges and buildings, whereas vegetation shows lower contrast). Homogeneity enables discrimination between vegetation cover and urban areas. Together, these measures help differentiate spatial structure. In the All_Features set, indices such as NDVI and SAVI appeared important, as both help discriminate between vegetated and non-vegetated areas. The importance of texture features increases from feature set (3) to feature set (5), particularly for the entropy layer, given its ability to capture the complexity of spatial features. Among topographic features, slope appeared to be more important than elevation, likely because it provides more detailed local terrain information.
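To make the role of the two vegetation indices concrete, the sketch below evaluates NDVI and SAVI in their standard forms from NIR and Red reflectance. The soil adjustment factor L = 0.5 is the commonly used default and is assumed here, and the reflectance values are hypothetical rather than sampled from the study imagery.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return (nir - red) / denom if denom else 0.0


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index with soil adjustment factor L (assumed 0.5)."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


# Hypothetical surface reflectance pairs (NIR, Red) for two surface types.
samples = {"irrigated vegetation": (0.45, 0.08), "barren soil": (0.35, 0.30)}

for name, (nir, red) in samples.items():
    print(f"{name:<22} NDVI = {ndvi(nir, red):.3f}  SAVI = {savi(nir, red):.3f}")
```

Both indices are high for the vegetated sample and close to zero for the bright soil sample, which is the separation the feature importance ranking relies on.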

Figure 6 shows the key features across cities and seasons for the two highest accuracy sets, namely, Spectral_Texture and All_Features.

Friedman and Kendall’s W tests indicated significant differences in feature importance across cities and seasons (Table 7). For the Spectral_Texture set, a significant difference in feature ranking variability was noted across cities and seasons, with values of ($\chi_{F}^{2}$ = 33.16, p-value = 1.25 × 10⁻⁴) and ($\chi_{F}^{2}$ = 33.92, p-value = 9.2 × 10⁻⁵) for summer and winter, respectively. Strong agreement was noted among the four cities (Kendall’s W = 0.9212 and 0.9424 for summer and winter, respectively). The All_Features set also showed significant differences across cities in summer and winter, with values of ($\chi_{F}^{2}$ = 51.28, p-value = 7.0 × 10⁻⁶) and ($\chi_{F}^{2}$ = 55.47, p-value = 1.0 × 10⁻⁶), respectively. Strong agreement and consistency were noted among the four cities (Kendall’s W = 0.8548 and 0.9246).
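Kendall's W is linked to the Friedman statistic by W = χ²_F / (n(k − 1)), where n is the number of raters (here assumed to be the four cities) and k is the number of ranked features. The sketch below applies this relation to the reported statistics. The feature counts are assumptions: k = 10 for Spectral_Texture (six bands plus four texture measures) and k = 16 for All_Features, the latter inferred because it reproduces the reported W values rather than taken from the text.

```python
def kendalls_w_from_friedman(chi2, n_raters, n_items):
    """Kendall's W from a Friedman chi-square: W = chi2 / (n * (k - 1))."""
    return chi2 / (n_raters * (n_items - 1))


N_CITIES = 4  # assumed raters: Riyadh, Madinah, Jeddah, Dammam

# (feature set, season, reported chi-square, assumed number of features k)
reported = [
    ("Spectral_Texture", "summer", 33.16, 10),
    ("Spectral_Texture", "winter", 33.92, 10),
    ("All_Features", "summer", 51.28, 16),
    ("All_Features", "winter", 55.47, 16),
]

for fset, season, chi2, k in reported:
    w = kendalls_w_from_friedman(chi2, N_CITIES, k)
    print(f"{fset:<17} {season:<7} W = {w:.4f}")
```

Under these assumptions the computed values agree with the reported W to about three decimal places, the small remainder being consistent with rounding of the published chi-square values.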

5. Discussion

A novel EOCO approach was used to perform a multidimensional comparative analysis of the generalization and performance of ML algorithms under different feature sets and seasonal variation. The four study areas (Riyadh, Madinah, Jeddah, and Dammam) are characterized by arid and semi-arid environments, which makes it challenging to assess performance across diverse and organized input features. The EOCO approach involves model training on three cities and testing on an unseen city. The analysis was performed for each city, four ML algorithms (RF, SVM, CART, and KNN), five feature sets (Spectral, Spectral_Indices, Spectral_Texture, Spectral_Topography, and All_Features), and two seasons (summer and winter). The results show that the RF algorithm consistently achieved the highest and most stable accuracy across all five feature sets, especially when combined with Spectral_Texture and All_Features. As an ensemble learner, RF is robust against spectral variations. The Spectral_Texture set achieved the best performance among feature sets, highlighting the importance of contrast and homogeneity across all four classes: urban areas, vegetation cover, barren lands, and water bodies. These results help clarify the contribution of each feature to the generalization process in arid environments.
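The structure of the EOCO experiment described above can be sketched as a simple enumeration of folds and settings. This is only an outline of the experimental grid under the design stated in the text. The function names are hypothetical, and no classifier is trained here.

```python
from itertools import product

CITIES = ["Riyadh", "Madinah", "Jeddah", "Dammam"]
MODELS = ["RF", "SVM", "CART", "KNN"]
FEATURE_SETS = ["Spectral", "Spectral_Indices", "Spectral_Texture",
                "Spectral_Topography", "All_Features"]
SEASONS = ["summer", "winter"]


def eoco_folds(cities):
    """Yield (training cities, held-out test city) for every city in turn."""
    for held_out in cities:
        yield [c for c in cities if c != held_out], held_out


def experiment_grid():
    """Enumerate every fold x model x feature set x season combination."""
    runs = []
    for (train, test), model, fset, season in product(
            list(eoco_folds(CITIES)), MODELS, FEATURE_SETS, SEASONS):
        runs.append({"train": train, "test": test, "model": model,
                     "features": fset, "season": season})
    return runs


runs = experiment_grid()
print(len(runs))  # 4 cities x 4 models x 5 feature sets x 2 seasons
print(runs[0])
```

Each run never sees samples from its test city during training, which is what distinguishes this design from a random train/test split pooled over all cities.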

The superior performance of the Spectral_Texture set can be attributed to the characteristics of urban arid environments, where similar spectral responses commonly occur among bare soil, exposed ground, and urban areas. Therefore, spectral bands alone may be insufficient to distinguish among urban land cover classes. In contrast, texture features provide ML models with spatial context, describing the arrangement, heterogeneity, and structure of each pixel's neighborhood. Urban areas are typically characterized by a complex spatial structure, including buildings, roads, open spaces, and mixed land cover, which helps generate distinct texture information despite similar spectral responses. Thus, by incorporating texture information with spectral bands, the ML classifier can capture both spectral and spatial information of the urban landscape, leading to improved class separability and classification accuracy. The consistent performance of the Spectral_Texture feature set across the four arid cities suggests that texture information exhibits robust, transferable importance in urban arid environments.
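The following sketch shows why texture can separate surfaces that look alike spectrally. It builds a symmetric, normalized gray-level co-occurrence matrix (GLCM) for horizontally adjacent pixels and computes contrast and homogeneity in their common textbook forms. The patches, the four gray levels, and the single offset are assumptions for illustration, and the exact texture definitions used in the study's processing chain may differ.

```python
def glcm(patch, levels, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix for pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(patch), len(patch[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = patch[r][c], patch[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count both directions for symmetry
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Sum of p(i, j) * (i - j)^2: large when neighbors differ strongly."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def homogeneity(p):
    """Sum of p(i, j) / (1 + (i - j)^2): large when neighbors are similar."""
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(len(p)) for j in range(len(p)))


# Hypothetical 4x4 patches quantized to 4 gray levels (0..3).
smooth_patch = [[1, 1, 1, 1]] * 4                      # uniform surface
rough_patch = [[0, 3, 0, 3], [3, 0, 3, 0]] * 2         # alternating bright/dark

for name, patch in [("smooth", smooth_patch), ("rough", rough_patch)]:
    p = glcm(patch, levels=4)
    print(f"{name:<7} contrast = {contrast(p):.2f}  homogeneity = {homogeneity(p):.2f}")
```

The two patches could have the same mean brightness in a given band, yet the rough patch has high contrast and low homogeneity, which is the kind of signal a built-up area tends to produce.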

Despite promising results, misclassification in arid and semi-arid environments during summer and winter remains a critical challenge. Analysis of Riyadh using the RF model (the most consistent model) across the five feature sets indicates that certain classes are misclassified (Table 8). The most frequent misclassifications occur between urban and barren, barren and urban, and water and urban classes. The confusion between urban and barren land is attributable to similarities in spectral responses. The confusion between water and urban classes may also be ascribed to spectral similarity. For instance, the darker surfaces of urban areas may resemble water in terms of reflectance. Shadows of built-up areas also appear darker, resulting in their classification as water. Mixed pixels often contain multiple classes, such as water and urban areas. Moreover, atmospheric effects, such as dust, haze, and humidity, may also distort results. In winter, misclassifications are noted between water and barren, water and urban, and urban and barren classes. The confusion between water and barren classes is particularly severe for the water class. This is because wet soil or clay often exhibits reflectance similar to that of shallow water; salt flats and dry, dark beds display reflectance similar to water; and the low reflectance of dark barren areas, such as dark rock or sand, may be misidentified as water in some bands. Table A2 in Appendix A presents the producer’s and user’s accuracies across feature sets and seasons for Riyadh using RF. The Spectral_Texture and All_Features sets provided higher accuracies than the other feature sets, and the accuracy values in summer are higher than those in winter.
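The producer's and user's accuracies referred to above are derived from a confusion matrix, as sketched below. The matrix is hypothetical and follows the assumed convention that rows are reference classes and columns are predicted classes.

```python
CLASSES = ["Urban", "Barren", "Vegetation", "Water"]

# Hypothetical confusion matrix: rows = reference class, columns = predicted class.
CONFUSION = [
    [70, 30, 0, 0],
    [3, 97, 0, 0],
    [0, 0, 100, 0],
    [5, 15, 0, 80],
]


def producers_accuracy(cm):
    """PA (%) per class: correct pixels divided by the reference (row) total."""
    return [100.0 * cm[i][i] / sum(cm[i]) if sum(cm[i]) else 0.0
            for i in range(len(cm))]


def users_accuracy(cm):
    """UA (%) per class: correct pixels divided by the predicted (column) total."""
    col_totals = [sum(row[j] for row in cm) for j in range(len(cm))]
    return [100.0 * cm[j][j] / col_totals[j] if col_totals[j] else 0.0
            for j in range(len(cm))]


pa_values = producers_accuracy(CONFUSION)
ua_values = users_accuracy(CONFUSION)
for name, pa, ua in zip(CLASSES, pa_values, ua_values):
    print(f"{name:<10} PA = {pa:6.2f}  UA = {ua:6.2f}")
```

In this toy matrix the water class has a UA of 100% but a lower PA: every pixel labeled as water is water, yet part of the reference water is assigned to other classes. The water rows of Table A2 show the same pattern, with UA of 100.00 alongside PA between roughly 53 and 80.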

Figure 7 shows the stability analysis (summer versus winter) across the four classes for Riyadh using RF. Results for the barren and vegetation classes were superior to those of other classes during summer and winter, while urban classes showed slightly better performance in winter than in summer. This highlights the importance of carefully considering water classes, particularly in winter. The seasonal variations observed between summer and winter are unlikely to be influenced by vegetation phenology, as the vegetation cover in arid and semi-arid lands is sparse and often maintained by irrigation. Instead, these variations may be attributable to differences in solar illumination geometry and solar zenith angle, which influence the surface reflectance and extent of building shadows in urban areas. Additionally, seasonal atmospheric conditions, including dust and sand aerosols, which commonly occur in the Arabian Peninsula, can influence image quality and spectral response despite atmospheric correction processes. Variations in irrigation management and vegetation health within the urban areas may also contribute to the seasonal spectral differences. These factors may explain the seasonal variations identified by the Wilcoxon signed-rank test.
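A Wilcoxon signed-rank comparison of summer and winter values can be sketched in plain Python with an exact p-value obtained by enumerating all sign patterns. The pairing used here (water-class PA for Riyadh across the five feature sets, taken from Table A2) is chosen only for illustration and is not necessarily the pairing tested in the study.

```python
from itertools import product


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, exact two-sided p) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w_obs = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null hypothesis every sign pattern is equally likely.
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs + 1e-12:
            hits += 1
    return w_plus, w_minus, hits / 2 ** len(ranks)


# Water-class PA (%) for Riyadh with RF, five feature sets (Table A2).
summer_pa = [76.60, 79.79, 70.21, 78.72, 70.21]
winter_pa = [54.26, 62.77, 54.26, 53.19, 60.64]

w_plus, w_minus, p_value = wilcoxon_signed_rank(summer_pa, winter_pa)
print(f"W+ = {w_plus}, W- = {w_minus}, exact two-sided p = {p_value}")
```

Summer exceeds winter in all five pairs, which is the most extreme outcome possible, yet with only five pairs the smallest attainable two-sided p-value is 2/32 = 0.0625. This shows how a small number of pairs limits the power of the test.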

The findings provide valuable insights into the role of ML algorithms across five feature sets and two seasons (summer and winter). Achieving high accuracy and high-quality mapping, especially in challenging environments such as arid regions, is crucial for supporting studies on sustainable urban growth and climatic monitoring, especially against the backdrop of rapid changes in the Earth’s surface and climate. Previous studies have investigated the generalization of LULC classes using different datasets, methods, and ML models. However, they largely overlooked the performance of ML models in generalization, which is essential for uncovering a model’s strengths and weaknesses when using spatial and temporal variables [55,57,60]. For example, Shafizadeh-Moghaddam [60] assessed spatial and temporal generalization in urban growth modeling. The results confirmed that RF achieved high calibration accuracy, but its performance degraded during validation, whereas SVM showed the opposite trend. In contrast, this study demonstrates that RF is the best-generalizing model, with SVM displaying moderate performance (Table 4). Another study, conducted in the Cerrado and Amazon biomes in Brazil using MODIS satellite images, assessed temporal generalization using RF, a CNN (TempCNN), and a lightweight temporal attention encoder (L-TAE). RF achieved higher accuracy and consistently performed better across the agricultural land cover classes [57]. The excellent performance of RF is thus evident in both densely vegetated areas (the Brazilian Cerrado and Amazon biomes) and arid environments (Saudi Arabia) characterized by desert-dominated landscapes with sparse vegetation. Huang et al. [55] investigated LULC classification in a subtropical karst environment using remote sensing. RF and SVM performed and generalized well under specific conditions, especially under limited sample size and data availability. The results indicate their robustness, aligning with the findings of the present study.

At the class level, despite advances in ML models for LULC classification, misclassification errors owing to spectral variations remain [84]. In this study, the most prevalent error was the misclassification of barren land as urban areas, especially in regions featuring rocky and sandy mountains. This issue was also reported by Aljaddani et al. [4], who encountered it during CCDC processing in a Landsat time-series analysis of arid and semi-arid environments. Among topographic effects, mountainous areas with shadows, such as those in Madinah, were misclassified as water. This is because shadowed regions in satellite images appear similar to water in terms of spectral signature. This issue was also reported by Huang et al. [55]. Moreover, the sparse vegetation in arid regions, with its weak spectral signature, contributes to misclassification errors [54]. Additionally, small and limited LULC classes are often underrepresented. Classes such as vegetation and water are often difficult to distinguish, particularly in areas with spectral similarity, as also observed in the misclassification of urban and barren classes. This represents a common challenge in multi-class classification with imbalanced LULC distributions. These challenges must be addressed in future research to achieve higher accuracy and optimize spatial and temporal generalization performance.

The 10 m spatial resolution of the Sentinel-2 visible and NIR bands (the SWIR1 and SWIR2 bands are natively acquired at 20 m) is well suited for training data collection. However, several limitations restrict the scope of this work. First, the sample size was limited to four cities (Riyadh, Madinah, Jeddah, and Dammam), which hindered comprehensive analysis of spatial, seasonal, and statistical generalization. Although no statistically significant difference was noted in this work, expanding the analysis to more diverse areas could enhance the assessment. Second, the imbalance between the four classes affected performance. Barren land was the dominant class, followed by urban areas, while vegetation and water classes were limited, especially in inland cities like Riyadh and Madinah. The training set also shows an imbalance, especially for the water class, which is substantially underrepresented in Madinah compared with other LULC classes. The limited availability of water samples may have constrained the EOCO model’s ability to capture the full spectral variability of water bodies. This results in increased confusion with neighboring LULC classes and reduced class-specific performance. Even though the overall accuracy remains high, this metric is dominated by the prevalence of the other LULC classes (Barren, Urban, and Vegetation) and may therefore mask the difficulty of precisely identifying the minority classes. Thus, the reported accuracy should be interpreted in light of the imbalance.
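The effect of class imbalance on overall accuracy can be illustrated with a small numerical sketch. The confusion matrix is hypothetical (rows are reference classes, columns are predicted classes), with barren land dominant and water rare. Balanced accuracy, taken here as the unweighted mean of per-class recall, is one common complement to OA and is used as an assumption rather than a metric reported in the study.

```python
CLASSES = ["Barren", "Urban", "Vegetation", "Water"]

# Hypothetical confusion matrix: rows = reference class, columns = predicted class.
CONFUSION = [
    [930, 20, 0, 0],
    [40, 260, 0, 0],
    [2, 0, 38, 0],
    [4, 2, 0, 4],
]


def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def per_class_recall(cm):
    """Recall per reference class (equivalent to producer's accuracy as a fraction)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def balanced_accuracy(cm):
    """Unweighted mean of per-class recall, so each class counts equally."""
    recalls = per_class_recall(cm)
    return sum(recalls) / len(recalls)


print(f"OA                = {overall_accuracy(CONFUSION):.3f}")
print(f"Balanced accuracy = {balanced_accuracy(CONFUSION):.3f}")
for name, rec in zip(CLASSES, per_class_recall(CONFUSION)):
    print(f"  recall[{name}] = {rec:.3f}")
```

Here OA stays near 0.95 even though only 4 of 10 water samples are recovered, because the ten water samples contribute very little to the total.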

Third, the seasonal analysis was limited to one year (2025), which restricted the assessment of long-term seasonal generalization. Incorporating data from multiple years would help improve interpretation. Fourth, although training points provide good results, they may not sufficiently represent mixed pixels and class heterogeneity. Moreover, such samples are highly sensitive to geolocation errors and limit the diversity of pixels per object. In addition, the Spectral_Topography feature set incorporated both elevation and slope derived from the NASA SRTM DEM, which has an original spatial resolution of 30 m. To ensure consistency with Sentinel-2 imagery, the DEM was resampled to 10 m for feature extraction. Although this processing facilitated the integration of topographic and spectral features, it did not enhance the spatial detail of the DEM. Consequently, finer-scale details may not have been adequately captured. The use of alternative or higher-resolution elevation datasets, such as the JAXA ALOS World 3D (AW3D30), could help assess the contribution of topographic information to LULC classification. Future work must address these limitations by increasing the sample size to cover more global arid and semi-arid cities, using deep-learning models, and expanding the temporal window to include multiyear data. These improvements will provide a comprehensive understanding of generalization and model performance and also enable meaningful statistical analysis of spatial and seasonal conditions. Increasing the training samples and attempting to address class imbalance can help mitigate misclassification.
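The point that resampling a 30 m DEM to 10 m adds no spatial detail can be shown with a toy grid. The sketch assumes nearest-neighbor resampling with an integer factor of 3 and uses hypothetical elevations. The resampling method actually used in the study is not stated in the text.

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbor upsampling: each cell becomes a factor x factor block."""
    out = []
    for row in grid:
        expanded = [value for value in row for _ in range(factor)]
        out.extend([list(expanded) for _ in range(factor)])
    return out


# Hypothetical 2x2 block of 30 m elevation cells, in metres.
dem_30m = [
    [610, 615],
    [620, 640],
]

dem_10m = upsample_nearest(dem_30m, factor=3)

for row in dem_10m:
    print(row)

unique_30m = {v for row in dem_30m for v in row}
unique_10m = {v for row in dem_10m for v in row}
print(unique_30m == unique_10m)  # same set of elevations, only more pixels
```

The 10 m grid has nine times as many pixels but exactly the same elevation values, so slope computed inside each 3 × 3 block is flat and terrain variation below the 30 m scale remains unobserved. Smoother interpolation schemes would change the pixel values but would still not recover detail that the source DEM never recorded.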

Overall, this study provides meaningful, informative results, highlighting that the RF model using the Spectral_Texture feature set achieves high-quality mapping and serves as a first-tier macro-monitoring instrument that can facilitate urban land cover monitoring, natural resource management, environmental monitoring, and disaster management in arid regions. Improved accuracy and mapping quality will benefit decision-making in government and private sectors, contributing to the realization of urban Sustainable Development Goals.

6. Conclusions

A multidimensional comparative analysis was performed using a novel EOCO approach to evaluate generalization across four ML models (RF, SVM, CART, and KNN) and five feature sets for summer and winter 2025 (short-term seasonal analysis). The analysis was conducted for each city, ensuring that every city was tested once for a given model, feature set, and season. Sentinel-2 surface reflectance data (visible and NIR bands) were used to generate spectral indices and texture features. These features were integrated with spectral data to construct feature sets (2) and (3). Additionally, topographic data (elevation and slope) were integrated with spectral data to construct feature set (4). All of these features were combined into feature set (5). Three statistical tests were employed to assess the models, feature sets, and important features. The results confirmed that RF provided consistent and higher performance, while other models performed moderately across the feature sets. The Spectral_Texture and All_Features sets performed best across the cities. Most misclassifications during the summer involved the urban and barren classes, owing to spectral similarities between the two. Misclassifications during the winter involved water and barren land, as well as water and urban areas. These class pairs must be prioritized to enhance classification accuracy. This work has considerable implications for scientific communities and decision makers concerned with environmental and human needs. The proposed benchmark can contribute to Sustainable Development Goals aimed at safeguarding the environment and humanity [85,86].

Funding

This work was funded by the University of Jeddah, Jeddah, Saudi Arabia, under grant No. (UJ-25-DR-2568).

Data Availability Statement

The datasets used in this study are publicly available through the Google Earth Engine (GEE) platform. The processed data and outputs generated during the current study are available from the corresponding author upon request.

Acknowledgments

The author gratefully acknowledges the University of Jeddah for its technical and financial support. Sincere thanks are also extended to the anonymous reviewers for their insightful comments and constructive feedback, which have significantly contributed to the improvement of this article.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

Table A1. Mean ± standard deviation across experimental runs for four cities and five accuracy metrics in summer. Note: Boldfaced values indicate the best performance.

Feature Sets	Model	OA	Kappa	MacroF1	Precision	Recall
Spectral	RF	0.858 ± 0.058	0.788 ± 0.092	0.862 ± 0.07	0.889 ± 0.057	0.871 ± 0.061
	SVM	0.798 ± 0.046	0.701 ± 0.083	0.82 ± 0.072	0.87 ± 0.035	0.829 ± 0.057
	CART	0.825 ± 0.057	0.74 ± 0.087	0.849 ± 0.052	0.867 ± 0.041	0.849 ± 0.059
	KNN	0.85 ± 0.066	0.774 ± 0.106	0.854 ± 0.069	0.885 ± 0.053	0.858 ± 0.068
Spectral_Indices	RF	0.838 ± 0.093	0.753 ± 0.149	0.834 ± 0.105	0.871 ± 0.078	0.856 ± 0.087
	SVM	0.811 ± 0.041	0.716 ± 0.075	0.828 ± 0.059	0.893 ± 0.019	0.831 ± 0.058
	CART	0.842 ± 0.079	0.764 ± 0.127	0.842 ± 0.09	0.856 ± 0.087	0.865 ± 0.079
	KNN	0.85 ± 0.066	0.774 ± 0.106	0.854 ± 0.069	0.885 ± 0.053	0.858 ± 0.068
Spectral_Texture	RF	0.907 ± 0.039	0.861 ± 0.063	0.899 ± 0.06	0.917 ± 0.054	0.905 ± 0.04
	SVM	0.893 ± 0.099	0.838 ± 0.156	0.864 ± 0.155	0.899 ± 0.123	0.892 ± 0.082
	CART	0.874 ± 0.033	0.813 ± 0.054	0.878 ± 0.048	0.89 ± 0.037	0.88 ± 0.058
	KNN	0.877 ± 0.059	0.816 ± 0.091	0.878 ± 0.064	0.904 ± 0.049	0.881 ± 0.061
Spectral_Topography	RF	0.829 ± 0.088	0.741 ± 0.142	0.819 ± 0.139	0.876 ± 0.083	0.846 ± 0.084
	SVM	0.799 ± 0.071	0.704 ± 0.113	0.773 ± 0.164	0.842 ± 0.111	0.824 ± 0.073
	CART	0.8 ± 0.072	0.7 ± 0.11	0.817 ± 0.073	0.858 ± 0.053	0.817 ± 0.082
	KNN	0.836 ± 0.087	0.75 ± 0.145	0.829 ± 0.126	0.885 ± 0.069	0.846 ± 0.083
All_Features	RF	0.903 ± 0.043	0.856 ± 0.069	0.894 ± 0.07	0.912 ± 0.062	0.904 ± 0.045
	SVM	0.876 ± 0.055	0.817 ± 0.084	0.848 ± 0.126	0.876 ± 0.105	0.886 ± 0.049
	CART	0.89 ± 0.045	0.837 ± 0.072	0.881 ± 0.081	0.889 ± 0.085	0.899 ± 0.045
	KNN	0.848 ± 0.096	0.769 ± 0.154	0.838 ± 0.131	0.896 ± 0.07	0.855 ± 0.089

Table A2. Producer’s accuracy (PA) and user’s accuracy (UA) in percentage across feature sets and seasons for Riyadh using RF.

Feature Set	LULC Class	Summer		Winter
		PA	UA	PA	UA
Spectral	Urban	69.11	90.11	72.36	96.04
	Barren	97.25	68.51	99.22	65.21
	Vegetation	100.00	99.47	97.87	97.35
	Water	76.60	100.00	54.26	100.00
Spectral_Indices	Urban	68.83	95.49	73.71	95.77
	Barren	98.82	68.29	98.82	67.74
	Vegetation	100.00	95.92	98.40	96.86
	Water	79.79	100.00	62.77	100.00
Spectral_Texture	Urban	88.62	90.58	86.99	89.66
	Barren	96.47	85.12	92.55	77.12
	Vegetation	100.00	98.95	97.34	95.81
	Water	70.21	100.00	54.26	100.00
Spectral_Topography	Urban	62.60	87.83	66.94	94.64
	Barren	95.29	63.78	99.22	61.86
	Vegetation	100.00	100.00	97.34	98.39
	Water	78.72	100.00	53.19	100.00
All_Features	Urban	87.26	92.00	82.38	90.75
	Barren	96.86	84.01	95.29	75.23
	Vegetation	100.00	95.92	97.87	96.34
	Water	70.21	100.00	60.64	100.00

References

Mashala, M.J.; Dube, T.; Mudereri, B.T.; Ayisi, K.K.; Ramudzuli, M.R. A Systematic Review on Advancements in Remote Sensing for Assessing and Monitoring Land Use and Land Cover Changes Impacts on Surface Water Resources in Semi-Arid Tropical Environments. Remote Sens. 2023, 15, 3926.
Aljaddani, A.H. Evaluation of the Land Use Land Cover Impact on Surface Temperature and Urban Thermal Comfort: Insight from Saudi Arabia’s Five Most Populated Cities (2000–2024). Urban Sci. 2026, 10, 157.
Sibandze, P.; Kalumba, A.M.; Aljaddani, A.H.; Zhou, L.; Afuye, G.A. Geospatial Mapping and Meteorological Flood Risk Assessment: A Global Research Trend Analysis. Environ. Manag. 2025, 75, 137–154.
Aljaddani, A.H.; Song, X.-P.; Zhu, Z. Characterizing the Patterns and Trends of Urban Growth in Saudi Arabia’s 13 Capital Cities Using a Landsat Time Series. Remote Sens. 2022, 14, 2382.
Gadal, S.; Mozgeris, G. Advances of Remote Sensing in Land Cover and Land Use Mapping. Remote Sens. 2025, 17, 1980.
Shiraishi, T.; Motohka, T.; Thapa, R.B.; Watanabe, M.; Shimada, M. Comparative Assessment of Supervised Classifiers for Land Use–Land Cover Classification in a Tropical Region Using Time-Series PALSAR Mosaic Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 1186–1199.
Ge, G.; Shi, Z.; Zhu, Y.; Yang, X.; Hao, Y. Land Use/Cover Classification in an Arid Desert-Oasis Mosaic Landscape of China Using Remote Sensed Imagery: Performance Assessment of Four Machine Learning Algorithms. Glob. Ecol. Conserv. 2020, 22, e00971.
Ouma, Y.; Nkwae, B.; Moalafhi, D.; Odirile, P.; Parida, B.; Anderson, G.; Qi, J. Comparison of Machine Learning Classifiers for Multitemporal and Multisensor Mapping of Urban LULC Features. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, 43, 681–689.
Abdi, A.M. Land Cover and Land Use Classification Performance of Machine Learning Algorithms in a Boreal Landscape Using Sentinel-2 Data. GIsci. Remote Sens. 2020, 57, 1–20.
El Mjiri, I.; Rahimi, A.; Bouasria, A.; Bounif, M.; Boulanouar, W. Long-Term LULC Monitoring in El Jadida, Morocco (1985–2020): A Machine Learning-Based Comparative Analysis. ISPRS Int. J. Geoinf. 2025, 14, 445.
Talukdar, S.; Singha, P.; Mahato, S.; Shahfahad; Pal, S.; Liou, Y.-A.; Rahman, A. Land-Use Land-Cover Classification by Machine Learning Classifiers for Satellite Observations—A Review. Remote Sens. 2020, 12, 1135.
Nabinejad, S.; Schüttrumpf, H. Flood Risk Management in Arid and Semi-Arid Areas: A Comprehensive Review of Challenges, Needs, and Opportunities. Water 2023, 15, 3113.
Wang, Z.; Xiong, H.; Zhang, F.; Qiu, Y.; Ma, C. Sustainable Development Assessment of Ecological Vulnerability in Arid Areas under the Influence of Multiple Indicators. J. Clean. Prod. 2024, 436, 140629.
Ali, A.; Bilal, M. A Comprehensive Review of GIS and Remote Sensing Applications in Assessing Land Use and Land Cover Impacts on Groundwater Systems. Environ. Sci. Pollut. Res. 2025, 32, 18631–18652.
Asam, S.; Eisfelder, C.; Hirner, A.; Reiners, P.; Holzwarth, S.; Bachmann, M. AVHRR NDVI Compositing Method Comparison and Generation of Multi-Decadal Time Series—A TIMELINE Thematic Processor. Remote Sens. 2023, 15, 1631.
Pei, J.; Wang, L.; Wang, X.; Niu, Z.; Kelly, M.; Song, X.-P.; Huang, N.; Geng, J.; Tian, H.; Yu, Y.; et al. Time Series of Landsat Imagery Shows Vegetation Recovery in Two Fragile Karst Watersheds in Southwest China from 1988 to 2016. Remote Sens. 2019, 11, 2044.
Kumar, R.; Aneesh, K.S.; Ajay, K.V.; Murali, K.V.; Kundariati, M. Comparative Evaluation of Machine Learning Algorithms for LULC Classification Using Sentinel-1 and Sentinel-2 Imagery. In Application of Machine Learning in Earth Sciences: A Practical Approach; Springer Nature: Cham, Switzerland, 2026; pp. 585–596.
Islam, M.K.; Simic Milas, A.; Abeysinghe, T.; Tian, Q. Integrating UAV-Derived Information and WorldView-3 Imagery for Mapping Wetland Plants in the Old Woman Creek Estuary, USA. Remote Sens. 2023, 15, 1090.
Figliomeni, F.G.; Parente, C. Bathymetry from Satellite Images: A Proposal for Adapting the Band Ratio Approach to IKONOS Data. Appl. Geomat. 2023, 15, 565–581.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. Classification and Regression Trees, 1st ed.; Wadsworth International Group: New York, NY, USA, 1984.
Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
Cover, T.; Hart, P. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
Zafar, Z.; Zubair, M.; Zha, Y.; Fahd, S.; Ahmad Nadeem, A. Performance Assessment of Machine Learning Algorithms for Mapping of Land Use/Land Cover Using Remote Sensing Data. Egypt. J. Remote Sens. Space Sci. 2024, 27, 216–226.
Kumar, A.; Kumar Gorai, A. A Comparative Evaluation of Deep Convolutional Neural Network and Deep Neural Network-Based Land Use/Land Cover Classifications of Mining Regions Using Fused Multi-Sensor Satellite Data. Adv. Space Res. 2023, 72, 4663–4676.
Carranza-García, M.; García-Gutiérrez, J.; Riquelme, J.C. A Framework for Evaluating Land Use and Land Cover Classification Using Convolutional Neural Networks. Remote Sens. 2019, 11, 274.
Sun, Z.; Di, L.; Fang, H. Using Long Short-Term Memory Recurrent Neural Network in Land Cover Classification on Landsat and Cropland Data Layer Time Series. Int. J. Remote Sens. 2019, 40, 593–614.
Das, B.; Prasad, J. Cellular Automata (CA) and AI-Based Recurrent Neural Networks (RNNs) Approaches in Land Use Land Cover (LULC) Change Dynamics Using Multi-Spectral and Multi-Decadal Landsat Data in Haldia, India. Remote Sens. Earth Syst. Sci. 2025, 8, 1223–1243.
Bao, H.; Zerres, V.H.D.; Lehnert, L.W. Deep Siamese Network for Annual Change Detection in Beijing Using Landsat Satellite Data. Int. J. Appl. Earth Obs. Geoinf. 2024, 130, 103897.
Zhu, Q.; Guo, X.; Deng, W.; Shi, S.; Guan, Q.; Zhong, Y.; Zhang, L.; Li, D. Land-Use/Land-Cover Change Detection Based on a Siamese Global Learning Framework for High Spatial Resolution Remote Sensing Imagery. ISPRS J. Photogramm. Remote Sens. 2022, 184, 63–78.
Hasnaoui, Y.; Tachi, S.E.; Bouguerra, H.; Yaseen, Z.M. Transfer Learning-Based Deep Learning Models for Flood and Erosion Detection in Coastal Area of Algeria. Earth Sci. Inform. 2025, 18, 380.
Kommula, S.P.; Singh, R.; Lohani, B.; Ryu, D.; Winter, S. Transfer Learning for Identifying Rainwater Harvesting Sites in Training Data-Scarce Catchments. Sci. Rep. 2026.
Zhang, Y.; Zong, R.; Han, J.; Zheng, H.; Lou, Q.; Zhang, D.; Wang, D. TransLand: An Adversarial Transfer Learning Approach for Migratable Urban Land Usage Classification Using Remote Sensing. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data); IEEE: Los Angeles, CA, USA, 2019; pp. 1567–1576.
Stamou, A.; Stylianidis, E. Urban Monitoring from the Cloud: A Review of Google Earth Engine (GEE)-Based Approaches for Assessing Urban Environmental Indices. Geographies 2025, 5, 68.
Liang, J.; Xie, Y.; Sha, Z.; Zhou, A. Modeling Urban Growth Sustainability in the Cloud by Augmenting Google Earth Engine (GEE). Comput. Environ. Urban Syst. 2020, 84, 101542.
Aljaddani, A.H. Geospatial Analysis of Patterns and Trends of Mangrove Forest in Saudi Arabia: Identifying At-Risk Zone-Based Land Use. Sustainability 2025, 17, 5957.
Hird, J.N.; Kariyeva, J.; McDermid, G.J. Satellite Time Series and Google Earth Engine Democratize the Process of Forest-Recovery Monitoring over Large Areas. Remote Sens. 2021, 13, 4745.
Mohanty, S.; Wasim, M.; Pandey, P.C. Google Earth Engine Applications for Monitoring Forest Ecosystem Services. In Advanced Geospatial and Ground Based Techniques in Forest Monitoring; Elsevier: Amsterdam, The Netherlands, 2026; pp. 191–211.
Bar, S.; Parida, B.R.; Pandey, A.C. Landsat-8 and Sentinel-2 Based Forest Fire Burn Area Mapping Using Machine Learning Algorithms on GEE Cloud Platform over Uttarakhand, Western Himalaya. Remote Sens. Appl. 2020, 18, 100324.
Yilmaz, O.S.; Acar, U.; Sanli, F.B.; Gulgen, F.; Ates, A.M. Mapping Burn Severity and Monitoring CO Content in Türkiye’s 2021 Wildfires, Using Sentinel-2 and Sentinel-5P Satellite Data on the GEE Platform. Earth Sci. Inform. 2023, 16, 221–240.
Roteta, E.; Bastarrika, A.; Franquesa, M.; Chuvieco, E. Landsat and Sentinel-2 Based Burned Area Mapping Tools in Google Earth Engine. Remote Sens. 2021, 13, 816.
Kumar, L.; Mutanga, O. Google Earth Engine Applications Since Inception: Usage, Trends, and Potential. Remote Sens. 2018, 10, 1509.
Liu, Z.; Liu, H.; Luo, C.; Yang, H.; Meng, X.; Ju, Y.; Guo, D. Rapid Extraction of Regional-Scale Agricultural Disasters by the Standardized Monitoring Model Based on Google Earth Engine. Sustainability 2020, 12, 6497.
Ghosh, S.; Kumar, D.; Kumari, R. Cloud-Based Large-Scale Data Retrieval, Mapping, and Analysis for Land Monitoring Applications with Google Earth Engine (GEE). Environ. Chall. 2022, 9, 100605.
Feng, L.; Hussain, S.; Pricope, N.G.; Arshad, S.; Tariq, A.; Feng, L.; Mubeen, M.; Aslam, R.W.; Fnais, M.S.; Li, W.; et al. Seasonal Dynamics in Land Surface Temperature in Response to Land Use Land Cover Changes Using Google Earth Engine. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 17983–17997.
Jamei, Y.; Seyedmahmoudian, M.; Jamei, E.; Horan, B.; Mekhilef, S.; Stojcevski, A. Investigating the Relationship between Land Use/Land Cover Change and Land Surface Temperature Using Google Earth Engine; Case Study: Melbourne, Australia. Sustainability 2022, 14, 14868.
Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Quackenbush, L.; Adeli, S.; Brisco, B. Google Earth Engine for Geo-Big Data Applications: A Meta-Analysis and Systematic Review. ISPRS J. Photogramm. Remote Sens. 2020, 164, 152–170.
Ganem, K.A.; Xue, Y.; de Almeida, A.; Franca-Rocha, W.; de Oliveira, M.T.; de Carvalho, N.S.; Cayo, E.Y.T.; Rosa, M.R.; Dutra, A.C.; Shimabukuro, Y.E. Mapping South America’s Drylands through Remote Sensing—A Review of the Methodological Trends and Current Challenges. Remote Sens. 2022, 14, 736.
Aryal, J.; Sitaula, C.; Frery, A.C. Land Use and Land Cover (LULC) Performance Modeling Using Machine Learning Algorithms: A Case Study of the City of Melbourne, Australia. Sci. Rep. 2023, 13, 13510.
Basheer, S.; Wang, X.; Farooque, A.A.; Nawaz, R.A.; Liu, K.; Adekanmbi, T.; Liu, S. Comparison of Land Use Land Cover Classifiers Using Different Satellite Imagery and Machine Learning Techniques. Remote Sens. 2022, 14, 4978.
Camargo, F.F.; Sano, E.E.; Almeida, C.M.; Mura, J.C.; Almeida, T. A Comparative Assessment of Machine-Learning Techniques for Land Use and Land Cover Classification of the Brazilian Tropical Savanna Using ALOS-2/PALSAR-2 Polarimetric Images. Remote Sens. 2019, 11, 1600.
Yimer, S.M.; Bouanani, A.; Kumar, N.; Tischbein, B.; Borgemeister, C. Comparison of Different Machine-Learning Algorithms for Land Use Land Cover Mapping in a Heterogenous Landscape over the Eastern Nile River Basin, Ethiopia. Adv. Space Res. 2024, 74, 2180–2199.
Subedi, M.R.; Portillo-Quintero, C.; McIntyre, N.E.; Kahl, S.S.; Cox, R.D.; Perry, G.; Song, X. Ensemble Machine Learning on the Fusion of Sentinel Time Series Imagery with High-Resolution Orthoimagery for Improved Land Use/Land Cover Mapping. Remote Sens. 2024, 16, 2778.
Arfa, A.; Minaei, M. Utilizing Multitemporal Indices and Spectral Bands of Sentinel-2 to Enhance Land Use and Land Cover Classification with Random Forest and Support Vector Machine. Adv. Space Res. 2024, 74, 5580–5590.
Huang, D.; Zhou, Z.; Zhang, Z.; Dai, Q.; Lu, H.; Li, Y.; Huang, Y. Land Use/Land Cover Remote Sensing Classification in Complex Subtropical Karst Environments: Challenges, Methodological Review, and Research Frontiers. Appl. Sci. 2025, 15, 9641.
Putty, A.; Annappa, B.; Pariserum Perumal, S. Semantic Segmentation of Remotely Sensed Images for Land-Use and Land-Cover Classification: A Comprehensive Review. IETE Tech. Rev. 2025, 42, 222–237.
Shibuya, D.H.; Esquerdo, J.C.D.M.; Werner, J.P.S.; Tavares, A.S.; Felix, F.C. Cross-Temporal Domain Generalization Approach Combined with Self-Organizing-Maps for Classifying MODIS Time Series Data. Int. J. Remote Sens. 2025, 46, 5802–5831. [Google Scholar] [CrossRef]
Ali, K.; Johnson, B.A. Land-Use and Land-Cover Classification in Semi-Arid Areas from Medium-Resolution Remote-Sensing Imagery: A Deep Learning Approach. Sensors 2022, 22, 8750. [Google Scholar] [CrossRef] [PubMed]
Cai, Y.; Li, X.; Zhang, M.; Lin, H. Mapping Wetland Using the Object-Based Stacked Generalization Method Based on Multi-Temporal Optical and SAR Data. Int. J. Appl. Earth Obs. Geoinf. 2020, 92, 102164. [Google Scholar] [CrossRef]
Shafizadeh-Moghadam, H.; Valavi, R.; Asghari, A.; Minaei, M.; Murayama, Y. On the Spatiotemporal Generalization of Machine Learning and Ensemble Models for Simulating Built-up Land Expansion. Trans. GIS 2022, 26, 1080–1097. [Google Scholar] [CrossRef]
Ramezan, C.A.; Warner, T.A.; Maxwell, A.E. Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification. Remote Sens. 2019, 11, 185. [Google Scholar] [CrossRef]
Airola, A.; Pohjankukka, J.; Torppa, J.; Middleton, M.; Nykänen, V.; Heikkonen, J.; Pahikkala, T. The Spatial Leave-Pair-out Cross-Validation Method for Reliable AUC Estimation of Spatial Classifiers. Data Min. Knowl. Discov. 2019, 33, 730–747. [Google Scholar] [CrossRef]
Haddad, K.; Rahman, A.; A Zaman, M.; Shrestha, S. Applicability of Monte Carlo Cross Validation Technique for Model Development and Validation Using Generalised Least Squares Regression. J. Hydrol. 2013, 482, 119–128. [Google Scholar] [CrossRef]
Hazaea, B.Y.; Alamri, A.M.; Fnais, M.S.; Abdelrahman, K. Engineering Site Characterization of Al-Madinah Al-Munawarah, Saudi Arabia, for Sustainable Urban Development. Sustainability 2024, 17, 9. [Google Scholar] [CrossRef]
Alhothali, A.; Alwated, B.; Faisal, K.; Alshammari, S.; Alotaibi, R.; Alghanmi, N.; Bamasag, O.; Bin Yamin, M. Location-Allocation Model to Improve the Distribution of COVID-19 Vaccine Centers in Jeddah City, Saudi Arabia. Int. J. Environ. Res. Public Health 2022, 19, 8755. [Google Scholar] [CrossRef] [PubMed]
Dano, U.L.; Abubakar, I.R.; AlShihri, F.S.; Ahmed, S.M.S.; Alrawaf, T.I.; Alshammari, M.S. A Multi-Criteria Assessment of Climate Change Impacts on Urban Sustainability in Dammam Metropolitan Area, Saudi Arabia. Ain Shams Eng. J. 2023, 14, 102062. [Google Scholar] [CrossRef]
European Union/ESA/Copernicus Harmonized Sentinel-2 MSI: MultiSpectral Instrument, Level-2A (SR). Available online: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED (accessed on 7 May 2026).
Louis, J.; Debaecker, V.; Pflug, B.; Main-Knorn, M.; Bieniarz, J.; Mueller-Wilm, U.; Cadau, E.; Gascon, F. SENTINEL-2 SEN2COR: L2A Processor for Users. In Proceedings of the Living Planet Symposium 2016 (SP-740); European Space Agency (ESA), Ed.; European Space Agency (ESA): Prague, Czech Republic, 2016; pp. 1–8.
GEE Resampling and Reducing Resolution. Available online: https://developers.google.com/earth-engine/guides/resample#pixel_weights_for_reduceresolution (accessed on 14 January 2026).
Xu, H.; Su, G.; Li, C.; Deng, W. The Differences, Advantages, and Disadvantages of Various Image Compositing Methods on the Google Earth Engine Platform: An Exploration. Geo-Spat. Inf. Sci. 2025, 1–28.
Farr, T.G.; Rosen, P.A.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S.; Kobrick, M.; Paller, M.; Rodriguez, E.; Roth, L.; et al. The Shuttle Radar Topography Mission. Rev. Geophys. 2007, 45.
EARTHDATA Sentinel-2 MSI. Available online: https://www.earthdata.nasa.gov/data/instruments/sentinel-2-msi (accessed on 14 January 2026).
Xu, H. Modification of Normalised Difference Water Index (NDWI) to Enhance Open Water Features in Remotely Sensed Imagery. Int. J. Remote Sens. 2006, 27, 3025–3033.
Huete, A.R. A Soil-Adjusted Vegetation Index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
Aljaddani, A.H. Integration of Multi-Temporal Remote Sensing Imagery and GIS for Mapping and Analysis of Land Use Change in Jeddah City, Saudi Arabia. Master’s Thesis, Murray State University, Murray, KY, USA, 2015.
Lan, Z.; Liu, Y. Study on Multi-Scale Window Determination for GLCM Texture Description in High-Resolution Remote Sensing Image Geo-Analysis Supported by GIS and Domain Knowledge. ISPRS Int. J. Geoinf. 2018, 7, 175.
Lodato, F.; Colonna, N.; Pennazza, G.; Praticò, S.; Santonico, M.; Vollero, L.; Pollino, M. Analysis of the Spatiotemporal Urban Expansion of the Rome Coastline through GEE and RF Algorithm, Using Landsat Imagery. ISPRS Int. J. Geoinf. 2023, 12, 141.
Awad, M. Google Earth Engine (GEE) Cloud Computing Based Crop Classification Using Radar, Optical Images and Support Vector Machine Algorithm (SVM). In Proceedings of the 2021 IEEE 3rd International Multidisciplinary Conference on Engineering Technology (IMCET), Beirut, Lebanon, 8–10 December 2021.
Abdelsamie, E.A.; Mustafa, A.A.; El-Sorogy, A.S.; Maswada, H.F.; Almadani, S.A.; Shokr, M.S.; El-Desoky, A.I.; Meroño de Larriva, J.E. Current and Potential Land Use/Land Cover (LULC) Scenarios in Dry Lands Using a CA-Markov Simulation Model and the Classification and Regression Tree (CART) Method: A Cloud-Based Google Earth Engine (GEE) Approach. Sustainability 2024, 16, 11130.
Hejmanowska, B.; Michałowska, K.; Kramarczyk, P.; Głowienka, E. The Potential of U-Net in Detecting Mining Activity: Accuracy Assessment Against GEE Classifiers. Appl. Sci. 2025, 15, 9785.
Kendall, M.G.; Smith, B.B. The Problem of m Rankings. Ann. Math. Stat. 1939, 10, 275–287.
Wilcoxon, F. Individual Comparisons by Ranking Methods. In Breakthroughs in Statistics: Methodology and Distribution; Springer: New York, NY, USA, 1992; pp. 196–202.
Friedman, M. The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance. J. Am. Stat. Assoc. 1937, 32, 675–701.
Krivoguz, D.; Chernyi, S.G.; Zinchenko, E.; Silkin, A.; Zinchenko, A. Using Landsat-5 for Accurate Historical LULC Classification: A Comparison of Machine Learning Models. Data 2023, 8, 138.
UN Environment Program Goal 11: Sustainable Cities and Communities. Available online: https://www.unep.org/topics/sustainable-development-goals/why-do-sustainable-development-goals-matter/goal-11-0#:~:text=Goal%2011%20and%20the%20environment,universal%20access%20to%20basic%20services (accessed on 1 April 2026).
ICLEI Cities and Sustainable Development Goals. Available online: https://www.local2030.org/library/232/ICLEI-Briefing-Sheets-02-Cities-and-the-Sustainable-Development-Goals.pdf#:~:text=Sustainable%20Development%20Goal%2011%2C%20also%20known%20as,that%20is%20location%2Dspecific%20at%20a%20manageable%20scale (accessed on 1 April 2026).

Figure 1. (A) Geographical locations of the four selected cities. Sentinel-2 false color composites (RGB: 8-4-3) for (B) Riyadh, (C) Dammam, (D) Madinah, and (E) Jeddah.

Figure 2. Process flow of the exclude-one-city-out (EOCO) approach.
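
The splitting logic behind the EOCO approach can be illustrated with a minimal Python sketch. It assumes only what the abstract states: models are trained on three cities and tested on the remaining, unseen city. The function name `eoco_splits` is hypothetical and is not taken from the author's implementation.

```python
from typing import Iterator, List, Tuple

# The four study cities named in the article.
CITIES = ["Riyadh", "Madinah", "Jeddah", "Dammam"]


def eoco_splits(cities: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_cities, held_out_city) pairs, one pair per city.

    Hypothetical helper: each city is held out once while the other
    cities form the training pool.
    """
    for held_out in cities:
        training = [city for city in cities if city != held_out]
        yield training, held_out


splits = list(eoco_splits(CITIES))
for training, held_out in splits:
    print(f"train on {', '.join(training)}; test on {held_out}")
```

Each of the four folds keeps the test city entirely out of training, so the reported accuracy for a city reflects transfer to an unseen location rather than within-city interpolation.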

Figure 3. Distribution of overall accuracy across city, model, feature, and season.

Figure 4. (A) Geographical location of northeast Riyadh city with Sentinel-2 false color composites (RGB: 8-4-3). Spatial performance of RF across the five feature sets for Riyadh (summer season). Included feature sets: (B) Spectral, (C) Spectral_Indices, (D) Spectral_Texture, (E) Spectral_Topography, and (F) All_Features.

Figure 5. Trade-off between seasonal variability and mean overall accuracy for Riyadh.

Figure 6. Feature importance determined by RF across four cities and seasons for two feature sets: Spectral_Texture and All_Features.

Figure 7. Stability (%) of land use and land cover categories for Riyadh using RF. Darker colors represent summer, while lighter colors represent winter.

Table 1. Data characteristics: Earth observation satellite (EOS), multispectral bands (MSBs), spatial resolution (SR), central wavelength (CW), and bandwidth (BW).

EOS	MSBs	SR (m)	CW (nm)	BW (nm)	Citation
Sentinel-2	B2-Blue	10/PX	492.7	65	[72]
	B3-Green	10/PX	559.8	35
	B4-Red	10/PX	664.6	30
	B8-near-infrared (NIR)	10/PX	832.8	105
	B11-Shortwave infrared (SWIR-1)	10/PX (resampled)	1613.7	90
	B12-Shortwave infrared (SWIR-2)	10/PX (resampled)	2202.4	174
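
The bands in Table 1 feed the four spectral indices that appear as features in Table 6 (NDVI, MNDWI, NDBI, and SAVI). The following Python sketch uses the commonly cited formulations of these indices; the exact band combinations and the SAVI soil factor used in this study are assumptions here (L = 0.5 is the usual default), and the reflectance values are hypothetical.

```python
from typing import Dict


def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denominator = a + b
    return (a - b) / denominator if denominator != 0 else 0.0


def spectral_indices(green: float, red: float, nir: float, swir1: float,
                     soil_factor: float = 0.5) -> Dict[str, float]:
    """Compute four indices from surface reflectance (0 to 1 scale).

    Assumed band mapping for Sentinel-2: green=B3, red=B4, nir=B8, swir1=B11.
    """
    return {
        "NDVI": normalized_difference(nir, red),
        "MNDWI": normalized_difference(green, swir1),
        "NDBI": normalized_difference(swir1, nir),
        # SAVI with soil-brightness correction factor L = soil_factor.
        "SAVI": (1 + soil_factor) * (nir - red) / (nir + red + soil_factor),
    }


# Hypothetical reflectance values for a vegetated pixel.
pixel = spectral_indices(green=0.08, red=0.05, nir=0.40, swir1=0.20)
for name, value in pixel.items():
    print(f"{name}: {value:.4f}")
```

For this hypothetical pixel, NDVI and SAVI are positive while MNDWI and NDBI are negative, which is the pattern expected for vegetation.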

Table 2. Training samples for different cities (2025).

City	Urban Areas	Barren Lands	Vegetation	Water	Columns/Rows	MGRS Tile
Riyadh	330	255	188	94	6594, 7967	38RPN
Madinah	487	321	338	14	3543, 3645	37REH, 37QEG
Jeddah	792	507	322	130	6997, 14,708	37QEE, 37QDE, 37QED
Dammam	293	360	161	104	3866, 3769	39RVK, 39RUK
Total samples	1902	1443	1009	342

Table 3. Evaluation metrics for five feature sets (Riyadh, summer). Bold overall accuracy values indicate the highest results.

Feature Sets	Model	Overall	Kappa	Macro F1	Precision	Recall
Spectral	RF	0.842	0.777	0.862	0.895	0.857
	SVM	0.801	0.721	0.827	0.880	0.823
	CART	0.783	0.694	0.834	0.835	0.839
	KNN	0.849	0.786	0.849	0.898	0.838
Spectral_Indices	RF	0.848	0.787	0.868	0.899	0.868
	SVM	0.811	0.735	0.834	0.884	0.832
	CART	0.864	0.809	0.888	0.907	0.891
	KNN	0.849	0.786	0.849	0.898	0.838
Spectral_Texture	RF	0.912	0.874	0.905	0.936	0.888
	SVM	0.906	0.866	0.906	0.931	0.895
	CART	0.879	0.828	0.900	0.910	0.894
	KNN	0.863	0.805	0.862	0.906	0.850
Spectral_Topography	RF	0.812	0.736	0.844	0.879	0.841
	SVM	0.791	0.707	0.817	0.877	0.812
	CART	0.740	0.632	0.800	0.802	0.801
	KNN	0.839	0.773	0.845	0.894	0.836
All_Features	RF	0.908	0.868	0.899	0.929	0.885
	SVM	0.878	0.828	0.886	0.917	0.879
	CART	0.897	0.854	0.908	0.922	0.903
	KNN	0.825	0.753	0.835	0.888	0.827
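
For readers who wish to reproduce metrics of the kind reported in Table 3, the sketch below derives overall accuracy, Cohen's kappa, and macro-averaged precision, recall, and F1 from a confusion matrix. The matrix is hypothetical, and treating the table's Precision and Recall columns as macro averages is an assumption; only the standard definitions are used.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Metrics from a square confusion matrix.

    cm[i][j] is the number of samples of reference class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    row_totals = [sum(cm[i]) for i in range(k)]                       # reference
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted

    overall = correct / total
    # Agreement expected by chance, used by Cohen's kappa.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    kappa = (overall - expected) / (1 - expected)

    precisions, recalls, f1_scores = [], [], []
    for i in range(k):
        precision = cm[i][i] / col_totals[i] if col_totals[i] else 0.0
        recall = cm[i][i] / row_totals[i] if row_totals[i] else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)

    return {
        "overall": overall,
        "kappa": kappa,
        "macro_f1": sum(f1_scores) / k,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
    }


# Hypothetical 4-class matrix (urban, barren, vegetation, water), 50 samples each.
example_cm = [
    [45, 5, 0, 0],
    [10, 40, 0, 0],
    [0, 0, 50, 0],
    [0, 0, 0, 50],
]
metrics = classification_metrics(example_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this example the only confusion is between the first two classes, which mirrors the urban and barren confusion that Table 8 reports as a leading error source.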

Table 4. Model–feature pair performance in summer and winter for different ML classifiers and cities.

N	Model	City	Feature	Summer (OA)	Winter (OA)
1	RF	Riyadh	Spectral_Texture	0.9128	0.8731
2	RF	Madinah	Spectral_Texture	0.8543	0.8517
3	RF	Jeddah	Spectral_Texture	0.9143	0.9240
4	RF	Dammam	Spectral_Indices	0.9575	0.9292
5	SVM	Riyadh	Spectral_Texture	0.9062	0.9084
6	SVM	Madinah	All_Features	0.7974	0.7991
7	SVM	Jeddah	Spectral_Texture	0.9446	0.9172
8	SVM	Dammam	Spectral_Texture	0.9706	0.8181
9	CART	Riyadh	All_Features	0.8974	0.8433
10	CART	Madinah	Spectral_Texture	0.8741	0.8759
11	CART	Jeddah	All_Features	0.8784	0.8966
12	CART	Dammam	All_Features	0.9477	0.9118
13	KNN	Riyadh	Spectral_Texture	0.8631	0.8455
14	KNN	Madinah	Spectral_Texture	0.8534	0.8328
15	KNN	Jeddah	All_Features	0.8658	0.8921
16	KNN	Dammam	All_Features	0.9651	0.9423

Table 5. Seasonal performance variability.

N	Model	City	Feature	Summer (OA)	Winter (OA)	Mean (OA)	Instability
1	RF	Riyadh	Spectral	0.8422	0.8333	0.83775	0.0089
2	RF	Riyadh	Spectral_Indices	0.8488	0.8477	0.84825	0.0011
3	RF	Riyadh	Spectral_Texture	0.9128	0.8731	0.89295	0.0397
4	RF	Riyadh	Spectral_Topography	0.8124	0.8091	0.81075	0.0033
5	RF	Riyadh	All_Features	0.9084	0.8698	0.88910	0.0386
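
The Mean (OA) and Instability columns of Table 5 are consistent with the arithmetic mean of the two seasonal accuracies and their absolute difference, respectively. The short Python check below reproduces both columns from the Summer and Winter values; the instability definition is inferred from the tabulated numbers and is therefore an assumption, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# (feature set, summer OA, winter OA) for RF in Riyadh, copied from Table 5.
TABLE_5 = [
    ("Spectral", 0.8422, 0.8333),
    ("Spectral_Indices", 0.8488, 0.8477),
    ("Spectral_Texture", 0.9128, 0.8731),
    ("Spectral_Topography", 0.8124, 0.8091),
    ("All_Features", 0.9084, 0.8698),
]


def seasonal_summary(summer: float, winter: float) -> Tuple[float, float]:
    """Return (mean OA, instability), with instability assumed to be |summer - winter|."""
    mean_oa = (summer + winter) / 2
    instability = abs(summer - winter)
    return round(mean_oa, 5), round(instability, 4)


summary: Dict[str, Tuple[float, float]] = {
    name: seasonal_summary(summer, winter) for name, summer, winter in TABLE_5
}
for name, (mean_oa, instability) in summary.items():
    print(f"{name}: mean OA = {mean_oa:.5f}, instability = {instability:.4f}")
```

Read this way, Spectral_Texture has the highest mean accuracy but also the largest seasonal gap, whereas Spectral_Indices is the most stable across seasons.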

Table 6. Feature importance ranking for RF for Riyadh (higher values indicate higher importance).

N	Spectral_Texture Set (3)			All_Features Set (5)
	Feature	Summer	Winter	Feature	Summer	Winter
1	B2	0.890	0.877	B2	0.681	0.659
2	B3	0.731	0.674	B3	0.680	0.706
3	B4	0.706	0.674	B4	0.739	0.731
4	B8	1.000	1.000	B8	0.633	0.640
5	B11	0.795	0.745	B11	0.655	0.722
6	B12	0.689	0.676	B12	0.842	0.846
7	Contrast	0.572	0.625	MNDWI	0.670	0.690
8	Entropy	0.478	0.514	NDBI	0.620	0.734
9	Homogeneity	0.542	0.558	NDVI	1.000	1.000
10	Variance	0.489	0.552	SAVI	0.908	0.933
11				Contrast	0.735	0.836
12				Entropy	0.929	0.990
13				Homogeneity	0.441	0.609
14				Variance	0.582	0.589
15				Elevation	0.439	0.519
16				Slope	0.500	0.635

Table 7. Statistical results of Friedman and Kendall’s W for feature sets (3) and (5).

		Friedman’s Statistical Test $\chi_{F}^{2}$		Kendall’s W
Feature Sets	Season	Chi-Square	p-Value	W-Value
Spectral_Texture	Summer	33.16	1.25 × 10⁻⁴	0.9212
Spectral_Texture	Winter	33.92	9.2 × 10⁻⁵	0.9424
All_Features	Summer	51.28	7.0 × 10⁻⁶	0.8548
All_Features	Winter	55.47	1.0 × 10⁻⁶	0.9246
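
The two statistics in Table 7 are linked by the standard relation W = χ²_F / (m(n − 1)), where n is the number of ranked items and m is the number of rankings. The Python sketch below checks the reported values against this relation, assuming m = 4 rankings (one per city) and n equal to the number of features in each set (10 and 16, as listed in Table 6). The choice of m is inferred from the numbers rather than stated in this section.

```python
def kendalls_w_from_friedman(chi_square: float, n_rankings: int, n_items: int) -> float:
    """Kendall's W implied by a Friedman chi-square: W = chi2 / (m * (n - 1))."""
    return chi_square / (n_rankings * (n_items - 1))


N_RANKINGS = 4  # assumption: one ranking of the features per city

# (feature set, season, Friedman chi-square, number of features, reported W)
TABLE_7 = [
    ("Spectral_Texture", "Summer", 33.16, 10, 0.9212),
    ("Spectral_Texture", "Winter", 33.92, 10, 0.9424),
    ("All_Features", "Summer", 51.28, 16, 0.8548),
    ("All_Features", "Winter", 55.47, 16, 0.9246),
]

derived = []
for feature_set, season, chi_square, n_items, reported_w in TABLE_7:
    w = kendalls_w_from_friedman(chi_square, N_RANKINGS, n_items)
    derived.append(w)
    print(f"{feature_set} ({season}): derived W = {w:.4f}, reported W = {reported_w:.4f}")
```

Under these assumptions the derived values agree with the reported W to within rounding of the chi-square statistic.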

Table 8. Misclassified class pairs across feature sets and seasons for Riyadh using RF.

Feature Set	Rank	Summer			Winter
		Predictor	Reference	Misclassification	Predictor	Reference	Misclassification
Spectral	1	Urban	Barren	30.89%	Water	Barren	30.85%
	2	Water	Urban	22.34%	Urban	Barren	27.64%
	3	Barren	Urban	2.75%	Water	Urban	9.57%
Spectral_Indices	1	Urban	Barren	31.17%	Urban	Barren	26.29%
	2	Water	Urban	9.57%	Water	Barren	21.28%
	3	Water	Vegetation	8.51%	Water	Urban	9.57%
Spectral_Texture	1	Water	Urban	26.60%	Water	Barren	19.15%
	2	Urban	Barren	11.38%	Water	Urban	18.09%
	3	Barren	Urban	3.53%	Urban	Barren	13.01%
Spectral_Topography	1	Urban	Barren	37.40%	Urban	Barren	33.06%
	2	Water	Urban	21.28%	Water	Barren	30.85%
	3	Barren	Urban	4.71%	Water	Urban	12.77%
All_Features	1	Water	Urban	21.28%	Water	Urban	20.21%
	2	Urban	Barren	12.74%	Urban	Barren	17.62%
	3	Water	Vegetation	8.51%	Water	Barren	11.70%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Aljaddani, A.H. Generalization of LULC Classification in Arid Environments Using Machine Learning and Spectral, Texture, and Topographic Features: Spatial and Seasonal Analyses with Implications for Urban Environmental Monitoring. Land 2026, 15, 1095. https://doi.org/10.3390/land15061095

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
