An Application of Machine-Learning Model for Analyzing the Impact of Land-Use Change on Surface Water Resources in Gauteng Province, South Africa

: The change in land-use diversity is attributed to the anthropogenic factors sustaining life. The surface water bodies and other crucial natural resources in the study area are being depleted at an alarming rate. This study explored the implications of the changing land-use diversity on surface water resources by using a random forest (RF) classiﬁer machine-learning algorithm and remote-sensing models in Gauteng Province, South Africa. Landsat datasets from 1993 to 2022 were used and processed in the Google Earth Engine (GEE) platform, using the RF classiﬁer. The results indicate nine land-use diversity classes having increased and decreased tendencies, with high F-score values ranging from 72.3% to 100%. In GP, the spatial coverage of BL has shrunk by 100.4 km 2 every year over the past three decades. Similarly, BuA exhibits an annual decreasing rate of 42.4 km 2 due to the effect of dense vegetation coverage within the same land use type. Meanwhile, water bodies, marine quarries, arable lands, grasslands, shrublands, dense forests, and wetlands were expanded annually by 1.3, 2.3, 2.9, 5.6, 11.2, 29.6, and 89.5 km 2 , respectively. The surface water content level of the study area has been poor throughout the study years. The MNDWI and NDWI values have a stronger Pearson correlation at a radius of 5 km (r = 0.60, p = 0.000, n = 87,260) than at 10 and 15 km. This research is essential to improve current land-use planning and surface water management techniques to reduce the environmental impacts of land-use change.


Introduction
Since the 1960s, the biophysical features of the earth's surface have changed by an average value of 720,000 kilometers square (km 2 ) globally each year [1]. As a result, potential natural resources (e.g., surface water bodies) have been significantly diminishing occasionally. Consequently, knowledge of the real-time land-surface processes, diversity, and change requires regular quantification, monitoring, modeling, analysis, and mapping of the spatial and temporal dynamics of the land-use and land-cover change (LULCC). LULCC describes the transition or alteration of one land-use and land-cover (e.g., water bodies) type into another (e.g., wetlands). Contrarily, land-use diversity is defined as the range of land-use types within a given area [2]. Studies indicate that humans are the main triggering factors of LULCC and have altered about three-quarters of the earth's land surface in the past millennium. For instance, Winkler et al. [1] reported the loss of global forest area by 0.8 million km 2 and increases in global agriculture (i.e., cropland and pasture/rangeland) by 1.0 and 0.9 million km 2 , respectively. The same authors have pointed out that the loss is highly confined in developing countries because humans have produced excessive amounts of charcoal and extracted firewood, resulting in high deforestation rates and temperature change. Therefore, understanding the status of natural resources (e.g., water bodies have been shrinking from time to time. This might impact harmonizing the local climate condition [8] and the livelihood system of the inhabitants. An analysis of LULCC using machine learning is crucial for water resources evaluation, monitoring, and mapping because its impact on the natural environment differs from place to place [19,21]. Spectral differences between different land-use types are used in the classification process. However, the classification and analysis of historical LULCC remained challenging due to the lack of ground-truth data [5] and improved image classification techniques. So far, many studies have focused on LULCC mapping using moderate spatial resolution satellite data (e.g., Landsat 4,5,8, and 9; Sentinel; IKONOS; and SPOT). Conversely, a better option for users from developing countries would be to use Landsat images because of their moderate-high spatial resolution, free accessibility, and high temporal resolution. Most existing LULCC studies were analyzed using the supervised maximum likelihood classification techniques in ENVI, ERDAS, ArcGIS, QGIS, and TerrSet, among other geospatial software.
However, the classification and analysis of land-use diversity using a machine-learning approach provides comprehensive information on the various land-use types of the natural environment. For instance, Zurqani et al. [21] applied the RF classifier, one of the most robust machine-learning models, for LULCC analysis in the Savannah River Basin, using Google Earth Engine (GEE). The same authors reported that the major causes of LULCC occurring in the Savannah River Basin were linked to deforestation and reforestation of forest areas during the entire study period. Additionally, four primary limitations need great attention in the existing global land-use maps, e.g., few land classes, absence of temporal updates, coarser resolution, and lower accuracies (~77%) [13]. This topic has received little research due to the challenges in producing high-resolution multitemporal products on a large scale [6]. However, small-to-medium-sized catchments can be thoroughly examined because they have distinct and understandable land-use classes. For comprehending and evaluating LULCC, accurate and up-to-date LULCC information is needed for environmental resources evaluation, monitoring, and modeling. Additionally, it is helpful for various stakeholders to assess future pathways of sustainable land use for food production and nature conservation [22]. This challenge can be resolved using machine-learning models such as RF classifier supported by Google Earth Engine (GEE), ESRI ArcGIS v.10.8.1 software, and the Modified Normalized Difference Water Index (MNDWI). Spectral-index-based surface-water-bodies mapping and analysis methods are robust approaches to discriminate from all non-water bodies' land-use types [23,24]. The spectral water index is a single value that results from arithmetic operations on two or more bands [25]. This helps to carry out a detailed investigation on the surface water content of all water bodies. For instance, the analysis of land-use diversity and its changes strongly impacts regional species richness [2] and water bodies. Therefore, using machine-learning algorithms is crucial for investigating the implications of changing land-use diversity by identifying crucial features from satellite images [26]. Feature selection determines the distinction between out-of-bag (OOB) errors or estimates before and after feature variation. This method helps to focus on essential features that can eliminate the difficulty of interpretation.
Aigbokhan et al. [27] compared the accuracy of four different machine-learning algorithms for land-use diversity mapping, such as RF classifier, Support Vector Machine (SVM), K-Nearest Neighbors (KNNs), and Gaussian Mixture Models (GMMs). The same authors found that RF classified 23% of their study area as bare land, SVM 24%, KNN 24% and GMM 30% for the same land-use type. However, the accuracy of RF was 0.9840, which is higher than that of SVM (0.9780), KNN (0.9641), and GMM (0.9421) machine-learning models. The RF is one of the robust and advanced machine-learning models trained by bootstrapping for land-use diversity mapping using the ground-truth dataset [28]. On GEE, RF java script is enabled to run the model. As a result, the GEE is becoming powerful in identifying and analyzing features at the smallest pixel unit. The traditional per-scene analysis is replaced by per-pixel analysis in GEE because it uses advanced algorithms [13]. Therefore, adequate ground-truth data are needed to ensure the quality of image classification in RF. For instance, Yang and Huang [6] created high-resolution LULC datasets for China based on GEE, and they used a pixel-by-pixel temporal composite to analyze the spectral, phenological, and topographical metrics. Following the covariate predictors, a random subset of the training data is classified using numerous decision trees created by the RF algorithm [29]. RF improves the quality of the image classification process and can quickly manage many features without affecting overall accuracy [5,28]. Moreover, water is an essential natural resource for both livestock and humans. However, it is a serious challenge in the study area due to climate change, less rainfall availability, high population growth, and urbanization, among other factors [30]. Martínez-Núñez et al. [2] analyzed the land-use diversity and Normalized Difference Vegetation Index (NDVI) values for each pixel in GEE. Findell et al. [3] investigated the impact of anthropogenic LULCC on regional climate extremes to characterize the joint temperature-humidity response to land use. Waterlogged areas were successfully extracted using the Modified Normalized Difference Water Index (MNDWI); however, the Normalized Difference Water Index (NDWI) shows the mixing of water with built-up features in the Sri Muktsar Sahib District of Punjab, India [31].
Therefore, there is a need to (i) address research-based essential models for investigating the implication of land-use diversity alteration on surface water resources; and (ii) identify the most accurate surface water detection models. Despite this, the Republic of South Africa's Department of Forestry, Fisheries, and the Environment developed a limited amount of quantitative data on land use at the national level, and reliable machine-learning models such as RF did not support the development of this data. The effects of shifting land-use diversity on surface water resources using remote-sensing models were not examined. Therefore, the novelty of our study is the incorporation of robust approaches for studying the implication of land-use diversity across various district municipalities. The objective of this study was to (i) analyze land-use diversity from the period (1993-2023) by using an RF machine-learning model; (ii) quantify the probability of transfer-out (losses) and transfer-in (gains) rate across each district municipality; (iii) explore the spatiotemporal trends of surface water bodies and determine the implications of changing land-use diversity; and (iv) find the relationship between the Modified Normalized Difference Water Index (MNDWI) and Normalized Difference Water Index (NDWI) for surface water bodies' characterization. This study's findings are crucial to enriching the existing landuse planning and surface water management strategies and reducing the environmental consequence of changing land-use diversity and water supply problems. Additionally, it is imperative to comprehend the spatial distribution of natural resources across each district municipality for planning and conservation. Moreover, it is helpful to realize the performance of GEE-based machine-learning models such as RF for characterizing the implication of changing land-use diversity on surface water bodies.

Study Area
This study was conducted in Gauteng Province, the smallest province in South Africa. It is the economic powerhouse and transportation hub of South Africa [30,32]. It is also one of the mining-dominated provinces of South Africa. The area covers 18,170 km 2 (1.22%) and is located between 27 • 17 15 S to 29 • 17 25 S and 26 • 54 45 E to 25 • 54 40 E (Figure 1). Gauteng Province comprises five districts, namely Tshwane, 6296.2 km 2 (34.7%); West Rand, 4084.5 km 2 (22.5%); Ekurhuleni, 3458.6 km 2 (19%); Sedibeng, 2686.9 km 2 (14.8%); and Johannesburg, 1644.0 km 2 (9%). There are also eight different metropolitan municipalities in the area. The topography of Gauteng Province ranges from 931 to 1916 m above sea level (m.a.s.l.). The average elevation of Gauteng Province is 1481 m.a.s.l. The mean annual rainfall of the study area is up to 363 mm [30]. The maximum and minimum temperature are 26.3 • C and 20.3 • C, respectively. During the summer, the average temperature can reach 23.3 • C. The study area has two primary climatic seasons. These are the dry season (April-October) and winter (May-August). Based on the Soil and Terrain Database of Southern Africa (SOTERSAF version 1.0), nine dominant soil types exist in the study area. Lixisols are the most dominant soil type, covering 6895.3 km 2 (37.9%), while Solonetz covers 68 km 2 (0.4%). The remaining soil types include Acrisols, covering 3656.1 km 2 (20.1%); Leptosols, 2924.2 km 2 (16.1); Plinthosols, 2458.3 km 2 (13.5%); Vertisols, 1150.2 km 2 (6.3%); Luvisols, 448.3 km 2 (2.5%); Nitisols, 441.3 km 2 (2.4%); and Regosols, 100 km 2 (0.6%). The remaining part of the area is covered by a water body, 28.4 km 2 (0.2%). About 15,488,137 people live in Gauteng Province and are receiving water from the Vaal Dam [33,34]. However, the area is highly characterized by crystalline nature rocks, making it challenging to extract adequate groundwater [35]. Additionally, it is impacted by a variety of hydrogeological factors, including drainage density, lithology, slope, and geomorphology [36]. km 2 (14.8%); and Johannesburg, 1644.0 km 2 (9%). There are also eight different metropolitan municipalities in the area. The topography of Gauteng Province ranges from 931 to 1916 m above sea level (m.a.s.l.). The average elevation of Gauteng Province is 1481 m.a.s.l. The mean annual rainfall of the study area is up to 363 mm [30]. The maximum and minimum temperature are 26.3 °C and 20.3 °C, respectively. During the summer, the average temperature can reach 23.3 °C. The study area has two primary climatic seasons. These are the dry season (April-October) and winter (May-August). Based on the Soil and Terrain Database of Southern Africa (SOTERSAF version 1.0), nine dominant soil types exist in the study area. Lixisols are the most dominant soil type, covering 6895.3 km 2 (37.9%), while Solonetz covers 68 km 2 (0.4%). The remaining soil types include Acrisols, covering 3656.1 km 2 (20.1%); Leptosols, 2924.2 km 2 (16.1); Plinthosols, 2458.3 km 2 (13.5%); Vertisols, 1150.2 km 2 (6.3%); Luvisols, 448.3 km 2 (2.5%); Nitisols, 441.3 km 2 (2.4%); and Regosols, 100 km 2 (0.6%). The remaining part of the area is covered by a water body, 28.4 km 2 (0.2%). About 15,488,137 people live in Gauteng Province and are receiving water from the Vaal Dam [33,34]. However, the area is highly characterized by crystalline nature rocks, making it challenging to extract adequate groundwater [35]. Additionally, it is impacted by a variety of hydrogeological factors, including drainage density, lithology, slope, and geomorphology [36].

Image Acquisition
One of the most frequently utilized data sources for investigating the implications of changing land-use diversity on surface water resources is Landsat imagery. The Landsat multispectral and multitemporal imageries collection 2, level 2, acquired during daytime,

Image Acquisition
One of the most frequently utilized data sources for investigating the implications of changing land-use diversity on surface water resources is Landsat imagery. The Landsat multispectral and multitemporal imageries collection 2, level 2, acquired during daytime, used at moderate spatial resolution for 1993, 2003, 2013, and 2022 (Table 1) downloaded from four different scenes (tiles), namely path 170/row 77, 170/78, 170/79, and 171/78. The availability of those medium-spatial-resolution satellite imageries drew the attention of scholars to investigate the diversity and changes in land use in the landscape. Several scholars have also successfully used the datasets to evaluate LULCC and other related activities [18].

Level of Agreement Kappa Coefficient
Training samples were also collected by supporting false-color-composite and visualimage-interpretation techniques from all study areas' land-use types in the GEE platform. GEE has the benefit of simultaneously processing large pools of satellite imagery quickly, and it has powerful computational algorithms [13]. According to Zhang et al. [37], there are two required methods for gathering ground-truth data in a remote-sensing environment: (i) using a false-color composite supported by the visual-image-interpretation approach and (ii) creating sample points automatically from satellite images that are prepared for classification. Therefore, manual interpretation and/or semi-automated classification backed by field surveys are/is required for the identification, monitoring, mapping, and analysis of land-use diversity [26]. Our research used the false-color-composite method supported by the visual-image-interpretation method. After that, we gathered samples from each land-use type separated by their spectral values. Studies also indicated that visual-image-interpretation-based approaches to ground-truth data collection are higher in quality than sample points that are randomly created from satellite images. The consistency of postprocessing land-use datasets depends on these training samples [6].

Image Preprocessing, Interpretation, and Analysis
The primary prerequisites for remote-sensing products; processing and the LULCC research are (i) image preprocessing and enhancement, (ii) appropriate image-classificationtechnique selection, and (iii) collection of reference datasets for accuracy and validation. The geometric, radiometric, and atmospheric errors had to be fixed to enhance the visibility of each pixel and eliminate positional inclination (if any) before any research was performed. As part of this requirement, the issue of cloud cover (shadow) and its effect is not new [38], but it is critical in satellite-image analysis based on their spectral signatures. For instance, cloud cover can significantly hinder the analysis of crop or vegetation growth [39]. Therefore, the effect of cloud, haze, shadow, or other disturbances in the input images must be corrected to optimize classification accuracy [13]. In our study, we used high qualities of Landsat imageries Level 2, which are atmospherically and radiometrically corrected and enhanced images. Furthermore, geometric corrections should be applied to remove distortions caused by the sensor or the earth's rotation. The Universal Transverse Mercator (UTM) Zone 35 South projection and World Geodetic System 1984 (WGS84) data were applied for all inputs that we used. Studies have shown that RF has successfully processed and analyzed remote-sensing products even with more pronounced noise effects. Zurqani et al. [21] reported that LULCC was evaluated across the study area, using preprocessed Landsat imagery made accessible by GEE. A quick analysis utility for LULCC is provided by GEE and was analyzed based on Table 2. Following that, a pixel-by-pixel study was performed using the spectral values of each individual pixel (Equation (1)): where i = the i th class (e.g., class 1, 2, 3. . .. n), x = n-dimensional data (where n is the number of bands), p(w i ) = probability that a class occurs in the image and is assumed the same for all classes, |∑ i | = determinant of the covariance matrix of the data in a class, ∑ i −1 = the inverse of the covariance matrix of a class, and m i = mean vector of a class. Green-SWIR1/Green + SWIR1 Xu (2005Xu ( , 2006 [24,41] The spectral values of each pixel were analyzed using the N × 300 ground-truth data to improve the classification processes and accuracy of the findings. In this case, N refers to the total number of land-use types in the study area. Gbedzi et al. [19] reported that about 50 ground-truth datasets per land-use type were used to classify the satellite imageries and assess the level of accuracy. There are 500 trees in the RF classification for categorizing seven land-use types [29]. Yang and Huang [6] reported that the accuracy level of China land-cover datasets is evaluated by applying confusion matrixes, namely producer's accuracy, consumer's accuracy, overall accuracy, and F-score. F-score is one of the machine-learning measures that help assess the RF model's precision. The proportion of all instances that are classified by the model is known as classification accuracy [42]. The minimum level of the overall accuracy should be at least 85 percent [43]. In this study, we analyzed the confusion matrixes, namely consumer's accuracy (CA), producer's accuracy (PA), overall accuracy (OA), F-score based on the Yang and Huang [6] formula, and Kappa coefficient (Table 1), as follows (Equations (2)-(6)): whereK is the Kappa coefficient, or K-coefficient; x ii is the total number of observations in row i and column i; x i+ and x +i are the marginal totals of row i and column i, respectively; N is the total number of observations; D is the total number of correct pixels (diagonal); V is the total number of pixels in the error matrix; F-score is the harmonic mean of PA and CA; PA is the producer's accuracy; and CA is the consumer's accuracy.
To realize the spatial and temporal patterns of LULCC in the study area, a postclassification change detection analysis was performed pixel-by-pixel between the final and initial study years. This analysis was carried out by deducting the spatial area coverage of the final year from the initial year. This change-detection method was proven to be the most effective technique because data from two periods are separately classified, thereby minimizing the problem of normalizing for atmospheric and sensor differences [44].

Surface Water Content Detection, Analysis, and Mapping
The surface water bodies of Gauteng Province were identified to investigate their history and existing status, using Landsat 4-5 TM and 8 OLI satellite images. We also evaluated the water content of the study area for proper planning and management. The Modified Normalized Difference Water Index (MNDWI) and Normalized Difference Water Index (NDWI) were used based on Table 2. Both indices are novel for surface water body detection and water content analysis. They are also useful methods for determining water quality since they exclude soil and terrestrial vegetation features within a water body [40]. The MNDWI overcomes the limits of the Normalized Difference Water Index (NDWI) regarding soil and built-up areas but still has proven to be effective in separating water bodies and vegetation [45]. The MNDWI improves the detection of water-content information and eliminates shadow noise in areas where built-up lands are a dominant cover type than the Normalized Difference Water Index (NDWI) [41]. The MNDWI also provides more detailed and higher-quality water-content information than the NDWI [46,47]. In this study, we employed both indices to verify the water content of the study area (Table 2; Figure 2). For the NDWI estimation, the Green and NIR represent bands 2 and 4 in Landsat 4-5 TM; however, they signify bands 3 and 5 in Landsat 8 OLI products. Furthermore, fo the analysis of the MNDWI, the Green and SWIR1 represent bands 2 and 5 in Landsat 4-5 TM and bands 3 and 6 in Landsat 8 OLI. The MNDWI uses green and shortwave in For the NDWI estimation, the Green and NIR represent bands 2 and 4 in Landsat 4-5 TM; however, they signify bands 3 and 5 in Landsat 8 OLI products. Furthermore, for the analysis of the MNDWI, the Green and SWIR1 represent bands 2 and 5 in Landsat 4-5 TM and bands 3 and 6 in Landsat 8 OLI. The MNDWI uses green and shortwave infrared (SWIR-1) bands to point out water bodies, while the NDWI uses green and near-infrared (NIR) bands. The value of the MNDWI and NDWI ranges from −1 to +1. Built-up areas, soil, and vegetation have negative values due to their higher reflectance, while surface water bodies have positive values because of lower reflectance in the SWIR band [31].

The Statistical Relationship between the MNDWI and NDWI
We applied the Pearson correlation coefficient (r) to investigate the relationships between the NDWI and MNDWI (Equation (7)). In most cases, correlations are utilized to analyze bivariate relationships between measured variables. In this study, we generated the values of each pixel found within a 5, 10, and 15 km radius distance to compare the results from the center of the study site to assess the relationships of each factor. This increases the processing capability of our computer and Minitab version 16 statistical software. The Pearson correlation coefficient values range from −1 to +1. Bolboaca and Jäntschi [48] reported that if the variables have a value of +1, an increasing relationship completely associates them; if they have a value of −1, they are perfectly related by a decreasing relationship, and if they have a value of 0, they are not linearly related to each other. If the correlation coefficient is more than 0.8, the correlation is strong; if it is lower than 0.5, the correlation is weak. In several studies, the Pearson correlation matrix was used to assess the relationships between the NDWI and MNDWI [2,49].
where r = Pearson correlation coefficient, n = no. of pairs, ∑ xy = sum of x and y, (∑ x) = sum o f x values, (∑ y) = sum o f y values , ∑ x 2 = sum of the squared x value, and ∑ y 2 = sum of the squared y value.

Land-Use Diversity Analysis at the Province Level
Using the RF classifier machine-learning model, we found nine different land-use diversity classes in the study area, namely WB, MQ, DF, GL, ShL, BuA, WeL, AL, and BL (Figure 3a-d), with high F-score values ranging from 72.3 to 100% during the period from 1993 to 2022 (Table 3). The results of CA and PA also show a strong agreement with the F-score values of each land-use type (Table 3). These values were validated using the OA and K-coefficient.  [43] underlined that an acceptable OA value should be at least 85%. Our findings were above the minimum standard and acceptable range in this case. We also proved this value with the F-score value of each land-use type in our study area. Various studies also indicated that an F-score value above the average (50%) indicates perfect precision and is considered to be in higher agreement (Table 3). Table 3 shows our study area's detailed CA, PA, F-score, OA, and K-coefficient analysis from 1993 to 2022. Remote Sens. 2023, 15, x FOR PEER REVIEW 11 of 23    Table 4). The 2013 land-use diversity of the study area shows that AL was still covering the highest occupancy rate, i.e., 7569.6 km 2 (41.7%), in comparison to other land-use categories in the area (Figure 3c). This land-use type shows an  In  Table 4). The 2013 land-use diversity of the study area shows that AL was still covering the highest occupancy rate, i.e., 7569.6 km 2 (41.7%), in comparison to other land-use categories in the area (Figure 3c). This land-use type shows an increased trend from its earlier coverage by 5067.1 km 2 ( Table 4). The remaining land-use types included WeL, which covered 3878.  Table 4). In the years 2013-2022, AL, DF, MQ, and BL also showed an annual decline trend of 147. 6, 42.8, 34.7, and 24.6 km 2 , respectively (Figure 4c). However, the extent of WB, WeL, BuA, GL, and ShL intensified annually by 2.1, 40.5, 48.5, 51.7, and 107 km 2 , respectively. This study reported that during the last three decades (1993-2022), only BL and BuA declined by 100.4 and 42.2 km 2 , respectively (Figure 4d). The presence of high vegetation in the area was strongly affecting the classification of BuA. The BuA was heavily dominated by vegetation in the majority of GP district municipalities, including the metropolitan center of Johannesburg. As a result, the spatial extent of BuA is less than the previous years. The remaining land-use types, namely WB, MQ, AL, GL, ShL, DF, and WeL, have expanded annually by 1.3, 2.3, 2.9, 5.6, 11.2, 29.6, and 89.5 km 2 , respectively. Those land uses may affect the security of surface water bodies in the area. Abiye [35] stated that the economic sustainability of the area is hampered by water insecurity (e.g., surface water), which is brought on by factors such as population growth, expansion of industrial, agricultural, and mining sectors, among others. In addition to this, the hydrological cycle may also be significantly impacted by changes in water quality and surface runoff [33]. Mawasha and Britz [50] reported an increase in BuA and BL by 427.13 (56.2%) and 62.251 km 2 (8.2%), whereas a decrease of vegetation cover by 74.55 (9.8%) and 197 km 2 (25.8%), respectively, was observed for the period of 1987-2015. The decline in AL impacts the accessibility of food for both humans and livestock. This has been proven by [51] because livestock occupies 70% of all AL and GL. The decreased trend of WB may harm AL productivity [52]. Additionally, the reduction of MQ could influence the economic growth of GP. However, unwisely treated MQ causes negative environmental effects linked with the contamination of surface and subsurface water resources [53]. Therefore, it is important to undertake proper MQ waste management to safeguard the local water bodies and other natural resources.
As a result, the spatial extent of BuA is less than the previous years. The remaining land-use types, namely WB, MQ, AL, GL, ShL, DF, and WeL, have expanded annually by 1.3, 2.3, 2.9, 5.6, 11.2, 29.6, and 89.5 km 2 , respectively. Those land uses may affect the security of surface water bodies in the area. Abiye [35] stated that the economic sustainability of the area is hampered by water insecurity (e.g., surface water), which is brought on by factors such as population growth, expansion of industrial, agricultural, and mining sectors, among others. In addition to this, the hydrological cycle may also be significantly impacted by changes in water quality and surface runoff [33].    Table 5 indicates the overall surface water bodies of the study area, using the MNDWI and NDWI. The results show that the surface water content has been below the standard and diminishing yearly over the last three decades (Figure 5a-h). This reduction may have caused the worsening shortages of surface water access for livestock and humans in all district municipalities of GP from 1993 to 2022 (Table 5). As a result, the incidence of drought would be severe [54], as well as the regularity, intensity, severity, and spatial extent. Although all district municipalities of GP are depicted as being below the standard thresholds, the TSH and EKU show relatively higher MNDWI values than all of the districts of GP. In these districts, the max MNDWI values during the year 1993 were 0.39, and 0.38, respectively. In the same year, the district of DC42 revealed a lower (0.27) MNDWI value. In addition, the NDWI shows very low surface water content levels compared to the MNDWI across all the study sites. Studies indicated that surface water bodies may be quickly affected by extreme weather and poor land-use management-related factors. Additionally, the allocation of water bodies using the RF method, MNDWI, and NDWI was analyzed. The findings showed that the spatial representations of water bodies in the province of Gauteng varied in three of the models. For instance, the MNDWI and NDWI show a decline of 103 and 594 km 2 , respectively, from 1993 to 2022, whereas the RF model indicates an increase of 38.56 km 2 .

Relationships of Various Surface Water Content Indicators at Various Radiuses
In this study, we examined the statistical relationship between the MNDWI and NDWI at a 5, 10, and 15 km radius from the center of the study area to ascertain how the two significant surface water content indicators related to one another. Figure 6a,b depict the long-term mean NDWI and MNDWI levels at various radiuses.
Our results indicated that the Pearson correlation between the MNDWI and NDWI is higher at the 5 km radius than at 10 and 15 km (r = 0.60, p = 0.00, n = 87,260) and is statistically significant (p = 0.000) (Figure 7a-c). MNDWI, and NDWI was analyzed. The findings showed that the spatial representations of water bodies in the province of Gauteng varied in three of the models. For instance, the MNDWI and NDWI show a decline of 103 and 594 km 2 , respectively, from 1993 to 2022, whereas the RF model indicates an increase of 38.56 km 2 .   Our results indicated that the Pearson correlation between the MNDWI and NDWI is higher at the 5 km radius than at 10 and 15 km (r = 0.60, p = 0.00, n = 87,260) and is statistically significant (p = 0.000) (Figure 7a-c).  Our results indicated that the Pearson correlation between the MNDWI and NDWI is higher at the 5 km radius than at 10 and 15 km (r = 0.60, p = 0.00, n = 87,260) and is statistically significant (p = 0.000) (Figure 7a-c).

Conclusions
This study explored the performance of machine-learning models such as RF to characterize the impact of changing land-use diversity on surface water bodies. Our results showed that the spatial coverage of BL has decreased by 100.4 km 2 annually during the last three decades. The influence of the dense vegetation in GP was also evident in BuA, which showed a decreasing trend of 42.2 km 2 . On the other hand, WB, MQ, AL, GL, ShL, DF, and WeL have increased by 1.3, 2.3, 2.9, 5.6, 11.2, 29.6, and 89.5 km 2 annually. Between 1993 and 2022, the spatial coverage of MQ, WB, GL, and DF increased

Conclusions
This study explored the performance of machine-learning models such as RF to characterize the impact of changing land-use diversity on surface water bodies. Our results showed that the spatial coverage of BL has decreased by 100.4 km 2 annually during the last three decades. The influence of the dense vegetation in GP was also evident in BuA, which showed a decreasing trend of 42.2 km 2 . On the other hand, WB, MQ, AL, GL, ShL, DF, and WeL have increased by 1.3, 2.3, 2.9, 5.6, 11.2, 29.6, and 89.5 km 2 annually. Between 1993 and 2022, the spatial coverage of MQ, WB, GL, and DF increased in EKU by 0.1, 1.5, 6.2, and 10.8 km 2 , respectively. During the same time period, the spatial extent of BL, AL, BuA, ShL, and WeL decreased by an average of 22.8, 11.2, 7.5, 3.7, and 1.9 km 2 , respectively. In the metropolitan city of JHB, the coverage of ShL, which is crucial for minimizing the local effects of climate change, has typically decreased. In this city, ShL has declined by 23.4%, BL by 5.4%, AL and GL by 2.1, and WeL by 0.6 km 2 yearly. However, there were annual increases in WB, MQ, BuA, and DF of 0.3, 2.3, 3.2, and 5 km 2 , respectively. Additionally, the spatial area of BL, AL, and BuA decreased by 142.8, 54.1, and 35 km 2 in the TSH District, respectively. However, ShL, WB, MQ, GL, DF, and WeL showed an annual improvement of 0.9, 2.3, 2.5, 5.1, 31.9, and 151.8 km 2 . In DC48, BL, BuA, AL, and MQ declined by 102.5, 36.1, 25, and 0.9 km 2 , respectively. Nonetheless, WeL, ShL, DF, GL, and WB have intensified by 71.8, 26.8, 23.6, 7.7, and 1.4 km 2 , respectively. Additionally, BuA, BL, and WB decreased annually by 60.6, 49.9, and 1.2 km 2 in the DC42 District. Other land-use types, such as GL, MQ, DF, ShL, WeL, and AL, intensified annually by 1.2, 3.5, 24.2, 35.5, 67.4, and 101.9 km 2 , respectively. Additionally, this study reported that the levels of surface water bodies are decreasing because of the severe effects of land-use change. This study found that the statistical relationship between the MNDWI and NDWI was higher at a radius of 5 km (r = 0.60, p = 0.00, n = 87,260) than at 10 and 15 km. The finding of this study helps us to comprehend the implications of the changing land-use diversity on surface water bodies and improves the existing land-use system. Additionally, it can be used as a baseline study for future research on the economic effects of land-use change on surface water resources.
Supplementary Materials: The following supporting information can be downloaded at: https://www. mdpi.com/article/10.3390/rs15164092/s1. All supplementary annexes, such as land-use diversity classes, trend at municipality levels, and RF-GEE scripts are placed here as supplementary information to enhance the quality of our manuscript. Data Availability Statement: Any datasets used in our study will be shared upon request to the corresponding author.