Land-Use Change Prediction in Dam Catchment Using Logistic Regression-CA, ANN-CA and Random Forest Regression and Implications for Sustainable Land–Water Nexus

For sustainable water resource management within dam catchments, accurate knowledge of land-use and land-cover change (LULCC) and its relationship with dam water variability is necessary. To improve LULCC prediction, this study proposes a random forest regression (RFR) model, compared with logistic regression–cellular automata (LR-CA) and artificial neural network–cellular automata (ANN-CA) models, for predicting LULCC (2019–2030) in the Gaborone dam catchment (Botswana). RFR is proposed because it can capture the existing and potential interactions among LULC intensities and their nonlinear interactions with the change-driving factors. For LULCC forecasting, the driving factors comprised physiographic variables (elevation, slope and aspect) and proximity-neighborhood factors (distances to water bodies, roads and urban areas). In simulating the historical LULC (1986–2019) at 5-year time steps, RFR outperformed the ANN-CA and LR-CA models, with percentage accuracies of 84.9%, 62.1% and 60.7%, respectively. Using the RFR model, the predicted LULCCs were vegetation (−8.9%), bare soil (+8.9%), built-up (+2.49%) and cropland (−2.8%), with water bodies exhibiting insignificant change. The correlation between land use (built-up areas) and water showed increasing population against decreasing dam water capacity. The study approach has the potential for deriving the catchment land–water nexus, which can aid in the formulation of sustainable catchment monitoring and development strategies.


Introduction
In this Anthropocene era, in which human activities are forcing significant changes in the environment, rising incomes together with growing demands for goods and services, including food, energy, water and land, have increased the pressure on natural resources and ecosystems, leading to their overexploitation and degradation. This predicament is further fueled by climate change. Sustainable development therefore requires systematic, integrated solutions. These can be achieved in part through nexus-based cross-sectoral links between resources for increased and more efficient use of resources at local, national and global scales. For catchments, especially dam water catchments, determining accurate links between land use and water resources is essential [1,2].
For the sustainability assessment of a land–water nexus within catchment ecosystems, an integrated approach is necessary to accurately identify land-use and land-cover changes, predict future changes and assess their impacts on water resources and, subsequently, on the water budget.
Land is the main element of the natural environmental substrate and an important carrier of socio-economic development [3]. Land use refers to the human utilization of the Earth's surface, while land cover comprises the natural biological, physical and terrestrial surfaces [4]. Changes in land use and land cover (LULC) are mainly attributed to urbanization, infrastructural development, industrialization, intensification of agriculture and irrigation, and overexploitation of grassland and forest cover [5]. Different studies have shown that LULC changes are critical for assessing environmental impacts and for determining relationships with hydrological and ecological issues [6]. Recently, more attention has been given to LULC change because of its extreme impacts on the environment at the basin or watershed level and its contributions to climate change [7].
Rational use and management of land are among the crucial ways of resolving socio-economic conflicts and contribute to harmonizing environment, society and economy [8]. However, with increasing population and rapid urbanization, regional and global LULC has undergone enormous changes in the past decades. LULC changes modify biophysical systems and alter land surface energy processes, producing fluctuations in the surface energy balance and hence variability in climate, which in turn is responsible for several adverse impacts on different environmental phenomena [9,10]. Changes in LULC are thus critical driving factors of several environmental problems, including enhanced carbon emissions [11,12], biodiversity loss [13], decreased ecosystem productivity [14], soil and land degradation [15] and ecosystem service decline [16]. LULC change detection therefore plays a significant role in monitoring land as a critical resource, in understanding how it is transformed naturally and/or by human activities, and in assessing its impacts on other resources.
Consequently, changes in LULC are linked to several of the United Nations (UN) Sustainable Development Goals (SDGs). In particular, SDG 15 (Life on Land) calls for action to "protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, and halt and reverse land degradation" [17]. Further, the realization of SDG 11 (Sustainable Cities and Communities) relies on informed development based on accurate land-use information [18,19]. Geospatial information on LULC enables decision makers to monitor and address environmental challenges, promote sustainable development, ensure inclusive, safe, resilient and sustainable cities as envisioned by SDG 11 [20], implement evidence-based policies and ensure the resilience and sustainability of cities and societies [21]. Past, present and future LULC information is therefore necessary for sustainable environmental planning and monitoring, for identifying possible threats and for formulating mitigation and adaptation strategies [8,9].
For LULC mapping and LULC change prediction, a significant amount of research has relied on geospatial modeling using remote sensing (RS) data and GIS techniques [22,23]. Other commonly used approaches for LULC change prediction include models based on statistics [24], evolutionary approaches [25], cellular automata [26], Markov chains [27], hybrid models [28], expert systems [29] and multiagent models [30]. Compared to the RS- and GIS-based methods, the cellular automata (CA)–Markov chain (MC) model is the most widely used approach for LULC change prediction [31,32].
While MC analysis has been used in several case studies for the simulation and prediction of LULC changes, it is best suited for short-term projections [33] because it is not spatially explicit: it works under the Markov assumption that the future state depends only on the current state, ignoring the spatial information within LULC classes [34]. Furthermore, it does not provide the spatial distribution of the predicted changes, which is essential for understanding their possible influence. Hence, MC can estimate the magnitude of LULC change but not where the change occurs. This drawback has been mitigated by a hybrid MC-CA model that combines MC with empirical CA [28]. As a bottom-up dynamic model, CA incorporates a spatial dimension, thus adding the location and direction of change to LULC change modeling. The MC-CA model has been widely used to understand urban expansion and landscape dynamics [35][36][37][38][39].
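The non-spatial character of MC projection can be seen in a minimal sketch. It assumes a hypothetical three-class transition matrix; the classes and probabilities are illustrative and are not results of this study. Repeated multiplication by a stationary matrix projects class areas, but the output is only a vector of class totals, with no information on which cells change.

```python
import numpy as np

# Hypothetical transition matrix for three classes (vegetation, bare soil, built-up).
# Row i holds the probability that a pixel of class i at time t is in class j at t + 1;
# each row sums to 1. Values are illustrative only.
P = np.array([
    [0.90, 0.08, 0.02],
    [0.05, 0.90, 0.05],
    [0.00, 0.00, 1.00],
])

def project_areas(areas, transition, steps):
    """Project class areas forward with a stationary transition matrix (Markov assumption).
    Only totals are returned; the spatial location of change is not modeled."""
    areas = np.asarray(areas, dtype=float)
    for _ in range(steps):
        areas = areas @ transition
    return areas

start = np.array([600.0, 300.0, 100.0])  # hypothetical km^2 per class at time t
after_two_steps = project_areas(start, P, steps=2)
print(after_two_steps)
```

Total area is conserved because every row sums to 1, while the allocation of the changed area in space is left to the CA component.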
Despite its potential and wide use, the MC-CA model assumes stationary transitions, which makes it more suited to short-term predictions than to long-term projections [40]. Furthermore, MC-CA simulations carry uncertainty in the output patterns because the model is sensitive to the cell size of the input grid data and to the shapes of the different neighborhood types [41]. Additionally, the self-adaptive ability of MC-CA to model nonlinear relationships between the dynamic LULC classes and the change drivers is still not fully reliable [42]. These drawbacks can be addressed by integrating the model with other dynamic and empirical models [43]. Approaches that have been integrated with MC-CA include multicriteria analysis (MCA), the analytical hierarchy process (MCA-AHP) and frequency ratio (FR) [44]; multivariable statistical methods (MSMs) such as linear regression (LiR) [45]; logistic regression (LR) [46]; machine learning methods [47]; and artificial neural networks (ANNs) [48]. Nonetheless, owing to the complexity of land-use change, most of the proposed hybrid models have limitations, including insufficient knowledge about the area of interest, subjectivity in weighting the variables and an inability to model nonlinear phenomena, resulting in unreliable predictions [49].
In applying different LULC change prediction models, Ref. [50] utilized a hybrid LR-MC-CA model and obtained an accuracy of 89% between simulated and actual land use in a case study of Tehran. For urban growth modeling, Ref. [51] employed RF-CA and SVM-CA models, which exhibited high certainty in modeling complex urban growth in the Wallonia region (Belgium). For forecasting LULC change in Attica (Greece), Ref. [52] applied RF-CA with a notably high overall accuracy of 88.4%. Ref. [53] employed the ANN-CA-Markov model to simulate urban growth in China, with a prediction suitability score of 0.864 and a kappa coefficient of 0.78. A similar accuracy of 84% was obtained by [54] using ANN-CA for spatiotemporal analysis and simulation of biophysical indicators under urbanization and climate change scenarios. In another study, Ref. [55] predicted urban land-use change in Bogota up to 2034 using MC-ACC, with an average validation value of 0.85. Ref. [49] compared several machine learning models, namely ANN, SVR, RF, decision tree (DT), LR and multivariate adaptive regression splines (MARS), for the prediction of urban growth; prediction accuracies ranged from 68% (LR) to 75% (ANN). A detailed review of the application of ML models to the spatiotemporal simulation and prediction of land-use change can be found in Aburas et al. [44]. In addition to their inherent drawbacks, most LULC prediction models are developed and tested for well-structured urban landscapes with proper development plans [51]. They may therefore be inapplicable to catchment landscapes with complex and unsystematic spatial-temporal dynamics of land-use change [56].
Recently, machine learning (ML) models such as random forest (RF), ANN and support vector machine (SVM) have been proposed for identifying transitions between land-use classes and for predicting LULC changes [51]. With capabilities such as fast learning and big data analytics, ML can enhance data-fitting accuracy, producing more intelligent and robust simulations and predictions [57]. Compared to traditional empirical, statistical and dynamic approaches, ML models are more flexible, learn and process data efficiently, are less constrained by problems related to multivariate covariance and can handle the nonlinearity in LULC data [58]. ML models therefore have the potential to improve the simulation of unsystematic and complex spatial-temporal dynamics in LULC and the prediction of LULC change type and intensity within catchment landscapes. Despite these advantages, the application of ML models to the simulation and prediction of LULC change is still very limited.
With multiple predicted factors, the uncertainty of the examined variable is influenced by the LULC type and its intensity, which introduces nonlinear connectivity among the LULC variables. To improve LULC change detection and prediction, this study proposes the random forest regression (RFR) ML approach for simulating and predicting LULC changes in the Gaborone dam catchment (GDC) in Botswana. Catchment management is important not only because the catchment is a hydrological unit but also because it is part of the socio-ecological environment that supports economic activities and food production and provides social security for residents [59]. However, LULC changes within catchments, for example through urbanization and deforestation, have negative impacts on water quality and water availability and indirectly influence the nature of the watershed ecosystem. For catchments, mapping and analyzing the spatiotemporal variations within watersheds and their interactions with the hydrological components will therefore enhance not only the formulation of water conservation strategies but also the overall sustainable planning and management of the catchment. RFR is a nonlinear model suited to the regression of datasets that do not exhibit linearity, and it is considered able to capture the existing and potential interactions between the type and intensity of LULC classes and their interactions with the influencing or driving factors [47]. The proposed RFR model is compared with two hybrid models, logistic regression-CA (LR-CA) and ANN-CA. LR-CA is included because it takes the dynamic process of LULC change into consideration [60] and can connect categorical and continuous variables and build the potential relationships between them [61].
Ref. [62] established an LR-CA framework for predicting urban change in the Istanbul metropolitan area, while [63] explored an integrated LR-CA for predicting future LULC in the Hamilton area of Ohio in the USA. The ANN, on the other hand, has the advantage of being nonparametric and requires little or no prior knowledge of the distribution of the input data when estimating nonlinear relationships. In ANN-CA, the self-organizing and adaptive nature of the ANN can make its integration with CA more self-adaptive than traditional models [64]. Owing to this self-learning ability, unknown relationships among variables can be addressed through the automatic approximation of nonlinear functions by the ANN, which is considered superior to linear-based models such as LR and AHP [65]. Several researchers have also reported success in applying ANNs to LULC modeling in different regions [66][67][68].
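The claim that RFR can capture interactions between drivers that an additive linear model misses can be illustrated with a small synthetic sketch. Everything here is hypothetical: two rescaled drivers and a change-intensity target that is high only where both drivers are low. scikit-learn's RandomForestRegressor stands in for the RFR used in the study, and its settings are not the study's.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Hypothetical drivers rescaled to [0, 1], e.g. distance to roads and slope.
X = rng.uniform(0.0, 1.0, size=(2000, 2))
# Synthetic change intensity: high only where BOTH drivers are low (an interaction),
# plus a little noise.
y = np.where((X[:, 0] < 0.3) & (X[:, 1] < 0.3), 1.0, 0.0) + rng.normal(0.0, 0.05, 2000)

X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

rfr = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
lin = LinearRegression().fit(X_train, y_train)

r2_rfr = r2_score(y_test, rfr.predict(X_test))
r2_lin = r2_score(y_test, lin.predict(X_test))
print(f"RFR R2 = {r2_rfr:.2f}, linear R2 = {r2_lin:.2f}")
```

The tree ensemble partitions the driver space into boxes, so the joint condition is represented directly, whereas the linear model can only add the two effects.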
By simulating the spatiotemporal LULC change within the GDC in Botswana for three decades from 1986 to 2019 at 5-year intervals and incorporating the LULC change drivers comprising physiographical variables (elevation (DEM), slope and aspect) and neighborhood proximity factors (distances to water bodies, roads and urban areas), the objectives of the study are to: (1) derive the spatiotemporal LULC change trends within the Gaborone dam catchment from 1986 to 2019 using machine learning and multiple features; (2) develop an RFR model that integrates transition mapping with driving factors for LULC change prediction within the GDC study area; (3) compare the performance of the proposed RFR-LULC change prediction model with the parametric LR-CA and nonparametric ANN-CA hybrid models for the simulation and prediction of short-term LULC change in GDC for 2025 and 2030; and (4) analyze the relationship between land-use change and dam water variability within the Gaborone dam catchment.
In addition to improving LULC mapping with machine learning and multiple classification features (multispectral bands, vegetation and soil indices and GLCM texture cues), the proposed RFR approach to LULC change simulation is novel and aims to overcome the drawbacks of the parametric LR-CA and nonparametric ANN-CA hybrid models, especially for complex catchment landscapes. By testing the efficacy of machine learning in simulating and predicting LULC change in a dam catchment environment where the approach has not been applied before, this study offers a benchmark for applying machine learning in such environments, with the goal of providing accurate future LULC developments in support of land-use planning for sustainability [69]. The research also aligns with the aims and scope of SDG 11 and SDG 15, as it highlights the significance of interdisciplinary studies, innovative approaches and technological advancements in promoting integrated resource monitoring and management for sustainable development at different spatial scales.

Study Area
Gaborone dam catchment (GDC) is one of several sub-catchments forming Botswana's Limpopo River Basin (BLRB) (Figure 1). BLRB is part of the larger Limpopo River Basin (LRB), a transboundary basin in southern Africa encompassing portions of Botswana, Mozambique, South Africa and Zimbabwe. GDC lies in the southeastern part of Botswana, between longitudes 25°16′ E and 26°30′ E and latitudes 24°42′ S and 25°34′ S. The catchment overlaps the northwest side of South Africa (Figure 1) and has a total area of approximately 4344 km². It is one of the upstream sub-catchments of the Limpopo River Basin, with the Notwane River as the main drainage, running from South Africa through the Gaborone dam to the Limpopo River. The catchment has an average altitude of approximately 1292 m AMSL, average temperatures between 19.7 °C and 32.7 °C and an annual average rainfall of approximately 500 mm. It is one of the main water catchments and one of the most densely populated catchments in arid and semi-arid Botswana. The catchment is therefore considered environmentally sensitive, owing to its fragile dry desert environment and its susceptibility to rapid successive changes caused by human activities and climate change. Catchment dynamics driven by human activities and natural phenomena such as climate change result in LULC changes, which makes LULC prediction an essential step in resource planning and management.

Data
For LULC mapping, Landsat 5 TM imagery was used for 1986, 1989, 1994, 1999, 2004 and 2009, and Landsat 8 OLI imagery for 2014 and 2019. The Landsat data were downloaded from the United States Geological Survey Earth Explorer (www.earthexplorer.usgs.gov). To minimize seasonal effects, all datasets were acquired at the same time of year, in April and May. Cloud-free images for the study years were filtered by month using Google Earth Engine (GEE). Pre-processing comprised geometric correction, mosaicking of the different paths and rows, and radiometric and atmospheric correction. Geometric correction involved orienting the imagery to the local projected coordinate system. To minimize radiometric errors, the multitemporal images were radiometrically calibrated with the radiometric tool in ERDAS Imagine by converting the raw digital numbers (DNs) recorded by the sensors to sensor spectral radiance (Lλ) and then to top-of-atmosphere (TOA) reflectance. Atmospheric effects reduce the dynamic range of an image, causing haze and low contrast; these effects were corrected using the haze reduction tool in ERDAS Imagine, based on dark object subtraction (DOS).
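The radiometric chain described above can be sketched with numpy. This sketch follows the Landsat 8 OLI rescaling form, with radiance from the RADIANCE_MULT_BAND_x and RADIANCE_ADD_BAND_x metadata values and TOA reflectance from REFLECTANCE_MULT_BAND_x and REFLECTANCE_ADD_BAND_x corrected by SUN_ELEVATION, followed by a simple minimum-value DOS. It is not a reproduction of the ERDAS Imagine tools used in the study; the coefficient values are hypothetical, and for Landsat 5 TM the reflectance is usually computed from radiance using solar irradiance and Earth–Sun distance instead.

```python
import numpy as np

def dn_to_radiance(dn, mult, add):
    """Sensor spectral radiance: L_lambda = M_L * Q_cal + A_L."""
    return mult * dn.astype(np.float64) + add

def dn_to_toa_reflectance(dn, mult, add, sun_elevation_deg):
    """Landsat 8 style TOA reflectance with sun-angle correction:
    rho = (M_rho * Q_cal + A_rho) / sin(sun elevation)."""
    rho_prime = mult * dn.astype(np.float64) + add
    return rho_prime / np.sin(np.deg2rad(sun_elevation_deg))

def dark_object_subtraction(band, percentile=0.0):
    """Simple DOS haze removal: treat the darkest value (or a low percentile)
    as pure haze, subtract it and clip negative values to zero."""
    dark = np.percentile(band, percentile)
    return np.clip(band - dark, 0.0, None)

# Hypothetical values; real ones are read per band from the scene's MTL metadata file.
RADIANCE_MULT, RADIANCE_ADD = 0.012, -60.0
REFLECTANCE_MULT, REFLECTANCE_ADD = 2.0e-5, -0.1
SUN_ELEVATION = 50.0

dn = np.array([[7500, 9000], [12000, 20000]], dtype=np.uint16)
radiance = dn_to_radiance(dn, RADIANCE_MULT, RADIANCE_ADD)
toa = dn_to_toa_reflectance(dn, REFLECTANCE_MULT, REFLECTANCE_ADD, SUN_ELEVATION)
toa_dos = dark_object_subtraction(toa)
```

Casting to float64 before scaling avoids integer overflow on the unsigned DN arrays.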
LULC change drivers are usually characterized by proximity-neighborhood factors (distances to major towns and cities, roads, rail networks, water bodies and streams), topography (elevation, slope and aspect) and demographic variables (population and population density). The driver variables are spatially and temporally multifaceted. The drivers used in the LULC simulation modeling comprised natural physiographical variables (elevation (DEM), slope and aspect) and neighborhood-proximity factors (distances to water bodies, roads and urban areas). The datasets and data sources are presented in Table 1. All datasets were reclassified and resampled to a 30 m spatial resolution, as presented in Figure 2.
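Proximity and terrain drivers on the common 30 m grid can be derived roughly as follows. This is a minimal sketch, assuming the features (roads, water bodies, urban areas) are already rasterized as boolean masks and the DEM is a numpy array on the same grid; aspect is omitted because its angle convention differs between tools.

```python
import numpy as np
from scipy import ndimage

PIXEL = 30.0  # metres; the common resolution all drivers were resampled to

def distance_to_feature(feature_mask, pixel_size=PIXEL):
    """Euclidean distance (m) from every cell to the nearest feature cell.
    distance_transform_edt measures distance to the nearest zero, so the
    mask is inverted to make feature cells zero."""
    return ndimage.distance_transform_edt(~feature_mask, sampling=pixel_size)

def slope_degrees(dem, pixel_size=PIXEL):
    """Slope in degrees from a DEM using central differences."""
    dz_dy, dz_dx = np.gradient(dem.astype(np.float64), pixel_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Hypothetical 5 x 5 example: a road along the middle row, and a DEM rising 3 m per pixel eastwards.
roads = np.zeros((5, 5), dtype=bool)
roads[2, :] = True
dist = distance_to_feature(roads)
dem = np.tile(np.arange(5) * 3.0, (5, 1))
slope = slope_degrees(dem)
```

The same `distance_to_feature` call would be repeated for the water-body and urban-area masks.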

Methods
The summary flow diagram for the implementation of the LULC simulation, prediction and land–water nexus analysis is presented in Figure 3. The implementation has four phases: (i) LULC mapping and derivation of LULC change-driving factors, (ii) generation of transition probability and transition maps and enforcement of the neighborhood influence, (iii) prediction of future LULC scenarios for 2025 and 2030 and (iv) investigation of the relationship between LULC change and dam water variability within the catchment.
LULC classification, simulation and prediction were carried out in Google Earth Engine (GEE) using the JavaScript programming language, supported by the Modules for Land-use Change Simulations (MOLUSCE) plugin in QGIS. The LULC change area transition matrix and the corresponding transition probability matrix, which reveals the likelihood of the LULC transitions (Figure 3), were generated using LR, ANN and RFR. The transition potentials were evaluated for the historical years (1999–2019) and integrated into the prediction of the future LULC cell status using CA (for LR-CA and ANN-CA) and RFR-LULC transition modeling.
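The area transition matrix and the transition probability matrix can be obtained by cross-tabulating two co-registered classified maps. This minimal numpy sketch assumes integer class codes 0 to n−1 and 30 m pixels (0.0009 km² each); it illustrates the idea and is not MOLUSCE's implementation.

```python
import numpy as np

def transition_matrices(lulc_t0, lulc_t1, n_classes, pixel_area_km2=0.0009):
    """Cross-tabulate two co-registered class rasters.
    Returns (area matrix in km^2, row-normalized transition probability matrix),
    where entry [i, j] refers to pixels that were class i at t0 and class j at t1."""
    idx = lulc_t0.ravel() * n_classes + lulc_t1.ravel()
    counts = np.bincount(idx, minlength=n_classes * n_classes).reshape(n_classes, n_classes)
    area = counts * pixel_area_km2
    row_totals = counts.sum(axis=1, keepdims=True)
    # Divide each row by its total; rows for classes absent at t0 stay zero.
    prob = np.divide(counts, row_totals, out=np.zeros(counts.shape), where=row_totals > 0)
    return area, prob

# Hypothetical 2 x 3 maps with classes 0, 1 and 2.
t0 = np.array([[0, 0, 1], [1, 2, 2]])
t1 = np.array([[0, 1, 1], [2, 2, 2]])
area, prob = transition_matrices(t0, t1, n_classes=3)
```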

LULC Mapping Using Machine Learning with Multiple Input Features
The multitemporal LULC mapping was carried out on the GEE platform using a random forest classifier (RFC). RFC uses "parallel ensembling", fitting several decision tree classifiers in parallel on different data sub-samples and using majority voting or averaging to determine the output class, as depicted in Figure 4. To build a series of decision trees with controlled variation, RFC combines bootstrap aggregation (bagging) and random feature selection. This reduces classification overfitting and increases class prediction accuracy. As such, an RFC with multiple decision trees is typically more accurate than a single decision tree, especially in detecting different LULC classes [70,71]. An overall advantage of RFC is that it can produce stable and accurate results even with minimal hyperparameter tuning. The algorithm is easy to parameterize, relatively insensitive to overfitting and robust to outliers in the training data, and it reports the classification error and variable importance [72].
For each year, training samples were visually collected as polygons, each comprising 200 pixels. For all LULC classes except water, 117 polygons were used for training and 20 polygons for validation. Because the water class is smaller in areal extent, its training and validation sets comprised 70 and 30 polygons, respectively. The optimization and hyperparameter tuning of the RFC were implemented as detailed in [70]. To improve classification accuracy, mean and variance gray-level co-occurrence matrix (GLCM) textures computed from the first principal component of the multispectral image data were added as input features, as they were found useful in capturing the structural heterogeneity of classes. In addition, the normalized difference vegetation index (NDVI) (Equation (1)) and dry bare soil index (DBSI) (Equation (2)) were computed and integrated into the classification database. The indices and textural features were included to enhance land-cover classes such as vegetation and bare soil and to improve the overall classification accuracy. Adopting the EU Copernicus Global Land Cover classification scheme (https://lcviewer.vito.be/2019 (accessed on 10 August 2023)), the classified classes included tree cover, shrubland (shrubs and savanna), grassland, cropland, water (surface water: dams, rivers and ponds), built-up areas (buildings, roads and airports) and bare soil/open land.
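The index features and the bagged classifier can be sketched as follows, with scikit-learn's RandomForestClassifier as a stand-in for the GEE random forest used in the study. The two-class synthetic reflectances are hypothetical, the GLCM textures and principal component step are omitted, and the settings are not the study's tuned hyperparameters (see [70]).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def ndvi(nir, red):
    return (nir - red) / (nir + red)

def dbsi(swir1, green, nir, red):
    # DBSI = (SWIR1 - G) / (SWIR1 + G) - NDVI
    return (swir1 - green) / (swir1 + green) - ndvi(nir, red)

def build_features(green, red, nir, swir1):
    """Stack raw bands and the two indices as per-pixel feature columns."""
    return np.column_stack([green, red, nir, swir1,
                            ndvi(nir, red), dbsi(swir1, green, nir, red)])

rng = np.random.default_rng(42)
n = 300
# Hypothetical mean reflectances for two classes.
veg = {"green": 0.08, "red": 0.05, "nir": 0.40, "swir1": 0.20}
soil = {"green": 0.15, "red": 0.20, "nir": 0.25, "swir1": 0.35}

def sample(means, size):
    """Noisy synthetic reflectances in the band order green, red, nir, swir1."""
    return [np.clip(rng.normal(means[b], 0.02, size), 0.01, None)
            for b in ("green", "red", "nir", "swir1")]

X = np.vstack([build_features(*sample(veg, n)), build_features(*sample(soil, n))])
y = np.array([0] * n + [1] * n)  # 0 = vegetation, 1 = bare soil

# bootstrap=True (bagging) and max_features="sqrt" (random feature selection) are the
# scikit-learn defaults; oob_score gives the out-of-bag classification accuracy.
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0).fit(X, y)
print(f"OOB accuracy: {clf.oob_score_:.3f}")
```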
The LULC classification accuracy was assessed using the overall accuracy (OA) and the kappa coefficient (K) (Equations (3) and (4)). For n classes indexed by i (i = 1, 2, 3, ..., n), the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) are calculated for each class, and the OA and K are derived from them.
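$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \qquad (1)$$
$$\mathrm{DBSI} = \frac{SWIR1 - G}{SWIR1 + G} - \mathrm{NDVI} \qquad (2)$$
$$\mathrm{OA} = \frac{\sum_{i=1}^{n} TP_i}{P_{Total}} \qquad (3)$$
$$K = \frac{\mathrm{OA} - P_e}{1 - P_e}, \qquad P_e = \frac{\sum_{i=1}^{n} (TP_i + FP_i)(TP_i + FN_i)}{P_{Total}^{2}} \qquad (4)$$
where R, G and NIR are the red, green and near-infrared reflectances; SWIR1 is the shortwave infrared reflectance of band 5 (Landsat 5/7 TM/ETM+) or band 6 (Landsat 8 OLI); $P_e$ is the agreement expected by chance; and $P_{Total} = TP_i + FP_i + TN_i + FN_i$ is the total number of validation pixels, which is the same for every class i.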
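OA and K can be computed directly from a confusion matrix, since the per-class TP, FP and FN counts are its diagonal and off-diagonal sums. This minimal sketch assumes rows hold reference classes and columns hold mapped classes; the 2 × 2 matrix in the example is made up.

```python
import numpy as np

def overall_accuracy_and_kappa(cm):
    """cm[i, j] = number of validation pixels of reference class i mapped as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()                    # P_Total
    tp = np.diag(cm)                    # TP_i
    fn = cm.sum(axis=1) - tp            # class-i reference pixels mapped as another class
    fp = cm.sum(axis=0) - tp            # pixels wrongly mapped as class i
    oa = tp.sum() / total
    p_e = np.sum((tp + fp) * (tp + fn)) / total**2   # chance agreement
    kappa = (oa - p_e) / (1.0 - p_e)
    return oa, kappa

oa, kappa = overall_accuracy_and_kappa([[50, 10], [5, 35]])
print(oa, kappa)
```

For this matrix the chance agreement is (55 × 60 + 45 × 40) / 100² = 0.51, so K = (0.85 − 0.51) / 0.49.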

Logistic Regression
Logistic regression (LR) is a statistical model in the generalized linear model class. It establishes multivariate regression relationships between a dependent variable and multiple independent variables. In LULC prediction, binary logistic regression is used to analyze and evaluate the driving factors of land-use class transitions. The goal is to determine the best-fitting model that maps the probability of an LULC class transitioning to another class from a set of driving factors (independent variables). The LR model produces LULC change probability maps, from which the model that correctly conveys the relationships between the probabilities of the dependent variable and the independent variables is determined. The output of the fitted model consists of probability surface maps of the dependent variable, computed from the coefficients of the independent variables.
As expressed in Equation (5), LR can be used to determine the transition probability $P$ of various LULC types $Y_i$ (Equation (6)) at a specific spatial location [46],
where $P_{i,j}$ is the probability of a class transitioning to another; $P_{i,j}/(1 - P_{i,j})$ is the odds of the transition, i.e., the probability that it occurs relative to the probability that it does not (its natural logarithm, the logit, is the quantity LR models as a linear function of the driving factors); $X_i$ is the independent variable representing a driving factor; and $\beta_i$ is the estimated regression coefficient of each selected variable $X_i$.
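Fitting a binary LR transition model and turning it into a transition potential can be sketched with scikit-learn. The drivers, coefficients and labels below are synthetic and hypothetical; each row of X stands for one cell, and the label marks whether it changed from one class to another.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 4000
# Hypothetical standardized drivers per cell: distance to urban areas, distance to roads, slope.
X = rng.normal(size=(n, 3))
# Synthetic "true" logit: change is more likely near towns and roads and on gentle slopes.
true_beta0, true_beta = -1.0, np.array([-1.5, -0.8, -0.5])
p_true = 1.0 / (1.0 + np.exp(-(true_beta0 + X @ true_beta)))
y = rng.binomial(1, p_true)  # 1 = the cell made the transition

lr = LogisticRegression().fit(X, y)
# Fitted intercept and coefficients of ln(P / (1 - P)) = beta_0 + sum_k beta_k * X_k
beta0_hat, beta_hat = lr.intercept_[0], lr.coef_[0]
# Transition potential for every cell (reshape to the raster grid to obtain a map).
p_hat = lr.predict_proba(X)[:, 1]
print(beta0_hat, beta_hat)
```

Note that scikit-learn applies L2 regularization by default, which shrinks the coefficients slightly; with thousands of cells the effect is small.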

Multilayer Perceptron (MLP) Artificial Neural Networks
The backpropagation MLP neural network can perform nonparametric regression analysis. It has an input layer, an output layer and one or more hidden layers between them. Each hidden and output layer neuron processes its inputs by multiplying each input $I_i^{n}$ by a weight $W_{ji}$, summing the products and passing the sum through a nonlinear activation function (the neuron is activated if the sum exceeds its threshold) to derive $I_j^{n+1}$. Equation (7) defines how a neuron in the receiver layer receives values from the neurons in the sender layer, where $I_i^{n}$ is the input value from the ith neuron in the sender layer, $I_j^{n+1}$ (Equation (8)) is the output generated by the jth neuron in the receiver layer and $W_{ji}$ is the weight of the connection between the ith sender neuron and the jth receiver neuron.
The training data are a set of points (pixels) randomly selected from the maps. Each pixel is represented by a vector composed of the values of the driving factors and is assigned the land-use type of the corresponding LULC map for a given year. The points are used to train the corresponding ANNs, one for each land use and its complement. The resulting probability maps are used as potential transition maps to improve the CA simulation. By combining the transition rules, the transition matrix and the potential maps produced by the ANN, ANN-CA considers both the spatial and temporal dynamics of LULC change and adequately incorporates the influence of the driving forces.
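The per-neuron computation described at the start of this subsection can be written in a few lines of numpy. This minimal sketch assumes a logistic (sigmoid) activation and treats the neuron threshold as a bias term; the text does not specify the activation used, and the weights and inputs are hypothetical.

```python
import numpy as np

def layer_forward(inputs, weights, bias):
    """One MLP layer: each receiver neuron j sums W[j, i] * I_i over the sender
    neurons i, adds a bias (the negative of its threshold) and applies a sigmoid."""
    net = weights @ inputs + bias        # weighted sum for every receiver neuron
    return 1.0 / (1.0 + np.exp(-net))    # nonlinear activation gives I^{n+1}

# Hypothetical normalized driver values for one pixel and weights for two receiver neurons.
inputs = np.array([0.2, 0.5, 0.1])
weights = np.array([[0.4, -0.6, 0.9],
                    [0.0, 0.0, 0.0]])
bias = np.zeros(2)
out = layer_forward(inputs, weights, bias)
print(out)
```

Stacking such layers and adjusting the weights by backpropagating the output error gives the MLP used to produce the transition potentials.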
The transition potential model for this study was trained with a momentum of 0.05 and a learning rate of 0.1 to stabilize the learning curve, and the number of iterations was set to 150 to limit overfitting. Following previous experiments [67,73], a cell changes state only when its highest transition probability from the ANN learning process exceeds a threshold of 0.9. Below this value the cell remains unchanged; the threshold keeps the LULC changes in each iteration stable and yields fine simulation patterns. In the ANN-CA simulation, the new state of a cell is determined by its current state and by changes in its neighborhood cells. The simulation takes raster data as inputs, including the LULC classes, the rasters of spatial parameters and the ANN-based transition potential model [74]. The potential changes are determined for each class, and the simulation creates a raster of the most likely transitions. It then examines a fixed number of pixels, selecting those with the greatest certainty for each transition, and adjusts their class accordingly. Further implementation details for the ANN-CA model are given in [67].
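The 0.9 threshold rule described above can be sketched as follows. The sketch assumes the per-class transition potentials are already available as an array with one layer per target class. The function name and toy values are hypothetical, and the fixed-pixel quota step of the simulation is not modeled.

```python
import numpy as np


def apply_threshold(current, potentials, threshold=0.9):
    """Move a cell to its most likely class only if that potential exceeds the threshold.

    current    -- (rows, cols) integer array of current LULC class codes
    potentials -- (n_classes, rows, cols) transition potential for each target class
    """
    best_class = potentials.argmax(axis=0)  # most likely target class per cell
    best_value = potentials.max(axis=0)     # transition potential of that class
    return np.where(best_value > threshold, best_class, current)


# Toy 2 x 2 map with three classes; potentials[k] holds the potential for class k.
current = np.array([[1, 1], [0, 0]])
potentials = np.array([
    [[0.95, 0.10], [0.20, 0.50]],
    [[0.02, 0.85], [0.10, 0.30]],
    [[0.03, 0.05], [0.92, 0.20]],
])
new_state = apply_threshold(current, potentials)
# -> [[0, 1], [2, 0]]: only the two cells with a potential above 0.9 change class
```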

Cellular Automata (CA) LULC Change Prediction Model
CA can model proximity influences, which are an essential spatial element reflecting the dynamics of land-use change. The basic principle of CA is that past LULC patterns affect future development through local interactions, which collectively constitute the global growth patterns within a region [75]. The CA simulation assumes that a cell is more likely to transition to another category if its neighboring cells belong to, or are influenced by, that category [76]. Because CA is discrete in space, time and state, it can carry out complex spatiotemporal simulations [77].
A CA model consists of grid cells, a cell space, neighborhoods, transition rules and time steps. Each cell has an internal state taken from a finite set of values, and all cell states are updated simultaneously according to the transition rule applied to each cell's neighborhood. Every cell corresponds to a pixel on the area map, and each land-use category is represented by a cell state. The state $S^{t+1}$ of a cell is determined by the states $S^{t}$ of the cell and its neighboring cells, and the transition rules then determine the change in the cell state. These changes are derived from suitability maps that represent the potential of a cell to change from one state to another. In general, the CA model describes how a cell's interactions and state changes affect the spatiotemporal pattern of its neighboring cells. The cell neighborhood is defined by a filter in which the closer a neighbor is to the central cell, the higher its influence weight. In the CA implementation, a 5 × 5 pixel contiguity filter at a spatial resolution of 30 m × 30 m is adopted, as suggested by [55]. Combining these weights with the transition probabilities gives the next potential state of the adjacent cells. The transition model is expressed as in Equation (9):
$$S_{i,j}^{t+1} = f\left(S_{i,j}^{t},\ \Omega_{i,j}^{t},\ V\right) \tag{9}$$
where t and t + 1 are the beginning and end of the simulation; $S_{i,j}^{t+1}$ and $S_{i,j}^{t}$ are the states of the cell in row i and column j at times t + 1 and t, respectively; $\Omega_{i,j}^{t}$ is the state of the neighbors of that cell at time t; V is the set of suitability factors; and f is the transition function, taken either as the sum [78] or as the product of all the terms [79,80].
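One synchronous update of the kind expressed by Equation (9) can be sketched as below, under three assumptions:

- The 5 × 5 filter uses inverse-distance weights with the central cell excluded. The text only states that closer neighbors weigh more.
- f is taken as the product form [79,80], i.e. neighborhood effect × suitability.
- A cell keeps its current state when no class receives any support.

Function names and the toy grid are hypothetical.

```python
import numpy as np


def distance_decay_kernel(size=5):
    """size x size neighbor weights that fall off with distance; the central cell gets 0."""
    r = size // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    dist = np.hypot(yy, xx)
    kernel = np.zeros((size, size))
    kernel[dist > 0] = 1.0 / dist[dist > 0]  # inverse-distance weighting (assumed)
    return kernel / kernel.sum()


def neighborhood_effect(lulc, n_classes, kernel):
    """Omega: distance-weighted share of each class around every cell (edges replicated)."""
    r = kernel.shape[0] // 2
    rows, cols = lulc.shape
    padded = np.pad(lulc, r, mode="edge")
    omega = np.zeros((n_classes, rows, cols))
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            window = padded[dy:dy + rows, dx:dx + cols]  # neighbor at offset (dy - r, dx - r)
            for k in range(n_classes):
                omega[k] += kernel[dy, dx] * (window == k)
    return omega


def ca_step(lulc, suitability, kernel):
    """One synchronous update S^{t+1} = f(S^t, Omega^t, V), with f as a product of terms."""
    omega = neighborhood_effect(lulc, suitability.shape[0], kernel)
    score = omega * suitability  # suitability[k] plays the role of V for class k
    best = score.argmax(axis=0)
    return np.where(score.max(axis=0) > 0, best, lulc)  # no support at all: keep S^t


lulc = np.zeros((7, 7), dtype=int)
lulc[3, 3] = 1                    # an isolated class-1 cell
suitability = np.ones((2, 7, 7))  # equal suitability for both classes
next_state = ca_step(lulc, suitability, distance_decay_kernel(5))
# The isolated cell is absorbed by its class-0 neighborhood.
```

The additive alternative [78] would replace `omega * suitability` with a (possibly weighted) sum of the two terms.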

Random Forest Regression for LULC Prediction
With multiple predictive factors, the uncertainty of the response variable changes with the LULC type and its intensity, which implies that the interactions between the LULC classes and the driving factors tend to be nonlinear. RFR is a nonlinear model suited to the regression of datasets that do not exhibit linearity. It can capture the existing and potential interactions between the type and intensity of an LULC class and the influencing (driving) factors [81]. The proposed RFR-based LULC change simulation and prediction model first extracts the multitemporal class area changes and derives from them the transition probability matrices for successive years. Transition potential maps are then generated and combined with the transition probabilities to simulate future LULC scenarios.
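The text does not specify how the transition probability matrices are derived from successive maps. A common approach, assumed here, is to cross-tabulate two co-registered classified maps and normalize each row, so that entry [i, j] is the probability that a cell of class i at the first date is class j at the second. This is the same kind of matrix reported for the study periods in Table 3. The function name and toy maps are hypothetical.

```python
import numpy as np


def transition_matrix(map_t0, map_t1, n_classes):
    """Row-normalized cross-tabulation: entry [i, j] = P(class j at t1 | class i at t0)."""
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (map_t0.ravel(), map_t1.ravel()), 1)  # count every (from, to) pair
    row_totals = counts.sum(axis=1, keepdims=True)
    # Classes absent at t0 keep an all-zero row instead of dividing by zero.
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)


map_t0 = np.array([[0, 0], [1, 1]])
map_t1 = np.array([[0, 1], [1, 1]])
P = transition_matrix(map_t0, map_t1, n_classes=3)
# Row 0: half of the class-0 cells stayed, half became class 1; row 2 is empty.
```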
LULC modeling with the RFR machine learning technique can be regarded as a classification problem [82] and formally represented as follows. Let $C = \{c_1, c_2, \ldots, c_k\}$ be a set of k LULC classes represented in grids. Each grid cell at time point t can then be represented as a row vector $x^t = \{x_1^t, \ldots, x_i^t, \ldots, x_n^t\}$, where $x_i^t$ is the ith spatial attribute value assigned to the cell. Let $y^t \in C$ be the LULC class of $x^t$ at time t. A prediction function $f_p: x^t \rightarrow y^{t+1}$ can then be applied to every $x^t$ on the study area grid, such that $f_p(x^t) = y^{t+1}$ holds whenever the cell for $x^t$ changes its land use to the class $y^{t+1}$. The model constraint is that each cell can belong to only one class at any time t. The transition function $f_p$ maps the grid at time t to the LULC classes at the next time t + 1. The machine learning model learns a function $f_p'$ that approximates the unknown $f_p$ from a training set in which all attribute values at time t and land-use classes at time t + 1 are known a priori.
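Because the text frames learning $f_p'$ as classification, the sketch below is a minimal random forest classifier written with numpy only. It shows the three ingredients of a random forest:

- each tree is grown on a bootstrap sample of cells;
- each split considers a random subset of the driving factors;
- the trees' predictions are combined by majority vote. A regression forest would average numeric outputs instead.

It is an illustration under these assumptions, not the study's implementation. Trees are depth-capped here for speed, whereas the text describes unpruned trees. All names and the synthetic data are hypothetical.

```python
import numpy as np

# Rows of X are driver vectors x^t; y holds non-negative integer class codes y^{t+1}.


def gini(y):
    """Gini impurity of an integer label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split(X, y, features):
    """Return (feature, threshold) with the lowest weighted child Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for thr in np.unique(X[:, f])[:-1]:  # keeps both children non-empty
            left = X[:, f] <= thr
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def grow_tree(X, y, rng, max_features, depth=0, max_depth=8):
    """Grow a tree, drawing a fresh random feature subset at every split."""
    features = rng.choice(X.shape[1], size=max_features, replace=False)
    split = best_split(X, y, features) if depth < max_depth else None
    if split is None:
        return np.bincount(y).argmax()  # leaf: majority class
    f, thr = split
    left = X[:, f] <= thr
    return (f, thr,
            grow_tree(X[left], y[left], rng, max_features, depth + 1, max_depth),
            grow_tree(X[~left], y[~left], rng, max_features, depth + 1, max_depth))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=15, seed=0):
    rng = np.random.default_rng(seed)
    max_features = max(1, int(np.sqrt(X.shape[1])))
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample of cells
        trees.append(grow_tree(X[idx], y[idx], rng, max_features))
    return trees


def predict_forest(trees, X):
    votes = np.array([[predict_tree(t, x) for t in trees] for x in X])
    return np.array([np.bincount(v).argmax() for v in votes])  # majority vote


# Synthetic demo: two normalized driving factors and a diagonal class boundary.
data_rng = np.random.default_rng(1)
X_train = data_rng.random((200, 2))
y_train = (X_train.sum(axis=1) > 1).astype(int)
X_test = data_rng.random((200, 2))
y_test = (X_test.sum(axis=1) > 1).astype(int)
trees = fit_forest(X_train, y_train)
accuracy = np.mean(predict_forest(trees, X_test) == y_test)
```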
For each time interval (t, t + 1), an initial dataset $I_{t,t+1}$ is created that contains, for each grid cell, the driver attributes $x^t$ and the LULC class $y^{t+1}$ at time t + 1. To learn the predictive function $f_p'$, a training set $\{(x^t, y^{t+1})_i\}$, $i = 1, 2, \ldots, n$, where n is the number of training samples, is constructed from the initial dataset $I_{t,t+1}$. Here $x_i^t$ represents the grid cell at time t and $y_i^{t+1}$ the corresponding LULC class at time t + 1. The trained model is tested on an independent test set $\{(x^{t+1}, y^{t+2})_i\}$, $i = 1, 2, \ldots, m$, where m is the number of test samples, derived from the corresponding initial dataset $I_{t+1,t+2}$. Three time points are therefore required: (t, t + 1) for training and (t + 1, t + 2) for testing the proposed RFR prediction model.
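Building the training and test sets from three epochs can be sketched as below. The sketch assumes co-registered rasters stored as numpy arrays: driver rasters of shape (n_factors, rows, cols) and classified maps of shape (rows, cols). It samples training pixels at random and keeps the full grid of the next interval as the test set. Function names and the synthetic grid are hypothetical.

```python
import numpy as np


def pairs_for_interval(drivers_t, lulc_next):
    """Flatten one interval (t, t+1) into rows (x^t, y^{t+1}), one row per grid cell."""
    X = drivers_t.reshape(drivers_t.shape[0], -1).T  # (cells, n_factors)
    y = lulc_next.ravel()                            # class code of each cell at t+1
    return X, y


def train_test_from_epochs(drivers_t0, lulc_t1, drivers_t1, lulc_t2, n_train, seed=0):
    """Random training sample from I_{t,t+1}; full-grid test set from I_{t+1,t+2}."""
    X_all, y_all = pairs_for_interval(drivers_t0, lulc_t1)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y_all), size=n_train, replace=False)  # randomly sampled pixels
    X_test, y_test = pairs_for_interval(drivers_t1, lulc_t2)
    return (X_all[idx], y_all[idx]), (X_test, y_test)


# Synthetic 4 x 5 grid with three driving factors and four LULC classes.
rng = np.random.default_rng(0)
rows, cols, n_factors = 4, 5, 3
drivers_t0 = rng.random((n_factors, rows, cols))
drivers_t1 = rng.random((n_factors, rows, cols))
lulc_t1 = rng.integers(0, 4, size=(rows, cols))
lulc_t2 = rng.integers(0, 4, size=(rows, cols))
(X_train, y_train), (X_test, y_test) = train_test_from_epochs(
    drivers_t0, lulc_t1, drivers_t1, lulc_t2, n_train=10)
```

Evaluation then amounts to comparing the fitted model's predictions on `X_test` with `y_test`, for example as the fraction of matching cells.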
In the RFR training process, each tree is trained on a bootstrap sample of the training data and learns from random subsets of the predictor variables. The trees are grown without pruning, and this step is repeated until the specified number of trees is reached. In the final step, the predictions of all trees are averaged to derive the predicted LULC types and intensity [83]. To predict the areal extents of the LULC classes using RFR, the class area (CA) for each classification year is computed according to Equation (10):
$$CA = C \times X \times Y \tag{10}$$
where CA is the area of the LULC class type (in square kilometers), C is the class cell count, and X and Y are the height and width of a class cell (expressed in kilometers).
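Equation (10) can be applied to all classes at once by counting the cells of each class code. The sketch below assumes 30 m cells, matching the stated spatial resolution, and converts the cell sides to kilometers so that the result is in km². The function name and toy map are hypothetical.

```python
import numpy as np


def class_areas_km2(lulc, n_classes, cell_height_m=30.0, cell_width_m=30.0):
    """CA = C * X * Y for every class, with X and Y converted from meters to kilometers."""
    counts = np.bincount(lulc.ravel(), minlength=n_classes)  # C: cell count per class
    return counts * (cell_height_m / 1000.0) * (cell_width_m / 1000.0)


areas = class_areas_km2(np.array([[0, 0, 1], [2, 2, 2]]), n_classes=3)
# One 30 m x 30 m cell covers 0.0009 km^2, so areas are approximately [0.0018, 0.0009, 0.0027]
```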
To evaluate the predictive power of the approximation, the model built in the previous step is tested with inputs $x^{t+1}$, and $f_p'(x^{t+1})$ is compared with the known $y^{t+2}$. Building and validating the RFR-LULC change prediction model therefore requires data on spatial attributes and land-use classes from three different time epochs.

Validation of LULC Prediction
The LR-CA, ANN-CA and RFR models are trained on two consecutive years (t, t + 1), and prediction performance is measured by comparing the predicted LULC with the classified LULC at t + 2. Based on the resulting difference statistics, the LULC predictions are validated using the percentage of correctness (PC) in Equation (11), which quantifies the agreement and disagreement between the simulated and reference LULC maps.
where $p_a$ is the proportion of observed agreements (actual accuracy), $p_e$ is the proportion of agreements expected by chance, and $p_i$ is the ideal accuracy (100%). The chance agreement is computed as $p_e = \sum_{i=1}^{c} p_{iT}\, p_{Ti}$. In this expression, $p_{ij}$ is the cell in the ith row and jth column of the contingency table, $p_{iT}$ is the sum of all cells in the ith row, $p_{Tj}$ is the sum of all cells in the jth column, and c is the number of raster categories.
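Equation (11) itself is not reproduced in the text. Its terms (observed agreement $p_a$, chance agreement $p_e$ and ideal accuracy $p_i$) match a Cohen's kappa-style statistic. The sketch below therefore assumes PC = 100 × (p_a − p_e)/(p_i − p_e), with $p_i$ = 1 (100%); if the paper's formula differs, only the return line would change. The function name and toy maps are hypothetical.

```python
import numpy as np


def percentage_correctness(reference, simulated, n_classes, p_ideal=1.0):
    """Kappa-style agreement between a reference and a simulated LULC map, in percent.

    Assumed form: 100 * (p_a - p_e) / (p_ideal - p_e).
    """
    table = np.zeros((n_classes, n_classes))
    np.add.at(table, (reference.ravel(), simulated.ravel()), 1)  # contingency counts
    p = table / table.sum()                      # proportions p_ij
    p_a = np.trace(p)                            # observed agreement
    p_e = np.sum(p.sum(axis=1) * p.sum(axis=0))  # chance agreement: sum_i p_iT * p_Ti
    return 100.0 * (p_a - p_e) / (p_ideal - p_e)


reference = np.array([0, 0, 1, 1])
simulated = np.array([0, 1, 1, 1])
pc = percentage_correctness(reference, simulated, n_classes=2)  # 50.0
```

In this toy case, 75% of cells agree, but 50% agreement would be expected by chance from the class totals alone. Hence only half of the achievable improvement over chance is realized.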

LULC Change Patterns in the Gaborone Dam Catchment
Figure 5 presents the classified LULC maps for the eight study years obtained using the RF classifier. As already stated above, the mapping adopted the EU Copernicus Global Land Cover classification scheme (https://lcviewer.vito.be/2019, accessed on 10 August 2023). The summary accuracy of the LULC classification results is presented in Table 2. The overall accuracy (OA) is above 81% for all years, with the highest OA of 89.6% in 2004 and the lowest OA of 81.3% in 2009. The corresponding kappa coefficients range from 0.74 to 0.84 (Table 2). The 2019 LULC classification was also compared with the EU Copernicus Global Land Cover map for the same year and showed a high degree of visual agreement across the classes. The resulting LULC change trends in the catchment are presented in Figure 6. Shrubland, tree cover and cropland are the dominant LULC classes in terms of area coverage (Figure 6a). Tree cover declined from 1673.70 km² to 752.46 km² during the 1989-1999 period. This decline could be attributed to the conversion of forest cover into shrubland, mainly driven by urban development and agricultural activities; as a consequence, shrubland, bare soil and built-up areas increased. Grassland decreased from 207.84 km² in 1986 to 167.24 km² in 1999, followed by a surge in the 2004-2009 period. Cropland increased from 552.97 km² in 1986 to 639.36 km² in 1989 but decreased from 584.81 km² to 451.62 km² between 1994 and 1999. From 2004 to 2019, tree cover increased in area following its previous decline, while shrubland and grassland decreased and bare soil and built-up areas rose. Water bodies showed a marginal decline in area, followed by an increase from 7.66 km² in 2014 to 17.88 km² in 2019.
A gradual but steady growth is observed for the built-up areas throughout the 30-year period, from approximately 20 km² (1986) to 173 km² (2019). Particularly rapid growth in the built-up class was observed from 1999 to 2009, when its area increased from 70.25 km² to 172.95 km². This increase was, however, at the expense of shrubland, which decreased considerably. The area covered by surface water increased from 13.39 km² in 1986 to 21.35 km² in 1989, after which a steady decline was observed, mainly attributed to climate variations. The corresponding class areal gains and losses are presented in Figure 6b. Over the 30-year period there were net losses for tree cover (−10.7%), cropland (−0.6%) and water bodies (−0.3%), and net gains for shrubland/grassland (+6.7%), bare soil (+1.3%) and built-up areas (+3.4%). Some of the observed changes, especially in water and natural vegetation cover, are attributable not only to anthropogenic activities but also to climatic conditions. Because of the high correlation between the grassland and shrubland covers, the two classes are combined into a single shrubland/grassland (or simply shrubland) class for further analysis.

LULC Class Transition Analysis
The transition probabilities for the different LULC classes between 1986 and 2019 are presented in Table 3. In the early years, when the built-up area was minimal, built-up land in 1986-1989 was likely to be converted to shrubland or grassland, with a transition probability of >50%. The same holds for the 1989-1994 and 1994-1999 periods, in which the built-up class was unlikely to remain unchanged (<35% probability). From 1999 onwards, however, the built-up class resisted change to other land-use classes, with a >40% probability of remaining unchanged. The water class consistently remained unchanged throughout the study period. This is especially evident for 1986-1989, 1994-1999, 2004-2009 and 2014-2019, with probabilities of remaining water of 87%, 82%, 96% and 93%, respectively, and hence minimal probabilities of transition to other classes. Similarly, the shrub/grassland class steadily remained unchanged throughout the study period, with a probability of >63%. The exception was bare soil, with a 32-63% conversion probability. The cropland class showed a high likelihood of conversion to another land-use class from 1994 onwards. Its probability of remaining cropland was <43%, and its probability of conversion to shrub/grassland was >44%. Tree cover showed moderate resistance to conversion to other land-use classes, especially during 1986-1989 and 1989-1994, with corresponding probabilities of remaining unchanged of 57% and 63%. Conversely, from 1994 to 1999, tree cover was likely to be converted to shrub/grassland, with a transition probability of 50%. Nonetheless, tree cover remained unchanged from 1999 onwards, with a probability of >60% of remaining in the same class.

Calibration of LULC Transition Potential
The LULC calibrations were carried out for 1994, 1999, 2004, 2009, 2014 and 2019 to evaluate the performance of the prediction algorithms, and the average percentage of correctness was computed between the classified and predicted LULC transition maps. The LR prediction results (Figure 7) show a fair correlation with the reference classified LULC in terms of the areas covered by the vegetation classes, with shrubland being the most underestimated vegetation cover in 1999. LR, however, underestimated the built-up area except in 1994 and 2019. The most accurate LR built-up prediction was recorded in 2019, at approximately 167 km² compared with the classified area of 172.95 km², an error of 3.2% (5.51 km²). LR predicted an increase of 82 km² in cropland from 1994 to 1999, followed by steady decreases of −37 km² from 2004 to 2009 and −35 km² from 2014 to 2019. Bare soil had mixed prediction results, with overestimation in 1994, 2004, 2014 and 2019 and underestimation in 1999 and 2009.
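As an arithmetic check on the 2019 built-up figures, the reported 3.2% is the 5.51 km² absolute error relative to the classified area. The implied LR prediction of 167.44 km² is a derived value, consistent with the rounded 167 km² quoted above:

```latex
\text{relative error} = \frac{5.51\ \text{km}^2}{172.95\ \text{km}^2} \times 100\% \approx 3.19\% \approx 3.2\%,
\qquad
172.95\ \text{km}^2 - 5.51\ \text{km}^2 = 167.44\ \text{km}^2 \approx 167\ \text{km}^2
```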
Figure 8 presents the areas covered by the LULC classes from 1994 to 2014 as predicted by the ANN. For 1999, 2009 and 2019, there is a considerable difference between the classified and predicted shrubland areas. This could be attributed to fluctuations in the areas covered by the vegetation classes, which affected the learning of the prediction model. The built-up area increases consistently over the years in both the classified and predicted outputs, although with under- and overestimations in different years. The water class is the most accurately predicted class, as the differences between its classified and predicted areas are minimal.
Compared with the LR and ANN validation results, the RFR predictions (Figure 9) are closer to the classified LULC. With the exception of shrubland, which RFR overestimated in every year except 2019, the predicted classes differ only marginally from the classified LULC classes. The average yearly prediction accuracies of the LR, ANN and RFR models are presented in Figure 10. The difference between LR (PC = 60.7%) and ANN (PC = 62.1%) is marginal, whereas the average percentage of correctness for RFR is 84.9%.

LULC Prediction for 2025 and 2030
The LULC prediction results for each class using the LR-CA, ANN-CA and RFR models are summarized in Figure 11 and Table 4 for 2019-2030. All three models predict that water bodies will remain nearly constant in area. LR-CA and ANN-CA predict a minor decrease of −0.01% for 2019-2025 and no change for 2025-2030, while RFR predicts a total increase of +0.05% over 2019-2030. LR-CA and ANN-CA predict nearly the same change in built-up area, +0.14% between 2019 and 2030, an increase from 172.95 km² to 179.23 km². RFR, however, predicts a net increase of +2.6% over the same period, with a higher growth of +1.63% from 2025 to 2030. This raises the urban built-up share from 3.98% (2019) to 6.56% (2030). Tree cover remains nearly constant at 31.41% in the LR-CA and ANN-CA predictions for both periods. RFR instead predicts a net decrease of −0.68% over the two periods, which may be explained by the expansion of built-up area within the catchment. For shrubland/grassland, only LR-CA predicts a net increase (+0.03%) from 2019 to 2030, whereas ANN-CA and RFR predict decreases of −0.19% and −8.91%, respectively. The decrease in tree cover, shrubland/grassland and cropland may be explained not only by the increase in built-up area but also by the 8.79% increase in bare soil cover, most of which (8.65%) occurs from 2025 to 2030.
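The percentages above appear to be shares of the total catchment area, so a change such as +0.14% is a change in percentage points. Under that assumption, the 2019 built-up figures (172.95 km² = 3.98%) imply a catchment area of roughly 4345 km², which allows shares to be converted back to areas. This total is derived, not reported in the text, and inherits the rounding of its inputs. The helper names are hypothetical.

```python
def implied_catchment_area_km2(class_area_km2, class_share_pct):
    """Total area implied by a class's area and its percentage share of the catchment."""
    return class_area_km2 / (class_share_pct / 100.0)


def share_to_area_km2(share_pct, catchment_km2):
    """Convert a percentage share of the catchment into km2."""
    return share_pct / 100.0 * catchment_km2


# Inputs are the rounded figures quoted in the text, so outputs are approximate.
catchment = implied_catchment_area_km2(172.95, 3.98)       # about 4345 km2
built_up_2030_rfr = share_to_area_km2(6.56, catchment)     # RFR 2030 built-up, about 285 km2
lr_ann_change_pct = 100.0 * (179.23 - 172.95) / catchment  # about 0.14 percentage points
```

The last line reproduces the +0.14% LR-CA/ANN-CA built-up change from the quoted areas, which supports reading the table values as shares of the catchment.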

The RFR-predicted LULC maps for 2025 and 2030 are shown in Figure 12, as RFR yielded the best validation and prediction results compared with LR-CA and ANN-CA (Figure 10). A visual comparison of the 2025 and 2030 predictions shows that most of the increase in built-up area occurs in the southeastern parts of the catchment. These regions are also close to forest tree cover and are therefore more suitable for settlement given the semiarid and arid climatic conditions within the catchment. The forest cover is observed to be largely conserved, as is the dam water body.
Sustainability 2024, 16, x FOR PEER REVIEW 21 of 31

Discussion
Comparison of the LULC Prediction Models
Predicting future LULC changes is important for urban and regional planners, hydrologists, water policymakers and environmentalists, among others, when formulating appropriate policies for future development and sustainability. Spatiotemporal LULC changes in catchments are therefore modeled to detect the consequences of long-term interactions between humans and the environment [84], thus contributing to sustainable development [2]. Location-based models are commonly used for this purpose [85] to simulate LULC dynamics.
Cellular automata are the most widely used location-based models, especially for simulating the spatiotemporal evolution of LULC and urban expansion [86]. Their simplicity and explicit representation of LULC changes [87] make CA a standard model for simulating such dynamics. However, because the CA model assumes that past changes drive future transitions [75], its ability to realistically simulate the complex evolution of LULC over time is limited. In addition, the transition rule determines the state of a cell over time and is an essential function of the CA model [86], yet it varies with geographical region and neighborhood interactions. Integrating CA with other statistical and geospatial models can enhance its predictive ability [3,75]. Methods commonly coupled with CA include the LR model [88,89], Markov chain analysis (MCA) [90,91], agent-based models [92,93] and ANNs [2,86].
This study compared the hybrid statistical LR-CA and nonparametric ANN-CA models with a machine learning RFR approach for the simulation and prediction of LULC in a dam catchment. The models were validated using the percentage of correctness: ANN-CA (62.1%) performed slightly better than LR-CA, by 1.4 percentage points. The proposed RFR (84.9%) outperformed the hybrid models by about 23 to 24 percentage points. LR does not represent the effects of the LULC drivers [86] or spatial dependency [94]. The ANN algorithm, in contrast, is better able to address the spatial probability of change. Through weight adjustment and calibration, it can be trained to estimate the probability of occurrence from nonlinear functions, giving a more realistic projection of LULC changes [68,95]. Combining LR and ANN with CA is expected to increase LULC change prediction accuracy. However, the main drawback of CA is its inability to include trends from previous states and the driver variables responsible for LULC changes in the basin. This could contribute to the lower performance of the LR-CA and ANN-CA hybrid models.
By comparison, a study of Addis Ababa (Ethiopia) using MC-CA achieved an average validation accuracy of 87% for the three test years of 2005, 2011 and 2015 [96]. In the current research, coupling ANN with CA yielded a percentage of correctness of 62.1%. This difference arises because, unlike previous studies based on well-structured urban areas, the current study considered an entire watershed, in which LULC and its driving factors may be more heterogeneous and complex. Aburas et al. [44] used the MC-CA model with AHP and frequency ratio (FR) to simulate urban growth in Seremban (Malaysia). Their models performed well in identifying important factors for urban growth, with similar accuracies of 88.1% and 88.2%. However, their subjective weighting of variables makes the results difficult to replicate when other experts weigh the driving forces [97]. Other studies applied physical and proximity drivers for LULC change simulation based on integrated models, including ANN-CA-MCA [98], ANN-CA [60,68,86] and CA-MCA [39,88,99].
Despite extensive efforts to improve LULC predictions, shortcomings remain owing to subjective methods of weighting the variables, such as the approach used in AHP. LR shows limitations when knowledge about the area of interest is insufficient or when not all aspects and variables affecting land-use change are covered. Ref. [100], however, implemented an LR model and achieved an accuracy of 81%. Compared with LR, ANN can be considered a largely unbiased tool, as it assigns weights derived with minimum prediction errors. It is therefore fair to say that the ANN approach reduces both inaccuracy and the possibility of expert bias. Ref. [68], for example, integrated an ANN with CA-MC, improving the accuracy from 86.3% (CA-MC) to 90%. This implies that integrating the ANN with CA-MC allows the model to capture the different variables and dynamics behind land transformations, which significantly improves the prediction capability of the CA-MC model.
Land-use change is influenced by multiple factors. The traditionally used empirical-statistical models (e.g., MC and regression models) and dynamic models (e.g., the CA model, agent-based models and system dynamics models) are capable of predicting future LULC; however, they fail to provide precise explanations of the impacts of the LULC change-driving factors and tend to either overestimate or underestimate LULC changes [47]. Hybrid or integrated models (e.g., MC-CA, MC-CA-ANN, LR-CA and the Conversion of Land Use and its Effects (CLUE) model), which combine elements of different modeling techniques, have thus been suggested to improve LULC change prediction. Nonetheless, owing to the complexity of land-use change, especially within catchments, most of the proposed hybrid models have limitations, including insufficient knowledge about the area of interest, subjectivity in weighting the variables and an inability to model nonlinear phenomena, which result in unreliable predictions [49].
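To make the hybrid idea concrete, the sketch below couples a logistic regression, which turns driving factors into a conversion probability, with a CA neighborhood term. It is a minimal, generic LR-CA step under stated assumptions (a binary urban/non-urban map, a 3x3 Moore neighborhood, a fixed number of cells to convert per step and synthetic drivers); the function `lr_ca_step` and all data are hypothetical and do not reproduce the configuration used in this study.

```python
import numpy as np
from scipy.ndimage import convolve
from sklearn.linear_model import LogisticRegression


def lr_ca_step(drivers, urban, lr_model, n_new):
    """One illustrative LR-CA iteration on a binary urban (1) / non-urban (0) grid."""
    rows, cols, n_drivers = drivers.shape
    # LR component: probability of being urban given the driving factors of each cell
    p_lr = lr_model.predict_proba(drivers.reshape(-1, n_drivers))[:, 1].reshape(rows, cols)
    # CA component: share of urban cells in the 3x3 Moore neighborhood
    counts = convolve(urban, np.ones((3, 3), dtype=int), mode="constant", cval=0)
    neighborhood = counts / 9.0
    # Combined transition potential; cells that are already urban are excluded
    potential = p_lr * neighborhood
    potential[urban == 1] = -np.inf
    # Convert the n_new cells with the highest potential (quantity set externally,
    # e.g., from a transition area matrix)
    best = np.argsort(potential, axis=None)[::-1][:n_new]
    new_urban = urban.copy()
    new_urban.flat[best] = 1
    return new_urban


rng = np.random.default_rng(0)
rows, cols = 20, 20
dist_roads = rng.random((rows, cols))  # hypothetical normalized driver
slope = rng.random((rows, cols))       # hypothetical normalized driver
drivers = np.stack([dist_roads, slope], axis=-1)

# Hypothetical training labels: gentle slopes close to roads tend to be urban
labels = (dist_roads + slope + 0.3 * rng.standard_normal((rows, cols)) < 0.8).astype(int)
lr_model = LogisticRegression().fit(drivers.reshape(-1, 2), labels.ravel())

urban = np.zeros((rows, cols), dtype=int)
urban[8:12, 8:12] = 1  # hypothetical existing urban core
urban_next = lr_ca_step(drivers, urban, lr_model, n_new=10)
print(urban_next.sum() - urban.sum())  # number of newly converted cells
```

Multiplying the LR probability by the neighborhood share means new urban cells can only appear next to existing ones, which is one simple way a CA imposes spatial contiguity on a purely statistical suitability surface.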
RFR can capture nonlinear relationships between factors and handle complex patterns and changes in land use efficiently, owing to its ability to model nonlinearities and to deal with missing or fuzzy data [44]. Thus, RFR can detect potential interdependencies among the driving forces. Moreover, the RFR model quantifies the effect of each driving factor used in the simulation and identifies which factors affect land change most, giving a clearer understanding of the land change process. RFR is also independent of the statistical distribution of the data and remains usable when statistics for specific variables are lacking [101]. The outcome of this study demonstrates the ability of RFR to be trained for prediction even with limited inputs and driving forces, thus allowing for the detection of potential interdependencies [49].
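The claim that RFR captures nonlinear effects and ranks the driving factors can be illustrated with scikit-learn's `RandomForestRegressor`. This is a minimal sketch on synthetic data: the factor names mirror the six drivers used in this study, but the values, the target function and the hyperparameters are assumptions chosen for illustration, not the study's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
factor_names = ["elevation", "slope", "aspect", "dist_water", "dist_roads", "dist_urban"]
n_samples = 2000
X = rng.random((n_samples, len(factor_names)))  # hypothetical normalized drivers

# Hypothetical nonlinear target (e.g., built-up intensity): decays with distance to
# urban areas, plus a road effect that only acts on gentle slopes (an interaction).
y = (np.exp(-5 * X[:, 5])
     + 0.5 * np.exp(-5 * X[:, 4]) * (X[:, 1] < 0.5)
     + 0.05 * rng.standard_normal(n_samples))

rfr = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rfr.fit(X, y)

# Impurity-based importances sum to 1 and rank the driving factors
importances = dict(zip(factor_names, rfr.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} {value:.3f}")
print(f"out-of-bag R^2: {rfr.oob_score_:.3f}")
```

Impurity-based importances can favor continuous, high-cardinality inputs; `sklearn.inspection.permutation_importance` is a common cross-check when the ranking of drivers is used for interpretation.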

Case Study Assessment
From the LULC prediction results for the case study, the largest net change is the decrease in natural vegetation cover, with forest cover predicted to change by −0.02% (LR and ANN) and by −0.68% (RFR). Shrubland/grassland is predicted to change by −0.19% using the ANN and by −8.91% using RFR. An equivalent gain is detected in bare soil, which shows the highest increase, ranging from +0.59% (LR and ANN) to +8.79% (RFR). Cropland area is also predicted to decrease during the prediction period, by −2.8% according to RFR and marginally by −0.77% according to the LR and ANN predictions. The RFR predictions of built-up area for 2019–2030 show a net increase of 2.49%. Compared with the magnitudes of decrease in vegetation cover and cropland, it can be inferred that most of the lost cropland is likely to be converted to built-up areas, while the vegetated surface is mainly converted to bare soil. The water body class is predicted to change marginally by −0.01% using the LR and ANN models and by +0.05% according to the RFR simulations. In contrast, the LR- and ANN-based predictions resulted in only a +0.15% increase in built-up area and a −0.77% decrease in cropland. The magnitudes and directions of the LULC predictions from LR and ANN are nearly equal; however, both tend to underestimate future LULC change. Conversely, while the RFR predictions represent likely future scenarios, the large magnitude of change within about a decade, especially for vegetation cover and bare soil, could signal minor overestimation, particularly given that the driving factors did not include climate and economic variables.
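The net changes discussed above are area balances between two maps. Because the catchment area is fixed, each class's net change equals its gains minus its losses, and the net changes sum to zero across classes, so a large loss in one class (here, vegetation) must be balanced by gains elsewhere (here, bare soil). A minimal sketch of this bookkeeping, assuming two integer-coded LULC rasters on the same grid; the function name, class codes and toy maps are hypothetical.

```python
import numpy as np


def transition_summary(map_t0, map_t1, n_classes):
    """Cross-tabulate two integer-coded LULC maps of the same grid."""
    a = np.asarray(map_t0).ravel()
    b = np.asarray(map_t1).ravel()
    # Transition area matrix: cells moving from class i (row) to class j (column)
    area = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(area, (a, b), 1)
    # Transition probability matrix: each row divided by that class's area at t0
    row_totals = area.sum(axis=1, keepdims=True)
    probs = np.divide(area, row_totals, out=np.zeros(area.shape), where=row_totals > 0)
    # Net change per class as a percentage of the total area (gains minus losses)
    net_pct = 100.0 * (area.sum(axis=0) - area.sum(axis=1)) / a.size
    return area, probs, net_pct


# Hypothetical class codes: 0 vegetation, 1 bare soil, 2 cropland, 3 built-up
t0 = np.array([[0, 0, 0, 2], [0, 0, 2, 2], [1, 0, 2, 3], [1, 1, 3, 3]])
t1 = np.array([[0, 1, 0, 3], [1, 0, 2, 3], [1, 1, 2, 3], [1, 1, 3, 3]])
area, probs, net_pct = transition_summary(t0, t1, n_classes=4)
print(net_pct)  # net change (%) per class; the values sum to zero
```

In this toy case, the off-diagonal entries of `area` show where each gain came from (vegetation to bare soil, cropland to built-up), which is the kind of evidence needed to support statements about which classes convert into which.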
The LULC prediction results indicate that areas adjacent to the main road networks were more prone to conversion to urban areas. This observation is also linked to distance to commercial land use, since high-intensity built-up areas are often located in the neighborhoods of the main road networks [102]. Indeed, LULC change modeling requires a deep understanding of the driving factors [86], historical changes [103] and environmental predictors [104]. Extensive research has modeled the effects of physical factors, including elevation, slope and aspect [86,98,105], on LULC changes, and several studies have explored the importance of proximity factors. Among these, distance to roads [94,106], distance to water bodies and rivers, and distance to cities [2,68] were found to be the most important factors. Understanding the changes and dynamics of LULC is therefore challenging and requires modeling techniques applied to spatiotemporal data [99]. Likewise, selecting the most suitable predictors, which capture the nature and structure of LULC and determine the pattern of changes, is essential in LULC change modeling.
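Proximity drivers such as distance to roads, water bodies or urban areas are typically derived as distance rasters from a feature mask. A minimal sketch using SciPy's Euclidean distance transform, assuming a boolean raster of road cells and square cells (30 m is assumed here, matching the nominal Landsat TM/ETM+ multispectral resolution); the helper name and toy grid are hypothetical, and the study's drivers may have been produced with a GIS tool instead.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def distance_to_feature(feature_mask, cell_size):
    """Euclidean distance (map units) from every cell to the nearest feature cell."""
    # distance_transform_edt measures the distance to the nearest zero-valued element,
    # so the feature cells (e.g., roads) are passed in as zeros.
    return distance_transform_edt(~np.asarray(feature_mask, dtype=bool), sampling=cell_size)


roads = np.zeros((5, 5), dtype=bool)
roads[:, 2] = True  # hypothetical north-south road along the middle column
dist_roads = distance_to_feature(roads, cell_size=30.0)  # assumed 30 m cells
print(dist_roads[0])  # distances of 60, 30, 0, 30 and 60 m along the first row
```

The same helper applies unchanged to water-body and urban-area masks, which keeps the three proximity drivers on a consistent grid and unit.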
To improve the current results, a more integrated model is still needed to overcome the limitations in simulating human behavior and policies in LULC change prediction. Future studies will therefore consider multiple-scenario simulations and predictions to appropriately address the uncertainty in land development problems relating to land-use policies [50]. Furthermore, socio-economic factors that affect the urban expansion process need to be included, since human behaviors and their interactions with natural and social changes also play critical roles in the dynamic process. Similarly, climate variables should be considered as a boundary condition for improving the prediction of natural land cover such as the vegetation and water classes. Although adding more variables is preferable, as it is expected to reduce errors, some of the data may not always be accessible. In addition, some models cannot incorporate nonlinear and qualitative variables. Lastly, for a given case study, local (focal) transition rules should be considered alongside the global transition rule of CA models, since different areas may be subject to different urban expansion policies and dynamics.

Insights into LULC Change and Land-Water Sustainability
In general, human activities such as urban development, agriculture and deforestation systematically lead to LULC changes, which in turn result in environmental changes that affect earth-atmosphere interactions and sustainable development [107]. The increasing population and related human activities, coupled with climate change, have progressively led to a decline in per capita water availability [108]. Water use is estimated to have increased six-fold over the past century and is predicted to continue rising at a rate of 1% per year [109]. Land-use development thus has direct effects on multiple environmental aspects that are interwoven with sustainable development [110,111]. In terms of water resources, LULC changes can affect different hydrological processes and responses [112]. Accurate quantification of the various hydrological factors under varied catchment conditions is therefore critical for the continuous and sustainable management of water resources.
Figure 13 shows the land-cover trends and dynamics within the Gaborone dam catchment derived from the LULC classification (1986–2019) and LULC prediction (2019–2030). The increase in population within the catchment, as reflected by the increase in built-up area, can be correlated with the decreases in shrub/grassland and cropland. Rapid population growth results in numerous environmental and social problems, such as disruption of economic development, poor livelihoods and law-and-order issues, limits on the available water supply and deterioration of its quality, and contributions to climate change [113]. The competing demands on land use, reflected in the gains and losses in Figure 13, can also lead to conflicts over water supply, for example, between companies and the resident population, as agricultural land is converted to residential use and/or natural habitats are converted to agricultural production. Such conflicts arising from different land-use demands and options have increased in most parts of the world in recent years [111,114]. Land use can thus be understood as the functional dimension of land for different human demands and goals, and its continuous monitoring is necessary for sustainable planning and development.
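A statement such as "can be correlated" can be quantified with a correlation coefficient between class-area time series. A minimal sketch with SciPy's `pearsonr`, assuming per-date class shares read from classified maps such as those summarized in Figure 13; the two series below are invented for illustration and are not values from this study.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical class shares (% of catchment area) at successive time steps;
# illustrative numbers only, not values from this study.
built_up = np.array([1.0, 1.4, 1.9, 2.3, 2.9, 3.4, 4.1, 4.5])
shrub_grass = np.array([60.2, 59.0, 58.1, 57.5, 56.0, 55.2, 54.1, 53.0])

r, p_value = pearsonr(built_up, shrub_grass)
print(f"Pearson r = {r:.2f}, p = {p_value:.2g}")
```

With only a handful of dates, and with both series driven by time, a strong coefficient indicates co-variation but does not by itself establish that one change causes the other.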

LULC changes through urbanization and deforestation have a direct influence on hydrological patterns such as infiltration, evaporation and runoff [115,116]. Figure 14 presents the relationship between the urban growth rate and the dam water capacity from 1986 to 2020. Assuming the dam relies on the catchment for its water supply, the trends show that the urban population, which has the highest demand for water, continues to grow, while the dam water capacity shows a general decline over time. The decreasing dam water surface area indicates increasing stress on dam water availability, which can also be attributed to decreasing precipitation combined with increasing consumption demands. While analyzing the impacts of land use on water resources is not the goal of this study, identifying the general trends of LULC change and their correlations with water resources will help the local management authorities to achieve a sustainable use of the available water resources. This implies that land, water and environmental managers should consistently quantify and analyze the LULC changes (Figure 13) within the catchment and how these changes relate to or influence dam water availability (Figure 14).
In recent decades, catchments have undergone significant transformations stemming from the interplay of socio-economic processes, demographic dynamics and climate/environmental shifts, which have reshaped the layout of sustainable development. There is therefore a need to understand the social-ecological system in order to strengthen strategies that ensure sustainable socio-economic benefits to local people while minimizing ecosystem degradation, allowing for the sustainable utilization and protection of the resource base. As such, targeted policies and instruments are needed for vulnerable regions such as catchments. Information on land use and land management at basin scale is important not only for managing the water resource but also for allocating water among an array of interrelated sectors (e.g., water supply, agriculture, energy production, ecosystems and forest management), as well as for basin environmental management and protection, the design and operation of water infrastructure, and broader issues related to water resource development, planning policy and governance. Therefore, accurate information on land use and its prediction is important not only for planning and managing land-use practices but also for determining how land-use evolution affects water availability and supply.
An integrated understanding of the temporal dynamics of the two resources is important for sustainable land management (SLM) and sustainable water management (SWM). To meet the UN sustainability goals for SLM and SWM, countries need to set specific land and water policies and targets within the scope of their relevant domestic SDG strategies [117]. Such policies will support sustainable environmental planning, monitoring, the identification of possible threats and the formulation of mitigation and adaptation strategies [8,9].
By comprehensively analyzing the existing interactions between land and water, different scenarios for a new land-water nexus can be developed to advance the relevant guiding targets set by UNEP, FAO, OECD and the World Energy Outlook. The nexus can enable the mapping of current and future hot spots of available resources, as well as their productivity across different sectors. From such "nexus maps", planners and managers can devise ways of managing and improving resource productivity [1]. Above all, suitable models are required to simulate future patterns, predict growth rates and accurately understand the cause-effect relationships of the forces driving LULC change within catchments.

Conclusions
Prediction of future LULC scenarios within hydrological units such as water catchments is important for effective land-use planning, sustainable management of land and water resources, and planning for climate adaptation and mitigation. This paper presents the results of the simulation and prediction of LULC change within the Gaborone dam catchment (Botswana), comparing two hybrid models, LR-CA and ANN-CA, with the proposed random forest regression (RFR) machine learning model for catchment land-use change modeling. For LULC change detection within the catchment, land-use classification of Landsat TM and ETM+ imagery was carried out for 1986–2019 using the random forest classifier. In generating the LULC change transition area matrix and the corresponding transition probability matrix, RFR outperformed ANN-CA and LR-CA, achieving higher accuracy in the validation of the simulated LULC for previous years, with accuracies of 84.9%, 62.1% and 60.7%, respectively. For the prediction of the LULC change scenarios in the catchment for the 2019–2025 and 2025–2030 periods, physiographic factors (elevation, slope and aspect) and proximity-neighborhood variables (distances to water bodies, roads and urban areas) were integrated into the models as land-use change drivers. From the prediction results, RFR detected a net increase of 2.58% in built-up area over 2019–2030, comprising 0.95% for 2019–2025 and 1.63% for 2025–2030, while for the same period, LR-CA and ANN-CA predicted net increases in built-up area at nearly the same rates of 0.15% and 0.14%, respectively. RFR predicted bare soil cover to increase by 8.9%, which is nearly equivalent to the combined predicted decrease in tree cover and grassland of 9.5%. ANN-CA and LR-CA detected marginal decreases in vegetation cover (−0.02%) and increases in bare soil cover of 0.55%, while cropland was predicted to have a net decrease in percent cover by LR-CA (−0.76%), ANN-CA (−0.77%) and RFR (−2.84%). The study results show that the proposed RFR is suitable for the prediction of LULC patterns in complex catchment environments, as it can capture the potential interactions between the type and intensity of LULC classes and their interactions with the driving parameters. Given that improving land-use management and efficiency is a necessary step towards achieving specific SDGs (e.g., SDG 11 and SDG 15), the approach and results of this study hold significant potential for informed policy formulation and decision making for sustainable catchment monitoring and development, as they provide accurate spatial, quantitative and qualitative LULC change scenarios and their relationships with dam water availability at the catchment scale. The demonstrated relationship between built-up land use and dam water is important in addressing the cross-sectoral links between resources for increased and efficient monitoring and exploitation of resources at local scales. Overall, the findings of this study can help environmental and urban planners and policymakers to better understand the nature and potential patterns of land-use change within the catchment and the critical factors that drive the LULC change processes. This can support the formulation of more appropriate and applicable policies and strategies for mitigating any impacts of the LULC transformations. Beyond the land-water nexus, further research is recommended on the impacts of LULC changes on water quality and quantity and on the integration of micro-climate changes within the catchment.

Figure 1. Location map of the transboundary Limpopo River Basin (LRB), Botswana's LRB (BLRB) and the Gaborone dam catchment image from Landsat 8 data overlaid with Notwane River.
(d) Distance to water bodies; (e) Distance to roads; (f) Distance to urban areas.

Figure 2. LULC change driving factors for Gaborone dam catchment.

Figure 3. Schematic approach for LULC mapping and prediction using LR-CA, ANN-CA and RFR and land-water nexus analysis.
2.3.1. LULC Mapping Using Machine Learning with Multiple Input Features

Figure 4. RFC schematic model for LULC classification with multiple input features. The RFC model is trained for each year with different LULC training data.
… were computed and integrated into the classification database. The indices and textural features were included to enhance the separability of land-cover classes such as vegetation and bare soil and to improve the overall classification accuracy. Adopting the EU Copernicus Global Land Cover classification scheme (https://lcviewer.vito.be/2019 (accessed on 10 August 2023)), the classified classes included tree cover, shrubland (shrubs and savanna), grassland, cropland, water (surface water: dams, rivers and ponds), built-up areas (buildings, roads


Here, $w_{ji}$ denotes the weights of the input values and $b_j$ is a bias value added to the summation of all inputs. $f$ is the function that determines how each cell in the grid will change based on the neighboring LULC classes. The form of $f$ depends on the characteristics of the training data and on the suitability of the ANN model structure for determining the transition probability of LULC change, which is obtained from multiple output neurons when simulating LULC changes within the ANN-CA structure.
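The weighted sum described above can be sketched as a small feed-forward pass. This is a minimal illustration, assuming one hidden layer with a sigmoid activation and a softmax output layer with one neuron per LULC class; the layer sizes and random weights are hypothetical (in practice the weights are learned from training data), and this is not necessarily the exact network structure used for ANN-CA in this study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift by the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def ann_transition_probabilities(x, w_hidden, b_hidden, w_out, b_out):
    """x: (n_cells, n_inputs) -> (n_cells, n_classes) class probabilities per cell."""
    # Hidden neuron j: net_j = sum_i w_ji * x_i + b_j, passed through a sigmoid
    hidden = sigmoid(x @ w_hidden.T + b_hidden)
    # One output neuron per LULC class; softmax makes each row sum to 1
    return softmax(hidden @ w_out.T + b_out)


rng = np.random.default_rng(1)
n_inputs, n_hidden, n_classes = 6, 8, 7  # hypothetical sizes
x = rng.random((4, n_inputs))             # 4 cells with normalized driver values
probs = ann_transition_probabilities(
    x,
    rng.normal(size=(n_hidden, n_inputs)), rng.normal(size=n_hidden),
    rng.normal(size=(n_classes, n_hidden)), rng.normal(size=n_classes),
)
print(probs.argmax(axis=1))  # most probable class for each cell
```

In an ANN-CA workflow, these per-cell probabilities would then be combined with neighborhood and quantity constraints before any cell actually changes class.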

Figure 5. LULC classified maps of the Gaborone dam catchment from 1986 to 2019.

Figure 6. Spatiotemporal variability of LULC in the Gaborone dam catchment: (a) class area coverage and (b) LULC gain and loss.

Figure 10. Average LULC prediction percent correctness using LR, ANN and RFR for Gaborone dam catchment.

Figure 13. Trends in LULC change in Gaborone dam catchment from 1986 to 2030.

Figure 14. Relationship between urban land-use development and water supply in the Gaborone dam catchment.

Table 1. Description of LULC change driving factors in the Gaborone dam catchment.

Table 2. Overall Accuracy (OA) and Kappa coefficients for LULC classification.

Table 3. Transition probability matrix for Gaborone dam catchment from 1986 to 2019.
Comparison of LR-predicted LULC and classified LULC.