Climate-Based Regionalization and Inclusion of Spectral Indices for Enhancing Transboundary Land-Use/Cover Classification Using Deep Learning and Machine Learning

Accurate land use and cover data are essential for effective land-use planning, hydrological modeling, and policy development. Since the Okavango Delta is a transboundary Ramsar site, managing natural resources within the Okavango Basin is undoubtedly a complex issue. It is often difficult to accurately map land use and cover using remote sensing in heterogeneous landscapes. This study investigates the combined value of climate-based regionalization and integration of spectral bands with spectral indices to enhance the accuracy of multi-temporal land use/cover classification using deep learning and machine learning approaches. Two experiments were set up, the first entailing the integration of spectral bands with spectral indices and the second involving the combined integration of spectral indices and climate-based regionalization based on Koppen–Geiger climate zones. Landsat 5 TM and Landsat 8 OLI images, machine learning classifiers (random forest and extreme gradient boosting), and deep learning (neural network and deep neural network) classifiers were used in this study. Supervised classification using a total of 5140 samples was conducted for the years 1996, 2004, 2013, and 2020. Average overall accuracy and Kappa coefficients were used to validate the results. The study found that the integration of spectral bands with indices improves the accuracy of land use/cover classification using machine learning and deep learning. Post-feature selection combinations yield higher accuracies in comparison to combinations of bands and indices. A combined integration of spectral indices with bands and climate-based regionalization did not significantly improve the accuracy of land use/cover classification consistently for all the classifiers (p < 0.05). However, post-feature selection combinations and climate-based regionalization significantly improved the accuracy for all classifiers investigated in this study. Findings of this study will improve the reliability of land use/cover monitoring in complex heterogeneous TDBs.


Introduction
Unsustainable utilization of natural resources across drainage basins globally threatens livelihoods and biodiversity. Changes in land use (defined as the function of surface cover) and land cover (defined as the natural and artificial material covering the earth surface) due to anthropogenic activities and climate change affect the supply and distribution of ecosystem services (ES) across basins [1,2]. The situation is more complex for transboundary basins (TDBs) because they provide ES to people across different nations. The exploitation of provisioning services by different nations within TDBs is often not equal due to variations in access to resources as a result of social and ecological structures [3,4]. Furthermore, nations sharing resources could be associated with different climate and biophysical systems, which intensify variation in the availability and distribution of ES and resources [5,6]. This is further compounded by differences in legal frameworks, cultural backgrounds, public attitudes, and historical environmental management practices, all of which contribute to discordant resource utilization in TDBs [7,8]. Remote sensing is central to producing land use/cover (LULC) information for effective land-use planning, environmental monitoring, hydrological modeling, climate change mitigation, and natural resource management of drainage basins [9][10][11][12]. However, the accuracy of LULC information is often an issue due to the complexity of TDBs [13,14]. It is therefore advantageous to continuously investigate robust approaches to improve the accuracy of LULC products to enhance the monitoring of basins.
Remote sensing has been instrumental for mapping LULC change since the 1970s because of its objectivity, cost-saving, and repetitive coverage over wide spatial and temporal scales [15][16][17]. Recently, there is growing availability of freely available satellite data products and improved classification techniques. Such developments provide a good environment to explore innovative mechanisms capable of improving the accuracy of LULC products even under complex and heterogeneous landscapes, such as TDBs [18][19][20]. One of the mechanisms to enhance LULC accuracy has been the integration of spectral bands with spectral indices [21][22][23]. While spectral indices can improve LULC classification accuracy, categories of spectral indices and their contributions to LULC classification vary. The common categories include ratio-based (RBS) indices that are based on the ratio between a pair of spectral bands [19] and orthogonal spectral (OS) indices that are based on the existence of a hyperplane in spectral space in which bare soils of varying brightness will lie with vegetation, increasing along the hyperplane [24,25]. OS indices have been reported to perform better than RBS indices in previous studies [22]. While research on the integration of spectral bands with indices in heterogeneous urban landscapes reported great potential in discriminating features with improved accuracy compared to those based on spectral bands only [23], the integration of spectral bands with spectral indices generally results in large datasets (big data). This is often associated with an increase in feature dimensionality that demands high computational power. Additionally, this results in problems of imbalances between training samples and features, which causes the so-called "curse of dimensionality" [26]. Obtaining large training samples to address the imbalances is often a costly challenge. Instead, the use of feature selection techniques could help address challenges associated with data sparsity.
Feature selection involves selecting a subset of important features to reduce data dimensionality for building robust learning models [27,28]. Common feature selection techniques include selection approach by a filter, semantic groups, wrappers, and embedded methods [29,30]. Filter feature selection technique is a pre-processing step that involves selecting a subset of features independent of the learning algorithm [31], whereas semantic feature selection involves the selection of features according to their type, e.g., multispectral bands, textural, topographic, and spectral indices [32]. The wrapper approach requires one predetermined learning model that interacts with the original feature set to identify the best feature subset [27]. Previous studies recommended the use of feature selection by wrappers, mainly because they involve interaction between the learning algorithm and feature subset search, which improves predictions [33,34]. Although the wrapper approach is computationally intensive, its performance has been reported to be better than other feature selection techniques in many studies [27,35,36]. Examples of wrappers include recursive feature elimination [27], sequential feature selection [37], and genetic algorithms [38]. Random forest-based recursive feature elimination (RF-RFE) has emerged as potentially more accurate and robust than other wrapper techniques [33]. The use of RFE coupled with segmentation (regionalization) of complex transboundary study sites could enhance LULC classification accuracy.
Much literature has reported on the regionalization of heterogeneous landscapes to reduce complexity and improve LULC classification accuracy [39,40]. However, a precise strategy to delineate regional boundaries when classifying LULC, particularly for transboundary catchments, has not received much attention. Manis et al. [36] used a mixed and phased approach to regionalize the southwestern part of the United States. Their approach involved participatory collaborations with representatives from different states, photo interpretation to identify major life zones from Landsat imagery, and the use of geological parameters. They observed that geological factors controlling vegetation will not always coincide with phenology and that human evaluation is often subjective and biased, especially when dealing with large areas. In contrast, Kassawmar et al. [12] successfully established the effectiveness of regionalization of a heterogeneous landscape based on a combination of biophysical, socio-economic, and spectral factors. However, their investigation was on a local scale (Ethiopian highlands). Assessing innovative strategies for regionalization of complex, expansive, and heterogeneous TDBs is therefore important for improved LULC mapping. This can be improved by the use of robust ML and deep learning (DL) techniques.
ML and DL classifiers are popular in remote sensing due to their ability to use known data (training samples) to classify large sets of imagery and to incorporate ancillary spatial data [28,41,42]. Contrary to traditional parametric classifiers, they possess the capacity to handle input variables that are not normally distributed [43]. The performance of DL and ML classifiers have proven to be better than conventional parametric classifiers when evaluating LULC change across different landscapes [41]. Deep learning classifiers are a group of algorithms structured around the neural network architecture [42]. While common deep learning techniques include the deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN), common ML classifiers for LULC mapping include the random forest (RF), extreme Gradient boosting (XGBoost), k-nearest neighbor (k-NN), classification and regression trees (CART), and support vector machine (SVM). Most studies have used ML and DL classifiers for LULC because of their robustness [44][45][46]. However, there has been a recent surge of studies that have reported DL classifiers to be superior. The performance of DL and ML classifiers vary with the complexity of the landscape, time of analysis, and type of spectral data [47]. For example, Abdi [44]compared the performances of RF, XGBoost, SVM, and DNN classifiers for land use/cover classification using the Sentinel-2 multispectral imagery data. They found that SVM yielded a high overall accuracy (OA), followed by XGBoost, RF, and DNN. On the other hand, Li et al. [46] evaluated the performance of DNN, RF, SVM, and artificial neural networks (ANN) for continent-wide landcover mapping. They established that the DNN performed better than other classifiers (RF, SVM, ANN, MLC), with OA of about 78.99, 76.03, 77.74, and 77.86, respectively. Although there is overwhelming evidence that the performance of ML and DL classifiers vary with landscape conditions, studies acknowledge that the potential to fully utilize remote sensing as a reliable source on LULC change is yet to be realized. The calibration of ML and DL classifiers using combinations of spectral bands and spectral indices in regionalized study sites could yield better LULC results.
To date, there exists a paucity of literature on the combined significance of the inclusion of spectral indices and climate-based regionalization in enhancing LULC classification accuracy. Most studies that included spectral indices when mapping LULC simply incorporated them [48][49][50]. However, there are presently few studies that conducted rigorous selection of the best combinations of spectral bands and indices prior to LULC classification. The value of Koppen Geiger's climate zones for improving the accuracy of LULC classification is also missing from the literature. Yet the pressures, limitations, and priorities that most natural resource managers face heavily rely on the availability of more reliable LULC information. Addressing these gaps could facilitate reliable and sustainable natural resource monitoring in complex and heterogeneous landscapes such as TDBs.
This study aimed to evaluate the significance of integrating spectral bands with indices and climate-based regionalization on the accuracy of LULC based on ML and DL classifiers. The specific objectives of this study were, firstly, to assess the value of integrating spectral bands with spectral indices in relation to the accuracy of land cover classification using ML and DL classifiers, secondly, to investigate the value of climate-based regionalization to improve the accuracy of LULC classification within the Okavango Basin, and thirdly, to assess the performance of ML and DL classifiers in climate-based regionalization and inclusion of spectral indices. The setting of the study is a complex heterogeneous transboundary basin, namely the Okavango Basin. The results of the study are interpreted in the context of streamlining a robust LULC under a complex transboundary environment. The methodology investigated here is envisaged to inform a firm ground for regular production of LULC products for modeling the impact of landscape change on the supply and distribution of natural resources within the Okavango Basin. State of the art ML and DL classifiers that are known to be robust in LULC classification are used in this study.

Study Site
The study is conducted in the Okavango drainage basin. The Okavango Basin is a unique endorheic (internally draining) transboundary drainage basin (TDB) that covers three countries, namely Angola, Namibia, and Botswana ( Figure 1). The area covers 224,894.64 km 2 and consists of three different climate systems; it is semi-arid in the southern part, monsoon in the central, and tropical in the northern part [39]. The average annual rainfall amount and distribution varies with the climate zone. However, it generally ranges between 500 to 1400 mm. High rainfall amounts occur in the northern zone, which falls within the subtropical highland zone (Cwb), and gradually decrease southwards, with low rainfall amount received in the southern part, which falls in the semi-arid zone (Bsh). As with rainfall, the average temperature varies widely as a factor of variation in topography and seasonality of each climate zone, and the annual average temperature is 20 °C [51,52]. The area is rich in biodiversity of flora and fauna, which varies in distribution as a factor of land use type [50]. The major river in this landscape is the Okavango River, which flows from the Angolan highlands through Namibia and disappears in the Kalahari Desert of Botswana, forming the pan-like shape of the acclaimed Okavango delta [53].
The Okavango Basin has experienced wars, droughts, floods, changes in land tenure, and harmful land-use practices (for example illegal logging, overgrazing, and intensive tillage) during the period between 1970 and 2020 [54][55][56]. The Angolan Civil War ended in August 2002, leading to a post-war rebound in population and anthropogenic activities [57].

Methods
The methods section provides a comprehensive description of procedures that were used in this study. Image acquisition and processing is described first (Section 2.2.1), followed by procedures used in processing spectral features for this analysis (Section 2.2.2). Descriptions of the collection of training and validation samples follow thereafter (Section 2.2.3), and the experimental design of the study is given in Section 2.2.4. The experimental design entails the integration of bands with spectral indices, the value of post-feature selection, and the regionalization of the study site based on Koppen zones to enhance LULC accuracy. These aspects were tested using state of the art ML and DL approaches.

Satellite Image Acquisition and Processing
Landsat 5 and Landsat 8 OLI images were used in this study. The images were sourced from the Google Earth Engine (GEE) platform and were pre-processed to Tier 1 surface reflectance. All the images were captured during the month of June for the years 1996, 2002, 2013, and 2020 (See details in Supplementary Materials Table S1). The temporal period was chosen for two main reasons: (1) to capture changes in LULC of the period during and after the Angolan civil war, and (2) the availability of cloud-free images. The month of June was chosen because of the consistent availability of a complete set of image tiles for the entire study area in the same month. Previous studies recommend the use of Sentinel 2 images over Landsat images [58]; however, this investigation used Landsat images since the chosen study period did not match the availability of Sentinel 2 images. The images were already geometrically and atmospherically corrected. Cloud masking is performed using the CFMASK algorithm to mask clouds and cloud shadows in GEE [54]. To minimize variations in the temporal and spatial information of the images before LULC classification, per band median composites for several images corresponding to different days in the month of June of each year were produced following previous studies [59][60][61]. Median composite images were exported from GEE for further analysis.

Spectral Features
The spectral features used in the analysis comprise a combination of seven bands (namely visible, near-infrared, short wave infrared, and thermal infrared), as well as RBS (8) and OS (4) indices. For Landsat 5 bands 1, 2, 3, 4, 5, 6, and 7 were used, while bands 2, 3, 4, 5, 6, 7 and 10 were considered for Landsat 8. While the thermal bands were originally captured with a resolution of 120 m and 100 m for Landsat 5 and 8, respectively, thermal bands available on GEE's Tier 1 surface reflectance collections are readily resampled using cubic convolution to a 30 m resolution. RBS indices were produced from ratios between pairs of spectral bands and OS indices were produced from the use of transformation coefficients on multiple spectral bands. Using pre-processed image composites, eight RBS indices and three OS indices were calculated in GEE. Although the literature is rich with spectral indices that aid in the discrimination of LULC classes, in this study, the commonly used RBS and OS indices were used to test their performances in enhancing LULC accuracy. The RBS and OS indices used in this study are given in Tables 1 and 2, respectively. According to Sturari et al. [62], the inclusion of an elevation layer in LULC analysis minimizes the effects of topographic heterogeneity, hence, the Shuttle Radar Topographic Mission (SRTM) Digital Elevation Model (DEM) with a 30 m resolution was included in this analysis [63]. All spectral features were exported from GEE for further analysis. Table 1. Ratio based spectral indices used in this study.

Name of Spectral Indices
Formulae References

Training and Validation Samples
Sample points include ground points sourced from organizations working in the Okavango basin and additional points generated through visual analysis of very-highresolution images. Ground points were sourced from the Okavango River Basin Water Commission (OKACOM) geodatabase and the National Geographic Okavango and Wilderness Project (NGOWP). OKACOM was established by the riparian states of Angola, Botswana, and Namibia to jointly manage the water resources of the Cubango-Okavango River Basin. The commission contracted GIS specialists to conduct social and hydrological surveys in the basin. The consultants used a random sampling technique to collect locations of settlements, water bodies, woodlands, shrubland, and wetlands. These were used to create samples for built-up, wetland, woodland, shrubland, and water classes. NGWOP conducted surveys to explore the least known and the most accessible areas in the basin. During their surveys, they navigated along river lines (as transects), taking geotagged images of riverine vegetation. Their data helped to generate samples for water, grasslands, woodlands, and wetlands. The total number of samples from these sources is 3420.
Additional samples were collected using Google Earth by visual interpretation of available Landsat and other high-resolution satellite imagery [71]. The stratified random sampling based on the 2009 GLOBCOVER (as a stratum) was used to generate additional samples [72]. To minimize sample imbalance per class, the minimum fifty sample rule per class as advocated by Foody and Mathur [73] was adopted in this study. To avoid the inclusion of points falling on areas that would have changed during the temporal period, training data was overlaid on high-resolution imagery in Google Earth Pro and the time slider was used to visually assess for consistency. Points that fell on areas with inconsistent LULC were not included in the analysis. A total of 5140 samples were generated for eight LULC classes, namely bare land, built-up land, bushland, forest/woodland, grassland, cultivated land, water, and wetland, as summarized in Table 3. The LULC classes are based on the Food and Agricultural Organization (FAO) Landcover Classification System (LCCS) [74]. Table 3. A summary of the number of training and validation samples that were used in this study.

Experimental Design
The study was designed to assess the value of integrating spectral bands with spectral indices and the significance of climate-based regionalization on the accuracy of LULC classification in a complex heterogeneous landscape. Two experiments were set. The first experiment investigated the value of the inclusion of spectral indices and feature selection, and the second assessed the value of climate-based regionalization on the accuracy of LULC classification in the Okavango Basin The study was conducted using R-statistics. The workflow of the study design is depicted in Figure 3. The following sections give a detailed description of the methods.

Inclusion of Spectral Indices and Feature Selection
The analysis was run using combinations of seven spectral bands, eight RBS indices, and three OS indices. The first analysis was run using spectral bands only, the second using a combination of all spectral bands and spectral indices, and the third using a combination of spectral bands and spectral indices following a feature selection. Random forest-based recursive feature elimination (RF-RFE) was used for feature selection following recommendations by [33]. RF-RFE uses the provided input features and the random forest classifier to select the best combination of features based on feature importance [28,46]. To make the best combination of features, the RF-RFE iterates over various feature combinations through a repeated 10-cross validation and eliminates the least important features until the most parsimonious model is identified [75]. In this study, the RF-RFE is tuned to repeatedly iterate 30 times over 20 features (bands and spectral indices) based on the caret package in R statistical software.

Climate Based Study Area Regionalization
To evaluate the impact of climate-based regionalization on the accuracy of classification methods, the analysis was conducted first using the whole study area and thereafter based on the Koppen climate regions. Studies have reported that the performance of classifiers varies with space and time [76,77]. The Okavango Basin consists of the following climate zones, Cwa, Cwb, and Bsh Koppen zones ( Figure 4).

LULC Classification Using Deep Learning and Machine Learning
Non-parametric DL and ML classifiers were used in this study. Two state-of-the-art ML classifiers, namely RF and XGBoost, were implemented using the caret package, and DL classifiers, namely neural network (Nnet) and DNN, were implemented using the caret and H20 packages in R statistics, respectively.

Machine Learning Classifiers
The RF classifier uses tree bagging to form an ensemble of trees by searching random subspaces in the given features and then splitting the nodes by minimizing the correlation between the trees [78]. The RF classifier has been widely used in landcover mapping [79], [80,81]. Previous studies reported varying performance of the RF in different landscapes [82]. However, most studies claim that it is robust to overfitting and produces better accuracies with high efficiency when working with high dimensional data [78]. The major input parameters of the RF are the number of trees at each split (ntree) and the number of variables randomly sampled as candidates at each split (mtry).
The Xgboost algorithm is based on the boosting ensemble technique [76]. It builds on decision trees of weak learners that are combined into strong learners through an iterative process of learning from an ensemble of trees built on subsets of data. The models are weighted based on their performance and the ensemble model is built based on the weighted sum of the base layers [77]. Xgboost has been used in LULC mapping of numerous studies, in which it has outperformed the benchmark ML classifiers such as the RF and SVM [79,83]. The major parameters of the Xgboost include the maximum number of iterations (nround), maximum depth of a tree (max_depth), learning rate (eta), the minimum relative improvement in squared error reduction for a split to happen (gamma), and the minimum number of rows to assign to the terminal nodes (nodesize).

Deep Learning Classifiers
DL classifiers are based on neural networks [80]. They are biologically inspired algorithms that make predictions using a concept similar to an animal brain and its interconnections [84,85]. The basic structure of a neural network is a network of input layers that are connected to the output layer through hidden layers. This network of layers is responsible for transforming input data to output data with the help of activations and parameters. The weights on the nodes of each connection modify values at each neuron to determine how the input values are translated to output values. Neural networks require tuning, in which the tuning parameters and the number of layers involved make up the different types of DL models. In this study, the neural network (Nnet) algorithm and DNN are used.
The Nnet mimics a feed-forward neural network that uses the backpropagation algorithm for training coupled with one hidden layer [86]. To calibrate the Nnet algorithm, there are three important parameters required, namely the size, decay, and maxit, which control the number of neurons in the hidden layer, the weight decay, and the maximum number of iterations, respectively. Unlike the Nnet, the DNN uses a multi-layered feedforward neural network which comprises more than three hidden layers [44]. Ideally, increasing the number of hidden layers and neurons increases the potential to make predictions in complex situations [87]. The key parameters for DNN include the activation function (activation), number of hidden layers (hidden), size of each hidden layer (number of neurons per hidden layer), and the number of times to iterate (epoch). DL classifiers have shown good results in previous landcover studies [88,89].

Parameter Tuning of DL and ML Classifiers
In this study, parameter tuning (hyper parameterization) was performed to select the optimal parameters for each classifier. A repeated k-fold cross-validation technique was used based on a randomized sampling of hyperparameters reported in previous studies [90,91]. The optimization procedure was followed based on descriptions provided by Abdi [44]. The RF, Xgboost and Nnet were run using the caret package [75] and the DNN was run using the H2O package [85] in the R statistical software environment version 3.4.2 [92]. All computations were run on a Windows machine with 16 GB RAM and a Core i7 CPU@ 2.40 GHz made by Dell in China. Table 4 summarizes the optimal parameters that were determined for each classifier. For classification, sample points were randomly split 50 times into training (70%) and validation (30%) [93,94]. The classification was first run based on the whole study area and thereafter on each Koppen climate region. The classifications were separately run using three different combinations of bands and spectral indices as inputs per study site for the years 1996, 2004, 2013, and 2020. Average accuracy measures (overall accuracy and Kappa) were calculated from the model runs for each year. The results were then summarized for each classifier.

Accuracy Assessments and Validation
From each model run, 30% of the samples were randomly chosen from the overall sample set for validation. The validation samples were used to calculate overall accuracy (OA) and the Kappa statistic (Kappa) evaluations metrices. The OA ranges from 0 to 100, where 0 represents the lowest accuracy and 100 the highest, whereas Kappa ranges from 0 to 1, with 0 representing the lowest accuracy and 1 the highest accuracy [89]. Average OA and Kappa were calculated from model results of the four time-steps, namely 1996, 2004, 2013, and 2020. This was repeated for different combinations of spectral bands and indices under different study sites. The classification outputs were visually validated with high-resolution images (See supplementary Figure S1). The proportions test (χ2 test) described by Agresti [90] was implemented in R software. The χ2 test was used to analyze the statistical difference in the performance of each classifier based on different combinations of spectral bands and spectral indices. A p < 0.05 was used as the critical level of significance.

Integration of Spectral Indices to Spectral Bands
The results for the inclusion of spectral indices are given in Figure 5. Integrating spectral indices to bands increases the accuracy of LULC classification. Feature selection from combinations of bands and spectral indices further improves accuracy.
DNN recorded the highest OA, followed by Xgboost, RF, and Nnet ( Figure 5). The average OA for inclusion of all spectral indices was 76.80, 84.35, 88.02, and 89.32 for Nnet, RF, Xgboost, and DNN, respectively. Regarding combinations of spectral bands and indices determined by feature selection (post-feature selection), the average OA was 81.24, 87.40, 90.12, and 91.68 for the Nnet, RF, Xgboost, and DNN, respectively. Improvement in accuracy based on post-feature selection combinations was significant for the DNN, RF, and Xgboost (p < 0.05); however, it was not significant for the Nnet. OA based on the postfeature selection combinations was not significantly different from that based on combinations of bands and all spectral indices (p > 0.05) for all the classifiers (see Supplementary  Table S2).
Overall, the highest OA of 91.68 (Kappa = 0.90) was from the DNN based on the postfeature selection combination, which comprises an average of 13 features (See Supplementary Table S6). The lowest OA of 70.65 (Kappa = 0.68) was also for the DNN based on the bands only combination. Overall, integration of bands and spectral indices appear to improve the accuracy of LULC classification in the unregionalized study site and the improvement further increases when implementing a feature selection. Although the integration with spectral indices and feature selection improves the accuracy, the improvement was not significant for all the classifiers. DL classifiers (DNN) generally yielded higher classification accuracies in comparison to ML classifiers (RF and Xgboost).

Bsh-Hot Semi-Arid Zone
Results for climate-based regionalization based on Koppen-Geiger climate zones are depicted in Figures 6-8.
For the Bsh hot semi-arid zone, the inclusion of spectral indices increased OA values to 85.55, 87.32, 90.15, and 94.12 for Nnet, RF, Xgboost and DNN, respectively (Figure 7). The kappa values were 0.84, 0.86, 0.88, and 0.92 for Nnet, RF, Xgboost, and DNN, respectively. DNN recorded the highest OA, followed by Xgboost, RF, and Nnet. Integration of bands with all spectral indices improved landcover classification significantly for the DNN classifier in the Bsh climate zone (p < 0.05). However, the OA did not improve significantly for the RF, Xgboost, and Nnet (p > 0.05). It can be seen that DL classifiers (DNN) benefited more from the inclusion of all spectral indices (improvement in OA > 20) compared to ML classifiers (Xgboost and RF); however, DL classifiers did not benefit as much from feature selection as ML classifiers did.
Concerning combinations of spectral bands and indices following a feature selection  Supplementary Table S7). Although post-feature selection combinations significantly improved accuracy for all the classifiers (p < 0.05), their OA is not significantly different from those using combinations of bands and all spectral indices (p > 0.05).   In general, a combined inclusion of spectral indices and regionalization using climatic zones increase the accuracy of LULC classification using DL and ML classifiers. Climate-based regionalization and inclusion of spectral indices with bands did not statistically improve the accuracy of LULC consistently for all the ML and DL classifiers investigated in this study. However, climate-based regionalization and incorporation of postfeature selection combinations significantly improved the accuracy of LULC consistently for all the DL and ML classifiers used in this study. However, it was observed that the OA based on incorporating bands and all spectral indices was not significantly different from that based on the incorporation of post-feature selection combinations. This suggests that, although there are benefits of feature selection to accuracy when compared to mere incorporation of all features, the margin of benefit is slight. However, it appears that incorporating an average of about 15 features (based on feature selection) is sufficient to dramatically improve the accuracy of LULC more than what is achieved with a set of 20 features (bands and all indices). Overall, DNN consistently outperformed Xgboost, RF, and Nnet.

Discussion
Results show that climate-based regionalization and integration of spectral bands with indices improve the accuracy of LULC in the Okavango Basin. Additionally, incorporation of feature selection to combinations of bands and indices further enhances the accuracy. Statistical results show that the integration of indices and feature selection for unregionalized study areas does not significantly improve the performance of all the classifiers consistently. However, regionalization based on Koppen zones significantly improves LULC accuracy for all the classifiers. This is attributable to a reduction in spatiotemporal variability as a result of climate regionalization. This observation is in line with previous studies, which reported that spatiotemporal variability due to atmospheric conditions, soil moisture, sun elevation, view angle, and topography causes the similarity of spectral signatures of landcover types that are spectrally different [95,96] Regionalization creates zones that have uniform ecological and spectral characteristics, thereby controlling the sensitivity of spectral signatures to LULC variation in heterogeneous landscapes and enhancing the efficiency of classifiers [12]. Noormets [97] and Richardson et al. [93], concurred that climate influences the phenology of vegetation cover and other ecosystem processes by driving the seasonality of albedo, surface roughness length, canopy conductance, and fluxes of water. Since optical remote sensing depends on reflectance, which is mainly influenced by phenology, regionalization based on climate systems could enhance the distinguishing of LULC classes [12,98]. While regionalization based on climate zones shows great potential in enhancing accuracy, intra-climate zone heterogeneity could potentially compromise LULC accuracy. In this study, spectral indices and feature selection in Koppen-Geiger regionalized study sites were integrated to further refine the LULC classification. The effects of including spectral indices to enhance accuracy when mapping LULC have been reported before. Mushore et al. [23] evaluated the inclusion of RBS indices when using the SVM on an urban landscape and found that the inclusion of indices yielded an increase in OA from 82.65 to 89.33. Results for the current study yielded higher accuracy values as compared to previous studies. OA values for the DNN based on bands and all spectral indices in this study improved from 74.24 to 96.04. This could be due to variation in the classifiers used, the number of training samples, and the scale at which the study was conducted. Mushore et al. [23] used 840 training samples on a study site with an extent of 94,000 ha. In this study, 5140 samples were used and the extent of the study area was 22,489,464 ha based on the Nnet, RF, Xgboost, and DNN classifiers. Notwithstanding that, their findings concurred with our findings that spectral indices improve the definition of LULC classes by enhancing the separability of one class against another. That said, Zeng et al. [99] assert that when spectral indices are used together with spectral bands in LULC classification, they have little effect on accuracy, as the addition of spectral indices only yielded a slight increase in OA, from 76.43 to 76.55. Their study only used RBS indices without incorporating feature selection on an unregionalized local study site using only the RF classifier. In this study, RF yielded higher accuracies, with OA ranging between 72.62 and 87.75. The strength of this study rests in that it incorporated regionalization, OS indices, and feature selection on a highly heterogeneous transboundary basin. Although previous studies have separately reported the effect of regionalization and incorporation of spectral indices in enhancing LULC maps [53,100], the combined effect of regionalization based on climate zones, the inclusion of spectral indices, and feature selection in a heterogeneous TDB is novel. This study is amongst the first to provide evidence for the combined influence of climate-based regionalization and integration of spectral bands with indices to enhance the accuracy of LULC maps in a TDB setting.
This research reveals that incorporation of feature selection improves accuracy more than the mere inclusion of all indices in all study sites and classifiers investigated. This finding agrees with previous studies, which reported that using feature selection has a crucial impact on the improvement of LULC accuracy [101,102]. Unlike simply including all features, feature selection helps to remove redundant features that tend to introduce the problems of dimensionality and uncertainty in model performance [103,104]. Feature selection creates parsimonious models that have high predictive power even with fewer variables. Although feature selection improves OA, in this study, OA of the post-feature selection combinations was not statistically significant from that of bands and all indices. This finding is in line with that of Georganos et al. [83], who observed that the increase in accuracy (OA from 77.7 to 78.9) based on feature selection (using the RFE) for the RF classifier was not significant. This is likely a result of RF having high predictive accuracy and effectively deducing patterns from data owing to its bagging and random algorithms [102,105,106]. Our findings for Xgboost are, however, contrary to those of Georganos et al. [83], who reported a significant increase in accuracy following the incorporation of postfeature selection combinations. This could be because of the variation in the number and type of features that they incorporated in their analysis as compared to those of the current study. Their study used a total of 169 features that include band descriptive statistics, gray level co-occurrence matrix (GLCM) for each spectral band, object compactness, perimeter, area, fractal dimension, and spectral indices as the initial input to the classification, whereas the current study only used 20 features of raw bands and spectral indices. A wide array of features could provide a large pool from which feature selection algorithms can draw, resulting in a larger improvement in accuracy. Although their study observed significant improvements in accuracy, their peak OA (79.8) was achieved using more features (23). The current study used fewer features (15), but achieved better peak OA (95.27).The strength of this study lies in the incorporation of combinations of RBS and OS indices together with climate-based regionalization, which aids in distinguishing land cover classes [21].
This study determined that DL classifiers (DNN) consistently yield more accurate results than ML classifiers (Xgboost, RF). Furthermore, DNN outperformed its fellow DL classifier (Nnet). This is likely as a result of the DNN having an effective way of deducing patterns from data owing to its structured neural networks, which allow the hierarchical flow of information from the input layers to output layers through hidden layers and activation functions. This structure, coupled with the backpropagation of errors, allows refinement of predictions before they are finally incorporated into the output [100]. Unlike the Nnet classifier, DNN has an increased number of hidden layers and activation functions that boosts model approximations through iterating various learning parameter values when making predictions [102]. Unlike ML classifiers (Xgboost, RF), the DNN requires intensive parameterization, which refines its performance [103]. The results of this study are in line with those of Li et al. [46], who reported DNN to be superior in LULC accuracy when compared to other ML classifiers (RF, SVM, and MLC). Abdi [44], however, reported that the DNN was outperformed by the Xgboost and RF when evaluating the LULC of the boreal landscape based on Sentinel 2 imagery. Abdi [44] tuned the DNN using the tanh activator, which easily saturates and slows the DNN when input values are large [104]. This study has strength in that it used the rectifier activation, which can hold incremental gradient descent because of its non-saturating function, thereby yielding higher accuracies even with complex data [105].
With regards to ML classifiers, results of this study show the superiority of Xgboost over the RF classifier. This is confirmed by previous literature, which reported that Xgboost outperformed the RF classifier [107,108]. According to Georganoes et al. [107], Xgboost performed 5% better than RF. Saini and Ghosh [108] reported that Xgboost outperformed RF by 1%. The current study observed that Xgboost performed better in OA by 4%. Although the level of difference in accuracy varies, most studies attribute the improved performance of Xgboost to its use of a large number of tuning parameters as compared to that of RF [108]. This is further enhanced by its use of an ensemble of trees, which incorporate weaker learners to make final predictions [76]. Despite the fact that the findings of this study are in line with previous work, the maximum accuracy values reported for Xgboost in this study are better than those of previous studies (OA = 95.27). The results of the present work demonstrate the superiority and strength of climate-based regionalization coupled with rigorous integration of bands and indices in complex landscapes.

Conclusions
This study aimed to evaluate the significance of the integration of spectral indices with bands and of climate-based regionalization for enhance the accuracy of LULC based on Landsat imagery using ML and DL. The specific objectives of this study were, firstly, to assess the value of integrating spectral bands with spectral indices on the accuracy of land cover classification using ML and DL classifiers, secondly, to investigate the value of climate-based regionalization to improve the accuracy of LULC classification within the Okavango Basin, and thirdly, to assess the performance of ML and DL classifiers for climate-based regionalization and inclusion of spectral indices. Inclusion of all indices and combinations of post-feature selection were separately used for each analysis to assess their performance in enhancing LULC mapping in a TDB. The DNN, Xgboost, Nnet, and RF classifiers were used for LULC classification. Combinations of bands and all indices and post-feature selection were then separately implemented for each climate zone to assess the value of climate-based regionalization on LULC classification. The results show that: (1) Inclusion of spectral indices improves the accuracy of LULC mapping for both ML and DL (with increase in OA > 5%); (2) Conducting a feature selection when evaluating LULC classification further improves accuracy as compared to mere inclusion of all spectral indices (with increase in OA > 10%); however, the increase was not consistently significant for all the classifiers; (3) Combined incorporation of post-feature selection combinations and climate-based regionalization significantly improves LULC accuracy based on all DL and ML classifiers (p < 0.05); (4) DL classifiers performed better than ML classifiers in all study sites and combinations of bands and spectral indices.
Based on findings from this study, it is concluded that spectral indices, feature selection, and climate-based regionalization result in statistically more reliable results. Those who wish to perform multitemporal analysis in heterogeneous landscapes using ML and DL classifiers should also consider combining climate-based regionalization, the inclusion of spectral indices, and feature selection based on RF-RFE. Additionally, preference should be given to DL classifiers when analyzing LULC in complex environments.
Although this study answered an important question on the integration of spectral indices and the influence of climate-based regionalization on the accuracy of LULC classification in a heterogeneous transboundary basin, there are some limitations associated with it. The study could not test the effect of climate-based regionalization on other climate zones outside of the Okavango Basin or the role of socioeconomic variables in delineating regions. Future research should test the effect of climate-based regionalization on other TDBs that have a different mix of climate zones using combinations, while at the same time incorporating socioeconomic factors for the delineation of regions. Pixel-based supervised classification based on the ML and DL classifiers was used in this study; further studies should investigate the use of object-based and unsupervised classification. The use of higher-resolution images and fusing data from different sensors is also advocated for future investigations.

Supplementary Materials:
The following are available online at www.mdpi.com/article/10.3390/rs13245054/s1, Figure S1: Land use/cover classification outputs during the years 1996, 2004, 2013, and 2020 together with corresponding very-high-resolution satellite images for validation. Classification results are derived from the top performing results; Table S1: Details of sources and acquisition dates of satellite images used in this study; Table S2: P-values of proportional test (X 2) results for pairwise comparison of the performance of different combinations of spectral bands with spectral indices and post-feature selection combinations based on the unregionalized study site; Table S3: p-values of proportional test (X 2 ) results for pairwise comparison of the performance of different combinations of spectral bands with spectral indices and post-feature selection combinations based on the Bsh Koppen climate zone; Table S4: P-values of proportional test (X 2 ) results for pairwise comparison of the performance of different combinations of spectral bands with spectral indices and post-feature selection combinations based on the Cwa Koppen climate zone; Table  S5: p-values of proportional test (X 2) results for pairwise comparison of the performance of different combinations of spectral bands with spectral indices and post-feature selection combinations based on the Cwb Koppen climate zone; Table S6: List of important features determined for the different study years using the random forest-based recursive feature elimination (RF-RFE) technique in an unregionalized study site; Table S7: List of important features determined for the different study years using the random forest-based recursive feature elimination (RF-RFE) technique in the Bsh Koppen climate zone; Table S8: List of important features determined for the different study years using the random forest-based recursive feature elimination (RF-RFE) technique in the Cwa Koppen climate zone; Table S9: List of important features determined for the different study years using the random forest-based recursive feature elimination (RF-RFE) technique in the Cwb Koppen climate zone. Data Availability Statement: Training data which was used in this study are available upon request from the authors.

Acknowledgments:
We thank USAID Resilient Waters for funding this project, implemented under prime contract number 72067418C00007. Our appreciation goes to the OKACOM and Okavango Wildland Trust for supplying us with some ground data which was used in this study. We express our gratitude to Sally Kuiper for editing the manuscript for English. We also thank the Stellenbosch University Language Centre for reviewing the language and flow of the manuscript.

Conflicts of Interest:
The authors declare that there exists no competing financial interests or personal relationship that could appear to influence the work reported in this study.