Probabilistic Mapping and Spatial Pattern Analysis of Grazing Lawns in Southern African Savannahs Using WorldView-3 Imagery and Machine Learning Techniques

: Savannah grazing lawns are a key food resource for large herbivores such as blue wildebeest ( Connochaetes taurinus ), hippopotamus ( Hippopotamus amphibius ) and white rhino ( Ceratotherium simum ), and impact herbivore densities, movement and recruitment rates. They also exert a strong inﬂuence on ﬁre behaviour including frequency, intensity and spread. Thus, variation in grazing lawn cover can have a profound impact on broader savannah ecosystem dynamics. However, savannah conditions (Satara) compared to the mesic savannah landscape (Lower Sabie). Additionally, the results show strong negative correlation between grazing lawn spatial structure (fractional cover, patch size and connectivity) and distance from waterbodies, with larger and contiguous grazing lawn patches occurring in close proximity to waterbodies in both landscapes. The proposed machine learning approach provides a novel and robust workﬂow for accurate and consistent landscape-scale monitoring of grazing lawns, while our ﬁndings and research outputs provide timely information critical for understanding habitat heterogeneity in southern African savannahs.

for broader ecosystem dynamics. Against this backdrop, we seek to develop a robust machine learning framework for mapping grazing lawns in southern African savannahs by (i) parameterising and assessing the quality of Random Forest (RF), Support VectorMachines (SVM), Multilayer Perceptron (MLP) and Classification and Regression Trees (CART) models for savannah land cover classification in a localised context, and (ii) comparing model performance for probabilistic mapping of grazing lawns on a wider scale. Additionally, we analyse spatial patterns in grazing lawn distribution along a gradient of proximity to water bodies, which has been hypothesised to influence grassland spatial structure [10,36].

Study Area
Kruger National Park (KNP) lies between 30 • 53 18"E, 22 • 19 40"S and 32 • 01 59"E, 25 • 31 44"S in South Africa. The park spans approximately 20,000 km 2 , extending about 360 km from north to south. The significant diversity in the landscape is expressed in its climate, soil, flora and fauna, making it a globally important site for ecological studies. The climate is subtropical with maximum mean annual rainfall between 500 mm and 700 mm in the northern and southern parts of the park, respectively [37]. Geologically, KNP is divided into granitic soils to the west and basaltic soils to the east, which are separated by a narrow band of shale from the south to the mid portion of the park and rhyolite on the eastern extreme [38]. This, coupled with spatial and temporal rainfall gradients, as well as disturbance events, exert enormous influence on vegetation type distribution across the landscape [6]. More open, productive grasslands occur on the basalt, while denser bushland savannah occupy the granite. Mopane (Colophospermum mopane), red bushwillow (Combretum apiculatum) and silver clusterleaf (Terminalia sericea) constitute some of the dominant vegetation types in the northern half of the park [37]. The open grasslands of the eastern plain are dominated by species like blue buffalo grass (Cenchrus ciliaris), red grass (Themeda triandra), stinking grass (Bothriochloa radicans) and finger grass (Digitaria eriantha), dotted with knob-thorn acacia (Acacia nigrescens) and marula trees (Sclerocarya birrea) [39]. Mixed broadleaf woodlands of bushwillow (Combretum sp) with corridors of grassland cover the central-western part of KNP, while thorn thickets (e.g., Acacia robusta), silver clusterleaf (Terminalia sericea) and sour grasses (Hyparrhenia filipendula) form a dominant part of the higher rainfall southern landscape [39]. Alongside variations in abiotic factors which influences vegetation type distribution within the KNP landscape, the presence of a great diversity of herbivores exert significant impact on vegetation structure. For example, high population density of the African elephant (Loxodonta africana) has been suggested to be the major driver of woody vegetation change in KNP [17,40]. The dominant grass consumers (with >50% of grass in diet) includes impala (Aepyceros melampus), blue wildebeest (Connochaetes taurinus), zebra (Equus quagga), buffalo (Syncerus caffer) and white rhino (Ceratotherium simum) [41].
Management of KNP is generally focused on maintaining habitat heterogeneity through adaptive fire management regimes [38] alongside natural fire events. Natural burns vary in frequency and intensity depending on rainfall patterns and the prevalence of high grass biomass [37,42]. The consequence of varying fire regimes is the different spatial configurations of grass productivity and biomass accumulation [42]. For example, the distribution of short grass grazing lawns whose persistence depends on positive feedback loops associated with frequent grazing has been observed to be highly influenced by variation in burn size and frequency [11,12].
Two study sites located within the Satara and the Lower Sabie regions of the park (Figure 1), each covering 5.7 × 5.7 km, and extending over a range of habitat conditions including rainfall, geology and vegetation type were used. The Satara site is a well-studied grazing system close to the latitudinal center of KNP, and covers both granitic and basaltic soil types interspersed with a strip of ecca shales. The landscape is semiarid with mean annual rainfall of 400-500 mm. The granite areas to the west are generally more wooded and undulating than the flat and more open and grassy basaltic plains.
In contrast, the Lower Sabie site falls under mesic landscape conditions with mean annual rainfall of 600-700 mm. The area has an underlying granite geology and encompasses portions of the Sabie River catchment. A number of sodic sites are also present within the Lower Sabie study site. Sodic sites typically occur at footslopes of catenas and are known to have high soil and vegetation sodium content which concentrates grazers and aids the formation and maintenance of continuous grazing lawn patches [10].

Land Cover and Classification Scheme
The study sites were selected to exclude as much anthropogenic influence as possible due to the natural ecological focus of our study. Thus, natural and semi-natural land cover features such as vegetation patches, bare soil surfaces and waterbodies dominate the selected study sites, with the only artificial surfaces being roads and isolated structures which serve as rest stops and picnic sites for tourists. Four plant functional types (PFTs) were identified in order to distinguish grazing lawns from other vegetation types (Figure 2). These included evergreen woody components, deciduous woody components, bunch grasses and short grass grazing lawns. The PFT categories were finalised following the vegetation nomenclature provided in [17] and were modified based on knowledge from dry season field survey and the spectral reflectance properties of the different vegetation components contained within satellite imagery. Within the landscape, woody components are mainly trees and shrubs, which in many cases were challenging to objectively differentiate. This is a well-known dilemma in savannah landscapes due to structural complexities such as multiple stems, varying disturbance adaptations and height limitations [43]. The common practice has been to use arbitrary morphological traits like diameter and height thresholds depending on research objectives and ecological relevance. For example, [17] used a main trunk diameter threshold of 7 cm to distinguish between trees and shrubs, where woody components with >7 cm diameter were classified as trees and those with <7 cm diameter as shrubs. Moreover, optical satellite imagery only records planar-view spectral reflectance information from surface cover with little structural detail. This presents a further challenge for successful differentiation of vegetation structure. The woody components were thus differentiated based on dry season phenological differences (leaf on/off), which allowed for objective delineation both in situ and on satellite imagery. Bunch grasses were identified as tall grass patches with height >20 cm. In contrast, grazing lawns were identified as short grass areas with stoloniferous growth forms and height <20 cm. Unlike bunch grasses which generally occurred as dense patches, grazing lawns within the study sites often had a sparse distribution and exhibit a relatively smooth texture in appearance, which aided visual interpretation of VHR satellite imagery. In addition to the PFTs, waterbodies, bare soil surfaces, built-up features and shadows on satellite imagery were identified, which constituted the classes used in this study ( Figure 3). Table 1 provides a summary of the land cover classification nomenclature used.

Satellite Imagery
Multispectral VHR imagery from the WorldView-3 satellite sensor was used in this study. Ortho-ready standard 8-band multispectral scenes (in UTM/WGS 84 projection) were acquired which had been processed to level 2A by the vendor ( Table 2). The images were acquired in the dry season, on July 1, 2019 (at 28.97 • Sun Azimuth and 19.49 • off Nadir) and July 7, 2019 (at 30.98 • Sun Azimuth and 1.85 • off Nadir) for the Lower Sabie and Satara sites, respectively, under cloud-free conditions. Dry season imagery has previously been used to successfully discriminate vegetation types in similar contexts [17,44]. Apart from the reduced persistence of cloudy conditions in the dry season, which is an advantage to optical satellite remote sensing particularly in the tropics [32], spectral differentiation is maximized due to phenological differences among different vegetation types [17,45]. Image acquisition was timed to coincide with our field survey season (June 26-July 21, 2019), which allowed for the collection of consistent reference information for land cover classification and validation.

Reference Data
Reference data on the different land cover types were generated from georeferenced field survey locations, and were extended via further interpretation of VHR images augmented by field photos and Google Earth satellite scenes. Overall, data from (i) 111 predefined field locations, systematically distributed within 200 m buffer beyond 100 m distance from access roads, and (ii) 5122 randomly distributed points from augmented visual interpretation, formed the reference data points (i.e., total of 5233 points) for training (3807-i.e., 73%-reference points) and validation (1426-i.e., 27%-reference points). Polygons of spectrally homogeneous areas were manually digitised and labelled according to land cover class IDs (see Table 1) using the locations of training points. The polygon extents (Table 1) were then used to extract image pixels for model training. Of the many potential approaches that could be used to extract training pixels, polygon objects have been shown to provide the most accurate classification outcomes [46,47]. Spectral plots of the different land cover classes were examined and reviewed alongside Jefferies-Matusita distance measures to ascertain adequate spectral separability prior to model training [48]. Models were parametrised and trained for prediction based on data from a sub-area within the Lower Sabie field site (see training region in Figure 1). This was necessary to ensure high model quality as a greater proportion of the georeferenced field sample locations was concentrated within the training region, while reducing computational cost. In contrast, map validation was conducted using site-specific reference data.

Auxiliary Data
Multiple buffer distances (100 m divisions) from water source were used to analyse spatial pattern in grazing lawn distribution. Water points represent significant resources and important predictors of grazer movement [49] and spatial heterogeneity in general within semi-arid savannah landscapes [50]. The data ( Table 2) was downloaded from OpenStreetMaps surface water archive (streams, rivers and reservoirs) [51] and was validated against a drainage network and stream order data obtained from the Scientific services of South African National Parks. The OSM surface water layer contributed in November 2019 had the closest temporal coverage to the acquisition dates of satellite imagery and field data, and was selected for spatial analysis.

Preparation of Image Features
Following acquisition, a cubic convolution resampling approach was used to upsample images to 2 m spatial resolution, which is reflective of the average minimum patch size of short grass grazing lawns in southern African savannahs [6,7]. We calculated a series of spectral indices highlighting greenness, moisture and soil properties in order to increase utility of the spectral information contained in the original image bands (Table 3). Greenness, moisture and soil indices are derived from arithmetic combination of spectral information recorded in visible and near-infrared image bands and exhibit high correlation with vegetation characteristics such as phenology [52][53][54], biomass [55][56][57] and moisture content [58,59]. To complement the spectral information, spatial heterogeneity measures were calculated as a selection of simple and advanced Haralick texture features based on Gray Level Co-occurrence Matrix (GLCM) [60]. The GLCM variables (Table 3) were calculated on the near-infrared band (NIR1) (see Table 2 for details on image bands), which contains valuable spectral information for differentiating vegetation characteristics. We used a probabilistic quantizer, with 32 quantization levels in a 3 × 3 moving window, at an offset distance of 1 pixel in all directions (0 • , 35 • , 90 • and 135 • ) [32,61]. In total, 27 spectral indices and 18 texture features were processed using the Orfeo-Toolbox remote sensing image processing software [62] ( Table 3). The spectral indices as well as texture features in combination with the original image bands served as input data in the machine learning models and analysis workflow (summarised in Figure 4). Incorporating spectral and textural image features is well known to enhance discrimination space for more accurate land cover mapping particularly in heterogeneous savannah landscapes [32,63,64].

Feature Selection
Remote sensing image features such as spectral indices (vegetation, moisture and soil indices) as well as texture variables tend to exhibit high levels of collinearity. Highly correlated features increases data redundancy and risk of overfitting, which could have adverse consequences for algorithm performance especially for high-dimensional datasets [66], a problem that results from the Hughes phenomenon [67]. Although nonparametric machine learning algorithms are thought to be less susceptible to Hughes phenomenon, recent findings shows they benefit from dimensionality reduction nonetheless [68]. In our case, we aimed to target the most robust predictor set while reducing prohibitive computational efforts.
Image variables were selected by combining two procedures. First, we checked for collinearity with the Variance Inflation Factor (VIF) using the "usdm" package [69] within R-programming environment [70]. This was done separately for the spectral indices (i.e., vegetation, moisture and soil) and the Haralick texture features derived from the WorldView-3 imagery (see Table 3). VIF measures the degree to which predictor variables are correlated. For example, given k independent predictor variables, each variable is regressed with the remaining k − 1 variables and coefficient of determination (R 2 ) is estimated. The VIF of the dependent variable is thus computed as Large values of VIF implies a corresponding high degree of collinearity and vice versa. Following VIF analysis, correlated variables were subsequently removed by considering a stepwise elimination threshold of VIF ≥ 10 [71]. The VIF assessment resulted in six spectral indices and 15 Haralick texture features being retained.
Second, we combined the less correlated spectral indices and texture variables with the original image bands to select final image feature subset using Random Forest-Recursive Feature Elimination (RF-RFE) [72,73]. Recursive Feature Elimination (RFE) is an iterative process that uses some measure of feature importance to rank and select features by backward elimination [72]. The technique basically builds a model with the entire feature set, computes an importance score for each feature, removes the least important features and repeats the process until a user-defined number of features subset is reached. We used feature importance scores derived from random forest out-of-bag (OOB) error estimates for ranking features in the RF-RFE process. We then determined the final subset of features by analysing the relationship between number of features and accuracy scores derived from a stratified 10-fold cross-validation assessment. Overall, 26 WV-3 image features achieved optimal accuracy (See Supplementary Data in Appendix A). All steps in the RF-RFE process were implemented using Scikit-learn python library [74].

Machine Learning Algorithms
There is a proliferation of machine learning algorithms, which, coupled with the conflicting reports of their performance in remote sensing classification literature [75], makes it challenging to select the optimal method for any specific application. The optimal classification algorithm is generally context-specific and in most cases depends on the landscape and classes mapped [76], parameter settings [75,77,78], nature of training data [79][80][81][82] and data dimensionality [75,83]. Lawrence and Moran [76] recommend prior experimentation with multiple classifiers to determine optimal performance.
For this study, we tested four state-of-the-art nonparametric machine learning algorithms: Random Forest (RF) [27], Support Vector Machines (SVM) [28], Classification and Regression Trees (CART) [29] and Multilayer Perceptron (MLP) [30]. All have been shown to achieve high performance in many remote sensing applications, and in particular, land cover mapping [75]. Their superiority in handling complexity and high-dimensional data makes them ideal for application in highly heterogeneous savannah landscape conditions [23]. The selected algorithms were configured and implemented in the python programming environment using Scikit-learn python library [74]. Optimal parameter values (see in Table A10 in supplementary data, Appendix A) from hyperparameter tuning were used in each model. Summary descriptions of how the algorithms work are presented below.

RF
The RF classifier is an ensemble of decision tree algorithms with demonstrated robustness in remote sensing image classification compared to single classifiers [27,79]. The algorithm relies on unit vote contributions from each classifier within the ensemble to assign input vectors to different classes, where the most frequently voted class is retained [27]. The individual decision trees are parameterised using several independent random subsets of training data sampled through bootstrap aggregation or bagging. This reduces multicollinearity and generalization error [26,27]. The input vectors that do not form part of the bootstrap sample (i.e., "out-of-bag" (OOB) sample) are used for evaluation and variable importance estimation [27,84]. By design, decision tree classifiers require some measure for selecting suitable features per class, which maximizes dissimilarities between classes [79]. The RF algorithm uses Gini Index for feature selection at each node [85]. When assigning an input pixel to a class (C i ), for a given training set (T), the Gini Index measures feature impurity with respect to the different classes and is expressed as is the probability that the selected pixel belongs to class C i [79,86]. Each decision tree therefore grows to a maximum depth using a combination of features. The number of features used to grow a tree at each node and the number of decision trees are the required user-defined parameters to instantiate a RF prediction model [86].

SVM
SVM was developed based on statistical learning theory [87]. The algorithm creates an optimal separating hyperplane based on the location of a small subset of training samples at class boundaries, the so-called "support vectors" [28]. Given a simple binary linear classification problem, the SVM uses quadratic optimization techniques to select the optimum margin of separation between the two classes such that the distance to the hyperplane from the closest support vectors of both classes is maximal [28,82].
For a nonlinear classification problem, the algorithm selects the optimal margin by (i) allowing some misclassification errors and (ii) transforming the original input space into a higher dimensional feature space using nonlinear functions φ [87], making linear separation possible in the new feature space. To reduce computational cost, kernel functions, , such as polynomials, radial basis and sigmoid functions, are used for the transformation [88]. The decision function is given by where α i is a slack variable (Lagrange multiplier).
To classify new datasets, the algorithm uses learned parameters from the decision function based on training data. The trade-off between margin of class separation and misclassification errors is controlled by defining a regularisation parameter C, where C ∈ Z and 0 < C < ∞ [28].

CART
CART is a decision tree algorithm that builds classification or regression trees based on categorical or numerical attributes, respectively [85]. The structure of the tree is typified by a root node and a series of internal nodes (splits) and terminal nodes (leaves). Within this framework, the algorithm builds a model by recursively partitioning the training dataset into increasingly homogeneous subsets using tests applied at each node to training features [89]. Given a continuous data set, the test performed at each node is of the form for decision functions based on a single feature (i.e., univariate decision trees), where x i is a measurement in n feature space (n = 1 in this case) and c is a decision threshold estimated from the range of x i measurements [90]. The threshold (c) value is determined using an impurity measure such as entropy [91] and the Gini index [85]. If the decision boundaries are defined by a combination of features (i.e., multivariate decision trees), the test takes the form where a i is a vector of coefficients of a linear discriminant function estimated from the training data [90]. The series of testing outputs form the branches of the tree which proceeds sequentially through internal nodes until a terminal node is reached. At each terminal node, class labels are assigned based on maximum probabilities [92].

MLP
The MLP is a feedforward artificial neural network (ANN) classifier which is trained using back-propagation [93]. Learning in ANN is inspired by the functioning of neurons within the brain, which is based on parallel and distributed processing of information [94]. Similarly, the MLP architecture is composed of multiple layers of fully connected processing units called neurons, which are arranged sequentially as a network of input, hidden and output layers. During training, each unit in a hidden layer receives data from the input/previous layer, processes it and feeds it forward to units in the next layer [95]. This allows more abstract representations of the data to be learned until the output layer is reached [26]. The connections between units carry weights, which are modified iteratively to minimise a cost function. Apart from the input layer, the net input to each unit is therefore the weighted sum of outputs from the previous layer [94,95]. The net input is wrapped in an activation function to produce the output for that unit. The output for each processing unit is expressed as where o i is the output of a neuron in layer i, w ij is the connecting weight between layers i and j, o i is output from layer j and b i is bias and f is the activation function [94,95].

Algorithm Calibration and Evaluation
The machine learning algorithms, namely, RF, SVM, CART and MLP, were first calibrated and evaluated for general land cover classification using data from a sub-area within the Lower Sabie landscape (see Figure 1) via a nested cross-validation approach. For each algorithm, the combination of parameters that returned the best expected classification accuracies were then used in the prediction of grazing lawn occurrence probabilities in the broader Lower Sabie and Satara Landscapes. The steps employed are broadly summarised into (i) data preparation, (ii) parameterisation training and classification and (iii) accuracy assessment. All processing was done using Intel(R) Core(TM) i5-6200U CPU with 8GB RAM on 64 bit Windows 10 operating system, and was supplemented by leveraging the power of Google's free GPU hardware, using the Google Collaborator platform.

Data Preparation
We used the post-RF-RFE spectral and texture variables as input predictors for modelling. The input dataset was then transformed by subtracting the mean and scaling to a unit variance to generate normalised scores per feature using Equation (7): where x i , µ j and σ j are pixel value, mean and standard deviation of pixels in the jth feature, respectively, and z ij is the transformed value of x ij [96]. Normalising input features is a crucial preprocessing technique which approximately equalises dynamic data ranges in features for unbiased and improved learning [97]. Further, it is a common requirement prior to the training of machine learning estimators such as Support Vector Machines and Artificial Neural Networks [96].

Parameterisation, Training and Classification
Each of the selected algorithms comes with a set of hyperparameters which has to be tuned to maximise performance during training. Algorithm training thus involved hyperparameter optimisation whereby optimal hyperparameter sets were selected for RF, SVM, CART and MLP algorithms from a predefined grid (see Analysis Script in Appendix B). The optimisation process and selection of best model parameters were performed using randomised grid search in a 2 × 5 nested cross-validation approach with 10 iterations. Nested cross-validation incorporates optimal hyperparameter selection and unbiased estimation of model performance in inner and outer cross-validation loops respectively [98]. The approach is mostly recommended against traditional "flat cross-validation" which results in biased accuracy estimates due to information leakage and the split sample method plagued by insufficient availability of training and test datasets [99]. The chosen thresholds for tune-length and train-test splits were deemed appropriate to provide a reasonable trade-off between ensuring a robust model and computational time. Hyperparameters that returned the best expected classification accuracies were selected and used as input parameters in the machine learning algorithms. The algorithms were retrained with the full training data for landscape-wide prediction of land cover occurrence probabilities in both the Lower Sabie and Satara landscapes.
Individual image variable weights were computed using permutation feature importance estimates [74] in order to assess their relative contributions in each machine learning model. Permutation feature importance (PFI) generates variable weights based on an observed decrease in model score when a single variable is randomly shuffled [27]. The drop in model score thus represents the degree to which the model depends on the variable of interest. The PFI technique is model agnostic, which makes it suitable for comparison of feature importance estimates from RF, SVM, CART and MLP models used in this study.
The predicted occurrence probability surface for grazing lawns was selected and used as input in an optimised probability thresholding procedure. Optimised probability thresholding involved the selection of a single occurrence probability value (threshold) which maximises some measure of classification accuracy [100] for the target class. We tested a series of probability values at 0.05 intervals to determine the threshold that maximises F-score of grazing lawn detection. Grazing lawn (G) and non-grazing lawn (O) classes were assigned using simple relational expressions represented by Equation (8) and Equation (9), respectively, where p is occurrence probability and t is the optimal probability threshold, t ∈ p.

Accuracy Assessment and Comparison
Model performance in discriminating different savannah land cover types during hyperparameter tuning was assessed using point and interval estimates of Overall Accuracy (OA) and F-score, based on a 2 × 5 nested cross-validation approach (see Section 2.7.2). Further, accuracy of grazing lawn/non-grazing lawn binary maps was assessed by confusion matrix [101], from which precision, recall, F-score and OA metrics were calculated using Equations (10)- (13). Accuracy-adjusted estimates of grazing lawn area coverage were obtained following Olofsson et al. [102].
where tp, fp, tn and fn represent the number of true positive, false positive, true negative and false negative cases, respectively. Marginal homogeneity between predictions from model pairs was tested at 5% level of significance using the McNemar chi-squared (χ 2 ) test [103]. The McNemar test compares the error matrices of two classification methods to test the null hypothesis that the two methods have the same error rate. The method is based on χ 2 -test and provides a robust statistical comparison of class-wise predictions between two algorithms [104]. Additionally, the estimated proportion of grazing lawn cover (PGLC) was compared for model pairs using the two-proportion Z-test at 5% level of significance. The two-proportion Z-test follows a χ 2 distribution with one degree of freedom [26], and was used to test the null hypothesis of no difference between PGLC for model pairs.

Spatial Analysis of Grazing Lawn Distribution
Using spatial metrics, we determined characteristics of grazing lawn distribution at landscape-scale and analysed spatial patterns along a gradient of distance from water source in both Lower Sabie and Satara landscapes. Spatial metrics provide vital information on landscape configuration and composition [105]. Spatial-contextual information such as density, shape, size and aggregation of land cover patches can be extracted from spatial metrics to better understand ecological processes at the landscape-scale [105,106]. The classification output with the least error properties was selected as input in the calculation of (i) Number of Patches (NP), (ii) proportion of landscape covered by grazing lawns (PL), (iii) maximum patch area (MPA) and (iv) cohesion index (CI) [106], from which patterns in grazing lawn structure were determined. Further, Pearson's correlation and coefficients of determination were estimated in order to identify the nature and significance of the relationship between grazing lawn structural attributes and proximity to water source. Calculation of the selected spatial metrics was carried out using the "SpatialEco" package [107] in the R programming environment [70].

Model Quality for Land Cover Classification
Cross-validation accuracy results for individual models using F-score and Overall Accuracy (OA) measures are presented in Table 4. Generally, all models achieved high accuracies in differentiating the different land cover classes, with median F-score and OA measures ranging between 92.75 ± 0.005% and 95.73 ± 0.003% and 90.92 ± 0.002% and 94.27 ± 0.003%, respectively. RF, SVM and MLP models had similar accuracy scores and marginally outperformed CART for both F1 and OA measures (Table 4).   Figure 6 shows permutation feature importance estimates across all models. A mix of image features from original spectral bands, spectral indices and texture variables showed high importance in each model. There was generally more agreement among SVM, CART and MLP models in assigning relatively more importance to original spectral bands in terms of both magnitude of feature weight and number of features. However, image features that exhibited high importance in the RF model were largely dominated by texture variables (Figure 6). Image features that were of highest importance in RF, SVM, CART and MLP models were S_SI5, B_G, B_R and B_Y, respectively.  Table 3. A summary of the most important predictors for each feature group (i.e., spectral bands, spectral indices and texture variables) is presented in Table 5. We concur with the authors of [108] that identification of the most important image features considers both the magnitude of feature weights and consistency of being assigned high importance across all models. In terms of magnitude, image features that were considered highly important in each model were limited to the first three features, in each feature group ( Figure 6). Conversely, features were deemed consistent if they were assigned high importance in at least three models (Table 5). Among the most important image features were B_C and B_Y for spectral bands, V_GEMI and V_MSAVI2 for spectral indices and T_Mean and T_SAvrg for texture variables (see Table 5 for features in bold).

Grazing Lawn Occurrence Probability Prediction and Classification
The outputs of grazing lawn occurrence probability surfaces for RF, SVM, CART and MLP models are shown in Figures 7A and 8A for Lower Sabie and Satara landscapes respectively. The general pattern of grazing lawn occurrence probability surfaces at both study sites is comparable among the four models. Within the Lower Sabie site, high grazing lawn occurrence probabilities were mostly confined to the eastern and north-eastern part of the landscape, and were similar across all models ( Figure 7A). The obvious qualitative difference among models is the relative lack of many very low values in the CART probability surface compared to RF, SVM and MLP models.
Within the Satara landscape, high grazing lawn occurrence probabilities mostly aligned along a diagonal stretch from northwest to southeast ( Figure 8A), which is the interface of the granite and basalt geologies. Despite similarities in spatial distribution of high occurrence probability values, there were noticeable qualitative differences in range among the four models. The RF probability surface exhibited a relatively high prevalence of a continuous range of very low to medium probability values across the landscape, and very few distinctively high occurrence probabilities. In contrast, the CART model predicted relatively more medium to high probability values across the landscape, while MLP and SVM predictions were similar in the distribution of very low and very high occurrence probability values ( Figure 8A). Plots of model F-score, Precision, Recall and OA values generated over a series of predicted probabilities for the Lower Sabie and Satara landscapes are presented in Figure 7B and Figure 8B, respectively. Analysis of the relationship between F-score and predicted probabilities revealed the optimal threshold for classifying grazing lawns. The optimal threshold is the probability value which maximises model F-score of grazing lawn detection, and was found to coincide with or lie close to the equilibrium point between model Precision and Recall. The resulting values varied across models, with the F-score of RF, SVM, CART and MLP models peaking at 0.5, 0.4, 0.6 and 0.35, respectively, for Lower Sabie as seen in Figure 7B. Similar analysis on predicted probability surfaces for the Satara landscape resulted in relatively lower thresholds for RF, SVM and CART (0.35, 0.25 and 0.35, respectively) and a higher threshold for MLP (0.6) where model F-scores were maximum Figure 8B. The grazing lawn/non-grazing lawn binary maps resulting from applying corresponding thresholds to each of the four predicted probability surfaces are shown in Figures 7C and 8C for both Lower Sabie and Satara landscapes, respectively. Analogous to the probability surfaces, patterns of grazing lawn distribution were similar for all classifications within both landscapes. However, local variations persisted and were consistent with the distribution of predicted probability values for each model in both landscapes. Overall, the Satara maps showed a considerable level of speckling ( Figure 8C).
A summary of accuracy measures for grazing lawn detection is presented in Table 6. F-scores for Lower Sabie ranged between 0.81 for CART to 0.94 for MLP, while SVM and RF classifications achieved F-score of 0.93 and 0.89 respectively (Table 6). Grazing lawn detection accuracy results were high, but relatively lower for the Satara area compared to Lower Sabie. F-score ranged between 0.75 for CART to 0.88 for SVM, while RF and MLP achieved F-scores of 0.87 and 0.85, respectively (Table 6). Accuracy-adjusted estimates of area covered by grazing lawns within both landscapes are presented in Table 7. As expected, all model classifications gave comparable estimates of grazing lawn cover within the Lower Sabie site, ranging between 2.46 km 2 (for RF) and 2.98 km 2 (for CART) ( Table 7). In contrast, estimates of grazing lawn cover were significantly different (p ≤ 0.05) for all models within the Satara landscape (see test results in Table A1 of supplementary data).  Table 8 showed statistically significant differences (p ≤ 0.05) in grazing lawn detection error rate when comparing CART to RF, SVM and MLP models in both Lower Sabie and Satara landscapes. In contrast, no significant differences were observed for all the other model pairs (Table 8).

Spatial Patterns in Grazing Lawn Cover
Landscape-scale summary of number of grazing lawn patches, total coverage, connectedness and patch size distribution are presented in Figure 9. Number of grazing lawn patches was relatively higher in Lower Sabie compared to the Satara landscape ( Figure 9A). However, analysis of patch size distribution revealed the Satara landscape as having relatively larger grazing lawn patches ( Figure 9D), and higher area coverage compared to the Lower Sabie landscape ( Figure 9B). Spatial connectedness of grazing lawn patches was however comparable in both landscapes ( Figure 9C).
Further analysis of patterns in the proportion of landscape covered by grazing lawns (PL), maximum patch area (MPA) and cohesion (CI) revealed significant relationships with distance from water sources in both landscapes. Grazing lawn PL, MPA and CI showed an inverse relationship with distance from water source in both landscapes (see correlation coefficients in Table 9). However, the trends were relatively less distinct in the Lower Sabie landscape (Figure 10), as also suggested by the differences in magnitude of correlation coefficients between both landscapes (Table 9). Overall, grazing lawn fractional cover, patch size and spatial connectedness were highest within 0.7 km from water sources in both Lower Sabie and Satara landscapes ( Figure 10).

Model Quality for Savannah Land Cover Classification
In this study, efforts were focused on developing robust machine learning framework for grazing lawn detection by first assessing model performance for general classification of savannah land cover types. The convergence of remote sensing and data science techniques through machine learning offers unparalleled capacity for more accurate processing of satellite imagery, especially for the purposes of land cover monitoring. While this presents many advantages for remote sensing-based ecosystem monitoring, the choice of fit-for-purpose machine learning algorithms often requires some experimentation. This is partly due to the vast availability of options to select from, but most importantly also due to contextual differences in application such as varying landscape conditions, data and research objectives [76,82]. Robust evaluation of algorithm performance is therefore vital for the selection of optimal models for application. The nested cross-validation approach used here allowed the simultaneous tuning of hyperparameters and unbiased estimation of individual model performance. In so doing, the optimum combination of algorithm hyperparameters which enhanced model quality could be selected. Model quality evaluation via nested cross-validation has been proven effective in avoiding biased accuracy estimates common in "flat cross-validation" due to information leakage, while preventing poor model generalisation capabilities due to data paucity-a regular challenge of the split sample approach to model evaluation [99].
All machine learning models (RF, SVM, CART and MLP) demonstrated high performance in classifying the different savannah land cover categories. Even so, RF, SVM and MLP marginally outperformed the CART model. The relatively lower performance of CART observed in this study is consistent with widely reported inferiority of single decision tree (DT)-based classifiers relative to other machine learning algorithms for land cover classification [23,31,109]. For example, similar findings were reported by Kaszta et al. [23] in a comparative assessment of classification algorithms for seasonal separation of southern African savannah components. The authors recorded CART as having the lowest accuracy score in both pixel-based and object-based approaches relative to SVM and RF algorithms. Camargo et al. [31] compared the performance of RF, SVM, MLP and DT for classifying the Brazilian tropical savannah biome, and found that DT produced a relatively lower performance than RF, SVM and MLP classifiers. Similar to our findings, the authors recorded closely comparable performance for the latter three algorithms. In a related Mediterranean land cover monitoring study, Rodriguez-Galiano and Chica-Rivas [109] reported significantly lower mapping accuracy for DT than SVM, ANN and RF algorithms.
Although CART is relatively more flexible and intuitive to implement, it is very sensitive to small variability in data, which, in this case, may have contributed to the relatively lower performance in a heterogeneous savannah landscape. Compared to single decision trees such as CART, the RF algorithm draws its higher generalisability from the contribution of multiple decision trees parameterised using random subsets and bootstrap aggregation [27]. RF is therefore highly adaptable to different data ranges and robust against multicollinearity. Similarly, the MLP architecture allows more abstract representations of data to be learned [95]. However, the performance of MLP is strongly influenced by input data structure, and performs better when data ranges of all input features are equal [96]. Thus, the inclusion of data normalisation during preprocessing likely aided gains in classification accuracy. In the case of SVM, the use of nonlinear vector mapping functions facilitates creation of decision boundaries for effectively dealing with nonlinearly separable classes [88]. The superiority of RF, MLP and SVM algorithms in dealing with the typical spectral homogeneity of the heterogeneous savannah landscape could thus be attributed to their relatively higher adaptive capacity in complex nonlinear classification problems. It should be noted that RF was used in an RF-RFE procedure for selecting final input image features for classification (see Section 2.5). This may have aided the performance of RF, although the RF-RFE algorithm was configured with different (default) hyperparameters during implementation.
As expected, a combination of original image bands, spectral indices and texture variables enhanced discrimination capacities of the machine learning models in savannah land cover classification. Across all models, the most important predictors-B_C, B_Y (original bands), V_GEMI, V_MSAVI2 (spectral indices), T_Mean and T_SAvrg (texture features)-highlight variations in photosynthetic status and structure of savannah vegetation. The high importance of the coastal blue (B_C) and yellow (B_Y) WorldView-3 image bands could be attributed to their strong sensitivity to differences in foliar chrolophyll content [24], given that images were acquired in the dry season, which is when phenological differences are most pronounced [32,45]. Several studies have reported the contribution of these bands and the red edge band in mapping vegetation components [23,110,111]. Unlike the reported studies, the red edge band was not very important in our classification of savannah land cover. The coastal blue wavelength is absorbed by chlorophyll in healthy plants while the yellow band detects dryness/"yellowness" of vegetation, both of which are instrumental in vegetative analysis. The high importance of the spectral indices could be explained by their strong correlation with vegetation biomass [56,57] and moisture content [58,59], which helps to capture the varying characteristics of heterogeneous savannah vegetation that would otherwise be attenuated when using the original image bands alone. Additionally, the high importance of the texture features highlighted the chromatic variations in dry season savannah vegetation components. Both T_Mean and T_SAvrg measures inter-pixel average in brightness values which were sufficiently captured at 2 m resolution of the WorldView-3 in a heterogeneous savannah landscape.

Grazing Lawn Detection and Model Comparison
After ascertaining the optimal parameters for training, models were refitted with the entire training set for wide-scale prediction of land cover occurrence probabilities in both Lower Sabie and Satara landscapes. For each landscape, the general pattern of grazing lawn occurrence probabilities was comparable for all models, particularly the distribution of high occurrence probability values. Individual model outputs however exhibited local predictive variations in the distribution of low to medium occurrence probabilities, which could be reflective of differences in model complexity and uncertainties [112,113]. Binary maps were derived from predicted grazing lawn occurrence probability surfaces using thresholds which maximised F-score. Hird et al. [100] adopted a similar approach for large area classification of wetlands and drylands using True Skill Statistics (TSS), and achieved 85% OA score. As expected, derived maps showed more coherent representation of grazing lawn areas within the Lower Sabie landscape, while maps for the Satara landscape were characterised by relatively higher degree of noise, particularly for the CART-derived map. This was also reflected in F-score measures, where relatively higher grazing lawn detection accuracies were recorded across all models for the Lower Sabie savannah landscape. It should be emphasised that models were trained using data from the Lower Sabie landscape-for reasons related to training data quality and computational cost-which allowed testing their spatial transferability. Possible differences in dry season reflectances due to the different rainfall regimes and underlying geologies as well as difference in image acquisition geometries, may have increased prediction bias of model spatial transfer and contributed to the observed differences both across the different landscapes and models. Further analysis is required to comprehensively ascertain the impacts of differences in environmental conditions and image acquisition characteristics on spatial transferability of the classification models. McMenar and two-proportion Z-test results showed significant differences (p ≤ 0.05) in error rates for pairs of of CART versus other models, further highlighting the relative preponderance of RF, SVM and MLP. Overall, differences in accuracy of grazing lawn detection were attenuated when models were applied in a different spatial context. This suggests the need to calibrate predictive models with local contextual information prior to application. Additionally, our findings re-emphasise recommendations from [114][115][116] to conduct statistically rigorous comparison of accuracy statements before drawing definitive conclusions on map quality assessment.

Spatial Patterns in Grazing Lawn Distribution
The formation and persistence of grazing lawns in southern African savannah landscapes has been identified to be dependent on a number of interacting top-down and bottom-up ecological processes. Among the most widely reported are continued grazing [4,117], which can be linked to fire, rainfall and nutrient hotspots concentrating grazers on specific areas [11][12][13]. Spatial variations in such factors are thus expected to shape spatial patterns and distribution of grazing lawns [10].
Both fire and grazers consume grass biomass, and have the potential to shift grass communities into tall grass or short grass grazing lawn states [11]. However, the rate at which these alternate grassland states are established is strongly influenced by landscape productivity [12]. Tall grasses are strong light competitors and are well adapted to fire-prone conditions due to their extensive rooting system, which makes them the dominant grass community under high productivity conditions [10]. On the other hand, short grass grazing lawns can withstand high grazing pressure due to their stoloniferous growth form which protect reproductive parts from being destroyed. Subject to similar grazing conditions, the proportion of grazing lawn cover within the savannah landscape is expected to be lower under high rainfall regimes. This is consistent with the relatively high grazing lawn coverage in the semiarid Satara landscape compared to the mesic Lower Sabie landscape.
Different parts of the savannah landscape may be predisposed to grazing lawn formation due to the presence of resource hot-spots that attract grazers [10]. This study explored the relationship between water sources as resource hot-spots and patterns in grazing lawn distribution. Grazing lawn structural attributes expressed as fractional cover (PL), maximum patch area (MPA) and connectivity (CI) were examined relative to distance from water sources. Generally, patterns in grazing lawn structure significantly correlated with distance from water sources, and was similar in both mesic and semiarid landscapes. The largest contiguous grazing lawn patches were found within 0.7 km from water sources, which is suggestive of the prevalence of grazing lawns in close proximity to waterbodies. This could be attributed to increased grazer activity around water sources [49,118] and is consistent with observations that landscape-scale distribution of grazers is generally biased towards areas around reliable rivers and permanent waterholes [49,119]. For example, Smit [118] found that different grazers of varying body mass and digestive requirements had significantly strong association with both rivers and artificial waterholes in Kruger National Park. This phenomenon is especially evident during dry seasons when moisture content of graze is low [120] and surface water is spatially restricted [119]. Additionally, sodic sites which are highly utilised by grazers and hence have extensive grazing lawn cover, occur close to waterbodies and drainage lines, and may have contributed to the observed patterns.
It is important to note that other landscape phenomena may be influencing spatial patterns in grazing lawn distribution. For example, the prevalence of open grasslands, which is typical of the Satara landscape, may lead to the formation of more grazing lawn patches. Burkepile et al. [121] found that more open savannah grasslands with sparse woody cover make attractive habitats or grazing grounds for selection by herbivores such as zebra (Equus quagga) and blue wildebeest (Connochaetes taurinus), in part to mitigate the risk of predation.

Conclusions
Dynamics in grazing lawn communities in southern African savannahs have been directly linked with fluctuations in mega-herbivore densities and changing fire regimes, with cascading effects on ecological processes such as nutrient cycling, plant community composition and habitat structure. Knowledge of their coverage and distribution is therefore critical to understanding habitat heterogeneity and the overall ecology of these vital grassland systems. This study presents the first attempt to develop a broad-scale approach for grazing lawn detection using very high-resolution satellite images. We demonstrated the successful application of machine learning techniques for mapping grazing lawn occurrence from WorldView-3 satellite imagery, and further analysed their spatial structure and distribution in southern African savannahs. The RF, SVM and MLP models produced comparable accuracies in the classification of different plant functional types (PFTs) and other land cover, all of which outperformed the CART model. Differences in grazing lawn detection accuracy followed a similar trend particularly within the same landscape (Lower Sabie). Performance for all models however reduced when they were transferred to a different landscape (Satara) even though high accuracies were achieved. Analysis of grazing lawn spatial structure and distribution showed that the Satara savannah landscape supports a relatively higher proportion of grazing lawn cover than Lower Sabie. Additionally, larger and contiguous patches persist in close proximity to water sources, which concentrate grazers within the savannah landscape, irrespective of differences in underlying environmental conditions. The proposed approach provides a novel and robust workflow for accurate and consistent landscape-scale monitoring of grazing lawns. Additionally, our findings ascertain experimental and local-scale reports on grazing lawn dynamics at a wider landscape scale, and provide timely information critical for understanding habitat heterogeneity in southern African savannahs.  Acknowledgments: Great thanks to former colleague, Daniel Knight for his immense help in field data collection. We are grateful to South Africa National Parks (SANParks) Scientific Services for the initial comments on the project proposal, which helped shape the focus of this study. We would like to sincerely thank Corli Wigley-Coetsee, Samantha Mabuza and Obert Mathebula for providing logistical and administrative support during our field stay in KNP. We are also immensely grateful to our game guards Martin Sarela and Annoit Mashele.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Supplementary Data
Appendix A. 1

. Multicollinearity and Feature Selection
Multicollinearity analysis among derived image features showed that spectral indices exhibited higher correlation than texture features based on a stepwise elimination threshold of VIF ≥ 10. Twenty-one out of the 27 spectral indices (i.e., vegetation, moisture and soil combined) had collinearity problems. After eliminating the collinear variables, VIF values of retained spectral indices ranged from 1.19 to 3.94 ( Figure A2A) with linear correlation coefficients between −0.002 (S_SI9∼M_NDWI) and −0.666 (S_SI9∼S_SI5). In contrast, only three out of the 18 texture features exhibited collinearity. The retained texture variables had VIF values ranging from 1.00 to 9.22 ( Figure A2B) and linear correlation coefficients ranging between −6.14 × 10 −5 (T_Dent∼T_Diss) and 0.79 (T_IDM∼T_Ener). Overall, six spectral indices (three soil indices, two vegetation indices and one moisture index) and 15 texture features (seven simple and eight advanced Haralick features) were retained ( Figure A2A,B) and combined with the eight original image bands for final feature selection.    Selection of final input image features was conducted using Random Forest Feature Elimination (RF-RFE). The RF-RFE procedure resulted in 26 image features ( Figure A1) comprising of eight image bands, six spectral indices and twelve texture features. Figure A2C shows the relative importance scores of selected final input features in differentiating the different land cover categories ( Table 1). The first 13 (50%) most important features were dominated by nearly equal proportions of spectral bands and spectral indices (six and five, respectively), with two texture features, while the remaining features were largely composed of texture variables ( Figure A2C). Amongst the original bands, variables that exhibited high importance were B_C, B_R and B_Y. V_MSAVI2, S_SI5 and V_GEMI were the spectral indices of high importance, and the most influential texture variables included T_Ener and T_Mean. Appendix A.5. Final Model Hyperparameters Table A10. Results of optimal hyperparameter values used in each model, from 2 × 5 nested cross-validation using Scikit Learn python package. Refer to the work in [74] for more details on model hyperparameters.