Empirical Prediction of Leaf Area Index (LAI) of Endangered Tree Species in Intact and Fragmented Indigenous Forest Ecosystems Using WorldView-2 Data and Two Robust Machine Learning Algorithms

Leaf area index (LAI) is an important biophysical trait for forest ecosystem and ecological modeling, as it plays a key role in forest productivity and structural characteristics. Ground-based methods for estimating LAI, such as handheld optical instruments, are subjective, costly and time-consuming. The advent of multispectral data of very high spatial resolution and of robust machine learning regression algorithms like support vector machines (SVM) and artificial neural networks (ANN) has provided an opportunity to estimate LAI at tree species level. The objective of this study was therefore to test the utility of spectral vegetation indices (SVIs) calculated from multispectral WorldView-2 (WV-2) data in predicting LAI at tree species level using the SVM and ANN machine learning regression algorithms. We further tested whether there are significant differences between the LAI of intact and fragmented (open) indigenous forest ecosystems at tree species level. The study shows that LAI at tree species level could be estimated more accurately using the fragmented stratum data than using the intact stratum data. Specifically, accurate LAI predictions were achieved for Hymenocardia ulmoides using the fragmented stratum data and the SVM regression model, based on a validation dataset (validation R² = 0.75, validation RMSE = 0.05 (1.37% of the mean)). Our study further showed that the SVM regression approach achieved more accurate models for predicting the LAI of the six endangered tree species than the ANN regression method. It is concluded that the successful application of the WV-2 data and the SVM and ANN methods in predicting the LAI of six endangered tree species in the Dukuduku indigenous forest could help in making informed decisions and policies regarding the management, protection and conservation of these endangered tree species.


Introduction
Indigenous forests in South Africa cover about 0.2% of the country's land surface [1]. In KwaZulu-Natal Province, coastal lowland indigenous forests occur in small, fragmented and largely scattered patches in relatively dry landscapes [1,2]. One such fragmented forest in KwaZulu-Natal is the Dukuduku indigenous forest, which is one of the largest and best preserved remnants of South African coastal forests [3]. The coastal forest in the Dukuduku area is highly threatened by the rapid growth of informal human settlements and agricultural systems [1]. Because indigenous forests are one of the key features of the South African landscape, their careful monitoring, management and conservation are critical. One way to manage and monitor indigenous forest ecosystems is to estimate the trees' structural (e.g., height), biophysical (e.g., leaf area index: LAI) and biochemical (e.g., foliar N content) traits. In general, these tree characteristics are measures and proxies for ecosystem resilience, services, conservation, landscape integrity and environmental health. The tree structural and bio-physiological traits could also be used to study the effect of climate change on indigenous forest ecosystems. For instance, indigenous forest LAI and biomass are indicators of the amount of sequestered carbon, which provides a relatively cheap means of offsetting significant shares of annual greenhouse gas emissions [4][5][6][7]. Specifically, forest LAI is an important biophysical trait for modeling the energy and mass exchange characteristics between the land surface and the atmosphere of terrestrial ecosystems [8]. LAI is one of the most useful indicators of forest growth, biomass and net primary production [8]. Therefore, predicting the LAI of tree species that play a vital role in ecosystem services (e.g., endangered tree species) provides necessary and valuable information. Such information is important for tree growth monitoring, management and conservation measures to optimize, for instance, indigenous forest services and forest health in general. On the other hand, the different land use/cover matrices and other ecological threats like land degradation can also significantly affect tree biophysical and structural traits, as well as productivity (i.e., net primary productivity), in indigenous forest ecosystems. Specifically, studies have shown that LAI, a dimensionless variable defined as the total one-sided surface area of all leaves in the canopy per unit ground area [9], is an important biophysical trait that determines forest photosynthetic capacity [10][11][12].
Commonly, LAI is estimated through destructive and non-destructive ground-based methods [13]. However, ground-based methods for estimating LAI, such as handheld optical instruments, are time-consuming, laborious, subjective and costly, particularly when carried out in large fragmented landscapes. Hence, studies have sought complementary approaches that use rapid, up-to-date, cost-effective and synoptic data for predicting and modeling LAI [4,14,15,16]. Developments in remote sensing technologies, data, processing and analytical approaches make it possible to estimate the LAI of forests and croplands explicitly and accurately [15][16][17][18][19]. These studies predicted LAI using different empirical (e.g., linear and polynomial regressions) and physical (e.g., inversion of radiative transfer models) approaches. For example, physical approaches have been used to estimate the LAI of forests in different landscapes [15][16][17][18][19][20][21]. Physically-based retrieval methods refer to the inversion of radiative transfer models against remote sensing observations [22,23], while empirical approaches, which include multiple linear regressions based on various statistical approaches, have also been applied in predicting LAI [19,24,25]. However, identifying suitable variables for developing a multiple regression model is often critical because some variables are either weakly correlated with LAI or highly correlated with each other [24]. Moreover, multiple linear regressions have often produced limited results, mainly because they must satisfy certain statistical conditions, assume a normal distribution of the input dataset, and suffer from multi-collinearity [26,27]. In addition, the aforementioned studies utilized remotely sensed data of varying spectral (multispectral and hyperspectral) and spatial (fine and medium) resolutions, and relatively accurate LAI prediction models were obtained. However, to our knowledge no study has modeled the LAI of specific tree species in indigenous forest ecosystems. Predicting LAI at species level could help resource managers to understand, for example, the impact of various socio-ecological mechanisms on indigenous endangered tree species and the vulnerability of these trees to external and internal perturbations.
Specifically, previous studies have mostly focused on spectral vegetation indices (SVIs) that combine two or three wavebands, as opposed to spectral features at a single waveband, for modeling forest LAI [12,28,29,30]. SVIs are mathematical combinations of different spectral bands, commonly located in the visible and near-infrared (NIR) regions of the electromagnetic spectrum. The mathematical models for forming vegetation indices fall into two classes, ratios and linear combinations, both of which exploit surface reflectance or raw digital number features of the satellite data. Ratio SVIs may be the simple ratio index (SRI) of any two spectral bands, or the ratio of sums, differences or products of any number of bands. Linear combinations are orthogonal sets of n linear equations calculated using data from n spectral bands [31]. A prime reason for using these two mathematical transformations is to improve the interpretation and visualization of the included spectral information [31]. SVIs have become one of the most important sources of information for monitoring vegetation, tree species and forest biophysical traits. The advantage of SVIs is that they enhance the information contained in the spectral reflectance by detecting spectral variability that might be due to different vegetation, plant, canopy and leaf physiological and morphological characteristics [32]. Moreover, SVIs are efficient in reducing the noise in the spectral data due to, for example, ambient atmospheric conditions, sun view angles, canopy geometry, shading and soil background [33,34]. Therefore, a number of SVIs have been developed and tested to estimate forest LAI, and SVIs have been found suitable for detecting the spatial variability of LAI within forests [32,33]. The most commonly used SVIs, the normalized difference vegetation index (NDVI), the simple ratio index (SRI) and the soil adjusted vegetation index (SAVI), have been calculated from multispectral data of low and medium spatial resolutions like Landsat and MODIS (Moderate Resolution Imaging Spectroradiometer) and utilized for predicting forest LAI [12,29,34,35,36]. However, because of the low and medium spatial resolutions, previous studies could not model LAI at tree species level.
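To make the ratio-type indices named above concrete, the following minimal sketch computes NDVI, SRI and SAVI from red and NIR reflectance using their standard textbook definitions. The function names, the default soil adjustment factor of 0.5 and the reflectance values are illustrative assumptions and are not taken from this study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    return (nir - red) / (nir + red)


def sri(nir, red):
    """Simple ratio index: NIR / red."""
    return nir / red


def savi(nir, red, soil_factor=0.5):
    """Soil adjusted vegetation index; soil_factor is the adjustment term L
    (0.5 is a commonly used default, assumed here)."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


# Hypothetical canopy reflectances (fractions between 0 and 1), for illustration only.
nir_reflectance, red_reflectance = 0.45, 0.05
for name, index in (("NDVI", ndvi), ("SRI", sri), ("SAVI", savi)):
    print(name, round(index(nir_reflectance, red_reflectance), 3))
```

All three indices rise as the contrast between NIR and red reflectance grows, which is why they respond to canopy density; SAVI adds a constant to the denominator to dampen the influence of the soil background.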
On the other hand, newly launched multispectral sensors like Sentinel-2, WorldView-2 (WV-2), WorldView-3 (WV-3) and Pleiades provide data of fine spatial resolution that can also capture vegetation spectral properties at some unique portions of the electromagnetic spectrum (e.g., red edge). The fine spatial resolutions of these sensors and the inclusion of the uniquely located bands in calculating the SVIs offer a great opportunity for predicting LAI at tree species level. The additional bands (e.g., yellow and red edge), which were previously available only in hyperspectral data, could overcome the limitations of the conventional bands (e.g., red) while reducing the unnecessary redundancy of hyperspectral data [37][38][39]. Studies have shown that key bands like the red edge of multispectral data are useful in characterizing the spatial variability of vegetation biophysical traits like LAI [14].
As far as empirical regression methods are concerned, most of the above-mentioned studies employed either multiple linear regressions or machine learning approaches like random forest (RF) to estimate LAI at forest level. However, as previously mentioned, the conventional empirical methods are constrained by limitations related to the normal distribution of the response variables and multi-collinearity [26,27]. Machine learning methods have therefore been regarded as efficient and robust protocols for predicting forest biophysical traits in the field of remote sensing [40][41][42]. In particular, these methods, which make no assumption about the distribution of the response variables, have increasingly offered a better capability to analyze remotely sensed data [43,44]. Nevertheless, there is a lack of knowledge on whether high resolution multispectral data (e.g., WV-2) could be employed for predicting the LAI of individual tree species. Moreover, ecologists might need to test whether there are significant differences between the LAI of endangered tree species grown in intact and fragmented forests in order to assess, for instance, ecosystem services and resilience in the value chain. The empirical prediction of tree LAI in such complex and dynamic forest ecosystems using remotely sensed data may require efficient and robust machine learning regression algorithms like RF, support vector machines (SVM), artificial neural networks (ANN) and partial least squares (PLS). RF is a robust non-linear algorithm for predicting forest LAI [45]. However, one drawback of the RF regression algorithm when many input predictors are utilized is that it selects predictors that could be correlated with each other [46,47]. On the other hand, PLS regression, a linear method that has been used for predicting forest LAI [48], could not accurately handle higher dimensional data with relatively few training samples like the ones utilized in this study, and its empirical performance falls short of that of other machine learning regression algorithms [49,50]. Hence, this study used two non-linear machine learning algorithms, viz., SVM and ANN, to predict endangered tree species LAI using WV-2 spectral variables. SVM is a well-known machine learning algorithm that has frequently been used to change the nonlinear regression problem into a linear projection by a variety of kernel approaches [51,52]. The advantage of SVM compared to other conventional linear and non-linear regression methods is that it offers excellent generalization abilities. It also provides sparse solutions in which only the most relevant samples of the calibrating data are weighted, resulting in low computational cost and memory requirements [34,53]. The algorithm has been applied to relate SVIs to various vegetation biophysical traits like LAI [54][55][56]. On the other hand, ANN regression comprises an interconnected group of artificial neurons and processes information using a connectionist approach to computation [57,58]. The approach therefore offers a very efficient regression method to simulate the relationship between SVIs and LAI [59][60][61]. To the best of our knowledge, no study has utilized SVIs calculated from WV-2 data to predict the LAI of endangered tree species in tropical indigenous forests. In this study, we tested the utility of SVIs calculated from multispectral WV-2 data, together with the SVM and ANN machine learning regression algorithms, for predicting LAI at tree species level. The LAI of six endangered tree species in the Dukuduku intact and fragmented (open) indigenous forest ecosystems was estimated. We also tested whether there are significant differences between the LAI of the six endangered tree species grown in intact and fragmented indigenous forest ecosystems. We further explored the possibility of using the combined data sets across the six endangered tree species on the one hand, and across the two forest strata (i.e., intact and fragmented forests) on the other hand, for deriving universal models for predicting LAI.

Study Area
The study was conducted in the Dukuduku indigenous forest, which is located on the northern bank of the Umfolozi River floodplain, South Africa. The study area lies at latitude 28°52′25″S and longitude 32°17′23″E (Figure 1). The Dukuduku forest area covers more than 6000 hectares (ha) of indigenous coastal forest on the rolling savannahs of the inland across the dune line along the KwaZulu-Natal coast, from southern KwaZulu-Natal province to beyond the Mozambican boundary [2,62]. It is considered to be one of the largest remaining stretches of coastal lowland forest in South Africa. However, owing to a large number of individual illegal residences and intensive agricultural activities, the natural vegetation surrounding the forest has been widely removed [1,2,62]. Similarly, increasing human activities and settlements in the area have led to an increase in ecosystem fragmentation [1,2,62]. Therefore, the Dukuduku forest is facing many threats presented by the destruction of indigenous vegetation, forest plantations and agricultural farmlands. Two forest management protocols are practiced in the Dukuduku area: (i) fragmented (open) forests, which are managed by the local communities and the traditional leaders; these are forests or woodlands where individual tree crowns are widely spaced and do not overlap to form a continuous canopy layer; and (ii) intact forests, which are managed by officials (e.g., Department of Forestry) and are defined as areas of land occupied by trees with closed canopies (continuous canopy layer). The majority of both the intact and fragmented forests are dominated by various natural indigenous vegetation species. The vegetation includes different age groups and other types of land use/cover classes. The most dominant tree species in the area include Syzygium cordatum and Cussonia zuluensis. However, six other tree species in the Dukuduku forest, namely Albizia adianthifolia, Ekebergia capensis, Harpephyllum caffrum, Hymenocardia ulmoides, Sclerocarya birrea and Trichilia dregeana, are under severe threat and endangered in both the fragmented and intact forest strata as they face rapid harvesting for woodcarving and traditional medicine [63][64][65]. In a previous study, these six endangered tree species were accurately mapped (overall accuracy = 77%) and distinguished from other land use/cover classes in the Dukuduku area [66]. It is therefore of interest to monitor the growth and health of these six endangered tree species through the prediction of key biophysical traits (e.g., LAI).

Sampling Procedure and Field Data Collection
A field campaign was carried out between 1 and 7 December 2013, following a stratified purposive sampling method, to collect LAI measurements from the six selected endangered tree species (Albizia adianthifolia, Ekebergia capensis, Harpephyllum caffrum, Hymenocardia ulmoides, Sclerocarya birrea and Trichilia dregeana). A handheld Leica Geosystems GS20 Global Positioning System (GPS) of sub-meter (0-0.25 m) accuracy [67] was used to geo-locate the sample trees. The networks of roads and open paths were used to assist in selecting the endangered tree species by walking in various directions in the intact forest. A handheld LAI-2200 plant canopy analyzer was used to estimate the LAI of each sample tree under overcast sky conditions at low solar elevation, i.e., in the early morning (8:00-10:00 a.m., Greenwich Mean Time: GMT +2) and late afternoon (3:00-6:00 p.m., GMT +2), with a 180° view restrictor on the sensor [38,68]. To avoid direct sunlight on the sensor, the operator took the below and above canopy radiation readings facing away from the sun. For each sample tree, one above canopy measurement was taken by walking to an adjacent open field. Next, five below canopy measurements were taken at regularly spaced points around each tree, from which the average sample LAI was calculated. In total, 563 trees were sampled from the fragmented (n = 300) and intact (n = 263) indigenous forest strata (Figure 1). For each endangered tree species, the sample points were 58 and 67 (Albizia adianthifolia), 47 and 37 (Ekebergia capensis), 41 and 44 (Harpephyllum caffrum), 39 and 56 (Hymenocardia ulmoides), 40 and 59 (Sclerocarya birrea), and 38 and 37 (Trichilia dregeana) in the intact and fragmented strata, respectively.

Calculating Leaf Area Index from Field Data
For processing and calculating LAI from the field data, the FV2200 1.2 software and the LAI-2200 instrument software were used, following the procedure described by LICOR [69]. In a first step, the measurements from the below and above canopy sensors were matched using the closest readings in time. Detector ring 5 (61°-74°) was omitted to reduce the known underestimation of LAI compared with the real measurements [13,70], because the underestimation of the LAI-2200 plant canopy analyzer increases with increasing zenith angle [71]. These software packages offer diverse options for data processing and different inversion algorithms to calculate LAI. We used the horizontal model of FV2200 to calculate LAI because it is considered an ideal algorithm for calculating the LAI of wide and flat canopies like forest trees [30].

Remotely Sensed Data and Pre-Processing
A WV-2 image was acquired on 1 December 2013 under clear-sky conditions. WV-2 is the first commercial multispectral satellite with eight bands, and it senses in the 400-1040 nm spectral range (bandwidths of 50-180 nm). The spatial resolution of the multispectral bands is 2.0 m, along with a panchromatic band of 0.5 m spatial resolution, with a swath width of 16.4 km at nadir and an average revisit time of 1.1 days [72]. The spectral bands of WV-2 consist of four conventional bands (blue, green, red and NIR1) and four additional bands (coastal blue, yellow, red edge and a new NIR2). The satellite therefore has spectral and spatial resolutions that suit many applications, such as predicting and monitoring forest structural and biophysical traits at species level [73,74]. The WV-2 image was atmospherically corrected and transformed to at-canopy reflectance using the Quick Atmospheric Correction (QUAC) procedure in the ENVI (Environment for Visualizing Images) 4.7 software [75]. QUAC performs in-scene based atmospheric correction in the visible and near-to-shortwave infrared (VNIR-SWIR) regions of the electromagnetic spectrum for multi- and hyperspectral imagery. QUAC determines atmospheric compensation parameters directly from the information contained within the image (pixel spectra), thus allowing for the retrieval of accurate reflectance spectra [76]. The acquired image was geometrically corrected (Universal Transverse Mercator: UTM zone 36 South and WGS-84 geodetic datum) by DigitalGlobe™.

Spectral Vegetation Indices (SVIs)
SVIs based on absorption and reflectance in the visible and NIR regions (e.g., NDVI) have been widely used for predicting biophysical traits (e.g., LAI) of agricultural and natural ecosystems [19,30,34,35,77]. In this study, after the WV-2 image was processed, 24 SVIs were computed (Table 1) and utilized to predict the LAI of the six endangered tree species. These indices were selected based on previous studies that predicted forest biophysical traits like LAI and biomass [30,77,78,79,80,81]. We investigated the use of all SVIs combined (n = 24) for predicting the LAI of the endangered tree species in each forest stratum (i.e., intact and fragmented forests). In addition, the data were combined across the six endangered tree species and across the two forests (fragmented and intact) and used to develop universal models that could predict LAI in each forest stratum or in the whole study area.
The field LAI data were described using the mean and standard deviation (SD). The data were then tested for normality using the Shapiro-Wilk test [98]. An independent t-test was then performed at the 95% confidence level (p ≤ 0.05) to test whether there are significant differences in the LAI of the endangered tree species between the intact and fragmented indigenous forest strata. An independent t-test is a suitable method for comparing the means of two groups (e.g., two forest ecosystems) on a given normally distributed variable (e.g., LAI).
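For reference, the independent t-test can be written as follows. This is the standard pooled-variance (Student) form, given here as an assumption because the text does not state whether equal variances were assumed; the subscripts F and I (fragmented and intact strata) are notation introduced only for this illustration.

```latex
t = \frac{\bar{x}_{F} - \bar{x}_{I}}{s_p \sqrt{\dfrac{1}{n_F} + \dfrac{1}{n_I}}},
\qquad
s_p^2 = \frac{(n_F - 1)\, s_F^2 + (n_I - 1)\, s_I^2}{n_F + n_I - 2},
\qquad
\mathrm{df} = n_F + n_I - 2
```

With the combined sample sizes reported for the two strata (n_F = 300 and n_I = 263), this form of the test would have 561 degrees of freedom.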

Support Vector Machines (SVM) Regression Algorithm
The SVM algorithm, which was invented by Cortes and Vapnik [53], is based on statistical learning theory and requires no assumption about the distribution of the response variable (e.g., LAI) [34,99]. SVMs are very specific learning algorithms characterized by the usage of kernels, the absence of local minima, the sparseness of the solution and capacity control obtained by acting on the margin or on the number of support vectors. Originally, SVM was developed to solve classification problems, but it was later extended to handle regression problems [53]. The support vector regression (SVR) algorithm converts the nonlinear regression problem into a linear relationship by using kernel functions to map the original input space into a new feature space with higher dimensions [51]. In particular, SVR aims to estimate an unknown continuous-valued function based on a finite number of noisy samples. It makes use of the structural risk minimization principle, which is known to have good generalization performance for different dataset sizes, as opposed to the empirical risk minimization implemented by other methods like ANN [100,101]. Further, the kernel methods of SVR require optimizing only two parameters: the ε-insensitive zone (ε) and the regularization parameter (C). The accuracy of SVR is highly dependent on a correct setting of these meta-parameters. The parameter ε controls the width of the ε-insensitive zone for the calibrating dataset. Hence, the value of ε can affect the number of support vectors used to construct the regression function: the bigger ε is, the fewer support vectors are selected. In addition, bigger ε values result in more "flat" estimates [52]. The parameter C determines the balance between the model complexity and the degree to which deviations larger than ε are tolerated in the optimization. Therefore, larger values of C aim at minimizing the empirical risk regardless of the complexity of the model. In this regard, both C and ε values affect model complexity. A more detailed description of the SVR method can be found in Cortes and Vapnik [53], Cherkassky and Ma [52] and Ben-Hur and Weston [102].
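The role of ε can be illustrated with a minimal sketch of the ε-insensitive loss and of a radial basis function kernel, both in their standard textbook forms. The function names and the kernel width parameter gamma are assumptions introduced for illustration; the text above names only ε and C as tuned parameters.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear outside: max(0, |y - f(x)| - epsilon)."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2).
    gamma is a hypothetical width parameter, not a value from the study."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Hypothetical LAI values: a residual smaller than epsilon costs nothing,
# while a larger residual is penalized only by the part exceeding epsilon.
print(epsilon_insensitive_loss(3.0, 3.05, epsilon=0.1))
print(epsilon_insensitive_loss(3.0, 3.30, epsilon=0.1))
```

Samples whose residuals fall inside the tube contribute no loss and therefore do not become support vectors, which is why a wider tube yields a sparser and flatter regression function.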
In this study, the SVM regression method was used to estimate the LAI of the six endangered tree species using the 24 SVIs combined as predictor variables (n = 24), and Vapnik's ε-insensitive loss function was employed to minimize the calibration errors. To project the data into a new space, a radial basis function kernel was used, followed by an optimization procedure to find the number of support vectors giving the best performance. Moreover, the optimal values of the two parameters C and ε were obtained using 10-fold cross validation and a grid search on the calibrating dataset [39,103]. The calibrating dataset was divided into 10 subsets of equal size; SVM regression models were then calibrated on nine subsets and tested on the remaining one, and the process was repeated ten times until every subset had served as the test sample. The parameter pair that minimized the prediction error was then taken as the best setting for the final prediction. The analyses were carried out using the e1071 library in the R statistical software, version 2.15.2 [104].
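The tuning procedure described above can be sketched in a few lines. This is a generic illustration of a grid search with k-fold cross validation, not the e1071 implementation: the model is passed in as a hypothetical `fit_predict` callable, and the stand-in model and toy data at the end exist only to make the sketch runnable. The sketch assumes there are at least as many samples as folds.

```python
import random
from itertools import product


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(x, y, fit_predict, params, k=10, seed=0):
    """Average RMSE over k folds; every fold is held out exactly once."""
    fold_rmse = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predictions = fit_predict([x[i] for i in train], [y[i] for i in train],
                                  [x[i] for i in held_out], **params)
        mse = sum((p - y[i]) ** 2 for p, i in zip(predictions, held_out)) / len(held_out)
        fold_rmse.append(mse ** 0.5)
    return sum(fold_rmse) / k


def grid_search(x, y, fit_predict, grid, k=10, seed=0):
    """Return (best_params, best_rmse) over all parameter combinations in grid."""
    names = sorted(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_validated_rmse(x, y, fit_predict, params, k, seed)
        if best is None or score < best[1]:
            best = (params, score)
    return best


def mean_plus_offset(train_x, train_y, test_x, offset):
    """Stand-in model (hypothetical): predicts the training mean plus an offset."""
    mean_y = sum(train_y) / len(train_y)
    return [mean_y + offset for _ in test_x]


toy_x = list(range(40))
toy_y = [2.0] * 40
print(grid_search(toy_x, toy_y, mean_plus_offset, {"offset": [-1.0, 0.0, 1.0]}))
```

For SVR, `fit_predict` would wrap the regression model and the grid would hold candidate values of C and ε; the search returns the pair with the lowest cross-validated error, as described in the text.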

Artificial Neural Networks (ANN) Regression Algorithm
ANN is one of the earliest nonparametric machine learning regression techniques. It is a powerful approach that can analyze complex relationships and does not depend on an assumption of data normality [58,105]. The ANN is a mathematical model that simulates the structural and functional aspects of biological neural connections. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to calculation [57].
Several ANN models, such as radial basis function, back propagation and multilayer perceptron, have been applied to analyze remotely sensed data for a variety of applications like forestry and ecological modeling [106][107][108]. The radial basis function neural network has proved to be a good choice for analyzing a wide variety of remotely sensed data since it reduces the computational time required for the calibration process [10,105]. The approach requires one input variable, which is the "distance" between the weight and input nodes. Back propagation is a multi-layer feed-forward neural network method which comprises a series of simple connected neurons, or nodes, between the input and output layers [58]. On the other hand, the multilayer perceptron is a commonly used ANN structure that consists of an input layer, an output layer and one or more hidden layers of nonlinearly-activating nodes [58,109]. Each node is connected by a certain synaptic weight to all nodes in the next layer, and perceptron learning occurs through changes in the connection weights after the input predictors (SVIs) are processed [109]. The multilayer perceptron is a feed-forward ANN model that projects input data onto a set of suitable outputs by using three or more layers of nodes with nonlinear activation functions [58,110]. ANN has been widely used in modeling vegetation and tree species traits that are not linearly predictable from the original remotely sensed variables [41,111].
In this study, the ANN regression algorithm was employed using a feed-forward multilayer perceptron neural network with an error back propagation modeling approach to estimate the LAI of the six endangered tree species, using all 24 SVIs presented in Table 1 combined as predictor variables. In particular, the information in the neural network moves forward from one layer to the next to compute the output, and the error is propagated back from the output to the input layer to adjust the connection weights and the biases so as to minimize the mean square error of prediction. Many trials of internal network structure, input data and learning approaches were tested to define the optimal regression features based on the method described by Yin et al. [112]. The number of nodes was varied in each trial to assess the necessary number of hidden layers and the number of nodes required per layer. This was done by manually changing the number of nodes in the hidden layer. In this study, the base networks consisted of two hidden layers and the number of training iterations was set to a default value of 1000 [113]. The number of neurons in the hidden layer and the learning rate were optimized on a trial and error basis. Firstly, the network was run with a fixed learning rate of 0.01, with the number of neurons in the hidden layer changing from 1 to 5. Secondly, the network performance was assessed using the root mean square error and the coefficient of determination. The number of neurons that produced the lowest root mean square error and the highest coefficient of determination was selected as optimal.
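The forward and backward passes described above can be sketched with a small pure-Python multilayer perceptron. This is a minimal illustration under stated assumptions, not the software used in the study: the tanh hidden activation, the linear output, the weight initialization range, the layer sizes and the input values are all hypothetical choices; only the two hidden layers, the learning rate of 0.01 and the 1000 iterations echo the settings mentioned in the text.

```python
import math
import random


def init_network(layer_sizes, seed=0):
    """Create small random weights and zero biases for each pair of adjacent layers."""
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        network.append({"w": weights, "b": [0.0] * n_out})
    return network


def forward(network, inputs):
    """Return the activations of every layer; hidden layers use tanh, the output is linear."""
    activations = [list(inputs)]
    for depth, layer in enumerate(network):
        sums = [sum(w * a for w, a in zip(row, activations[-1])) + b
                for row, b in zip(layer["w"], layer["b"])]
        is_output = depth == len(network) - 1
        activations.append(sums if is_output else [math.tanh(s) for s in sums])
    return activations


def train_step(network, inputs, target, learning_rate=0.01):
    """One back-propagation update for one sample; returns the squared error before the update."""
    activations = forward(network, inputs)
    # Error signal at the linear output layer (gradient of half the squared error).
    deltas = [out - t for out, t in zip(activations[-1], target)]
    error = sum(d * d for d in deltas)
    for depth in range(len(network) - 1, -1, -1):
        layer = network[depth]
        layer_input = activations[depth]
        if depth > 0:
            # Propagate the error backwards before this layer's weights change;
            # the derivative of tanh is 1 - tanh^2, and layer_input holds tanh outputs.
            back = [sum(layer["w"][j][i] * deltas[j] for j in range(len(deltas)))
                    * (1.0 - layer_input[i] ** 2)
                    for i in range(len(layer_input))]
        for j, delta in enumerate(deltas):
            for i, activation in enumerate(layer_input):
                layer["w"][j][i] -= learning_rate * delta * activation
            layer["b"][j] -= learning_rate * delta
        if depth > 0:
            deltas = back
    return error


# Hypothetical sample: 24 predictor values and one LAI target, for illustration only.
sample_inputs = [0.1 * (i % 10) for i in range(24)]
sample_target = [3.0]
mlp = init_network([24, 3, 3, 1])
errors = [train_step(mlp, sample_inputs, sample_target) for _ in range(1000)]
print(errors[0], errors[-1])
```

In practice the update would be applied over all calibration samples in each iteration, and the number of hidden neurons would be varied as described above while monitoring the error.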

Validation
To validate the performance of the SVM and ANN regression algorithms, the reference data were randomly split into 70% (210 and 184 for the fragmented and intact strata, respectively) for calibration and 30% (90 and 79 for the fragmented and intact strata, respectively) for validation, based on the recommendation by Adelabu et al. [114]. The calibration dataset was used for optimizing the SVM and ANN regression algorithms, whereas the validation dataset was used to examine the performance and reliability of the prediction models. One-to-one relationships between the measured and predicted LAI values were fitted, and the coefficient of determination (R²), root mean square error (RMSE) and bias were then calculated. The RMSE provides a direct estimate of the modeling error expressed in the original measurement units; a lower RMSE indicates better predictive model performance [100].
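The split and the validation metrics can be sketched as follows. The definitions used here are assumptions, since the text does not give formulas: R² is taken as the squared Pearson correlation of the one-to-one relationship, bias as the mean of predicted minus measured values, and the relative RMSE as a percentage of the mean measured LAI. All function names are hypothetical.

```python
import random


def split_calibration_validation(n_samples, calibration_fraction=0.7, seed=0):
    """Randomly assign sample indices to calibration and validation subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_calibration = round(n_samples * calibration_fraction)
    return indices[:n_calibration], indices[n_calibration:]


def validation_metrics(measured, predicted):
    """Return R^2 (squared Pearson correlation), RMSE, RMSE as % of the mean, and bias."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    covariance = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    rmse = (sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n) ** 0.5
    return {
        "r2": covariance ** 2 / (var_m * var_p),
        "rmse": rmse,
        "rmse_percent": 100.0 * rmse / mean_m,
        "bias": mean_p - mean_m,
    }


# Stratum sizes reported in the field data section: 300 fragmented, 263 intact.
for stratum, size in (("fragmented", 300), ("intact", 263)):
    calibration, validation = split_calibration_validation(size)
    print(stratum, len(calibration), len(validation))
```

Applied to the two stratum sizes, a 70/30 split of this kind gives calibration and validation counts that match those quoted above.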

Descriptive Statistics and an Independent t-Test
A Shapiro-Wilk normality test showed that the LAI data for the six endangered tree species growing in the fragmented and intact indigenous forest strata were normally distributed (p = 0.032 for the intact forest stratum and p = 0.044 for the fragmented forest stratum). Figure 2 shows the descriptive statistics of the LAI for the six endangered tree species in the fragmented and intact indigenous forest strata. The independent t-test showed that the fragmented forest had a significantly higher (p ≤ 0.05) mean LAI than the intact indigenous forest stratum (Figure 2). With regard to the individual tree species, LAI varied greatly among the species: the highest mean LAI values were obtained for Albizia adianthifolia (4.19) and Trichilia dregeana (3.94) in the intact forest stratum, while the lowest mean value was obtained for Albizia adianthifolia (2.03) in the fragmented forest stratum (Figure 2). Furthermore, the descriptive statistics of the combined (aggregated) LAI across the six endangered tree species in the intact and fragmented forest ecosystems are shown in Figure 3. The figure also shows a significant difference (p ≤ 0.05) in the combined tree LAI between the fragmented and intact forests.
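For readers unfamiliar with the test, the statistic behind an independent two-sample t-test can be sketched as follows. The text does not state which variant was used, so this sketch assumes the classical pooled-variance (Student) form; the function name and the LAI readings are hypothetical.

```python
import math
from statistics import mean, variance


def independent_t_statistic(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical LAI readings for one species in each stratum (not study data).
fragmented = [3.9, 4.1, 3.7, 4.3, 4.0]
intact = [3.2, 3.5, 3.1, 3.6, 3.4]
t_stat, df = independent_t_statistic(fragmented, intact)
print(round(t_stat, 2), df)
```

With 8 degrees of freedom, the two-sided critical value at the 0.05 level is 2.306, so an absolute t statistic above that threshold corresponds to p ≤ 0.05.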

Support Vector Machines (SVM) and Artificial Neural Networks (ANN) Regression Models
Since the combined SVI dataset achieved the best results, only the calibration and validation results obtained with this dataset are reported. Table 2 shows the optimum parameters for both the SVM and ANN regression methods. The 10-fold cross-validation and grid search approaches resulted in optimal ε and C values of 1 and 100, respectively, for all endangered tree species in the two forest strata, except for Albizia adianthifolia and Sclerocarya birrea in the fragmented forest stratum (1 and 1000), and Hymenocardia ulmoides, Sclerocarya birrea and Trichilia dregeana in the intact forest stratum (1 and 10). Similarly, the optimal SVM parameters were 1 for ε and 100 for C when the combined data were utilized. The table also shows that the input layers for the ANN regression method ranged between 1 and 6 for the six endangered tree species in the fragmented forest stratum and for the combined data, and between 4 and 6 for the species in the intact forest stratum and for the combined data. The hidden layers varied between 2 and 5 for the fragmented and the combined data, and between 4 and 9 for the intact forest stratum.

Table 2. The optimal parameters for the best calibrated SVM and ANN regression models used for predicting the LAI of the six endangered tree species in the fragmented and intact indigenous forest strata.

Model Validation
Figures 4-7 show the one-to-one relationships between the measured and predicted LAI for all models developed in this study. When the performance of the SVM prediction models was assessed, the results showed that LAI was best estimated for Hymenocardia ulmoides growing in the fragmented forest ecosystem, as indicated by the relatively higher R²_Val and lower error metrics (Figure 4). For the intact forest (Figure 5), the best model was achieved for predicting the LAI of Albizia adianthifolia (R²_Val = 0.80 and RMSE_Val = 2.10% of the mean). The slope of all other predictive models deviated from the expected one-to-one relationship, and the models either overestimated or underestimated the LAI measurements. On the other hand, the best ANN regression model was achieved for predicting the LAI of Hymenocardia ulmoides (Figure 6) in the fragmented forest stratum (R²_Val = 0.71 and RMSE_Val = 1.52% of the mean) and of Harpephyllum caffrum (Figure 7) in the intact forest stratum (R²_Val = 0.71 and RMSE_Val = 1.57% of the mean). The results also showed unreliable and inaccurate models for predicting the LAI of the combined six endangered tree species in the fragmented (RMSE_Val = 24.00% and 24.96% of the mean for SVM and ANN, respectively) and intact (RMSE_Val = 21.07% and 26.04% of the mean for SVM and ANN, respectively) forests (Figure 8), and for the data combined across the six endangered tree species and the two forest ecosystems (a universal model; RMSE_Val = 21.11% and 24.09% of the mean for SVM and ANN, respectively) (Figure 9). It is interesting to note that the SVM models developed using the fragmented data overestimated the LAI of all tree species except Sclerocarya birrea.

WorldView-2 Image Potential in Predicting LAI of Endangered Tree Species
Our study shows that LAI at individual species level can be accurately estimated in fragmented and intact forest ecosystems. This result is consistent with other studies that demonstrated the utility of WV-2 in predicting LAI at different spatial scales of a landscape [14,19,77]. These findings therefore support the assertion that WV-2 spectral variables can improve the prediction accuracy of vegetation biophysical traits such as LAI in indigenous ecosystems [14,19,30,77,115]. The successful use of WV-2 data for predicting LAI at tree species level could be due to the fine spatial resolution (2 m) that is required to capture the spectral properties of each individual tree species. This agrees with the findings of other studies such as Pu and Cheng [85], who found that LAI predictive models generated using WV-2 data in a mixed forest ecosystem performed better than those derived using the relatively low spatial resolution Landsat 5 TM data.
Although all 24 SVIs (Table 1) were used together to predict the LAI of the endangered tree species, we hypothesize that the red edge band, which was included in some of the SVIs, could have enhanced the performance of the LAI predictive models. The red edge, the inflection point in the slope that connects the reflectance in the red and in the NIR spectral ranges [116-118], is more sensitive to vegetation biophysical traits like chlorophyll content than other regions of the electromagnetic spectrum [19,30,119]. Vegetation indices computed from the red edge and NIR bands have been found to correlate relatively strongly with LAI in different landscapes, and chlorophyll content is one of the vegetation biochemical properties that has a direct relationship with LAI [116,118-120]. In general, our finding conforms with Mutanga et al. [115], who concluded that vegetation indices derived from WV-2 data involving the uniquely located red edge band can improve the prediction accuracy of vegetation biophysical traits (e.g., biomass) compared with indices that only include conventional bands.
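To make the role of the red edge band concrete, the sketch below contrasts a conventional normalized difference index (NIR and red) with its red edge counterpart (NIR and red edge). It is illustrative only: whether these two particular indices are among the 24 SVIs in Table 1 is an assumption, and the reflectance values and the function name are hypothetical.

```python
def normalized_difference(band_a, band_b):
    """Normalized difference of two reflectance values: (a - b) / (a + b)."""
    total = band_a + band_b
    if total == 0:
        raise ValueError("sum of the two band values must be non-zero")
    return (band_a - band_b) / total


# Hypothetical surface reflectances for a single canopy pixel.
nir1, red_edge, red = 0.45, 0.30, 0.05

ndvi = normalized_difference(nir1, red)        # conventional red/NIR index
ndre = normalized_difference(nir1, red_edge)   # red edge variant
print(round(ndvi, 3), round(ndre, 3))
```

Because red reflectance is already very low over dense canopies, the conventional index tends to saturate, whereas the red edge variant leaves more dynamic range for discriminating high LAI values.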

Predicting LAI of Endangered Tree Species in Fragmented and Intact Forests
The tree LAI in the fragmented (open) forest was significantly higher than that in the intact forest. The models for predicting LAI in the fragmented forest (Figures 4 and 6) outperformed those for predicting LAI in the intact forest (Figures 5 and 7). In the intact forest, the target endangered tree species could have been mixed with other tree species around the sample tree, which might have resulted in mixed spectral characteristics (SVIs). Hence, the mixed spectral features in the intact forest might have hindered the performance of the LAI predictive models. Moreover, in a few cases in the intact forest, because of the difficulty of taking measurements close to the stem of individual trees, we collected LAI measurements from sites where two or more tree species overlapped. The overlapping sites could also have produced mixed spectral features because of the different structural and biophysical traits of the trees [66]. It is also worth noting that in the intact forest stratum we sampled trees along roads and open paths; hence, one side of a tree could have received more sunlight than the other. Since LAI is a light-dependent biophysical trait, the variation in light between the sides of a tree could have confounded the prediction of LAI in the intact forest and reduced the performance of the models developed from the intact forest data.
In general, the results of this study showed unreliable and inaccurate models for predicting LAI in the two forests (fragmented and intact) when the data were pooled across the six endangered tree species. Likewise, an inaccurate universal LAI prediction model was obtained when the data were combined across the six endangered tree species and the two forests. This is expected, since the six endangered tree species have distinguishable spectral features [66] that would have confounded the establishment of accurate LAI models across the tree species. Future studies aiming at predicting LAI in our study area, or in other areas with similar conditions, should classify areas into different tree species and separate the fragmented from the intact forests when WV-2 spectral variables and SVM or ANN regression algorithms are employed.

Support Vector Machines (SVM) and Artificial Neural Networks (ANN) Regression Models
We utilized two optimized nonlinear machine learning regression methods (SVM and ANN) to predict LAI at tree species level in the Dukuduku indigenous forest. Tree biophysical traits in such a complex and dynamic natural ecosystem might not be adequately modeled using a linear relationship. The nonlinear SVM and ANN regression approaches explained the high variability in tree LAI in the complex Dukuduku landscape and resulted in predictive models of relatively high accuracy. We also parametrized the two regression approaches to obtain the best meta-parameters for predicting LAI [52,121]. The results showed that different optimal parameters were required to estimate LAI in the fragmented and intact forest ecosystems. This was expected, since we employed empirical statistical approaches for deriving the predictive models under two different forest ecosystems. This result conforms with other studies that reported different optimal settings for SVM and ANN under different levels and complexities of landscapes [54,68,101,121,122].
Furthermore, our study shows that the LAI predictive models derived using SVM regression performed relatively better than those derived using ANN regression. Other studies have also noted the superiority of SVM models in predicting forest and crop LAI [19,30,101]. The superiority of the SVM models in predicting the LAI of endangered tree species could be due to the fact that SVM regression makes use of the structural risk minimization principle, which is known to produce accurate predictive models [51,100,101]. Meanwhile, the ANN regression approach employs model functions, like the radial basis function, that are relatively biased when used with remotely sensed input variables and can deviate from what was presented during the calibration stage [21,60,123]. Furthermore, ANN regression is often referred to as a black-box technique that could encounter an overfitting problem on the test dataset [124,125]. ANN also requires a relatively long processing time during the calibration phase because of the manual adjustment of the hidden layer nodes. However, SVM was optimized using a 10-fold cross-validation method, while the optimal ANN parameters were obtained by trial and error, so the two methods were not tuned in the same way. Further studies should employ the same method to calibrate and optimize SVM and ANN regression when they are compared for their performance in predicting forest biophysical traits.
Overall, our results are promising for the accurate prediction of LAI at tree species level in the Dukuduku forest ecosystem. However, they should be interpreted with some caution, as we used snapshot data collected under specific environmental conditions and forest ecosystems. Further studies should explore the transferability of the present models to other points in space or time. Our LAI estimates could also be utilized to study and model other forest biophysical (e.g., biomass, net primary productivity) and metro-physiological (e.g., evapotranspiration) traits using process-based physical models.

Conclusions
This study shows a successful application of high spatial resolution WV-2 spectral variables and the machine learning SVM and ANN regression methods for predicting the LAI of six endangered tree species in the fragmented and intact Dukuduku indigenous forest ecosystems in South Africa. Our results showed that more than 60% (R²_Val > 0.60) of the variation in the LAI of the endangered tree species could be explained by the predictive models when data from the fragmented forest ecosystem were utilized. On the other hand, the results showed that a maximum R²_Val of 0.80 could be obtained for predicting the LAI of the endangered tree species in the intact forest ecosystem. In general, the LAI predictive models developed using the fragmented forest data performed more accurately (RMSE_Val ranged between 1.37% and 14.72% of the mean) than the models developed using the intact forest data (RMSE_Val ranged between 1.57% and 5.85% of the mean), and the SVM regression approach achieved relatively more accurate LAI prediction models than ANN regression.
Overall, the successful application of the WV-2 data, SVM and ANN for predicting the LAI of six endangered tree species in the Dukuduku indigenous forest could help in making informed decisions and policies regarding the management, protection and conservation of these endangered tree species. The findings of this study also provide the necessary insight and motivation for the remote sensing community, ecologists and forest managers to shift toward identifying the most suitable and readily available remote sensing sensors for reliable and accurate indigenous forest monitoring, especially in fragmented ecosystems.

Figure 1. The location of the Dukuduku indigenous forest in KwaZulu-Natal Province, South Africa, and field sample locations overlaid on a true-color WorldView-2 image.

Figure 2. Descriptive statistics of the measured LAI of the six endangered tree species in both the intact (I) and fragmented (F) forest ecosystems. LAI observations for each tree species in both forest ecosystems (I and F) with a different letter are significantly different (p ≤ 0.05) from each other.

Figure 3. Descriptive statistics of the measured LAI of the combined (aggregated) six endangered tree species datasets in both the intact (I) and fragmented (F) forest ecosystems. LAI data with a different letter are significantly different (p ≤ 0.05) from each other.

Figure 4. One-to-one relationships between measured and predicted LAI based on an independent validation dataset (30%) using the support vector machines (SVM) regression algorithm and fragmented indigenous forest data.

Figure 5. One-to-one relationships between measured and predicted LAI based on an independent validation dataset (30%) using the support vector machines (SVM) regression algorithm and intact indigenous forest data.

Figure 6. One-to-one relationships between measured and predicted LAI based on an independent validation dataset (30%) using the artificial neural networks (ANN) regression algorithm and fragmented indigenous forest data.

Figure 7. One-to-one relationships between measured and predicted LAI based on an independent validation dataset (30%) using the artificial neural networks (ANN) regression algorithm and intact indigenous forest data.

Figure 8. One-to-one relationships between measured and predicted LAI based on an independent validation dataset (30%) using: (a) combined fragmented forest data with support vector machines; (b) combined fragmented forest data with artificial neural networks; (c) combined intact forest data with support vector machines; and (d) combined intact forest data with artificial neural networks.

Figure 9. One-to-one relationships between measured and predicted LAI based on an independent validation dataset (30%) using combined data across the six tree species and the two forest ecosystems (universal model) with: (a) support vector machines; and (b) artificial neural networks.

Table 1. Summary of the WorldView-2-derived spectral vegetation indices (SVIs) used in this study.

Table 3. Coefficient of determination (R²_Cal) and root mean square errors (RMSE_Cal) for the SVM and ANN regression models when calibrated using the data collected from the fragmented and intact forest strata.