Towards Forest Condition Assessment: Evaluating Small-Footprint Full-Waveform Airborne Laser Scanning Data for Deriving Forest Structural and Compositional Metrics

Abstract: Spatial data on forest structure, composition, regeneration and deadwood are required for informed assessment of forest condition and subsequent management decisions. Here, we estimate 27 forest metrics from small-footprint full-waveform airborne laser scanning (ALS) data using a random forest (RF) and automated variable selection (Boruta) approach. Modelling was conducted using leaf-off (April) and leaf-on (July) ALS data, both separately and combined. Field data from semi-natural deciduous and managed conifer plantation forests were used to generate the RF models. Based on NRMSE and NBias, overall model accuracies were good, with only two of the best 27 models having an NRMSE > 30% and/or NBias > 15% (Standing deadwood decay class and Number of sapling species). With the exception of the Simpson index of diversity for native trees, both NRMSE and NBias varied by less than ±4.5 percentage points between leaf-on only, leaf-off only and combined leaf-on/leaf-off models per forest metric. However, whilst model performance was similar between ALS datasets, model composition was often very dissimilar in terms of input variables. RF models using leaf-on data showed a dominance of height variables, whilst leaf-off models had a dominance of width variables, reiterating that leaf-on and leaf-off ALS datasets capture different aspects of the forest and that structure and composition across the full vertical profile are highly inter-connected and therefore can be predicted equally well in different ways. A subset of 17 forest metrics was subsequently used to assess favourable conservation status (FCS), as a measure of forest condition. The most accurate RF models relevant to the 17 FCS indicator metrics were used to predict each forest metric across the field site and thresholds defining favourable conditions were applied.
Binomial logistic regression was implemented to evaluate predictive accuracy relative to the thresholds, with the area under the curve (AUC) varying from 0.73 to 0.98 and exceeding 0.8 for 11 of the 17 metrics. This enabled an index of forest condition (FCS) based on structure, composition, regeneration and deadwood to be mapped across the field site with reasonable certainty. The FCS map closely and consistently corresponded to forest types and stand boundaries, indicating that ALS data offer a feasible approach for forest condition mapping and monitoring to advance forest ecological understanding and improve conservation efforts.


Introduction
The spatial data and analysis requirements for modern environmental monitoring and management are increasingly high [1,2]. Forest locations, in particular, are subject to a number of management pressures and external threats [3][4][5], being exploited as a resource for a range of cultural and economic activities, as well as being important habitats for a variety of organisms and storing substantial amounts of above-ground carbon [6]. All of [...] analysed the relationships between plant species diversity and ALS measurements [25,54,55]. Radiometric information recorded by ALS, commonly referred to as intensity and recorded from an infrared laser, has proven sufficient to classify species in some cases [56]; however, differences in acquisition specifications and sensor design seriously impact performance. At the very least, a general classification between deciduous and coniferous vegetation is possible (e.g., [56,57]). The use of intensity has, however, remained contentious.
ALS return intensity (also referred to as amplitude) presents a number of challenges for its use in any analysis, given the nature of typical sensor design. According to [42], it is not possible to compare any two discrete-return intensity values. The research presented in [58] states that intensity data can vary in performance between sensors when used for species classification. In addition, sensors using different wavelengths can also preclude comparison [59]. Methods for normalising intensity have recently been proposed, and the results seem promising (e.g., [56]). Other studies have combined ALS datasets with multi- or hyper-spectral data, which has allowed the classification of overstorey species [60,61]. Alternatively, general species characteristics have been correlated with ALS-derived metrics, such as mean vegetation return height and the diversity of forest species or land cover (e.g., [62][63][64]).
In general, the area-based statistical models developed for a given dataset are not transferable between acquisitions. For example, in [65] metrics estimated with models calibrated using leaf-on field and ALS data produced erroneous results when applied to leaf-off ALS data, implying there is a difference in the 3D distribution of returns when considering multi-temporal analysis. The combination of datasets from different timepoints (e.g., summer and winter) has the potential to yield additional information. Studies have used multiple acquisitions of airborne ALS for the same site at different times in order to exploit the seasonality of different forest species, to estimate understorey presence [66] or to compare the accuracy of terrain mapping [67]. For the latter study, a greater proportion of the returns were from lower portions of the forested area when assessed with leaf-off data than with leaf-on data, due to the absence or presence of foliage. The research outlined in [68] combined leaf-on and leaf-off datasets and was able to exploit the difference in vegetation structure and return intensity between the acquisitions to classify tree species. The authors of [69] stated that above-ground biomass estimates were similar between leaf-on and leaf-off ALS data, but that stratification by species type improved estimates. In [70], a combination of leaf-on and leaf-off data was shown to model diameter at breast height (DBH) and diversity in crown dimensions more accurately, whilst models derived from leaf-on only data performed the poorest in terms of accuracy. Conversely, [71] stated that differences in canopy conditions manifested in leaf-on or leaf-off datasets have an insignificant impact on estimates of above-ground biomass. The model strength was instead dependent on environmental conditions and the modelling method implemented. 
The authors of [55] estimated 23 forest structural or compositional metrics using leaf-on and/or leaf-off ALS metrics, where only ten of these were best estimated using a combination of both datasets. This implies that there are potentially significant changes in vegetation structure captured by acquisitions from different dates; however, some metrics can be estimated from either.
A more recent development in ALS is full-waveform (FW) recording. This sensor type, as opposed to more conventional discrete-return ALS, provides connected profiles of the three-dimensional scene per pulse, which potentially contain more detailed information about the structure of the illuminated surfaces [72]. Additional processing is required to provide a conventional 3D point cloud. Such a sensor design can potentially return more information from below the canopy [73] and higher return densities than conventional discrete-return ALS, as noted in [74], which is of interest in forest studies [75][76][77]. The research presented in [55] indicates that the analysis of FW ALS data can yield estimates of both compositional and structural metrics, with substantial potential for assessing forests in a variety of applications.
Thus, utilising FW ALS data and a combination of leaf-on and leaf-off acquisitions could conceivably yield the highest potential for estimating forest metrics for forest condition assessment. The goal of the current research is to evaluate the capabilities of FW ALS to provide a range of forest structural and compositional metrics across the full vertical profile and to assess the accuracy of these estimates for forest condition monitoring. This research will assess the potential utility that ALS acquisition can provide in future forest monitoring applications, specifically for forest contexts in southern Britain. The field site comprises both semi-natural and managed plantation forests in close proximity. Twenty-seven field metrics for the semi-natural forest have been defined for this location from previous research, which is outlined in [11,55]. The objectives of the current research were therefore to: (i) estimate the accuracy of these structural and compositional metrics across a range of forest types; (ii) evaluate if there are any benefits to the use of leaf-on, leaf-off or combined ALS acquisitions in the estimation of forest metrics; (iii) compare the relative merit of height, amplitude and width ALS metrics in derived models; and (iv) assess the use of a subset of these metrics for forest favourable conservation status (FCS) monitoring.

Study Site
The study site is located within the New Forest National Park in the south of the UK (50°50′N; 1°30′W). Many land cover types exist in close proximity and are managed for multiple land uses. Much of the forested area is managed plantation, although semi-natural forest also exists [78]. The analysis was focused on a ca. 22 km² area in the proximity of the towns of Brockenhurst, Beaulieu and Lyndhurst. The underlying terrain within this site varied only gently, with typical elevations occurring between 5 m and 45 m above sea level. A number of the plantation forest locations are managed enclosures in order to reduce ungulate browsing. Unenclosed areas are not subject to felling operations and are permanently open to grazing by large ungulates (e.g., ponies and deer). Enclosed areas are typically located in plantation coniferous forest.
The area contains several types of semi-natural and plantation coniferous and deciduous forests (as stated in [79]), which present a wide range of available structural and compositional variation, such as canopy gaps and the presence of deadwood or understorey. Deciduous species include: oaks (Quercus robur and Quercus petraea), common beech (Fagus sylvatica), common alder (Alnus glutinosa), silver birch (Betula pendula), sweet chestnut (Castanea sativa) and holly (Ilex aquifolium). Coniferous species include: Corsican pine (Pinus nigra var. maritima), Scots pine (Pinus sylvestris), Douglas fir (Pseudotsuga menziesii) and Norway spruce (Picea abies). The distribution of the broad habitat types as classified by the Forestry Commission is summarised in Figure 1.

Definition of Field Metrics
A number of field-based indicators of FCS were defined in [11] for the New Forest (Table 1). The authors identified 17 ecological indicators relating to structure and composition, deadwood, tree regeneration and ground vegetation (from: [80][81][82][83]). These indicators and an additional 10 metrics were recorded as part of a later study [55] by a plot-based method (Table 1). A summary of field-measured ranges for each of the 27 indicator metrics is presented in Table S1.

Table 1. List of key ecological factors recorded for each of the field plots installed within the New Forest field site. The acronym DBH stands for diameter at breast height. The indicators marked with an * were used as determinants of 'favourable conservation status' following from [11].

Key Ecological Factor Indicator
Forest canopy structure

Field Data Collection
Using pre-existing forest inventory data (made available from the Forestry Commission and described in [55]), the woodland areas of the study site were split into coniferous, deciduous and mixed woodland compartments and stratified according to forest inventory information. A total of 41 field plots were then located randomly across this stratification to enumerate a range of forest types and canopy conditions. An initial 21 plots were visited in the summer (June to September) of 2010, with the remaining 20 plots visited in a further field campaign in the summer of 2012 (June to October). Each of the field plots was established at a minimum of 10 m away from a stand boundary or any non-forested areas in order to reduce any potential edge effects. Plots in coniferous stands totaled 20, whilst 16 were located in deciduous stands, and 5 plots were located in mixed stands.
We installed 41 north-oriented square plots sized 30 m × 30 m, with a square 10 m × 10 m subplot in the south-west corner. Field plot corners were located using a combination of Leica GPS 500 (Leica Geosystems Ltd, Milton Keynes, UK) and Sokkia 6F total station (Sokkia Topcon Company Ltd, Kanagawa, Japan). Leica Geo-office software (version 8.2) was used for post-processing tasks.
Total horizontal positional error was calculated as ≤0.08 m. A total of 27 metrics (as in Table 1) were recorded in each of the 41 field plots. Plot-level totals and averages were calculated for each field-recorded metric. Within each plot, stems with a diameter at breast height (DBH) greater than 10 cm were considered trees. Where appropriate, DBH was measured at a height of 1.4 m above ground. Measurements per tree included: stem DBH, canopy top height (m) and species (with Scots pine, common alder, oak, beech, silver birch, holly and sweet chestnut considered as native or naturalised species). Vertical height measurements were made using a clinometer. Plot-level basal area was calculated by summing the stem cross-sectional areas, each computed as the area of a circle with a diameter equal to the measured DBH. The number of saplings and their species types were recorded within each field plot. Saplings were defined as tree stems > 1.3 m in height with DBH < 10 cm. The total number of seedlings and their species types were also recorded within the 10 m × 10 m subplot. Seedlings were defined as tree stems < 1.3 m in height. The number of ground vegetation (i.e., vascular plant) species was also recorded within the subplot. In the field, height to the live crown (HTLC) was defined as the distance between the ground and the lowest live branch, and measured using a clinometer. Crown diameter was recorded for the horizontal projection north to south and then east to west. Measurements were made at ground level with a tape held at the edge of the crown being measured. The edges of the crown were defined as the perimeter of the crown that was visible and identifiable from the ground directly below. Crown horizontal area was calculated as the area of an ellipse.
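The two geometric calculations described above can be illustrated with a short sketch. The function names, the conversion of DBH from centimetres and the output units (square metres) are assumptions made for illustration and are not taken from the original field protocol.

```python
import math

def stem_basal_area_m2(dbh_cm):
    """Cross-sectional area (m^2) of one stem, treating the stem as a circle.
    Assumes DBH is supplied in centimetres."""
    radius_m = dbh_cm / 100.0 / 2.0
    return math.pi * radius_m ** 2

def plot_basal_area_m2(dbh_list_cm):
    """Plot-level basal area: the sum of the circular areas of all measured stems."""
    return sum(stem_basal_area_m2(d) for d in dbh_list_cm)

def crown_area_m2(diam_ns_m, diam_ew_m):
    """Horizontal crown area as an ellipse whose axes are the north-south
    and east-west crown diameters."""
    return math.pi * (diam_ns_m / 2.0) * (diam_ew_m / 2.0)

# Hypothetical measurements for two stems and one crown.
print(plot_basal_area_m2([35.0, 22.5]))
print(crown_area_m2(6.0, 4.0))
```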
The Shannon-Wiener index (SH) [84] for all native trees and saplings was calculated as:

$$SH = -\sum_{i=1}^{n} p_i \ln p_i$$

where $p_i$ is the proportion of individuals (plot stem number) belonging to species $i = 1, \ldots, n$, and $n$ is the number of species. The Simpson index [85] was calculated from $x_i$, the relative abundance of the $i$-th species.

Each of the standing deadwood items, or snags, within a field plot was recorded. Snags were defined as standing deadwood > 10 cm DBH [86]. Snag volume was calculated using the formula for determining cylindrical volume using height and DBH measurements. Downed deadwood (DDW) was defined as deadwood logs or branches of at least 10 cm diameter lying on the ground [86]. Measurements for DDW were made in the 10 m × 10 m subplot only. Length and girth around the two ends of the log were recorded. Estimates of DDW volume were determined using the equation for a frustum of a cone. Deadwood decay states for snags and DDW were divided into three decay classes according to the following criteria, as defined in [11]: (i) logs with a low decay state, no surface breakdown, bark still intact, wood structure firm; (ii) logs with a moderate decay state, with some surface breakdown, wood structure weaker but bole mostly sound; and (iii) logs with high decay state, extensive surface breakdown, bark mostly absent, bole with no sound wood present and colonised with vegetation. A size-weighted average decay class score was then calculated at the plot level, and rescaled to a value between 0 and 1.
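As a hedged illustration of the plot-level calculations above, the sketch below computes a Shannon-Wiener index, one common form of the Simpson index, and a frustum volume. The natural logarithm, the Gini-Simpson form (one minus the sum of squared relative abundances) and all function names are assumptions; the source does not state which variant of either index was used.

```python
import math
from collections import Counter

def shannon_wiener(species):
    """Shannon-Wiener index from a list of per-stem species labels.
    Assumes the natural logarithm."""
    counts = Counter(species)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def gini_simpson(species):
    """One common Simpson variant: 1 - sum(x_i^2), x_i = relative abundance.
    The variant is an assumption, not confirmed by the source."""
    counts = Counter(species)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def frustum_volume_m3(length_m, girth_a_m, girth_b_m):
    """Volume of a conical frustum from log length and the two end girths."""
    r1 = girth_a_m / (2.0 * math.pi)
    r2 = girth_b_m / (2.0 * math.pi)
    return math.pi * length_m / 3.0 * (r1 * r1 + r1 * r2 + r2 * r2)

# Hypothetical plot with two equally abundant species.
stems = ["oak", "oak", "beech", "beech"]
print(shannon_wiener(stems), gini_simpson(stems))
```

When the two end girths are equal, the frustum formula reduces to the volume of a cylinder, which is a convenient sanity check.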

Airborne Laser Scanner Datasets
Small-footprint FW ALS data were acquired for the study area in 2010 under leaf-off (8 April) and leaf-on (6 July) conditions. The ALS instrument used was the Leica ALS50-II airborne laser scanner. On both dates, the ALS data were acquired at a flying altitude of 1600 m, with a pulse repetition frequency (PRF) of 147 kHz, a beam divergence of 0.

Airborne Laser Scanner Processing
The FW ALS data (supplied in LAS version 1.3 format) required pre-processing steps in order to derive a conventional point cloud. The Sorted Pulse Data library (SPDlib) (version 1.0.0) [87,88] was used to perform Gaussian decomposition upon each of the returned waveforms to derive a point cloud. Between 1 and 10 returns per pulse were derived using this approach. In addition to elevation, each return had an associated value relating to waveform amplitude (or intensity) and peak-width. Ground elevation returns were classified through a progressive morphological filter within SPDlib as outlined in [89]. Above-ground heights were then calculated by subtracting the elevation of a terrain surface, interpolated from the ground returns (using the nearest neighbour interpolation method), from the elevation of each ALS return. Intensity/amplitude range normalisation was implemented using the method documented in [90].
In order to derive ALS variables from the field plot areas, all ALS returns which lay within field plot boundaries were extracted from the datasets. Nine key variables were generated from the complete vertical distribution of the return height, amplitude and width values: mean, median, minimum, maximum, standard deviation (StDev), variance, absolute deviation (AbsDev), skewness and kurtosis. These metrics were selected for comparison to previous work [55]. Given the potential importance of amplitude and width metrics from returns near ground (<0.2 m; as in [55]), the nine variables for both were also calculated for all returns below 0.2 m height above ground. Percentiles were calculated using all returns from the complete vertical distribution of return heights, plus amplitude and width data, every 5th percentile (excluding the 50th and 100th percentiles as these are the median and maximum, respectively). In addition, dominant height was calculated as the arithmetic mean of heights of returns ≥80th percentile (as in [91]), and canopy cover (CC) was calculated as follows:

$$CC = \frac{R_{>T}}{R_{all}}$$

where $R_{>T}$ denotes the number of returns above a threshold height $T$, and $R_{all}$ is the number of all returns. Here, $T$ was set to 1 m height above ground. These variables were generated for both leaf-on and leaf-off datasets, resulting in 99 metrics for each (198 in total).
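The plot-level variables described above could be derived along the following lines. This is a minimal sketch using only the Python standard library; the population (rather than sample) moment definitions, the mean absolute deviation about the mean, the linear-interpolation percentile and all names are assumptions, since the source does not specify these details.

```python
import statistics

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0-100) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def distribution_metrics(values):
    """Nine summary variables plus every 5th percentile (50th excluded)."""
    vals = sorted(values)
    n = len(vals)
    mean = statistics.fmean(vals)
    var = statistics.pvariance(vals, mu=mean)
    sd = var ** 0.5
    m3 = sum((v - mean) ** 3 for v in vals) / n
    m4 = sum((v - mean) ** 4 for v in vals) / n
    out = {
        "mean": mean,
        "median": statistics.median(vals),
        "min": vals[0],
        "max": vals[-1],
        "stdev": sd,
        "variance": var,
        "absdev": sum(abs(v - mean) for v in vals) / n,
        "skewness": m3 / sd ** 3 if sd > 0 else 0.0,
        "kurtosis": m4 / sd ** 4 if sd > 0 else 0.0,
    }
    for q in range(5, 100, 5):
        if q != 50:  # the 50th percentile is already stored as the median
            out[f"p{q:02d}"] = percentile(vals, q)
    return out

def dominant_height(heights):
    """Mean height of returns at or above the 80th height percentile."""
    vals = sorted(heights)
    p80 = percentile(vals, 80)
    return statistics.fmean([h for h in vals if h >= p80])

def canopy_cover(heights, threshold_m=1.0):
    """Fraction of returns above the height threshold T (1 m in the text)."""
    return sum(1 for h in heights if h > threshold_m) / len(heights)

# Hypothetical above-ground return heights (m) for one plot.
heights = [0.0, 0.5, 2.0, 10.0, 20.0]
print(canopy_cover(heights), dominant_height(heights))
```

The same `distribution_metrics` function would be applied separately to the height, amplitude and width values of the returns in a plot.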

Statistical Analysis
A random forest (RF) modelling approach was selected in order to limit the potential issues caused by high correlations between many of the ALS metrics [92,93]. In addition, linear regression methods can have difficulty in structurally complex forest when compared to machine learning approaches [94].
The RF approach [100] is a widely used machine learning technique using an ensemble method to perform classification or regression tasks (e.g., [101][102][103]). Multiple RF models were developed for each of the 27 indicator metrics. This was carried out for each ALS dataset: (i) leaf-on only; (ii) leaf-off only; and (iii) a combination of leaf-on and leaf-off (i.e., 27 models per dataset). Analyses were performed for all 41 field plots for each of the three datasets irrespective of dominant tree species type (i.e., deciduous, coniferous or mixed). No dedicated validation dataset was used here. A lower number of variables relative to the number of observations will reduce the likelihood of overfitting; thus, the Boruta R package was used as a means to automatically select important variables. This package is a wrapper built around the RF algorithm [95]. Briefly, the approach duplicates and randomly reorders the training dataset and implements an RF classifier. This approach was repeated 500 times, and the predictor accuracy/impurity was assessed through the calculation of a Z-score. At each iteration, a variable is selected as important if it outperforms its duplicate. At the end of this process, the seven most important predictor variables (if available) were used as input into the RF model.
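The shuffled-duplicate idea behind this kind of variable selection can be sketched in a few lines. The example below is not the Boruta package and does not use a random forest: as a deliberate simplification, it scores each predictor by its absolute Pearson correlation with the target, compares it against the best-scoring shuffled copy at each iteration, and keeps predictors that win in the majority of iterations. The function names, the majority rule and the importance measure are all assumptions made for illustration.

```python
import random
import statistics

def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def shadow_select(features, target, n_iter=500, keep=7, seed=0):
    """Keep up to `keep` predictors that beat the best shuffled copy
    in more than half of the iterations (a simplified decision rule)."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shuffled copies have no real relationship with the target,
        # so their best score acts as a noise benchmark.
        shadow_best = 0.0
        for values in features.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadow_best = max(shadow_best, abs_corr(shuffled, target))
        for name, values in features.items():
            if abs_corr(values, target) > shadow_best:
                hits[name] += 1
    confirmed = [name for name in features if hits[name] > n_iter / 2]
    confirmed.sort(key=lambda name: hits[name], reverse=True)
    return confirmed[:keep]

# Hypothetical data: one informative predictor and one uninformative one.
target = [float(i) for i in range(41)]
features = {
    "informative": [2.0 * t + 1.0 for t in target],
    "alternating": [float(i % 2) for i in range(41)],
}
print(shadow_select(features, target, n_iter=50))
```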
Tuning and construction of the RF model was implemented using the caret R package. A 10-fold cross-validation control was used in all permutations here. Tuning consisted of three elements evaluated in sequence: (i) mtry: the number of variables randomly sampled as candidates at each branch split; (ii) maxnodes: the maximum number of terminal nodes (limiting the potential RF tree size); and (iii) ntree: the number of trees to grow. The ranges of values tested as part of the tuning process were: 1-20; 3-20; and 250-2000, respectively. The optimal value for each of the three parameters was taken to be the one which produced the lowest root mean square error (RMSE) and mean absolute error (MAE), in addition to the highest adjusted coefficient of determination (R²). Once the three tuning parameters had been determined, the RF model was then run using the predictor variable(s) identified through the variable selection procedure.
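The structure of this procedure, a 10-fold split of the 41 plots followed by tuning one parameter at a time, can be sketched as follows. This is not the caret implementation: the single `score` function to be minimised stands in for the combined RMSE, MAE and adjusted R² criteria, the step of 250 for the number of trees is assumed, and the toy error function exists only to make the example runnable.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Return k (train, test) index pairs; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]

def tune_in_sequence(score, grids, start):
    """Tune parameters one at a time, in the order given by `grids`.
    `score` maps a parameter dict to an error value to be minimised."""
    best = dict(start)
    for name, candidates in grids.items():
        best[name] = min(candidates, key=lambda v: score({**best, name: v}))
    return best

# Candidate ranges from the text; the ntree step size is an assumption.
grids = {
    "mtry": range(1, 21),
    "maxnodes": range(3, 21),
    "ntree": range(250, 2001, 250),
}

def toy_error(params):
    """Hypothetical stand-in for a cross-validated error."""
    return (params["mtry"] - 3) ** 2 + (params["maxnodes"] - 10) ** 2 + abs(params["ntree"] - 500)

splits = kfold_indices(41, k=10)
print(len(splits), [len(test) for _, test in splits])
print(tune_in_sequence(toy_error, grids, {"mtry": 1, "maxnodes": 3, "ntree": 250}))
```

In practice, `score` would fit a model on each training fold and average the error over the held-out folds.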
The predicted values from the statistical models were evaluated by calculating the adjusted coefficient of determination (adj. R²), absolute and normalised root mean square error (RMSE and NRMSE, respectively) and absolute and normalised bias (Bias and NBias). These statistics were computed from the following quantities: y is the predicted value, x is the observed field-measured value for the individual plot i, n is the number of samples, x̄ is the mean of the field-measured values and k is the number of predictor variables in the model. Here, we define acceptable model accuracy in terms of NRMSE, where: <10% indicates a high correspondence between predicted and observed, 10-20% denotes a small difference, 20-30% is a moderate difference and an NRMSE ≥ 30% represents a large and unacceptable difference [104,105]. Likewise, an NBias of ≥15% was considered unacceptable model accuracy [103].
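For reference, the accuracy statistics named above can be computed as in the sketch below. The conventional definitions are assumed: residuals taken as predicted minus observed, normalisation by the mean of the observed values, and the usual adjustment of R² for the number of predictors. The source does not confirm these exact forms, and the function names are illustrative.

```python
import math

def accuracy_stats(predicted, observed, k):
    """RMSE, NRMSE (%), Bias, NBias (%) and adjusted R^2 for k predictors."""
    n = len(observed)
    mean_obs = sum(observed) / n
    resid = [y - x for y, x in zip(predicted, observed)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((x - mean_obs) ** 2 for x in observed)
    rmse = math.sqrt(ss_res / n)
    bias = sum(resid) / n
    r2 = 1.0 - ss_res / ss_tot
    return {
        "rmse": rmse,
        "nrmse": 100.0 * rmse / mean_obs,   # assumed: normalised by the observed mean
        "bias": bias,
        "nbias": 100.0 * bias / mean_obs,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1),
    }

def nrmse_band(nrmse_pct):
    """Accuracy bands for NRMSE as described in the text."""
    if nrmse_pct < 10:
        return "high correspondence"
    if nrmse_pct < 20:
        return "small difference"
    if nrmse_pct < 30:
        return "moderate difference"
    return "large difference (unacceptable)"

def is_acceptable(stats):
    """Acceptable if NRMSE < 30% and the magnitude of NBias < 15%."""
    return stats["nrmse"] < 30 and abs(stats["nbias"]) < 15

# Hypothetical predictions for four plots from a one-predictor model.
stats = accuracy_stats([12.0, 18.0, 32.0, 38.0], [10.0, 20.0, 30.0, 40.0], k=1)
print(stats, nrmse_band(stats["nrmse"]), is_acceptable(stats))
```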

Mapping Forest Favourable Conservation Status (FCS)
Favourable conservation status for each of the 41 field plots was calculated with regard to the threshold values given in [11], and accounting for scale (see Table S2). The most accurate RF models relevant to the 17 favourable status indicator metrics were then used to predict each field metric per plot, which was then assessed relative to the threshold values. The predictions were assigned a value of 0 or 1, depending on whether the estimate was under or over the threshold value for favourable status, giving a potential range of 0-17, where a maximum score means a high FCS status.
In order to evaluate the ability of the RF models to estimate the metrics to a suitable accuracy, specifically to an accuracy level which would allow us to estimate if an area has a metric above or below the threshold value defined in Table S2, a binomial logistic regression was implemented on the results. Both field plot data and predictions from the RF models were assigned a 0 or 1 depending on threshold value. These values were entered into a generalised linear model binomial logistic regression, in order to evaluate the ability of the RF models to predict a dichotomous dependent variable. The proportion of true positives (sensitivity) and true negatives (specificity) was calculated. As a more general metric to evaluate how well the logistic regression model does at classifying the data, we calculated the area under the curve (AUC) using the pROC package [99]. AUC values can vary from 0.5 (which suggests no relationship) to 1.0 (which suggests observed and predicted values are identical), where higher values are considered better. Here, we classify a value of 0.7 or above as acceptable [106].
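A simplified view of this evaluation is sketched below. Rather than fitting a logistic regression and calling pROC, it computes sensitivity, specificity and a rank-based AUC directly from the 0/1 values. With a single binary predictor that is positively associated with the outcome, the fitted probabilities take only two values in the same order as the 0/1 predictions, so the two approaches should give the same AUC, which then equals the mean of sensitivity and specificity. The function names and data are hypothetical.

```python
def confusion_rates(observed, predicted):
    """Sensitivity and specificity from paired 0/1 observed and predicted values."""
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(observed, scores):
    """Rank-based AUC: probability that a positive case scores higher than
    a negative case, counting ties as one half."""
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical threshold outcomes for eight plots (1 = above threshold).
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = confusion_rates(observed, predicted)
print(sens, spec, auc(observed, predicted))
```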
Subsequently, each of the 17 indicator metrics was estimated for 30 × 30 m cells across the study site for the entire area of ALS data acquisition, with each assessed per cell against the relevant FCS threshold. Each cell per indicator metric was assigned a value of 0 or 1, depending on whether the estimate was under or over the threshold value. These values were summed giving an index between 0 and 17 per cell.
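The per-cell scoring can be illustrated with a short sketch. The indicator names, threshold values and the flag recording whether a value at or above the threshold counts as favourable are hypothetical; the actual thresholds are those given in Table S2.

```python
def fcs_index(estimates, thresholds):
    """Count the indicators whose estimate meets its favourable threshold.
    `thresholds` maps indicator name -> (limit, favourable_if_above)."""
    score = 0
    for name, (limit, favourable_if_above) in thresholds.items():
        value = estimates[name]
        met = value >= limit if favourable_if_above else value <= limit
        score += int(met)
    return score

# Hypothetical thresholds for three indicators (not the values in Table S2).
thresholds = {
    "n_tree_species": (3, True),
    "bare_ground_pct": (10.0, False),
    "snag_volume_m3": (1.5, True),
}

# Hypothetical RF predictions for two 30 m x 30 m cells.
cells = [
    {"n_tree_species": 4, "bare_ground_pct": 5.0, "snag_volume_m3": 0.2},
    {"n_tree_species": 2, "bare_ground_pct": 25.0, "snag_volume_m3": 3.0},
]
fcs_map = [fcs_index(cell, thresholds) for cell in cells]
print(fcs_map)
```

With all 17 indicators supplied, the same function returns the index between 0 and 17 per cell.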

Summary of Models Using Metrics from the Leaf-On Acquisition Only
Model tuning for all of the final RF models produced from metrics generated from the leaf-on ALS acquisition is summarised for reference in Table S3. The predictor variables used in each of the 27 RF models are summarised in Table S4.
The final RF models each used between one and seven input ALS variables. In total across the 27 models, height variables were included 55 times, width variables 43 times and amplitude variables 26 times. The most commonly occurring ALS variables were AbsDev of return height (six models), Variance of return height (seven models) and StDev of return height (eight models). Across the 27 RF models, height variables were used in 22 models, width variables in 16 models and amplitude variables in 13 models. Only four models exclusively used height predictors (Shannon-Wiener index for native trees, Downed deadwood decay class, Number of seedlings and Number of ground vegetation species), three models used only amplitude predictors (Simpson index of diversity, Number of saplings and Standing deadwood decay class) and one used width only (Number of sapling species). There were nine models that combined height and width predictors, four that combined height and amplitude predictors and only one model that combined amplitude with width predictors. There were five models which combined height, width and amplitude predictor variables.
The RF models created for each of the 27 forest metrics were assessed against field data, summarised in Table 2. Across all models, the adj. R² ranged from 0.59 to 0.82. In terms of NRMSE, eight models were below 15%, 18 were between 15 and 30% and one was above 30% (Standing deadwood decay class), here implying an unsuitable model. The majority of models had an NBias of below 15%, while three did not (Standing deadwood decay class, Simpson index of diversity and Number of sapling species). Thus, in total three models of the 27 were considered of unacceptable accuracy.

Table 2. Summary statistics of each random forest model using metrics from the leaf-on acquisition, with absolute and normalised root mean square error (RMSE and NRMSE); absolute bias and normalised bias (NBias); and the adjusted coefficient of determination (Adj. R²).

Summary of Models Using Metrics from the Leaf-Off Acquisition Only
Final RF model tuning parameters for the leaf-off ALS acquisition are provided in Table S5. Predictor variables used in each model are summarised in Table S6.
The final RF models created using leaf-off ALS data also had between one and seven input variables. In contrast with the leaf-on models, there was a greater prevalence of width variables. Thus, across the 27 models, width variables were included 54 times, and both height and amplitude variables 36 times. The most commonly occurring ALS variables were Kurtosis of return width (six models), Variance of return width and StDev of return width (seven models) and AbsDev of return width (eight models). Height, width and amplitude predictors were included in 14, 17 and 14 models, respectively, and used exclusively as input for one (Simpson index of diversity), six (Standard deviation of tree diameter, Number of native saplings, Standing deadwood decay class, Number of native seedlings, Mean height to the living crown and Mean crown horizontal area) and three (Total crown horizontal area, Number of sapling species, Percentage bare ground cover) of the RF models, respectively. A combination of height and width variables formed the inputs to three models, height and amplitude for six models and amplitude and width for four models. All three predictor variable types were included in four models.
Comparisons of RF model predictions against field data (Table 3) produced models with adj. R² which ranged from 0.59 to 0.89. In terms of NRMSE values, eight models were below 15%, 17 were between 15 and 30% and two were above 30% (Standing deadwood decay class and Percentage bare ground cover). The majority of models (25 of 27) had an NBias of less than 15%, the exceptions being Standing deadwood decay class and Number of sapling species. Thus, as with the leaf-on only dataset, three of the 27 RF models were of unacceptable accuracy, two of which were for the same forest metrics as in the leaf-on case (Standing deadwood decay class and Number of sapling species).

Table 3. Summary statistics of each random forest model using metrics from the leaf-off acquisition, with absolute and normalised root mean square error (RMSE and NRMSE); absolute bias and normalised bias (NBias); and the adjusted coefficient of determination (Adj. R²).

Summary of Models Using Metrics from Both the Leaf-On and Leaf-Off Acquisition
Final RF model tuning parameters for the combination of leaf-on and leaf-off ALS acquisition are provided for reference in Table S7. Predictor variables used in each model are summarised in Table S8.
As with both the separate leaf-on and leaf-off models, the combined leaf-on/leaf-off RF models each had between one and seven input ALS variables. In total across the 27 models, width variables were included 55 times, height variables 45 times and amplitude variables 32 times, of which 50 were from leaf-on data and 82 from leaf-off data. The most common input variables (regardless of leaf-on or leaf-off) were: Maximum of return height, Variance of return height, AbsDev of return height and StDev of return width (6 models), StDev of return height and Kurtosis of return width (7 models), Variance of return width (8 models), and AbsDev of return width (10 models). Height, width or amplitude ALS variables were present in 16, 18 and 14 models, respectively. Those models using exclusively height, width or amplitude variables constituted four (Mean tree height, Number of tree species, Shannon-Wiener index for native trees, Number of ground vegetation species), seven (Standard deviation of tree diameters, Mean height to living crown, Mean crown horizontal area, Number of seedlings, Number of native seedlings, Number of sapling species, Downed deadwood decay class) and two (Total crown horizontal area and Number of saplings) RF models, respectively. Models combining height and width, height and amplitude or amplitude and width variables accounted for two, three and two models, respectively. Models produced using all three predictor variable types accounted for seven models.
Predictor variables derived from leaf-on acquisition were present in 20 models, whilst leaf-off variables were present in 23 models. Both leaf-on and leaf-off variables were present together in 16 RF models, whilst four models were derived using leaf-on only variables and seven using leaf-off only variables. For the four models containing only leaf-on ALS data, the input variables were either exactly the same as (Mean tree spacing, Number of saplings), a subset of (Percentage bare ground) or a highly correlated alternative to (Number of sapling species) the input variables used in the leaf-on only models. By contrast, for the seven models containing only leaf-off ALS variables, four of the models had the same or a subset of the input variables used in the leaf-off only models (Standard deviation of tree diameter, Mean height to the living crown, Number of native seedlings and Number of seedling species), whilst the models for Number of tree species, Downed deadwood decay class and Shannon-Wiener index for native trees had very different input variables to their leaf-off only counterparts. The selection of different predictors may be explained by the high correlation among many of the input variables, the large number of predictor variables available and the random subsetting of predictors in the Boruta package over a finite number of iterations (500). It is conceivable that the predictors indicated in Table S6 would be selected again given enough Boruta iterations. Each of the predictions from these models is illustrated in Figure S1.
Comparisons of RF model predictions against field data (Table 4) produced models with adj. R² values ranging from 0.56 to 0.89. In terms of NRMSE, nine models were below 15%, 17 were between 15 and 30% and only Standing deadwood decay class was above 30%, implying an unsuitable model. The majority of models (25 of 27) had an NBias of less than 15%, the exceptions being Standing deadwood decay class and Number of sapling species.
Table 4. Summary statistics of each random forest model using metrics from the leaf-on and leaf-off acquisition, with absolute and normalised root mean square error (RMSE and NRMSE); absolute bias and normalised bias (NBias); and the adjusted coefficient of determination (Adj. R²).
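For readers who want to reproduce statistics of the kind summarised in Table 4, the following is a minimal sketch of how RMSE, NRMSE, bias, NBias and adjusted R² could be computed from plot-level observations and predictions. Several choices here are assumptions, since this section does not define them: normalisation by the mean of the field values, bias taken as predicted minus observed, and R² computed as 1 - SS_res/SS_tot. The function name and the example values are hypothetical.

```python
import math


def model_error_summary(observed, predicted, n_predictors):
    """Error statistics for one model (assumed definitions, see lead-in)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    rmse = math.sqrt(ss_res / n)
    bias = sum(residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 penalises the number of predictor variables in the model.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

    return {
        "rmse": rmse,
        "nrmse_pct": 100.0 * rmse / mean_obs,   # assumption: normalised by mean
        "bias": bias,
        "nbias_pct": 100.0 * bias / mean_obs,   # signed; compare abs() to 15%
        "adj_r2": adj_r2,
    }


# Hypothetical plot values, for illustration only.
field = [10.0, 20.0, 30.0, 40.0]
model = [12.0, 18.0, 33.0, 41.0]
summary = model_error_summary(field, model, n_predictors=1)
print(summary)
```

Under these definitions, a model would be flagged as unsuitable when `nrmse_pct` exceeds 30 or the absolute value of `nbias_pct` exceeds 15, matching the thresholds used in the text.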

Best Overall Models
Comparing the leaf-on only and leaf-off only RF models, based on NRMSE and NBias, seven models were better using leaf-on data, 11 were better using leaf-off data and nine models were inconsistent between the two ALS datasets. However, in only two models (Number of sapling species and Volume of standing deadwood) did both the NRMSE and NBias differ by >1 percentage point between the two datasets. Combining leaf-on and leaf-off data improved ten leaf-on only models and nine leaf-off only models, but only in three cases did both NRMSE and NBias improve by >1 percentage point (Volume of standing deadwood for the leaf-on model and the Simpson index of diversity for both leaf-on and leaf-off models). Therefore, overall, with the exception of the Simpson index of diversity, which was modelled with a very high NBias using leaf-on only data (22.38%), there was little or no consistent improvement between the leaf-on only, leaf-off only and combined datasets in RF model output quality as assessed by NRMSE and NBias.
The most accurate models overall for each of the 27 forest metrics, in terms of minimising NRMSE and NBias, and maximising adj. R² (summarised in Table 5 for the 27 field metrics), were derived from across all ALS datasets. Models derived from leaf-on only ALS data (Section 3.1) constitute 8 of the best models, those derived from the leaf-off data (Section 3.2) constitute 12 of the best models and those derived from a combination of leaf-on and leaf-off ALS data (Section 3.3) account for 7 models. Overall, slightly lower NRMSE and NBias values were achieved through this process of best model selection. For example, the difference in NRMSE between the leaf-on, leaf-off and combined leaf-on/leaf-off models per forest metric varies between only 0.02 and 4.02 percentage points. The average NRMSE across the 27 models was as follows: 19.57% for leaf-on, 19.28% for leaf-off, 19.06% for combined leaf-on/leaf-off and 18.36% for the best set of models from across all datasets. In terms of NRMSE for the best models, 26 of the models had a value below 30% (nine < 15%), with the exception being Standing deadwood decay class. For NBias, 25 of 27 models were <15%, with the exceptions being Standing deadwood decay class and Number of sapling species. These were therefore the only two RF models considered to be of unacceptable quality.
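The per-metric selection of a best model can be sketched as a simple ranking over the three candidate datasets. How the three criteria were weighed against each other is not specified in the text, so the ordering used here (lowest NRMSE first, then lowest absolute NBias, then highest adj. R²) is an assumption, and all metric names and numbers below are hypothetical placeholders rather than values from Table 5.

```python
# Hypothetical candidate statistics per forest metric and ALS dataset.
candidates = {
    "Hypothetical metric A": {
        "leaf-on": {"nrmse": 18.0, "nbias": 3.0, "adj_r2": 0.80},
        "leaf-off": {"nrmse": 17.5, "nbias": 2.0, "adj_r2": 0.82},
        "combined": {"nrmse": 17.5, "nbias": 2.5, "adj_r2": 0.83},
    },
    "Hypothetical metric B": {
        "leaf-on": {"nrmse": 12.0, "nbias": -1.0, "adj_r2": 0.88},
        "leaf-off": {"nrmse": 14.0, "nbias": 0.5, "adj_r2": 0.85},
        "combined": {"nrmse": 12.0, "nbias": 1.5, "adj_r2": 0.89},
    },
}


def pick_best(models):
    """Return the dataset name with the best statistics (assumed ordering)."""
    return min(
        models,
        key=lambda name: (
            models[name]["nrmse"],        # 1st: smallest NRMSE
            abs(models[name]["nbias"]),   # 2nd: smallest absolute NBias
            -models[name]["adj_r2"],      # 3rd: largest adjusted R2
        ),
    )


best = {metric: pick_best(models) for metric, models in candidates.items()}
mean_best_nrmse = sum(
    candidates[metric][name]["nrmse"] for metric, name in best.items()
) / len(best)

print(best)
print(mean_best_nrmse)
```

Because the best set takes the minimum NRMSE per metric, its average NRMSE cannot exceed that of any single dataset, which is consistent with the 18.36% reported for the best set against 19.06-19.57% for the individual datasets.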

Assessment of Accuracy with Regard to Indicators of 'Favourable Conservation Status'
Favourable status values for the 41 field plots ranged from 3 to 11, where lower values on average were observed for plots located in coniferous stands (mean 6.8; standard error 0.4; n = 20), and higher values for sites in deciduous dominated stands (mean 8.6; standard error 0.5; n = 16) or mixed stands (mean 8.2; standard error 0.7; n = 5). The results of a t-test indicated that the FCS index means from coniferous and deciduous plots were significantly different (p < 0.05).
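The reported significance can be checked approximately from the summary statistics alone. The sketch below assumes a Welch (unequal variance) t-test, which may not be the exact variant used in the study, and works from the rounded means and standard errors quoted above, so the result is indicative only. The function name is illustrative.

```python
import math


def welch_t_from_summary(mean_a, se_a, n_a, mean_b, se_b, n_b):
    """Welch t statistic and degrees of freedom from means and standard errors."""
    var_a = se_a ** 2  # squared standard error of mean a
    var_b = se_b ** 2
    t_stat = (mean_b - mean_a) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t_stat, df


# Coniferous plots: mean 6.8, SE 0.4, n = 20; deciduous plots: mean 8.6, SE 0.5, n = 16.
t_stat, df = welch_t_from_summary(6.8, 0.4, 20, 8.6, 0.5, 16)
print(round(t_stat, 2), round(df, 1))  # 2.81 30.5
```

With roughly 30 degrees of freedom, the two-tailed 5% critical value of t is about 2.04, so a statistic near 2.8 is consistent with the reported p < 0.05.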
The most accurate RF models (as in Table 5) relevant to the 17 FCS indicator metrics were used to predict each field metric, and the predictions were then assessed relative to the threshold values. Binomial logistic regression was implemented, and the results are summarised in Table 6. Overall accuracy of the logistic regression per indicator metric varied between 51 and 100 percent correct (11 of 17 were >80% correct). The lowest accuracy (<65%) was observed for the Shannon-Wiener index for native trees and Volume of downed deadwood. All AUC values were above 0.7 (range 0.71 to 0.98), implying acceptable discrimination for predicted values. The AUC model for the Shannon-Wiener index for native seedlings was, however, rank-deficient and potentially misleading, implying that one or more of the predictor variables were not linearly independent. None of the field measurements for Number of seedlings or Number of native seedlings exceeded the threshold value, so these were always 0, preventing AUC calculation.
Table 6. Summary of logistic regression applied to the estimates from models created from leaf-on, leaf-off or combined models for the 17 indicator metrics identified in Table 1 (and Table S2). Predicted values were assessed against these indicator metric targets and assigned a 1 if they exceeded this, or a 0 if they did not. These values were then assessed against the equivalent field value. (# denotes that the area under the curve (AUC) model was rank-deficient and therefore the fit may be misleading; - denotes that no model could be run because all predicted values were below the FCS threshold and were therefore zero).

Table 6 columns: Variables; Overall Correct (%); Sensitivity (%); Specificity (%); AUC (0-1).
The map presented in Figure 2 provides the index of favourable conservation status derived from ALS models for the New Forest study site. Generally, high index values are observed for areas containing native beech-oak deciduous forest, whereas low values are consistent with areas of coniferous plantation forest.
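The threshold assessment summarised in Table 6 can be illustrated with a simplified sketch. It binarises field and predicted values against an indicator threshold (1 if the value exceeds it, 0 otherwise) and reports overall correct, sensitivity and specificity. The AUC here is computed directly from the ranking of predicted values (the Mann-Whitney formulation) rather than from a fitted binomial logistic regression, which is a swapped-in simplification. Function names, the threshold and the example values are hypothetical.

```python
def threshold_agreement(field_values, predicted_values, threshold):
    """Compare field and predicted values after binarising at the threshold."""
    field_flags = [1 if v > threshold else 0 for v in field_values]
    pred_flags = [1 if v > threshold else 0 for v in predicted_values]
    pairs = list(zip(field_flags, pred_flags))
    tp = sum(1 for f, p in pairs if f == 1 and p == 1)
    tn = sum(1 for f, p in pairs if f == 0 and p == 0)
    fp = sum(1 for f, p in pairs if f == 0 and p == 1)
    fn = sum(1 for f, p in pairs if f == 1 and p == 0)
    return {
        "overall_correct_pct": 100.0 * (tp + tn) / len(pairs),
        # None when a class is absent, so the rate is undefined.
        "sensitivity_pct": 100.0 * tp / (tp + fn) if (tp + fn) else None,
        "specificity_pct": 100.0 * tn / (tn + fp) if (tn + fp) else None,
    }


def rank_auc(field_values, predicted_values, threshold):
    """AUC as the probability that a favourable plot outranks an unfavourable one."""
    pos = [p for f, p in zip(field_values, predicted_values) if f > threshold]
    neg = [p for f, p in zip(field_values, predicted_values) if f <= threshold]
    if not pos or not neg:
        return None  # only one class present: AUC cannot be calculated
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical indicator with a threshold of 6, for illustration only.
field = [1.0, 5.0, 8.0, 12.0]
predicted = [2.0, 7.0, 6.0, 11.0]
agreement = threshold_agreement(field, predicted, threshold=6.0)
auc = rank_auc(field, predicted, threshold=6.0)
print(agreement, auc)
```

When no field value exceeds the threshold, as reported for Number of seedlings and Number of native seedlings, `rank_auc` returns `None`, mirroring the entries for which no AUC could be calculated.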

RF Model Differences between Leaf-On and Leaf-Off ALS Acquisitions
The timing of ALS acquisition in the current research, with leaf-off data (from April, prior to leaf flush) and leaf-on data (from July, in mid-summer), is generally expected to produce different ALS return distributions, specifically related to canopy penetration rates (e.g., [67,107,108]). The analysis of both ALS acquisitions for field metric estimates by random forest modelling, when assessed individually, produced similar results in terms of NRMSE and NBias. With the exception of the Simpson index of diversity for native trees (which was modelled with a notably high NBias in leaf-on only data), both NRMSE and NBias varied by less than ±4.5 percentage points between leaf-on only and leaf-off only models (similarly to [109]). However, whilst these model differences were low, 11 of 27 metrics were more accurately predicted from the leaf-off data, with most of these field metrics related to the number, spacing and variation of trees (in terms of species composition and size) or below-canopy features such as seedlings, deadwood and height to the live crown. Only seven leaf-on models were better than leaf-off models in terms of NRMSE and NBias, and these mostly related to tree size and crown area, saplings and bare ground.
When the two ALS acquisitions were combined into RF models, there was relatively little gain in accuracy per metric. In total, only in the case of Number of sapling species, Volume of standing deadwood and the Simpson index of diversity was there a difference of >1 percentage point in both NRMSE and NBias when comparing leaf-on only, leaf-off only and/or combined leaf-on/leaf-off RF models. For the other 24 metrics, the difference between models in terms of NRMSE and NBias was negligible or inconsistent. However, whilst the models had similar accuracy, they differed considerably in their input ALS variables (see Section 4.2). When the best models were selected to minimise NRMSE and NBias, and maximise the adj. R², eight were leaf-on only models, 12 leaf-off only models and seven were combined leaf-on/leaf-off models. In the best models, metrics related to tree size, crown area, saplings or bare ground used leaf-on only data, whilst metrics related to tree number and spacing, tree, seedling and ground vegetation species composition and diversity and deadwood were dominated by leaf-off or combined leaf-on/leaf-off ALS variables.
Overall, our NRMSE results were similar to the accuracies reported in the surrounding literature (Table 7). The most direct comparison is from an earlier study [55] using the same field and ALS data and an area-based linear multiple regression approach (using Akaike information criterion analysis) to estimate 23 metrics. With the exception of the estimates of the Volume of standing deadwood and Downed deadwood decay class, which had a lower RMSE in [55], all other RF model estimates have a lower RMSE in the current study, typically over 50% lower than their linear model derived counterpart. All metrics have a better or similar R² and/or RMSE than other published studies, with the exception of Mean tree height [110] and Shannon-Wiener index for native trees [61]. It should be noted, however, that Standing deadwood decay class was modelled in the current study with an NRMSE > 30%, which could be considered unreliably high.
Table 7. Statistics (root mean square error (RMSE), normalised RMSE (%), coefficient of determination (R²) and standard error of the regressions (SER%)) from the literature for the same metrics used in the current study which were estimated using airborne laser scanning.

Table 7 columns: Variables; Overall Accuracy; References.

Exploring the Predictors Used in the Models
The RF model individual predictors were not consistent when estimating the same field metric using leaf-on or leaf-off data. The majority of RF models derived from leaf-on data (22 of 27) included predictor variables summarising the distribution of return heights, whereas a lower proportion of models derived from leaf-off data (14 of 27) incorporated height metrics. When considering the models produced from a combination of leaf-on and leaf-off acquisitions, the proportion of models which incorporated height predictors was more similar to leaf-off models (16 of 27). Width variables were present in 16-18 models and amplitude variables were present in 13-14 models across the leaf-on, leaf-off and combined leaf-on/leaf-off datasets. Given that the accuracy of the models was similar across the leaf-on, leaf-off and combined datasets (with the exception of the Simpson index of diversity), this implies that amplitude and width ALS variables from leaf-off data (often from a lower depth within the canopy) can provide equivalent or possibly better predictive ability than height variables from leaf-on data.
Using the best selected RF model (Table 5), of the 10 metrics defined as being related to forest canopy structure (see Table 1), five used leaf-on only predictors, three used leaf-off only predictors and two used a combination of leaf-on and leaf-off predictors. These RF models showed a dominance of return height and width predictors, each accounting for 26 of the 65 ALS variables across all 10 models. The majority of height predictor variables related to measures of variance (i.e., standard or absolute deviation, variance, skewness) or upper canopy height (i.e., ≥70th percentile, maximum or dominant height), with the exception of Basal area which had lower canopy height variables (i.e., ≤30th percentile and minimum non-ground height). The relationships between forest structure and height predictor ALS variables are widely documented (e.g., [115,123]). The RF models for both Standard deviation of tree diameters and Mean height to the live crown were composed of width metrics only (mostly relating to variance or mean). The authors of [124] defined echo-width as providing information on the range distribution of scattering surfaces within the laser footprint that contribute to a return echo, and as an indicator of surface roughness and ground slope. Authors such as [125] have demonstrated statistically significant relationships between roughness parameters and forest structure, which implies that echo-width predictors may be linked to the size and distribution of tree canopy components encountered in the current study. Many of the amplitude predictors were for the mid-to-upper percentiles or measures relating to total or variance, which is similar to that observed in [126], whereby amplitude predictors combined with height predictors were strongly related to forest biomass. Other research has stated that the distribution of amplitude values in a forest is related to the presence and spatial arrangement of foliage [68,127].
The best RF model for the estimation of Total crown horizontal area had amplitude variables as the three most important predictors, highlighting a relationship between canopy biomass and ALS return amplitude.
Concerning the estimates of the four forest metrics related to forest canopy composition, the best RF models were dominated by height-related ALS variables (13 of a total of 22 input variables). Unlike the forest structure models, these tended to be more from the mid-profile (i.e., 40th-55th percentile) in addition to measures of variance, implying a link between structural diversity and overstorey composition. The model for the Simpson index of diversity only incorporated height metrics. Native species were typically deciduous species, characterised as being far more structurally complex (e.g., [128]), implying that the RF model is picking up on canopy height variation and understorey. The width and amplitude variables mostly related to ≥75th percentile or variance measures, indicating that species composition influences the spatial arrangement and roughness of the canopy.
The best RF models for regenerating saplings and seedlings were dominated by width input variables (17 out of a total of 32 variables used in the seven models), and these were spread across the full range (i.e., 5th-100th percentile) as well as measures relating to variance. The best RF models for the number of native saplings and seedlings were both created from only width variables. Overall, the number of regenerating saplings and seedlings encountered in the current study was low, and the majority of these were encountered in deciduous dominated plots. Leaf-off ALS data were thus the primary source of useful information (even with combined leaf-on/leaf-off models). The height variables used in the best RF models relate to mid-profile (i.e., 40th percentile) or to either maximum or variance. The best RF model for Number of seedlings contained only height predictors (related to top-canopy or variance). This highlights that seedling and sapling regeneration is linked to the surrounding forest structure. The Number of seedling species had amplitude as five of the seven input variables. Amplitude metrics are potentially related to living and dead biomass, bark and coniferous needles [122,129].
All RF models for the estimation of deadwood volume and decay class incorporated ALS metrics from both leaf-on and leaf-off acquisitions, and in particular amplitude and width variables (six and five, respectively, out of 15 total input ALS variables across the four models). These tend to relate to upper percentiles (≥70th percentile) and measures of variance. The authors of [130] suggested that forest ground-level vegetated elements are rougher whilst fallen stems have smooth surfaces that could be linked with echo-width. Standing deadwood volume was likely related to forest cover type and management, where larger proportions were located in deciduous dominated plots. As with downed deadwood, standing dead material likely has smoother surfaces, as represented in the width metrics used in the RF models. ALS amplitude variables used are possibly related to dominant cover type and structure, being similar to the input variables used to predict Total crown horizontal area.
Finally, the best RF models for the ground layer metrics of Number of ground vegetation species and the Percentage of bare ground cover used very different ALS predictors. For the former model, the height predictor metrics were from the upper portion of the forest plot (≥80th percentile), implying that the upper canopy structure impacts ground vegetation species, whereas for the latter model the height predictors were from the lower portion of the profile (≤30th percentile), suggesting the importance of understorey in influencing ground vegetation coverage. In both cases, high percentile amplitude variables were also included, potentially related to the presence and spatial arrangement of foliage across the vertical profile.

Favourable Conservation Status
ALS datasets have been used to estimate forest structure and predict the habitat of various animal species, and potential diversity of said species (e.g., [63,131-133]). A common element is the identification of indicators of environmental conditions, and establishment of criteria for their assessment. For the current study, we assessed the accuracy of our RF model predictions with regard to 17 FCS indicator threshold values determined by [11] for semi-ancient deciduous forest in the UK. With the exception of assessments of seedling metrics (Number of seedlings, Number of native seedlings and Shannon-Wiener index for native seedlings), the 14 other metrics were predicted with acceptable accuracy with respect to their threshold values (i.e., with AUC values over 0.7, as specified in [106]). For all field plots, the numbers of seedlings encountered were consistently below the threshold value specified in [11].
The FCS index was calculated for the entire study site, and consistent values were observed within the extents of individual stand boundaries. Higher values were observed for locations containing semi-ancient deciduous forest, which we would expect given the nature of the index, and the natural structural and compositional diversity, and presence of native species, in comparison to the planted homogeneous coniferous forest, which consistently scored low values. Conifer stands typically have little to no understorey due to management and shading. Conifer areas also have little or no deadwood, relatively low and invariant DBH values and low species diversity. This means that the grid-cells which intersect with these areas do not score very highly for anything except stem density and basal area. The number of regenerating trees (i.e., saplings and seedlings) is low across the entire study site, especially for coniferous areas, thus the highest value of 17 for the FCS index is not met for any of the New Forest study sites. This trend was attributed by [82] to grazing by a high number of large ungulate species, such as deer and ponies, which are present throughout the unenclosed forest.
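The way a per-cell FCS index accumulates can be sketched as a count of indicator metrics whose predicted value meets its favourable threshold. This is a simplified illustration that assumes every indicator is scored with a simple "exceeds threshold" rule; the indicator names, thresholds and cell values below are hypothetical and are not the targets defined in [11].

```python
def fcs_index(cell_predictions, thresholds):
    """Count the indicators whose predicted value exceeds its threshold."""
    return sum(
        1 for name, limit in thresholds.items() if cell_predictions[name] > limit
    )


# Hypothetical thresholds for three of the indicator metrics.
thresholds = {
    "number_of_tree_species": 3,
    "volume_of_downed_deadwood": 20.0,
    "number_of_seedlings": 50,
}

# Hypothetical predictions for two grid cells.
deciduous_cell = {
    "number_of_tree_species": 5,
    "volume_of_downed_deadwood": 35.0,
    "number_of_seedlings": 10,
}
conifer_cell = {
    "number_of_tree_species": 1,
    "volume_of_downed_deadwood": 2.0,
    "number_of_seedlings": 0,
}

print(fcs_index(deciduous_cell, thresholds))  # 2
print(fcs_index(conifer_cell, thresholds))    # 0
```

With all 17 indicators included, the index would range from 0 to 17, and a cell can only reach the maximum if the regeneration indicators are also met, which is why low seedling numbers cap the scores observed across the study site.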

Concerns and Wider Implications
Sample size effects on model training and accuracy are the main issue in the current research project. None of our field plots achieved >11 in the FCS scoring, and so we must conclude that either more field data are needed to truly cover the complete range of conditions at the field site, or that this location simply does not have any sites with FCS index values > 11. The sample number was relatively low, although other studies have used similar sample numbers for RF analyses (e.g., [134]). The sample size precluded us from separating dedicated training and validation datasets. This therefore represents a source of uncertainty. Future work should ensure a large number of field plots are installed. RF models have been stated to be incapable of extrapolating beyond the range of values used in training the model [135].
In addition to sample size, plot location and FCS target thresholds should also be considered as possible causes of model outcome uncertainty. However, as in [136], minor imprecision in plot geo-reference has little impact on RF model performance for plot-level estimates. The FCS targets outlined in [11], or more specifically the target threshold values (derived from [80-83]), were collated for a specific semi-natural forest location. We have had to presume that these indicators are appropriate for all forest stands encountered in the current research and context. Data for semi-natural deciduous and plantation coniferous forest have been merged throughout this study.
There was a two-year time difference in this study between the two periods of field data collection, and therefore also between the 2012 field data and the 2010 ALS acquisitions. As our field plots were in mature, stable woodland, we must assume that any potential changes in our field site during this time were small. However, this is a potential source of uncertainty [137]. Future work would therefore need to ensure field data are acquired at an appropriate time relative to ALS acquisition to ensure any potential uncertainty is minimised.
Strong relationships were observed between ALS variables and fieldwork derived forest plot attributes. However, as with other similar studies, there are a number of potential issues which limit transferability of such models to other sites. These can relate to the range of forest types studied in order to derive the relationships, with many developed for a specific region and a limited number of species types. Tree species and canopy structure can influence the biases observed in ALS observations of height and can have an effect on the vertical penetration of ALS returns, which can be complicated when considering leaf-on and leaf-off data (e.g., [137,138]). In addition, differences in the acquisition parameters of the ALS data between studies pose problems [139,140], as do seasonal differences between field and remote sensing data capture. For example, models calibrated using leaf-on field and ALS data can produce erroneous results when applied to leaf-off ALS data (e.g., [64]). Different canopy penetration capabilities when comparing sensors have been demonstrated, even when applied at the same location [141,142]. Acquisition specific considerations in terms of sensor specifications, flying altitude, pulse repetition frequency, laser power, beam divergence and footprint size will influence target parameter characterisation and a system's ability to resolve a return [143-145]. The application of different processing methods (in particular, raster or point cloud approaches) can also yield differences in parameter retrieval [146].
Within the context of the current research, the ALS metrics selected were relatively basic with an aim of being more transferable [143,147]. The surrounding research literature contains additional metrics which could improve our models. The inclusion of vertically distributed canopy density or complexity indices may improve model accuracy as they have been demonstrated to correlate well with canopy conditions (e.g., [148,149]). Computationally similar light penetration indices appear to be transferable across acquisitions (e.g., [150]). Likewise, methods based on fitting functions to the frequency of returns by height, e.g., Weibull probability density functions [151], offer great potential. Future work will therefore need to assess the transferability or stability of metrics, in addition to the modelling methods, across different ALS acquisitions and environmental contexts.
One of the conclusions of the study presented in [55], when using linear regression analysis to estimate forest metrics, was that the leaf-on and leaf-off datasets captured different properties of the forest, which is supported here given the range of predictor metrics used across the models. Many of the best performing models in that study (10 of 23) included both leaf-on and leaf-off metrics. Overall, more accurate results were obtained in the current study, but relatively small differences were observed between leaf-on, leaf-off or combined model accuracies. This suggests that the RF approach may be better when applied to locally disparate forest types than linear models (as in [152]). There is also the potential implication that a combination of datasets of leaf-on and leaf-off acquisitions may not be necessary depending on the statistical modelling approach applied, at least within the context of the current study. The benefits that a combination of datasets provides in a regression approach may have been rendered redundant by the use of RF here. A comparison of different modelling methods would be a logical avenue for future work.

Conclusions
In this study undertaken in coniferous and deciduous forest in the New Forest, UK, we found that the differences between RF models derived from leaf-on, leaf-off and combined leaf-on/leaf-off datasets were slight, with the exception of estimating the Simpson index of diversity. Thus, whilst there were some detectable trends (e.g., higher accuracies were observed for forest overstorey structure when leaf-on ALS predictor variables were used, whilst forest canopy composition or understorey characteristics achieved a higher accuracy when either leaf-off or combined leaf-on and leaf-off ALS data were used), these were not significant enough to imply that consideration of ALS acquisition time will be required for optimal prediction of forest characteristics if using RF modelling. However, whilst model performance was similar between leaf-on, leaf-off and combined datasets, model composition was often very dissimilar, reiterating that these datasets capture different aspects of the forest and that structure and composition across the full vertical profile are highly inter-connected.
Estimated accuracy overall exceeded estimates produced using linear models in previous research. Whilst there is room for improvement, given the uncertainties associated with estimating a range of metrics and the small training sample size used, the ALS-based method was in good agreement with field-based assessments of FCS. In addition, continuously mapped estimates of FCS were created across the study site and corresponded closely to forest stands and compartments present, demonstrating the utility of such an approach for forest condition mapping and monitoring.
The value of ALS is its ability to estimate a variety of habitat variables related to forest three-dimensional structure. The current research demonstrates the feasibility of predicting spatially explicit forest metrics, derived from ALS covariates, for multiple forest stands. This approach potentially demonstrates a method to rapidly assess forest condition and FCS over large areas, reducing (but not eliminating) the need for costly field surveys. The main advantage of ALS acquisitions of this type is that they allow the sampling of forest structural characteristics at a high spatial resolution over large spatial extents. The ability to characterise these structures through ALS provides a proxy for biophysical processes in forests [126]. These findings are important for advancing the management of forest resources. Further investigation into such approaches could therefore yield additional data on many potential biological processes and distributions within a landscape. A similar type of approach to that documented in the current study would allow managers to investigate forest changes post-disturbance or treatment. Equipped with this knowledge, managers would have better information and tools to address the impact of forest changes in the face of a potentially changing climate.
The results of this study show that full-waveform multi-temporal ALS holds a great deal of potential information which is useful for estimating forest characteristics, both within and below the main forest canopy, and for mapping continuously across large extents. The spatial resolution of the data allows for within stand assessment, whereas previously, areas defined as stands, regardless of size, were considered the smallest management unit. We expect that the use of remote sensing technologies and methods will be tested in other sites in the future in order to develop better estimates. The availability of high-resolution forest vegetation maps with acceptable agreement to field validation will significantly advance forest ecological understanding and improve conservation efforts.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs14205081/s1, Figure S1. Plots of predicted values from RF models against field measurements produced from the most accurate models; Table S1. A summary of field measured values for each of the 27 metrics across all 41 plot locations; Table S2. List of ecological indicators for the New Forest; Table S3. A summary of random forest model tuning parameters for models using metrics from the leaf-on acquisition; Table S4. A summary of variable inputs to each random forest model using metrics from the leaf-on acquisition, as determined by Boruta variable selection; Table S5. A summary of random forest model tuning parameters for models using metrics from the leaf-off acquisition; Table S6. A summary of variable inputs to each random forest model using metrics from the leaf-off acquisition, as determined by Boruta variable selection; Table S7. A summary of random forest model tuning parameters for models using metrics from both the leaf-on and leaf-off acquisitions; Table S8. A summary of variable inputs to each random forest model using metrics from both leaf-on and leaf-off acquisitions, as determined by Boruta variable selection.