Estimation of Total Biomass in Aleppo Pine Forest Stands Applying Parametric and Nonparametric Methods to Low-Density Airborne Laser Scanning Data

The account of total biomass can assist with the evaluation of climate regulation policies from local to global scales. This study estimates total biomass (TB), including tree and shrub biomass fractions, in Pinus halepensis Miller forest stands located in the Aragon Region (Spain) using Airborne Laser Scanning (ALS) data and fieldwork. A comparison of five selection methods and five regression models was performed to relate the TB, estimated in 83 field plots through allometric equations, to several independent variables extracted from ALS point cloud. A height threshold was used to include returns above 0.2 m when calculating ALS variables. The sample was divided into training and test sets composed of 62 and 21 plots, respectively. The model with the lower root mean square error (15.14 tons/ha) after validation was the multiple linear regression model including three ALS variables: the 25th percentile of the return heights, the variance, and the percentage of first returns above the mean. This study confirms the usefulness of low-density ALS data to accurately estimate total biomass, and thus better assess the availability of biomass and carbon content in a Mediterranean Aleppo pine forest.


Introduction
Forest ecosystems and their associated understory act as important carbon sinks, providing habitats for wildlife [1,2] and promoting economic and social services to societies [3,4].The sequestration of carbon and CO 2 by forest biomass and soils plays a major role in managing greenhouse gas emissions [5] providing a low-cost opportunity in climate policies [6].The estimation of tree and shrub biomass in the Mediterranean Basin also contributes to better understanding fire behavior, which constitutes one of the most relevant disturbances as well as being a source of greenhouse gas emission to the atmosphere [7][8][9][10][11].The inclusion of understory vegetation in carbon sequestration has been traditionally ignored [12] since it represents lower amounts of biomass compared with tree aboveground biomass (AGB).However, considering the global scale this amount constitutes an important pool for carbon sequestration [12,13].
Remote sensing technologies have demonstrated effectiveness for estimating forest resources such as biomass [14,15].Particularly, Airborne Laser Scanning (ALS) is currently considered one of the best Forests 2018, 9, 158 2 of 17 tools due to its capability to provide 3-D information of vegetation structure.Vertical forest structure has been estimated with ALS data for several applications, such as forest inventory [16][17][18], forest structural heterogeneity [19][20][21][22], fuel type mapping [23,24] fuel modelling [23][24][25][26] or tree damage detection after natural disasters [27][28][29] for several height strata.However, few studies have focused on shrub biomass characterization with ALS data [30][31][32][33].Some studies have used low density ALS data to estimate forest biomass [25,[34][35][36][37][38], but little research has been performed including shrub vegetation because of the inherent difficulty in the estimation related to its low height and uniform surface [30].Several studies state that ALS data tends to underestimate shrub vegetation [39][40][41][42].Besides, when shrub and tree vegetation cover is high [43] and density of ALS data is low, the accuracy of digital elevation models (DEM) used to normalize return heights decreases [30].The performed studies use an approach that combines ALS data and harvesting field measurements for biomass estimation [30,33].In this sense, the lack of more studies to characterize shrub vegetation might have been associated with the necessary destructive sampling to generate forest structure equations, the assumption of simple geometric shapes [44,45], and the additional difficulty to estimate biomass at a regional scale using low-density ALS data.However, the presence of shrub biomass in the understory or the existence of shrubland areas constitutes an important land use in the Mediterranean basin.In this sense, the availability of shrub allometric equations for the main Spanish shrub species [46] have opened new opportunities.These equations allow the estimation of shrub biomass from simple field measurements, such as height and percentage cover per species or group of species.Thus, an area-based approach could be used to estimate biomass, using larger field plots (15 m radius) than traditional ones (1.5 m radius).This approach seems to be less affected by low-density ALS data, as has been proven in accurately estimating forest structure parameters in heterogeneous forests [47].
Few studies have compared different selection methods for ALS modelling [48,49].In this sense, LASSO selection and the varimax rotation for Principal component analysis (PCA) selection are proposed as novel selection processes for biomass estimation.Some studies have compared different regression models to estimate forest parameters using high point cloud density [16,34,[50][51][52], but only Li et al. [33] have compared two regression models to estimate shrub biomass.
The main objective of this study is to estimate the total biomass (TB) in heterogeneous Pinus halepensis Miller (hereinafter Aleppo pine) forest stands located in the Aragón Region (Spain).In this manuscript, TB refers to the dry weight of the plant material from trees, including roots, stems, bark, branches, and leaves from the ground to the apex, as well as the aboveground plant material from shrubs.The values of TB are expressed per unit area in terms of density, i.e., tons of TB per hectare (tons/ha).Specifically, we aim to (1) examine the relationship between biomass calculated at the field plot using allometric equations and ALS-derived variables; and (2) compare five variable selection methods and five different regression models, including regression trees and machine learning algorithms.This paper is organized in five sections.Section 2 describes the processing of ALS and field data, the selection methods, the regression models applied, as well as, the model comparison and validation methods used.Section 3 presents the results including the best TB model.Section 4 describes the discussion of the results and the paper's conclusions are included in Section 5.

Study Area
The Aleppo pine forests under study are located in the Ebro Basin (Figure 1), in Northeast Spain.This species represents almost 50% of the forested tree area in Aragón and is well adapted to Mediterranean environmental conditions.The two selected areas (Figure 1) are representative of the Aleppo pine forest in Central Ebro Valley, occupying 11,400 ha.
Climate of the region is Mediterranean with continental features.The average annual temperature is ≈13 • C, ranging from winter mean temperatures lower than 6   C. Annual precipitation is medium-low and irregular, averaging 455 mm and mostly occurring in autumn and spring [53].
Aleppo pine forests are characterized by a hilly topography, with elevations ranging from 300 to 1150 m above sea level and slopes from 0 • to 39 • .The lithology of the study area varies from Miocene shales and sandstones, alternating with conglomerates in area A (Figure 1A) to Miocene carbonate and marl sediments in area B (Figure 1B).Aleppo pine forests are characterized by a hilly topography, with elevations ranging from 300 to 1150 m above sea level and slopes from 0° to 39°.The lithology of the study area varies from Miocene shales and sandstones, alternating with conglomerates in area A (Figure 1A) to Miocene carbonate and marl sediments in area B (Figure 1B).Most of these pine stands are semi-natural, although some stands located in the south-eastern part of the area B (Figure 1B) were planted approximately forty years ago.The evergreen understory includes species such as Quercus ilex subsp.rotundifolia, Quercus coccifera, Juniperus oxycedrus, Buxus sempervirens, Juniperus phoenicea, Rosmarinus officinalis, and Thymus vulgaris.Most of these pine stands are semi-natural, although some stands located in the south-eastern part of the area B (Figure 1B) were planted approximately forty years ago.The evergreen understory includes species such as Quercus ilex subsp.rotundifolia, Quercus coccifera, Juniperus oxycedrus, Buxus sempervirens, Juniperus phoenicea, Rosmarinus officinalis, and Thymus vulgaris.

ALS Data
The ALS data were collected by the Spanish National Plan for Aerial Orthophotography (PNOA) and captured in several surveys carried out between January and February 2011.An airborne Leica ALS60 discrete-return sensor with a small-footprint oscillating-mirror was used.The sensor was operated at a wavelength of 1.064 µm and a scan angle of ±29 • from nadir.Data were provided by the National Geographic Information Centre (CNIG) in 2 × 2 km tiles of raw data points in LAS binary files format v. 1.2.The geodetic reference system for x, y, and z coordinates was ETRS 1989 UTM Zone 30 N and heights were orthometrically corrected.The point cloud was captured with up to four returns measured per pulse.The point cloud density was 1.5 point/m 2 , considering all returns with a vertical accuracy better than 0.2 m.

Field Plot Data and Conversion of Biomass Measurements to ALS Campaign Year
Field data were acquired in 83 circular plots, 15 m radius, at the two study areas (Figure 1).The sampling data fulfil the statistical requirements [55] and consider the size of the study area and the variability of pine forest in terms of canopy cover (CC), canopy height, and terrain slope [56].The 45 field plots located in area A were collected from June to July 2015 and the 38 field plots located in area B were collected from July to September 2014.A stratified random sampling technique was selected to establish the location of the field plots, considering a representative sample of the tree density, forest structure [47], and terrain variability [57].To perform this procedure, tree height (h), CC and terrain slopes of the study area were derived from ALS point cloud to define homogeneous areas.
The center of the designated circular plots was located in the field using a Leica VIVA ® GS15 CS10 GNSS real-time kinematic Global Positioning System.The average accuracy of the planimetric coordinates was 0.16 m.The green crown height and h parameter were measured using a Haglöf Sweden ® Vertex instrument and the tree diameter at breast height (dbh) was measured at 1.3 m, using a Haglöf Sweden ® Mantax Precision Blue diameter calliper.Trees with a dbh larger than 7.5 cm were measured in each plot, and trees with a dbh lower than 7.5 cm were only counted.The percentage of canopy cover was estimated, as well as the average height of the different shrub species and the understory coverage.
The temporal lag between ALS acquisition data and field work campaign was 3 and 4 years in area B and area A, respectively.Although no significant changes took place in the study area in that period, field plot data were transformed to match the ALS campaign year.For this purpose, the values of dbh and height growth per diametric class provided by the National Forest Inventory (NFI) were used.These refer to the difference of years between the second NFI (NFI2) and the third one (NFI3), which is eleven years.A linear interpolation based on stand tables was made to estimate the variation between field data and ALS data acquisition (Table 1).Total tree biomass fractions were calculated using the Pinus halepensis allometric equations according to Ruiz-Peinado et al. [58] (Equations ( 1)-( 5)): If dbh ≤ 27.5 cm then Z = 0; If dhb > 27.5 cm then Z = 1 (2) where W s is the biomass weight of the stem fraction, W b7 is the biomass weight of the thick branch fraction (diameter larger than 7 cm), W b2−7 is the biomass weight of medium branch fraction (diameter between 2 and 7 cm), W b2+n is the biomass weight of the thin branch fraction (diameter smaller than 2 cm) with needles, and W r is the biomass weight of the roots.Subsequently, the biomass fractions were summed up to calculate the total tree biomass of each individual tree.Then, these biomass values were added to obtain per plot biomass values that were later expressed in tons per hectare.
Shrub biomass was calculated at plot level using Montero et al. [46] equations for the different shrub formations of the study area according to the Spanish Forest Map (MFE) categories [59] (Equations ( 6)-( 10)): Shrub hedges, borders, galleries, etc.: Quercus coccifera and Pistacia lentiscus: Leguminosae aulagoideas and related shrubs: Labiatae and thymus formations: General shrub biomass: where W is the biomass weight for each species in tons/ha, h m is the average shrub height at plot level and CC is the percentage of shrub canopy cover at plot level.Equation ( 6) was applied for Crataegus monogina, Rhamnus lycioides and Rosa canina; Equation (7) was used for Quercus coccifera; Equation (8) was applied for Genista scorpius; Equation (9) was used for Thymus sp.Subsequently, general Equation (10) was used for all the other inventoried species: Rosmarinus officinalis, Juniperus oxycedrus, Juniperus sabina, Buxus sempervirens, Genista florida, and Salsola vermiculata.It should be noted that Equation (10) was also applied for Quercus ilex and Pinus halepensis with less than 7.5 cm of dbh, as it is not possible to use available tree allometric equations.Shrub biomass in every plot was calculated summing up the biomass values for each type of species.The estimation of shrub biomass to match ALS year was carried out by applying shrub biomass growing equations according to Montero et al. [46] (Equations ( 11)-( 14)).It should be noted that no growing equations were developed by Montero et al. [46] for shrub edges, borders, and gallerie formation.Consequently, the General shrub growing biomass (Equation ( 14)) was applied for Crataegus monogina, Rhamnus lycioides, and Rosa canina.
Quercus coccifera and Pistacia lentiscus: Leguminosae aulagoideas and related shrubs: Labiatae and thymus formations: General shrub biomass: where W is the biomass weight for each species in tons/ha, h m is the average shrub height at plot level, and CC is the percentage of shrub canopy cover fraction at plot level.Then, the shrub biomass growing values per year in every plot were calculated summing up the biomass values for each species obtained from Equations ( 10)-( 13).These annual growing values were subtracted from the measured ones considering the difference in years between ALS flight and field data acquisition.Finally, TB density was calculated for each plot by adding up the tree biomass and the above ground shrub biomass expressed in tons of dry biomass per hectare.

ALS Data Processing
The noise point class was removed and ground points were classified according to Montealegre et al. [60] using the multiscale curvature classification algorithm [61].This algorithm is implemented in MCC 2.1 command-line tool as a C++ application.The Point-TIN-Raster interpolation method [61], implemented in ArcGIS 10.5 software, was applied to generate a digital elevation model (DEM) with 1 m size grid [60].Point heights were normalized with the DEM and ALS tiles were clipped to the spatial extent of each field plot.Furthermore, a full suite of statistical metrics commonly used as independent variables in forestry was calculated [62] using FUSION LDV 3.60 open source software [63] including variables related to height distribution: percentiles, mean elevation, kurtosis, skewness, etc.; and variables related to the percentage of canopy cover.According to Nilsson [64] and Naesset and Økland [57], a threshold value of 0.2 m height was applied to remove ground laser hits while considering the understory.The threshold is related to the ALS sensor precision in coordinate Z, which has a root mean square error (RMSE) below 0.2 m.Spearman's rank correlation coefficient (ρ) was computed using R to determine the strength and direction of the relationship between field plot biomass data and ALS data.The selection of the ALS variables was made considering a minimum positive and negative ρ value of 0.5.

Selection of ALS Variables
Stepwise selection method is based on dropping or adding variables at several steps.Backward, forward and bidirectional stepwise selection procedures were applied using R.
PCA allows the number of variables for regression purposes to be reduced.Although it has been widely used in many research fields, it is not a common approach for ALS metric selection [49].PCA was computed using R package "lattice".The obtained components with eigenvalues greater than 0.1 were retained according to the Kaiser Criterion [65].A Varimax rotation, which maximizes the sum of the variance, was applied to better interpret the PCA results [66,67].The three most representative metrics for all the PCs were selected.
LASSO (Last absolute shrinkage and selection operator) is a technique based on regularization methods.It generates interpretable models by minimizing the residual sum of squares regarding the sum of the absolute value of the coefficients that are less than a constant [68].LASSO was computed in R using the "glmnet" package.
All subset selection methods allow a suitable group of metrics to be selected on which the models can focus their attention, while disregarding the rest of the variables.A wide variety of search approaches have been developed for subset selection.In this regard, four search approaches were implemented using R package "leaps": exhaustive, backward, forward, and sequential replacement (Seqrep) [69].The maximum size of subsets was set to three.
A maximum number of three predictor variables were selected using the different selection methods to avoid overfitting [70] and to generate more understandable models for forest management [38].

Parametric and Nonparametric Models for Estimating TB
The performance of five regression methods was compared to estimate TB, including a parametric model, the multiple linear regression model (MLR), and four nonparametric models: Support vector machine (SVM), Random forest (RF), Locally weighted linear regression (LWLR) and Linear model with a minimum length principle (MDL).Parametric models summarize data using a finite set of parameters while in nonparametric methods the number of parameters is potentially infinite.These methods are described below.
MLR is one of the most broadly applied methods for the estimation of forest structure variables using ALS data [38,57,[71][72][73][74][75].MLR is a linear approach for modeling the relationship between a scalar dependent variable and two or more explanatory variables.The selected variables, considering the different selection methods, were included in the models.The basic assumptions of linear regression models were verified for the fitted model: normality of the residuals, homoscedasticity and independence or no auto-correlation in the residuals [76] in order to make comparisons between suitable models.Dependent and independent variables were transformed logarithmically in those cases where statistical assumptions of linear regression were not fulfilled [71,74,77] to verify whether the measures of goodness of fit of the models improved.
SVM is a supervised learning model which has associated learning algorithms that analyze and recognize patterns.This method tries to discover among multidimensional hyperplanes the optimal separation between classes, where the separability is a maximum, assuming that input data are separable in space [38,77].Data located in the hyperplane are called support vectors, being the most difficult to classify since they have a lower separability.SVM, with linear and radial kernels, were computed using R package "e1071".The models were tuned applying a cost parameter within the interval 1-1000 and a gamma parameter in the interval 0.01-1.
RF is an ensemble learning method that adds randomness to bagging, increasing decision tree diversity by growing them from distinct subsets.RF combines a decision tree, considering the values of an independent random vector sample, with the same distribution for all trees in the forest [78].RF divides the nodes of each decision tree using the best variables from a random sample.RF was computed using R "randomForest" [79] and "caret" packages [80].The model was tuned by applying a number of trees to growth (ntrees) within the interval 1-1000 and a number of variables selected between 1 and 2.
Two regression tree structures based on "If Then" rules were computed using the R package "CORElearn".Those nonparametric techniques differ between them in the regression model applied in the leaf nodes.The first one, the LWLR, or loess, is a method which smoothes the dependent variable as a function of the independent ones to fit a regression surface to data [81].The coefficient of smoothness is defined by computing the weighted mean square error considering a distance function.The second one, the MDL, considers that regularities in a set of data can be used to compress the data by using fewer symbols from a finite alphabet to explain the data faithfully.This method is based on the rule developed by Rissanen [82].

Model Comparison and Validation
Two types of models were compared: (i) models that include the same independent variables as the MLR one; and (ii) models that use two or three ALS variables, being different from those included in the first comparison.
The original sample was split into a training set of 75% of the cases (62 plots) and a test set of 25% of the cases (21 plots).Validation was performed for all the models, being executed 100 times for those methods with associated randomness as SVM, RF, LWLR, and MDL to increase robustness in the results [34].In addition, data were normalized in values ranging from 0 to 1 in order to avoid weights saturation according to Görgens [52].
The model performance was compared in terms of root mean square error (RMSE), relative RMSE (%RMSE) and bias.Furthermore, the Friedman nonparametric test was applied using the RMSE of each plot in order to determine differences between models [83].The Nemenyi post-hoc test was applied to determine whether differences were statistically significant, with a significance level of 0.05 [84].This test was used only for those cases in which the null-hypothesis of the Friedman test was rejected, implying non-equivalence between models.

Results
Table 2 shows a summary of the field plot characteristics.The heights of the inventoried trees range from 3.3 to 16.8 m and they present a variety of diameters, from 8.0 to 27.5 cm.The shrubs present heights ranging from 0.3 to 2.0 m and a CC from 1% to 40%.The average tree biomass fraction is the most important fraction representing almost 75% of TB, and shrub biomass represents in average around 25% of TB.The ALS metrics selected by the five selection processes are presented in Table 3. Considering the maximum number of selected variables established, the selection methods included generally one metric related to the canopy height (CHM), one metric associated with the canopy height variability (CHVM), and another expressing the canopy density (CDM).However, sequential replacement all subsets regression, forward all subsets regression, and PCA selected only two metrics.Regarding CHM, low height percentiles were selected in some cases, but elevation mean and higher percentiles were chosen for the remaining methods.The most selected CHVM was variance.The percentage of first returns above mean and the percentage of all returns above a height break were the CDM metrics with the strongest correlation with TB.All selected metrics showed a logical relation with the dependent variable.The regression models to estimate TB are summarized in Table 4.The MLR and SVM with radial kernel or linear kernel present the lowest RMSE.LWLR, MDL, and RF show lower accuracies.Although most of the models present values of bias close to zero, some of them show a slight overestimation with values close to 1.The selected MLR model including the 25th percentile of height, elevation variance, and the percentage of first returns above mean, present the lowest RMSE with 15.14 tons/ha, a %RMSE of 19.21, and a relative R 2 after validation of 0.87.SVM with radial kernel including the 5th percentile of height, elevation mean, and percentage of first returns above mean is the second best model.SVM with linear kernel, including the same ALS metrics as the MLR method, presents also similar errors.Figure 2 shows the independent variables included in the selected MLR model associated with the vertical distribution of ALS returns in three selected field plots representative of the variability of TB of the Aleppo pine forest under study.The TB of smaller pines with low shrub presence (Figure 2A) is lower than in taller pines with higher presence of understory (Figure 2C).Thereby, the TB presents a direct relationship with these variables (Figure 2B).The higher the variance, the higher the TB as the dispersion of the data increases.Similarly, the increase of the percentage of first returns above the mean implies an increase in TB content.The 25th percentile presents an inverse correlation with the TB.A lower value of the 25th percentile implies a higher presence of shrubs in the understory.method, presents also similar errors.
Figure 2 shows the independent variables included in the selected MLR model associated with the vertical distribution of ALS returns in three selected field plots representative of the variability of TB of the Aleppo pine forest under study.The TB of smaller pines with low shrub presence (Figure 2A) is lower than in taller pines with higher presence of understory (Figure 2C).Thereby, the TB presents a direct relationship with these variables (Figure 2B).The higher the variance, the higher the TB as the dispersion of the data increases.Similarly, the increase of the percentage of first returns above the mean implies an increase in TB content.The 25th percentile presents an inverse correlation with the TB.A lower value of the 25th percentile implies a higher presence of shrubs in the understory.Figure 3 presents the scatter plots of observed and predicted TB for the models that include the same variables as the best selected MLR ones.It should be noted that, in this model, logarithmic transformation of the dependent or the independent variables was not required to meet the linear regression assumptions.MLR, SVM with linear kernel and SVM with radial kernel show high coefficients of determination (0.87, 0.86, and 0.86, respectively), while MDL (0.82), RF (0.80), and LWLR (0.78) present slightly lower coefficients of determination, 0.80 and 0.78, respectively.The implementation of the MLR model (Equation ( 15)) in ArcGIS allowed the estimation of TB.TB = −14195.1 + 4753.9 × Elevation variance + 8992.0 × P25 + 699.6 × % first returns above mean (15) The performance comparison between models by using the Friedman test shows that the models are not equivalent with a p-value of 0.00.Thereby, the required application of the post-hoc Nemenyi test indicates that no statistically significant differences exist between models, with 95% of probability.
Figure 4 shows the TB mapping for the study area.The north part of the study area (Figure 4A) shows a heterogeneous Aleppo pine forest with higher TB values (above 150 tons/ha), which denote mature forested areas with the presence of shrubs and taller trees.In contrast, the southern part (Figure 4B) shows some homogeneous areas with lower TB values (bellow 10 tons/ha).These areas have been affected more recently by several wildfires, contrasting with the pines of area A or some north stands of area B that have higher ages, presenting TB values ranging from approximately 30 tons/ha to more than 120 tons/ha.coefficients of determination (0.87, 0.86, and 0.86, respectively), while MDL (0.82), RF (0.80), and LWLR (0.78) present slightly lower coefficients of determination, 0.80 and 0.78, respectively.The implementation of the MLR model (Equation ( 15)) in ArcGIS allowed the estimation of TB.The performance comparison between models by using the Friedman test shows that the models are not equivalent with a p-value of 0.00.Thereby, the required application of the post-hoc  Nemenyi test indicates that no statistically significant differences exist between models, with 95% of probability.
Figure 4 shows the TB mapping for the study area.The north part of the study area (Figure 4A) shows a heterogeneous Aleppo pine forest with higher TB values (above 150 tons/ha), which denote mature forested areas with the presence of shrubs and taller trees.In contrast, the southern part (Figure 4B) shows some homogeneous areas with lower TB values (bellow 10 tons/ha).These areas have been affected more recently by several wildfires, contrasting with the pines of area A or some north stands of area B that have higher ages, presenting TB values ranging from approximately 30 tons/ha to more than 120 tons/ha.

Discussion
The results demonstrate that low density ALS data can be used to accurately estimate TB.Regarding the selection method, all subsets regression and selection based on Spearman's rank was the most powerful technique for ALS metrics selection.LASSO selection was also a good technique.However, the use of Stepwise selection and PCA showed a restricted power in identifying the best

Discussion
The results demonstrate that low density ALS data can be used to accurately estimate TB.Regarding the selection method, all subsets regression and selection based on Spearman's rank was the most powerful technique for ALS metrics selection.LASSO selection was also a good technique.However, the use of Stepwise selection and PCA showed a restricted power in identifying the best subsets.These findings agree with Garcia-Gutierrez et al. [48], who found that Stepwise selection was the technique with less power, but all subsets regression obtained good results.Although PCA was used by Silva et al. [49] for stem volume modeling, it has been outperformed by other selection techniques for TB estimation.
The selection of a reduced number of biologically representative variables increases the understanding of models for forest management purposes.In this sense, the use of automatic methods for variable selection is considerably less time-demanding when modeling.The three ALS-derived variables included in the best model were coherent and analogous to those derived for characterizing forest vertical diversity [22] and aboveground biomass [15,36].Variables derived from digital surface models (DSM) were not computed in this study.Nevertheless, they have been proposed in other studies because they seem more suitable for tree damage detection [29] or estimation of vertical diversity [19] when not using an area based approach (ABA).These variables concern the variability of different strata of the point cloud and canopy cover distribution.
The comparison between regression models shows that the MLR model had the lowest RMSE (15.14 tons/ha) and bias (0.01), matching with the values obtained by other authors for aboveground biomass estimation [73,85].Thus, according to Görgens et al. [52] and Domingo et al. [38], MLR slightly outperforms other nonparametric methods.Although MLR was the best model, no statistically significant differences were found with other methods.These results partly agree with other approaches [34,36,48], which obtained lower estimation errors with nonparametric techniques.The use of selection methods is useful to better determine the best subset of variables and the comparison of models, such as MLR, SVM with radial kernel, SVM with linear kernel, and LWLR.TB estimation by using trees and shrub allometric equations and low-density ALS data is considered to better account for biomass and carbon reservoirs in Mediterranean forest ecosystems.The comparison of different models for TB estimation including other tree species or biomass estimation in shrubland areas may be considered.In this sense, 35% of the forested land in Spain, which includes grasslands, shrubland, and trees, is covered by shrubs [83].Furthermore, shrub species in the understory, with an average presence of 24%, constitute an important part of TB in Mediterranean Aleppo pine forest.This pattern is also common in the understory of other Mediterranean forests as for example Pinus and Quercus ilex.Shrubland areas are also especially important due to the abandonment of crops and wildfire disturbances [13], that might even increase the presence of these species in the near future.Consequently, further research is needed on the estimation of TB using tree and shrub allometric equations and low density ALS data in shrubland and burned areas to compare with previous higher point density studies.Moreover, the use of ALS technology with higher density, the use of terrestrial LiDAR, or the fusion with high spatial resolution images derived by unmanned aerial vehicles might be useful to better account for forest biomass.

Conclusions
Allometric equations are one of the available methods for estimating forest structural parameters.This method has been widely applied to accurately estimate tree height, crown diameter, basal area, stem density, volume or aboveground biomass at plot level and to relate it to ALS data to estimate it at stand level [36,47,56,57,86].Nevertheless, little research has focused on the estimation of biomass with ALS data including shrub fraction.The lack of knowledge of shrub vegetation associated with the traditionally necessary destructive sampling to generate forest structure equations might have been one of the drawbacks of including those species for biomass and carbon account.In this sense, the development of new allometric equations for shrub species, defined to estimate biomass from simple field measurements and the generalization of ALS data have opened new opportunities for TB estimation.
This study assesses the usefulness of low point density ALS data to accurately estimate biomass, including tree and shrub fraction, in an Aleppo pine forest.The comparison of the effectiveness of five variable selection methods increased modeling efficiency.All subsets regression and manual selection based on Spearman's rank were the most powerful techniques.LASSO selection was also a good technique but, Stepwise selection and PCA showed less power to identify the best subsets.The comparison of parametric and nonparametric regression methods showed that the best model for TB estimation was the MLR.This model included three ALS metrics: the 25th percentile of the return heights, the variance, and the percentage of first returns above the mean, presenting an RMSE of 15.14 tons/ha and a bias of 0.01.No statistically significant differences between the generated models were found.The results allow TB mapping at the regional scale providing useful information for forest management purposes, with values ranging from below 10 tons/ha to areas above 150 tons/ha.This study moves a step forward to compute TB by using allometric tree and shrub equations and ALS data to better account for forest biomass as assets to manage greenhouse gases from a local to a global scale.

Figure 1 .
Figure 1.Study area.Land cover types and location of 83 forest inventory plots.High spatial resolution orthophotography of the Spanish National Plan for Aerial Orthophotography [54] is used as backdrop.Coordinate System: ETRS89 UTM Zone 30 N.

Figure 1 .
Figure 1.Study area.Land cover types and location of 83 forest inventory plots.High spatial resolution orthophotography of the Spanish National Plan for Aerial Orthophotography [54] is used as backdrop.Coordinate System: ETRS89 UTM Zone 30 N.

Figure 2 .
Figure 2. Airborne Laser Scanning (ALS) metrics associated with the vertical distribution of point cloud returns in three selected field plots representative of the study area: (A) smaller pines with low presence of shrubs (area B); (B) average height pines with an intermediate presence of understory (area A); (C) taller pines with high presence of shrubs (area A).The red square represents the average of the shrub height in each plot.

Figure 2 .
Figure 2. Airborne Laser Scanning (ALS) metrics associated with the vertical distribution of point cloud returns in three selected field plots representative of the study area: (A) smaller pines with low presence of shrubs (area B); (B) average height pines with an intermediate presence of understory (area A); (C) taller pines with high presence of shrubs (area A).The red square represents the average of the shrub height in each plot.

Figure 3 .
Figure 3. Scatterplot of predicted vs. observed values of total biomass (TB) (tons/ha) using different regression methods.

Figure 4 .
Figure 4. Maps of TB.High spatial resolution orthophotography from Spanish National Plan for Aerial Orthophotography (PNOA) [54] is used as backdrop.Coordinate System: ETRS89 UTM Zone 30 N.

Figure 4 .
Figure 4. Maps of TB.High spatial resolution orthophotography from Spanish National Plan for Aerial Orthophotography (PNOA) [54] is used as backdrop.Coordinate System: ETRS89 UTM Zone 30 N.

Table 1 .
Values of tree diameter at breast height (dbh) and height growth between the second National Forest Inventory (NFI2) and the third one (NFI3) (11 years) and subtractive values of dbh and height when applying linear interpolated degrowth to match Airborne Laser Scanning (ALS) year.

Table 2 .
Summary of the field plot characteristics (n = 83).CC is canopy cover; TB is total biomass.

Table 3 .
ALS selected metrics using different variable selection methods, when applying a threshold value of 0.2 m to the point cloud data.Seqrep is sequential replacement; P refers to the percentile e.g., P05 is the 5th percentile; Elev. is Elevation; ret. is returns; Elev.AAD is average absolute deviation.

Table 4 .
Summary of the models and validation results in terms of RMSE (tons/ha), %RMSE, and bias (tons/ha).SVM r. refers to Support Vector Machine with radial kernel; SVM l. refers to Support Vector Machine with linear kernel; ret. is returns.

Table 4 .
Summary of the models and validation results in terms of RMSE (tons/ha), %RMSE, and bias (tons/ha).SVM r. refers to Support Vector Machine with radial kernel; SVM l. refers to Support Vector Machine with linear kernel; ret. is returns.