Identifying Forest Structural Types along an Aridity Gradient in Peninsular Spain: Integrating Low-Density LiDAR, Forest Inventory, and Aridity Index

: Forest structure is a key driver of forest functional processes. The characterization of forest structure across spatiotemporal scales is essential for forest monitoring and management. LiDAR data have proven particularly useful for cost-effectively estimating forest structural attributes. This paper evaluates the ability of combined forest inventory data and low-density discrete return airborne LiDAR data to discriminate main forest structural types in the Mediterranean-temperate transition ecotone. Firstly, we used six structural variables from the Spanish National Forest Inventory (SNFI) and an aridity index in a k-medoids algorithm to deﬁne the forest structural types. These variables were calculated for 2770 SNFI plots. We identiﬁed the main species for each structural type using the SNFI. Secondly, we developed a Random Forest model to predict the spatial distribution of structural types and create wall-to-wall maps from LiDAR data. The k-medoids clustering algorithm enabled the identiﬁcation of four clusters of forest structures. A total of six out of forty-one potential LiDAR metrics were utilized in our Random Forest, after evaluating their importance in the Random Forest model. Selected metrics were, in decreasing order of importance, the percentage of all returns above 2 m, mean height of the canopy proﬁle, the difference between the 90th and 50th height percentiles, the area under the canopy curve, and the 5th and the 95th percentile of the return heights. The model yielded an overall accuracy of 64.18%. The producer’s accuracy ranged between 36.11% and 88.93%. Our results conﬁrm the potential of this approximation for the continuous monitoring of forest structures, which is key to guiding forest management in this region.


Introduction
Forest structure has a profound effect on ecosystem function, and it has been proposed as an essential supporting variable for monitoring worldwide biodiversity [1,2], habitat suitability [3,4], or forest productivity [5], among other variables. At continental scales, climatic water availability, defined as aridity, as well as its seasonality and inter-annual variability, are major determinants of forest structure and species composition [6]. In particular, forest structure, among other factors, can largely influence forest productivity and carbon sequestration potential [7][8][9][10]. In addition, forest structure influences forest vulnerability to disturbance effects, which are expected to increase due to climate change (e.g., fire [11], drought [12], and storms [13]).
Natural and human-driven perturbations also determine forest structure, being essential drivers of forest demography, development, and, consequently, of the ecosystem services they provide [14,15] (e.g., regulation of the hydrological cycle, wood provision, variability, and biodiversity [72,73], along with nutrient and carbon dynamics [74,75] in forested landscapes. The main goal of this study was to explore the potential of low point density LiDAR data to classify, at a regional scale, forest structural types across an aridity gradient of peninsular Spain, a highly heterogeneous region with forests highly exposed to climate change, chiefly to drought, fire, and windstorms. Specific objectives of the study were: (i) to identify different typologies of forest structures based on stand level NFI measurements and climatic variables via an unsupervised cluster analysis; (ii) to classify forest structure from LiDAR metrics using a Random Forest modeling approach; (iii) to map regional patterns of forest structure across an aridity gradient along peninsular Spain from the low-density PNOA LiDAR data.

Study Areas
As a result of the topography, ocean influence, and geographical position, the Iberian Peninsula holds a wide range of climatic conditions, with the Mediterranean sub-types being the most common climates [76]. This variability of climates results in a highly functional and structural diversity [77]. We selected four provinces of Spain based on aridity and data availability criteria.
There are several approaches to characterize aridity: based on annual degree-days [78], temperature and precipitation [79], or derived indices using previous variables [80]. We used the Martonne aridity index (M) (Formula (1) [81] due to its simplicity and because it has proven to be useful in areas with marked aridity such as that of our study areas [80,82,83]. We computed M across peninsular Spain as: where T is the mean annual temperature ( • C) and P is total annual precipitation (mm) ( Table 1). Large values of M are associated with humid conditions. T and P were obtained for each 4th Spanish National Forest Inventory (SNFI-4; 2008 to date) plot from WorldClim2 raster maps with a 1 km resolution [84]. Furthermore, two different criteria were applied to select our study areas: (i) open data from the SNFI-4 and (ii) the second LiDAR coverage (see details in Table 2). Since the SNFI-4 and the LiDAR data are both open access projects, they are planned at the provincial scale (secondary administrative unit in Spain). That is why the selection of the study areas was also done at this scale. The provinces that met our criteria were Badajoz (Southwest), Murcia (Southeast), Madrid (Center), and La Rioja (North), encompassing 2770 field plots. The study regions covered a wide temperature and precipitation gradient, with La Rioja showing the greatest precipitation and lowest mean annual temperature. In the opposite situation, we found Badajoz and Murcia. These values define a gradient of increasing aridity from north to south and from west to east (see Figure 1, Table 1 for details).  The study regions covered a wide temperature and precipitation gradient, with La Rioja showing the greatest precipitation and lowest mean annual temperature. In the opposite situation, we found Badajoz and Murcia. These values define a gradient of increasing aridity from north to south and from west to east (see Figure 1, Table 1 for details).

Spanish National Forest Inventory (SNFI)
The field data in this study were obtained from the SNFI-4 for the four provinces in surveys carried out between 2010 and 2017. In Spain, NFI data are taken in 10-year cycles, with the provinces being the sample design unit [85]. The SNFI project establishes plots at 1×1 km UTM grid with a concentric system plot model. The concentric field plots of variable radii measure trees depending on the diameter at breast height (DBH): a radius of 25 m for trees with DBH ≥ 42.5 cm, a radius of 15 m for trees with DBH ≥ 22.5 cm, a radius of 10 m for trees with DBH ≥ 12.5 cm, and a radius of 5 m for trees with a DBH ≥ 7.5 cm. Trees with 2.5 ≤ DBH ≤ 7.5 cm are counted but not measured. For more details about the SNFI data, see Alberdi et al. [28]

Spanish National Forest Inventory (SNFI)
The field data in this study were obtained from the SNFI-4 for the four provinces in surveys carried out between 2010 and 2017. In Spain, NFI data are taken in 10-year cycles, with the provinces being the sample design unit [85]. The SNFI project establishes plots at 1×1 km UTM grid with a concentric system plot model. The concentric field plots of variable radii measure trees depending on the diameter at breast height (DBH): a radius of 25 m for trees with DBH ≥ 42.5 cm, a radius of 15 m for trees with DBH ≥ 22.5 cm, a radius of 10 m for trees with DBH ≥ 12.5 cm, and a radius of 5 m for trees with a DBH ≥ 7.5 cm. Trees with 2.5 ≤ DBH ≤ 7.5 cm are counted but not measured. For more details about the SNFI data, see Alberdi et al. [28] We calculated six stand-related variables including basal area (BA; m 2 ha −1 ), density (N; number of stems ha −1 ), mean plot diameter at breast height (DBH; cm), standard deviation of the DBH (DBH SD ; cm), mean height (HM; m), and standard deviation of HM (HM SD ; m). These measurements were calculated from tree measurements at plot level by using expansion factors [26] in order to define the different types of forests along the aridity gradient.

Airborne LiDAR Data
LiDAR data used in our study areas were collected between 2016 and 2019 during different surveys carried out for the PNOA-LiDAR project. The point cloud had a point density between 0.5 and 2 points/m 2 , depending on the province and flight characteristics (see Table 2). LiDAR tiles in LAS format were obtained from the National Geographic Information Centre (http://centrodedescargas.cnig.es/CentroDescargas/buscadorCata logo.do?coT1amilia=02211#, accessed on 1 November 2021). In total, 17733 tiles were required in order to fully encompass the study areas. A 2-m resolution digital elevation model (DEM) was generated from filtered ground returns and used to normalize the height of the returns. Canopy LiDAR metrics for each SNFI-4 plot were extracted using FUSION/LDV software [86], as well as several scripts developed in MATLAB. Furthermore, we used MATLAB to get metrics that could not get extracted from LiDAR. We applied a predefined height threshold of 2 m to separate trees from understory vegetation with FUSION/LDV, as well as with MATLAB. For each of the field plots, a total of 41 metrics were computed from the normalized point cloud (see Table S3 (Supplementary material)).

Determination of Structural Types from SNFI-4 Data
The structural types of the study areas were identified based on six SNFI-4 structural variables (BA, N, HM, HM SD, DBH, and DBH SD ) and the aridity index (M) using a kmedoids algorithm. K-medoids is a clustering technique that breaks the dataset into groups, minimizing the distance between points labeled to be in a cluster and a point designated as the center of that cluster [87]. We decided to use a k-medoids algorithm as this approach is more robust than other clustering methods (e.g., k-means) [87]. K-medoids minimizes the sum of dissimilarities of data objects, reducing noise and offering better results when a significant number of outliers are expected. Furthermore, the k-medoids method improves calculations given complex, overlapping data, and it reduces execution time compared with others clustering algorithms [88].
To choose the optimal number of clusters, a gap statistic clustering method was implemented [89]. The gap statistic approach is a standard method for determining the number of clusters in a dataset. The gap statistic method standardizes the graph of log (Wk), where Wk is the within-cluster dispersion, by comparing it to its expectation under an appropriate null reference distribution of the data [90]. The estimate of the optimal clusters corresponds to the value that maximizes the gap statistic (i.e., that yields the largest gap statistic). This means that the clustering structure is far away from the random uniform distribution of points [91]. Interpretation of cluster variables allowed us to define the main structural types found in our study areas. After clustering the SNFI-4 plots, this information served as a reference to classify plots based on LiDAR data.
Once forest structural types were defined, we extracted the species for each structural type from the SNFI-4 in order to determine the main species that made up each forest structural type. Thus, the main species (i.e., those accounting for more than 50% of the total BA of the plots) were assigned to the structural class.

Discrimination of Structural Types from Airborne LiDAR Data
LiDAR data provide detailed descriptions of forest structure; however, the accuracy of LiDAR-based estimates depend on sensor and surveying characteristics [60]. Among them, point density is a critical parameter in determining the accuracy with which structural variables can be obtained from LiDAR [60,61,92]. Moreover, this effect is more important at landscape than at plot scale [93]. Therefore, the ability of low-density airborne LiDAR data to discriminate forest structural types at the province scale was assessed by means of a supervised classification approach using the cluster derived from the SNFI as reference data. A Random Forest classification (RF) was used to assess the capability of LiDAR metrics to discriminate the forest structures identified from the field data [94]. A summary of the LiDAR metrics is detailed in Table S3 (Supplementary material). The facts that RF (i) allows the detection of nonlinear relations among variables, (ii) considers interactions of different order among predictors, and (iii) can predict categorical variables (as carried out in this case) makes this technique a good option to reach the goal of the study [95]. A first RF model was run to extract the variables with the largest variance importance (calculated through the mean decrease accuracy metric, which evaluated the reduction of model performance when the variable of interest was excluded), obtaining six LiDAR metrics as the most important variables for the model. A second RF model was then run with these variables in order to classify and map forest structural types regionally on LiDAR data. The dataset was randomly split. For both RF, a sample of 70% of the SNFI-4 plots (n = 1938 plots) were used for training the model and the rest 30% of the SNFI-4 plots (n = 832 plots) were using for the validation process. The RF model was set to grow to 500 trees while the parameter of mtry (number of variables randomly sampled as candidates at each split) was one third of the variables. Producer's accuracy (how often are real features on the ground correctly shown on the classified map or the probability that a certain land cover of an area on the ground is classified as such) and user's accuracy (how often the class on the map will actually be present on the ground, referred to as reliability) [96] were calculated using the confusion matrix.
The six variables used in the second RF were CCM2, ((All return above 2 m/Total first returns) *100), representing canopy cover; Mean Height of the Canopy Profile (MCHP), represents the relative vertical distribution of canopy surface area (vertical vegetation profile) and accounts for occlusion of the laser energy by the canopy [97]; Area under the canopy curve (AC), related with the forest stand biomass; Percentile 5 (P05) and Percentile 95 (P95) represents height where the 5% and 95%, respectively, are below it; Difference between Percentile 90 and Percentile 50 (Dif 90_50), shows the difference between P90 and P50 and it is related with the biomass location.
Data exploration revealed that the number of plots differs among structural types, i.e., data were unbalanced. This could lead to under-prediction for the less abundant structural types relative to their true proportions because RF attempts to minimize the overall error rate [98]. We solved this issue using a downsampling approach. The downsampling method is a mechanism that reduces the count of training samples falling under the majority class. It helps to even up the counts of target categories [99].
We estimated the overall accuracy of the classification, the omission, and commission errors. While the omission errors refer to those observations that were classified as reference sites that were left out from the correct class in the classification process, the commission errors refer to those observations that were missed in classification in the specific group [100].
All statistical analyses referring to clustering methods were performed using the Nbclust R package [89]. The RF analyses were performed using the randomForest R package [94] in R.4.0.3, R Core Team (2020).

Regional Forest Structural Types Mapping
Once the RF model for classifying forest structural types was trained and validated, it was applied regionally to identify forest structural type patterns across the four provinces studied and across the aridity gradient. We processed 17,733 tiles and computed the six most important metrics at 50 m resolution, which results in a pixel area close to the area covered by the SNFI plots. We used the Spanish National Forest Map (https://www.miteco.gob.es/ es/biodiversidad/servicios/banco-datos-naturaleza/informacion-disponible/T4e50_des cargas_ccaa.aspx, accessed on 1 November 2021) to mask forest stands, discarding urban, croplands, or any other land use cover units without woody vegetation cover.
Mapping the forest structural types for each province allowed us to identify spatial patterns of forest structure including boundaries of forest structure along the aridity gradient ( Figure 5). See Table S1 for more details (Supplementary material).
To clarify the process of identification, classification, and mapping of forest structural types, Figure 2 shows a flowchart that summarizes the steps followed in our study. area covered by the SNFI plots. We used the Spanish National Forest Map (https://www.miteco.gob.es/es/biodiversidad/servicios/banco-datos-naturaleza/informacion-disponible/T4e50_descargas_ccaa.aspx, accessed on 1 November 2021) to mask forest stands, discarding urban, croplands, or any other land use cover units without woody vegetation cover.
Mapping the forest structural types for each province allowed us to identify spatial patterns of forest structure including boundaries of forest structure along the aridity gradient ( Figure 5). See Table S1 for more details (Supplementary material).
To clarify the process of identification, classification, and mapping of forest structural types, Figure 2 shows a flowchart that summarizes the steps followed in our study.

Downsampling Cloud Point Homogenizing Point Density
In order to assess the point density effect in our model, we downsampled the point cloud density. We took as reference the province with the lowest point density (Murcia, 0.5 p m −2 ) and reduced the point cloud density of the remaining provinces (La Rioja, 2 p

Downsampling Cloud Point Homogenizing Point Density
In order to assess the point density effect in our model, we downsampled the point cloud density. We took as reference the province with the lowest point density (Murcia, 0.5 p m −2 ) and reduced the point cloud density of the remaining provinces (La Rioja, 2 p m −2 ; Madrid, 1 p m −2 ; and Badajoz, 1 p m −2 ) to match the point density of Murcia, thus obtaining a homogeneous point density for the four provinces in our study.
Then, we extracted a new set of LiDAR metrics from the downsampled point density data and followed the same process described above to fit the RF model. Finally, we obtained the confusion matrix and evaluated the impact of the point density reduction on the accuracy of the classification.

Forest Structure Based on National Forest Inventories Data
The gap statistic clustering method identified four clusters as the optimal number of forest structural types for the four provinces studied. We assigned names to each cluster based on species composition and aridity values of the structural type. Thus, we will refer hereafter to each typology as (i) Sclerophyllous stands (SCS), (ii) Agrosilvopastoral and/or Open Woodlands (AOW), (iii) Coppices and oromediterranean pines (COP), and (iv) High mountain species (HMS). The structural characteristics of the identified types were summarized in Table 3. SCS was the most common type throughout the gradient, with 40.79% of the total plots located in this type and a wide representation in different provinces. Plots classified as COP represented the lowest percentage with 16.89% of the plots. HMS included 20.97% of our plots, and AOW had 21.37% of the total plots considered in this study ( Figure 3).  In relation to the forest inventory variables, COP showed large values of BA and displayed the largest N (Table 3). However, we found that the mean DBH of this structural type was the lowest. As regards the aridity, this structural type, with a value of M = 26, tended to occur in moderately humid sites (i.e., in sites with large values of M). Similarly, HMS also had large values of BA and occurred in humid sites (M = 36.6). However, these stands had a smaller tree density and the mean DBH was larger than in COP. HMS showed the highest value of HM and it displayed a large structural heterogeneity. Regarding AOW, it presented the largest DBH among types, yet was the most variable. This structural type held the lowest density and the second lowest BA. With respect to aridity (M = 21.1), AOW appeared in areas with moderate aridity. SCS showed the lowest value of HM as well as the smallest BA among all structural types. Of the studied sites, this structural type occurred in sites with the highest aridity (M = 18). See Figure 3, Table 3, and Figure S2 (Supplementary material) for more details.
When we evaluated the relative distribution of diameter by classes (Figure 4), an inverse J distribution was suggested for SCS, COP, and HMS, being the lowest diameter class (7.5-17.5 cm) with the largest proportion of trees for the four forest types. This is especially evident for SCS and COP. However, the diameter class with the second largest proportion of trees was 37.5-47.5 cm for AOW. Hence, this forest structural type displayed the largest proportion of big trees (dbh ≥ 37.5 cm) when compared to the other three forest structural types. See Figures S3 and S4   In relation to the forest inventory variables, COP showed large values of BA and displayed the largest N (Table 3). However, we found that the mean DBH of this structural type was the lowest. As regards the aridity, this structural type, with a value of M = 26, tended to occur in moderately humid sites (i.e., in sites with large values of M). Similarly, HMS also had large values of BA and occurred in humid sites (M = 36.6). However, these stands had a smaller tree density and the mean DBH was larger than in COP. HMS showed the highest value of HM and it displayed a large structural heterogeneity. Regarding AOW, it presented the largest DBH among types, yet was the most variable. This structural type held the lowest density and the second lowest BA. With respect to aridity (M = 21.1), AOW appeared in areas with moderate aridity. SCS showed the lowest value of HM as well as the smallest BA among all structural types. Of the studied sites, this structural type occurred in sites with the highest aridity (M = 18). See Figure 3, Table 3, and Figure S2 (Supplementary material) for more details.
When we evaluated the relative distribution of diameter by classes (Figure 4), an inverse J distribution was suggested for SCS, COP, and HMS, being the lowest diameter class (7.5-17.5 cm) with the largest proportion of trees for the four forest types. This is especially evident for SCS and COP. However, the diameter class with the second largest proportion of trees was 37.5-47.5 cm for AOW. Hence, this forest structural type displayed the largest proportion of big trees (dbh ≥ 37.5 cm) when compared to the other three forest structural types. See Figures S3 and S4 for more details.  Considering all provinces, SCS was the structural type most widely distributed across our study zones, i.e., the structural group that presented the highest abundance of all those evaluated in this work. However, this structural type was consistently more abundant in the south, especially in the province of Murcia. AOW was the second main structural type in our study, and it was also prevalent in the south. In this case, Badajoz had the largest distribution of this structural type. COP and HMS were mostly represented in the provinces with higher M values, Madrid and La Rioja, predominantly in Considering all provinces, SCS was the structural type most widely distributed across our study zones, i.e., the structural group that presented the highest abundance of all those evaluated in this work. However, this structural type was consistently more abundant in the south, especially in the province of Murcia. AOW was the second main structural type in our study, and it was also prevalent in the south. In this case, Badajoz had the largest distribution of this structural type. COP and HMS were mostly represented in the provinces with higher M values, Madrid and La Rioja, predominantly in mountain locations where the aridity level was low.

Discrimination of Forest Structural Types from Airborne LiDAR Data
An initial run of RF, using a downsampling approach, enabled us to identify the most important variables and calibrate a model with balanced data. The six most important variables were, in decreasing order, CC2M ((All return above 2 m/Total first returns) * 100), representing canopy cover;  Tables 4 and 5). The performance of the RF with or without the downsampling process (balanced and unbalanced data) was similar in both cases, i.e., quite similar results in analysis with balanced and unbalanced data. However, the prediction of classes increased consistently when we operated with balanced data (Tables S1 and S2 (Supplementary material)). Therefore, the implementation of the downsampling method showed ecologically more coherent results when spatialized ( Figure 5). See Figure S1

The Effect of Homogenizing Point Cloud Density on Model Accuracy
Once we carried out the downsampling point density analysis, i.e., homogenizing point densities among provinces, we extracted the contingency tables (Supplementary material, Tables S6 and S7). The overall accuracy was 59.3% for the training dataset and 60.7% for the validation dataset.
By forest structural type, we can observe how SCS, AOW, and COP were not affected by downsampling the point density (SCS increase the accuracy 0.5%, AOW reduced the accuracy 4%, and COP increased accuracy 4% for the validation dataset). However, HMS reduced the accuracy by more than 13% with respect to its accuracy before reducing the

The Effect of Homogenizing Point Cloud Density on Model Accuracy
Once we carried out the downsampling point density analysis, i.e., homogenizing point densities among provinces, we extracted the contingency tables (Supplementary  material, Tables S6 and S7). The overall accuracy was 59.3% for the training dataset and 60.7% for the validation dataset.
By forest structural type, we can observe how SCS, AOW, and COP were not affected by downsampling the point density (SCS increase the accuracy 0.5%, AOW reduced the accuracy 4%, and COP increased accuracy 4% for the validation dataset). However, HMS reduced the accuracy by more than 13% with respect to its accuracy before reducing the point density.

Species Composition and Forest Structural Types
The results obtained from the extraction of the main species for each structural type using the SNFI-4 show the presence of a wide range of species adapted to different aridity conditions ( Figure 6). SCS was assessed by two main species: P. halepensis, which represented more than 80% of the total BA of the plots contemplated, and Quercus ilex. AOW was composed of up to three main species: Q. ilex, Pinus pinea L., and P. halepensis. However, Q. ilex represented more than 60% of the total BA of the plots. COP was composed of several ecologically distinct species. In decreasing basal area abundance: Quercus pyrenaica Willd., Pinus halepensis Mill, Quercus ilex L, Pinus nigra Arn., and Pinus sylvestris L. Three main species comprised HMS: P. sylvestris, Fagus sylvatica L., and Q. pyrenaica. In this structural type, P. sylvestris represented more than 60% of the total BA. F. sylvatica and Q. pyrenaica made up, in decreasing order of abundance, the rest of the BA of the plots. SCS was assessed by two main species: P. halepensis, which represented more than 80% of the total BA of the plots contemplated, and Quercus ilex. AOW was composed of up to three main species: Q. ilex, Pinus pinea L., and P. halepensis. However, Q. ilex represented more than 60% of the total BA of the plots. COP was composed of several ecologically distinct species. In decreasing basal area abundance: Quercus pyrenaica Willd., Pinus halepensis Mill, Quercus ilex L, Pinus nigra Arn., and Pinus sylvestris L. Three main species comprised HMS: P. sylvestris, Fagus sylvatica L., and Q. pyrenaica. In this structural type, P. sylvestris represented more than 60% of the total BA. F. sylvatica and Q. pyrenaica made up, in decreasing order of abundance, the rest of the BA of the plots.

Low-Density LiDAR Data Considerations
In this study, we have identified and described variations in forest structure along the Mediterranean-temperate forest ecotone by combining low point density LiDAR, NFI, and climate data.
Once we extracted the forest stand variables and M was calculated for each plot, a k-medoids algorithm was implemented obtaining four structural types for the aridity gradient contemplated in our study. Many studies have implemented different clustering methods to define and classify forest structure [17,55,56]. For example, Abdullahi et al. [101] applied a k-means approach to classify forest structure based on height information derived from interferometric X-band Synthetic Aperture Radar. Moran et al. [102] utilized a hierarchical clustering algorithm to estimate forest structure classes using high point density airborne LiDAR data in Montana, USA. As seen, there are different ways to group forest structures. However, Wallace et al. [103] advised that hierarchical solutions were not as effective as nonhierarchical approaches, at least in stands where the forest structure is not homogeneous. We advocated non-hierarchical clustering methods for three main reasons: (i) it is difficult to update hierarchies without rebuilding the classification from scratch, (ii) acknowledging reticular relationships between vegetation types is more realistic than imposing a hierarchical structure [103], and (iii) concentrating on the definition of types does not preclude imposing a further structure a posteriori [104]. Following these conclusions, and because of the complexity of the Mediterranean forest structure, we chose a nonhierarchical method to cluster our structural types.
There are multiple examples where forest structure was defined directly using LiDAR data information [57,[105][106][107][108]. However, in those studies, high point density LiDAR data were used, providing high levels of accuracy. Since the PNOA LiDAR project contains low point density, we decided to define forest structural types through SNFI-4 firstly, instead of attempting to discriminate forest structural groups directly from the LiDAR data. In this way, the structural characteristics of each group are defined from field data that are subsequently used to train a supervised algorithm that discriminates such groups from LiDAR variables, describing different aspects of forest structure.
With respect to the LiDAR variables selected, we found CC2M, MCHP, Dif 90_50, AC, P05, and P95 (see Table S1 (Supplementary material)) the most important variables to discriminate forest structural types. Görgens et al. [109] pointed out that metrics selected using automated processes might differ completely between studies and thus require the identification of stable metrics to be used as predictors to facilitate model generalization (see also [110,111]). For instance, Valbuena et al. [55] used a methodology for automated classification of forest areas from LiDAR datasets based on two direct metrics: L-coefficient of variation and L-skewness. Torresan et al. [17] utilized standard deviation and canopy mode heights, percentile 95, percentile 99, and the difference between percentile 90 and percentile 10. Despite each study selecting different metrics, many of the LiDAR-derived metrics are strongly correlated [112]. Moreover, the metrics selected in all studies represent complementary aspects of the 3D structure of forest stands. Huesca et al. [113] defined forest structural types based on metrics describing the horizontal structure and another set of metrics describing the vertical structure of forests. In our study, the selected metrics described both aspects of forest structure: CC2M described the canopy cover; P95 and MCHP were related to the mean and dominant height of the canopy [114]; AC and Dif 90_50 were associated with biomass and its distribution through the canopy [92,115], respectively; and P05 was related to the presence of understory strata. These metrics provide a comprehensive description of the vertical and horizontal distribution of the forest.
The overall accuracy in our study was 60.63% and 64.18% for the training and the validation set, respectively. These results outperformed other studies carried out in Mediterranean environments, despite the low point density of our datasets. For instance, Torresan et al. [17], using an agglomerative hierarchical clustering and point density of 1.28 p m −2 , detected five clusters of forest structure in the province of Trento (Italy) using DBH. They attained an overall accuracy of 54.1%, lower than the overall accuracy obtained in our study. Valbuena et al. [55] obtained an overall accuracy above 80% using L-coefficient of variation and L-skewness ALS metrics to classify boreal forest types with a point density of approx. 1 p m −2 . They conducted their study in Finland, where forest areas are much more structurally homogeneous than those in Mediterranean environments [116]. However, it was also remarkable that Valbuena et al. [55] considered only two LiDAR metrics, defining forest as open and closed canopies and even-size and uneven-sized trees (two by two), depending on the metric value. In our study, we took into account a total of four structural types defined using six LiDAR variables, which could reduce the accuracy of our classification because of the complexity of the structural groups.
It should be noted that it was not only the structural complexity of the studied places, but also the total spatial extent of such. For example, Adnan et al. [56] developed a methodology to classify forest structural types across bioregions applying a hierarchical clustering analysis to determine forest structural types in coniferous and deciduous forests using NFI variables. They achieved an overall accuracy between 72% and 87%, depending on the functional type (deciduous vs. coniferous). It was also noteworthy that Adnan et al. [56] considered a total of 566 plots, while in our study we contemplated 2770 plots. The large spatial extent we considered in our study allowed us to detect variations along the aridity gradient in relation to the forest structure. However, this large spatial extent could be considered a drawback, since, when considering a large area, we obtain high spatial variability which could influence the accuracy of our model to detect the structural types defined in the study [93].
The producer's accuracy achieved in our study ranged between 36.11% and 84.93% for the validation dataset, whereas user's accuracy remained more consistent for all classes. The highest producer's accuracy was reached for the classification of SCS (84.93%). This could be because SCS is the most frequent class in the study regions as well as the most homogeneous group in terms of forest structure. The low accuracy obtained for some of our structural types could be due to their structural complexity. When the structural complexity of forests increases, the accuracy of laser metrics can be affected due to the interference of middle forest strata [117], particularly when using low density data. This fact coincides with other studies like Torresan et al. [17] where the presence of middle strata, or even understory, caused considerable impact to the density of LiDAR points below the canopy, thus affecting the accuracy of the analyses [118].
The literature shows a general tendency to decrease the producer's accuracy with stand density, relative tree height, presence of forest understory, or tree clustering [119,120]. In line with this tendency, our producer's accuracy decreased in structural groups with high density and moderate stand height (COP). The producer's accuracy also tended to decrease when we classified structural groups where trees are usually clustered and there are abundant middle and understory strata (AOW). Moreover, low density data have only a moderate ability to predict stand density [17,120,121]. On the other hand, in groups with moderate tree density, where stands tend to be more homogeneous, our producer's accuracy increased consistently (SCS-HMS).
The user's accuracy achieved moderately satisfactory results in terms of classification performance. Considering that the user's accuracy shows the suitability of the choice and identification of forest groups [41], we can argue that our user's accuracy is highly homogeneous among forest structure types (between 58.59% and 65.55%). Furthermore, given the spatial extension considered in our study, the high number of observations along the aridity gradient, and the heterogeneity of Mediterranean forests, our results can be considered satisfactory.
When we considered the downsampling point cloud density results, we observed that the accuracy of SCS, AOW, and COP remained the same as before the downsampling process (less than 4% difference). These forest structural types were mainly located in provinces where the point density was lower before the downsampling point density analysis (see Figures 3 and 5), i.e., provinces with similar original point cloud density. However, HMS reduced the accuracy by 13% with and without downsampling point density, respectively. HMS is mainly located in La Rioja, the province with the highest point density before the downsampling process. This result is coincident with those results from Disney et al., Leitold et al.,, which show how point density had a significant impact on the accuracy of structural forest variables. It is also remarkable, as mentioned before [17,120,121], that low point density LiDAR presents a moderate capacity to predict forest stand density.
It should be noted that increasing the point cloud density could improve the results obtained in this and similar studies where forest structure is being defined. As it is observed in this study, HMS worsened its accuracy when we downsampled point density; demonstrating that when a denser point cloud is contemplated, results can significantly improve. To that end, and due to the capacity of the new sensors developed during the last few years, it is possible to increment the point cloud density at the national level in the 3rd PNOA-LiDAR coverage until around 5 p m −2 (M. Jurado, personal communication).
Another important factor affecting the achieved accuracy when we evaluated forest stand variables and structure was the time lapse between the NFI data and LiDAR data collection [122]. In our study, the time lapse between SNFI-4 and LiDAR data collection ranged from 1-2 years in Badajoz to 6-7 years in Murcia. The time difference in Madrid was 3-4 years, and 4 years in La Rioja. These differences between data collections likely affected the accuracy of our study, especially when we were evaluating possible transitional forest structural types, such as that considered in COP.

Species Composition and Forest Structural Types
The results obtained from the extraction of the main species for each structural type showed the utility of using NFI to figure out the species conforming to our structural types ( Figure 6). In fact, without NFI, it would not be possible to identify these species only with LiDAR.
COP was composed of stands of oromediterranean species such as P. nigra, P. sylvestris and, mainly, of low forest resprouting species such as Q. pyrenaica and Q. ilex [123,124]. This type is related to highly densified structures, both resulting from older pine afforestation and abandoned coppiced stand, but also post-disturbance (i.e., fire), high density stands. The low diameters characterizing low forests may explain the high proportion of small trees in the histogram of frequencies of trees by diameter class ( Figure S4, supplementary material) as well as the low mean DBH. In addition, this structural type was present in riparian forests, especially in central and northern Iberian Peninsula, where aridity is generally lower. Species like Populus spp., Fraxinus angustifolia Vahl., or other species with a tall stature were part of these forests [125]. HMS shared a similar ecological space with COP, although it occurred in sites that were more humid. These stands presented larger DBH and lower N than COP. This forest type aggregated mature individuals of montane species such as P. sylvestris, F. sylvatica, Q. pyrenaica [126]. The diametric distribution for this type also followed an inverse J but it was more flattened than in the case of SCS and COP. To some extent, these two facts could be due to: (i) these species occupy many areas with low accessibility where the wood extraction is limited, and (ii) forest management prescription usually promotes large diameters to obtain high-value by-products. This forest structural type was also present by riversides, sharing species with COP. AOW corresponded to agrosilvopastoral systems related to more or less open woodlands and savanna-like formations (namely dehesas or montados) and it was composed mainly of Q. ilex [127,128]. In agreement with our results, these stands usually display low tree density and high plot diameter. Low rates of regeneration and small trees are also characteristic of these forests, threatening the persistence of these stands. Finally, SCS forests were made up of drought-adapted species such as Q. ilex and P. halepensis [129][130][131]. The warm and dry climatic conditions and the asexual reproduction of Q. ilex could be behind the low mean diameter found for this forest type. These results overall provided a compositional description of our forest structural types.
The distribution of the structural types was partially segregated along climatic gradients. SCS was more abundant towards south and east, Murcia being the most arid province under study and presenting the highest relative extent of this structural type. This structure was coincident in space and plot mensuration data for forest structures described in previous studies for the southeast of Spain [132,133]. In our study zones, AOW had its largest and most continuous occurrence in the province of Badajoz. This result could be due to the presence of open woodlands, which are characterized by low densities, and in our study areas, they were mainly found in Badajoz [134]. HMS was essentially distributed through the high part of the mountains, mainly located in the north of Madrid, and the south and central areas of La Rioja. HMS was also observed along rivers. The species composition and distribution were similar to that described by López Estéban [135] for the Spanish central mountains and by Arnáez et al. [136] for the region of La Rioja. Finally, COP occupied similar ecological locations to those of HMS, but in places with higher aridity. This structural type showed the smallest occurrence among the structural types considered for this study. This condition could be due to its particular position in the aridity gradient [137].
These four structural typologies suggest divergent functional performance in terms of exposure and vulnerability (sensitivity and adaptive capacity) in response to climate change hazards (chiefly hot spells and intense droughts, windstorms, and wildfires). Therefore, their description can be assisted to the definition of adaptation priority regions (i.e., targeting thinning in order to decrease drought vulnerability) (COP) or alternative plantations to foster recruitment (AOW).

Other Methodological Remarks
Forest structure refers to the three-dimensional arrangement of vegetation (especially focused on trees) considered at a range of spatial scales, in combination with nonliving spatial elements such as soils, slopes, and hydrology [138]. Due to the interpretability of the term "forest structure" and its definition, there is a wide range of studies where different variables have been used to discriminate forest structural types. Torresan et al. [17] utilized DBH to classify forest structure stands into understory, midstory, and overstory. However, we considered that for regional studies like ours, taking only one structural variable would make us spatially underestimate the real structure of the forest stands present in peninsular Spain. Adnan et al. [56] used four forest structural variables to classify forest structure: quadratic mean diameter (QMD), Gini coefficient (GC), basal area larger than mean (BALM), and density of stems (N). In our study, we included BA, DBH, and DBH SD, which integrate QMD and BALM. GC is an index of inequality utilized in ecology to determine tree size variability [55]. Instead, we utilized HM and HM SD as an indicator of stand height variability. These two variables, plus DBH and DBH SD , accounted for the variability attributable to the different allometries (i.e., DBH-HM relationship), along with growth among species, site conditions, and stand development stage. The chosen variables N and BA allowed us to differentiate fully stocked forest types from medium or open stands [129,131,139,140]. Similar to our approach, De Cáceres et al. [141] developed a general method to classify forest stands using species composition and the forest structure variables BA, DBH, and HM.
Regarding data accuracy, airborne LiDAR scanners are highly accurate nowadays [38], so the LiDAR-based estimates are related to factors like point density, modeling process (ALS metrics, modeling approach, training data size), plot co-registration, forest composition, etc. [142][143][144]. In this direction, we had into account for those requirements in order to ensure a correct design and treatment of LiDAR data.
Improving the knowledge about forest structure in large areas remains a priority in forest ecology [18,19], especially in ecosystems highly exposed to climate change [145,146].
However, many studies carried out using LiDAR data to predict forest structure were completed using a relatively small spatial extent [18,147,148]. Conversely, our study contemplated four ecologically distinct Spanish provinces that comprise 10.6% of the total forest area of Spain [149]. The geographical location of the provinces and the large number of plots considered therein provided a well-defined aridity gradient so that our sample is representative of the main structural types present in peninsular Spain.
The analysis and methodology developed in this study can guide decision-making regarding Mediterranean forest management, resource use, and ecosystem services. The maps developed in this work can also be used to communicate different forest biodiversity conservation issues, e.g., landscape fragmentation of different forest types [150]. The identification and distribution of these forest structures on a regional, or even national scale, allow us to prioritize specific actions for forest ecosystem management and conservation.

Conclusions
We evaluated the usefulness of low-density discrete return airborne LiDAR data to predict forest structural types identified with National Forest Inventories along a wide aridity gradient covering peninsular Spain. The identification of Spanish forest structural types considering climatological variables opens an interesting window to support and strengthen current forest management, conservation, and monitoring programs. Despite our model having achieved a satisfactory overall accuracy, there is a considerable margin for improvement through the densification of the LiDAR data point cloud, as well as concurrent data field collection. The classification of forest structural types proposed is interesting for the management and protection of Mediterranean forests, prevention of natural disasters, and conservation of ecosystem services. Furthermore, our methodology can be adapted to detect and analyze changes over space and time, especially considering that the PNOA includes multitemporal LiDAR coverage for the whole of the Spanish territory. These previous considerations will allow us to develop better monitoring of forest stands and their progression, giving us a vision on a national scale of the different Spanish forests and their evolution.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/1 0.3390/rs14010235/s1, Table S1: Number of pixels and percentages (in brackets) of each structural type by province with balanced classes. SCS: Sclerophyllous stands; AOW: Agrosilvopastoral and/or Open Woodlands; COP: Coppices and oromediterranean pines; HMS: High mountain species. Table  S2: Number of pixels and percentages (in brackets) of each structural type by province with no balanced classes. SCS: Sclerophyllous stands; AOW: Agrosilvopastoral and/or Open Woodlands; COP: Coppices and oromediterranean pines; HMS: High mountain species.  Table S8: LiDAR flight and sensor characteristics. Figure S1: Distribution of forest structural type per province with no balanced classes. SCS: Sclerophyllous stands; AOW: Agrosilvopastoral and/or Open Woodlands; COP: Coppices and oromediterranean pines; HMS: High mountain species. Data Availability Statement: The datasets and R code used for the development of this work are available upon request to the author Data analyzed in this study were extracted from the following web addresses: LiDAR data: http://centrodedescargas.cnig.es/CentroDescargas/catalogo.do?Serie= LIDAR (accessed on 1 November 2021). National Forest Inventory data*: https://www.miteco.gob.e s/es/biodiversidad/servicios/banco-datos-naturaleza/informacion-disponible/ifn3.aspx (accessed on 1 November 2021). *data from the fourth National Forest Inventory are not currently available to the public as of publication of this paper.