Extracting Canopy Surface Texture from Airborne Laser Scanning Data for the Supervised and Unsupervised Prediction of Area-Based Forest Characteristics

Area-based analyses of airborne laser scanning (ALS) data are an established approach to obtain wall-to-wall predictions of forest characteristics for vast areas. The analyses of sparse data in particular are based on the height value distributions, which do not produce optimal information on the horizontal forest structure. We evaluated the complementary potential of features quantifying the textural variation of ALS-based canopy height models (CHMs) for both supervised (linear regression) and unsupervised (k-Means clustering) analyses. Based on a comprehensive literature review, we identified a total of four texture analysis methods that produced rotation-invariant features of different order and scale. The CHMs and the textural features were derived from practical sparse-density, leaf-off ALS data originally acquired for ground elevation modeling. The features were extracted from a circular window of 254 m2 and related with boreal forest characteristics observed from altogether 155 field sample plots. Features based on gray-level histograms, distribution of forest patches, and gray-level co-occurrence matrices were related with plot volume, basal area, and mean diameter with coefficients of determination (R2) of up to 0.63–0.70, whereas features that measured the uniformity of local binary patterns of the CHMs performed poorer. Overall, the textural features compared favorably with benchmark features based on the point data, indicating that the textural features contain additional information useful for the prediction of forest characteristics. Due to the developed processing routines for raster data, the CHM features may potentially be extracted with a lower computational burden, which promotes their use for applications such as pre-stratification or guiding the field plot sampling based solely on ALS data.


Introduction
Airborne laser scanning (ALS; also referred to in some instances as "airborne scanning LiDAR") has become a routinely operated technique for wall-to-wall prediction and mapping of various forest characteristics [1].Due to straightforward implementation with ALS data acquired broadly for ground elevation modeling [2][3][4], so-called area-based approaches are the most established prediction techniques.By area-based approaches, we fundamentally refer to the two-stage procedure [5] in which: (i) models to predict the forest attributes of interest for the individual areas of interest (AOIs) are fit based on a set of training field plots; and (ii) the resulting models are applied to all the AOIs of the entire inventory area to produce wall-to-wall predictions.Operational implementations are elaborated upon in [6][7][8].
To develop predictive relationships between ALS and field training data, relevant features need to be extracted.The sparse pulse densities (typically < 1 m ´2 according to [6,7]) allow extracting features related to the distributions of height values or proportions of certain types of echoes [5,9], which do not account for the full horizontal information available in the data.It is possible to extract structural and volumetric features from sparse, area-based data [10][11][12], whereas applying notably higher pulse densities has allowed understory- [13][14][15] or tree-level descriptions based on canopy density and height models [16][17][18].Canopy height models (CHMs) are rasterized images representing interpolated difference in elevations between the top of the vegetation and ground level.Importantly, the sparse pulse density of ALS data constrains the CHM pixel size and therefore affects the accuracies of the subsequent forest attribute predictions.However, some researchers have related the CHM texture derived even from sparse ALS data to the properties of the growing stock such as the spatial pattern of the trees [19,20].
Texture analysis is a well-established field of image analysis with early forest applications for optical data acquired by varying sensors [21][22][23][24][25][26][27].Additionally, Tuominen and Pekkarinen [28] provide a comprehensive overview of parameters affecting the prediction accuracies of most essential forest attributes based on very high-resolution image data in forest conditions closely corresponding to those of the present study.Textural features derived from the aerial images were used as predictor variables with ALS point-based features in [29,30].Even though ALS-based CHMs were readily suitable for texture analyses, those did not appear until about a decade after the introduction of the ALS-based CHM products [31].CHMs derived from high-density ALS data were analyzed for their texture in [32][33][34], and those were derived using practical sparse pulse densities in [19,20,35].The applications are, to date, related to predicting species, diversity, and the spatial arrangement of the trees, whereas the relationships between the textural features derived from the CHMs and forest attributes are not comprehensively known like in the case of aerial images [28].
In all ALS studies cited in the previous paragraph, the characterization of the image texture was based on computing a gray-level co-occurrence matrix (GLCM) and a set of descriptive features following the principles presented by Haralick et al., in 1973 [36].However, there are several alternative approaches to quantify image texture.The use of Gabor or wavelet filtering produced good results compared to using the GLCM features with other remotely sensed materials besides ALS data [37].Autocorrelation functions or (semi-)variograms [38] were tested with optical images [39][40][41] and with ALS data [42].Patch or landscape features [43] were proposed as coarse measures of CHM texture [35].More detailed approaches, such as analyzing local microstructures [44] or geometrically [45][46][47] or topologically [48] driven partitioning of image patches, have been proposed as powerful texture discriminators.To the best of our knowledge, however, the latter approaches are tested for built material classification only.Thus, the suitability of these methods and sensitivity of their parameters for remotely sensed vegetation analyses cannot be deduced based on earlier studies.
In summary, various texture analyses can be identified in the literature to improve the assessments of forest growing stock attributes, but analyses considering aspects other than GLCM-based textural features, especially in the case of ALS-based CHMs, are lacking.Even though conventional point-based ALS features most likely produce the best prediction outcome due to their potential to depict multilayered vertical strata [49,50], the textural features may be expected to improve the predictions by allowing a more detailed description of the horizontal forest structure [19,20].Beside the use of textural features in supervised learning approaches, the favorable computational properties of the CHMs [51] may open up possibilities for inventories, in which ALS data have been acquired but no field reference data yet exist [52].In those cases, the texture features may be used for pre-stratification or guiding a later field inventory to cover the variation of forest attributes within the area, which is challenging but important [53][54][55].Thus, testing textural features for both the supervised and unsupervised learning approaches is well reasoned.
The purpose of this study is thus to compare texture analysis methods on CHMs derived from practically available sparse, leaf-off ALS data.Appropriate methods for extracting the textural features were identified based on a comprehensive literature review.The features were related to selected forest biophysical properties measured from field reference plots with an area of 254 m 2 .The performance of these features was tested in: (i) a supervised prediction of the total stem volume, basal area, and mean diameter; and (ii) an unsupervised classification of the forest area for a simulated sampling of the field plots.For benchmarking purposes, the extracted features were compared to the most common point-based ALS predictor variables in the supervised predictions.

Study Area
The study area is located in the southern boreal forest zone in Evo, Finland (61.19The area of altogether approximately 2000 ha is a part of a state-owned forest.The elevation is typically 125-145 m above sea level and mineral soils with gentle slopes prevail.The forest stands vary from natural to intensively managed in terms of their silvicultural status.The area is dominated by Scots pine (Pinus sylvestris L.) and Norway spruce (Picea abies [L.] H. Karst.), which altogether constitute approximately 84% of the growing tree stock, with minor proportions of deciduous trees mainly occurring below the dominant canopy.

ALS Data
The ALS data used in the study were acquired by the National Land Survey of Finland as a part of their data-acquisition campaign for creating a nationwide ground elevation model for Finland.The data were downloaded from a file service [56], from which they are available for free and with extensive permissions of use.The data were acquired in two separate campaigns: on 7 May 2012 with Optech ALTM Gemini scanner operated from a flying altitude of 1830 m, and on 13 May 2012 with Leica ALS50 scanner operated from 2200 m.In both the acquisitions, a scanning angle of ˘20 ˝, a ground footprint of approximately 50 cm, and otherwise similar scanning parameters were applied to yield a nominal pulse density of 0.7-0.8m ´2.Both the scanner systems recorded up to four echoes per each emitted pulse.The data provider had detected and classified the ground level, on which the normalization of the vegetation height values was based, using an adaptive filtering algorithm [57] implemented in TerraScan software (Terrasolid Ltd., Helsinki, Finland).As the data are meant specifically for ground elevation modeling, we assumed the accuracy of this classification to be appropriate for our purposes.Following similar data acquisition and processing principles, the standard errors of terrain elevation values were found to be in order of 15 cm [58].

Field Measurements
The field sample was selected based on auxiliary information from the ALS data [59].Altogether, five ALS features were extracted to quantify height and canopy cover of the dominant vegetation, and density, diversity, and proportion of understory trees and shrub layer, determined as below 60% of the maximum height and below 5 m [59], respectively.These data were clustered, applying an unsupervised k-Means algorithm with equal weights for each feature to obtain an ALS-based stratification of forest structure.The strategy was found efficient to distribute the sample across the spatial, size, and age distributions of the tree stock.
A field measurement protocol based on circular plots with the radius of 9 m was applied on a total of 155 plots.The species and diameter-at-breast height (DBH) were measured for each tree with a DBH ě 5 cm.For each tree species of the plot, a tree with a DBH corresponding to the median tree was determined in the field and measured for height.These attributes were used as the median tree attributes per plot and species.The similarity of close surroundings of each circular reference plot was quantified by establishing an additional field plot 16 m from the original plot center in each cardinal direction (0 ˝, 90 ˝, 180 ˝, and 270 ˝).In these satellite plots, only DBHs were measured from tally trees selected employing a basal area factor of 2 m 2 a ´1.
The positions of the field plot centers were measured with a Trimble ® Nomad ® 900G Global Positioning System and Global Navigation Satellite System receiver with an external antenna and battery.At least 100 signals were collected from the center of each plot, and its position was computed from these observations.The receiver was connected to a GSM phone, which provided a connection to a reference station for real-time differential correction to help the positioning in the field.A differential post-processing was finally applied using Trimble®Pathfinder®Office and a local, permanent base station.

Field Plot Attributes Derived for the Evaluation
Plot-level forest attributes were compiled and aggregated from the tree-level measurements.Plot basal area (G) was computed by summing the DBH measurements.The variations in the within-plot DBH distribution and G among the main plot and the satellite plots were quantified using the coefficient of variation (CV), i.e., the ratio of the standard deviation to the mean of the specific values.The missing tree heights were predicted by calibrating the parameters of Näslund's height curve [60] using the species-specific median tree diameter and height estimates [61].The basal-area-weighted DBH (D gM ) and height (H gM ) were computed based on all the trees of a plot.The volumes of the individual trees were predicted, employing the DBH and height as predictors of species-specific equations [62].The volume models for birch were used for all deciduous trees.The volumes (V) were summed to the plot level and scaled per ha.Descriptive statistics of the 155 reference plots measured are presented in Table 1.

Canopy Height Model (CHM) Generation
CHM rasters were generated based on the first-of-many and only echoes of the normalized ALS data, aiming to obtain the main information from the data while retaining most generalization abilities over sensors that record a different number of echo categories [6].The low pulse density of ALS data was observed to restrict the realism of the CHM, as a requirement for a smaller pixel size would have meant that fewer CHM values were observed and more were interpolated.Consequently, different pixel sizes were tested and an inverse-distance weighting (IDW) technique [63] was applied to fill the pixels without observations.The IDW algorithm was selected due to its simplicity and controllability based on only few parameters (see further notes in Section 4).The proportions of the observed CHM values with the considered pixel sizes are shown on Table 2. Using the six different pixel sizes (Table 2), the CHM values for all the pixels containing at least one ALS echo were initially filled by selecting the maximum height value of observations inside the pixel.A non-filled value ẑ at pixel S 0 was interpolated using the observed CHM values z at pixels S i (Equation (1)), where λ i is the weight calculated according to Equation (2) based on an interpolation parameter α and d 0i as the distance between S 0 and S i : ) The maximum distance between S 0 and S i was set to 3 m, i.e., observations located further than 3 m from the center of an empty pixel had no effect to the interpolation of the specific CHM value.Values 0.1, 0.5, 1, 2, 3, . . ., 9, and 10 were tested for α. Figure 1 shows examples of CHMs derived using different pixel sizes and α = 5, which was used in most of the analyses.

Textural Feature Extraction from the CHM
Image texture analysis has been a subject of very extensive research since the 1970s-for a review, see, e.g., Chapter 2 in [64].This makes an exhaustive comparison of all the published methods impossible.The properties of the algorithms therefore were analyzed by relating earlier literature to our research objectives, i.e., to model forest attributes aggregated from field plots with a size of 254 m 2 based on the extracted textural features.Several aspects such as invariance and robustness of the extraction method relative to the properties of the analyzed images were considered when choosing the algorithms [65].
Texture analyses are typically labeled as: (i) statistical; (ii) geometrical; (iii) model-based; or (iv) signal processing [65,66], in which the discrimination between different textures is based on: (i) comparing the statistics computed over different (sub-)regions; (ii-iii) determining geometric or image-model-based primitives that compose the texture; or (iv) filtering the image in the spatial or frequency domain to be able to compute the frequency components of the image signal.Approaches (ii) and (iii) were deemed inappropriate for our purposes due to the requirement to assume a geometrical or image model behind the process that generated the texture, i.e., to assume the spatial and size variation in the trees of the plots.Approach (iv) was found to be sensitive to a particular frequency and orientation, and even if the invariance to these factors was achieved by a multichannel filtering approach [64], the method appeared as overly complicated for our purposes.Hence, all the methods included in the present comparison could be labeled as statistical texture analyses.The number of these methods was further reduced by a requirement to obtain rotation-invariant features, i.e., not having to assume that the surfaces analyzed are captured from the same viewpoint.
parameter α and d0i as the distance between S0 and Si: The maximum distance between S0 and Si was set to 3 m, i.e., observations located further than 3 m from the center of an empty pixel had no effect to the interpolation of the specific CHM value.Values 0.1, 0.5, 1, 2, 3, …, 9, and 10 were tested for α. Figure 1 shows examples of CHMs derived using different pixel sizes and α = 5, which was used in most of the analyses.The methods selected for the analyses are listed and characterized according to the order and scale considered (Table 3)."Order" determines the immediate neighborhood of pixels considered as an individual observation, whereas "scale" indicates the spatial domain in which the features are computed from the individual observations.First-order statistics are based on individual pixels, whereas second-order statistics also account for the spatial co-occurrence of the pixels [66].The features considered here are extracted from the area of full plots ("global" scale) or the plots re-scaled using the GLCMs ("macro" scale) or detailed local neighborhoods ("micro" scale).The features were extracted from a circular window with a size of 254 m 2 corresponding to the reference plots with 9 m radii.In addition, a 30 m ˆ30 m square window centered to the location of the plots were tested as an alternative computation unit in particular to evaluate the prediction of the CV of G among the satellite plots.
First-order statistics computed in the global scale are assumed to be neutral to the rotations of the image, whereas the use of second-order statistics (based on co-occurrence of two pixels) requires considering the rotation.In the GLCM approach [36], the rotation invariance is obtained by averaging the features extracted from the GLCMs over all directions.In addition to these second-order, macro-scale statistics, detailed micro-scale statistics were derived by computing gray-scale changes in local neighborhoods using the local binary pattern (LBP) approach [44].In LBP, the rotation invariance is obtained by computing occurrence statistics for each individual rotation pattern of the neighborhood considered [44].
Since the use of higher-than-second-order statistics or other scales is rare in the literature, we believe that the results obtained by the selected methods (Table 3) can be extended to other texture quantification methods characterized by the same order and scale.The feature groups considered in the study are listed as follows.Uniformity of a LBP computed as percentages of at most 0 (U 0 ), 2 (U 2 ) or 4 (U 4 ) bitwise transitions in the binary code of a circular neighborhood a In the abbreviations of the features, subscript t is either 5 m or ad, indicating the use of a constant (5 m) or adaptive (ad = h max -h std ) height threshold, respectively; b All features were computed using a lag of 1, 2, 3, 4, or 5 pixels; c All features were computed using a lag of 1, 2, or 3 pixels.

HIST Features
The following first-order, general statistics were derived from the histograms of CHM pixel values: mean, standard deviation, and maximum (Table 3).These features explain vegetation height and its variation directly.

PATCH Features
The patch metrics are considered as coarse measures of texture that eventually produce information on the co-occurrence of groups of pixels obtained by thresholding an image.The spatial pattern of trees was modeled by thresholding CHMs for representing ground and tree patches [19,20].In [19], a threshold of 5 m was used to assign all CHM values above that height as tree patches.In [20], an adaptive difference between the maximum height (h max ) and the standard deviation of the height values (h std ) was reported as the best canopy threshold for determining the spatial pattern of trees.Both the aforementioned constant and adaptive thresholds were tested in this study, and the patch features described in Table 3 were extracted from the thresholded CHMs.

GLCM Features
Textural features were computed from the GLCMs according to [36] by varying the lag parameter (i.e., the spatial interval between co-occurring pixel values) from 1 to 5. To get rotation invariant features, angular GLCMs were computed for four different offsets (0 ˝, 45 ˝, 90 ˝, and 135 ˝), and the mean values of the four angular textural features were analyzed.One out of the original 14 features [36], namely "Maximal Correlation Coefficient", was not included in our analyses due to known computational instability problems associated with this feature.The derived features and their abbreviations are listed in Table 3.For more detailed descriptions, see [36].

LBP Features
The rotation invariant textural features based on the LBPs [44, 67,68] are computed for every pixel by comparing the CHM value with its local circular neighborhood.The values are re-coded depending on the neighborhood: if the value of the pixel is equal or smaller than the value of its neighboring pixel, the local binary is coded as 1 and otherwise as 0. A single LBP of a pixel thus consists of the eight binary values derived from its circular neighborhood.Following [44], in which the uniformity of the image was measured by counting the number of bitwise 0/1 changes in the binary codes, we used the percentages of LBP patterns having uniformity of at most 0, 2, and 4 (abbreviated as U 0 , U 2 and U 4 , respectively) as the extracted LBP features for the reference plots.All the LBP features were computed using the lags of 1, 2, and 3 CHM pixels.

Conventional Point-Based ALS Features
For benchmarking purposes, the most common point-based ALS predictor variables [5,9], i.e., the maximum, the mean, and the standard deviation of the height values; proportion of echoes above 2 m vegetation threshold; the 5th, 10th, 20th, . . ., 90th, and 95th percentiles; and the corresponding proportional densities of the ALS-based canopy height distribution were calculated according to [69] (pp.502-503).The features were computed separately based on the first and last pulse data, i.e., "only" or "first of many" and "only" or "last of many" echoes, respectively, of up to 4 laser echoes recorded per pulse.

Supervised Prediction of Forest Attributes
The performance of the extracted features was first assessed in supervised prediction of forest attributes based on the observations made from the field-measured reference plots.The strength of the relationship between the CHM features and various forest inventory attributes was quantified using the coefficient of determination (R 2 , [70]).The main attention was focused on central attributes related to the properties of the forest growing stock, i.e., total stem volume (V), basal area (G), and basal-area weighted mean diameter (D gM ).Due to the earlier reported potential to link the textural features with the diversity of the tree size and vegetation structure [34,71], we additionally analyzed the relationships between the textural features and the number of understory trees and variation of the DBH and G observed from within the 9 m plots and among the satellite plots, respectively.
The applicability of the extracted textural features and the benchmark variables was tested for predicting V, G, and D gM attributes of the sample plots, as these were found to be most related to the textural features in the initial tests.To normalize the potentially non-linear relationships between the forest attributes and the ALS features, three most common transformations (square, square root, and natural logarithm) of all the textural and point-based features were included as candidate predictors in addition to the absolute values in a simple linear regression (LR) analysis.The final predictors of the LR models were selected by inserting one of the predictor candidates at a time into each model and iteratively appending the feature that produced the best prediction accuracy.The LR models were formed applying a leave-out-one-plot cross-validation, i.e., excluding each plot at the time from the model fitting data and predicting for the excluded plot.The prediction accuracy was measured by root mean square error (RMSE) and relative root mean square error (RMSE%): where y i is the observed value of variable y on plot i, ŷi is the predicted value of variable y on plot i, ȳ is the mean of the observed values, and n is the number of reference plots.

Unsupervised Classification of the Forested Area
Second, the textural features were evaluated for an unsupervised classification of the forested area, subsequently prioritizing the plots to be measured for field reference data.Earlier studies have suggested that the information in the ALS data may be condensed to a few metrics [72,73], the partitioning of which will provide a stratification corresponding closely to the structural complexity observed in the field [59,74,75].In this study, a similar partitioning was carried out using the textural features, and the applicability of the obtained information was demonstrated by prioritizing the field plots to be measured for predicting plot V using other features.
The data were stratified using an unsupervised k-Means algorithm [76] implemented in R statistical computing environment [77].The algorithm partitions the observations into k clusters such that the sum of squares from the observations to the assigned cluster centers is minimized.The clustering was based on the standard Euclidean distance and initialized by altogether 10,000 random solutions for the initial cluster centers to minimize the randomness in the final result.To use the standard Euclidean distance, the selected variables were required to have low inter-correlation to minimize redundancy of the information and thereby ensure an equal weight of each variable in the unsupervised classification.For this reason, the unsupervised classification was based on altogether six textural features hand-picked to capture the main variation in the forest structural characteristics but to have as low inter-correlation as possible.To include all the ALS features with the same importance in the clustering, the original feature values were normalized between 0 and 1.The normalized feature values were obtained according to Equation (5): where r ij and q ij are the normalized and original values of the jth observation of feature i and q min i and q max i are the smallest and largest values of i among all plots.Since it was found problematic to determine a fixed k, i.e., to fix the expected value of the structural classes, we determined the final partitioning according to the persistence of the clusters on an interval of different k values.Based on initial tests in the data studied, the clustering was repeated by gradually increasing the value of k from 2 to 7, using an R package clue [78] to manage the resulting ensemble of the k-Means partitions.During the iterations, it was recorded whether the individual cells persisted in a cluster or shifted from one cluster to another one.As a result, each grid cell was labeled with an identifier describing its path along the clustering hierarchy.Therefore, as opposed to results with a fixed k, the final partitioning was based on the composition of the sub-clusters formed during the clustering.More details on the applied methodology are provided by [59].
The applicability of the information obtained by the stratification was demonstrated by prioritizing the sample plots to be measured for field reference data.The accuracy of the predictions depends on how extensively the reference data represent the full range of the phenomenon to be modeled [52].We assume that the textural variation extracted from the CHMs is related to the real-world variation of the forest attributes and therefore indicative of which plots include the extreme variation that must be captured in the modeling data to produce realistic predictions.This hypothesis was tested by simulating the sampling of a reduced number of field reference plots for predicting V (cf.[52][53][54]), i.e., applying Probability Proportional to Size (PPS) sampling for the selection of the first-stage sample units [79].
The non-linear relationship between aggregated stem volume (V) and a product of ALS-pointbased mean height (H) and canopy cover (CC) was modeled as: where a, b, and c were model parameters solved using the nls function of R [77].The plots to be included in the modeling data were prioritized according to the distance to their respective cluster center.The distribution of the data to the different clusters was assumed to indicate different forest structure types and the distance to the cluster center to indicate whether a plot represented an extreme (high distance) or a typical observation of the cluster.The plots were ordered according to the decreasing distance and inserted one by one to the data used for fitting Equation (6).The RMSE of Equation ( 6) was recorded after including each plot.

Effects of the CHM Parameters to the Textural Features
The spatial interpolation of unknown CHM values was based on the premise that CHM values of two pixels are related to each other, and the similarity is inversely related to the distance between their locations.In the IDW method, this was affected by the parameter α: the larger the value, the less weight was given for observations located far from the unknown CHM pixel.Figure 2 shows how the R 2 of various textural features and total stem volume were affected, when the values of α were 0.1, 0.5, 1, 2 . . .9, and 10 with the pixel size of either 0.5 m or 1.0 m.The rest of the analyses were performed using α = 5, as with the pixel size of 0.5 m the highest correlations were obtained with that particular parameter α value, and no obvious improvements by other parameter values were discovered with the pixel size of 1.0 m.Remote Sens. 2016, 8, 582 10 of 21 extreme (high distance) or a typical observation of the cluster.The plots were ordered according to the decreasing distance and inserted one by one to the data used for fitting Equation ( 6).The RMSE of Equation ( 6) was recorded after including each plot.

Effects of the CHM Parameters to the Textural Features
The spatial interpolation of unknown CHM values was based on the premise that CHM values of two pixels are related to each other, and the similarity is inversely related to the distance between their locations.In the IDW method, this was affected by the parameter α: the larger the value, the less weight was given for observations located far from the unknown CHM pixel.Figure 2 shows how the R 2 of various textural features and total stem volume were affected, when the values of α were 0.1, 0.5, 1, 2… 9, and 10 with the pixel size of either 0.5 m or 1.0 m.The rest of the analyses were performed using α = 5, as with the pixel size of 0.5 m the highest correlations were obtained with that particular parameter α value, and no obvious improvements by other parameter values were discovered with the pixel size of 1.0 m.The pixel size affected the textural features and subsequently the prediction of forest attributes.To find the optimal pixel size, we evaluated R 2 of the selected textural features and the forest attributes with varying pixel sizes (Figure 3).Several important textural features had the strongest correlations with stem volume and basal area when the pixel size was 0.5-1.0 m.No significant differences between these levels could be observed based on a visual analysis of Figure 3, but since a higher number of pixel values were observed on the latter (Table 2), the pixel size was set to 1.0 m in the following analysis.The pixel size affected the textural features and subsequently the prediction of forest attributes.To find the optimal pixel size, we evaluated R 2 of the selected textural features and the forest attributes with varying pixel sizes (Figure 3).Several important textural features had the strongest correlations with stem volume and basal area when the pixel size was 0.5-1.0 m.No significant differences between these levels could be observed based on a visual analysis of Figure 3, but since a higher number of pixel values were observed on the latter (Table 2), the pixel size was set to 1.0 m in the following analysis.

The Degree of Determination between the Textural Features and Forest Attributes
The R 2 of selected textural features and the most common structural forest attributes are presented in Table 4, while the R 2 of all features are presented as Supplementary Data.For benchmarking purposes, selected point-based features are also shown.None of the ALS-based features correlated well with CV(DBH) or the number of understory trees on the field plots.The observed R 2 of the textural features and CV(G) on the large plots were more promising, as some of the GLCM features explained approximately one third of the variation in CV(G).Nevertheless, obtaining approximately the same degree of determination between the textural features extracted from the 9 m plots and CV(G) of the satellite plots does not support the causality of this observation.Thus, the rest of the analyses are focused on the accuracies of the tree-based forest attributes (i.e., V, G, and DgM).

The Degree of Determination between the Textural Features and Forest Attributes
The R 2 of selected textural features and the most common structural forest attributes are presented in Table 4, while the R 2 of all features are presented as Supplementary Data.For benchmarking purposes, selected point-based features are also shown.None of the ALS-based features correlated well with CV(DBH) or the number of understory trees on the field plots.The observed R 2 of the textural features and CV(G) on the large plots were more promising, as some of the GLCM features explained approximately one third of the variation in CV(G).Nevertheless, obtaining approximately the same degree of determination between the textural features extracted from the 9 m plots and CV(G) of the satellite plots does not support the causality of this observation.Thus, the rest of the analyses are focused on the accuracies of the tree-based forest attributes (i.e., V, G, and D gM ).
The extracted textural features were fairly insensitive to the computation parameters used.The patch features explained the variation of V and G more clearly when the constant threshold (5 m) was used instead of the adaptive threshold.Some GLCM-derived features such as SAVG correlated well with V and G (respective R 2 of 0.46 and 0.55 regardless of the lag) but always poorly with the D gM .The lag parameter of the GLCM features affected only a few of the relationships between the GLCM features and forest attributes.In particular, when a larger GLCM lag value (from 1 to 5) was used for calculating the CON feature, the stronger R 2 was observed with the D gM (from 0.01 to 0.22).LBP-based CHM uniformity features U 0 and U 2 were also dependent on G having the dyadic R 2 up to 0.20.Among all ALS-based variables, CHM mean was the best feature for predicting V and G with respective R 2 of 0.70 and 0.63 (Table 4).In addition, TP 5m _% predicted the respective forest attributes better than any single point-based feature (R 2 of 0.47 and 0.61).H_mean FP and VegR FP were the best point-based features for predicting V and G with R 2 of 0.43 and 0.58, respectively.In addition, based on graphical comparisons with the point-based mean height features, CHM mean had a stronger but also more non-linear relationship with V and G.A square root transformation was required to fix the linearity of the relationship and improved the R 2 of both CHM and point-based features.Conversely, the point-based features were better for predicting D gM (the best feature H70 FP had an R 2 of 0.74 with D gM ) compared to the best textural features (CHM max and CHM std with R 2 of 0.69 and 0.68 with D gM , respectively).

Prediction Accuracies Based on the Linear Regression Models
The CHM features and the point-based features were applied for predicting V, G, and D gM using LR.The best prediction accuracies of V (RMSE% of 25.0%) and G (RMSE% of 20.9%) were achieved by combining both textural and point-based features, whereas an optimal prediction model of D gM (RMSE% of 12.9%) was based only on point-based features (Table 5).
When all ALS-based predictor features were applied for the LR model of V, as an example, CHM mean , TP ad _%, U 0,lag = 1 , VegR LP , H_mean LP , SVAR lag = 1 , H10 LP , TP ad _std, and TP ad _avg were the first 10 predictor features selected to the model (Table 5).A similar combination of the different feature groups was used for modeling G, whereas the D gM models were more frequently based on the point features.The results suggest that combining different feature groups improves the prediction accuracies.Considering the interactions between the features in a feature selection procedure also results in a slightly different ranking of the features than the correlation analyses above.
Table 5.The plot-level prediction accuracy of total stem volume, basal area and mean diameter using different sets of predictor features and their transformations in linear regression analyses.Altogether, 10 predictors are listed from each feature group in the order they were selected (footnote).The predictors included in the models were the first 1, 2, 3, or 5 predictors mentioned.

Unsupervised Classification
The partitioning of the 155 plots to 2-7 clusters according to the selected features (CHM mean , CHM std , TP ad _avg, ASM lag = 1 , IMC2 lag = 1 , and U 2,lag = 2 ) resulted in altogether 27 sub-clusters in the clustering hierarchy.Excluding clusters formed by single plots, the composition of the remaining nine sub-clusters ordered according to CHM mean and labeled as A-I are shown in Supplementary Data (Figure S1) in terms of the textural feature values that generated the clusters.
When examined against the field measurements, the clustering could be linked with forest size-related attributes, particularly the aggregated stem volume, basal area, and mean diameter (Supplementary Data, Figure S2).Cluster A included the most of the youngest stands, but it was not completely uniform because of considerable variation in diameter distribution.Maturity and stocking increased from cluster B to E. Cluster F included the most stocked stands in the data, and F-I were the most prominent stands in terms of maturity for harvesting.Except for Cluster A, the clusters could not be linked to dominant species or the heterogeneity in the within-plot diameter or between-plot basal-area distributions.
The example of using the cluster dispersion obtained solely from the ALS data as a criterion to prioritize the selection of the sample plots for model fitting is illustrated in Figure 4.The RMSE of V based on applying Equation ( 6) to the full data considerably varied when inserting the initial plots (Figure 4b).The left column of Figure 4 suggests plots with the highest priorities in terms of the cluster dispersion are needed to capture the correct form of the relationship, whereas adding plots with lower priorities does not add information on the relationship.The RMSE stabilized after including 70 plots with the highest priority (Figure 4b).6) and an increasing number of sample plots for model fitting, selected in the order of the decreasing dispersion.

Discussion
Unlike in earlier studies based on ALS, we performed a comprehensive comparison of methods for extracting and quantifying the textural variation present in the CHMs derived from a boreal forest area.Although earlier studies had related the textural variation of aerial images to size [28] and diversity [71] attributes of the vegetation and explained the effects of the parameters affecting the textural features extracted from aerial images [28], a new study was justified since the CHMs interpolated from the height values differ considerably from the spectral response recorded by the aerial images.The CHMs directly reflect the size and structure variation in the forest and were expected to thus readily improve the forest attribute prediction, eluding problems related to variations in geometry and radiometry of the spectral images [80].Further, a considerably wider range of textural features was included in our analyses compared to [28].The results are based on practical data, which are broadly available due to the frequent acquisitions of such data for ground elevation modeling.The results of this study may be seen as a continuation of studies promoting the usability of such data for various forest inventory applications [2][3][4]12,59,81] While an exhaustive comparison of the texture quantification methods was found to be impossible due to the multitude of the developed methods, we attempted to select the most important ones based on a literature review of the properties of the algorithms.It could be argued whether the selection of the methods covered by our analyses was representative.However, our problem of relating the extracted features to the forest attributes aggregated from field plots with an area of 254 m 2 placed some constraints on the methods.A particular emphasis was placed on the rotation invariance, which removed the requirement to run the training procedure applying the same rotation.On the other hand, the use of the isotropic features may have caused a loss of directionality from these features, which could have been an important characteristic when classifying the texture.
Assuming an underlying spatial pattern or geometric model of the trees would have enabled the use of additional approaches to quantify the texture.However, there is no evidence that the spatial  6) and an increasing number of sample plots for model fitting, selected in the order of the decreasing dispersion.

Discussion
Unlike in earlier studies based on ALS, we performed a comprehensive comparison of methods for extracting and quantifying the textural variation present in the CHMs derived from a boreal forest area.Although earlier studies had related the textural variation of aerial images to size [28] and diversity [71] attributes of the vegetation and explained the effects of the parameters affecting the textural features extracted from aerial images [28], a new study was justified since the CHMs interpolated from the height values differ considerably from the spectral response recorded by the aerial images.The CHMs directly reflect the size and structure variation in the forest and were expected to thus readily improve the forest attribute prediction, eluding problems related to variations in geometry and radiometry of the spectral images [80].Further, a considerably wider range of textural features was included in our analyses compared to [28].The results are based on practical data, which are broadly available due to the frequent acquisitions of such data for ground elevation modeling.The results of this study may be seen as a continuation of studies promoting the usability of such data for various forest inventory applications [2][3][4]12,59,81].
While an exhaustive comparison of the texture quantification methods was found to be impossible due to the multitude of the developed methods, we attempted to select the most important ones based on a literature review of the properties of the algorithms.It could be argued whether the selection of the methods covered by our analyses was representative.However, our problem of relating the extracted features to the forest attributes aggregated from field plots with an area of 254 m 2 placed some constraints on the methods.A particular emphasis was placed on the rotation invariance, which removed the requirement to run the training procedure applying the same rotation.On the other hand, the use of the isotropic features may have caused a loss of directionality from these features, which could have been an important characteristic when classifying the texture.
Assuming an underlying spatial pattern or geometric model of the trees would have enabled the use of additional approaches to quantify the texture.However, there is no evidence that the spatial pattern of the trees could be correctly classified based on remotely sensed data [20].Even if such a classification were useful for forestry applications, departures from the assumptions could cause severe error trends [82].The inherent spatial autocorrelation of the data further prevented the use of some texture quantification approaches.Given these challenges, it was well reasoned to focus on the statistical texture analyses and the selected combinations of order and scale (Table 3).Note that similar considerations on the feasibility of texture analyses and roles of assumptions and spatial scales were already discussed based on optical satellite images [83].
The generation of CHMs from the low-density ALS data affected the textural features derived using the selected methods.A part of the textural variation in the CHMs originates from the choice of whether to interpolate pixel values or increase the pixel size in order to use more observed height values.The use of small pixel sizes required a number of pixels to be interpolated, whereas increasing the pixel size would have allowed more pixel values to be based on real height observations.On the other hand, the degree of detail of the CHM reduced according to the pixel size.Nevertheless, Figure 1 indicates that the vertical and horizontal variation in the forest structure, as captured by the ALS point data, can be reproduced and analyzed as textural variation in the CHMs.An obvious solution to add the level of detail of the CHMs would be to increase the point density of the data.Data with a higher density might improve the texture analyses, but also allow the use of more efficient analysis methods such as detecting individual tree tops and using them as predictors [18].
The sparse point density led to, at best, a large CHM pixel size.Increasing the pixel size resulted in an increase of the CHM values observed, but a loss of information in terms of the detail.At small pixel sizes (0.5 or 1.0 m), the R 2 of textural features and forest attributes were stable, whereas at larger pixel sizes their R 2 degraded.The IDW interpolation parameter α only had a minor impact on the textural features computed in the study.Notably, many other techniques could have been employed in the interpolation of the CHM rasters.Acknowledging that the effects of interpolation to the quality of the CHMs should also be further studied, we selected to fill the CHMs using IDW interpolation due to its simplicity and controllability based on only few parameters.It would further have been possible to fill the CHMs using more than only or first echoes per emitted pulse, which has become popular especially when the detail of the CHM must enable individual tree detection [18].Even though such alternative may be recommendable for practical purposes, the results obtained here using only first echo data are more likely reproducible with sensors that record a different number of echo categories [6].
The performance of the textural features derived using the selected pixel sizes (0.5 or 1.0 m) was moderately good for predicting V, G, and D gM .Among the tested feature sources, those based on the HIST, PATCH, and GLCM clearly outperformed those based on LBP.The HIST features derived from the grey-level histograms directly characterized the forest canopy height based on the CHMs.CHM mean was clearly the best individual feature for predicting stem volume and basal area among all the ALS-based features.CHM std and CHM max were almost as good as the point-based features to explain the variation of mean DBH.This is slightly surprising, since the CHM features should overall include less information compared to the height features computed from the entire point cloud.The reasons for the more stable relationship with the forest attributes may be related to the smoothing due to the CHM interpolation step.A similar result was reported in [4] and the potential to improve forest variables this way should be further examined.Overall, our results indicate that the description of the forest structure may be considerably simplified using the CHM mean as a substitute (or complement) for the point-based features.For improved accuracies, one should combine the CHM height features with the density features derived from the point data.
Among other textural feature groups, the PATCH features were useful in the prediction of the volume and basal area.The features derived using adaptive CHM threshold appeared as important predictors of the volume, whereas the basal area was better modeled when a constant threshold was applied.Among the GLCM features, SAVG lag = 1 was the best individual feature, while SVAR lag = 1 was most often selected to the LR models from the set of other predictor candidates.Conversely, the uniformity features derived from the LBPs did not stand out with respect to any application considered.This is also slightly surprising, since LBPs should fundamentally detect more detailed microstructures than the aforementioned techniques.Analyzing the individual LBPs directly using more complex machine learning methods might provide better results.The poor performance of the method may also be related to the low resolution of the data analyzed.Nevertheless, the result may also indicate that texture quantification methods performing well in man-made material classifications may underperform in the case of vegetation with considerable natural variations.
Interestingly, the textural features sometimes outperformed point-based features regardless of which echo types were used for the computation.The use of the area-based approach with point-based canopy height and density features is a common and established practice for predicting forest inventory attributes from the ALS data [5,8].Mainly features based on the vertical distribution of the ALS point cloud are used, however, for which reason these analyses optimally characterize only the vertical canopy conditions.The textural features of the present study were computed from the interpolated CHMs, aiming to improve the characterization of horizontal canopy conditions in ALS forest inventories.From a forest management planning perspective, the horizontal information in terms of the stem density and clustering is related to the timing of thinnings, and the inclusion of the textural features is proposed to improve these decisions [19].Although Ozdemir and Donoghue [34] and Wood et al. [71] indicated a potential to link the textural features with the diversity of the tree size and vegetation structure, respectively, our results did not support this.The textural features were poorly correlated with the coefficients of variation of the DBH and G values as well as with the number of understory trees.The features mainly described the properties of the trees in the dominant canopy such as the total stem volume, basal area, and mean diameter.Direct measures of the understory are difficult to obtain due to transmission losses occurring in the upper canopy [84] (see also [50]) and no correlations between the textural features quantifying the top of the canopy and the understory were observed here.Although a slight correlation between the textural features and CV(G) computed from the large plots was observed, a similar correlation was obtained when the textural features were computed from a smaller window.A better solution for quantifying the within-stand variation in a practical prediction based on the grid cells could be to analyze the variation between the neighboring cells.A higher number of cells would also likely be needed to make solid conclusions.
In addition to the linear regression analyses, the textural features were demonstrated in an unsupervised classification using the well-known k-Means approach.The approach provided a data-driven partitioning of the feature space to the given number of clusters.As there are no objective criteria for determining the number of clusters [74], we followed an earlier example on determining this number [59].By initially experimenting with different numbers of clusters, we found that 4-7 structural classes could be separated based on our study area and data.There are five classes in both the site type and development stage classification system applying to the studied forest area.However, not all classes are likely present and the data may not discriminate between all these properties.Therefore, the applied split and merge criteria resulted in a reasonable number of clusters.
Notably, the results obtained here are a re-clustering of the entire study area [59].Whether applied to the full study area, it is reasonable to expect more clusters to be found similar to [59].Nevertheless, the main purpose of this analysis was to determine if the developed features could be used for pre-stratification of the inventory area based on ALS data, as proposed in the introduction section.It was found that the dispersion of the clusters derived in an unsupervised mode could be used as an indicator for prioritizing the plots to be measured as the sample to form the reference data for the wall-to-wall models.We acknowledge that these are indicative results and the generalizability of this conclusion should be verified in more extensive data, allowing a split to training and validation data.For example, the textural features derived show a preliminary potential for streamlining the sampling design of the field plots to be used for later inventories, which should be further studied.
Practical inventory projects of large areas will most likely need to increasingly account for computation costs related to extracting the features.We see a considerable potential of the features developed here related specifically to their computational feasibility.For example, when designing the sampling protocol to obtain the field reference data for an ALS-based forest inventory, the sizes and orientations of potential plot locations should not be fixed but instead allowed to vary in order to design a field sample capturing the essential variation of the forest.Whether guiding the field plot sampling requires extracting features from, e.g., moving windows of multiple scales, the related computations are likely much more efficient when based on CHMs rather than underlying point data due to the efficient processing routines developed for raster data.Specifically, the time complexity related to CHM-based features reduces from the number of points to the number of pixels of the CHM, which appears to be a considerable factor in large inventory areas.Relating the computation costs of extracting various features to the added value obtained could be a potentially interesting future topic, since at least we are not aware of studies related to the time and space complexity of the ALS algorithms.

Conclusions
CHMs interpolated from the ALS data were found to reflect some degree of textural variation that was useful for modeling the underlying forest attributes, particularly plot volume and basal area.Applications based on supervised (linear regression) and unsupervised (k-Means clustering) learning of forest structure were demonstrated.The latter indicates the utility of the derived features for improving the sampling of the field plots for forest inventories, which should be further verified with separate validation data.Among the features considered, the statistical, patch, and GLCM features outperformed those based on point data, indicating that improved information is contained in the textural features.Features based on LBP were less useful for the same purpose.Although the tested features were selected based on a comprehensive review of potential methods to quantify image texture, we acknowledge that many more textural feature extraction techniques could be considered, especially if the requirement to produce rotation-invariant features was relaxed.In all, even the sparse-density data include potential to develop features that quantify very different aspects of the data, which should be employed together to improve analyses of vertical and horizontal forest structure.

Figure 1 .Figure 1 .
Figure 1.(a) Points with height values >5 m above ground level extracted from an example plot.Canopy height models interpolated from the point data using pixel sizes of: (b) 0.5 m; (c) 0.75 m; (d) Figure 1.(a) Points with height values >5 m above ground level extracted from an example plot.Canopy height models interpolated from the point data using pixel sizes of: (b) 0.5 m; (c) 0.75 m; (d) 1.0 m; (e) 2.0 m; and (f) 3.0 m.In (a), the sizes of the dots are scaled according to the height values, whereas the tick marks of each sub-plot correspond to a horizontal distance of 2 m.

Figure 2 .
Figure 2. The R 2 of selected textural features and total stem volume with the pixel sizes of 0.5 m and 1.0 m.The interpolation parameter (α) was set to 0.1, 0.5, 1, 2…, 9 and 10.

Figure 2 .
Figure 2. The R 2 of selected textural features and total stem volume with the pixel sizes of 0.5 m and 1.0 m.The interpolation parameter (α) was set to 0.1, 0.5, 1, 2 . . ., 9 and 10.

Figure 3 .
Figure 3.The effect of the pixel size to the coefficients of determination (R 2 ) of selected textural features and plot-level forest attributes (total stem volume, basal area, mean diameter (DBH), and DBH variation).The interpolation parameter α was set to 5.

Figure 3 .
Figure 3.The effect of the pixel size to the coefficients of determination (R 2 ) of selected textural features and plot-level forest attributes (total stem volume, basal area, mean diameter (DBH), and DBH variation).The interpolation parameter α was set to 5.
Remote Sens. 2016, 8, 582 14 of 21 with lower priorities does not add information on the relationship.The RMSE stabilized after including 70 plots with the highest priority (Figure4b).

Figure 4 .
Figure 4. (a) The relationship between the airborne laser scanning estimate of mean height (H) × canopy cover (CC) and plot volume (V) in the data studied.The filled and open circles correspond to clusters including single or several plots, respectively, and the size of the open circles indicates the dispersion of the plot from the respective cluster center; (b) The development of the root-meansquared-error, when the prediction of V is based on Equation (6) and an increasing number of sample plots for model fitting, selected in the order of the decreasing dispersion.

Figure 4 .
Figure 4. (a) The relationship between the airborne laser scanning estimate of mean height (H) ˆcanopy cover (CC) and plot volume (V) in the data studied.The filled and open circles correspond to clusters including single or several plots, respectively, and the size of the open circles indicates the dispersion of the plot from the respective cluster center; (b) The development of the root-mean-squared-error, when the prediction of V is based on Equation (6) and an increasing number of sample plots for model fitting, selected in the order of the decreasing dispersion.

Table 1 .
Central total and species-specific characteristics of the reference data, aggregated to the plot-level.

Table 2 .
The proportion of observed canopy height model pixel values with the considered pixel size in the reference plots.

Table 4 .
The coefficients of determination (R 2 ) of selected features (17 textural features and three point-based features) and three forest attributes with the pixel size of 1.0 m and interpolation parameter of 5.

Table 4 .
The coefficients of determination (R 2 ) of selected features (17 textural features and three point-based features) and three forest attributes with the pixel size of 1.0 m and interpolation parameter of 5.