Canopy Height Estimation from Single Multispectral 2D Airborne Imagery Using Texture Analysis and Machine Learning in Structurally Rich Temperate Forests

: Canopy height is a fundamental biophysical and structural parameter, crucial for biodiversity monitoring, forest inventory and management, and a number of ecological and environmental studies and applications. It is a determinant for linking the classiﬁcation of land cover to habitat categories towards building one-to-one relationships. Light detection and ranging (LiDAR) or 3D Stereoscopy are the commonly used and most accurate remote sensing approaches to measure canopy height. However, both require signiﬁcant time and budget resources. This study proposes a cost-effective methodology for canopy height approximation using texture analysis on a single 2D image. An object-oriented approach is followed using land cover (LC) map as segmentation vector layer to delineate landscape objects. Global texture feature descriptors are calculated for each land cover object and used as variables in a number of classiﬁers, including single and ensemble trees, and support vector machines. The aim of the analysis is the discrimination among classes in a wide range of height values used for habitat mapping (from less than 5 cm to 40 m). For that task, different spatial resolutions are tested, representing a range from airborne to spaceborne quality ones, as well as their combinations, forming a multiresolution training set. Multiple dataset alternatives are formed based on the missing data handling, outlier removal, and data normalization techniques. The approach was applied using orthomosaics from DMC II airborne images, and evaluated against a reference LiDAR-derived canopy height model (CHM). Results reached overall object-based accuracies of 67% with the percentage of total area correctly classiﬁed exceeding 88%. Sentinel-2 simulation and multiresolution analysis (MRA) experiments achieved even higher accuracies of up to 85% and 91%, respectively, at reduced computational cost, showing potential in terms of transferability of the framework to large spatial scales. spatial and spectral resolution, moving window dimensions, and classiﬁer selection.


Introduction
In recent years, there has been an increasing interest in habitat mapping as a basis to monitor ecosystem status and to assess the human impact on land cover. Canopy height is a basic variable to characterize forests' structure and is known to correlate with important biophysical parameters, such as primary productivity [1] and forest health. It is important for a number of applications, such as biodiversity [2] and habitat monitoring and assessment, above-ground biomass estimation [3,4], environmental protection (flora-fauna), terrain characterization, fire or flood modeling, and disaster management.
Traditional methods applied for canopy height estimation rely on terrestrial measurements through field campaigns for a high accuracy. Resource constraints and rapid changes in the light of global changes, however, stress the need for satellite and airborne sensors as time-, cost-, and labor-efficient alternative methods. Remote sensing advancements emerging at an unprecedented rate seem to be well-performing, while providing accurate information on vegetation structure. In this context, numerous approaches have been proposed in the literature to estimate canopy height [5,6].
Light detection and ranging (LiDAR) data, as the most efficient alternative to ground measurements, are considered to provide the most accurate information on vegetation structure. LiDAR data, depending on the aerial platform, are broadly used to map canopy height of quite large areas, both as a single method [7][8][9][10] and in conjunction with data from other sensors (active or passive) [11][12][13]. However, the cost per unit area may be relatively high. Today, airborne LiDAR systems are in operational use in many countries around the world for forest monitoring and management [14].
Synthetic aperture radar (SAR) data have proven to be effective in registering the vegetation structure [15][16][17]; however, their use is still limited. As an active sensor alternative, interferometric synthetic aperture radar (InSAR) data are used for the development of surface maps. Most studies which use SAR data focus on polarimetric interferometry or use the difference in wavelengths sensitivity [18]. In this context, Landsat images were combined with intensity and coherence images from ALOS PALSAR-1 to map canopy height in Chile, using ensembles of regression trees, with in situ data and airborne LiDAR as ground truth [18]. Root mean square errors of ≈ 2 m are achieved for trees up to 30 m, whereas accuracy drops to ≈ 4 m if higher trees up to 40 m are present. In fact, the height mapping for high trees is mainly supported by the optical Landsat imagery.
In digital photogrammetry, both nadir aerial and satellite-SAR stereo images [19,20] have been extensively involved for years in the derivation of digital surface models (DSMs). In particular, dense pixel-based image matching algorithms are applied to two or more passive images of the same area for the generation of DSMs. When aerial or satellite stereo images are employed, canopy height is calculated either by the subtraction of a LiDAR-derived digital terrain model (DTM) from the DSM [21] or by averaging the values of the highest DSM percentiles [22]. Similarly, SAR-based DSMs and existing DTMs can be combined to predict canopy height (SRTM; [23]). These methods provide less expensive height estimations in comparison with the application of LiDAR data [24][25][26].
As an alternative to the aforementioned data, efforts are being undertaken by researchers to exploit high spatial resolution remotely sensed earth observation data stemming from passive multispectral airborne or satellite sensors. These focus on the estimation of canopy height using regression analysis, which is based on (i) various statistical measures of reflectivity [27], (ii) spectral indices, e.g., normalized difference vegetation index-NDVI, leaf area index-LAI [28,29], normalized difference eater index-NDWI or optimized soil adjusted vegetation index-OSAVI [30], or (iii) simple texture features including mean, standard deviation, variance, sill variance, and ratios of these metrics [31,32].
Utilization of texture features has proven particularly effective to derive information about the physical structure and the edges of a canopy [33,34]. Some of the commonly utilized texture features are (i) the texture statistics (Haralick features) [35] retrieved by the gray level co-occurrence matrix (GLCM) [36,37], (ii) the Markov random fields [38], and (iii) the wavelet transforms [39]. In Kayitakire, et al. [40], GLCM texture features calculated from IKONOS-2 imagery have been related to the height of oak, beech, and spruce trees with R 2 values of up to 0.76. Moreover, global morphological texture descriptors (e.g., circular covariance histogram, rotation-invariant point triplets) have also been applied and evaluated for the segmentation of natural landscapes, forest mapping, species discrimination, and content-based image retrieval [41], achieving better retrieval scores than Gabor filters, local binary patterns (LBPs), and local invariants.
Further approaches use multiscale geographic object-based image analysis (GEOBIA) models, in conjunction with texture analysis and reflectance values within a moving window, for the approximation of canopy height and other forest structural parameters. As a characteristic example, spectral and texture features and shadow fraction were derived from a QuickBird image for each object in Chen et al. [42]. The results verified that object-based models may achieve more accurate estimations of forest height in most object scales. Furthermore, texture features based on local variance, entropy, and binary patterns, calculated from a WorldView-2 imagery, have already been tested for canopy height estimation in the Netherlands with a variety of vegetation types, ranging from deciduous and coniferous trees to shrubs and heathlands [33]. They proved particularly effective in discriminating among different vegetation height categories in a wide range of height values (from less than 5 cm to 40 m), reaching very high object-based overall classification accuracies of over 90%. The coarse spatial level of analysis, due to the pixel size of the classified objects (i.e., from few to thousands pixels), suggests the use of texture features for habitat monitoring applications, where spatial accuracy is not a strong prerequisite.
This study introduces a cost-effective object-based approach for canopy height estimation in non-overlapping height classes, built around the texture analysis of a very high-resolution (VHR) passive airborne sensor image. This algorithm transforms the estimation problem into a multiclass prediction one. The main objective is the prediction of the canopy height class associated to each landscape object according to the habitat categorization scheme of general habitat categories (GHC) [43], providing valuable contribution to habitat monitoring applications. The problem is formulated as a classification rather than a regression task, with the extracted height classes being able to discriminate objects with similar spectral characteristics, which are assigned to different land cover categories in the reference map. The proposed framework is using [33] as a baseline, presenting substantial changes to improve the performance of the method. These are comprised initially of testing its application with varying moving window dimensions to account for the effect of the differences in crown size in an area with highly heterogeneous forests, e.g., the Bavarian Forest National Park (BFNP), and the application of data imputation techniques, such as missing data handling, outlier removal, and data normalization. Simulated (spatially aggregated) Sentinel-2 images are also put to test, since the abundance and accessibility of such data increases the applicability of the examined method. However, the most promising results are expected by the utilization of varying spatial resolutions and their hierarchical combination through multiresolution approach.
The name multiresolution analysis has been broadly used for sub-band coding (signal processing), quadrature mirror filtering (speech processing), and pyramidal coding (image processing). Pyramidal coding refers to multiscale image representation, in which an image is subject to repeated smoothing and subsampling. Images in general have locally varying statistics of pixel intensities resulting in combinations of edges, abrupt features, and homogeneous regions. Analysis of statistical properties of pixel neighborhoods of varying sizes may be useful. The fact that small objects are viewed at high spatial resolutions, while large objects require only a coarse resolution, leads us to evaluate multiresolution analysis (MRA) in conjunction with the proposed framework. Processing cost of both feature extraction and classification processes can be notably reduced, while very large areas of interest and ultra-high-resolution imagery can be assessed.

Study Area
The study area includes the entire Bavarian Forest National Park (BFNP) (243.69 km 2 ) located at the German-Czech border in the state of Bavaria (49 • 3 19"N, 13 • 12 49"E) ( Figure 1). The southern part of the current park was designated as the first national park of Germany in 1970 and was extended to the northwest in 1997. Together with other neighboring forests, such as the Czech Sumava National Park, they form the Bohemian Forest Ecosystem, one of Europe's largest protected forest ecosystems. The terrain is rough and uneven, while the forest area is characterized as mountainous with an altitude ranging between 600 and 1453 meters a.s.l. The temperate climate with its seasonal fluctuations is characterized as cold and humid, with the average annual temperature ranging from 3 to 6.5 • C and the annual rainfall estimated at 1200 to 1850 mm. Forest areas of the highest altitude are covered with heavy and long-lasting snow for most of the year (7 to 8 months). The entire BFNP has been managed under a no-intervention policy by maintaining natural ecological processes, which has resulted in heterogeneously structured and complex stands. The identified study areas mainly encompass three major forest communities, including sub-alpine stands dominated by Norway spruce (Picea abies L. Karst) and occasionally mountain ash (Sorbus aucuparia L.), slopes covered by a mixture of Norway spruce, silver fir (Abies alba Mill.), European beech (Fagus sylvatica L.), and sycamore maple (Acer pseudoplatanus L.), and valley bottoms dominated by Norway spruce, mountain ash, and birches (Betula pendula Roth and Betula pubescens Ehrh.) [44].

Height Classification Scheme
In particular, six (6) studied height classes are defined (see Table 1) as the core of the multiclass prediction problem, ranging from 0 to 40 m. They are based on GHC reference habitat mapping framework, as the main characteristic to discriminate among different tree and shrubs (TRS) species. Based on the aforementioned height classification scheme, areas of interest were selected as the most suitable representatives of the BFNP, spanning across the whole range of canopy heights. In some study cases, four (4) height classes were used instead of six (6), i.e., the first three classes were merged into one, corresponding to a height class for land objects below 0.6 m, while the other three classes remained unchanged. This procedure was followed in order to reduce the effect of the terrain at areas with very low vegetation that incorporate noise in the study of other classes and to cover for cases with very few land objects that belong to the first three classes, resulting in very low representativeness. This way a more reliable low and sparse vegetation reference layer was obtained.
As far as the selection of the areas involved in the evaluation phases (see Appendix A) is concerned, it is based on the following criteria: (i) existence of validation and segmentation layer, (ii) small degree of inclination to reduce topographical and shadow factor, as demonstrated in Figure 2, (iii) representativeness of height classes, and (iv) absence of land objects containing artificial constructions, considered as out of the scope for this work. The orthomosaic is projected in the DHDN/3-degree Gauss-Kruger zone 4 coordinate reference system.

Dataset
Two orthomosaics based on DMC II multispectral airborne images have been employed for the study area (see Figure 1). Raw images were acquired from ILV-Fernerkundung GmbH and captured on 19 and 20 August 2012, corresponding to the time of maximum biomass and foliage (peak flush). Each orthomosaic has three (3) spectral bands in the visible and near-infrared band (NIR) part of the electromagnetic spectrum with a spatial resolution of 40 cm. The first orthomosaic has three (3) spectral bands corresponding to a true color image, while the second color-infrared (CIR) orthomosaic corresponds to a false color image that is synthesized of the NIR, red and green bands. The NIR band was extracted from the CIR orthomosaic to enhance the discriminatory power of the feature sets. The orthomosaics used for this study were already georeferenced, orthorectified, atmospherically/radiometrically corrected geospatial products. Cloud contaminated pixels were not existent, because aerial image acquisition missions are usually performed in clear-sky conditions. A DTM and a DSM derived from the LiDAR dataset were employed as reference data for training and validation purposes. A full-waveform LiDAR campaign for the entire BFNP was carried out within three days in June 2012 in a leaf-on condition. The data were recorded by the Riegl 680i sensor (350 KHz pulse repetition rate, nominal point density of 30-40 points m 2 , average altitude 650 m above ground) at a 0.32 m pulse footprint diameter and 300-400 m swath width (depending on flight altitude) by Milan Flug GmbH. The waveform LiDAR data set was decomposed in XYZ coordinates of individual returns by a sum of Gaussian functions to derive a dense 3D point cloud featuring return number and intensity [45]. DTM and DSM were developed using TerraSolid software. After processing LiDAR data to extract a height value for each point and noise filtering, the LiDAR point clouds were rasterized to a grid by assigning to each 1 m × 1 m grid cell (pixel) the lowest (DTM) (see Figure 3) and the maximum (DSM) height value of the respective LiDAR points [46]. Geographic coordinates of the models are that of the coordinate reference system used in this study, which is the DHDN/3-degree Gauss-Kruger zone 4. A land cover (LC) map, generated by local experts for the selected areas, was used to delineate the objects (landscape patches) within the image and identify the vegetated areas of interest for this study. Two sets of vector-based forest habitat-type information were prepared by experts using visual interpretation of color infrared aerial imagery in 2008 and 2012 and were provided by the Administration of BFNP. A total of 25 habitat types embracing different age and also nonforest classes (residential areas, waterbodies, railways, and asphalted roads) were existent in the LC map of 2012, summarized into six (6) main forest habitat types of coniferous, deciduous, mixed, lying deadwood, standing deadwood, and clear-cut [47]. The majority of all conifer and deciduous ages belong to [5,40) and [2,5) height classes, while the [0.6, 2) class mainly comprises schrub pines and mixed stands. All available data, maps, and the areas of interest that were selected for the evaluation phases are shown in the Appendix A. The accuracy with which the LC map was created is taken for granted, but some objects might have erroneously been assigned to each vegetation class or incorrectly mapped to their geometry thus, not ensuring the evaluation consistency; however, this small influence of such cases is ignored.
Although a period of 2 months intervened between LiDAR and airborne acquisitions, no significant alterations of height were expected in the selected LC and habitat classes involved in this study, favoring the consistency of measurements. The BFNP forest administration, as knowledgeable experts, certified that no major forest regeneration or felling occurred at that time interval. More specifically, bark beetle mapping in the whole year of 2012 indicated that just 12 ha had been infested in the whole park, which is completely negligible. Further, no windthrows occurred during this time span. All these aforementioned facts resulted in tree differences of less than 10 to 20 cm. The CHM was extracted as a result from pixel-by-pixel subtraction of the DTM, indicating the ground surface, from the DSM, indicating the top of the vegetation. The processing was performed using QGIS, its GDAL library, and the algorithm raster calculator, with 1 m spatial resolution. CHM always meets the spatial resolution of the input image with pixel values corresponding to the height of the land surface points.

Methodology
The approach for the canopy height estimation consists of two main parts, the preparation of input data and the training phase, as illustrated in the flowchart (see Figure 4). The main core of the processing pipeline comprises the following actions: (i) the derivation of texture features from the multispectral orthoimages (bands R, G, B, and NIR) for each vegetation class object, (ii) the application of a number of data processing methods for the handling of indefinite values, and (iii) the supervised learning within the classification. Furthermore, in an attempt to optimize the performance, a hierarchical study of multiple spatial resolutions (i.e., 40 cm, 2 m, 10 m) is evaluated (see Figure 5), in addition to the airborne 40 cm and the simulated Sentinel-2 (at 10 m) ones. Given the appropriate types of image input data, the classification model can be trained to categorize each land object (LC class segments) towards one of the predefined height classes. Classification accuracy [48] to the height classes is evaluated both in terms of overall accuracies and kappa coefficient [49,50].

Texture Feature Extraction
A number of 24 texture features (see Table 2) are calculated per selected object (landscape patch) for each individual band of the airborne images to characterize intra-object variability and heterogeneity. Textural characteristics are expected to reveal spatial and structural properties of the object. The selected features are the ones recently used in Petrou et al. [33] and Petrou et al. [51] for the discrimination of habitat types based on the vegetation height calculated for the respective LC classes. For a specific feature, a texture value is calculated for each pixel of an object, based on the values of its surrounding pixels within a square moving window of predefined size. The process is repeated for every selected object and all features and bands. The specific texture value of each object for each image band is generated by averaging the texture feature pixel values. Pixels taken into consideration are only those whose neighborhood (surrounding pixels inside square moving window) of the specified radius totally lies within the area of the object under consideration, in order to avoid inconsistencies existing across the borders of each land object. Considering all pixels of the object, or leaving a margin of constant number of pixels can be used as alternatives. Note that each of the 24 texture features uses different moving window sizes for the same object; thus, the pixels taken into account for the calculation of the average value of each feature may be different. Appropriate padding with the edge values was used for boundary land objects.
Local variance, provides an indication of the intra-patch heterogeneity. Windows of 3 × 3 and 5 × 5 pixels have been employed to capture and express local variations in as much detail as possible. For pixel resolutions of 40 cm and 2 m, but for 10 m in minor degree due to the much bigger spatial coverage of the derived windows, the above windows appear small enough to capture variations within the same habitat (land object) and restrict edge effects with neighboring ones. At the same time, they are large enough to avoid expressing variability within the same plant (e.g., conifer tree). As far as feature computational complexity is concerned, for each central pixel surrounded by a window of n × n pixels, the only possible loop might involve the n 2 operations required to calculate the difference between each pixel and the mean pixel value in the window. Thus, the complexity of O(n 2 ) can be considered.
Local entropy (LE) is a measure of variability, heterogeneity, and the overall information contained in an image. In particular, it offers an indication of texture randomness in the studied area. Similar to local variance, LE is calculated on a per pixel basis. For the calculation of the LE, initial pixel values ranging in [0,255] were used. Due to potential sensor noise, insignificant value fluctuation would be considered as different values and could result in an unwanted entropy outcome. Two alternative quantization techniques, also introduced in Petrou et al. [33], are employed as a countermeasure. In a window of n × n pixels, n 2 pixels are serially searched to identify the distinct values. As the distinct values are less than or equal to n 2 , a maximum of n 2 iterations are required for entropy calculation. Therefore, O(n 2 ) can be considered as the complexity of the calculation of entropy for each moving window in an object.
For the characterization of relative variations within a small neighborhood compared with its surrounding one, the ratio of the entropy values (LER) of two concentric windows is extracted in the same way as in Petrou et al. [33]. Two versions of LER calculation are tested depending on the utilization of the pixels of the inner window. As a note, the smaller the ratio, the more homogeneous the immediate neighborhood of the central pixel is, compared to its broader surroundings.
Local binary patterns (LBP) [52] are also tested in capturing local changes in texture, computed on a per pixel basis. For each pixel, its surrounding pixels in a circle of predefined radius are considered for the calculation. Starting from the pixel on the left of the central one, a scanning of the surrounding pixels in a clockwise order takes place. An LBP feature variation is also calculated, namely, the rotation invariant, by considering every surrounding pixel in the circle as the starting point. The respective binary and sequentially decimal numbers are calculated with the smallest assigned now to the central pixel. To counterbalance the LBP drawback of ambiguity, two variations of the LBP algorithm are tested, the local ternary patterns (LTP) and the local ter-binary pattern (LTBP), with the incorporation of a third value d in labeling neighboring pixels. Considering now a moving window of n × n pixels (n = 2r + 1, where r is the defined circle radius), less than 4n pixels (the pixels of the square enclosing the circle around the central pixel) are compared with the central one, for the extraction of the binary or ternary numbers. Thus, the complexity of the calculation of LBP-based features can be considered O(n).

Handling of Indefinite Values, Outlier Removal, and Data Normalization
All pixels considered for the calculation of texture features, both central and those within the defined surrounding windows, shall belong to the specific object under consideration. As a result, the texture features' calculation is limited to the area of each individual object, avoiding influences and biases from neighboring land objects. If this requirement is not fulfilled for a calculated feature, in cases such as with objects consisting of very few pixels, its value remains indefinite (NaN). This results in objects with NaN values in certain texture features (e.g., LER), whose calculation requires surrounding windows larger than the dimensions of the object, or all texture features, when one of the dimensions of the object is restricted to solely a few pixels. Prior to the classification to a height class, small objects with all their texture values NaN are excluded from the process, since no information for the classification is available. For objects with partially missing information, where only some of their texture values are indefinite (NaN), three (3) distinct processing approaches, followed by outlier removal and appropriate normalization, all firstly introduced as a pipeline in [33], are employed: -The first approach, coded as Feat with the derived dataset Feat. dataset, refers to the exclusion of one or more features from the classification process, which have missing data to at least one object. By reducing the number of final features used for each object, memory requirements and classification processing cost are reduced; however, the resulting feature vector may have significantly less discriminatory power compared to the entire set; -In the second approach, known as listwise deletion, case deletion, or complete-case analysis, objects with missing data to at least one texture feature are excluded, reducing the number of finally classified objects and affecting the representativeness of the results. Listwise deletion, coded as Obj with the derived dataset Obj. dataset, is supported by the fact that the assumption that missingness is not related to the observed and missing variables, i.e., missing-completely-at-random (MCAR) assumption [53] is not valid for the missing data; -In the third approach, coded as MI with the derived dataset MI. dataset, a multiple imputation technique of the missing data with approximated values is evaluated and employed. In particular, the Amelia II method [54] is used, based on the assumption that the values are drawn from a multivariate gaussian distribution, instead of drawing values from an unconditional distribution, where the missing data are filled in by randomly selected values among the observed ones. Five (5) complete datasets, including the values of all features from every image band, are created, averaged into one set to be used in the classification process. Five imputations have been selected as a good trade-off between efficiency and processing time, according to the formula proposed by Rubin [55].
Following the imputation of indefinite values, an efficient simple box plot approach, coded as OR, for multivariate data based on Mahalanobis distance has been employed as a check mechanism for outliers' detection, i.e., objects, which appear to be inconsistent with the remaining objects of the site, based on their texture features values. Objects with extreme values in a large distance from the median value, for each texture feature, are considered outliers and removed (OR. dataset).
Considering that each texture feature outcome value belongs to a different range, appropriate normalization of the data may prove effective to prevent texture features with larger values from having a higher impact than ones with smaller values during the classification process. The linear zero-mean feature normalization method, coded as Norm with the derived dataset (Norm. dataset), has been tested as providing comparable classifications results with the scaling to the range of [0, 1], and the nonlinear softmax scaling techniques [56] in Petrou et al. [33].

Classification
Classification is performed on an object basis, using a series of machine learning algorithms with each object of the LC map characterized by a vector of texture feature values. Texture values constitute the classification features. As far as the "ground truth" (i.e., the target for the classifiers' optimization) is concerned, CHM is used as the reference layer. The average canopy height is calculated per object out of its pixels in the CHM. The calculated height is mapped to one of the six height classes, and the respective label is assigned to the object. These are used for the training of the classifiers and for the evaluation of classification results. Objects having at least one texture feature assigned to them (i.e., not all values are indefinite) are classified to one of the six or four GHC height categories reported in Section 2.2.
Ten (10) classifiers have been applied to test the discriminatory potential of the extracted features. Decision trees and support vector machines (SVMs) [57] classifiers were employed, since they are extensively used in various remote sensing applications [58][59][60][61][62], producing high accuracies. Two basic decision tree implementations, namely, an implementation of the C4.5 algorithm introduced in Quinlan, 1993 [63], J48, and a reduced-error pruning implementation [57], REPTree, were used as individual classifiers. Aiming to reduce the generalization error and improve the classification performance of the individual classifiers, random forests [64], bagging [65], and AdaBoost.M1 [66] were employed as ensemble methods, with J48 and REPTree used as base classifiers. Furthermore, three SVM classifiers with linear and rbf kernel, with and without fitting logistic models to the output [67,68], and using sequential minimal optimization (SMO) have also been tested; for the classifiers' implementations, a WEKA environment was used.
The REPTree algorithm with a fast reduced-error pruning approach has a computational complexity of O(mn log n), where n is the number of land objects and m the number of features. However, J48 implementation uses more complex pruning (subtree raising); thus, its complexity is increased to O(mn log n) + O(n(log n) 2 ). The complexity of ensemble classifiers is calculated by multiplying the complexity of the basis classifiers (J48 and REPTree) by the number of trees t produced. In the Bagging method, however, due to the increased number of constructed trees (100) produced in relation to that for the AdaBoost.M1 (20), the computation time is increased by 5 times. Random Forests is the fastest method among the ensemble decision tree techniques, with a complexity of class O(tn log n log m), since log(m) + 1 attributes are considered for the separation of each node, in contrast to m used in simple J48 and REPTree implementations. Regarding the complexity of SVM, mostly in the case of nonlinear kernels, it is usually in the order of O(n 2 ) to O(n 3 ) since quadratic programming (QP) optimization is required.
Three (3) distinct experimental evaluation phases, with different versions each, have been conducted in this study, depending on image pixel resolution, the selected areas of interest, the size of moving window during texture feature extraction, the use of MRA analysis, and the number of height classes. Experimental phase 1 evaluated the canopy height estimation utilizing the very high resolution airborne dataset at 40 cm. Due to the relatively high computational cost of texture analysis for each image band and consequently for each image, three areas of interest (AOI_1,2 and 3) with a surface coverage of 6 km 2 each were initially extracted from the BFNP. They contain the most representative samples belonging to the first three vegetation height categories. In order to further improve the classification accuracies, a 50% increase in moving windows' dimensions (Phase 1') was tested.
The scope of experimental phase 2 was to emulate Sentinel-2 images with 10 m spatial resolution out of the airborne multispectral ortho imagery and to evaluate the ability to derive height information from texture analysis. Sentinel-2 imagery is the largest source of spaceborne input data for land resources assessments today. Simulation was employed for direct comparison reasons, as Sentinel satellites were not operational at the time of airborne data acquisition, which was 2012 for this study. For training and testing purposes, a larger area of interest (AOI_4) (see Figure A4 of Appendix A) was selected, covering an area of 36 km 2 . Due to the fact that 10 m spatial resolution would lead to an Obj. dataset with a very small number of objects using the initial dimensions of moving windows, a 50% decrease of windows' size was also examined (Phase 2'). Four height classes' results were evaluated in order to enhance the representativeness of observations. In this framework, only the Obj. dataset was studied as the one that outperformed other approaches in Phase 1.

Multiresolution Analysis
During evaluation phase 3, we examined the effectiveness and performance of the multiresolution analysis method in deriving the distinct GHC height categories. We proceeded with a 3-level analysis of the total area of interest 4 (AOI_4), using exclusively the Obj. dataset and the texture features described in Table 2, keeping the computational cost low and offering a degree of freedom in the parameters' configuration of each level. The third image input case uses an assessment of a combined input dataset, as follows. Particularly, a 3-level image pyramid is constructed with a subsampling factor of 5 (see Figure A4 of Appendix A). A collection of images at decreasing resolution levels is created, with top level 3 having 10 m resolution, level 2 of 2 m resolution and bottom level 1 (initial image) of 40 cm. For image downsampling cases, the resize data tool of ENVI software is used, particularly the pixel aggregate resampling method. The main scope of this hierarchical structure of resolutions is to classify very large objects using low spatial resolution, whereas objects of smaller size using higher image resolution. Training and classification are carried out separately for each level. This method is described in Figure 5 and by the following steps: Step 1. Start from the top of the pyramid and study one image level at a time.
Step 2. Identify objects belonging to the Obj. dataset for this level. Step 3. Proceed to the classification of the Obj. dataset to four height classes. Step 4. When on the bottom level, characterize objects not belonging to tje Obj. dataset (i.e., those with partially missing information in their texture feature vector) as 'not defined' and calculate the total area-based classification accuracy of the system. If not, continue with Step. 5.
Step 5. Proceed to the next lower level excluding the objects belonging to the Obj. dataset of the previous higher level. Continue with Step 2. For each level of analysis, all available classifiers are evaluated with the one achieving the highest accuracy participating in the calculation of the total system area-based classification accuracy.

Validation
A variety of measures (see Table 3) have been employed to assess the accuracy of methods and results depending on the characteristics of each evaluation phase used. Table 3 provides the definition of the accuracy metrics used throughout this research work. During all the phases in this study and for the validation of classification results, a stratified 10-fold cross-validation method is applied and performed via WEKA 3 Data Mining Software. The stratified method, as one of the most widely used approaches [69,70] that provides more consistent results over repetitive evaluations, ensures that in the initial division of the total dataset into 10 segments (folds), each segment will have approximately the correct sample ratio of each height class. In particular, dataset objects are randomly divided into 10 non-overlapping segments, where each segment shall reflect the percentage of objects of each height class of the total dataset. This means that each section contains about one tenth of the objects of each height class. This reduces the fluctuation of estimates, and results are in accordance after iterative evaluations. Then, the sorting process is repeated 10 times, wherein every time one-fold is used for testing, and the remaining nine for training purposes. Each land object is used only once for testing. The final evaluation is calculated from the average of the performance of each individual classifier corresponding to each of the ten folds. In addition to object-based accuracy, area-based accuracies are also studied. Area-based accuracies are derived by summing the number of pixels of all correctly classified objects and then dividing by the total number of pixels of all objects involved in the dataset. In MRA, the total performance is extracted in similar way excluding the 'not defined' objects from the calculation. Cohen's kappa coefficient κ, as defined in Table 3, is widely used in the remote sensing literature as a measure of a classifier's accuracy, and more specifically to demonstrate the degree to which each classification result assimilates a random classification [71,72]. Values equal to 1 denote perfect classifications, while values equal to 0 denote random classifications. In general, values of κ greater than 0.5 indicate that the majority of the assignments into the height classes, excluding the correct assignments due to chance, are correct. As reference, for values of 0.41 ≤ κ ≤ 0.6, agreement is considered moderate, while for values 0.61 ≤ κ ≤ 0.8, agreement between observed and predicted values is considered substantial.

Results
The best results corresponding to the highest classification accuracies of all the evaluation phases are included in Table 4. Obj and MI datasets are the best-performing ones, producing the highest accuracies when SVM classifiers are employed. The high classification accuracies achieved throughout the different experimental evaluation phases, and the fact that the pilot areas of interest were mostly covered with forest phanerophyte (FPH) objects demonstrates the benefit of including texture features when dealing with high-resolution images (high producer's accuracy (PA)(%) and user's accuracy (UA)(%) of the FPH class), especially in areas of very high canopies (e.g., above 5 m).

Phase 1: Estimation of Canopy Height Using Ultra-High Resolution Airborne Imagery of 40 cm
The transformation of the NIR band of AOI_1 to a texture image based on LBP, LTP, and LTBP texture features is illustrated in Figure 6. Areas of low vegetation appear with more a heterogeneous texture compared to areas of tall vegetation, where canopy, tree trunks, and bare ground alternate, resulting in a homogeneous texture. The introduction of a d range in LTP and LTBP variations reduces unwanted effects caused by small variations in neighboring values due to noise affliction, compared to LBP features. The highest overall classification accuracies (OA) acquired for all different datasets, derived by applying different combinations of data processing methods, both for the six vegetation classes and for the four classes, are illustrated in Figure 7. For each data processing method, only the highest accuracy out of the ten classifiers is presented, characterizing the combination of a method with a particular classifier. Worth noting is that no significant fluctuations across different classifiers were observed. The maximum accuracy ranges from about 56% to 64% for the six classes and reaches 68% for the four classes. The lowest accuracies are observed in the "Feat", "Feat-Norm", and "MI" datasets, including the total 605 land objects, while the highest ones are observed for the "Obj", "Obj-Norm", and "MI-Norm" datasets, where listwise deletion was applied in small objects with at least one indefinite value. Deleted objects in the latter case were those with dimensions smaller than a 21 × 21 pixel moving window, which was the largest surrounding window employed in the texture analysis. The calculated κ coefficient values showed results far from chance (greater than 0.4), indicating that FPH bias was limited and the other classes were only slightly affected.
Confusion matrices (see Tables 5 and 6) correspond to the highest accuracies acquired for both the six (6) and the four (4) height classes, respectively. High values of producer's (89.63% and 88.15% PA) and user's (77.73% and 76.13% UA) accuracy are observed for the class [5,40) or FPH, reflecting low values of omission and commission errors, respectively; the majority of [5,40) classes were classified correctly, as can be seen. Moreover, the cell values '47' and '45' (referring to objects of the closest in height category of [2,5) or tall phanerophytes (TPH)) of the table indicate that the classification results are slightly biased for the benefit of the FPH class containing the majority of the objects, as also illustrated by the low percentages of -PA-and -UA-for the class [2,5). Results seem to be slightly skewed by the domination of the FPH classes, outnumbering the other classes. Even in the class which was the result of the merging process, the classifier achieves a relative high user's accuracy (73.57%), showing signs of good performance for observations belonging to the extreme classes but not so satisfactory for the intermediate ones, such as for [0.3, 0.6) class with 30.95% -PA-. However, the correctly classified observations of each class remain higher than the misclassifications derived, for Table 6. Table 5. Confusion matrix from the classification of the "Obj-Norm" dataset using the support vector machine (SVM) classifier with linear kernel fitting logistic models to the output.   Table 6. Confusion matrix from the classification of the "MI-Norm" dataset using Bagging with J48 as the basis classifier after the merging of the first three height classes. The total object-based accuracy of 64.35% in Table 5 corresponds to an area-based accuracy of 83.48%, whereas the accuracy of 67.77% in Table 6 corresponds to an area-based accuracy of 88.99%. This percentage refers to the total area in pixel that the correctly classified objects occupy. During phase 1', insignificant influence was observed in the final performance, in general, providing negligible positive changes in the results. Further increment in window sizes brings along very high computational costs of the calculation of texture features.
Based on the average classification accuracies (see Figure 8) achieved from the application of various classifiers for the '6 classes' case, it is concluded that the SVM-Lin.M algorithm outperforms the other classifiers, followed by the Random Forests and Bagging methods. Regarding the decision tree algorithms, ensemble classifiers, i.e., Random Forests, Ada-Boost.M1, and Bagging, in which multiple trees are generated, perform better than the single J48 and REPTree, while Bagging is superior to the Boosting method. Furthermore, it is evident that fitting the logistic model to the SVM output improves the accuracy of single-SVM with a linear kernel.

Phase 2: Estimation of Canopy Height Using Spaceborne Spatial Resolution of 10 m
From the implementation of phase 2, it was observed that the Obj. dataset has only 90 objects, with the majority of them belonging to the [5,40) height category. Thus, the very high classification accuracy of 91.11% presented in Figure 9 cannot be taken into consideration, since the dataset has no statistically significant number of objects to rely on. By contrast, after reducing the windows' sizes (Phase 2'), the Obj. dataset includes 714 objects, including: (i) 44 in [0, 0.6) or dwarf chamaephytes (DCH)-shrubby chamaephytes (SCH)-low phanerophytes (LPH); (ii) 111 in [0.6, 2) or mid-phanerophytes (MPH); (iii) 124 in [2,5) or TPH; and (iv) 435 in [5,40) or FPH objects. The proposed texture features lead to successful discrimination of the four vegetation height categories with accuracies starting from around 76%, reaching up to 85%, and from 0.58 up to 0.74 for the κ coefficient, respectively. The highest classification accuracy observed was 85.29% (SVM.Lin-M classifier), with a kappa coefficient of 0.738, implying a small bias towards the more populated class of FPH. Although only a simulation of the ground sampling distance (GSD) was performed on this phase, Sentinel-2 imagery potential usage in habitat monitoring applications, enhancing the vegetation height characterization capacity, may be supported from the results in Figure 9. Once a 10 m spatial resolution is considered with pixel dimensions of the moving windows remaining the same, each moving window corresponds to a bigger neighborhood in terms of square meters. For example, the LV2 texture feature which incorporates a 5 × 5 pixels window size with spatial resolution of 40 cm covers areas of 2 m × 2 m, and for a spatial resolution of 10 m covers areas of 50 m × 50 m. It is inferred that having lower spatial resolutions and keeping the same window size or larger window dimensions for the same spatial resolution generally improves the classification performance. This finding relates to the size of the 2D cross section of tree crowns and whether the textural characteristics can identify local variations in object-to-object transitions or to the empty space.
The confusion matrix for the best classification results achieved with Sentinel-2 simulated dataset following a listwise deletion ('Obj') and for a 50% decrease in window sizes is demonstrated in Table 7. The SVM classifier with linear kernel fitting logistic models to the output achieved an overall accuracy of 85.29%. As noticed, most of the GHC classes were identified with high user's and producer's accuracies. A small bias of the classifier towards the FPH class is again observed, resulting in misclassification of some objects of the previous TPH height class as FPH. The high value of the kappa coefficient (0.738) indicates that the results are not derived from random classification, and the majority of the class assignments, excluding correct assignments due to chance, are correct. In the theoretical case, all objects were classified as FPH, and the overall accuracy would be 60.92% based on the number of FPH objects over the total ones. However, in that case, the value of κ would be very low, equal to 0, highlighting the biased results.

Phase 3: Fusion of Spatial Resolutions in Hierarchical Mode-Multiresolution Analysis
From the previous phases, it can be inferred that the 10 m spatially aggregated image achieved height estimates with gain in precision compared to the dataset of 40 cm resolution, whereas the Obj. dataset outperformed the other data processing methods in most cases. This result is similar to the findings of [73], who reported near constant mean spectral values and decreasing variances following spatial averaging to coarser pixel resolutions within forested pixels. Figure 10 presents spatial distribution of land objects at each level of analysis. Land patches with smaller dimensions than the minimum required for the calculation of certain texture features (e.g., LER) tend not to belong to the Level 1 Obj. dataset and are thus characterized as 'not defined'. Classification results produced by this hierarchical analysis are illustrated in Figure 11. Tiny objects, containing very few pixels (i.e., objects not even belonging to the Level 1 Obj. dataset) and the nonvegetative ones are excluded from the process and are depicted with red in the figure. It is clearly observed that a very high area-based classification accuracy of 91.39% is achieved, both in large and small area objects (see Table 8). The highest classification accuracy observed is 91.11%, with a kappa coefficient of 0.812 for Level 3, whereas the lowest accuracy of 66.55% and kappa value of 0.470 is observed in Level 1. As the image pyramid from the bottom (Level 3) to the top level (Level 1) is analyzed, more objects tend to remain in the Obj. dataset, with the classification accuracies and kappa coefficient values reducing. This outcome demonstrates that larger objects analyzed with low spatial resolutions are classified with a higher accuracy assessment ratio. Figure 11. Results of the classification of land objects to height classes using the multiresolution analysis as derived from the evaluation of the fusion of the datasets with varying spatial resolutions in hierarchical mode.

Discussion
Multiple experimental evaluations showed that application of texture analysis based on local variance, entropy, and local binary patterns proved to be particularly effective in canopy height estimation from 2D images, leading to a marginally satisfactory classification of land objects into six height classes, while percentages are improved in the '4 classes' case. Previous works dealing with canopy height and structure [74][75][76] have mostly ignored texture. It is worth focusing on the particularly high rates of the total area correctly classified in each phase, implying that objects of a sufficiently large size favor the extraction of more powerful texture features, resulting in higher separation ratios during the classification process. Our results reveal that finding the optimum dimension of moving windows is a critical point for the discriminatory power of textural characteristics and for the efficiency of classification. This depends on the spatial resolution of the input images and on the size of the 2D cross-sections of tree crowns. Undoubtedly, image downgrade can act as a filter and lead to better performance as seen in phase 2. Higher canopy crowns achieved a better performance compared with the first three height classes, due to the fact that tall canopies appear with a more homogeneous texture.
As far as data imputation, outlier removal, and normalization are concerned, they had little to no significant effect in feature discriminatory power, whereas listwise deletion of land objects with missing information resulted in higher classification accuracies. Particularly, filling in of missing values NaN (imputation) provided a slight positive effect in accuracy compared to the simple exclusion of texture features with NaN values and the dimensionality reduction of the attribute space. However, discarding objects considered as outliers does not significantly influence accuracy. For the four height classes case of phase 1, "Feat" and "Feat-Norm" datasets achieved accuracies almost similar to those of other datasets, while the feature space and the computational cost were reduced. Thus, it is inferred that significantly localized variations of texture are proven crucial for canopy height characterization, instead of features that do study texture in larger neighborhoods. The observed advantage of features extracted with small moving windows, over features with large ones, supports the selection for per pixel calculation of values and their averaging over each object. High accuracies were achieved among different classifiers' application, with ensemble decision trees Random Forest, Bagging with REPTree, and SVM with rbf kernel outperforming the single trees approaches. Decision tree ensemble classifiers, in particular bagging trees, have been consistently among the top-performing classifiers in both canopy height estimation [33] and habitat classification applications [77] and can therefore be recommended for similar applications. Taking into account that the practical algorithmic complexity of SVM is between those of AdaBoost.M1 and Bagging (see Section 3.3), it is concluded that the Random Forest method is the best trade-off between classification accuracy and computational complexity.
To the extent of our knowledge, relatively few studies have so far assessed the estimation performance of Sentinel-2 to map forest properties, including the growing stock volume for Mediterranean forests, using regression tree ensembles [78], and canopy cover, as well as the LAI index for boreal and temperate forests, with generalized linear models [79] and multiple linear regression models [80], respectively. Especially for canopy height estimation from Sentinel-2, only the attempt of [5] has been reported so far; a two-layer perceptron trained on a small number (<200) of field plots concluded that predictions from Sentinel-2 have slightly lower errors than those from Landsat for a boreal forest. As such, the exploitation of spaceborne pixel resolution of 10 m, stemming from the simulation of airborne imagery as Sentinel-2 derived, could lead to valuable insights of Sentinel-2 imagery potential for the estimation of canopy height through texture analysis. High overall classification results of up to 85.29% object-based and 91.94% area-based demonstrate the efficacy of the proposed methodology, texture measures, and type of data. Hence, given a moderate amount of reference data, dense vegetation height maps with 10 m GSD can be derived at country-scale. However, further analysis should be done with real Sentinel-2 imagery in order to extract consistent and confident conclusions. Although in situ or LiDAR data provide much more accuracy in comparison to our method, canopy height retrieval from high resolution optical satellite imagery could in some cases reduce the need for expensive LiDAR campaigns. In addition to the high spatial resolution, Sentinel-2 provides a new image up to a frequency of 5 days. In a similar context is the representative example of [6], where Sentinel-2 multispectral images were used to regress per-pixel vegetation height for Gabon and Switzerland using a deep CNN. Suitable spectral and textural features that were applied resulted in a mean absolute error (MAE) of 1.7 m in Switzerland and 4.3 m in Gabon, correctly estimating heights up to 50 m.
Extensive research has demonstrated that classification based on multiresolution analysis methods resembling the human vision system shows effectiveness over traditional techniques that do not make strong use of region-based spatial information [81]. Hence, these methods are widely used for the classification of textures and for many image analysis tasks. The most commonly employed multiresolution analysis techniques include the Gabor transformation [82] and the wavelet transformation [83], where the texture image is transformed to local spatial-frequency representation by wrapping the image with appropriate band-pass filters tuned to specific parameters. In this study, however, the spatial information for the MRA was derived over a range of resolutions, choosing the subsampling simple aggregation method instead of scale-space transform or wavelet decomposition, achieving area-based classification accuracies of over 91% in total. MRA seems to provide significant influence in classification performance, in general, compared to other approaches studying different set of parameters and data processing methods. The 3-level fusion of spatial resolutions allows studying large areas in hierarchical mode at reasonable processing times and a low computational cost, giving the freedom to customize each level individually, e.g., spatial and spectral resolution, moving window dimensions, and classifier selection.
Numerous actions can be taken towards a potential increase of the classification accuracies, either by the utilization of a wider region in the spectrum to enhance the feature analysis or the employment of images from both the peak and decline of vegetative periods. However, the applied methodology and the available dataset pose constraints for assessing the robustness of the method to different spectral and vegetation conditions. Worth mentioning here is that a future merging of adjacent objects belonging to the same height class is not a good practice, since the target is to examine the classification of land objects of different habitat categories. It is desirable also for the methodology to make decisions for small objects and moving windows, extracting the highest degree of height information from the image. The detailed segmentation of the LC map, however, creates regions of small dimensions, which end up being rejected from the process.
The reported classification accuracies pertain to areas of BFNP having a small degree of inclination (see Figure 2). The utility of the approach at higher inclined areas was not examined. Further research steps comprise: (a) the consideration of multiple areas whereupon investigation may reveal how inclination-and species-dependent features are, or how they are affected by the growing stage of a single species; (b) the assessment of supervised (e.g., PCA, LPP, NPE, Isomap) and unsupervised (e.g., LDA) dimensionality reduction techniques or feature subset selection (e.g., correlation-based feature selector (CFS)) methods, aiming at reducing the processing cost of both feature extraction and classification processes with insignificant loss in accuracy; (c) the use of multiple classifiers trained on feature vectors of reduced size as a more natural alternative; thus, in regions where not all of the texture features would have been calculated (but only some of them), only the trained classifier of that specific features would have been utilized; and (d) the use of very deep convolutional neural networks (CNNs) which are able to learn a tailored multilevel feature encoding for a given prediction task from raw images, given a sufficient (large) amount of training data.

Conclusions
The majority of previous studies employing texture analysis focused on coniferous [32,84] and hardwood [32] forests, or oak, beech, and spruce trees [40] of several meters high. In the present work, a wide range of tree species were assessed, including coniferous, deciduous, dead woods lying and regenerated, ecotones, meadows, mixed stands, scrub pines, both with various height classes ranging from less than 5 cm to some tens of meters. In this context, there is a high likelihood that the approach can be scaled up to global coverage.
Texture descriptors based on local variance, entropy, and local binary patterns, followed by object-based classification, proved particularly effective in discriminating among different height classes related to habitat categories, useful for habitat monitoring and land use mapping applications. Resulting in classification accuracies of over 85% for the simulated Sentinel-2 image, or over 91% in fusion of original 40 cm, and simulated 2 m and 10 m images (MRA), they indicate their high potential for canopy height estimation.
Being effective in such a complex area of BFNP supports the expectation that the methodology can be applicable in other geographical areas, especially of similar climatic and topographic conditions or with a similar vegetation and forest type; however, the degree of transferability is still to be further investigated. The suggested approach does not intend to replace data sources offering more accurate and dense canopy height information, such as the LiDAR-derived CHM used as reference layer, but rather to act as a cost-effective and higher-frequency-monitoring surrogate of more expensive and resource-demanding approaches. Funding: This research study leading to these results has been funded and supported by the European Union's Horizon 2020 research and innovation program under Grant Agreement No. 641762, ECOPOTENTIAL (http://www.ecopotential-project.eu/). Data has been provided by the Data Pool Initiative of the Bohemian Forest Ecosystem.

Acknowledgments:
We gratefully acknowledge the support of Kalliroi Marini, for her initial literature review and involvement in draft preparation. Furthermore, authors greatly appreciate the useful explanations related to the acquisition and creation process of LiDAR-derived digital models and true-color orthomosaics that have been given from Milan Flug GmbH and ILV-Fernerkundung GmbH companies, respectively. Part of the experiments have been conducted during the internship of C.B. at AeroPhoto Co. Ltd. company; hence, credits for technical support and access to computing resources shall be given.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: