Urban Tree Health Classification Across Tree Species by Combining Airborne Laser Scanning and Imaging Spectroscopy

Declining urban tree health can affect critical ecosystem services, such as air quality improvement, temperature moderation, carbon storage, and biodiversity conservation. The application of state-of-the-art remote sensing data to characterize tree health has been widely examined in forest ecosystems. However, such application to urban trees has not yet been fully explored—due to the presence of heterogeneous tree species and backgrounds, severely complicating the classification of tree health using remote sensing information. In this study, tree health was represented by a set of field-assessed tree health indicators (defoliation, discoloration, and a combination thereof), which were classified using airborne laser scanning (ALS) and hyperspectral imagery (HSI) with a Random Forest classifier. Different classification scenarios were established aiming at: (i) Comparing the performance of ALS data, HSI and their combination, and (ii) examining to what extent tree species mixtures affect classification accuracy. Our results show that although the predictive power of ALS and HSI indices varied between tree species and tree health indicators, overall ALS indices performed better. The combined use of both ALS and HSI indices results in the highest accuracy, with weighted kappa coefficients (Kc) ranging from 0.53 to 0.79 and overall accuracy ranging from 0.81 to 0.89. Overall, the most informative remote sensing indices indicating urban tree health are ALS indices related to point density, tree size, and shape, and HSI indices associated with chlorophyll absorption. Our results further indicate that a species-specific modelling approach is advisable (Kc points improved by 0.07 on average compared with a mixed species modelling approach). Our study constitutes a basis for future urban tree health monitoring, which will enable managers to guide early remediation management.


Introduction
Urban trees play a crucial role in mitigating urban environmental problems by providing a range of crucial ecosystem services, e.g., reducing air pollution, moderating temperatures, reducing stormwater runoff and storing carbon [1,2]. Increased recognition of the high value of trees to urban environments has encouraged municipal tree planting programs around the world [3]. Although it has been reported that trees at higher, cooler latitudes may benefit from urban warming [4,5], urban trees, in general, are under continuous pressure from environmental problems typical to the urban environment. Paved surfaces modify the moisture dynamics of underlying soils [6], increasing the risk of water stress for trees, and therefore, their susceptibility to pests [7][8][9]. The reduced soil aeration and limited rooting The heterogeneity of tree species found in urban environments further complicates urban tree health mapping. Xiao and McPherson [47] used the NDVI for a binary classification of healthy vs. unhealthy trees at the campus of the University of California, Davis. They found it was hard to map tree health across different species or in places where many tree species co-existed. For example, some healthy, well-grown conifer trees had the same NDVI value as an unhealthy broadleaf deciduous tree. Indeed, tree characteristics (e.g., leaf size, water and pigment content and tree structure) may vary among healthy trees of different species, and as a result, serious tree health classification errors could be produced when applying the same classification model across different tree species. Therefore, there is a clear need to identify the most informative spectral and structural indices related to tree health across species.
The overall aim of this study was to assess the potential of the synergetic use of ALS and airborne HSI for urban tree health mapping. To this end, a tree health classification procedure using a Random Forest (RF) classifier with input indices derived from the two data sets was established. In particular, classification was conducted for three dominant broad-leaved species in Brussels, Belgium. Based on the presented research progress on urban tree health, and known hurdles for the widespread adoption of high-resolution remote sensing data in urban tree health assessment, we defined three specific research questions: (1) What are the differences in the predictive performance of ALS data, airborne HSI and their combination for different aspects of urban tree health (i.e., defoliation, discoloration and their combination, i.e., damage)? (2) To what extent do tree species mixtures affect the accuracy of urban tree health classification? (3) Which input indices derived from ALS and airborne HSI are most important/informative for tree health classification?

Airborne Laser Scanning and Hyperspectral Imagery
Airborne laser scanning (ALS) data, covering the eastern part of Brussels, Belgium (Figure 1), was collected on 11 June 2015, by Aerodata Surveys Nederland BV using a full-waveform sensor with a wavelength of 1550 nm. The sensor was mounted on a plane with an altitude of 3000 m above the ground. The average point density was 15 points/m 2 . After removing noise points, a digital terrain model (DTM), digital surface model (DSM) and normalized digital surface model (nDSM) were derived using LAStools software (http://lastools.org) with a spatial resolution of 0.25 m. In addition, a normalized ALS point cloud was obtained by calculating the height difference between the original ALS points without noise points and the DTM layer, which was subsequently used for deriving indices relevant to tree health (see Section 2.4).

Field Data Collection
Collection of field data was carried out in June 2015, coinciding with the remote sensing data acquisitions, for a total of 131 trees. The trees were randomly selected across the study area ( Figure  1) and covered a range of different sizes and health status. Tree height, diameter at breast height (1.37 m; DBH) and crown diameter in a north-south direction and west-east direction were recorded. The health status was scored by two independent researchers using the FAO (The Food and Agriculture Airborne hyperspectral imagery (HSI) over the same area was acquired on 30 June 2015, using the Airborne Prism Experiment (APEX) sensor mounted on a plane, at a flying height of about 3600 m. An automated processing chain at the Flemish Institute for Technological Research was used to preprocess the imagery [48]. The radiometric, spectral, and geometric calibration was performed by means of calibration cubes generated from data measured and collected on the APEX Calibration Home Base (CHB) hosted at DLR Oberpfaffenhofen, Germany [49]. To reduce smile effects (i.e., the central wavelength depends slightly on the column pixel location) and spectral instabilities caused by pressure and/or temperature variations, an in-flight spectral wavelength shift analysis was conducted on the basis of atmospheric absorption features to reassign new central wavelengths to each pixel. The geometric correction was performed via direct georeferencing. Input data from the sensor's GPS/IMU, boresight correction data and the derived DTM/DSM from ALS data were further used during the geometric correction process. The data were then projected to a Belgium Lambert 72 coordinate system. The atmospheric correction was conducted using a MODTRAN4 radiative transfer model [50,51]. The resulting spectral data comprised 285 spectral bands distributed across the 412-2431 nm spectral range, with a spatial resolution of 2 m. Atmospheric absorption bands (412-450 nm, 1340-1500 nm, 1760-2020 nm, and 2350-2431 nm) were then excluded from further analysis.

Field Data Collection
Collection of field data was carried out in June 2015, coinciding with the remote sensing data acquisitions, for a total of 131 trees. The trees were randomly selected across the study area ( Figure 1) and covered a range of different sizes and health status. Tree height, diameter at breast height (1.37 m; DBH) and crown diameter in a north-south direction and west-east direction were recorded. The health status was scored by two independent researchers using the FAO (The Food and Agriculture Organization) guidelines for the assessment of forest crown conditions [22,23]. Crown defoliation and discoloration of each tree were separately estimated on a 0-3 score, and the damage scores were obtained by combining the defoliation and discoloration scores according to Lakatos et al. [22] ( Table 1). Therefore, each tree was labeled with three scores indicating the classes of defoliation, discoloration and damage, respectively. Because of the rare occurrence of trees with a crown defoliation or discoloration score of 3, we decided to merge these trees into class 2. Therefore, class 2 in this paper corresponds to trees with more than 25% crown defoliation or discoloration. The final distribution of tree health scores is shown in Figure A1. We focused on the four most dominant tree genera in Brussels, including Acer (mainly A. pseudoplatanus and A. platanoides; n = 37), Aesculus (A. hippocastanum; n = 34), Platanus (P. x acerifolia; n = 18) and Tilia (mainly T. x euchlora and T. x europaea; n = 42). Four tree health datasets were created for tree health classification (see Section 2.6.1): (1) Acer trees, (2) Aesculus trees, (3) Tilia trees, and (4) all the 131 trees. Platanus trees were not used as an independent dataset, due to the small sample size.

Tree Crown Identification and Delineation
We utilized an object-based classification procedure described in Degerickx et al. [46] to isolate tree canopies from non-tree objects. The main input data included a rasterized tree index [52] with a spatial resolution of 0.25 m generated using the normalized ALS point cloud, the nDSM and a NDVI image calculated from the HSI. The rasterized tree index was used to initially isolate tree canopies, and non-tree objects and the NDVI image was subsequently used to remove the building edges which were misclassified as tree canopies. Please refer to Degerickx et al. [46] for more details about the tree crown identification algorithm. The correct tree objects were then delineated using a watershed segmentation algorithm to obtain individual tree crowns. Tree crown identification and delineation were conducted in eCognition 9.4 (http://www.ecognition.com/).
To assess the accuracy of automatically delineated tree crowns (resulting layer), we manually drew tree crown polygons (reference layer) for randomly selected 40 trees (10 trees per tree species) based on visual interpretation of the nDSM and ortho-photo images with 7.5 cm spatial resolution. The overlapping proportions (OPs) between the polygon sets calculated by setting each of them as the basis layer, respectively (i.e., OP ref and OP result ) were used to assess the accuracy of the delineation algorithm [53].

ALS Intensity Normalization
ALS point cloud intensity has been recommended as important information for forest condition assessment [54]. The intensity values recorded by the ALS system correspond to the amount of energy reflected from the target to the laser sensor and relate to radiometric properties of the target. Other factors, such as environmental factors, angle of incidence, and the sensor-target distance (or range) also have an effect on the intensity [55]. Therefore, intensity normalization is important for successful tree health detection using ALS data. In this study, we used a distance-based approach [56] (Equation (1)) to normalize the intensity data, which are on an 8-bit scale. This method is based on the fact that the distance from the target to the ALS sensor and reflectivity of the target are the two important factors directly affecting the intensity values.
where I normalized is the normalized intensity value, I is the raw intensity value, R is the distance between the ALS sensor and the point, and R ref is the reference distance (average flight altitude). Ideally, R is calculated using the GPS time when the point was recorded combined with the plane position assigned to the point. Due to lack of information on flight paths, we used an alternative method to calculate R, which is the difference of the average flight height and the height of each point [57].

ALS Points Classification
ALS points for each tree were isolated using the automatically delineated tree crown polygon. To accurately calculate indices from ALS points, we developed a procedure to classify the normalized ALS points (cf. Section 2.1) for each tree as ground points, trunk points (under the tree crown) and crown points (the points within a tree crown). The crown points were further classified as leaf points and points from woody materials. First, the normalized ALS points with a height below 1 m were labeled as ground points and the remaining points were regarded as non-ground points. Next, the non-ground points for each tree were sliced into bins with a height of 0.2 m along the z-axis. Bins having a 2D projected area, which was calculated based on the x and y coordinates of the outmost points within the bin, smaller than a threshold were regarded as part of the trunk [58] (Figure 2). The threshold value in this study was set in a dynamic way, matching the basal area calculated based on the DBH measured in the field. The points in bins with a 2D projected area larger than the calculated basal area were labeled as canopy points. Finally, the canopy points for each tree were classified as leaf points or woody materials points based on the values of range-corrected intensity as many researchers found the intensity of ALS pulses reflected from vegetation foliage and woody materials to be different [59,60]. Based on the histogram distribution of the intensities of extracted trunk points, we found ALS pulses at 1550 nm reflected weakly from woody materials. Therefore, canopy points with an intensity higher than 134DN (95th percentile intensity value of all trunk points) were classified as leaf points and the remaining points were regarded as woody materials. This particular intensity threshold closely matches the threshold of 130DN determined by Shendryk et al. [30].
Remote Sens. 2020, 12, x 6 of 26 to be different [59,60]. Based on the histogram distribution of the intensities of extracted trunk points, we found ALS pulses at 1550 nm reflected weakly from woody materials. Therefore, canopy points with an intensity higher than 134DN (95th percentile intensity value of all trunk points) were classified as leaf points and the remaining points were regarded as woody materials. This particular intensity threshold closely matches the threshold of 130DN determined by Shendryk et al. [30].
(a) (b) Points in bins with a 2D convex hull area smaller than 0.23 m 2 and higher than 1 m were labeled as trunk points (in brown). Note that points from upper crown were excluded to avoid misclassification.

Deriving Tree Crown Specific ALS Indices
For each tree, 140 ALS indices, including height indices, intensity indices, point density indices, tree size and shape indices, were calculated (Table 2) from the ALS point cloud. The selection of indices was based on their known relevance for tree health assessment [26,29,30,54].

Hmad
Median Absolute Deviation of the height of the tree points: median(abs(H-Hp50)) Hipr Inter-Percentile Range of the height of the tree points: Hp75-Hp25 Hravg Ratio of the number of tree points above Havg to the number of all tree points Points in bins with a 2D convex hull area smaller than 0.23 m 2 and higher than 1 m were labeled as trunk points (in brown). Note that points from upper crown were excluded to avoid misclassification.

Deriving Tree Crown Specific ALS Indices
For each tree, 140 ALS indices, including height indices, intensity indices, point density indices, tree size and shape indices, were calculated (Table 2) from the ALS point cloud. The selection of indices was based on their known relevance for tree health assessment [26,29,30,54]. Table 2. Indices calculated from airborne laser scanning point cloud (H is the height of an individual point, and I is the range-corrected intensity of an individual point).

Height Indices Hmin
Minimum height of crown points Hmax Maximum height of the tree points Hran Difference between Hmax and Hmin Havg Average height of the tree points Hstd Standard deviation of the height of the tree points Hcv Coefficient of variation of the height of the tree points Hvar Variance of the height of the tree points Hske Skewness of the height of the tree points Hkur Kurtosis of the height of the tree points Hp05-Hp95 Height percentiles (i.e., 5,10, . . . ,95) of tree points Haad Average Absolute Deviation of the height of the tree points: mean(abs(H-Havg)) Hmad Median Absolute Deviation of the height of the tree points: median(abs(H-Hp50)) Hipr Inter-Percentile Range of the height of the tree points: Hp75-Hp25 Hravg Ratio of the number of tree points above Havg to the number of all tree points Hrmed Ratio of the number of tree points above Hp50 to the number of all tree points Hqav Average of the quadratic height of tree points

ALS Indices Description
Height Indices

Hm0_10n-Hm90_100n
Average height of tree points between intervals of the height percentiles (i.e., 0-10, 10-20, . . . ,90-100) normalized by Havg Intensity Indices Iam Average intensity of tree laser returns above Hp50 Ibm Average intensity of tree laser returns below Hp50 Imin Minimum intensity of tree laser returns Imax Maximum intensity of tree laser returns Iran Difference between Imax and Imin Iavg Average intensity of tree laser returns Istd Standard deviation of intensity of tree laser returns Icv Coefficient of variation of intensity of tree laser returns Iske Skewness of intensity of tree laser returns Ikur Kurtosis of intensity of tree laser returns Iaad Average Absolute Deviation of intensity of tree laser returns: mean(abs(I-Iavg)) Imad Median Absolute Deviation of intensity of tree laser returns: median(abs(I-Ip50)) Ip05-Ip95 Intensity percentiles (i.e., 5,10, . . . ,95) of tree laser returns Iipr Inter-Percentile Range of intensity of tree laser returns: Ip75-Ip25 Ivar Variance of intensity of tree laser returns

Iravg
Ratio of the number of tree laser returns with an intensity higher than Iavg to the number of all tree laser returns

Irmed
Ratio of the number of tree laser returns with an intensity higher than Ip50 to the number of all tree laser returns Iqav Average of the quadratic intensity of tree laser returns

Pixel Selection
As there is high possibility that HSI pixels at the border of tree crown polygons were contaminated by background materials (e.g., impervious surfaces, shrubs and grass), the border pixels were removed before extracting tree crown specific pixels. In addition, we used a threshold of 75% of the maximum brightness within the tree crown to further remove shaded crown pixels. The pixel selection procedure remained at least three pixels within the tree crown for 95% of trees. The remaining HSI pixels contained within the tree crown polygon were used for further spectral characterization of individual trees.

Deriving Tree Crown Specific HSI Indices
Building upon previous tree health studies [27,28,37], we calculated 61 existing spectral indices (HSI indices; Table 3) for each selected HSI pixel enclosed within the automatically delineated tree crown. The HSI indices are associated with the absorption features of chlorophyll, carotenoid, anthocyanin, xanthophyll, water and tree structure. Subsequently, each tree crown specific HSI index was averaged per tree crown.

Random Forest Classification
Random forest (RF), introduced by Breiman [103], is a non-parametric classifier using a bootstrapped set of training samples and subsets of input features to build a large number of classification trees, from which the final classification result is determined as a voting result. When each tree was built, one-third of training samples, called out-of-bag (OOB) samples were left out. For each OOB sample, the classification result was determined by the majority vote from the trees generated without using this sample. An RF classifier is particularly suitable for this study because (i) it has shown to perform well even when the number of features is much larger than the number of observations, (ii) it does not require normalization [104], and (iii) it returns measures of feature importance [105].

Classification Scenarios
Four classification scenarios were defined in this study. Scenarios 1-3 aimed at comparing the performance of ALS indices, HSI indices and their combination for predicting tree health. In scenario 1 (ALS models), each of the three tree health indicators (defoliation, discoloration and damage) was classified into class 0, 1, or 2 for datasets 1-3 separately (Acer spp., Aesculus spp. and Tilia spp.; cf. Section 2.2) using 140 ALS indices. This means a total of nine classification models was created in scenario 1 (three tree health indicators × three species). The same strategy was applied in scenarios 2 and 3, yet with other input indices (61 HSI indices in scenario 2 (HSI models) and all indices in scenario 3 (ALS-HSI models)). In scenario 4 (ALS-HSI mixed models), the effects of tree species mixtures on defoliation, discoloration and damage classification were examined using dataset 4 (i.e., the mixed species dataset, cf. Section 2.2) and all indices.

Classification Procedure
We used leave-one-out cross-validation (LOOCV), which was recommended for small sample sizes in previous studies (such as DREAM challenge dataset with 35 samples [106]) to train and test RF models. In LOOCV, each training set is created by taking all the samples except one, the test set being the sample left out. Therefore, for n samples, we had n different training sets and n different test sets. The distribution of the health classes in the study area was unbalanced, with healthy trees appearing more frequently than unhealthy trees ( Figure A1). To avoid high omission errors of the minority class, a balanced RF classifier was applied [107]. In this way, the accuracy of majority and minority classes was equally estimated by the classifier. The two important parameters for RF classification are the number of classification trees (ntree) and the number of features selected at each split in the tree building process (mtry). ntree was set at 500 and mtry was set at sqrt(M) (M being the number of input indices) based on previous work [26,54,105]. The classification procedure was conducted in Python using the scikit-learn library. Classification accuracy was assessed using a weighted kappa coefficient (Kc) and overall accuracy (OA) based on the predicted and observed scores of the n test sets. For ordinal categories, a weighted Kc is more meaningful as misclassification by more than one category was penalized more heavily, making the Kc values conservative [108,109]. Note that in scenario 4 (mixed tree species), the Kc and OA were additionally calculated for Acer spp., Aesculus spp. and Tilia spp. separately to compare with the results of scenario 3. It is worth noting that although there have been some concerns on the suitability of the Kc in accuracy assessment [110,111], such as the substantial difficulties in its interpretation, the risks of misleading by using the Kc in the current study are low as our focus is on comparing different models using the same samples.

Feature Importance and Selection
One of the features of RF is its ability to evaluate the importance of each input feature. Here, the importance of each index was calculated using the method of average decrease in impurity (MDI), which is calculated over all trees of the ensemble [112], and is found to be more appropriate for smaller sample sizes [113]. Based on the average rank of each index in the n models, important indices were identified even if they were highly correlated and performed similar functions.
Although RF can handle a large set of input features, many irrelevant input features can decrease the model accuracy, due to the overfitting problem. The process of feature selection can identify small sets of features that can still achieve good predictive performance [105]. Existing feature selection techniques can be classified as either filter approaches or wrapper approaches [114]. Filter approaches, which determine the relevance of features using training data alone, are relatively computationally cheap. Due to the relevant features being selected independently of the learning algorithm, they may, however, not match the chosen algorithm [115]. Wrapper approaches call on learning algorithms for evaluating multiple subsets of input features, are reported to produce higher accuracy than filter approaches [115,116], but are computationally intensive. In this study, we used a forward selection method (a wrapper approach) in combination with the importance ranking to select the smallest set of indices that could produce the highest accuracy. The correlated indices were retained in the final classification as a preliminary test indicated decreased classification accuracies when removing the less important indices which are highly related (Pearson's correlation coefficient > 0.9) to the indices with a high importance ranking. The classification accuracies reported in this study were based on the forward selection method.

Statistical Analysis
To examine the structural and spectral differences represented by the ALS and HSI indices between the four tree species, we conducted Tukey's HSD test for each index between healthy trees (damage score = 0) of the four tree species using the module of statsmodels in Python.

Tree Crown Identification and Delineation
According to the overlapping proportions (OPs) between the manually and automatically delineated tree crown polygons, 67% of individual tree crowns showed a good match between the two sets of polygons (OP ref > 90% and OP result > 90%). Twenty-three percent of tree crowns were slightly over-segmented (75% < OP result < 90% and OP ref > 90%), and 15% of tree crowns were slightly under-segmented (75% < OP ref < 90% and OP result > 90%). The accuracy assessment proved that 91% of tree crowns were properly delineated. Table 4 shows the classification accuracies for individual tree species in -ALS models (scenario 1), HSI models (scenario 2) and ALS-HSI models (scenario 3). The predictive performance of ALS indices and HSI indices varied between tree species. ALS models worked better than HSI models when classifying tree crown defoliation and damage (i.e., combination of defoliation and discoloration) for Acer spp. and Aesculus spp., while opposite results were found for Tilia spp. In tree crown discoloration classification, ALS models performed better than HSI models for Acer spp. and Tilia spp., while HSI models produced a slightly higher accuracy for Aesculus spp. The average Kc values across the three tree species were 0.73 (ALS models) and 0.68 (HSI models) for defoliation; 0.61 (ALS models) and 0.57 (HSI models) for discoloration; and 0.72 (ALS models) and 0.64 (HSI models) for damage ( Figure 3). Table 4. Classification results in scenarios 1-3 in which Random Forest models were trained and tested using ALS indices (ALS models in scenario 1), hyperspectral imagery (HSI) indices (HSI models in scenario 2) and their combination (ALS-HSI models in scenario 3), respectively, for individual tree species (datasets 1-3, cf. Section 2.2). The highest accuracies for predicting each tree health indicator for each species are highlighted in bold. models were compared with their counterparts in scenario 3 ( Figure 4). ALS-HSI models calibrated per tree species almost systematically outperformed the ALS-HSImixed models, with an average increase in Kc points of 0.15, 0.03, and 0.05 for Acer spp., Aesculus spp., and Tilia spp., respectively. When focusing on the tree health indicators, the highest average increase in Kc points (0.12) was found for defoliation, followed by damage (0.09) and discoloration (0.01). Table 4. Classification results in scenarios 1-3 in which Random Forest models were trained and tested using ALS indices (ALS models in scenario 1), hyperspectral imagery (HSI) indices (HSI models in scenario 2) and their combination (ALS-HSI models in scenario 3), respectively, for individual tree species (datasets 1-3, cf. Section 2.2). The highest accuracies for predicting each tree health indicator for each species are highlighted in bold.   Table 5 shows the 17 most important indices (a maximum of 17 indices was required to achieve the best classification accuracy) determined by the MDI method in the ALS-HSI models and ALS-HSImixed models for tree crown defoliation, discoloration and damage classification. The importance ranking of indices varied considerably between tree species. The ALS indices that appeared most frequently in the 17 most important indices were those related to the ALS point density (i.e., FG and FC in 10 models, DEN and EWI in seven models and GT in five models), tree size, and shape (i.e.,  (Table 4). For the majority of tree health indicators and tree species, the highest accuracy was obtained using ALS-HSI models. The increase in Kc points, compared with ALS models, ranged from −0.01 to 0.13 with an average of 0.05, and compared with HSI models from 0.01 to 0.20 with an average of 0.11. The average Kc values for the ALS-HSI models for tree crown defoliation, discoloration and damage classification were 0.77, 0.68, and 0.77, respectively, and the corresponding OA values were 0.87, 0.84, and 0.88, respectively (Figure 3). When considering the performance of ALS-HSI models between tree species, the health condition of Tilia trees was best predicted (average Kc = 0.78 for the three tree health indicators), followed by Aesculus trees (average Kc = 0.75 for the three tree health indicators) and Acer trees (average Kc = 0.68 for the three tree health indicators) ( Table 4).

Tree Health Classification for Mixed Species
Recall that scenario 4 differed from scenario 3 (ALS-HSI models) in that way that in scenario 3 the classification was performed on individual species. Scenario 4, in which the combination of ALS and HSI indices was used to train and test tree health classification models for the combination of tree species (all the 131 sampled trees; cf. Section 2.2) (ALS-HSI mixed models), enabled us to examine the effect of mixed species on tree health classification. The final Kc values for defoliation, discoloration and damage were 0.65, 0.68, and 0.70 with corresponding OA values of 0.78, 0.81, and 0.81, respectively (Figure 3). The Kc and OA values calculated for individual species in ALS-HSI mixed models were compared with their counterparts in scenario 3 (Figure 4). ALS-HSI models calibrated per tree species almost systematically outperformed the ALS-HSI mixed models, with an average increase in Kc points of 0.15, 0.03, and 0.05 for Acer spp., Aesculus spp., and Tilia spp., respectively. When focusing on the tree health indicators, the highest average increase in Kc points (0.12) was found for defoliation, followed by damage (0.09) and discoloration (0.01).
Remote Sens. 2020, 12, x 13 of 26 WH, RCHVH, and CD in nine models and CA in seven models). The height and intensity indices did not significantly contribute to tree health classification. The most important HSI indices were associated with spectral features of chlorophyll absorption (e.g., CCI in eight models, VOG1 and VOG2 in seven models, REP in six models and ICHL, DattNIRCabCx + c and MTCI in five models) and leaf area index (i.e., ILAI in six models). In an attempt to identify the most consistent indices across tree species, we found that in ALS models (scenario 1) EWI, FG, and FC were important for all the three tree species for tree crown defoliation classification; DEN, EWI, FG, and FC were consistently important for discoloration and damage classification (Table A1). When HSI indices alone were used for classification (HSI models; scenario 2), the consistently important indices were: MTCI and VOG3 for defoliation; MTCI, REP, and HI for discoloration; REP and VOG3 for damage (Table A2). In ALS-HSI models (scenario 3), the consistent indices across species for discoloration classification included FC, DEN, FG, and EWI (Table 5). No consistent indices were found for defoliation and damage in this case.

Figure 4.
Comparison of tree health classification accuracy for individual species between ALS-HSI models in scenario 3 (models trained and tested for individual tree species using both ALS and HSI indices) and ALS-HSI mixed models in scenario 4 (models trained and tested for mixed tree species using both ALS and HSI indices). Table 5 shows the 17 most important indices (a maximum of 17 indices was required to achieve the best classification accuracy) determined by the MDI method in the ALS-HSI models and ALS-HSI mixed models for tree crown defoliation, discoloration and damage classification. The importance ranking of indices varied considerably between tree species. The ALS indices that appeared most frequently in the 17 most important indices were those related to the ALS point density (i.e., FG and FC in 10 models, DEN and EWI in seven models and GT in five models), tree size, and shape (i.e., WH, RCHVH, and CD in nine models and CA in seven models). The height and intensity indices did not significantly contribute to tree health classification. The most important HSI indices were associated with spectral features of chlorophyll absorption (e.g., CCI in eight models, VOG1 and VOG2 in seven models, REP in six models and ICHL, DattNIRCabCx + c and MTCI in five models) and leaf area index (i.e., ILAI in six models). Table 5. The 17 most important indices in ALS-HSI models (scenario 3; Acer, Aesculus and Tilia) and ALS-HSI mixed models (scenario 4; Mixed) for tree crown defoliation, discoloration and damage (i.e., combination of defoliation and discoloration) classification. ALS indices were underlined to distinguish them from HSI indices. The indices used in the final classification models (according to the forward selection method; cf. Section 2.6.3) are highlighted in bold. The explanation of the index abbreviations can be found in Tables 2 and 3.  In an attempt to identify the most consistent indices across tree species, we found that in ALS models (scenario 1) EWI, FG, and FC were important for all the three tree species for tree crown defoliation classification; DEN, EWI, FG, and FC were consistently important for discoloration and damage classification (Table A1). When HSI indices alone were used for classification (HSI models; scenario 2), the consistently important indices were: MTCI and VOG3 for defoliation; MTCI, REP, and HI for discoloration; REP and VOG3 for damage (Table A2). In ALS-HSI models (scenario 3), the consistent indices across species for discoloration classification included FC, DEN, FG, and EWI (Table 5). No consistent indices were found for defoliation and damage in this case.

Data and Feature Selection for Urban Tree Health Assessment
For urban tree health assessment, selection of proper remote sensing data is crucial. However, few studies have conducted comparison between data types for tree health assessment across different tree health indicators. Our results show that ALS models outperformed HSI models in six out of the nine classifications (Table 4), which seems to indicate a better performance of ALS models. Tree crown defoliation, related to the amount of within-canopy gaps, increases penetration of ALS pulses through tree canopies and subsequently results in a greater portion of ground returns [59,117]. In our study, the good performance of ALS indices for defoliation classification, especially for Acer spp. and Aesculus spp., corroborates results of previous studies in which ALS was found to be robust in detecting fire-and insect-induced leaf loss in forest canopies [26,29,118]. According to the MDI method, FC, FG, and EWI which are all related to the ALS point density (Table 2), were consistently important across different species (Table A1), which confirms the usefulness of these ALS indices in modeling defoliation. EWI was also found to be able to explain the variance related to tree crown transparency in Shendryk et al. [30]. Four tree size and shape indices (i.e., CD, CA, RCHVH, and WH) which were calculated based on the spatial distribution of ALS points (Table 2) were also important for defoliation classification (Table 5, Table A1). This makes us argue that these indices could characterize the geometric changes, due to defoliation. Similarly, Yao et al. [119] also suggested the contribution of using these geometry-related indices in the detection of standing deadwood.
With regard to discoloration classification, ALS models were able to produce higher classification accuracy for Acer spp. and Tilia spp. and a similar accuracy for Aesculus spp. compared with HSI models (Table 4). Especially for Tilia spp., several intensity indices were selected for discoloration classification (Table A1), indicating that the radiometric properties of the target are sensitive to tree crown discoloration. However, Table A1 also shows that the selected ALS indices for discoloration classification were mainly related to point density, tree size, and shape, which are conceptually unrelated to discoloration. A plausible explanation for this phenomenon might be found in the strong positive correlation between our field assessed defoliation and discoloration scores (discoloration score equal to defoliation score for 78.6% sampled trees; Figure A1a). Also note that classification performance of discoloration based on ALS data was highest for Tilia spp., i.e., the species with the strongest correlation between defoliation and discoloration scores (discoloration score equal to defoliation score for 81% Tilia trees). Therefore, future tree health monitoring could, to some extent, rely on ALS data for discoloration prediction in case defoliation and discoloration co-exist in most tree crowns in the study area. To better understand the potential of ALS data for identifying tree crown discoloration, further analysis should be conducted on trees experiencing discoloration only. Given the lack of such trees in our datasets, such analysis was not feasible here. As for the poorer performance of HSI, which was expected to have higher predictive power for discoloration classification, it could be explained by: (i) The coarser spatial resolution compared with ALS data, and (ii) the pixels containing information on the mixture of leaves, woody materials and backgrounds thereby having lower applicability in differentiating tree crown discoloration levels for our samples with an unbalanced distribution of discoloration classes ( Figure A1). Figure A1b shows that in our particular case study, overall tree damage scores were more driven by defoliation (damage score equal to defoliation score for 96.2% sampled trees), compared with discoloration (damage score equal to discoloration score for 81.7% sampled trees). This causal effect was also reflected in the classification accuracies, where defoliation and damage classification showed a similar pattern: ALS models outperformed HSI models for Acer spp. and Aesculus spp. but not for Tilia spp. (Table 4). Note that the superiority of ALS models for tree damage classification is likely to be strongly case-specific. It should be further investigated whether the same conclusion would be reached in a case where tree health is mostly driven by leaf discoloration. Due to the strong correlation between defoliation and the other two tree health indicators, we found the ALS indices important for discoloration and damage classification to show large overlap with those for defoliation ( Table 5; Table A1).
Our results show that the overall highest level of accuracy was achieved in ALS-HSI models (average Kc = 0.74 and OA = 0.86; Table 4), highlighting the complementarity of ALS and HSI data for tree health assessment and supporting earlier findings from Shendryk et al. [30] and Meng et al. [29]. Tree health is reflected by both structure and foliar chemistry, especially pigment content, the latter known to be strongly associated with tree crown reflectance properties [21,23]. By explicitly providing this reflectance information by means of HSI, tree health classification was improved from an average Kc of 0.69 (only ALS data) to 0.74 (combination of ALS and HSI). This was highlighted by the important HSI indices determined by the MDI method. We found most important HSI indices were associated with the chlorophyll absorption and red-edge spectral feature (e.g., CCI, VOG1, VOG2, REP, ICHL, DattNIRCabCx + c and MTCI; Table 5), which supports previous studies [27][28][29]. The red-edge spectral region was shown in the 1980s to be sensitive to chlorophyll content [74] and ever since has been applied in research on tree health [120]. Our study confirmed that HSI indices associated with the chlorophyll absorption feature are more relevant compared to other HSI indices in identifying urban tree health [23]. In addition, ILAI, which demonstrated a good relationship with leaf area index in Degerickx et al. [46], was also an important index for predicting tree crown health (Table 5).

Species-Specific vs. Mixed Species Modelling Approach
By constructing both species-specific models (ALS-HSI models) and mixed species models (ALS-HSI mixed models), we quantified the effect of mixed species on classification, which is another crucial issue for urban tree health assessment. Species-specific ALS-HSI models overall showed better performance than their mixed species counterparts (an average increase in Kc points of 0.07 calculated for all three tree species and tree health indicators) ( Figure 4). This difference was found to be larger for Acer spp. (average increase in Kc points of 0.15 for the three tree health indicators) compared to the other tree species. To explore the underlying reasons, we conducted a Tukey's HSD test for each index between healthy trees (damage score = 0) of the four tree species (Table A3). Table A3 shows that significant differences (p < 0.05) in indices between tree species was mainly found between Acer spp. and the other tree species, especially for HSI indices, which indicates that Acer trees observed in our study tended to have different structural and spectral characteristics from other tree species. In this context, a mixed species modelling approach would increase the possibility of misclassification for Acer spp. when using a machine learning algorithm like RF. We therefore predict that, with the increase of tree species diversity, especially with the combination of broadleaved and coniferous species, the performance of mixed species models would rapidly decrease [47]. This should, however, be further verified in follow-up research.
Given the reasoning mentioned above, it is safe to assume that in practice, a species-specific tree health mapping approach will be preferred in case of urban areas with diverse tree species. Some well-developed methods for urban tree species classification have constituted a solid basis for generating such a map. For example, Alonzo et al. [121] mapped 29 common tree species in Santa Barbara, California, USA using fused HSI and ALS data and an RF classifier, with an overall accuracy of 93.5%. Although this additional classification step will undoubtedly increase the workload, the issue could be addressed by using the tree species database, which actually is already available in some cities. Based on such a tree species map or database, sufficient training data on tree health for each tree species should be collected in order to build a robust classification model. On the other hand, we should not ignore that some important indices (e.g., FG, FC, REP, and HI; Table A3) have the potential to be independent of tree species, which might to some extent improve the classification accuracy of a mixed species model. Future urban tree health research should take efforts to search for stable and informative indices which are not affected by tree species. The above-mentioned four indices would constitute an excellent starting point for this endeavor and should be further examined for other tree species.

Conclusions
Considering the threats of increasingly prominent urban environmental problems to tree health, frequent and accurate mapping of tree health is critical to design adaptive tree management strategies. With a complete workflow encompassing individual tree crown delineation, index calculation, feature selection, classifier training, classification and validation, we compared the performance of airborne laser scanning (ALS), airborne hyperspectral imagery (HSI), and their combination, to classify urban tree health. The combined use of both types of data produced the highest level of classification accuracy, confirming their complementarity. Overall, ALS data had better performance in predicting tree health based on our samples. However, due to the stronger dependence of damage (i.e., combination of defoliation and discoloration) scores on defoliation scores than on discoloration scores in our tree samples, the potential of ALS data for predicting damage needs to be further examined in future studies. As a reference for future research, our results highlighted that ALS indices related to point density, tree size, and shape and HSI indices associated with the chlorophyll absorption feature were most important for tree health identification. The consistently important ALS indices (i.e., DEN, EWI, FG, and FC) and HSI indices (i.e., MTCI, VOG3, REP, and HI) across the three tree species in our study, however, should be further tested for their applicability and transferability to other common tree species in an urban context.
The high heterogeneity of tree species in urban environments makes it hard to map tree health on a city scale using remote sensing technologies. In our study, a species-specific tree health modelling approach (ALS-HSI models; scenario 3) proved to yield higher accuracies overall (an average increase in Kc points of 0.07) compared to combining all species into one tree health model (ALS-HSI mixed models; scenario 4). Even though further research is required covering more diverse tree species settings, some practical guidelines can already be formulated based on our findings. In case a quick overview of tree health needs to be obtained over an area with limited tree species diversity, a mixed species modelling approach should be preferred-since it represents the least time-consuming approach. In that case, most attention should be devoted towards identifying ALS and HSI indices which are independent of tree species. However, in case the study area shows a highly heterogeneous tree species pool, the added value of constructing species-specific models becomes truly significant. This can be achieved by involving an available tree species database or a tree species map generated from the same remote sensing data sources. Funding: This research was funded by the Belgian Science Policy Office in the framework of the STEREOIII program (UrbanEARS project (SR/00/307) and BelAir project (SR/01/354)). In addition, the authors acknowledge financial support from the China Scholarship Council.

Conflicts of Interest:
The authors declare no conflict of interest. Table A2. The 17 most important indices in HSI models (scenario 2) for tree crown defoliation, discoloration and damage (i.e., combination of defoliation and discoloration) classification. The indices used in the final classification models (according to the forward selection method; Section 2.6.3) are highlighted in bold. The explanation of the indices abbreviations can be found in Table 3.  Table A3. Results of Tukey's HSD test for the indices values between healthy trees (damage score = 0) of different tree species. a, b, c, and d referred to Acer spp., Aesculus spp., Platanus spp. and Tilia spp., respectively. The difference with a significance level of p < 0.05 are marked in bold. Only the indices whose values showed significant difference between at least two tree species are listed.