An Integration of Linear Model and ‘Random Forest’ Techniques for Prediction of Norway Spruce Vitality: A Case Study of the Hemiboreal Forest, Latvia

Abstract: The increasing extreme weather and climate events have a significant impact on the resistance and resilience of Norway spruce trees. The responses and adaptation of individual trees to certain factors can be assessed through tree breeding programmes. Tree breeding programmes combined with multispectral unmanned aerial vehicle (UAV) platforms may assist in acquiring regular information on individual traits from large areas of progeny trials. Therefore, the aim of this study was to investigate vegetation indices (VI) for detecting the early stages of tree stress in Norway spruce stands under prolonged drought and summer heatwave. Eight plots within four stands were monitored throughout the vegetation season of 2021 by assessing spectral differences between tree health classes (Healthy, Crown damage, New crown damage, Dead trees, Stem damage, Root rot). For all tested VI, our models showed a moderate marginal R² and total explanatory power: for the Normalized Difference Red-edge Index (NDRE), the marginal R² was 0.26 and the conditional R² was 0.49 (p < 0.001); for the Normalized Difference Vegetation Index (NDVI), the marginal R² was 0.34 and the conditional R² was 0.60 (p < 0.001); for the Red Green Index (RGI), the marginal R² was 0.36 and the conditional R² was 0.55 (p < 0.001); and for the Chlorophyll Index (CI), the marginal R² was 0.27 and the conditional R² was 0.49 (p < 0.001). The reliability of the identification of tree health classes for the selected VI was weak to fair (overall classification accuracy ranged from 34.4% to 56.8%, and kappa coefficients ranged from 0.09 to 0.34) if six classes were assessed, and moderate to substantial (overall classification accuracy ranged from 71.1% to 89.6%, and kappa coefficients from 0.39 to 0.71) if two classes (Crown damage and Healthy trees) were tested.


Introduction
Climate change has enormous effects on various environments, including forests, and has already amplified the economic and ecological impacts of damage caused by biotic and abiotic factors [1]. Vulnerability to climate change is expected to make forest management and nature conservation more challenging [2]. These factors are particularly important for coniferous tree species, such as Norway spruce (Picea abies (L.) Karst.), whose regeneration on nutrient-rich soils or in eutrophic forests is possible only by planting. For example, more than 90% of all young Norway spruce stands on fertile forest soils in Latvia have been planted [3], and in Sweden planted spruce makes up more than 40% of the total growing stock [4]. In addition, in many countries spruce is artificially regenerated as pure stands. Spruce monoculture stands are very productive [5] but are also highly susceptible to damaging agents, both biotic and abiotic, including pathogen and insect infestations, wind, and droughts during the vegetation season [6]. Moreover, among a prolonged period of drought and summer heatwave. A decision-tree classifier, the Random Forest classifier, was also used on UAV images to segment tree health status and provide predictions of vegetation health in forested landscapes. In this study, Norway spruce stands were monitored at the beginning of the vegetation period, after a prolonged period of drought and summer heatwave, and at the end of the vegetation period, and were analyzed using different tree vitality groups.

Study Area
Four pure stands of Norway spruce at the Forest Research Station Kalsnava in the eastern part of Latvia (56°41′3″ N, 25°50′28″ E) were selected for the research (Figure 1). The area is characterized by relatively flat terrain with forest stands dominated by Scots pine (Pinus sylvestris), Norway spruce and silver birch (Betula pendula). The study area is around 100 m above sea level. The site has a moderately continental climate (the total annual precipitation is about 700 mm; the mean annual temperature is around 5 °C). The mean length of the vegetation period is around 175 days, usually extending from late April to October. According to the data of the local weather station (~5 km from the study area), the weather conditions (the distribution of precipitation and temperatures) in the vegetation season of 2021 were not typical of those previously observed. The majority of precipitation was recorded at the beginning of the vegetation period, in the first six days of May, followed by 41 days without rain and with relatively high temperatures (a mean daily temperature of 15.4 °C with a maximum mean hourly temperature of 33.5 °C). The selected stands were similar in age and growing conditions. Stand 1 was 37 years old. Stand 2 and Stand 3 were 26 years old; these stands were located on drained mineral soil (forest type Myrtillosa mel.). Stand 4 was 26 years old and was located on drained peat soil (forest type Myrtillosa turf. mel.). At the beginning of this study, during the first image acquisitions, we observed intensive spruce flowering in Stand 1, whereas in the other stands the flowering could be rated as average. The ditches alongside all studied stands had recently been cleaned.

Field Sampling
The field sampling was conducted in eight circular sample plots (500 m², radius = 12.62 m) in May 2021 (after the first flight campaign). The plots were distributed across the stands, ensuring that each plot contained dead trees or trees with visible crown damage and that the distance between the plot centers was at least 26 m. The coordinates of the plot centers were measured with a Leica GS16 Global Navigation Satellite System (GNSS) receiver. Precise coordinates of each individual tree within the sample plot were recorded using a Leica TS06 total station. The total station was positioned in the plot center and pulsed a laser beam to each stem within the plot at 1.3 m height. We excluded smaller trees if they were suppressed by adjacent trees and/or if they were in the shadow of larger trees. Overall, 800 trees were measured, of which 239 trees were cored.
In each plot, the diameter at breast height was measured for each tree, and the tree crown vitality status and types of damage (e.g., ungulate damage, stem or crown damage such as loss of treetop, and other) were recorded. The vitality of crowns was visually assessed from the ground for discolouration and needle density and divided into three groups: visually healthy trees (individuals with no outward signs of stress, e.g., drought-induced decline), trees with noticeable damage to the crown (e.g., yellowish green or yellow needles and brownish treetops) and dead trees (complete loss of green foliage). Each tree was assessed from different angles, and to reduce the bias associated with assessment subjectivity, the same person performed all assessments. Tree age was determined as follows: within each plot, increment cores were taken during the field campaign from 30 trees selected based on the mean NDRE index values for tree crowns after the first flight campaign (10 trees with the lowest NDRE index values, 10 trees with medium NDRE index values and 10 trees with the highest NDRE index values). In addition, increment cores were taken from trees with possible root rot infection (based on visual inspection). Tree increment cores were obtained at breast height (1.3 m) with a Pressler increment borer. The increment cores were processed and measured in the laboratory: they were air dried and sanded with sandpaper, and a LINTAB 5 (RinnTECH, Heidelberg, Germany) measurement system was used to measure tree ring widths with a precision of 0.01 mm. The second field campaign was performed in October 2021, when we updated the tree crown vitality status and/or other damage records.
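The NDRE-based selection of trees for coring described above can be sketched as follows. This is only an illustrative sketch, not the authors' procedure: the function name, the input dictionary and the interpretation of "medium" as the block of trees centred on the median rank are assumptions.

```python
def select_trees_for_coring(ndre_by_tree, n_per_group=10):
    """Pick trees with the lowest, medium and highest mean crown NDRE.

    ndre_by_tree: dict mapping a (hypothetical) tree ID to its mean crown NDRE.
    """
    # Rank tree IDs by mean crown NDRE, from lowest to highest.
    ranked = sorted(ndre_by_tree, key=ndre_by_tree.get)
    if len(ranked) < 3 * n_per_group:
        raise ValueError("not enough trees for three non-overlapping groups")

    lowest = ranked[:n_per_group]
    highest = ranked[-n_per_group:]

    # Assumption: "medium" means the block of ranks centred on the median.
    start = (len(ranked) - n_per_group) // 2
    medium = ranked[start:start + n_per_group]

    return {"lowest": lowest, "medium": medium, "highest": highest}


# Hypothetical plot with 100 trees and made-up NDRE values.
example_ndre = {f"tree_{i:03d}": i / 100 for i in range(100)}
groups = select_trees_for_coring(example_ndre)
print({name: len(ids) for name, ids in groups.items()})
```

With 10 trees per group this yields the 30 cored trees per plot mentioned in the text; trees cored because of suspected root rot would be added separately.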


Data Acquisition
The weather conditions were mostly sunny and windless during the flights; to minimize shadows, the flights were conducted in the morning, before noon, when the lighting conditions were the best. The study area was overflown using a DJI Matrice 210 drone equipped with a SlantRange 3PX (SlantRange, San Diego, CA, USA) multispectral sensor, which is composed of single-band cameras (Green, Red, Red Edge and NIR) at 1.2-megapixel (1248 × 994 pixels) resolution (Table 1). The camera was equipped with an ambient light sensor (weather and position of the sun) to adjust for illumination conditions in each frame, an integrated global positioning system (GPS) and an inertial measurement unit (IMU) system with an extended Kalman filter. The flight altitude was set to 75 m above ground (174 m above sea level) with a ground speed of 5.0 ± 1 m s⁻¹ and with 70% side and 80% frontal overlap for the images. The ground sampling distance or spatial resolution for each pixel was 2.

Photogrammetric Processing
The acquired raw images were processed using SlantView software (SlantRange, Inc., San Diego, CA, USA). To derive quantitative information, the pixels were converted from false colours to reflectance values. Image areas that did not fall within the field of view of all sensors were trimmed to valid content, and the images were exported to Agisoft Metashape Professional (v. 1.6.4). The Structure from Motion (SfM) photogrammetric method was implemented for orthophoto production [40]. The green band was set as the default band for aligning photos, building a dense cloud and generating a digital surface model (DSM) with interpolation and orthomosaics. Ground control points (GCPs) were used to optimize the sparse cloud and to transform the image orientation into the geodetic coordinate system LKS 92 (EPSG:3059). A 4-band multispectral orthomosaic was created and resampled to a pixel size of 10 × 10 cm with the nearest-neighbour technique, as studies [41] have shown that this can reduce image noise and the number of pixels in the output.

Individual Tree Crown Masks
There are many methods and software solutions for the detection and segmentation of tree crowns [41]. Even though they are suitable for a variety of forest scenarios, most of them were developed from data obtained in seed orchards, and when they are implemented in dense forest stands it is problematic to automatically separate adjacent crowns. Therefore, we combined an automated separation of high vegetation from other objects in the image, using the multivariate toolset in ArcGIS 10.5 [42], with manual adjustment to separate individual crowns from neighbouring trees.
Initially, the Isodata segmentation technique was performed with two tools in ArcGIS 10.5 (Iso Cluster and Maximum Likelihood Classification). With the Iso Cluster tool, we created two groups: canopy and ground. The resulting signature file was used as an input for the Maximum Likelihood Classification. From the resulting classified raster, the canopy class was converted into vector-based polygons. However, some treetops needed to be manually separated where a single large polygon had been drawn for a group of adjacent trees (see the example in Figure 2). Single polygons were created, and ground survey data were joined to them based on spatial location. These polygons were used as masks to extract the cells of the orthomosaics. The extracted cells, as a raster file, were converted into a vector-based point feature class. Finally, we extracted the pixel values (reflectance values of each band) of the point features to the attribute table, which was exported as a txt file for calculations of vegetation indices and for further statistical analysis.
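As a rough illustration of what can be done with the exported pixel table, the sketch below computes vegetation indices per pixel and averages them per crown. The row layout (keys `tree_id`, `green`, `red`, `red_edge`, `nir`) is hypothetical. NDVI, NDRE and RGI follow their commonly used definitions; the Chlorophyll Index is assumed here to be the red-edge form (NIR / Red Edge − 1), which may differ from the formula the authors list in Table 2.

```python
from collections import defaultdict
from statistics import mean


def vegetation_indices(green, red, red_edge, nir):
    """Per-pixel indices from band reflectances (zero denominators are not handled)."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "NDRE": (nir - red_edge) / (nir + red_edge),
        "RGI": red / green,
        # Assumption: red-edge chlorophyll index; the paper's Table 2 may differ.
        "CI": nir / red_edge - 1.0,
    }


def crown_means(pixels):
    """Average each index over all pixels that belong to the same crown."""
    per_crown = defaultdict(lambda: defaultdict(list))
    for px in pixels:
        vi = vegetation_indices(px["green"], px["red"], px["red_edge"], px["nir"])
        for name, value in vi.items():
            per_crown[px["tree_id"]][name].append(value)
    return {
        tree: {name: mean(values) for name, values in indices.items()}
        for tree, indices in per_crown.items()
    }


# Made-up reflectance values for two crowns.
example_pixels = [
    {"tree_id": "A", "green": 0.10, "red": 0.05, "red_edge": 0.20, "nir": 0.40},
    {"tree_id": "A", "green": 0.12, "red": 0.06, "red_edge": 0.22, "nir": 0.42},
    {"tree_id": "B", "green": 0.20, "red": 0.10, "red_edge": 0.25, "nir": 0.50},
]
print(crown_means(example_pixels))
```

Averaging after computing the index per pixel is one possible choice; computing the index from crown-mean band reflectances would give slightly different values.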

Vegetation Indices and Statistical Analysis
The relationship between tree health status and the detected reflectance was characterized using various vegetation indices. The selection of vegetation indices was based on previous studies [22,43-45] and on the configuration of the available sensors. Thus, in this study, metrics based on greenness and leaf pigment (chlorophyll, anthocyanin) concentrations were used to indicate early signals of stress and/or the level of tree health (Table 2).
To evaluate the early detection of tree stress, we observed temporal changes in vegetation indices using the linear mixed-effect model (LME). The LME, also known as the variance component model, is a statistical method that is widely used to model dependent data structures, such as clustered data and longitudinal data [55]. The LME incorporates two components: fixed effects and random effects [56]. The fixed effects describe a linear relationship that is common to all the data, whereas the random effects can be used to account for the structure of the data [56]. Statistical measures were used to quantify the separability between tree health classes over one vegetation season in four Norway spruce stands. For this purpose, we classified trees into six health classes (Healthy, Crown damage, New crown damage, Dead trees, Stem damage, Root rot) based on our recordings during the field campaigns (Table 3, Figure 3). When two or more classes could be assigned to a single tree, we gave Stem damage priority over Crown damage and Root rot priority over Stem damage.
Table 3. Description of allocated tree health classes.

Tree Health Class: Description
Healthy: Individuals with no outward signs of stress (e.g., drought-induced decline)
Crown damage: Trees with noticeable damage to crown (e.g., yellowish green, yellow needles and brownish treetops)
New crown damage: Healthy trees that were newly damaged in the autumn (after visual examination in October 2021)
Dead trees: Trees with complete loss of green foliage
Stem damage: Individuals with bark-stripping wounds (greater than 10 cm in width and greater than 20 cm in length) on the tree stem and/or with stem cracks
Root rot: Trees whose increment cores were decayed and trees with severe dieback symptoms (determined by performing a control drilling)

In the models, the means of vegetation indices for each crown were used as dependent variables, and tree health class, flight campaign and the interaction of both factors were used as independent variables. The stand was used as a random factor to deal with possible pseudo-replication. We fitted the linear mixed-effect models by maximum likelihood estimation, with optimization through the function "nlminb". The root mean square error (RMSE) was used to evaluate the predictive accuracy of the models. To quantify the ability of each model to explain the observed variation in the response variable, we calculated marginal and conditional R² values [57], where the marginal R² is the variance explained by the fixed effects (tree health class, flight campaign and their interaction), and the conditional R² is the variance explained by the entire model. The Akaike information criterion (AIC) was calculated to compare and rank the proposed models with different combinations of factors [58,59]. The package "lsmeans" [60] was used to calculate the predicted means for tree health classes. The Kenward-Roger approximation was used to estimate the degrees of freedom, and a 95% confidence interval was recorded. The calculation of vegetation indices and the linear mixed-effects models were computed in R software 3.6.3 [61].
Remote Sens. 2022, 14, x FOR PEER REVIEW 9 of 19
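The distinction between marginal and conditional R² can be illustrated with a small sketch. It assumes the commonly used variance-partitioning definition for a mixed model with one random factor (fixed-effect variance, random-intercept variance and residual variance); the definition in [57] may differ in detail. The function name is hypothetical, and the variance shares in the example are simply back-calculated from the NDVI values reported in this paper (0.34 and 0.60), scaled so that the total variance is 1.

```python
def r2_marginal_conditional(var_fixed, var_random, var_residual):
    """Marginal and conditional R² from variance components.

    marginal    = var_fixed / total
    conditional = (var_fixed + var_random) / total
    where total = var_fixed + var_random + var_residual.
    """
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    r2_marginal = var_fixed / total
    r2_conditional = (var_fixed + var_random) / total
    return r2_marginal, r2_conditional


# Illustrative shares only: fixed effects 0.34, stand (random) 0.26, residual 0.40.
r2m, r2c = r2_marginal_conditional(0.34, 0.26, 0.40)
print(round(r2m, 2), round(r2c, 2))
```

Under this definition, the gap between the conditional and marginal R² is the share of the total variance attributed to the random factor (here, the stand).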


Validation and Forest Mapping
The classification of vegetation indices (selected based on the outcome of the "lmer" models) was performed by applying the Random Forest (RF) algorithm, which is a supervised machine-learning technique. RF is a method based on inductive decision trees and can effectively handle high-dimensional, noisy and multi-source datasets without overfitting [62,63]. The main features of RF include speed and flexibility in creating the relationship between input and output functions [62]. The choice of the classifier was based on the simplicity of the model, as RF allows classification of multiple variables and classes without the need for sophisticated models or parameters [62], and it has proved to be a promising method for estimating tree parameters [64]. RF models were constructed, trained and cross-validated using the "randomForest" and "caret" R packages [65,66]. The pixels containing vegetation index values belonging to the tree health classes were classified separately for each stand and flight campaign. We created a balanced training set in which 75% of all data was randomly assigned to the training dataset and 25% of all data was used for testing. Because Stand 2 and Stand 3 each had only one sample plot, and as the two stands were the same age and located near each other, the data sets of these stands were combined for classification purposes. The model fitting was performed with the default resampling implementation and parameter selection, which yielded 500 decision trees. We generated confusion matrices along with the Gini index criterion, the out-of-bag (OOB) estimated error rate, overall classification accuracy, Cohen's kappa coefficient and multi-dimensional scaling (MDS) using the "caret" package in R 3.6.3 [61].
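A 75%/25% split of this kind can be sketched as below. This is a minimal standard-library illustration, not the authors' R workflow: it assumes that "balanced" means the split is drawn within each tree health class so that class proportions are preserved, and the function name, seed and class counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_indices, test_indices), split separately within each class."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within the class
        n_train = round(train_fraction * len(indices))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


# Made-up labels: 40 Healthy and 20 Crown damage samples.
example_labels = ["Healthy"] * 40 + ["Crown damage"] * 20
train_idx, test_idx = stratified_split(example_labels)
print(len(train_idx), len(test_idx))
```

Splitting within classes keeps rare classes represented in both subsets, which matters when some classes (such as Root rot) contain few trees.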

Spectral Reflectance of Vegetation Health and Monitoring of One Vegetation Season
The seasonal patterns of 2021 for different tree health classes were assessed using vegetation indices related to greenness (chlorophyll concentration), namely the Normalized Difference Red-edge Index (NDRE), the Normalized Difference Vegetation Index (NDVI) and the Chlorophyll Index (CI), and to leaf pigment (anthocyanins), namely the Red Green Index (RGI). The reflectance of light from Norway spruce needles can change over the growth season, especially during the active period of growth, due to various physiological processes (e.g., canopy chlorophyll content and plant characteristics such as green biomass and leaf water content) [67,68], which makes it more difficult to distinguish between tree health classes. However, according to the results of the linear mixed-effect models for all selected indices, we found that tree health class, flight campaign and the interaction of both factors had a significant effect (p < 0.001) on the means of vegetation indices of individual tree crowns. The random effect (Stand) of the models explained 29 to 45% of the variance in the selected vegetation index values for different tree health classes (Table 4). Our models showed a moderate marginal R² and total explanatory power: for the NDRE index, the marginal R² was 0.26 and the conditional R² was 0.49 (p < 0.001); for the NDVI index, the marginal R² was 0.34 and the conditional R² was 0.60 (p < 0.001); for the RGI index, the marginal R² was 0.36 and the conditional R² was 0.55 (p < 0.001); and for the CI index, the marginal R² was 0.27 and the conditional R² was 0.49 (p < 0.001) (Figure 4, Table 4). These values are lower than those reported in other studies on detecting stress-induced changes [69,70]. Studies with good model prediction power are mostly based on lower-resolution imagery (approx. 1.25-2.4 m).
In our study, very detailed imagery (0.10 m) was used, but the drawback of such data is the high noise, which can originate from the background (due to the conical form of the spruce crown) and/or from partial crown damage, and which had a negative effect on the outcome of the models. However, we found that each of these indices had a different level of sensitivity in distinguishing certain tree health classes. The only exception was the Dead trees class, which could be easily distinguished by all indices. The differences in sensitivity were mostly due to the different levels of leaf pigments over the vegetation season [71]. For example, such high variability of chlorophyll content in needles could be caused by needle aging: the chlorophyll content of current-year needles is lower, increases until the middle of summer and stays relatively constant until the end of the vegetation season, in contrast to older needles, which have a higher and more stable chlorophyll content [72,73]. In our study, we also found that the NDRE, NDVI and CI indices differentiated between Healthy trees and Damaged trees already from the beginning of the vegetation season.
However, the results of the RGI index, which is sensitive to the anthocyanin pigment content in plant leaves, did not show significant differences between healthy and damaged trees at the beginning of the vegetation season. The RGI index was able to differentiate the Root rot class from other classes, indicating significant (p < 0.05) differences. The sensitivity of the spectral bands used to calculate these indices also has other limits. Moreover, it is possible that the values of vegetation indices in our studied stands at the beginning of the vegetation season were reduced due to the active flowering and pollination of spruces (Figure 5); as a result, trees were covered in pollen, which might have decreased the level of green vegetation (at least as detectable with the sensor), which in turn can increase the reflected visible light in the red spectrum. This could have affected the quality of the data, as the reflectance in the red and red-edge spectrum is very sensitive to changes in chlorophyll content [28,74]. Nevertheless, we found that trees from the New crown damage class, which were initially visually assessed as healthy trees with no sign of any damage, could be separated from the Healthy tree class already at the beginning of the vegetation season using the NDRE (p < 0.05) and NDVI (p < 0.01) indices, while RGI (p = 0.76) and CI (p = 0.76) did not show any separation. These results suggest that monitoring Norway spruce with these indices might provide fairly good information on the early onset of stress expression.
The season of image acquisition was found to have a more relevant impact on the separation of tree health classes. The analysis of the spectral indices showed that in the middle of summer, which coincided with a prolonged period of drought, the values of all chlorophyll-based indices were significantly (p < 0.001) higher than at the beginning of the vegetation season. At the same time, we examined the relative differences between the tree health classes and found greater differences between healthy trees and other tree health classes. Our results suggested that the drought impact was more pronounced for trees that were already stressed, and we noticed an increase in relative differences between the Healthy tree class and other classes in the post-drought images compared to what was detected at the beginning of the vegetation season. For example, at the beginning of the vegetation season, the CI index value for the Stem damage class was 3% lower than for the Healthy trees, but after the drought period, the difference increased to 8%.
An even more pronounced increase in relative differences was obtained between the New crown damage class and Healthy trees; for example, the values of the CI index for the New crown damage class were 13% lower than for Healthy trees at the beginning of the vegetation season, and this difference increased to 26% after the drought (Figure 6).
There could be several reasons for such differences; however, the most plausible explanation could be linked to the exceptionally dry and hot June of 2021, which could have weakened the spruces, since Norway spruce is very sensitive to changes in water availability and to high temperatures [75]. This is consistent with the well-known fact that trees with any damage, such as root damage [76] or stem damage [77], may have lower resilience to disturbances such as drought. It is related to the physiological processes of a tree: when the hydraulic conductivity in a tree is reduced [76], photosynthesis consequently decreases [78], and changes in the tree crown can eventually take place as a result of water stress.
Although the effect of drought remained unclear at the end of the vegetation season, there was evidence that trees had begun to adapt to the disturbance. We observed that the relative differences between the Healthy and Crown damage classes, and also between the Healthy and New crown damage classes, were reduced, whereas the differences between Healthy and the other classes remained at the previous level. The exception was the RGI index, which showed a continuous drop for all classes when expressed as a relative difference to Healthy trees (Figure 6). The distinct variation in relative differences for the Crown damage class might be related to other factors that might have affected the quality of the captured images, especially in the NIR spectral band. Namely, in the images acquired at the end of the vegetation season, we found that treetops of the Crown damage class were somewhat illuminated, which was reflected in the relatively high index values for the Crown damage class in comparison to other classes. Various factors might cause such noise in the data, such as sun-angle dependency, the side effect of lower needle water content and higher canopy transmittance, resulting in higher reflectance values at the end of the vegetation season. An alternative explanation is that the crowns of damaged trees contained many bright pixels in the NIR spectrum as an indirect side effect of the drought on weakened trees with limited water absorption. This may have resulted in an increase in the reflectance in the NIR spectrum [79]. Similarly, researchers [64] found that the mean spectral values of the NIR band for defoliated trees were brighter than for healthy ones. Descriptive statistics of the reference trees are shown in Table 5.
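The relative differences to Healthy trees discussed in this section can be computed as in the following sketch. The function name and the class means are hypothetical; the example values are chosen only so that the Stem damage class comes out about 3% lower than Healthy, as in the CI example reported above for the beginning of the vegetation season.

```python
def relative_difference_to_healthy(class_means, reference="Healthy"):
    """Percent difference of each class mean from the reference class mean."""
    ref = class_means[reference]
    return {
        cls: 100.0 * (value - ref) / ref
        for cls, value in class_means.items()
        if cls != reference
    }


# Made-up mean CI values for one flight campaign.
example_means = {"Healthy": 2.00, "Stem damage": 1.94, "New crown damage": 1.74}
for cls, diff in relative_difference_to_healthy(example_means).items():
    print(f"{cls}: {diff:.1f}%")
```

Negative values indicate a class mean below that of Healthy trees; repeating the calculation per flight campaign gives the seasonal trajectories of the kind shown in Figure 6.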

Validation and Mapping of the Vegetation Indices
In further analysis, we used the vegetation indices that were identified by the "lmer" models as being able to significantly distinguish between different tree health classes. The forest health classification using the Random Forest algorithm was performed for each stand and flight campaign separately to provide a proper comparative analysis. The validation of the Random Forest classification of tree health classes based on vegetation indices suggested that the reliability of identifying tree health classes was weak to fair if six classes were assessed, and moderate to substantial if two classes (Crown damage and Healthy trees) were tested (Table 6). The RF classified the six tree health classes with an overall classification accuracy ranging from 34.4% to 56.8%; the respective kappa coefficients ranged from 0.09 to 0.34, depending on stand and flight campaign (Table 6). For two classes, the overall classification accuracy ranged from 71.1% to 89.6% and the kappa coefficient from 0.39 to 0.71. Similarly, Kantola et al., 2010 [64] achieved a high accuracy (up to 87.3%) when two classes were used and a poorer accuracy (38%) when nine classes were used in feature extraction from an aerial study. The estimated OOB error was reduced from 56.2% on average for all tree health classes to 19.2% on average for two classes only (Crown damage and Healthy trees). According to the RF sensitivity, the tree health classes Root rot, New crown damage and Stem damage had a high impact on classification accuracy (Table 7). On average, the RF correctly classified only 14% of the New crown damage class, 18% of the Stem damage class and 4% of the Root rot class. Similarly, in other studies [37,80] that predicted discolouration, the model accuracy was greatest when distinct physiological classes were used.
Our results are consistent with these findings, suggesting that the RF sensitivity might be affected by the problematic distinction between similar classes (Stem damage, Root rot). The main difficulty in distinguishing between these two classes derives from the fact that stem wounds are the most common entry point for fungal infections; consequently, in some cases they are perceived as an initial stage of Root rot. Another study has suggested that such classification problems can arise when classes are imbalanced [81], resulting in biased classification accuracy. Evans et al., 2011 [82] explain this by the bootstrap over-representing the majority classes (in our study Crown damage and Healthy trees), which results in a deceptive model fit and a high cross-classification error for the minority classes (in our study Root rot, New crown damage, Stem damage and Dead trees). This can also affect the majority classes (such as Crown damage and Healthy trees) by biasing predictions towards them [82]. In our study, this manifested as increased sensitivity values when only two classes were used: the RF sensitivity for the Crown damage class increased from 49% on average to 63% of correctly classified trees (Table 5). It may initially appear that the same issue with the bootstrap approach could also arise in the "lmer" regression model, but as reported by [82], such problems have not been seen in other modelling approaches.
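The accuracy measures discussed here (overall accuracy, kappa coefficient and per-class sensitivity) can all be derived from a confusion matrix. The following is a minimal sketch using the standard definitions of these measures; it is not the code used in this study. The function name, the three-class setup and all counts are hypothetical, chosen only to show how a matrix dominated by majority classes can give a moderate overall accuracy together with a low sensitivity for a minority class.

```python
def classification_metrics(confusion):
    """Return (overall accuracy, Cohen's kappa, per-class sensitivity).

    confusion[i][j] is the number of trees whose reference class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    row_totals = [sum(row) for row in confusion]  # reference counts per class
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

    # Observed agreement: share of trees on the diagonal.
    observed = sum(confusion[k][k] for k in range(n)) / total
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total**2
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")

    # Sensitivity: correctly classified trees / reference trees of that class.
    sensitivity = [
        confusion[k][k] / row_totals[k] if row_totals[k] else float("nan")
        for k in range(n)
    ]
    return observed, kappa, sensitivity


# Hypothetical counts for illustration only (rows: reference, columns: predicted).
classes = ["Healthy", "Crown damage", "Root rot"]
confusion = [
    [80, 18, 2],
    [30, 45, 5],
    [12, 6, 2],
]

accuracy, kappa, sensitivity = classification_metrics(confusion)
print(f"overall accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")
for name, value in zip(classes, sensitivity):
    print(f"sensitivity[{name}] = {value:.3f}")
```

In this invented example most Root rot trees are assigned to the two larger classes, so the Root rot sensitivity is far lower than the overall accuracy. Kappa discounts the agreement that the class proportions alone would produce, which is why it is reported alongside accuracy.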
Figure 6. The relationships between vegetation indices and tree health classes; differences are shown as a relative difference to Healthy trees.

Conclusions
In this study we assessed the separability of tree health classes affected over time by a prolonged drought period in the middle of the vegetation season. UAV-based monitoring has potential for early detection of Norway spruce crown discoloration. Our models showed good results in identifying the indices that can separate tree health classes and, in the best cases, also detected tree stress early; at the same time, the models were limited by considerable uncertainties in canopy phenology.
Classification with the RF model was not encouraging when six classes were used. However, the results of the RF model were more accurate when only two classes (Healthy trees and Crown damage) were used. We consider that the overall classification accuracy of the RF model was affected by the imbalanced representation of trees within tree health classes, which over-represented the majority classes. Moreover, the overlap between classes and other factors, such as the timing of image acquisition, most likely complicated the classification and separability of early stressed trees in the UAV remote sensing data. Therefore, future research should address the main limitations of the current study, focusing on how to increase the stability of the RF classifier by reducing the imbalance between classes. Future research also needs to address damage progression over time in long-term observations. Moreover, the uneven distribution of affected needles within crowns is another challenge for future research.