Single Tree Classification Using Multi-Temporal ALS Data and CIR Imagery in Mixed Old-Growth Forest in Poland

Tree species classification is important for a variety of environmental applications, including biodiversity monitoring, wildfire risk assessment, ecosystem services assessment, and sustainable forest management. In this study, we used a fusion of three remote sensing (RS) datasets, comprising ALS data (leaf-on and leaf-off) and colour-infrared (CIR) imagery (leaf-on), to classify coniferous and deciduous tree species, including a dead-tree class, in a mixed temperate forest in Poland. We used intensity and structural variables from the ALS data and spectral information derived from aerial imagery for the classification procedure. Additionally, we tested the differences in classification accuracy between all variants of the data integration. The random forest classifier was used in the study. The highest accuracies were obtained for the classification based on both point clouds combined with image spectral information; the mean overall accuracy (OA) and kappa (κ) were 84.3% and 0.82, respectively. A leaf-on or leaf-off point cloud alone is not sufficient to identify individual tree species because the two differ in discriminatory power: leaf-on and leaf-off ALS point cloud features alone gave the lowest accuracies (72% ≤ OA ≤ 74% and 0.67 ≤ κ ≤ 0.70). Classification based on both point clouds gave satisfactory results, comparable to classification based on combined information from all three sources (83% ≤ OA ≤ 84% and 0.81 ≤ κ ≤ 0.82). The classification accuracy varied between species. The classification results for coniferous trees were always better than for deciduous trees, independent of the datasets. In the classification based on both point clouds (leaf-on and leaf-off), the intensity features appeared to be more important than the other groups of variables, especially the coefficient of variation, skewness, and percentiles. The NDVI was the most important CIR-based feature.


Introduction
Tree species classification is important for a variety of environmental applications, including biodiversity monitoring [1], wildfire risk assessment [2], ecosystem services assessment [3], and sustainable forest management [4]. Mapping tree species through visual interpretation of aerial images by experts in combination with in situ measurements is labour-intensive, time-consuming, and costly. Moreover, the method is not applicable to large forest areas [5]. Accordingly, technological developments have influenced the possibility of mapping the forest species composition using remote sensing data.
Remote sensing data are divided into active and passive. Active sensors emit a signal and, after reflection from the object, the returned signal is received and analysed. Active remote sensing includes light detection and ranging (LiDAR) and synthetic aperture radar (SAR) data. Passive sensors, on the other hand, record radiation that is reflected or emitted by the observed object. Multispectral and hyperspectral imagery are examples of passive remote sensing and have been used to classify tree species in recent decades [6][7][8][9][10]. However, in the development of solutions based on optical data, it has been found that [...] parts of the crown than in the leaf-on season, and there are more intermediate reflections from the inner part of the crown, providing more information about the vertical structure of the tree and its crown [44]. Therefore, using point clouds from both seasons has the tangible advantage of providing additional information about the vertical crown structure, more than using only one dataset based on discrete return or full-waveform ALS data [45].
In this study, we used a fusion of three remote sensing (RS) datasets, comprising ALS data (leaf-on and leaf-off) and colour-infrared (CIR) imagery (leaf-on), to classify coniferous and deciduous tree species, including a dead-tree class, in a mixed temperate forest in Poland. We used intensity and structural variables from the ALS data and spectral information derived from aerial imagery for the classification procedure. Our specific objectives were as follows: (1) to evaluate and compare species classification accuracies of all variants contained in the data integration, (2) to identify the most appropriate period of data acquisition for the classification of individual tree species, and (3) to investigate the most important metrics for tree species classification.

Study Area
The Białowieża Forest (BF) is a large forest complex located on the border between Poland and Belarus (52°45′29″ N, 23°46′8″ E). A part of the BF has retained its original character, thanks to protection lasting several centuries. The Polish part of the BF covers approx. 62,000 ha, of which 10,500 ha is a national park; nature reserves cover about 12,000 ha, while the remaining forests cover about 39,500 ha (Figure 1).

ALS Data and CIR Aerial Images
The leaf-on ALS dataset was acquired on 2-5 July 2015, while data from the leaf-off season were acquired on 25 and 27 November 2015 and 6-7 December 2015. Both point clouds were acquired using the Riegl LMS-Q680i scanner (wavelength 1550 nm) integrated in a full-waveform laser scanning system. Simultaneously, in the leaf-on season, CIR images were obtained. This allowed the provider to assign spectral values to each point from the point cloud. The point density was around 11 points/m². The flying altitude was approximately 500 m above ground. A total of 135 individual flight lines were recorded with 40% strip overlap. ALS leaf-off data were used to generate the digital elevation model (DEM) and ALS leaf-on data to generate the digital surface model (DSM), both with a 0.5 m resolution. Forty-five points on logs were measured using RTK to verify the ground classification of the points used to generate the DEM; the comparison yielded a root mean square error (RMSE) of 0.08 m. These models allowed the calculation of a normalized digital surface model (nDSM) [47], on which the segmentation of individual trees was based. A commonly used marker-controlled watershed segmentation algorithm was chosen for the work. Individual trees and crowns were detected using the method proposed by Stereńczak et al. [48], who parameterised the segmentation for three height ranges (h ≤ 25 m; 25 m < h ≤ 35 m; h > 35 m). For each height range, the canopy height model was filtered with different settings assigned during automatic optimisation. After the filtration process, hierarchical segmentation was performed, starting from the top layer of the tree canopies, using a pouring algorithm. Subsequently, the resulting segments were adjusted in a five-step procedure.
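The normalisation step and the three height ranges described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy 2 × 2 rasters, and the clamping of negative differences to zero are assumptions made only for illustration.

```python
def normalized_dsm(dsm, dem):
    """Return nDSM = DSM - DEM cell by cell.

    Assumption: negative differences (surface below ground, e.g. from
    interpolation noise) are clamped to 0; the paper does not state this.
    """
    return [
        [max(top - ground, 0.0) for top, ground in zip(dsm_row, dem_row)]
        for dsm_row, dem_row in zip(dsm, dem)
    ]


def height_range(h):
    """Assign a height in metres to one of the three ranges used for filtering."""
    if h <= 25:
        return "h <= 25 m"
    if h <= 35:
        return "25 m < h <= 35 m"
    return "h > 35 m"


# Hypothetical elevations in metres on a tiny 2 x 2 grid.
dsm = [[172.0, 181.5], [190.5, 150.0]]
dem = [[150.0, 150.5], [151.5, 150.25]]

ndsm = normalized_dsm(dsm, dem)
ranges = [[height_range(h) for h in row] for row in ndsm]
```

In this sketch each nDSM cell is labelled directly; in a real workflow the range would be attached to a detected tree top, and the filter settings per range would come from the optimisation described in [48].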
To obtain a more accurate classification, it is necessary to calibrate the intensity values to reduce the influence of various factors that affect these values, such as the range from the sensor. In our study, the radiometric calibration of the ALS point cloud intensity was performed using a simplified model described in Korpela et al. [32].
CIR aerial images were collected using an UltraCam Eagle camera. The images were acquired from 3040 m, with a 0.2 m ground sampling distance (GSD). In total, 1372 images were taken, with 90% along-track and 40% cross-track overlap. The acquired images were used to colourize the point cloud: the imagery was projected orthogonally onto the ALS data so that CIR values could be assigned to each point.

Field Measurement
Field data were collected from July to the end of October 2015. A total of 685 circular sample plots with a 12.62 m radius were measured. Sample plots were distributed throughout the BF. During the fieldwork, a large number of tree-related characteristics were recorded: tree species, tree height, crown length, and diameter at breast height. In addition, for each tree, its visibility from above was determined, i.e., the possibility of its registration in photogrammetric data. The centre of each plot was measured using a geodetic-class global navigation satellite system (GNSS) receiver in real-time kinematic (RTK) or static mode. Differential pre-processing of the raw GNSS data yielded SD = 0.096 m. We assumed a similar or better precision for the RTK fix mode, which was used much less frequently because of the dense forest in which the measurements were carried out. Based on the relationship between the tree and the plot centre (distance and azimuth), the position of each tree was calculated.
The seven tree species dominant in the Białowieża Forest were considered in the tree species classification: birch (Betula spp.), oak (Quercus spp.), hornbeam (Carpinus betulus L.), lime (Tilia cordata Mill.), alder (Alnus glutinosa Gaertn.), pine (Pinus sylvestris L.), and spruce (Picea abies (L.) H. Karst). In addition, a class of dead trees was defined, containing mainly dead spruce trees. The number of sample trees per class was selected based on the estimated percent coverage of each class in the study area. A total of 1230 reference trees were selected, subdivided by species (Table 2). Selected trees varied in height and crown width in order to build a model with the highest possible variability. Examples of point cloud cross-sections from both seasons and CIR images for individual tree species are shown in Figure 2.

Table 2. Number (n) of selected reference trees divided into particular species. Additionally, information about the minimum (Min), maximum (Max), and mean (with standard deviation (SD)) height of trees in each class is presented.

Extracting ALS and CIR Features
Classification features were derived from height measurements of ALS describing vertical crown structure ("structural features"), from the intensity distribution of ALS ("intensity features"), and from the RGB attribute of the point cloud ("CIR features"). The intensity features were computed for the first echoes (first of many and single echoes). Since the purpose of our work is to distinguish individual trees in the canopy as reliably as possible, the ALS features (structural and intensity) and CIR features were calculated only for points above half the height (H = Hmax/2) of a given segment [25], except for the features marked as "additional" in Table 3. A description of all the features used in the study can be found in Table 3.
Remote Sens. 2021, 13, 5101

All features were calculated for ALS point clouds and CIR attributes within segments (the results of the single tree detection procedure). The classification was based on different datasets; the coded symbols that define each variant are explained in Table 4.
For example, CIR_ALS_SW denotes the variant that uses features derived from the ALS point clouds from both seasons (SW) together with features from the CIR aerial images, while W_CanRR denotes the canopy relief ratio calculated from the leaf-off point cloud.
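As an illustration of how such per-segment features could be computed, the following Python sketch derives the canopy relief ratio, the proportions of points above the mean and median height, and the coefficient of variation of intensity from the returns above half the maximum height of a segment. It is a sketch under stated assumptions, not the authors' code: the function and key names are hypothetical, the threshold is applied as a strict inequality, the sample standard deviation is used, no echo-type filtering is performed, and the input values are invented.

```python
import statistics


def crown_returns(heights, intensities):
    """Keep (height, intensity) pairs above half the segment's maximum height."""
    threshold = max(heights) / 2
    return [(h, i) for h, i in zip(heights, intensities) if h > threshold]


def canopy_relief_ratio(values):
    """CanRR = (mean - min) / (max - min); 0.0 if all values are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    return (statistics.mean(values) - lo) / (hi - lo)


def proportion_above(values, reference):
    """Share of values strictly above a reference value."""
    return sum(v > reference for v in values) / len(values)


def crown_features(heights, intensities):
    """Compute a few structural and intensity features for one segment."""
    kept = crown_returns(heights, intensities)
    h = [p[0] for p in kept]
    i = [p[1] for p in kept]
    return {
        "CanRR": canopy_relief_ratio(h),
        "P_mean": proportion_above(h, statistics.mean(h)),
        "P_median": proportion_above(h, statistics.median(h)),
        # Assumption: sample standard deviation divided by the mean.
        "I_CV": statistics.stdev(i) / statistics.mean(i),
    }


# Invented normalised heights (m) and calibrated intensities for one segment.
heights = [2.0, 9.0, 12.0, 14.0, 16.0, 20.0]
intensities = [5.0, 30.0, 40.0, 50.0, 60.0, 50.0]
features = crown_features(heights, intensities)
```

With a maximum height of 20 m, only the four returns above 10 m contribute, which mirrors the idea of describing the upper crown rather than the understorey.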

Classification Strategy
The classification was conducted using the random forests (RF) algorithm [49]. To avoid overfitting the classification model, 5-fold cross-validation was performed and repeated 20 times, and the mean classification accuracy indices for each classifier were determined. The classification and optimization processes were performed using the caret package in R [50].
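The resampling scheme can be sketched in a few lines of Python. This is an illustrative stand-in for the R workflow, not a reproduction of it: the function name and the seed are hypothetical, and the folds here are drawn without stratification by species.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=20, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal the shuffled indices into k folds of (nearly) equal size.
        folds = [order[f::k] for f in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train_idx, test_idx


# 1230 reference trees, 5 folds, 20 repetitions -> 100 train/test splits.
splits = list(repeated_kfold(n_samples=1230))
```

Each of the 100 splits trains on 984 trees and tests on 246; averaging the accuracy indices over all splits gives the mean values of the kind reported in the study.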

Table 3 (excerpt).
P mean: the ratio of the total number of points above the mean to the total number of all points
P median: the ratio of the total number of points above the median to the total number of all points
CanRR: canopy relief ratio of points, (avg(X) − min(X)) / (max(X) − min(X))
Additional features
P fe_all: the proportion of first returns
P single_all: the proportion of single returns
[...]: the standard deviation of reflectance in the near-infrared band
R mean: mean value of reflectance in the red band
R median: median value of reflectance in the red band
R sd: the standard deviation of reflectance in the red band
G mean: mean value of reflectance in the green band
G median: median value of reflectance in the green band
G sd: the standard deviation of reflectance in the green band

The random forests algorithm [49] used in the study is a further development of classification trees [51]. The algorithm belongs to the ensemble classification methods. Each tree is generated using the bagging method. A large number of trees are generated and the classification result is obtained by voting. When a training set contains N cases, N cases are selected randomly, with replacement, to create a tree. The remaining cases (about 37% of the total sample size) are called out-of-bag (OOB) samples and are used for validation. At each node of a tree, m variables are randomly selected from M (the total number of variables) and used to find the best split. The classification algorithm is implemented in the randomForest package for R [52]. The following parameter settings for RF were used in each classification: 500 decision trees were created, with the number of predictors randomly selected at each split equal to sqrt(M).
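The "about 37%" figure follows from sampling with replacement: the probability that a given case is never drawn in N draws is (1 − 1/N)^N, which approaches e^(−1) ≈ 0.368 for large N. The Python sketch below checks this both analytically and by simulating one bootstrap sample; the function names and seed are hypothetical, and rounding sqrt(M) down to an integer is an assumption, since the text only states sqrt(M).

```python
import math
import random


def expected_oob_fraction(n_cases):
    """Probability that a case is absent from a bootstrap sample of size n."""
    return (1 - 1 / n_cases) ** n_cases


def simulated_oob_fraction(n_cases, seed=0):
    """Draw one bootstrap sample and return the share of cases left out."""
    rng = random.Random(seed)
    in_bag = {rng.randrange(n_cases) for _ in range(n_cases)}
    return 1 - len(in_bag) / n_cases


def predictors_per_split(n_predictors):
    """sqrt(M) rounded down to an integer (the rounding is an assumption)."""
    return max(1, math.floor(math.sqrt(n_predictors)))


analytic = expected_oob_fraction(1230)    # close to exp(-1)
simulated = simulated_oob_fraction(1230)  # varies slightly with the seed
```

Because every tree has its own out-of-bag set, each reference tree is left out of roughly a third of the 500 trees and can be used to validate those trees.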

Accuracy Assessment and Statistical Analysis
The accuracy of the classifications was evaluated using an error matrix and the following indices: overall accuracy (OA), kappa coefficient (κ), producer accuracy (PA), and user accuracy (UA) [53,54]. Additionally, the F1-score was calculated for each class as the harmonic mean of PA and UA:

$$F_1 = \frac{2 \times PA \times UA}{PA + UA}$$

The importance of the variables for each iteration was recorded and the mean importance measure of each feature was calculated to select the best predictors. In addition, the discrimination between classes was tested with the Kruskal-Wallis test. McNemar's test was used to determine whether there were statistically significant differences between pairs of classification variants with different predictor settings [55]. For all statistical tests, the significance level was set at α = 0.05. The methodological framework developed in this study for classifying individual trees is shown in Figure 3.
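For readers who want to trace how these indices follow from an error matrix, a minimal Python sketch is given below. The function name, the two-class example, and the matrix orientation (rows are reference classes, columns are predicted classes) are assumptions for illustration; the counts are invented and are not results from the study.

```python
def accuracy_metrics(matrix, labels):
    """Compute OA, kappa and per-class PA, UA, F1 from an error matrix.

    matrix[i][j] = number of trees of reference class i predicted as class j.
    Assumes every class has at least one reference and one predicted tree.
    """
    k = len(labels)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diag = [matrix[i][i] for i in range(k)]

    oa = sum(diag) / n
    # Agreement expected by chance, from the marginal totals.
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    kappa = (oa - pe) / (1 - pe)

    per_class = {}
    for i, label in enumerate(labels):
        pa = diag[i] / row_tot[i]  # producer accuracy
        ua = diag[i] / col_tot[i]  # user accuracy
        per_class[label] = {"PA": pa, "UA": ua, "F1": 2 * pa * ua / (pa + ua)}
    return oa, kappa, per_class


# Invented counts for two classes.
oa, kappa, per_class = accuracy_metrics([[40, 10], [5, 45]], ["pine", "spruce"])
```

In this example OA is 0.85 and the chance agreement is 0.5, so κ is 0.70; the gap between OA and κ shows how much of the raw agreement could arise by chance.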

Classification Results
In general, all classification variants, with their different combinations of ALS and CIR features, resulted in high accuracy (Figure 4). The highest accuracies were obtained for the classification based on both point clouds combined with image information (CIR_ALS_SW); the mean overall accuracy and kappa were 84.3% and 0.82, respectively. Slightly, though not significantly, worse results (McNemar's test, p > 0.05) were obtained for both point clouds without CIR variables (ALS_SW): kappa decreased by 0.01 and overall accuracy by 1 percentage point (Figure 4). Leaf-on and leaf-off ALS point cloud features alone produced the lowest accuracies, with no statistically significant difference between them (McNemar's test, p > 0.05). The variant ALS_S turned out to be the worst option and resulted in OA = 72.1% and κ = 0.67. Adding image information (CIR variables) to the leaf-on point cloud improved classification accuracy, with an increase of 0.09 in the kappa coefficient and 8 percentage points in overall accuracy. For the leaf-off point cloud, CIR variables increased the overall accuracy from 73.2% (leaf-off) to 75.8% (leaf-off with CIR) and the kappa coefficient from 0.69 to 0.72. However, the difference between these two variants was not statistically significant (McNemar's test, p > 0.05).

Species Classification
Classification accuracy varied between species (Figure 5). Regardless of the variant, coniferous trees (living and dead) were always classified with higher accuracy (≥79% for both PA and UA among all variants) than deciduous trees (0.43 ≤ UA ≤ 0.87; 0.28 ≤ PA ≤ 0.89). Among all the classes, dead spruce was classified best: it was only slightly confused with living spruce. It is worth noting that the classification results showed low variability. The standard deviation (STD) of the overall accuracy and kappa for all variants was less than 2% and 0.027, respectively.

Statistically significant differences between the leaf-on variant and the other variants were noticed for spruce, pine, and dead trees. No significant differences were found among the remaining variants (UA/PA ≥ 0.9); accuracies were lowest for the leaf-on acquisition (UA/PA ≤ 0.9). In the leaf-on season, conifers were confused with birch, and alder was additionally misclassified as conifers.
Among deciduous trees, the lowest classification accuracies were noticed for lime (0.44 ≤ UA ≤ 0.75; 0.28 ≤ PA ≤ 0.43), independent of the analysed datasets. Especially in the leaf-off season, lime was often classified as alder or hornbeam. McNemar's test did not indicate significant differences between any pair of the remaining acquisition variants (Table 5).
Quite good classification results were obtained for alder (0.58 ≤ UA ≤ 0.70; 0.77 ≤ PA ≤ 0.89). The worst accuracies for that species were noticed for the single point clouds alone (UA = 0.58-0.63, PA = 0.77-0.78). McNemar's test did not indicate significant differences between any pair of stacked acquisition results (CIR_ALS_W, CIR_ALS_S, CIR_ALS_SW, ALS_SW) (Table 5). Generally, alder was confused with birch, although other deciduous species were also often classified as alder. Additionally, in the leaf-on season, alder was occasionally classified as spruce or pine.
Similar to alder, birch showed significantly lower classification results for the leaf-on and leaf-off seasons alone (UA = 0.59-0.60, PA = 0.45-0.47) than for the rest of the variants. The best results for that species were obtained for the three combined datasets (UA = 0.87, PA = 0.88), with no significant differences from the combined leaf-on and leaf-off datasets (UA = 0.85, PA = 0.87). Regardless of the variant, birch was mixed with other [...]

Figure 5. Box-and-whisker plot of the producer's (PA) and user's (UA) accuracies of specific tree species for the classification based on both point clouds and image information. The first and third quartiles define the box, the median is shown as "-", and the whisker defines the range of the data without outliers. The ALS is the point cloud from the leaf-on (S) or/and leaf-off (W) season, whereas the CIR is the spectral information derived from the colour-infrared images.
Table 5. F1-scores of the classification results for different sets of features. Letters "a, b, c" identify groups with no significant differences (based on McNemar's tests). The ALS is the point cloud from the leaf-on (S) or/and leaf-off (W) season, whereas the CIR is the spectral information derived from the colour-infrared images.
In the case of hornbeam, the results obtained for leaf-off and leaf-off with CIR data (UA = 0.43, PA = 0.37) were statistically significantly lower than for the other variants (0.65 ≤ UA/PA ≤ 0.68), for which McNemar's test did not indicate significant differences between any pair of acquisition results (Table 5). Hornbeam, like oak, was not confused with conifers. It was confused with all other deciduous species, especially under leaf-off acquisition.
F1-scores of the species classification results can be used for ranking possible data combinations. Additionally, McNemar's test determined whether there were statistically significant differences between pairs of classification variants with different predictor settings. The results in Table 5 indicate that the leaf-off season was more favourable for conifers and oak, while hornbeam and lime exhibited better results in the leaf-on season. The combination of both point clouds was favourable for birch and alder. For birch and conifers, adding image information (CIR variables) to the leaf-on point cloud significantly improved classification accuracy.
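The pairwise comparison of two variants on the same reference trees can be sketched in Python as follows. This is a hedged illustration, not the authors' procedure: the function name and the toy data are invented, and the use of the continuity-corrected chi-square form of McNemar's test is an assumption, since the text does not state which form was applied.

```python
import math


def mcnemar(correct_a, correct_b):
    """McNemar's test on paired per-tree outcomes of two classifiers.

    correct_a[i] / correct_b[i] are True if variant A / B classified tree i
    correctly. Returns (chi-square statistic, p-value) for 1 degree of freedom.
    Assumption: the continuity-corrected form of the statistic is used.
    """
    b = sum(a and not bb for a, bb in zip(correct_a, correct_b))  # only A right
    c = sum(bb and not a for a, bb in zip(correct_a, correct_b))  # only B right
    if b + c == 0:
        return 0.0, 1.0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented outcomes: 15 trees only A gets right, 5 only B, 80 both.
correct_a = [True] * 15 + [False] * 5 + [True] * 80
correct_b = [False] * 15 + [True] * 5 + [True] * 80
chi2, p_value = mcnemar(correct_a, correct_b)
significant = p_value < 0.05
```

Only the discordant trees (those that exactly one variant classifies correctly) enter the statistic, which is why two variants with similar overall accuracy can still differ significantly, or not, depending on which trees they get wrong.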

Predictors Importance
The most important features for tree species classification in each variant, based on the importance measures provided by the RF algorithm, are presented in Table 6. Generally, in the classification based on both point clouds (leaf-on and leaf-off), the intensity features appeared to be more important than the other groups of variables, in particular the coefficient of variation, skewness, and percentiles. In the classification based on a point cloud with CIR features, NDVI was among the most important predictors.

Table 6. The most important predictors in the classification variants. The ALS is the point cloud from the leaf-on (S) or/and leaf-off (W) season, whereas the CIR is the spectral information derived from the colour-infrared images. The descriptions of the predictors and variants are included in Tables 3 and 4, respectively.

NDVI turned out to be an important predictor for distinguishing tree health condition and coniferous versus deciduous trees (Figure 6). Mean values of NDVI were positive for living trees and negative for dead trees. Additionally, NDVI values for the deciduous classes were statistically significantly higher than those of the coniferous classes.
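The sign behaviour of NDVI described above follows directly from its definition, NDVI = (NIR − R) / (NIR + R). The short Python sketch below illustrates it on invented values; the function names, the digital numbers, and the choice to compute NDVI per point before averaging over the crown are assumptions, not details taken from the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one point."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat a zero-signal point as neutral
    return (nir - red) / total


def mean_crown_ndvi(points):
    """Average per-point NDVI over (nir, red) pairs of one crown segment."""
    values = [ndvi(nir, red) for nir, red in points]
    return sum(values) / len(values)


# Invented digital numbers: living foliage reflects strongly in the NIR band,
# dead wood reflects relatively more in the red band.
living_tree = [(200, 50), (180, 60)]
dead_tree = [(60, 90), (50, 100)]

living_ndvi = mean_crown_ndvi(living_tree)
dead_ndvi = mean_crown_ndvi(dead_tree)
```

A positive mean for the living crown and a negative mean for the dead crown reproduce, on toy data, the pattern reported for the reference trees.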

Figure 6. [...] and (e) W_IP90 for the sample trees analysed in the study. The first and third quartiles define the box, the median is shown as "-", the mean is shown as "+", and the whisker defines the range of the data without outliers. Letters a, b, ... above the boxes identify groups with no significant differences (based on the Kruskal-Wallis test). The ALS is the point cloud from the leaf-on (S) or/and leaf-off (W) season, whereas the CIR is the spectral information derived from the colour-infrared images. The descriptions of the predictors are included in Tables 3 and 4.

The two most important features under leaf-on conditions (ICV, IP90) distinguished spruce from pine and birch from the rest of the deciduous trees (Figure 6). No significant differences between hornbeam, lime, and oak were noticed. The mean value of ICV was highest for dead spruces, with significant differences from the other classes. IP90, in turn, allowed alder to be discriminated from the other classes.

ICV under leaf-off conditions allowed pine and {dead, oak} to be separated from the other classes (Figure 6). The coefficient of variation of intensity was the lowest for pine and the highest for dead spruces and oak. IP90 under leaf-off conditions, in turn, allowed {dead and living spruce} vs. {oak, pine} vs. {other deciduous trees} to be discriminated. Mean values of IP90 were significantly higher for coniferous trees than for deciduous trees.

Discussion
This study evaluated and compared the discrimination of coniferous and deciduous tree species, including a dead-tree class, in a mixed temperate forest in Poland on the basis of ALS point cloud data (leaf-on and leaf-off) and CIR imagery (leaf-on). In our research, the highest accuracy was obtained with the use of both point clouds and the CIR dataset, and the lowest accuracy was obtained using the ALS leaf-on dataset alone. The difference in classification performance between leaf-on and leaf-off conditions was minor (72% ≤ OA ≤ 74% and 0.67 ≤ κ ≤ 0.70), while the classification based on both point clouds achieved satisfactory results, comparable to the classification based on combined information from all three sources (83% ≤ OA ≤ 84% and 0.81 ≤ κ ≤ 0.82).
Image information (CIR variables) improved classification accuracies when comparing classification based on the leaf-on point cloud alone. The classification accuracy varied between species. The classification results for coniferous trees were always better than for deciduous trees, independent of the datasets (≥79% for both PA and UA).
In the classification based on both point clouds (leaf-on and leaf-off), the intensity features appeared to be more important than the other groups of variables, in particular the coefficient of variation, skewness, and percentiles. The NDVI was the most important CIR-based feature.

Classification Results
Several studies have discussed the influence of canopy seasonal stage on tree properties from ALS data [5,17,25,41-43]. Comparing both point clouds, Kim et al. [5], Kamińska et al. [25], Ørka et al. [41], and Reitberger et al. [42] reported significantly better classification results in leaf-off conditions than in leaf-on conditions (Table 7). Conversely, in Laslier et al. [43], leaf-on acquisition worked better for classification of riparian tree species. Similar to our study, Shi et al. [17] reported only a slight, and not statistically significant, improvement in classification accuracy when using leaf-off data compared to leaf-on data. Our study revealed the necessity of temporal information (leaf-off and leaf-on variables) to classify tree species. This result is concordant with Shi et al. [17] and Kamińska et al. [25], who performed classification in a mixed forest (Table 7). Other studies by Kim et al. [5] and Laslier et al. [43] have also confirmed that the integration of ALS data acquired in two seasons, under leaf-on and leaf-off conditions, improves tree classification accuracy. Leaf-on and leaf-off conditions provide different information about the tree and are therefore complementary. The penetration of the signal in the leaf-off season gives information about branch structure, while the penetration of the signal in the leaf-on season provides information about leaf density.
It is worth pointing out that it is hard to directly compare our accuracies with other studies' results due to differences in material that may result from many factors, e.g., species diversity, sample size, sensor type, and terrain denivelation.

Species Classification
The classification accuracy varied between species. The classification results for coniferous trees were always better than for deciduous trees, independent of the datasets (≥79% for both PA and UA). Among deciduous trees, the best results were obtained for birch and alder, and the lowest classification accuracies were observed for lime, independent of the analysed datasets. Considering the combined variants, the classification accuracy was the lowest for lime and hornbeam (Table 5). These species are difficult to distinguish, as presented in Figure 6, where it can be seen that lime and hornbeam form groups with no significant differences in each of the best predictors. In general, deciduous trees were often characterised by similar intensity (e.g., Figure 6b,e) and spectral values (Figure 6a), which makes it difficult to distinguish between these species. In the conifers, significant differences were observed, especially in the coefficient of variation of intensity (Figure 6b,d), which led to a more accurate differentiation of spruce and pine. These species were also characterised by significantly lower intensity values in the leaf-on season than deciduous trees (except birch) (Figure 6c) and conversely by significantly higher values in the leaf-off season (except oak) (Figure 6e). This is in agreement with the findings of Kamińska et al. [25] and Ba et al. [18]. Kamińska et al. [25] reported high classification results for living spruce and pine, regardless of the dataset, as well as for dead spruce. According to the machine learning classifiers used in the study by Ba et al. [18], the most separable trees were alders, poplars, and willows (birch was not considered), and lime trees were poorly classified.
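For readers less familiar with the per-species measures used above, the sketch below shows how the producer's accuracy (PA) and user's accuracy (UA) can be derived from a confusion matrix. The orientation of the matrix (rows as reference classes, columns as predicted classes), the function name, and all counts are assumptions chosen for illustration; the counts are not results of this study.

```python
def producers_users_accuracy(confusion, labels):
    """Return {label: (PA, UA)} for a square confusion matrix.

    Assumed orientation: confusion[i][j] is the number of reference
    trees of class i that were predicted as class j.
    """
    n = len(labels)
    result = {}
    for k, name in enumerate(labels):
        correct = confusion[k][k]
        reference_total = sum(confusion[k])                        # row sum
        predicted_total = sum(confusion[i][k] for i in range(n))   # column sum
        pa = correct / reference_total if reference_total else float("nan")
        ua = correct / predicted_total if predicted_total else float("nan")
        result[name] = (pa, ua)
    return result


# Hypothetical counts: a well-separated conifer and two confused deciduous species
labels = ["spruce", "lime", "hornbeam"]
confusion = [
    [45, 3, 2],
    [2, 28, 20],
    [3, 17, 30],
]
accuracies = producers_users_accuracy(confusion, labels)
for name, (pa, ua) in accuracies.items():
    print(f"{name}: PA={pa:.2f}, UA={ua:.2f}")
```

In this invented example, the mutual confusion between lime and hornbeam lowers both measures for the two species, while spruce is hardly affected.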

Optimal Data Acquisition
Our study showed the value of using both leaf-on and leaf-off data to discriminate different coniferous and deciduous tree species in a mixed temperate forest in Poland. The leaf-off season was more favourable for conifers and oak, while hornbeam and lime obtained better results in the leaf-on season. The combination of both point clouds was favourable for birch and alder (Table 5). Our results demonstrate that differentiation between individual tree species can benefit from acquiring data in a species-appropriate period by obtaining species-specific ALS metrics. In most of the studies analysed in the temperate and boreal forests in Europe, the favourable time for acquisition was the leaf-off season [17,25,41,42], with the exception of the study by Laslier et al. [43]. This shows that the optimal conditions for obtaining data for tree species classification can depend on many factors, such as the tree species analysed, the forest type, age (which is related to the different tree structures), the climate zone, or the forest management regimes.
However, if only one ALS acquisition is possible (for economic reasons, for example), leaf-on conditions with image information included should be preferred, since for birch and conifers the CIR variables significantly improved the classification accuracy compared with classification based on the leaf-on point cloud alone.

Predictor's Importance
When we use the geometric features based on ALS, we either use more architecture-related features for data that do not contain leaves (leaf-off data), or more features related to the shape of the tree crowns for data that do contain leaves (leaf-on data). For oak, data collected in the leaf-off state map larger branches than for other tree species. Different tree foliage distribution and canopy branching structures, which are often typical for the specified species, help with their recognition during species classification [28,56]. The shape of the tree crown might be advantageous for species recognition, as it is species-dependent. This has been confirmed by Ørka et al. [57], who pointed out that the major difference between spruce and birch is the rounder crown of birch and the conical shape of a spruce crown. Holmgren et al. [58] indicated that Scots pine can be separated from spruce and deciduous trees based on the tree's relative crown base height, as it has a higher crown base than other species. However, Holmgren et al. [58] emphasized that the crown base height varies within the group of species and may depend on factors not included in the classification process. Axelsson et al. [35] used geometric features for tree species classification to identify variations between species with large leaves and a dense crown and other species with thinner leaves and a sparser crown. The observed misclassifications when using only geometric features could be due to many factors, e.g., the structure of the forest and the presence of lower layers in the stands, morphological similarities between species, structural variations within the same species, external physical properties (soil, temperature), or the density of the point cloud [17].
We need to keep in mind that many geometric features are affected by tree age and tree height (which is related to age) and other properties, such as crown volume, crown shape, and the interior structure of the tree crown. The vertical structure of trees and their typical architecture can be well reflected in the ALS data when these trees grow at a sufficiently low density without lower forest layers. When trees grow densely, lower forest layers are present in the stands, and different species grow in a mixture, it is difficult to maintain the typical spatial patterns of the distribution of ALS points for specific tree species after the segmentation of individual trees. This is the case in our study. The Białowieża Forest is a very complex forest, with varied structure and species composition. Since a lot of noise was captured in the ALS data due to the BF structure, the structural variables were not as important in the species-related classification.
In the classification based on both point clouds (leaf-on and leaf-off), the intensity features appeared to be more important than the other groups of variables, in particular the coefficient of variation, skewness, and percentiles. Similar results were obtained by Kamińska et al. [25], where the same intensity features indicated high-grade ratings for the discrimination of spruce, pine, and deciduous trees divided by alive or dead classes for both seasons.
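To make the intensity features named above concrete, the following sketch computes the coefficient of variation, the skewness, and selected percentiles from the return intensities of a single segmented tree. It is only an illustration under stated assumptions: the population (biased) moment estimators, the linear-interpolation percentile rule, the chosen percentile levels, and the intensity values are all hypothetical and are not taken from the processing used in this study.

```python
import math


def intensity_features(intensities, percentiles=(25, 50, 75)):
    """Summarise the ALS return intensities of one tree segment."""
    n = len(intensities)
    if n < 2:
        raise ValueError("at least two returns are needed")
    mean = sum(intensities) / n
    # Second and third central moments (population estimators, an assumption)
    m2 = sum((x - mean) ** 2 for x in intensities) / n
    m3 = sum((x - mean) ** 3 for x in intensities) / n
    std = math.sqrt(m2)
    features = {
        "cv": std / mean if mean != 0 else float("nan"),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
    }
    ordered = sorted(intensities)
    for p in percentiles:
        # Linear interpolation between the two closest ranks
        position = (n - 1) * p / 100
        lower = math.floor(position)
        upper = math.ceil(position)
        fraction = position - lower
        features[f"p{p}"] = ordered[lower] + (ordered[upper] - ordered[lower]) * fraction
    return features


# Hypothetical intensities: mostly weak returns plus one strong return
features = intensity_features([10, 12, 14, 16, 48])
print(features)
```

A single strong return among weak ones produces a high coefficient of variation and a positive skewness, which is the kind of distributional difference that such features can capture between crowns.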
Many studies have shown that combining ALS height information with radiometric features exploits their complementary strengths and yields better species classification results than geometric or radiometric features alone [17,29,34,59–61]. The backscattered signal intensity is related to the foliage type, leaf size, leaf orientation, and foliage density, providing additional possibilities for tree species classification [5,29,30]. Image information (CIR variables) improved classification accuracy compared with classification based on the leaf-on point cloud alone for coniferous trees (pine, spruce, dead) and birch. No significant improvement was noted for the rest of the deciduous trees. Image information did not improve classification accuracy under the leaf-off condition. This lack of improvement did not vary between species.
The results obtained allow us to conclude that multi-temporal ALS data enable accurate classification of many tree species, with results similar to those obtained when information from CIR images is added. We assume that a denser point cloud could serve to extract more detailed structural metrics, such as the branch arrangement of a single tree, which could contribute to a more accurate tree species classification.

Conclusions
In this study, promising results are reported for the classification of different deciduous and coniferous tree species, including a dead class, using combined information from leaf-off and leaf-on ALS datasets and CIR aerial images. Despite the complex stand structure and heterogeneity of the BF, the classification accuracy was fairly high.
Analysing the leaf-on and leaf-off datasets alone is not sufficient for the identification of individual tree species due to their different discriminatory power. Leaf-on and leaf-off ALS point cloud features alone produced the lowest accuracies of 72% ≤ OA ≤ 74% and 0.67 ≤ κ ≤ 0.70. It was shown that the classification based on both point clouds achieved satisfactory and comparable results to the classification based on the combined information from all three sources (83% ≤ OA ≤ 84% and 0.81 ≤ κ ≤ 0.82).
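The two summary measures quoted above can be reproduced from any confusion matrix. The sketch below computes the overall accuracy (OA) and Cohen's kappa (κ); the function name and the counts are hypothetical and serve only to show the arithmetic, not to restate the results of this study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (OA, kappa) for a square confusion matrix of counts."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of trees on the diagonal
    observed = sum(confusion[k][k] for k in range(n)) / total
    # Chance agreement: product of row and column marginals per class
    expected = sum(
        sum(confusion[k]) * sum(row[k] for row in confusion) for k in range(n)
    ) / total ** 2
    if expected == 1:
        return observed, float("nan")  # kappa is undefined in this case
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for three classes (not data from this study)
confusion = [
    [45, 3, 2],
    [2, 28, 20],
    [3, 17, 30],
]
oa, kappa = overall_accuracy_and_kappa(confusion)
print(f"OA = {oa:.3f}, kappa = {kappa:.2f}")
```

Because κ discounts the agreement expected by chance, it is lower than OA for the same matrix, which is also the pattern seen in the ranges reported above.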
In the classification based on both point clouds (leaf-on and leaf-off), the intensity features appeared to be more important than the other groups of variables, in particular the coefficient of variation, skewness, and percentiles. The NDVI was the most important CIR-based feature.
The classification accuracy varied between species. The classification results for coniferous trees were always better than for deciduous trees, independent of the datasets. Among deciduous trees, the best results were obtained for birch and alder, and the lowest were observed for lime, independent of the analysed datasets.
The use of multi-temporal ALS data offers great potential in tree species classification, with the possibility of omitting optical data. Further research in this area is recommended, especially using metrics from discrete returns and full-waveform simultaneously. Studies using multi-temporal ALS data during the leaf-on season to investigate phenological changes and their impact on tree species classification are also worth considering.