Detection of European Aspen (Populus tremula L.) Based on an Unmanned Aerial Vehicle Approach in Boreal Forests

Abstract: European aspen (Populus tremula L.) is a keystone species for the biodiversity of boreal forests. Large-diameter aspens maintain the diversity of hundreds of species, many of which are threatened in Fennoscandia. Due to the low economic value and the relatively sparse and scattered occurrence of aspen in boreal forests, there is a lack of information on its spatial and temporal distribution, which hampers efficient planning and implementation of sustainable forest management practices and conservation efforts. Our objective was to assess the identification of European aspen at the individual tree level in a southern boreal forest using a high-resolution photogrammetric point cloud (PPC) and multispectral (MSP) orthomosaics acquired with an unmanned aerial vehicle (UAV). The structure-from-motion approach was applied to generate an RGB imagery-based PPC to be used for individual tree-crown delineation. Multispectral data were collected using two UAV cameras: Parrot Sequoia and MicaSense RedEdge-M. Tree-crown outlines were obtained from watershed segmentation of the PPC data and intersected with the multispectral mosaics to extract and calculate spectral metrics for individual trees. We assessed the role of spectral features extracted from the PPC and the multispectral mosaics, as well as their combination, using a machine learning classifier, the Support Vector Machine (SVM), to perform two different classifications: discrimination of aspen from the other species combined into one class, and classification of all four species (aspen, birch, pine, spruce) simultaneously. In the first scenario, the highest classification accuracy for aspen of 84% (F1-score) and an overall accuracy of 90.1% were achieved using only RGB features from the PPC, whereas in the second scenario, the highest classification accuracy for aspen of 86% (F1-score) and an overall accuracy of 83.3% were achieved using the combination of RGB and MSP features. The proposed method provides a new possibility for the rapid assessment of aspen occurrence, enabling more efficient forest management as well as contributing to biodiversity monitoring and conservation efforts in boreal forests.


Introduction
Tree species composition of a forest plays a significant role in maintaining biodiversity. Mixed-species forests can host greater species richness and provide more important ecosystem services compared to monocultures of conifers [1][2][3][4]. In boreal environments, old deciduous trees in particular have been recognized to promote species richness [5,6]. The objective of this study was to assess the possibility of recognizing European aspen at the individual tree level in a boreal forest using a high-resolution photogrammetric point cloud and multispectral imagery acquired with a UAV. Specifically, we asked the following questions: (1) How accurately can aspen be discriminated from the main tree species, Scots pine, Norway spruce, and birches (B. pendula and B. pubescens), with commonly used multispectral UAV cameras? (2) What are the most important spectral data features for discriminating aspen trees?
To answer these questions, we assessed the performance of different spectral data features extracted from a high-resolution UAV PPC and multispectral orthomosaics using an SVM machine-learning classifier, and studied the effect of feature selection on model performance.

Study Area and Field Data
The study was conducted in a boreal forest area in the Evo region in the Hämeenlinna municipality (61°11′ N, 25°06′ E) in southern Finland (Figure 1). The area includes both managed and protected southern boreal forests. The main tree species are Scots pine (Pinus sylvestris L.), Norway spruce (Picea abies (L.) Karst), silver birch (Betula pendula) and downy birch (Betula pubescens Ehrh.). European aspen (Populus tremula L.) has a relatively sparse and scattered occurrence in the area.
Most of the field data were collected simultaneously with the flight campaign in July 2018. Separate field datasets were collected for the study areas covered by the MicaSense and Sequoia sensors (see the UAV Data section). Field measurements of the main tree species were conducted on 25 circular 9-m-radius sample plots situated in semi-closed and fully closed canopy forests. Within each sample plot, tree species, diameter at breast height (DBH), tree height, and tree position were recorded for all trees with DBH > 45 mm. In order to ensure crown visibility in the UAV images and to better capture mature and old-growth aspens, only living and standing trees with DBH over 200 mm were selected for the analyses. The sparse and clustered occurrence of aspen in the study area led to an unbalanced proportion of the main tree species in the dataset. To ensure a sufficient sample size of the main tree species, we measured additional individual trees within the study sites in autumn 2019 as complementary data (Table 1). The locations of the trees in the field plots and the complementary dataset were measured using a Trimble R10 RTK-GPS device and the Trimnet VRS network in Finland.

UAV Data
The complete workflow is presented in Figure 2. UAV RGB and multispectral imagery for the analysis were obtained over the 9 test sites within the study area (Figure 1) in July 2018 under leaf-on canopy conditions. The research sites were selected based on visual detection of aspen hotspots in the area, accessibility from roads, and proximity to the landing spots of the UAVs.

RGB imagery was collected using a camera with a field of view of 84° and a 1/2.3″ CMOS sensor that captures red-green-blue (RGB) spectral information. The flight altitude of 130-150 m above ground level resulted in a ground sampling distance (GSD) of 3.9-4.9 cm for the RGB imagery. Images were collected in TIFF format with the camera set in automatic mode at noon under clear sky and calm conditions to minimize wind and shadow effects on the images. RGB images acquired with the eBee and Phantom 4 were further pre-processed for precision geotagging in eMotion 3 software (SenseFly SA, Cheseaux-sur-Lausanne, Switzerland) and DJI software (DJI, Shenzhen, China), respectively. The trajectory correction data were obtained online during the flight from the Trimnet VRS network in Finland.
Multispectral data were collected using the eBee equipped with a Parrot Sequoia camera (Parrot Drone SAS, Paris, France) and a multirotor platform DJI Matrice 210 (DJI, Shenzhen, China) equipped with a MicaSense RedEdge-M camera (MicaSense, Inc., Seattle, WA, USA). The monochromatic multispectral Sequoia camera produces four 1.2-megapixel images in the green (550 nm), red (660 nm), red edge (735 nm) and near infrared (NIR) (790 nm) wavelengths. The MicaSense RedEdge-M sensor (hereafter MicaSense) produces five 1.2-megapixel images in the blue (475 nm), green (560 nm), red (668 nm), red edge (717 nm) and NIR (840 nm) wavelengths. In order to improve the radiometric quality of the multispectral data, both cameras were equipped with irradiance sensors attached to the top of the airframe to correct for illumination conditions during the flight. Additionally, immediately before and after each flight, radiometric calibration targets with premeasured reflectance values were imaged for subsequent radiometric correction of the multispectral images. Ground control points (GCPs) were not used due to the low visibility of the ground in the dense forest from the air. Therefore, the absolute spatial accuracy of the multispectral orthomosaics was corrected using manually selected natural objects as GCPs, extracted from the RGB orthomosaics with a typical accuracy of 2 cm in the horizontal direction. The flight altitude of 140-150 m above ground resulted in a GSD of 8.6-14.5 cm for the multispectral imagery. The total flight area covered with the UAVs was 1819 ha. A full summary of the UAV flight parameters is presented in Appendix A, Table A1.

Calculation of Dense Point Clouds and Image Orthomosaics
The analysis workflow of this study is presented in Figure 2. Processing of the UAV images for generating 3D point clouds and producing orthomosaics was carried out using the Agisoft Metashape software (Agisoft LLC, St. Petersburg, Russia), a commercial photogrammetry processing software that uses the Structure from Motion (SfM) approach to reconstruct the environment in 3D from a set of overlapping images and allows generation of dense 3D point clouds and orthomosaics. The standard photogrammetric workflow starts with the image alignment procedure, where images are processed at their full resolution (corresponding to the "high quality" setting) to find the orientations and produce the sparse point cloud. During the image alignment, the software performs the camera calibration based on Brown's distortion model [65]. Based on the estimated image positions, the software calculates depth information for each image to be combined into a single dense point cloud. During this stage, the software generates the PPC using images downscaled by a factor of 4 (corresponding to the "high quality" setting) together with the depth filtering mode "Mild" to obtain detailed and accurate geometry without outliers among the points [44,66]. The resulting point cloud included XYZ coordinates and spectral information (i.e., red, green, and blue) for each point. Orthomosaics, geometrically corrected aerial images composed of individual still images stitched together, were produced from both the multispectral and RGB images using the same software and the procedure described above. Once the point cloud was generated, it was further used to create the orthophotos. RGB orthomosaics were used only for visual verification of the field-measured trees and of the quality of the subsequent individual tree-crown segmentation results. The dense point clouds and multispectral orthomosaics were exported in LAS and TIF formats, respectively, for further processing.

Individual Tree Detection and Spectral Data Features Extraction
The photogrammetric point cloud from each site was normalized with the publicly available ALS-based digital terrain model (2 m) provided by the National Land Survey of Finland. The canopy height model was created using LAStools [67]. Individual tree detection (ITD) was performed in R software [68] using the R package rLiDAR [69] to detect the location and height of individual trees within the PPC-derived Canopy Height Model (CHM). The algorithm implemented in this function is a local maxima search with a fixed window size. The maximum value of the crown diameter was set to 3 m to capture only the tree top and reduce mixing with nearby tree crowns. The created tree-crown segments were then overlaid with the multispectral mosaics and the field-measured trees to extract the spectral features of all bands for each corresponding tree top and field-measured tree. Only pixels located inside the tree-crown segment were included in the analysis. Some of the tree segments contained multiple field measurements; if the measurements represented only one tree species, they were included in the analysis, and all segments with measurements from multiple tree species were excluded. Results of the segmentation were visually assessed against the RGB orthomosaics (Figure 3). In total, 30 and 36 spectral features were extracted for each individual tree segment for the Sequoia and MicaSense datasets, respectively; the mean, minimum, maximum, median, and 25% and 75% percentiles of each spectral band were calculated for each segment. A full list of the spectral features is presented in Appendix A, Table A2. Before classification, the spectral variables were normalized by dividing the brightness at each band by the sum of all brightness values observed for the same point [70].
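As a rough illustration of the per-segment feature extraction and brightness normalization described above (not the study's original code), the following R sketch computes the six per-band statistics for each crown segment. File and object names are hypothetical, and the normalization is applied per pixel, which is one reading of the description above.

```r
library(raster)  # multispectral orthomosaic I/O and zonal extraction
library(sf)      # crown segment polygons

msp    <- brick("micasense_orthomosaic.tif")              # hypothetical file names
crowns <- as(st_read("crown_segments.gpkg"), "Spatial")

# Pixel values inside each crown polygon: a list with one (pixels x bands) matrix per segment
pix <- extract(msp, crowns)

band_stats <- function(m) {
  # Normalize each pixel by the sum of its band values (brightness normalization),
  # then compute mean, min, max, median and 25%/75% percentiles per band.
  m_norm <- m / rowSums(m)
  apply(m_norm, 2, function(v) c(mean   = mean(v, na.rm = TRUE),
                                 min    = min(v, na.rm = TRUE),
                                 max    = max(v, na.rm = TRUE),
                                 median = median(v, na.rm = TRUE),
                                 p25    = unname(quantile(v, 0.25, na.rm = TRUE)),
                                 p75    = unname(quantile(v, 0.75, na.rm = TRUE))))
}

# One row of spectral features per crown segment (descriptive column names
# would be assigned in a full implementation)
features <- as.data.frame(t(sapply(pix, function(m) as.vector(band_stats(m)))))
```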

Figure 3. An example of the identified tree-crown segments with the field-measured trees (green star cross) on RGB (left) and multispectral (right) orthomosaics.

Statistical Analysis
The segment data derived from the RGB point clouds and the MicaSense and Sequoia mosaics were imported into the statistical software R for classification with the machine learning method SVM, implemented in the caret package [71]. We used an SVM with a radial basis function kernel, which separates classes by projecting the original data into a higher-dimensional space in which the classes become separable by linear hyperplanes. SVM is a commonly used method due to its ability to generalize well, even with smaller training samples [72].
Before fitting the SVM models, we applied recursive feature elimination (RFE), a wrapper feature selection algorithm implemented in caret, to avoid model overfitting by eliminating unimportant and potentially noisy features [73]. The model was first fitted with all predictors in the feature set and the predictors were ranked based on their importance to the model. The least important predictor was eliminated, and a second model was fitted with the remaining predictors. The iteration continued until no further improvement in model performance could be observed. We used 10-fold cross-validation as the outer resampling method for the RFE algorithm. Feature selection was applied separately to three different feature sets: spectral features extracted from the PPC (hereafter referred to as RGB), spectral features extracted from the multispectral mosaics (hereafter MSP), and the combination of the two (RGB+MSP). Due to the stochastic nature of the RFE algorithm, we iterated it 21 times and selected the final number of features for each model based on the median overall accuracy.
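As a minimal, hedged sketch of this feature-elimination step (not the study's original code), RFE with an SVM base model and 10-fold outer cross-validation could be set up in caret roughly as follows; the 'features' data frame and the 'species' factor of field-observed tree species are hypothetical objects carried over from the extraction sketch above.

```r
library(caret)

set.seed(42)
rfe_ctrl <- rfeControl(functions = caretFuncs,  # generic wrapper; base model passed via 'method'
                       method    = "cv",        # outer resampling: 10-fold cross-validation
                       number    = 10)

rfe_fit <- rfe(x          = features,
               y          = species,
               sizes      = 2:ncol(features),   # candidate feature-subset sizes to evaluate
               rfeControl = rfe_ctrl,
               method     = "svmRadial")        # SVM with RBF kernel as the base model

predictors(rfe_fit)  # the selected feature subset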
Data partitioning was applied to divide the 306 samples of the MicaSense dataset and the 465 samples of the Sequoia dataset into 70% training and 30% testing sets using proportional stratified random sampling [71]. The first set was used only for model training, while the second set was used as an independent hold-out test set for accuracy assessment and feature importance calculations.
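A stratified 70/30 split of this kind can be obtained with caret's createDataPartition; the sketch below is illustrative and reuses the hypothetical 'features' and 'species' objects from the previous sketches.

```r
library(caret)

set.seed(42)
# createDataPartition samples within each class, preserving the class proportions
train_idx <- createDataPartition(species, p = 0.7, list = FALSE)

train_x <- features[train_idx, ];  train_y <- species[train_idx]
test_x  <- features[-train_idx, ]; test_y  <- species[-train_idx]
```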
After feature selection and data partitioning, separate SVM models were fitted with the three alternative feature sets, which were centered and scaled for model training. The SVM hyperparameters cost and sigma were optimized with a grid search [73] over 15 values ranging from 0.25 to 4096 for cost and from 0.01 to 0.35 for sigma. We iterated the training and prediction steps 21 times for each SVM model to obtain more stable results, given the stochastic nature of the machine-learning algorithm.
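The following sketch illustrates, under the assumptions above, how such a grid search over cost and sigma might look in caret; the exact 15-value grids used in the study are not reported, so the sequences below are placeholders spanning the stated ranges.

```r
library(caret)

# Placeholder grids spanning the reported ranges (15 values each)
tune_grid <- expand.grid(C     = 2^seq(-2, 12, length.out = 15),   # ~0.25 ... 4096
                         sigma = seq(0.01, 0.35, length.out = 15))

fit_ctrl <- trainControl(method = "cv", number = 10)

svm_fit <- train(x          = train_x,
                 y          = train_y,
                 method     = "svmRadial",
                 preProcess = c("center", "scale"),  # centering and scaling of the features
                 tuneGrid   = tune_grid,
                 trControl  = fit_ctrl)

pred <- predict(svm_fit, newdata = test_x)  # predictions on the hold-out test set
```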
We used model class reliance (MCR), implemented in the iml R package [74], to examine which spectral features are important for discriminating aspen from the other common tree species. The MCR is "the highest and lowest degree to which any well-performing model within a given class may rely on a variable of interest for prediction accuracy" [75]. The importance of each feature was assessed by calculating the increase in the model's classification error after permuting the feature, repeating the permutations 20 times to obtain more stable results. The results can be interpreted as follows: if permuting a feature's values increases the error, the feature can be considered "important", whereas if permutations leave the error unchanged, it is "unimportant". In the first case the model relied on the feature for the prediction, while in the latter it ignored the feature [74]. In practical terms, an MCR value of 2 can be interpreted as heavy model reliance on the feature, while a value closer to 1 signifies little or no reliance [74].
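A simplified sketch of the permutation importance computation with the iml package is given below; it computes overall (rather than per-class) importance and uses the hypothetical objects from the previous sketches.

```r
library(iml)

# Wrap the trained caret model and the hold-out data for iml
predictor <- Predictor$new(model = svm_fit, data = test_x, y = test_y)

imp <- FeatureImp$new(predictor,
                      loss          = "ce",      # classification error as the loss
                      compare       = "ratio",   # report error ratio (permuted / original)
                      n.repetitions = 20)        # repeat permutations for stability

plot(imp)  # importance plot; median values near 2 indicate heavy reliance on a feature
```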
The F1-score, confusion matrices, and Cohen's kappa coefficients were calculated to measure the classification accuracy. The F1-score is the harmonic mean of precision and recall, which are equivalent to the user's accuracy (UA) and producer's accuracy (PA) [73], and was calculated as F1 = 2 × (UA × PA) / (UA + PA). The values range between 0 and 1, with 1 being the best and 0 the worst.
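For reference, these accuracy metrics can be obtained from a caret confusion matrix as in the following sketch (object names are again hypothetical):

```r
library(caret)

# Accuracy assessment on the hold-out set
cm <- confusionMatrix(data = pred, reference = test_y)

cm$table                  # confusion matrix
cm$overall[["Accuracy"]]  # overall accuracy
cm$overall[["Kappa"]]     # Cohen's kappa
cm$byClass[, "F1"]        # per-class F1-score (a matrix for multi-class models)
```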

Accuracy Assessment of the 2-Class Classification Model
The accuracy assessment of the model performance for the 2-class classification approach (aspen vs. other species) after feature selection is presented in Table 2. For the MicaSense dataset, similar results were achieved with only RGB features and with the combination of RGB and MSP features (OA = 90.1%, Kappa = 0.76-0.77, F1-score for aspen = 82-84%). The model fitted with only MSP features produced slightly less accurate results (OA = 87.9%, Kappa = 0.72, F1-score for aspen = 81%). The feature selection did not eliminate any features from the models fitted with either RGB or MSP features, whereas the RGB+MSP model was somewhat simplified, with four spectral features eliminated. In a similar fashion, the Sequoia dataset yielded equally good results with the models including only RGB features or the combination of RGB and MSP features (OA = 89.2-89.9%, Kappa = 0.70-0.73, F1-score for aspen = 77-79%). Again, the accuracy metrics of the MSP model were considerably worse compared to the other two models (OA = 84.1%, Kappa = 0.57, F1-score for aspen = 68%). The feature selection algorithm eliminated three features from the MSP feature set and one from the RGB+MSP feature set.
The confusion matrices for the best SVM models using the MicaSense and Sequoia datasets are shown in Table 3.

Accuracy Assessment of the 4-Class Classification Model
Table 4 shows the accuracy assessment of the model performance for the 4-class classification approach (all species as separate classes). In general, the models trained with all spectral data features provided the best results for both datasets (Kappa = 0.75-0.78, OA = 81.3-83.3% and F1-score for aspen = 80-86%). As in the 2-class models, the initial feature count was retained for the MicaSense and Sequoia models fitted with only RGB features. The feature selection did, however, simplify the models fitted with MSP and RGB+MSP features: six spectral features were eliminated from the MicaSense MSP model and seven from the corresponding Sequoia model, while eight features were eliminated from the MicaSense RGB+MSP model and four from the Sequoia RGB+MSP model. The confusion matrices for the best SVM models for the MicaSense and Sequoia datasets (RGB+MSP features) are shown in Table 5.

Spectral Feature Importance
The permutation feature importance plots with the 10 most important features in the SVM models for the MicaSense and Sequoia datasets are shown in Figures 4 and 5. The importance is measured as the factor by which the model's classification error increases when the feature is permuted, compared to the unpermuted classifications. Each permutation was repeated 20 times. In the plots, the horizontal line denotes the 5% and 95% quantiles of the importance values, and the point denotes the median importance. When the median classification error is close to 2, the model relies heavily on the feature. Different multispectral features had major importance for most of the species in the 4-class classification models with the MicaSense dataset. We did not find notable differences in the feature importance of aspen and pine with the MicaSense dataset: the classification error scores of the top-10 features are close to 1, signifying rather low model reliance on these features. For spruce, Gmean (mean value of the green band from the PPC) showed heavy model reliance with a classification error close to 2, while RE_25 (25% percentile of the red-edge band) also stood out from the other features. Moreover, Gmean was among the top three most important features for birch, pine, and spruce. In contrast, the classification accuracy of the model trained with the Sequoia dataset relied much more on the spectral features extracted from the PPC for all species, while the multispectral features played a less important role. The mean value of the blue band (Bmean) was the most important feature for all four species, with a classification error close to or above 2 (Figure 5). We also found that NIR_25 (25% percentile of the near-infrared band) and Rsd (standard deviation of the red band extracted from the PPC) played an important role in discriminating deciduous trees (aspen, birch) from coniferous trees (pine, spruce) in the MicaSense and Sequoia datasets, respectively.

Discussion
In this study we tested the performance of spectral features for discriminating European aspen from the three main tree species (Scots pine, Norway spruce, and birch) in a mature boreal forest. For this, we used a PPC and multispectral orthomosaics acquired with UAVs. We investigated whether including only spectral data features in the classification models is sufficient to accurately discriminate aspen from pine, spruce, and birch.
The results show that the 2-class and 4-class SVM models perform well in separating aspen from the main tree species for both UAV datasets utilized in the study. We observed that the highest classification accuracy of aspen was slightly improved in the 4-class model trained with RGB and MSP features, compared to the 2-class model with only RGB features included (aspen F1-score of 86% vs. 84% for the MicaSense dataset and 80% vs. 79% for the Sequoia dataset, respectively). Therefore, for the purpose of a rapid and low-cost inventory of mature and old-growth aspen trees, a UAV platform equipped with only an RGB sensor is sufficient. However, compared to lower-priced RGB sensors, MSP sensors have proven to be more stable over time and less affected by changes in environmental conditions (e.g., sunlight angle and cloud cover) due to their irradiance sensors [76][77][78]. Moreover, in the 4-class model, spruce is well separated from pine, which makes the proposed method also suitable for applications in commercial forestry focusing on the more valuable coniferous tree species.
We found that the most important spectral features for classifying aspen in the MicaSense dataset differed from those in the Sequoia dataset. For MicaSense, the best-performing model relied more on the multispectral features, whereas the best-performing model built with the Sequoia dataset relied more on the PPC features. Eight of the 10 most important features in MicaSense for aspen classification are multispectral (Figure 4). It is also worth mentioning that the classification did not rely on any particular feature and all the spectral features performed quite equally. In contrast to the MicaSense dataset, in the aspen classification model based on the Sequoia dataset the three most important features were from the PPC (Figure 5), although different multispectral features had a positive impact on the model performance as well. The reason for this could be related to the difference in spatial resolution of the multispectral data: the MicaSense data, with a spatial resolution of 8.6-9.2 cm, were most likely more sensitive to detailed species-specific spectral information than the Sequoia data with 13.5-14.5 cm.
It is challenging to compare the classification accuracy results with other studies due to, e.g., the different vegetation types, number of species, forest structure, and spatial distribution of the species. Compared to the recent European aspen review by Kivinen et al. [10], where aspen detection accuracy from various remote sensing methods ranged from 56% to 86% (user's accuracy) and from 24% to 71% (producer's accuracy), our results are promising. In our study we obtained a highest F1-score for aspen of 86% and an overall accuracy of 83.3%, which are close to the accuracy results reported by Viinikka et al. [31] and Mäyrä et al. [37], who utilized airborne hyperspectral and airborne laser scanning data for detecting European aspen in the same study area (F1-score for aspen 92%, overall accuracy 84% and F1-score for aspen 91%, overall accuracy 87%, respectively). Our study included only spectral data features in the classification model, and according to previous studies, including height features from the PPC, vegetation indices and texture features might improve the classification results [62,63].
There are several potential sources of uncertainty in our study. In general, the delineation accuracy of deciduous trees is a common problem in remote sensing [79,80]. Crowns of deciduous trees are often more complex than the crowns of conifers, which results in, e.g., multiple height peaks within single tree crowns [81,82]. In terms of data-related challenges, one of the key issues is combining datasets with different spatial resolutions. Here, the high-resolution PPC was fused with the multispectral mosaics with coarser spatial resolution for extracting the spectral features. In data fusion, the difference in spatial resolution often leads to a mismatch between data sources. The GSD of the multispectral data was twice as large as that of the RGB data used for the ITD algorithm (8.6-14.5 and 3.9-4.9 cm, respectively). Even though the resulting tree-crown segments were well aligned with the field-measured trees (see Figure 3), the segments still included some tree-crown background, which may affect the spectral signature of the whole segment. One possibility to avoid this is to replace the multispectral mosaic with a multispectral point cloud produced from high-resolution multispectral images, implement the individual tree detection algorithm on it, and then filter the points by height to select only the points belonging to the tree crown and exclude its surroundings. Enhancing the spatial resolution of the multispectral data would allow a more detailed investigation of species-specific spectral attributes.
Another way to improve the aspen detection accuracy and separation from birch species is to consider the seasonal phenological response of deciduous tree species in spectral composition when timing the UAV data acquisition [83,84]; for example, birches have an earlier leaf flush in spring than aspen trees in boreal forests [85]. Previous studies have shown that near-infrared information is more sensitive to leaf development than RGB-based information alone [86,87].
Detection of smaller trees under the dominant canopy layer is challenging from a PPC because passive aerial images only capture the outer canopy envelope without penetrating it as LiDAR sensors do. In our study, we focused on mature and old-growth aspens, which are easier to detect. Our approach is well suited for detecting aspen-associated biodiversity values, as old large-diameter aspens are ecologically valuable [10,23].
Aspen tree occurrences derived from UAV imagery enable estimates of the ability of a forest area to support viable populations of aspen-associated species. Such information is crucial, because some species may persist in the small remaining patches of host trees for some time but ultimately become threatened as the number of aspen trees and their connectivity further decrease in the landscape [26,88,89]. Integration of various species data with information on the occurrence of individual aspen trees and their spatial patterns in the forest landscape could significantly increase our understanding of the landscape-level prerequisites for the occurrence of aspen-associated species. Detecting aspen dynamics using multitemporal UAV imagery could also provide essential information for forest management and for assessing the future status of aspen-associated species for conservation purposes [10].
Our results highlight the potential of commonly used multispectral cameras to provide accurate information on boreal tree species at a local scale. Acquiring data over larger geographical regions with UAVs is often costly and time-consuming. Combined use of UAV data with other remote-sensing data of wider coverage but lower spatial resolution provides opportunities to significantly expand the extent of the mapped area. UAV-based orthoimages have been utilized as substitutes for labor-intensive traditional field surveys in estimating the fractional coverage of target objects from optical satellite imagery [90][91][92]. Upscaling information on aspen occurrence over wide geographical areas by using UAV-acquired imagery and high-resolution satellite imagery could provide a highly useful biodiversity indicator for assessing the state of biodiversity in boreal forests.

Conclusions
In this paper, we presented a workflow for a European aspen-recognition method based on an individual tree detection approach using a high-resolution photogrammetric point cloud and multispectral images acquired from fixed-wing and multirotor UAVs. We compared the performance of an SVM classifier in combination with different spectral features derived from the UAV data to discriminate aspen from birch, pine, and spruce in boreal forests. The results of the study show that a combination of RGB and MSP features from the PPC and multispectral orthomosaics provides the highest accuracy for aspen classification (F1-score = 86%) against the dominant tree species in the study area (overall accuracy = 83.3%). The proposed method can be used for rapid inventories of mature and old-growth aspen in areas of interest for biodiversity assessment, as well as for commercial forestry operations in boreal forests where species-specific information is required, providing valuable, precise, and low-cost information on demand compared to traditional remote-sensing techniques.

Acknowledgments:
The authors wish to thank Aleksi Ritakallio and Max Stranden for assisting in the field data collection. We also want to thank the Evo campus of the Häme University of Applied Sciences for providing accommodation and support during the fieldwork.

Conflicts of Interest:
The authors declare no conflict of interest.