Tree Crown Detection and Delineation in a Temperate Deciduous Forest from UAV RGB Imagery Using Deep Learning Approaches: Effects of Spatial Resolution and Species Characteristics

The automatic detection of tree crowns and estimation of crown areas from remotely sensed information offer a quick approach for grasping the dynamics of forest ecosystems and are of great significance for both biodiversity and ecosystem conservation. Among various types of remote sensing data, unmanned aerial vehicle (UAV)-acquired RGB imagery has been increasingly used for tree crown detection and crown area estimation; this approach is efficient and relies heavily on deep learning models. However, it has not been thoroughly investigated in deciduous forests with complex crown structures. In this study, we evaluated two widely used, deep-learning-based tree crown detection and delineation approaches (DeepForest and Detectree2) to assess their potential for detecting tree crowns from UAV-acquired RGB imagery in an alpine, temperate deciduous forest with a complicated species composition. A total of 499 digitized crowns, including four dominant species, with corresponding, accurate inventory data in a 1.5 ha study plot were used as training and validation datasets. We attempted to identify an effective model to delineate tree crowns and to explore the effects of spatial resolution on detection performance, as well as on the extracted tree crown areas, using a detailed field inventory. The results show that both deep-learning-based models could be transferred to predict tree crowns successfully, with Detectree2 (F1 score: 0.57) outperforming DeepForest (F1 score: 0.52). However, spatial resolution had an obvious effect on the accuracy of tree crown detection, especially when the resolution was coarser than 0.1 m. Furthermore, Detectree2 could estimate tree crown areas accurately, highlighting its potential and robustness for tree detection and delineation. In addition, the performance of tree crown detection varied among species.
These results indicate that the evaluated approaches can efficiently delineate individual tree crowns in high-resolution optical images and demonstrate the applicability of Detectree2; thus, they have the potential to offer transferable strategies that can be applied to other forest ecosystems.


Introduction
Accurate tree crown detection and delineation are critical for compiling precise forest inventories and enabling the timely detection of forest dynamics required by various conservation strategies [1][2][3]. Among the approaches attempted to date, remote sensing techniques provide reliable means of obtaining timely, accurate, and complete information and have been increasingly applied to tree crown detection and delineation [4][5][6][7][8]. However, previous studies on mapping tree crowns have generally relied on the manual delineation and visual interpretation of remote sensing imagery. This approach is laborious and time consuming and may, therefore, only be practical for small areas. Automatic tree detection and delineation from remote sensing imagery can help to overcome these limitations.
Automatic tree detection and delineation from remote sensing imagery have been attempted in recent years with methods ranging from relatively simple image processing to more complex machine learning and deep-learning-based approaches [9,10]. Among these, image processing methods may struggle to detect dense tree crowns against complex backgrounds, whereas machine-learning-based methods require individual features to be determined by hand [9,10]. In comparison, deep learning approaches can learn higher-level features directly from raw data through a training procedure rather than relying on human-designed features, offering a high level of flexibility [11].
Recent advances in deep-learning-based tree crown detection and delineation rely heavily on convolutional neural networks (CNNs) to segment images or enhance treetop detection [12][13][14][15][16]. CNN-based approaches have been reported to be more effective than, and capable of outperforming, other approaches [17,18]. DeepForest [19] and Detectree2 [20] are two recently developed, CNN-based deep learning models used for the detection and delineation of tree crowns. Specifically, DeepForest was developed and pre-trained using data from the National Ecological Observatory Network (NEON) with an unsupervised, LiDAR-based algorithm and hand annotations of airborne RGB imagery, and it detects tree crowns as bounding boxes [19,21]. Detectree2, on the other hand, was built on Mask R-CNN, an end-to-end, trainable convolutional neural network [22], to recognize the irregular edges of individual tree crowns in airborne RGB imagery [20]. Because the latter model detects the specific edges of tree crowns, it can also provide information on tree crown areas. These two models allow the automatic and accurate detection of tree crowns from readily accessible RGB imagery and have become representative tree detection tools. For instance, DeepForest has been widely applied to orchard trees and boreal forests [23][24][25][26], whereas Detectree2 has been applied primarily to tropical forests [20,27]. However, the use of these two methods has not yet been thoroughly investigated in temperate deciduous forests.
It is well known that the main limitation of CNNs is that they require a large training set [28]. Fortunately, trained CNN models are highly transferable: the parameters learned by a CNN, stored in a single file, can be used to initialize the training of a new CNN for a secondary task, a process termed transfer learning [28,29]. Transfer learning can, therefore, overcome the limitations of small datasets and facilitate the practical application of CNN techniques when less data are available [30,31]. The two aforementioned pre-trained models are reported to have the potential to offer a transferable means of tree detection and delineation [3,13]. However, no studies have so far tested whether these two methods can be transferred readily to deciduous forests, which are characterized by closed and structurally complex canopies, pronounced phenological changes, and complex species compositions.
In terms of base remote sensing data for tree detection and delineation, unmanned aerial vehicle (UAV) platforms can provide imagery with high spatial and temporal resolution at lower operational cost and with less complexity than other remote sensing platforms [32][33][34] and have, hence, been used extensively in precision forest management. Several studies have reported that tree crowns can be detected and delineated with promising accuracy from UAV-captured images [13,35-37], and red-green-blue (RGB) imagery in particular has increasingly enabled feasible, low-cost tree crown detection and delineation [38,39]. As a result, applying CNN-based deep learning models to UAV-acquired RGB imagery has emerged as a prompt and affordable way of detecting and delineating tree crowns [35,40-42]. For example, Chadwick et al. [39] investigated the potential of Mask R-CNN for automatically delineating individual tree crowns from UAV-generated RGB images in a conifer forest. More recently, Yu et al. [43] detected tree crowns using Mask R-CNN in a plantation forest (Chinese fir) with UAV-acquired RGB imagery. Unfortunately, to date, studies of tree crown detection and delineation from UAV-derived RGB imagery have largely been limited to single species or forests with uniform structures, such as coniferous forests [44][45][46]. To the best of our knowledge, relatively few studies have considered tree crown delineation and crown area estimation from UAV-generated RGB images in deciduous forests with diverse and complex structures. Furthermore, the influence of spatial resolution on tree crown detection accuracy in deciduous forests using deep-learning-based methods has rarely been investigated, although several previous studies have examined this in coniferous forests or plantations [35,36].
The primary purpose of this study is, thus, to identify effective, deep-learning-based tree detection and delineation approaches for UAV-based RGB imagery in a dense and diverse, temperate deciduous forest. More specifically, the objectives are to: (1) evaluate the potential of the representative DeepForest and Detectree2 models for tree crown detection and delineation in an alpine, temperate forest with complex topography and species composition; (2) explore their performance in extracting tree crown areas from RGB imagery with different spatial resolutions; and (3) reveal the effects of spatial resolution and canopy complexity on detection accuracy.

Study Area
This study was conducted at the Nakakawane site (138°06′ E, 35°04′ N), a temperate deciduous forest located in one of Shizuoka University's research forests in Japan (Figure 1). The climate of the area is a typical alpine, cold-temperate climate, with an average annual temperature of 16 °C and a mean annual precipitation of 2500 mm [47,48]. The forest is dominated by diverse deciduous species, such as Fagus crenata, Betula grossa, Carpinus tschonoskii, Stewartia monadelpha, Acer shirasawanum, Acer nipponicum, and Fraxinus lanuginosa.

Analysis Overview
To validate the accuracy of the crown segmentation algorithms, we prepared a polygon crown projection map for all trees in the canopy layer of the entire 1.5 ha study plot using the following procedure. First, we produced a georeferenced orthophoto of the study plot from UAV photographs acquired in September 2018 and used it as a base map on which the boundaries of all crowns were manually delineated in the field. We then digitized the field-corrected crown map and linked it with inventory data, including tree IDs, species names, and diameters at breast height. In total, 499 digitized tree crowns with corresponding, accurate inventory data were used in the subsequent analysis.
The evaluation pipeline for individual tree crown detection and delineation from UAV RGB images in the deciduous forest, including the main steps and analyses, is summarized in Figure 2. The workflow comprised three sections: preprocessing, model tuning, and evaluation. In brief, imagery of the whole study area was acquired and processed into orthophotos, and ground truth polygons manually annotated on the orthophotos were further checked in the field, resulting in multiple datasets for model training and evaluation. Next, two deep-learning-based methods were applied for crown detection and delineation: transfer learning was first used to fine-tune each pre-trained model, and both the pre-trained and transfer-trained models were then used to predict crown information. Finally, the influence of the spatial resolution of the UAV RGB imagery and of canopy complexity on tree crown detection and delineation was evaluated.

Image Acquisition and Preprocessing
The UAV-based imagery of the study area was acquired on 18 May and 25 May 2022 using a DJI Zenmuse P1 camera (DJI, Shenzhen, China) mounted on a DJI Matrice 300 RTK four-rotor aircraft (DJI, Shenzhen, China). The image sensor of the Zenmuse P1 provides 45 megapixels with an 8192 × 5490 image resolution. The flight patterns were programmed automatically in DJI Pilot to achieve an 85% forward overlap and an 80% side overlap at a flight height of 60 m above the take-off point. To ensure flight accuracy, a DJI D-RTK 2 (DJI, Shenzhen, China) high-precision GNSS mobile station for the Matrice 300 RTK was set up at a fixed point and used to obtain highly accurate horizontal and vertical location information. The imagery was acquired on sunny, cloudless days, and a total of 1010 images were collected. All images were processed in DJI Terra (DJI, Shenzhen, China) with high-quality parameters, generating two orthophotos of the study area with an original resolution of 0.007 m, termed the 0518 and 0525 datasets, respectively. The 0518 dataset was used as the training dataset for model tuning, and the 0525 dataset, together with versions resampled to resolutions of 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.20, 0.30, 0.40, and 0.50 m, was used for prediction and evaluation.
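The text does not state which resampling method was used to derive the 14 coarser versions of the 0525 orthophoto. As a minimal sketch, assuming nearest-neighbour resampling of a single band held as a nested list (a stand-in for a real raster library), the code below computes the output grid size for each target resolution and resamples a band. The function names are hypothetical.

```python
# Original ground sampling distance (m) and the target resolutions listed in the text.
SOURCE_RES = 0.007
TARGET_RES = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09,
              0.10, 0.20, 0.30, 0.40, 0.50]


def output_shape(rows, cols, src_res, dst_res):
    """Grid size (rows, cols) covering the same ground extent at a coarser pixel size."""
    scale = src_res / dst_res
    return max(1, round(rows * scale)), max(1, round(cols * scale))


def resample_nearest(band, src_res, dst_res):
    """Nearest-neighbour resampling of a 2D list (hypothetical helper)."""
    rows, cols = len(band), len(band[0])
    out_rows, out_cols = output_shape(rows, cols, src_res, dst_res)
    out = []
    for r in range(out_rows):
        # Map the centre of each output pixel back to the source pixel containing it.
        src_r = min(rows - 1, int((r + 0.5) * dst_res / src_res))
        row = []
        for c in range(out_cols):
            src_c = min(cols - 1, int((c + 0.5) * dst_res / src_res))
            row.append(band[src_r][src_c])
        out.append(row)
    return out


if __name__ == "__main__":
    # A 1000 x 1000 px tile at 0.007 m covers 7 m x 7 m on the ground.
    for res in TARGET_RES:
        print(res, output_shape(1000, 1000, SOURCE_RES, res))
```

Nearest neighbour keeps original pixel values; an area-averaging resampler would smooth crown texture more, and which choice better mimics a coarser sensor is an open assumption here.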

Tree Crown Detection and Delineation Using DeepForest and Detectree2
The two open-source, deep-learning-based methods, DeepForest and Detectree2, were used for individual tree crown detection and delineation. DeepForest is a Python package built on a deep learning neural network trained in a semi-supervised manner using data from the NEON Airborne Observation Platform [19]. DeepForest detects the locations of individual tree crowns in airborne RGB imagery and is easy to extend to different scenarios because it provides a pre-trained model on which users can conduct transfer training with local datasets. The training data used for transfer learning with the pre-trained model are shown in Figure 1a; the hyperparameters were not changed during transfer learning. Both the pre-trained and transfer-trained models were used to detect individual tree crowns and were evaluated at the different image resolutions.
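DeepForest reports each detection as a bounding box, typically as a table with columns named xmin, ymin, xmax, ymax, label, and score in image pixel coordinates. Because the same crown spans very different pixel counts at 0.007 m and 0.5 m, boxes have to be expressed in map units before they can be compared with the ground truth polygons. The sketch below assumes a north-up orthophoto with a known upper-left origin and square pixels; the dictionary layout and the helper names `box_to_map` and `box_area_m2` are hypothetical and are not part of DeepForest's API.

```python
def box_to_map(box, origin_x, origin_y, res):
    """Convert a pixel-space box to map coordinates (north-up raster assumed).

    box: dict with xmin, ymin, xmax, ymax in pixels (row index grows downward).
    origin_x, origin_y: map coordinates of the raster's upper-left corner.
    res: pixel size in metres.
    """
    return {
        "xmin": origin_x + box["xmin"] * res,
        "xmax": origin_x + box["xmax"] * res,
        # Pixel rows increase southward, so the y axis is flipped.
        "ymax": origin_y - box["ymin"] * res,
        "ymin": origin_y - box["ymax"] * res,
        "score": box.get("score"),
    }


def box_area_m2(map_box):
    """Ground area of a map-space box in square metres."""
    return (map_box["xmax"] - map_box["xmin"]) * (map_box["ymax"] - map_box["ymin"])
```

For example, a 200 × 200 px box covers 1.4 m × 1.4 m at 0.007 m but 100 m × 100 m at 0.5 m, which is why detections at each resolution are only comparable after this conversion.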
Detectree2 is built on Mask R-CNN, an extension of Faster R-CNN [49] that adds a branch for instance segmentation [22]. Mask R-CNN stands out among CNN architectures and has achieved excellent results relative to other architectures in instance segmentation tasks. As with DeepForest, both the pre-trained and transfer-trained models were used for tree crown delineation, and no hyperparameters were changed during transfer learning. In addition, DeepForest and Detectree2 return a confidence score (0-1) for each predicted bounding box and polygon, respectively.
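Because Detectree2 outputs crown outlines as polygons rather than boxes, a crown area can be computed directly from the polygon vertices. A minimal sketch, assuming vertices already in metric map coordinates and a simple (non-self-intersecting) ring, uses the shoelace formula; in practice a GIS library would usually do this, and the helper name `polygon_area` is hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    vertices: list of (x, y) tuples in metres; the ring may be open or closed.
    """
    if len(vertices) > 1 and vertices[0] == vertices[-1]:
        vertices = vertices[:-1]  # drop the repeated closing vertex
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    # The sign depends on vertex order (clockwise vs counter-clockwise).
    return abs(twice_area) / 2.0
```

A 10 m × 10 m square outline returns 100.0 m², whether or not its first vertex is repeated at the end.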
The prediction results for each method were further divided into a pre-trained group and a transfer-trained group, and results were obtained for all 15 resolutions. The locations of individual tree crowns predicted by the pre-trained and transfer-trained versions of each method were first evaluated and compared before the two models were compared with each other. Furthermore, each predicted crown was linked to the closest ground truth crown using a nearest-neighbour algorithm within a given radius, and a simple linear regression model was then applied to evaluate the tree crown areas.
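The search radius used in the study is not reported. The sketch below, assuming crown centroids in metres and a user-chosen radius, greedily pairs each predicted crown with its nearest unmatched ground truth crown. The one-to-one greedy rule and the helper name `match_nearest` are assumptions for illustration, not necessarily the authors' procedure.

```python
import math


def match_nearest(pred_centroids, truth_centroids, radius):
    """Greedy one-to-one nearest-neighbour matching within a search radius.

    Returns a list of (pred_index, truth_index, distance) tuples.
    """
    candidates = []
    for i, (px, py) in enumerate(pred_centroids):
        for j, (tx, ty) in enumerate(truth_centroids):
            d = math.hypot(px - tx, py - ty)
            if d <= radius:
                candidates.append((d, i, j))
    candidates.sort()  # closest pairs are considered first
    used_pred, used_truth, matches = set(), set(), []
    for d, i, j in candidates:
        # Each prediction and each ground truth crown is used at most once.
        if i not in used_pred and j not in used_truth:
            used_pred.add(i)
            used_truth.add(j)
            matches.append((i, j, d))
    return matches
```

The matched pairs give the (reference area, predicted area) samples on which a linear regression of crown areas can then be fitted.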
In addition, four dominant species with sufficient samples were chosen to further evaluate the species-specific performance of the transfer-trained DeepForest and Detectree2 models. We also investigated the effects of topography on tree crown detection using slope information calculated from a 5 m resolution digital elevation model (DEM) provided by the Geospatial Information Authority of Japan (GSI).
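The text does not say how slope was derived from the 5 m DEM. One simple option, sketched here as an assumption, is central differences on the elevation grid followed by conversion to degrees; GIS tools usually offer more robust kernels. The 5° class width mirrors the slope classes (e.g., 15-20° and 40-45°) used in the results. Both function names are hypothetical.

```python
import math


def slope_degrees(dem, cell_size=5.0):
    """Slope in degrees at interior cells of a DEM using central differences.

    dem: 2D list of elevations in metres; edge cells are returned as None.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            # Gradient magnitude is rise over run; atan converts it to an angle.
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


def slope_class(deg, width=5):
    """Label a slope value with its class, e.g. 17.3 -> '15-20'."""
    lower = int(deg // width) * width
    return f"{lower}-{lower + width}"
```

A plane rising 5 m per 5 m cell has a slope of 45°, which falls into the "45-50" class; each crown's confidence score could then be grouped by the class at its centroid.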

Accuracy Assessment
The tree crown detection accuracy of both models was evaluated with the following metrics. The intersection over union (IoU), defined as the area of the intersection between a predicted and a ground truth tree crown divided by the area of their union, was first used to assess the agreement between predicted and ground truth tree crowns [35]. Precision, recall, and F1 score [50] were then calculated at an IoU threshold of 0.5. Precision is the proportion of detected tree crowns that are correct, and recall is the proportion of tree crowns in the test set that are detected. The F1 score describes the overall accuracy, considering both precision and recall. These three metrics were calculated from the numbers of true positives (TP, a tree crown is correctly detected), false positives (FP, a tree crown is erroneously detected), and false negatives (FN, a tree crown is omitted). The equations for precision, recall, and F1 score are defined as:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP, FP, and FN represent true positives, false positives, and false negatives, respectively.
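To make the evaluation concrete, the sketch below computes IoU for axis-aligned boxes (the DeepForest case; Detectree2 polygons would need a polygon intersection routine instead) and derives precision, recall, and F1 from TP, FP, and FN counts, counting a prediction as a true positive when its IoU with an unclaimed ground truth crown is at least 0.5. The greedy matching rule and the function names are assumptions for illustration.

```python
def box_iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(preds, truths, threshold=0.5):
    """Greedy matching: each truth box can be claimed by at most one prediction.

    Predictions are processed in the given order; sorting them by confidence
    score first is a common choice.
    """
    claimed = set()
    tp = 0
    for p in preds:
        best_j, best_iou = None, threshold
        for j, t in enumerate(truths):
            iou = box_iou(p, t)
            if j not in claimed and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            claimed.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching crown
    fn = len(truths) - tp  # ground truth crowns that were missed
    return tp, fp, fn


def prf1(tp, fp, fn):
    """Precision, recall, and F1 score from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With two correct detections, one spurious box, and one missed crown, precision, recall, and F1 all equal 2/3.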
Remote Sens. 2023, 15, 778 6 of 15

In addition, the tree crown areas extracted with the deep-learning-based methods were compared with the ground truth tree crown areas. Linear regression analysis was employed to describe the relationships between them, which were evaluated using the widely used statistical criteria of the coefficient of determination (R²) and root-mean-square error (RMSE), calculated as:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

where $y_i$ and $\hat{y}_i$ represent the reference and estimated values, respectively, and $\bar{y}$ and $n$ indicate the average of the reference values and the number of samples, respectively.
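A minimal sketch of these two criteria, computed directly from their definitions for paired reference and estimated crown areas, together with an ordinary least-squares line fit of the kind used in a simple linear regression. The function names are hypothetical and the sample values in the note below are made up purely for illustration.

```python
import math


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root-mean-square error in the units of y (here m^2)."""
    n = len(y)
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n)


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b
```

For reference areas [10, 20, 30, 40] m² and estimates [12, 18, 33, 37] m², the residual sum of squares is 26 and the total sum of squares is 500, giving R² = 0.948 and RMSE = √6.5 ≈ 2.55 m².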

Tree Crown Detection Using DeepForest and Detectree2: Pre-Trained vs. Transfer-Trained
The detailed assessment results for tree crown detection and delineation with the DeepForest method on UAV-based RGB imagery are presented in Figure 3. The precision, recall, and F1 score of the pre-trained DeepForest model were very low, with values of 0.18, 0.28, and 0.22, respectively. In comparison, the transfer-trained DeepForest model achieved substantially higher accuracy, with a precision of 0.59 and a recall of 0.46; its F1 score of 0.52 was therefore also higher than that of the pre-trained model. Figure 4 shows the tree crown detection accuracy of the Detectree2 method in terms of precision, recall, and F1 score. Specifically, the pre-trained Detectree2 model yielded a precision of 0.71, a recall of 0.42, and an F1 score of 0.53. The transfer-trained Detectree2 model had a higher recall (0.50) and F1 score (0.57) than the pre-trained model, although its precision (0.66) was slightly lower.
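As an arithmetic check of internal consistency (not part of the original analysis), applying F1 = 2PR/(P + R) to the rounded transfer-trained precision and recall values reproduces the reported F1 scores:

$$F1_{\mathrm{DeepForest}} = \frac{2 \times 0.59 \times 0.46}{0.59 + 0.46} = \frac{0.5428}{1.05} \approx 0.52, \qquad F1_{\mathrm{Detectree2}} = \frac{2 \times 0.66 \times 0.50}{0.66 + 0.50} = \frac{0.66}{1.16} \approx 0.57$$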

Accuracies of Tree Crown Detection Using Images with Different Spatial Resolutions
The effects of image spatial resolution on the tree crown detection accuracy of both the pre-trained and transfer-trained DeepForest models on UAV RGB imagery are illustrated in Figure 5. The precision, recall, and F1 score of the pre-trained DeepForest model were low at resolutions of 0.007 and 0.01 m, increased at 0.02 m, and then varied only slightly between 0.02 and 0.1 m. From 0.1 to 0.5 m, however, the precision, recall, and F1 score declined rapidly and continuously. For the transfer-trained DeepForest model, the precision ranged from 0.49 to 0.61 across resolutions of 0.007 to 0.5 m, with the highest and lowest precision at 0.01 and 0.3 m, respectively. The corresponding recall and F1 score of the transfer-trained DeepForest model decreased to some extent when the resolution exceeded 0.05 m.
The accuracy (precision, recall, and F1 score) of both the pre-trained and transfer-trained Detectree2 models for tree crown detection at resolutions from 0.007 to 0.5 m is shown in Figure 6. The precision, recall, and F1 score of the pre-trained Detectree2 model followed similar trends: they varied to different extents between resolutions of 0.007 and 0.1 m (precision: 0.67-0.71; recall: 0.39-0.45; F1 score: 0.49-0.55) and then decreased continuously between 0.1 and 0.5 m. Moreover, the precision, recall, and F1 score were highest at resolutions of 0.007 to 0.05 m, followed by 0.06 to 0.1 m and then 0.2 to 0.5 m. For the transfer-trained Detectree2 model, the precision ranged from 0.62 to 0.66 at resolutions of 0.007 to 0.08 m, with the recall ranging from 0.49 to 0.52 and the F1 score from 0.55 to 0.58. The precision then decreased continuously at resolutions of 0.09 to 0.5 m, and the recall and F1 score likewise declined substantially over this range. In addition, the recall and F1 score of the transfer-trained Detectree2 model were higher than those of the pre-trained model at fine resolutions.
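One way to see why accuracy drops at coarse resolutions is to count how many pixels a crown occupies. The sketch below is an illustration rather than an analysis from the study: it divides crown area by pixel area for the smallest (6.92 m²) and largest (174.52 m²) reference crowns reported in the crown-area results. The function name is hypothetical.

```python
RESOLUTIONS_M = [0.007, 0.01, 0.05, 0.1, 0.2, 0.5]


def pixels_per_crown(area_m2, res_m):
    """Approximate number of square pixels covering a crown of the given area."""
    return area_m2 / (res_m ** 2)


for res in RESOLUTIONS_M:
    small = pixels_per_crown(6.92, res)
    large = pixels_per_crown(174.52, res)
    print(f"{res:>5} m: {small:10.1f} px (6.92 m^2), {large:10.1f} px (174.52 m^2)")
```

At 0.5 m the smallest reference crown covers fewer than 30 pixels, versus roughly 690 at 0.1 m, which leaves little edge detail for either model to work with.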

Estimation of Tree Crown Area Using Detectree2
In addition, the tree crown areas estimated with the pre-trained and transfer-trained Detectree2 models were evaluated against the reference tree crown areas measured during the field survey (Figure 7). The tree crown areas ranged from 6.92 to 174.52 m² for the reference crowns, from 12.18 to 184.40 m² for the crowns extracted by pre-trained Detectree2, and from 6.61 to 150.75 m² for the crowns extracted by transfer-trained Detectree2. The relationship between the reference and pre-trained Detectree2 crown areas yielded an R² of 0.68 and an RMSE of 6.64 m² (Figure 7a). The transfer-trained Detectree2 model performed somewhat better, with an R² of 0.71 and an RMSE of 4.75 m² (Figure 7b).

Performance of Both Models for Detecting Crown in Terms of Different Species and Topography
The detection accuracies obtained with the transfer-trained DeepForest and D tree2 methods for the crown detection of different tree species were investigated (Fi 9).A total of four tree species with enough samples were considered, namely, Acer ponicum, Acer shirasawanum, Betula grossa, and Fraxinus lanuginose.
The accuracy varied dramatically among these species, irrespective of whethe transfer-trained DeepForest or Detectree2 method was used.For the transfer-tra DeepForest method, A. nipponicum exhibited the highest overall accuracy (precision = recall = 0.43, F1 score = 0.41), followed by B grossa (precision = 0.48, recall = 0.40, F1 s = 0.38), while A. shirasawanum and F. lanuginose yielded poor accuracies with an F1 s of less than 0.30.Nevertheless, with the transfer-trained Detectree2 method, A. s sawanum had the best prediction, with an F1 score of 0.51 (precision = 0.52, recall = 0 The accuracy of the tree crown area estimation using the pre-trained and transfer-trained Detectree2 methods varied with different resolutions (Figure 8).The R 2 computed between the measured and predicted crown areas ranged from 0.47 to 0.76 (RMSE: 4.91-14.27m 2 ) for pre-trained Detectree2 and from 0.19 to 0.76 (RMSE: 3.26-15.07m 2 ) for transfer-trained Detectree2 with resolutions of 0.001 to 0.5 m.Higher R 2 values were observed at resolutions of 0.01 to 0.1 m, along with lower RMSE values, for the tree crown area estimation.

Performance of Both Models for Detecting Crown in Terms of Different Species and Topography
The detection accuracies obtained with the transfer-trained DeepForest and D tree2 methods for the crown detection of different tree species were investigated (Fi 9).A total of four tree species with enough samples were considered, namely, Acer

Performance of Both Models for Detecting Crown in Terms of Different Species and Topography
The detection accuracies obtained with the transfer-trained DeepForest and Detectree2 methods for the crown detection of different tree species were investigated (Figure 9). Four tree species with sufficient samples were considered, namely, Acer nipponicum, Acer shirasawanum, Betula grossa, and Fraxinus lanuginosa. The accuracy varied dramatically among these species, irrespective of whether the transfer-trained DeepForest or Detectree2 method was used. For the transfer-trained DeepForest method, A. nipponicum exhibited the highest overall accuracy (precision = 0.58, recall = 0.43, F1 score = 0.41), followed by B. grossa (precision = 0.48, recall = 0.40, F1 score = 0.38), while A. shirasawanum and F. lanuginosa yielded poor accuracies, with F1 scores below 0.30. In contrast, with the transfer-trained Detectree2 method, A. shirasawanum was predicted best, with an F1 score of 0.51 (precision = 0.52, recall = 0.50), while A. nipponicum was predicted worst (precision = 0.07, recall = 0.25, F1 score = 0.11). B. grossa and F. lanuginosa were moderately well delineated, both with an F1 score of 0.40.
In addition, we investigated the confidence scores of the transfer-trained DeepForest and Detectree2 methods for different slopes (Figure 10). The mean confidence scores of the transfer-trained DeepForest method ranged from 0.40 to 0.56, with standard deviations (sd) of 0.14 to 0.23. In comparison, the confidence scores of the transfer-trained Detectree2 method were much higher, with mean values exceeding 0.67 and relatively lower values for slopes of 15-20° and 40-45°.

Performance of DeepForest and Detectree2 for Detecting Tree Crowns in Deciduous Forests with Complex Species Compositions and Topographical Conditions
This study focused on a full evaluation and comparison of the application and transferability of two commonly used, deep-learning-based CNN tree crown detection and delineation approaches in a dense and diverse deciduous forest using very-high-resolution, UAV-derived imagery. Our results demonstrated that the DeepForest and Detectree2 methods can be successfully transferred to deciduous forests for the detection of tree crowns from UAV-based RGB images, with precisions of 0.59 (recall: 0.46; F1 score: 0.52) and 0.66 (recall: 0.50; F1 score: 0.57), respectively. The accuracy of these two transferred models was somewhat lower than that reported for crown detection by Fromm et al. [35] and Chadwick et al. [39], who studied coniferous forest areas using UAV-derived RGB images and obtained precisions greater than 0.80. Generally speaking, heterogeneous forest conditions, for example, those involving diverse species and tree shapes, have a negative influence on tree crown detection, as reported in previous studies [38,51]. Nevertheless, the results obtained here were better than those reported for the detection of other broadleaf species [38], indicating that these two transfer-trained methods can detect tree crowns in temperate deciduous forests automatically and with reasonable accuracy.
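To make the accuracy figures above concrete, the following is a minimal Python sketch of how precision, recall, and F1 score are commonly computed for box-based crown detection. It assumes axis-aligned boxes given as (xmin, ymin, xmax, ymax), greedy one-to-one matching of predictions (sorted by descending confidence) to ground-truth crowns, and an IoU threshold of 0.5. The threshold and the function names (`iou`, `f1_score`, `precision_recall_f1`) are illustrative assumptions, not necessarily the matching procedure used in this study or in DeepForest's own evaluation routine.

```python
from typing import List, Tuple

# A box is (xmin, ymin, xmax, ymax) in pixel or map coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(preds: List[Box], truths: List[Box], thr: float = 0.5):
    """Greedy one-to-one matching; `preds` is assumed to be sorted by
    descending confidence. A prediction is a true positive if its best
    still-unmatched ground-truth crown has IoU >= thr (assumed 0.5)."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j not in matched:
                v = iou(p, t)
                if v > best_iou:
                    best_j, best_iou = j, v
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy example (not study data): two hits and one false positive.
preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
truths = [(1, 1, 11, 11), (20, 20, 30, 30)]
print(precision_recall_f1(preds, truths))
```

As a quick check of the reported values, `f1_score(0.59, 0.46)` is about 0.52 and `f1_score(0.66, 0.50)` is about 0.57, consistent with the overall F1 scores of DeepForest and Detectree2, since F1 is the harmonic mean of precision and recall.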
The results of this study further demonstrated that Detectree2 recognizes tree crowns better than DeepForest, suggesting a strong generalization ability for tree crown detection and delineation. Detectree2 is built on Mask R-CNN, which performs instance segmentation by combining object detection with a segmentation mask for each detected object [22]. Previous studies have shown that Mask R-CNN is a state-of-the-art model among CNN architectures, and excellent performance for the detection of tree crowns has recently been reported [13,16]; our results agree well with those studies.
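One way to see why per-crown masks, as produced by Mask R-CNN-based tools such as Detectree2, can help separate adjacent irregular crowns is to compare mask overlap with bounding-box overlap. The NumPy sketch below uses two hypothetical toy crowns (not study data): an L-shaped crown and a square neighbour tucked into its concave corner. Their masks do not overlap at all, yet their bounding boxes do. The helper names `mask_iou` and `mask_to_box` are illustrative.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boolean crown masks on the same pixel grid."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 0.0


def mask_to_box(m: np.ndarray) -> np.ndarray:
    """Fill the axis-aligned bounding box of a non-empty boolean mask."""
    rows = np.flatnonzero(m.any(axis=1))
    cols = np.flatnonzero(m.any(axis=0))
    box = np.zeros_like(m, dtype=bool)
    box[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1] = True
    return box


# Toy crowns (not study data): an L-shaped crown A and a square crown B
# that sits in A's concave corner without touching it.
a = np.zeros((8, 8), dtype=bool)
a[0:6, 0:2] = True   # vertical arm of the L
a[4:6, 0:6] = True   # horizontal arm of the L
b = np.zeros((8, 8), dtype=bool)
b[0:4, 2:6] = True

print(mask_iou(a, b))                            # 0.0: the masks are disjoint
print(mask_iou(mask_to_box(a), mask_to_box(b)))  # about 0.44: the boxes overlap
```

With box-only output, such a pair can look like heavily overlapping detections, whereas the masks keep the two crowns distinct.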
The performance of tree crown detection differed among tree species, which may be attributable to their distinctive crown shapes. As reported in previous studies [51,52], the accuracy of tree crown detection depends on tree crown shape. Acer shirasawanum, which generally has spreading crowns, had the highest overall accuracy (F1 score = 0.51) with the transfer-trained Detectree2 method, indicating the potential of Detectree2 for detecting broad tree crowns. However, this model predicted Acer nipponicum poorly, with an extremely low accuracy (F1 score = 0.11). These two species belong to the same genus but have different morphological characteristics [53,54], such as diameter at breast height, which may have influenced the detection accuracy of their crowns from UAV RGB imagery. Furthermore, Budianti et al. [53] found that the phenological transition dates of these two species differ, and such phenological differences may also have affected the accuracy of their crown detection.
As expected, we found that topographic characteristics affect the detection accuracy of tree crowns, in line with the observations of Khosravipour et al. [55] and Nie et al. [56], who carried out treetop detection using canopy height models derived from LiDAR. Alexander et al. [57] also found that topography influences tree detection and height estimation from LiDAR canopy height models in tropical forests. However, no general rule for the effect of slope on tree crown detection accuracy could be established in this study, and further studies are required to determine this influence.
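The slope analysis summarized in Figure 10 amounts to grouping detections by slope class and summarizing their confidence scores. The sketch below assumes that each detection already carries a slope value in degrees (for example, sampled from a terrain model at the crown centroid, which is a hypothetical preprocessing step here) and uses 5° classes. It reports the sample standard deviation (n - 1 denominator), which may differ from the convention used in this study. The function name `confidence_by_slope` and the input values are illustrative.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple


def confidence_by_slope(
    records: Iterable[Tuple[float, float]], bin_width: float = 5.0
) -> Dict[Tuple[float, float], Tuple[float, float, int]]:
    """Group (slope_deg, confidence) pairs into slope classes of `bin_width`
    degrees and return {(lower, upper): (mean, sd, n)} per class.
    Uses the sample standard deviation; a class with one value gets sd = 0."""
    groups = defaultdict(list)
    for slope, conf in records:
        lower = (slope // bin_width) * bin_width
        groups[lower].append(conf)
    summary = {}
    for lower in sorted(groups):
        vals = groups[lower]
        sd = stdev(vals) if len(vals) > 1 else 0.0
        summary[(lower, lower + bin_width)] = (mean(vals), sd, len(vals))
    return summary


# Hypothetical detections: (slope in degrees, confidence score).
detections = [(16.0, 0.6), (18.5, 0.8), (41.0, 0.5), (33.0, 0.9)]
for (lo, hi), (m, sd, n) in confidence_by_slope(detections).items():
    print(f"{lo:.0f}-{hi:.0f} deg: mean={m:.2f}, sd={sd:.2f}, n={n}")
```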

Effects of the Spatial Resolutions of UAV Images on Tree Crown Detection
The results obtained in this study suggest that image spatial resolution has an obvious influence on tree crown detection and delineation from UAV-acquired RGB imagery when using deep-learning-based methods. The Detectree2 method, which performed best, achieved noticeably higher accuracy at resolutions between 0.007 and 0.1 m than at resolutions coarser than 0.1 m, and there was no significant difference in accuracy between resolutions of 0.007 and 0.1 m. This implies that the Detectree2 method has good predictive ability for tree crown detection when the image resolution is high. These results are consistent with previous studies showing that a higher spatial resolution generally improves the detection accuracy of CNN-based models. For example, Fromm et al. [35] reported that an image resolution of 0.3 cm yielded the highest average precision (0.81) for the detection of conifer seedlings compared with resolutions of 1.5, 2.7, and 6.3 cm.
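Resolution experiments of this kind are often set up by aggregating the native orthomosaic to coarser grids. The NumPy sketch below shows one simple way to do this, averaging non-overlapping blocks by an integer factor; it is an assumption for illustration, not necessarily how the coarser images in this study were produced. For example, a factor of 14 turns a 0.007 m grid into roughly a 0.098 m grid. Non-integer ratios, or georeferenced rasters, are better handled with a GIS resampler such as GDAL's `gdalwarp` with the `-tr` and `-r average` options. The function name `coarsen` and the random test tile are hypothetical.

```python
import numpy as np


def coarsen(image: np.ndarray, factor: int) -> np.ndarray:
    """Simulate a coarser ground sampling distance by averaging
    non-overlapping factor x factor pixel blocks of an (H, W, C) image.
    Rows/columns that do not fill a complete block are dropped."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    h, w, c = image.shape
    h2, w2 = h // factor, w // factor
    trimmed = image[: h2 * factor, : w2 * factor].astype(np.float64)
    return trimmed.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))


native_gsd = 0.007  # metres per pixel (illustrative value)
tile = np.random.default_rng(0).integers(0, 256, size=(700, 700, 3), dtype=np.uint8)
for factor in (1, 5, 14, 30):
    out = coarsen(tile, factor)
    print(f"factor {factor:>2}: {out.shape[0]}x{out.shape[1]} px, ~{native_gsd * factor:.3f} m")
```

Block averaging mimics a sensor integrating light over a larger footprint; nearest-neighbour decimation would instead keep sharp but aliased crown edges.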
In addition, the accuracy of the Detectree2 method declined when the resolution was coarser than 0.1 m, resulting in poor predictive performance, which implies that the coarse spatial resolution of the image impaired tree crown detection. Yin and Wang [58] suggested that a 0.25 m resolution was optimal for the detection of individual mangrove crowns from UAV-based LiDAR data using the seeded region growing (SRG) and marker-controlled watershed segmentation (MCWS) algorithms, compared with resolutions of 0.10, 0.50, and 1 m. Furthermore, Miraki et al. [36] found that, among resolutions ranging from 5 to 140 cm, the highest overall accuracy for the delineation of individual tree crowns using region growing (RG) and inverse watershed segmentation (IWS) was achieved at 100 cm. The differences between these studies may be attributable to the data sources and predictive methods employed.

Estimation of Tree Crown Areas
As for tree crown area determination, Dong et al. [59] estimated tree canopy areas with R² values of 0.87 and 0.81 for apple and pear trees, respectively, using image-processing-based algorithms applied to high-resolution UAV RGB images of an orchard. Mu et al. [60] also obtained very good results for tree crown area estimation using UAV RGB imagery of peach trees. Nevertheless, these studies were conducted on specific species in structurally simple orchards using image processing techniques. In contrast, the best-performing Detectree2 method delineates irregular tree crown shapes and can, thus, distinguish between adjacent tree crowns, which gives it potential for extracting tree crown areas. Our results indicate that tree crown areas could be estimated with both the pre-trained and transfer-trained Detectree2 methods, with R² values of 0.68 and 0.71, respectively. These results were inferior to those of Braga et al. [12], who reported an R² of 0.93 between tree crown areas extracted from Mask R-CNN delineations and an evaluation set, based on high-resolution satellite images of tropical forests. Even so, this study achieved promising results in a deciduous forest, again indicating the robustness of Mask R-CNN-based methods for estimating tree crown areas. Furthermore, the transfer-trained Detectree2 method performed better than the pre-trained one for the extraction of tree crown areas, indicating its potential for estimating tree canopy areas in temperate deciduous forests. The image resolution also affected the accuracy of crown area estimation, in particular when the resolution was coarser than 0.1 m.
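Crown-area agreement of the kind reported above (and in Figures 7 and 8) can be computed from delineated polygons in two steps: polygon area, then R² and RMSE against field measurements. The sketch below assumes each predicted crown is available as a list of vertices in a projected coordinate system in metres (a hypothetical export format) and uses the shoelace formula for area. Here R² is computed as 1 - SS_res/SS_tot against the measured areas, i.e., relative to the 1:1 line. A squared Pearson correlation from a fitted regression can give a different value, and this section does not state which definition the study used. All data values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def polygon_area(vertices: Sequence[Tuple[float, float]]) -> float:
    """Shoelace formula for a simple (non-self-intersecting) polygon.
    With projected coordinates in metres, the result is in square metres."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def r2_rmse(measured: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """R^2 = 1 - SS_res / SS_tot against the measured values, and RMSE."""
    n = len(measured)
    mean_m = sum(measured) / n
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical crown polygons (metres) and field-measured areas (m^2).
predicted_polys = [
    [(0, 0), (4, 0), (4, 3), (0, 3)],
    [(0, 0), (6, 0), (6, 5), (0, 5)],
    [(0, 0), (3, 0), (0, 4)],
]
measured_areas = [11.0, 32.0, 7.0]
predicted_areas = [polygon_area(p) for p in predicted_polys]
print(predicted_areas)
print(r2_rmse(measured_areas, predicted_areas))
```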

Limitations and Perspectives
This study addressed the automatic detection and delineation of tree crowns in a temperate deciduous forest with a dense, closed canopy of interlocking crowns. Previous studies have shown that detecting and delineating tree crowns in a closed canopy may introduce uncertainty and errors in predictive ability compared with areas containing isolated trees or uniformly planted and distributed trees [16]. This study also showed that image resolution has an important influence on the accuracy of tree crown detection and delineation using deep-learning-based methods. Moreover, we suggest that indistinct tree crown edges can decrease a method's prediction accuracy.
To improve the estimation accuracy of deep-learning-based methods, future studies should, on the one hand, take full advantage of the information contained in high-resolution UAV imagery, such as texture. On the other hand, this study was conducted in a temperate deciduous forest with obvious phenological signals; the phenological variability of individual and/or adjacent trees should therefore be exploited, as it could increase the detection and delineation accuracy of deep-learning-based methods applied to UAV-acquired RGB imagery. Future research in this direction could improve individual tree crown delineation from high-resolution remote sensing imagery.

Conclusions
The evaluation of deep-learning-based methods for the automatic detection and delineation of tree crowns using UAV-based RGB imagery in an alpine, temperate deciduous forest indicated that transfer training of pre-trained, deep-learning-based models on UAV RGB imagery improved the detection results, and that the transfer-trained Detectree2 method was more suitable and robust for automatically delineating individual tree crowns in temperate deciduous forests. This method exhibited relatively good and stable performance for tree crown detection and crown area estimation at fine resolutions. This study confirmed that deep-learning-based methods can be a powerful tool for tree crown detection and can serve as a foundation for the automated monitoring of forest ecosystems when high-resolution UAV images are available.

Figure 1. The location of the study area: (a) the base data used for transfer training; (b) the location of the Nakakawane site, Japan.

Figure 2. Flowchart of the main steps and analysis for the evaluation of deep-learning-based methods for tree crown detection and delineation.

Figure 3. Tree crown detection by the pre-trained and transfer-trained DeepForest methods from UAV-based RGB imagery in the studied temperate deciduous forest. The green and orange bounding boxes represent the predicted and ground-truth tree crowns, respectively.

Figure 4. Tree crown detection by the pre-trained and transfer-trained Detectree2 methods from UAV-based RGB imagery in the studied temperate deciduous forest. The green and orange outlines indicate the predicted and ground-truth tree crowns, respectively.

Figure 5. The effects of spatial resolution on the evaluation of tree crown detection from UAV-based RGB imagery using the pre-trained and transfer-trained DeepForest methods, as illustrated by the precision (a), recall (b), and F1 score (c).

Remote Sens. 2023, 15, 778

Figure 6. The effects of spatial resolution on the evaluation of tree crown detection from UAV-based RGB imagery using the pre-trained and transfer-trained Detectree2 methods, as illustrated by the precision (a), recall (b), and F1 score (c).

Figure 7. Relationships between the measured tree crown areas and the predicted tree crown areas from the pre-trained (a) and transfer-trained (b) Detectree2 methods. The gray solid line represents the 1:1 line.

Figure 8. The R² (coefficient of determination) (a) and RMSE (root-mean-square error) (b) of the tree crown area estimation using the pre-trained and transfer-trained Detectree2 methods.

Figure 9. The specific accuracy of tree crown detection for different tree species using the transfer-trained DeepForest (a) and Detectree2 (b) methods. AN, AS, BG, and FL represent Acer nipponicum, Acer shirasawanum, Betula grossa, and Fraxinus lanuginosa, respectively.

Figure 10. Confidence scores for different slopes using the transfer-trained DeepForest (a) and Detectree2 (b) methods.