An Operational Workflow of Deciduous-Dominated Forest Species Classification: Crown Delineation, Gap Elimination, and Object-Based Classification †

Recent advances in remote sensing technology provide sufficient spatial detail to achieve species-level classification over large vegetative ecosystems. In deciduous-dominated forests, however, as tree species diversity and forest structural diversity increase, the frequency of spectral overlap between species also increases and our ability to classify tree species decreases significantly. This study proposes an operational workflow of individual tree-based species classification for a temperate, mixed deciduous forest using three-seasonal WorldView images, involving three steps: individual tree crown (ITC) delineation, non-forest gap elimination, and object-based classification. The process of species classification started with ITC delineation using the spectral angle segmentation algorithm, followed by object-based random forest classifications. A total of 672 trees were located along three triangular transects for training and validation. For single-season images, the late-spring, mid-summer, and early-fall images achieved overall accuracies (OA) of 0.46, 0.42, and 0.35, respectively. Combining the spectral information of the early-spring, mid-summer, and early-fall images increased the overall accuracy of classification to 0.79. However, further adding the late-fall image to separate deciduous and coniferous trees as an extra step was not successful. Compared to traditional four-band (Blue, Green, Red, Near-Infrared) images, the four additional bands of WorldView images (i.e., Coastal, Yellow, Red Edge, and Near-Infrared2) contributed greatly to the species classification (OA: 0.79 vs. 0.53). This study provides insight into the contribution of the additional spectral bands and multi-seasonal images to distinguishing species with seemingly high degrees of spectral overlap.

automated individual-tree segmentation and classification tools is paramount. While other remote sensing products like LiDAR can help with structural attributes (i.e., height, volumes, and densities) of forest inventories, individual tree species remains one of the most important forest attributes for tactical forest management [1][2][3]. Accurate information on the species identity of individual trees, or groups of trees, is critical for sustainable forest practices from both the economic and ecological perspectives [4]. Knowing what species are present is important in determining which forest products can be recovered from a given stand. Knowing how those trees are distributed within a stand can inform forest managers about the type of management to employ, which can in turn have significant impacts on profitability [2]. Beyond the economic considerations, tree species maps provide information about tree species diversity [4]. Whether the goal is to manage the forest for economic gain, biodiversity needs, or both, the ability to automatically classify individual tree species in a forest ecosystem is extremely valuable.
In very high spatial resolution images, a tree crown may contain many pixels with similar spectral features. Because of the increasing availability of low cost high spatial resolution optical imagery, individual tree crown (ITC) analysis has become increasingly feasible for tree species classification studies and forest management [5,6], and many ITC algorithms have been developed. As reviewed by Ke and Quackenbush [7], most of the algorithms for ITC delineation can be categorized into three main methods: (1) valley following methods delineating tree crowns by following local minima in the shaded regions between crowns; (2) region growing methods first identifying local maxima and then expanding the corresponding polygon to the crown boundary; (3) watershed segmentation methods viewing pixel values as the topographic relief and finding the lines that run along the tops of ridges. The entire image can be divided into a series of catchments (i.e., ITCs) as influential zones for each local minimum. Compared to the other two methods, watershed segmentation is more sensitive to the edge information of the tree crown [8]. Under this context, watershed segmentation is believed to have the capacity to detect indistinct edges of deciduous crowns [1,7] and has therefore been increasingly used to delineate deciduous species [8][9][10].
To map tree species from ITCs, many studies have demonstrated that species-specific differences in crown structure can be effectively captured using the average pixel values within an ITC, despite within-crown spectral variation. For example, for a conical coniferous crown, the sunlit pixels usually appear in the region exposed to the sun while the other pixels appear relatively dark. For a large and flat deciduous crown, large within-crown brightness variation is usually attributed to the non-conical crown shape [7]. In addition to the average pixel value, ITC analysis also commonly employs textural, contextual, and shape/geometric features for species classification [11,12]. As a result, higher accuracies have almost always been reported when comparing individual tree-based species classification to traditional pixel-based classification [13][14][15][16][17]. However, the accuracy of individual tree-based species classification depends strongly upon (1) the quality of automated ITC delineation (i.e., the goodness-of-fit between segmented and real ITCs) which, despite its rapid development, still requires further research, particularly in deciduous stands [8,18,19]; and (2) the selection of appropriate features for classifying ITCs into different species. Random Forest (RF), a powerful machine learning classification tool, was found to be able to identify the best performing features and build a robust classification tree quickly [16,20]. Compared to traditional classifiers (e.g., the maximum likelihood classifier), RF is a non-parametric classifier and is widely used in the forestry community [21][22][23].
Recently launched satellites, such as WorldView-2 and WorldView-3, offer very high spatial resolution and more spectral bands than traditional four-band (i.e., Blue, Green, Red, and Near-Infrared (Near-IR) bands) imaging sensors. Both WorldView-2 and WorldView-3 have vegetation-tailored spectral bands (i.e., Yellow and Red Edge), which are more sensitive to subtle differences in biochemistry between plant species [24,25]. These bands have proven useful for separating tree species, especially when there is substantial spectral overlap in the traditional four bands [16,26,27]. In addition to using more spectral bands for tree species identification, the use of seasonal time series imagery has also proven useful [28][29][30][31]. Key et al. [28] found that classification using four dates of four-band aerial images yielded the best accuracy when identifying four deciduous species: poplar, red oak, white oak, and maple. Hill et al. [29] indicated that a springtime image acquired in late April would best capture inter-species differences in the green-up phase, but image acquisition time was not always optimal due to many logistical considerations. To overcome this limitation, the authors demonstrated that a combination of early-spring, mid-summer, and late-fall images achieved better accuracy in classifying six overstorey species than any single-seasonal image, even though the timing of image collection was suboptimal [29]. Pipkins et al. [30] utilized ten phenological seasons of RapidEye images to identify seven coniferous and deciduous species and suggested that the first-spring, full-spring, and early-summer images played the most important roles in accuracy improvement due to individual species differences in tree phenology. Their results were in agreement with the previous studies, suggesting that a multi-seasonal approach to tree species classification has the potential to achieve better species classification results.
Despite the promise of increased accuracy, acquiring multi-seasonal high spatial resolution images over large areas is usually neither feasible nor cost-effective. Consequently, few studies have assessed individual tree species discrimination using multi-seasonal high spatial resolution images with more than the four traditional bands. In this study, we propose an operational workflow of individual tree-based species classification using three-seasonal WorldView images, involving three steps: ITC delineation, non-forest gap elimination, and object-based classification. Specifically, we aim to answer the following questions: (1) What level of classification accuracy can be achieved by individual images or by a combination of multi-seasonal images? (2) How does each of the four new multispectral bands in WorldView images (i.e., Coastal, Yellow, Red Edge, and Near-IR2) contribute to tree species classification? (3) How does the quality of ITC delineation impact the accuracy of species classification at the individual tree level?

Study Area
foliage spectra can potentially enhance inter-species separability for tree species identification due to differences in tree phenology.

Field Data for Training and Test Classifications
During the summer of 2015, individual trees of seven dominant species in Haliburton Forest were located along three triangular transects (i.e., the green, yellow, and blue triangles with side lengths of 400 m in Figure 1) using a highly accurate Trimble GeoExplorer (Trimble Navigation Limited, Sunnyvale, CA, USA). The triangular transect sampling method is a standard cruise method in forestry [35]. This sampling method is very efficient because the triangular shape makes it possible not only to cover a broad topographic range but also to start and end at the same point, avoiding wasted effort and truck logistics. Trees along the transects and their surroundings (~50 m) were recorded if they exhibited a healthy crown and were in dominant or co-dominant crown positions visible to the sensor. From a reference perspective, the triangular transect sampling method is also beneficial because once the major trees are identified at the vertices, the positional accuracy of a tree point can be determined based on its deviation from the edges of the triangle.
A total of 672 individual trees were located: sugar maple (Acer saccharum, abbreviated as Mh hereafter: 276), hemlock (Tsuga canadensis, abbreviated as He hereafter: 95), beech (Fagus grandifolia, abbreviated as Be hereafter: 94), yellow birch (Betula alleghaniensis, abbreviated as By hereafter: 89), red maple (Acer rubrum, abbreviated as Mr hereafter: 64), balsam fir (Abies balsamea, abbreviated as Bf hereafter: 28), and red oak (Quercus rubra, abbreviated as Or hereafter: 26). The abbreviations of the tree species follow the codes used in the Ontario Land Survey Data, and these codes are commonly used by foresters in Ontario, Canada. The composition of species across the three transects, and in the forest in general, is dominated by sugar maple, so the less common species were sampled preferentially and opportunistically in an attempt to achieve a more balanced sample size distribution across the seven species. Table 1 summarizes the ground-measured individual tree species in the three transects. It should also be noted that although these seven dominant species represent the majority of species found at our field site, several other species are present in the study area, including white pine, white ash, eastern white cedar, trembling aspen, white spruce, and black cherry. To ensure that the multi-seasonal images aligned with each other and matched the GPS-measured trees well, we recorded 10 road intersections (Figure 1) around the three transects as ground control points, and then implemented geometric correction for the multi-seasonal images. Geometric correction was completed in ENVI 5.3 (Harris Geospatial Solutions, Broomfield, Colorado, United States) with a third-order polynomial transformation and resulted in a root mean square error (RMSE) of better than 0.5 pixel, representing an error of approximately 25 cm or less on the earth's surface.
To expedite the image processing, three 25-ha square subsets were clipped from the multi-seasonal images to cover the area of field measurement, hereafter named Sites 1, 2, and 3, respectively (Figure 1). The late-spring, mid-summer, early-fall, and late-fall WorldView images of Site 1 are shown in Figure 2. The spatial distribution of ground-measured individual tree species along the three transects is shown in Figure 3. The presence and frequency of each species differed from one transect to the next, e.g., red oak was absent from Transects 1 and 3.
Remote Sens. 2019, 11, x FOR PEER REVIEW
The spectral values of dominant species over late-spring, mid-summer, and early-fall images are displayed in Figure 4. The seasonal variations are evident in spectral values, with trees in mid-summer (solid lines in Figure 4b) generally having higher reflectance, followed by late-spring (dotted lines in Figure 4a) and early-fall (dashed lines in Figure 4c). For all seasons, the greatest variation is observed in the Near-IR1 region, followed by Near-IR2 and Red Edge. Among species, the two conifer species (i.e., balsam fir and hemlock) display lower reflectance than the deciduous species in the Red Edge, Near-IR1, and Near-IR2 regions. Among deciduous species, sugar maple and red oak tend to have higher reflectance in the Red Edge, Near-IR1, and Near-IR2 regions.

Overview
Individual tree-based species classification starts with ITC delineation, followed by object-based classification. In this study, multi-scale ITC maps were first produced by segmenting the pan-sharpened mid-summer multispectral image using the spectral angle segmentation (SAS) algorithm [9,36]. Using 750 manually digitized reference crowns, the best scale was selected as the one with the lowest segmentation evaluation index (SEI) value [37], yielding the final ITC map for the subsequent classification. The object-based species classification was conducted in five feature spaces: the early-spring, mid-summer, and early-fall images used independently; the early-spring, mid-summer, and early-fall images combined; and all four images integrated. Of the 672 trees identified, 70% were assigned as training samples and 30% were used for validation. The RF classifier [20] was then applied to classify tree species in each of the five feature spaces. Finally, confusion matrices were derived for accuracy assessment and comparison. The workflow of individual tree-based species classification is displayed in Figure 5 and described in more detail in the following sections.
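The 70%/30% partition of the 672 field-located trees can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the text does not state whether the split was stratified by species, so the stratification, the function name `stratified_split`, and the random seed are all assumptions. The species counts are those reported in the field data section.

```python
import random

# Species counts reported for the 672 field-located trees.
SPECIES_COUNTS = {"Mh": 276, "He": 95, "Be": 94, "By": 89, "Mr": 64, "Bf": 28, "Or": 26}


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each species separately.

    Stratifying by species is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    by_species = {}
    for idx, label in enumerate(labels):
        by_species.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_species.values():
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)


# One label per tree, in species order.
labels = [sp for sp, n in SPECIES_COUNTS.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
```

Splitting within each species keeps rare classes such as red oak (26 trees) represented in both the training and the validation sets, which a purely random split does not guarantee.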

Individual Tree Crown Delineation
As our study site was a mixed deciduous forest dominated by shade-tolerant hardwood species, we chose to implement a recently proposed watershed segmentation method, the SAS algorithm [9,36], for ITC delineation. Watershed segmentation is always implemented with gradient images rather than original images to ensure that watershed lines match the boundaries of ITCs [34]. The first step of the SAS algorithm was to calculate the spectral angle gradient, which takes full advantage of all eight spectral bands for watershed segmentation. Next, the preliminary segments were produced through the watershed transformation of the spectral angle gradient using the efficient algorithm proposed by Vincent and Soille [38], which is implemented in the System for Automated Geoscientific Analyses (SAGA). Controlled by the scale parameter "Seed to Saddle Difference", the preliminary segments were then merged from bottom to top using the merging process implemented in SAGA [39] to generate multi-scale ITC maps. The upper and lower bounds of this parameter were determined using a "too coarse" and a "too fine" scale: at a "too coarse" scale many ITCs are merged into one large object, whereas at a "too fine" scale a single ITC is divided into many small pieces. Further details of the SAS algorithm can be found in Yang et al. [9,36]. Finally, we manually delineated and rasterized 750 reference crowns and used a supervised scale selection method to identify the best scale of ITC map for the subsequent object-based species classification. In this step, the modified SEI, which was first proposed by Yang et al. [37] and further modified by Yang et al. [9], was used to measure the goodness of fit between the delineated ITCs and the reference crowns. As demonstrated by Yang et al. [9], the value of SEI lies between 0 and 0.71, with lower values indicating higher similarity between the delineated ITCs and the manually delineated reference crowns. The SEI values were calculated for the multi-scale ITC maps, and the best map was identified by the lowest SEI value.
In this study, we segmented the mid-summer WorldView image to create the final ITC map for species classification. The mid-summer scene represented the intermediate stage of canopy dynamics among the three leaf-on scenes and corresponded best with the ground-based data collection. In order to examine the effects of multi-seasonal images and the four extra bands on species classification, the same ITC delineation result was used for all subsequent classifications. The detailed procedure for the ITC delineation is illustrated in Figure 6.
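To make the idea of a spectral angle gradient concrete, the sketch below computes the spectral angle between two multi-band pixel vectors and derives a simple gradient image from it. It is only an illustration under stated assumptions: the exact gradient definition of the SAS algorithm is given in Yang et al. [9,36] and is not reproduced here, so the choice of "largest angle to the 4-connected neighbours" and the function names are hypothetical.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectral vectors (e.g. eight band values)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def spectral_angle_gradient(image):
    """image[row][col] is a list of band values; returns a 2-D list of gradients.

    Assumption for this sketch: the gradient at a pixel is the largest
    spectral angle between that pixel and its 4-connected neighbours.
    """
    rows, cols = len(image), len(image[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    angle = spectral_angle(image[r][c], image[rr][cc])
                    grad[r][c] = max(grad[r][c], angle)
    return grad


# Toy 1 x 3 image with two bands: the second pixel is a brighter copy of the
# first, the third has a different spectral shape.
toy = [[[1.0, 2.0], [2.0, 4.0], [4.0, 1.0]]]
grad = spectral_angle_gradient(toy)
```

Because the angle depends on the direction of the spectral vector and not its length, a pixel that is merely brighter or darker than its neighbour produces almost no gradient, while a change in spectral shape does. A watershed transformation would then be run on such a gradient image.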

Object-Based Species Classification
For each ITC, the average pixel value within each band was determined. In order to determine which season of WorldView images is most effective for species identification, the early-spring, mid-summer, and early-fall images were analyzed separately to classify the segmented ITCs into seven dominant species. As there were some canopy gaps in the above three sites, a hierarchical strategy was adopted to mask out these canopy gaps based on the selected ITC map prior to the species identification. The canopy gaps were most often over-segmented during the ITC process, but this did not affect the canopy gap identification. As illustrated by Vincent and Soille [38], high spatial resolution optical images used on their own are able to capture non-forest gaps (i.e., waterbodies where trees are absent) based on their distinct spectral features but are not sufficient to identify forest gaps (i.e., gaps that appear in the images because the trees are so much smaller than their immediate neighbors that they are invisible in the images) due to their spectral similarity to tree canopies. Unfortunately, the LiDAR data used for that study were collected six years prior to the acquisition of the multi-seasonal WorldView images. We felt that it was not appropriate to use the LiDAR data to determine the status of forested canopy gaps in the target year due to the gap dynamics that had occurred during the intervening six-year period (e.g., harvesting activities, mortality events, gap filling by small trees). Therefore, we masked out only the non-forest gaps prior to the species classification. Hereafter, we will refer to these as non-forested areas to distinguish them from forested areas, which include both tree canopies and forest gaps.
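The per-crown averaging that opens this subsection amounts to a zonal mean over the segment map. A minimal sketch is given below, assuming the ITC map is available as a 2-D grid of integer segment ids and each band as a 2-D grid of pixel values; the function name `crown_band_means` and the toy values are hypothetical and not taken from the study.

```python
def crown_band_means(segment_ids, bands):
    """Average pixel value per band within each ITC.

    segment_ids: 2-D list of ITC ids, one per pixel.
    bands: list of 2-D lists (one per spectral band), same shape as segment_ids.
    Returns {itc_id: [mean of band 0, mean of band 1, ...]}.
    """
    n_bands = len(bands)
    sums, counts = {}, {}
    for r, row in enumerate(segment_ids):
        for c, seg in enumerate(row):
            acc = sums.setdefault(seg, [0.0] * n_bands)
            for b in range(n_bands):
                acc[b] += bands[b][r][c]
            counts[seg] = counts.get(seg, 0) + 1
    return {seg: [s / counts[seg] for s in acc] for seg, acc in sums.items()}


# Toy 2 x 2 scene with two crowns (ids 1 and 2) and two bands.
segment_ids = [[1, 1],
               [2, 2]]
bands = [
    [[10, 20], [30, 50]],  # band 0
    [[1, 3], [5, 7]],      # band 1
]
means = crown_band_means(segment_ids, bands)
```

For a multi-seasonal feature space, the mean vectors of the same crown from each season would simply be concatenated, so that three eight-band images yield 24 features per ITC.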
When using a single-seasonal image, the entire early-spring, mid-summer, or early-fall scene was first classified into forested and non-forested areas, then seven tree species were identified from the forested areas (Figure 7a). When using the late-spring, mid-summer, and early-fall images simultaneously, the mid-summer image was first used for separating the forested and non-forested areas, then all three seasonal images were used for tree species classification (Figure 7b). When further adding the late-fall image, the coniferous and deciduous trees were separated from the forested areas using the late-fall image, and each species was then identified from the corresponding coniferous or deciduous class by the combined use of the late-spring, mid-summer, and early-fall images (Figure 7c).

Figure 7.
Illustration of the hierarchical classification strategy within five feature spaces. Mh, sugar maple; Mr, red maple; Be, beech; By, yellow birch; He, hemlock; Or, red oak; Bf, balsam fir. Flowchart (a) illustrates a single-seasonal image classification step, where the entire early-spring, mid-summer, or early-fall scene was first classified into forested and non-forested areas, then seven tree species were identified from the forested areas. Flowchart (b) shows the use of the mid-summer image first for separating the forested and non-forested areas, after which all three seasonal images were used for tree species classification. Flowchart (c) shows the use of the late-fall image to separate the coniferous and deciduous trees within the forested area, after which each species was identified from the corresponding coniferous or deciduous class by the combined use of the late-spring, mid-summer, and early-fall images.
In the above five feature spaces, each level of classification was implemented using the RF classifier, which has recently become available in Trimble eCognition Developer. The RF classifier is an ensemble learning algorithm that consists of many decision trees, and the final decision is the class that is the mode of the classes output by the individual trees [20]. In a decision tree, each node is split using the best split among all variables. In RF, however, each node is split using the best split among a subset of predictors randomly chosen at that node. Therefore, the RF classifier is able to overcome a decision tree's tendency to overfit its training set [40]. In Trimble eCognition Developer, we implemented the RF classifier using the default settings of eight parameters (i.e., Depth: 10; Min sample count: 0; Use surrogates: No; Max categories: 16; Active variables: 0; Max tree number: 50; Forest accuracy: 0.01; Criteria termination type: Both). RF in eCognition uses the same parameters as a decision tree, such as depth, minimum sample count, maximum categories, and the use of surrogates. Additional parameters include active variables, which is the number of randomly selected features to be considered at each tree node; forest accuracy, which is a target for the desired level of accuracy; and a termination criterion, which can be set to the maximum number of trees, forest accuracy, or both.
To maintain a sufficient number of training and test samples for all seven species, the three sites were pooled for species classification. To determine whether the five feature spaces differed significantly in species classification accuracy, ten sets of 7:3 training-test samples (70% training, 30% test) were randomly generated and used to classify tree species in each of the five feature spaces. The accuracy parameters, derived from the error matrix, were producer's accuracy (PA), user's accuracy (UA), overall accuracy (OA), and the Kappa index of agreement (KIA). The best feature space was identified by the highest OA value.
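The four accuracy parameters follow directly from an error matrix. The sketch below shows the standard formulas; the orientation of the matrix (rows as classified classes, columns as reference classes) is an assumed convention, and the three-class counts are hypothetical, not taken from Table 2.

```python
import numpy as np

def accuracy_from_error_matrix(m):
    """Return PA, UA, OA, and Kappa from a square error matrix.

    Assumed convention: rows = classified (map) classes,
    columns = reference (ground) classes.
    """
    m = np.asarray(m, dtype=float)
    n = m.sum()
    diag = np.diag(m)
    ua = diag / m.sum(axis=1)  # user's accuracy: correct / row total
    pa = diag / m.sum(axis=0)  # producer's accuracy: correct / column total
    oa = diag.sum() / n        # overall accuracy
    # Agreement expected by chance, from the row and column marginals.
    pe = (m.sum(axis=0) * m.sum(axis=1)).sum() / n**2
    kappa = (oa - pe) / (1 - pe)
    return pa, ua, oa, kappa

# Hypothetical counts for three classes.
example = [[20, 3, 2],
           [4, 15, 1],
           [1, 2, 12]]
pa, ua, oa, kappa = accuracy_from_error_matrix(example)
print(pa, ua, round(oa, 3), round(kappa, 3))
```

A class with no classified or no reference samples would produce a division by zero here; a production version would need to handle that case explicitly.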
To further determine whether the four extra bands of WorldView images (i.e., Coastal, Yellow, Red Edge, and Near-IR2) improve classification accuracy, we repeated the above analyses using only the four traditional bands (i.e., Blue, Green, Red, and Near-IR1) of WorldView images to classify tree species in the best feature space. The same ten sets of 7:3 training-test samples were applied. Moreover, we investigated the effectiveness of each new band by examining overlap values among the sampled tree species. The overlap is a quantitative measure of spectral separability between a focus class and another class (Trimble eCognition Developer), defined as the ratio of the overlap between the histograms of the two selected classes to the histogram of the focus class. The overlap value ranges from zero to one, and a lower value indicates greater separation between the focus class and the other selected class.
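The overlap measure defined above can be illustrated with a short sketch. This is our reading of the definition, not eCognition's implementation: the two classes are binned on a shared range, and the summed bin-wise minimum of the two histograms is divided by the total of the focus-class histogram. The function name, the number of bins, and the use of raw counts instead of normalized frequencies are all assumptions.

```python
import numpy as np

def histogram_overlap(focus, other, bins=32):
    """Overlap of `other` onto `focus` for one feature (e.g., one band).

    Returns a value in [0, 1]; lower means the focus class is better
    separated from the other class in this feature.
    """
    focus = np.asarray(focus, dtype=float)
    other = np.asarray(other, dtype=float)
    # Bin both classes on a common range so the bins are comparable.
    lo = min(focus.min(), other.min())
    hi = max(focus.max(), other.max())
    h_focus, _ = np.histogram(focus, bins=bins, range=(lo, hi))
    h_other, _ = np.histogram(other, bins=bins, range=(lo, hi))
    # Shared area divided by the area of the focus-class histogram.
    return np.minimum(h_focus, h_other).sum() / h_focus.sum()

# Hypothetical reflectance values for two classes in one band.
class_a = [0.1, 0.2, 0.3]
class_b = [0.7, 0.8, 0.9]
print(histogram_overlap(class_a, class_b))  # fully separated classes
print(histogram_overlap(class_a, class_a))  # identical classes
```

Because the denominator is the focus-class histogram, the measure is not symmetric when the two classes have different sample sizes.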
Finally, all of the multi-scale ITC maps created in Section 3.2 were used for object-based classification and accuracy assessment to demonstrate the relationship between the accuracy of tree species classification and the quality of ITC delineation.

ITC Delineation
In total, 20 scales of ITC maps were produced by adjusting the scale parameter "Seed to Saddle Difference" from 0.05 to 1 at an interval of 0.05. As the scale parameter increased, the SEI value first decreased to a minimum and then increased. The lowest SEI value (0.49) was observed when "Seed to Saddle Difference" was set to 0.5. All subsequent ITC maps (Sites 1, 2, and 3) were created using this scale parameter, followed by object-based species classification (Figure 8). Visual inspection indicated that the ITCs were delineated well, with a few instances of over-segmentation and under-segmentation (Figure 8). To further examine how the delineated segments matched the tree crowns, the reference crowns and delineated crowns were overlaid on the images from three seasons (early-spring, mid-summer, and early-fall). The late-fall image was not included because it was acquired in the leaf-off season, when the deciduous crowns were leafless. The delineated crowns exhibited three cases of over-segmentation in the selected subset (Figure 9).

Individual Tree-Based Species Classification
When using an image from a single season, the OA of tree species classification was 0.46 for the late-spring scene, 0.42 for the mid-summer scene, and 0.32 for the early-fall scene. When combined, these three images yielded a much higher OA of 0.79. However, when the late-fall image was used to separate coniferous and deciduous species prior to species-level classification, the OA of the coniferous-deciduous separation was 0.80 (UA: 0.53, PA: 0.47 for the conifers; UA: 0.89, PA: 0.86 for the deciduous), and the OA of the classification of all species decreased to 0.53. The error matrices of the five feature spaces (Figure 7) are given in Table 2 to show the detailed accuracy of individual tree-based species classification. The seven dominant species were well identified, with high PA and UA, by the combined use of late-spring, mid-summer, and early-fall WorldView images.
The classification maps of Sites 1, 2, and 3 are depicted in Figure 10. In Site 1, for example, most coniferous trees were located along the northeast-southwest transition areas, whereas most deciduous trees were located in the northwest and southeast areas (Figure 10). Maple species (i.e., sugar maple and red maple) dominated the second site, while hemlock was also abundant in the northwest areas (Figure 10). It is also worth noting that most red oak trees were identified in the second site rather than in Sites 1 and 3, consistent with the distribution observed in the field work.
By comparing the classification accuracies of the five feature spaces (Table 2), we concluded that individual tree species were best identified by the combined use of late-spring, mid-summer, and early-fall images. Using this best combination of images, the OA of tree species classification using only the four traditional bands of WorldView images (0.53) was substantially lower than that using all eight bands (0.79). With four bands, the classification accuracies of all species were lower than with all eight bands, except for the PA of sugar maple (Table 3).

Figure 9. A close view of the reference crowns (left maps: a subset of Site 1, with late-spring, mid-summer, and early-fall images from top to bottom) and the corresponding delineated crowns (right maps: a subset of Site 1, with late-spring, mid-summer, and early-fall images from top to bottom). The delineated crowns were taken from the ITC map produced from the mid-summer image based on the optimal scale parameter of 0.5. The late-fall image is not included in the figure because it was taken in the leaf-off season.
The number of test samples actually used for classification accuracy assessment also provided insight into how segmentation and non-forest gap classification could affect the classification results. We reserved 30% of the 672 ground tree samples for testing, i.e., 201 test samples (672 × 30%). However, adding up the test tree samples in Tables 2 and 3 shows that fewer test samples were used in the classifications than were reserved. For example, only 151 samples were used for the accuracy assessment of the early-fall image classification, and 167 or 168 for the other single-season or combined seasonal image classifications. Out of the 201 test samples, therefore, about 50 (= 201 − 151), or 24.9%, were removed from the validation process for the early-fall image classification, and about 34 (= 201 − 167), or 16.9%, were removed for the other single-season or combined seasonal image classifications.
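The sample counts quoted above can be checked with a few lines of arithmetic. The snippet uses only the figures stated in the text (672 trees, a 30% test share, and 151 or 167 samples actually used); the variable names are ours.

```python
total_trees = 672

# 30% of 672 is 201.6; the text reports 201, i.e., the value rounded down.
reserved = total_trees * 30 // 100

used = {
    "early-fall image": 151,
    "other single or combined images": 167,
}

removed_share = {}
for label, n_used in used.items():
    removed = reserved - n_used
    removed_share[label] = round(100 * removed / reserved, 1)
    print(f"{label}: {removed} of {reserved} removed "
          f"({removed_share[label]}%)")
```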
The average overlap values of the Coastal (0.59) and Yellow (0.51) bands were much higher than those of the Red Edge (0.41) and Near-IR2 (0.40) bands, indicating that the Red Edge and Near-IR2 bands provided greater spectral separability for tree species classification than the Coastal and Yellow bands. While there was only a marginal difference between the Red Edge and Near-IR2 bands, the Yellow band contributed more to tree species separability than the Coastal band. With the Red Edge and Near-IR2 bands, the spectral separability between sugar maple and the other species was the highest, with overlap values of 0.20 and 0.19, respectively.
We also classified tree species using the ITC maps produced with different scale parameters, and used the same set of test samples to evaluate their classification accuracies with the multi-seasonal eight-band WorldView images. We used ten scales of ITC maps, obtained by adjusting the scale parameter "Seed to Saddle Difference" from 0.05 to 1 at an interval of 0.1 (Figure 11). As expected, the highest classification accuracy (i.e., OA: 0.79) corresponded to the lowest SEI value (0.49), i.e., the best scale of ITC map. We also noted that the classification accuracy decreased dramatically for the two neighboring scales of ITC maps, although the quality of ITC delineation at those two scales was only slightly worse. On the other hand, the classification accuracy was quite stable for the other scales of ITC maps, even though the SEI value increased considerably.
Figure 10. Individual tree-based species classification map by the combined use of late-spring, mid-summer, and early-fall images for three sites.

Workflow of Individual Tree-Based Species Classification
Intuitively, individual tree-based species classification should be implemented in three consecutive steps: canopy gap elimination, ITC delineation, and object-based species classification. In some circumstances, it may be beneficial to add a separate image segmentation step for non-forest gap elimination. We streamlined this strategy by using the selected ITC map to identify and mask out the non-forest gaps prior to species identification. Although the selected ITC map over-segmented many of the non-forest gaps, over-segmentation affects the identification of non-forest gaps less than under-segmentation would.
In the current study, we also used ITC maps produced with a range of scale parameters to classify species at the individual tree level, and found that the classification was quite sensitive to the quality of ITC delineation. In other words, the best classification result (i.e., OA: 0.79) was derived from the best ITC map (i.e., SEI: 0.49), and the classification accuracy decreased significantly when using only a slightly worse ITC delineation. This observation could be explained by the specific nature of individual tree-based species classification in dense, species-diverse forests. Any slight mis-delineation of a tree crown causes subsequent mis-delineation of all of its neighboring tree crowns. This mis-delineation may cause misclassification if the neighboring crown is a different species with a different spectral signature. This in turn can affect the average ITC value and contribute to greater perceived spectral overlap among the neighboring species in all bands. We noticed that when ITC delineation at very coarse or very fine scales (i.e., with a very high SEI) was used for classification, the map accuracy was low but quite stable. This was not only because the SEI was designed to penalize serious over-segmentation and under-segmentation more heavily, but also because of the sample assignment in object-based classification. At a very coarse scale, many tree crowns with similar or confused spectral features were merged together, so a merged segment was very likely to contain more than one training or test sample of different species. In our study, we eliminated such samples unless all of them belonged to a single species, so classifying these confused trees was incidentally avoided when using the coarse-scale ITC map. Therefore, the classification accuracy remained relatively stable although the over-merging problem resulted in serious confusion of intra-crown spectral features.
On the other hand, many tree crowns were partitioned into smaller pieces at a very fine scale. As the segments became smaller, the intra-crown spectral features became more homogeneous and, for some trees, even closer to their typical spectral characteristics. We suspect that this is why the classification accuracy remained stable within the fine-scale range, even increasing slightly for the two finest scales of ITC maps. Based on the above analysis, neither the coarse-scale nor the fine-scale ITC maps provided classification accuracy as satisfactory as that of the best scale. Therefore, we conclude, not unexpectedly, that the quality of ITC delineation directly impacts the accuracy of individual tree-based species classification.
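The sample-assignment rule discussed above (segments containing field samples of more than one species are dropped, while segments whose samples all share one species are kept) can be sketched as follows. The function name and data layout are hypothetical, and treating a kept multi-sample segment as a single labelled object is our assumption; the text does not specify it.

```python
from collections import defaultdict

def assign_samples(samples):
    """Keep one label per segment; drop segments with mixed species.

    `samples` is an iterable of (segment_id, species) pairs, one per
    field-located tree that falls inside a delineated crown segment.
    Returns (kept, dropped): a dict of segment_id -> species, and a
    sorted list of the segment ids that were eliminated.
    """
    species_by_segment = defaultdict(set)
    for segment_id, species in samples:
        species_by_segment[segment_id].add(species)

    kept, dropped = {}, []
    for segment_id, species_set in species_by_segment.items():
        if len(species_set) == 1:
            # All samples in this segment agree on the species.
            kept[segment_id] = next(iter(species_set))
        else:
            # Merged crowns of different species: eliminate the samples.
            dropped.append(segment_id)
    return kept, sorted(dropped)

# Hypothetical example: segment 2 merges a beech and a hemlock crown.
samples = [
    (1, "sugar maple"),
    (2, "beech"), (2, "hemlock"),
    (3, "red oak"), (3, "red oak"),
]
kept, dropped = assign_samples(samples)
print(kept, dropped)
```

Under this rule, coarser ITC maps drop more of the spectrally confused trees from both training and validation, which is consistent with the stable accuracy observed at coarse scales.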
The number of test samples actually used for classification accuracy assessment was much lower than the number of reserved test samples. Specifically, about 24.9% of the tree samples were removed from the validation process for the early-fall image classification, and about 16.9% were removed for the other single-season or combined seasonal image classifications. The tree points were removed from analysis due to (1) under-segmentation leading to more than one tree point in some segments; and (2) [41], the influence of the view angle on the image usability could be pronounced when the angle is greater than 20°. Figure 9 displays some shift when the mid-summer image-based segments are overlaid on the late-spring and early-fall images. It is clear that a few segments do not cover the entire tree crowns, with black gaps showing at the bottom of the segments in the late-spring and early-fall images. Further investigation is warranted to explore how the view angles affect the segmentation and the subsequent classification results.
To validate the effectiveness of multi-seasonal images for tree species classification, this study selected a representative scene from each of three leaf-on seasons (i.e., spring, summer, and fall). When using single-seasonal images, we found that the late-spring and mid-summer images produced comparable classification accuracy (i.e., OA: 0.46 vs. 0.42), both much higher than the accuracy yielded by the early-fall image (i.e., OA: 0.35). This observation agrees with the classification result reported by Pipkins et al. [30], who concluded that late-spring and mid-summer images were better choices than early-fall images for differentiating spectral features between species. In addition to species differences in foliage and the shift caused by view angles, this could also be explained by species differences in reproductive biology, including the timing of reproduction and the spectral differences between reproductive structures (flowers and seeds). Some tree species, such as red maple, produce flowers and seeds in the spring, while others flower in the spring and set seed in the late summer. Thus, it is possible that reproductive phenology helped to improve the classification accuracy when using the late-spring and mid-summer images, but not the early-fall image. However, the amount of seed produced each year is not consistent. Most tree species produce bumper crops of seed every 3-10 years, so the strength of the spectral signal associated with flower and seed production may not always be reliable. In the case of Haliburton, Hossain et al. [42] reported that 2015 (the acquisition year of the multi-seasonal WorldView images) was not a mast year for the most abundant species (sugar maple), meaning that it produced few if any flowers and seeds.
In this context, the reproductive structures of other species may have had a large effect on canopy reflectance patterns, such that the spectral differences between species could be even more pronounced in this non-mast year. However, Key et al. [28] and Hill et al. [29] noted that optimally timed photography acquired during peak autumn colors provided the best single-date image for identifying deciduous species, because many deciduous species differ considerably in their senescence strategies, producing unique fall colors that can be exploited for species discrimination. Because no mid-fall image was available in our study, we were not able to verify this conclusion. Similar to [30], we found that the late-spring and mid-summer images were effective for classifying forests where coniferous and deciduous species coexist. The significant increase in classification accuracy obtained by combining the three seasonal images provided evidence that multi-seasonal images can enhance inter-species separability for tree species identification [28,29,30,43,44]. However, separating deciduous and coniferous trees as an extra step based on a late-fall image was not successful, probably due to the complex structure of Haliburton Forest. It is likely that the intermediate-canopy and understory coniferous trees, which were hidden beneath the dominant and co-dominant overstory trees in leaf during the three leaf-on scenes, were exposed in the late-fall image and contributed to some spectral mismatches with the earlier scenes.
In addition to the gains from multi-seasonal imagery, substantial improvements were observed when using all eight bands of WorldView images rather than the four traditional bands. The four additional bands (i.e., Coastal, Yellow, Red Edge, Near-IR2) enhanced inter-species separability for tree species identification (i.e., OA: 0.79 vs. 0.53). Several other studies [11,16,25,26] have also found that additional bands improved classification accuracy, particularly when a larger number of tree species had to be separated or when the tree species showed substantial spectral overlap. Based on the contributions of the four extra bands to spectral separability between species, we conclude that the Red Edge and Near-IR2 bands were more useful for mitigating spectral overlap than the Yellow and Coastal bands, and that the Coastal band contributed least to the improvement in spectral separability. This is likely because the Near-IR2 band partly overlaps the wavelength range of the Near-IR1 band, is less affected by atmospheric influence [13], and is more sensitive to the chemical and physical composition of vegetation [27]. The Red Edge and Yellow bands are able to capture even minor differences in carotenoid and chlorophyll pigments among species; they are therefore expected to enhance the separation between species and may be most effective in early fall, when deciduous leaves are undergoing senescence [11,24,25].

Conclusions and Future Work
In this study, we proposed an operational workflow of individual tree-based species classification using three-seasonal WorldView images, involving three steps: ITC delineation, non-forest gap elimination, and object-based classification. Specifically, we implemented the SAS algorithm for ITC delineation, followed by non-forest gap classification and individual tree species identification using the RF classifier. When using single-seasonal images, the late-spring and mid-summer images produced similar classification accuracy, both much higher than the accuracy yielded by the early-fall image. A multi-temporal classification approach using the late-spring, mid-summer, and early-fall images was satisfactory, whereas using the late-fall image to separate deciduous and coniferous trees as an extra step was not successful and thus failed to further improve the classification accuracy. We also noted that the four extra bands of WorldView images contributed substantially to classifying species at the individual tree level. We further concluded that the best species classification map was obtained only with the best scale of ITC map. Even small changes in the ITC scale, toward either a finer or a coarser scale, lowered the classification accuracy dramatically.
Although this study proposed an operational method to classify individual tree species, some issues (e.g., image view angle, forest gap identification, ITC delineation refinement, and classification improvement) may limit its potential applications and warrant further work. Supplementary data sources, such as the SWIR bands of WorldView images, LiDAR, and hyperspectral data, can also facilitate this task and may achieve a higher accuracy of individual tree-based classification, at the expense of higher data acquisition costs and processing complexity.
Although many efforts have been made to improve delineation accuracy in recent years [6,9,18,19,45], ITC delineation algorithms still need further development, especially for interlocking tree crowns in dense forests. Recent advances include the work in [18,19], which identified eight types of ITC shapes (i.e., regular, embayed, dumbbell, bent dumbbell, worm, irregular worm, irregular, and convolute) using evidence-based rules and then refined the poorly delineated crowns of these types by splitting them into multiple ITCs. Guo et al. [46] recently reviewed deep learning for semantic segmentation and suggested that recent studies in deep learning have resulted in groundbreaking improvements in segmentation accuracy. For ITC delineation, Weinstein et al. [47] used a semi-supervised deep learning neural network to detect ITCs in RGB imagery, giving the network the opportunity to learn generalized features from a wider array of training examples, with the goal of improving deep learning on limited training data.
Separating forest gaps from tree canopies with optical images, as was done in [36], would help reduce the "noise" created by young trees in the forest gaps. Classifying the trees in the forest gaps in a separate step would likely improve the OA of the dominant or co-dominant trees in the continuous canopies. A canopy height model (CHM) derived from LiDAR data should further improve the elimination of non-forest canopy gaps, provided that the LiDAR data are acquired within a year or two of the multi-seasonal WorldView images.
Additional and supplementary data sources could also be used to achieve a higher accuracy, though they would increase the costs of data acquisition and the processing complexity. At large scales, cost and limits in storage and computing power are significant barriers. To make the best use of limited funding, we decided to use eight-band WorldView images at 0.5 m (Panchromatic) and 2.0 m (Multispectral) resolution instead of using all other possible bands (including SWIR and CAVIS) and the higher spatial resolution of 0.3 m (Panchromatic) and 1.2 m (Multispectral). We understand that the SWIR bands and the higher spatial resolution have a high potential for tree species classification; however, the cost of image acquisition would increase dramatically with more bands or higher spatial resolution. One should consider acquiring the complete data for analysis when funding permits. In addition, although Korpela et al. [22] suggested that multi-seasonal images could compensate to some extent for the limited spectral resolution of multispectral images, a narrow-band hyperspectral image is still valuable for providing more spectral information for tree species classification [5,14,21]. To explore this further, spectral signatures of target tree species should be measured in specific wavelength ranges and compared to their physicochemical properties. Despite the increased accuracy of species classification found in this study, acquiring three seasons of eight-band WorldView imagery was still very costly compared to four-band aerial imagery. From this perspective, it could be of great value to identify the exact timing of image acquisition for optimal classification while acquiring and using as few images as possible.
For instance, Madonsela et al. [25] combined only two WorldView-2 images acquired at key points of the typical phenological development of savannahs (peak productivity, transition to senescence) to improve the discrimination of savannah tree species. It may be interesting to acquire images every week for the entire growing season and identify which two scenes produce the highest OA for all the species present in the forest, not just the most common ones. In our study site, this would require extensive sampling to acquire a suitable sample size for all 22 tree species found on the landscape. As we move from experimental to operational use, the ability to identify all species will become even more important, particularly because understanding and documenting species diversity is a key component of sustainable forest management and one of the primary goals of any forest inventory.
Further, we only used band reflectance to test the potential influence of multi-seasonal images on tree species identification. We are aware that combining spectral features (e.g., band reflectance, various vegetation indices, and transformation features) with spatial and textural features (e.g., first-order textural features) may improve individual tree species classification, especially given the rich textural information contained in the images. In addition, we only used the RF classifier with the default settings of its eight parameters to enable comparison. Advanced machine learning algorithms developed in recent years may achieve better classification accuracy, and different parameter settings may further improve the classification results of the RF algorithm. As indicated in the methodology section, we used the default values for the RF parameters in eCognition. For example, the default number of trees was 50, whereas this parameter is usually set much higher in similar studies. Further work should explore the effect of the number of trees on the results and verify that the out-of-bag error does not change when the number of trees is increased beyond 50. For the images we used, the impact of view angles on segmentation and classification was ignored, while previous studies (e.g., [41]) have highlighted the influence of the view angle on the usability of the data for classification approaches. Finally, we focused on the accuracy improvement gained from three-seasonal images compared to each single-seasonal image. Madonsela et al. [25] showed that the combined use of WorldView-2 images from two seasons also improves the classification accuracy.
Therefore, future work should focus on testing more features for classification, exploring different classifiers, investigating more settings of the control parameters of the RF classifier, considering the impact of sensor view angles on classification, and evaluating all possible combinations of the four images so that users can choose the best option in terms of accuracy and data costs.
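One of the checks suggested above, whether the out-of-bag (OOB) error stabilizes as the number of trees grows beyond 50, could be sketched with scikit-learn as follows. This is not the eCognition workflow used in the study; the data are synthetic stand-ins for per-crown features, and the list of tree counts is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic stand-in: 24 features (8 bands x 3 seasons), 7 species.
n_crowns, n_features, n_species = 400, 24, 7
y = rng.integers(0, n_species, size=n_crowns)
X = rng.normal(size=(n_crowns, n_features)) + y[:, None] * 0.5

oob_error = {}
for n_trees in (50, 100, 200, 400):
    rf = RandomForestClassifier(
        n_estimators=n_trees,
        max_depth=10,
        bootstrap=True,
        oob_score=True,   # score each crown using trees that did not see it
        random_state=0,
    )
    rf.fit(X, y)
    oob_error[n_trees] = 1.0 - rf.oob_score_

for n_trees, err in oob_error.items():
    print(f"{n_trees:>4} trees: OOB error = {err:.3f}")
```

If the OOB error is essentially flat across these settings, the default of 50 trees is unlikely to be the limiting factor; a clear downward trend would suggest that more trees are worth the extra computation.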