Combining Spectral, Spatial-Contextual, and Structural Information in Multispectral UAV Data for Spruce Crown Delineation

: Precise delineation of individual tree crowns is critical for accurate forest biophysical parameter estimation, species classiﬁcation, and ecosystem modelling. Multispectral optical remote sensors mounted on low-ﬂying unmanned aerial vehicles (UAVs) can rapidly collect very-high-resolution (VHR) photogrammetric optical data that contain the spectral, spatial, and structural information of trees. State-of-the-art tree crown delineation approaches rely mostly on spectral information and un-derexploit the spatial-contextual and structural information in VHR photogrammetric multispectral data, resulting in crown delineation errors. Here, we propose the spectral, spatial-contextual, and structural information-based individual tree crown delineation (S3-ITD) method, which accurately delineates individual spruce crowns by minimizing the undesirable effects due to intracrown spectral variance and nonuniform illumination/shadowing in VHR multispectral data. We evaluate the performance of the S3-ITD crown delineation method over a white spruce forest in Quebec, Canada. The highest mean intersection over union (IoU) index of 0.83, and the lowest mean crown-area difference (CAD) of 0.14 m 2 , proves the superior crown delineation performance of the S3-ITD method over state-of-the-art methods. The reduction in error by 2.4 cm and 1.0 cm for the allometrically derived diameter at breast height (DBH) estimates compared with those from the WS-ITD and the BF-ITD approaches, respectively, demonstrates the effectiveness of the S3-ITD method to accurately estimate biophysical parameters.


Introduction
Spruce trees are important conifers in the temperate and boreal regions of the Northern hemisphere. They display the highest gross primary productivity in global forest ecosystems [1,2]. They are critical entities in mitigating the anthropogenically induced carbon imbalance between the atmosphere and the Earth's land surface [3]. Efforts to preserve and/or generate a climate-resilient forest ecosystem include sustainable forest management [4], precision forestry [5], and pest-and climate-resilient species and genotype selection [6,7]. However, such efforts demand accurate estimates of forest parameters such as tree height [8], biomass [9], leaf area index [10], and foliar pigment concentration [11], which are crucial inputs in activities such as tree-health monitoring [12], ecosystem modelling [13], and carbon-cycle studies [14]. Traditional field-surveying approaches for collecting tree-level parameters in large forests is time-consuming, low-throughput, and costly in terms of human labour. Alternatively, remote sensing data collected using optical light detection and ranging (LiDAR) and radio detection and ranging (RADAR) erogeneity and shadowing within an individual crown. In contrast, VHR multispectral data contain details of crown components, including branches, twigs, and leaves, resulting in an undesirably large variation in pixel values within individual tree crowns [31]. Thus, a preprocessing step that minimizes crown heterogeneity (i.e., intracrown pixel variance) is critical to delineate tree crowns reliably in the case of VHR multispectral data. Employing techniques such as the Gaussian smoothening mitigates the spectral heterogeneity in the data at the cost of losing crown edge information [32]. By grouping pixels based on spectral similarity and being subjected to geometric constraints, object-oriented crown delineation approaches use template matching [22], multiresolution analysis [30], and hierarchical segmentation [33] to mitigate the effect of spectral heterogeneity in crown delineation. Huang et al. (2018) mitigated the crown spectral heterogeneity problem in VHR multispectral data by performing marker-controlled watershed segmentation on the morphologically smoothened bias field estimate [25]. However, deriving edge-mask (i.e., image representing the edges) using an edge-detection filter often results in inaccurate boundary delineation of tree crowns in forests with proximal and/or overlapping crowns. In addition, it does not exploit the potential of the 3D crown structural information in VHR multispectral data to accurately delineate tree crowns. Thus, it is essential to develop a crown delineation approach that jointly exploits both the spectral, spatial-contextual, and 3D crown structural characteristics to address the undesirable effects of intracrown spectral heterogeneity and nonuniform illumination/shadowing. This paper describes a novel approach to delineate individual spruce crowns by combining spectral, spatial-contextual, and crown structural information in VHR photogrammetric multispectral data. The proposed crown delineation method, referred to as the S3-ITD, achieves accurate crown delineation by: (a) minimizing the undesirable effect of intracrown spectral variance and nonuniform illumination/shadowing by exploiting the noise robustness of the fuzzy framework and (b) jointly exploiting the spectral, spatialcontextual, and 3D structural information in VHR photogrammetric multispectral data.
In the following Section 2, we will first describe the study area and the data collected. Section 3 elaborates on the proposed crown delineation approach. The experiments that validate the performance of the method are reported in Section 4. Section 5 discusses the experimental results in addition to highlighting merits and limitations of the proposed method. The paper is concluded in Section 6.

Study Area and Data Description
The study area is a mature planned white spruce (Picea glauca) forest located in Saint-Casimir in the province of Quebec in southern Canada, with an approximate centre at 46°72 N-46°11 E. The forest stand is at the centre of a Canada-wide research project investigating white-spruce tolerance in response to a changing climate and includes individuals of 180 different white-spruce genotypic families, differing in their physiological and phenological characteristics [34].

Reference Data
In the experiments, we use two different sets of reference data collected for trees in eight carefully selected plots with different levels of crown complexity and overlap. The former set of reference data includes field-collected tree height and stem diameter at breast height (DBH) for all the trees in the eight plots, obtained in August 2018. The latter set consists of crown boundary/span (in the form of vector shape files) of the individual trees in the eight plots which were manually derived by an expert operator by visual interpretation of the photogrammetrically generated 3D dense point cloud and the image orthomosaic. The manual approach to crown span/boundary collection was preferred as it was challenging to derive crown span/boundary from the field due to factors such as the proximity to neighbouring trees, crown overlaps, and irregular conical shape of the spruce crowns. The plotwise basic statistics of the tree height and the DBH for all the trees in the eight plots are shown in Table 1. We compare the reference DBH with estimated DBH to evaluate the performance of the ITD approaches. Optical data provides only the canopy-level information of trees and, hence, we indirectly estimate the stem DBH of a delineated tree in the experiments by using the allometric equation where DBH i is the estimated DBH of the ith tree in millimetres, and h i and d i are the tree height in decimetres and the crown diameter (in decimeter), respectively. The model coefficients specific to spruce trees used are b 0 = −3.524, b 1 = 0.729, and b 2 = 1.345 [35]. We correct the bias in the DBH estimate resulting from the nonlinear transformation on the dependent variables in (1) by the method described in [36]. The estimated stem DBH is compared with the reference stem DBH measured on the field to evaluate the biophysical parameter estimation performance of the respective delineation tree method.

Remote Sensing Data
The VHR multispectral images of the forest area were acquired on 10 October 2018 using a modified MicaSense RedEdge multispectral camera (Micasense, Seattle, WA, USA) mounted on a DJI Matrice 100 quadrocopter (DJI Technology Co., Ltd., Shenzhen, China) with Autonomous mode with the global positioning system (GPS)-based waypoint navigation. The custom-filter camera captures near-simultaneous images in five narrow bands in the visible and the near-infrared regions of the electromagnetic spectrum (Table 2). Images were acquired for a 0.11 square km area ( Figure 1) on 10 August 2017 within two hours of local solar noon to minimize shadowed area and in stable illumination conditions (i.e., clear skies or fully overcast). The data acquisition was made with more than 75% swath overlap and sidelap to facilitate accurate detection of tie-points critical for accurate photogrammetric processing.

Methodology
The proposed method exploits the spectral, spatial-contextual, the 3D crown structural information in the VHR photogrammetric multispectral data to perform accurate crown delineation in a fuzzy framework. Figure 2 shows the block scheme of the proposed S3-ITD tree crown delineation approach. The 3D point cloud P and the orthomosaic O derived from the UAV optical data are the sources of the geometric and spectral/spatialcontextual information in the S3-ITD approach. Block scheme of the spectral, spatial-contextual, and structural information-based individual tree crown delineation method (S3-ITD), crown detection, and delineation method for VHR multispectral data. The geometric and radiometric corrections of the VHR data are performed using the photogrammetrically derived digital surface model (DSM) and the known reflectance panel parameters, respectively. The set of treetops t corresponding to local maxima in the canopy height model (CHM) are detected using the local maxima detection (LM) algorithm. The crown-fractional map (u c ) generated using the Markov random field based spatial-contextual model (FCM-MRF) classifier is integrated with the ridge map (u r ) obtained using the marker-controlled watershed segmentation in order to derive the ridge-integrated fractional map (u rc ). Spruce crown delineation is achieved by performing region growing on the ridge-integrated fractional map (u rc ) using the gradient vector field (GVF) snake algorithm.

Preprocessing
Radiometric preprocessing of the raw VHR multispectral data is performed separately for individual bands to transform respective pixel values to physically meaningful radiance value L. The transformation compensates for sensor black level, the sensitivity of the sensor, the sensor gain, the exposure settings, and the lens vignette effects, using (2).
where p is the normalized raw DN number, P BL is the normalized dark pixel value, a 1 , a 2 and a 3 are the radiometric calibration coefficients, t e is the exposure time, g is the sensor gain, (x, y) is the pixel coordinate set, and L is the radiance. The reflectance conversion is performed by multiplying the radiance image by a scaling factor that is determined by measuring the radiance of a surface with known reflectance [12]. In this study, the radiometric processing to convert raw digital numbers (DN) to radiance and then to reflectance is performed in the AgiSoft Metashape Professional commercial software (version 1.5.5) using the radiance derived from a 61% reflectance panel that is imaged right before the data acquisition [12,37]. The reflectance (i.e., radiometrically corrected) images are also geometrically preprocessed to (a) estimate the camera and sensor orientation at the imaging instance, (b) determine the 3D crown geometric models from images, and (c) remove spatial distortions in the images. We perform all the geometric preprocessing steps using the Agisoft Metashape Professional commercial software for its excellent performance in feature detection, tie-point matching, and photo-based 3D reconstruction using the structure from motion (SfM) photogrammetric technique [38]. The camera and the sensor orientation parameters required for the image alignment and sparse point cloud generation were estimated using Agisoft with high accuracy by selecting 40,000 and 4000 key and tie points, respectively. An automatic outlier removal is performed on the sparse 3D cloud by removing 10% of the points with the largest reprojection errors. The alignment and location errors were estimated based on 14 widely spaced control points, for which the accurate geographic location is collected using the Trimble Geo XT GPS (Trimble Inc., Sunnyvale, CA, USA). A 3D dense point cloud representing the 3D canopy structure and underlying ground topography was generated with the quality attribute set to medium. The dense point cloud data are used to derive the digital surface model (DSM) and digital elevation model (DEM) that represent the crown surface geometry and underlying Earth surface geometry, respectively [39]. The canopy height model (CHM) representing the local tree canopy height is obtained by subtracting the DEM from the DSM. Effects of canopy/ground surface relief on the reflectance images are compensated for by performing orthorectification using the DSM. The output of the geometric processing is referred to as the image orthomosaic, where all crown geometric distortions are minimized, if not removed, for the individual trees.

Crown Detection
The individual spruce crowns are detected and localized by (a) performing Gaussian smoothening using a N m × N m spatial mask on the CHM to remove artifacts caused due to vertical branches proximal to the treetop, where N m is the the kernel size of the Gaussian filter that is chosen in an empirical way to minimize both omission and commission errors in detecting tree crowns; (b) detecting and localizing peaks in the CHM using the LM algorithm on the assumption that treetops manifest themselves as local maxima in the CHM; and (c) selecting only the apexes points that have an apex-height greater than a threshold t h in order to minimize the commission error caused by other shorter vegetation such as shrubs in the scene [40]. The value of t h is selected using a sensitivity analysis that aims to maximizes the overall crown detection accuracy by varying t h between 0 m and 3 m with increments of 0.25 m. Based on the assumption that treetops in CHM, which are very proximal to one another, are very likely to be part of the same tree (e.g., false peaks resulting from protruding branches), we merge the treetops that are less than a threshold Euclidean distance of d g from each other. Considering the approximate conical shape of spruce crowns, we define d g to be equal to one-fourth of the average distance of individual treetops to its nearest neighbour. The locations of treetops which remains after merging are used as the seed points to perform spruce crown delineation.

Crown Delineation
For each detected spruce crown, the S3-ITD approach performs crown delineation by performing region growing on a fuzzy crown-spectral map integrated with ridgelocation information.

Fractional Map Generation
VHR multispectral images contain pixels which are often not pure but mixed in nature due to the presence of more than one object/land-cover class in the scene and, hence, the relative composition of each class in a pixel can be better represented by a class membership value rather than digital numbers. In our case, the membership value u ij ∈ [0, 1] of the ith image pixel to belong to jth class is estimated by jointly considering the information in both the multispectral spectral bands and the CHM. We use the fuzzy C-means classifier integrated with the Markov random field based spatial-contextual model (FCM-MRF) [41] to generate noise-robust fractional maps U = u ij , i ∈ [1, N]; j ∈ [1, N c ] and which contain estimates of the class membership values of each pixel. Here, N and N C are the total number pixels and classes, respectively. The fractional maps generated by the FCM-MRF classifier are the least affected by spectral variance and nonuniform illumination/shadowing, as the class membership values of a pixel is estimated by jointly considering the spectral (i.e., DN) value, CHM value, and spatial-context of the pixel. Here, the study focuses only on two broad object/land-cover classes: (a) the crown and (b) the background. The former class is composed of stem, branches, and leaves, and the latter class is composed of the remaining background objects in the scene, including soil and shadowed regions/objects. We refer to the factional maps generated against the crown and the background class as u c ∈ U and u b ∈ U, respectively. The objective function (3) of the FCM-MRF is a minimization problem that minimizes the posterior energy E of each image pixel by considering both the spectral similarity with respective class reference spectrum, local crown height, and spatial context of pixels.
Here, N is the number of pixels, m is the fuzzification index, and D is the Euclidean distance between the data point x i and the cluster centre c j . The optimal m is estimated by a sensitivity analysis that aims to maximize classification performance while minimizing the loss of edge information in the data measured using image entropy. Here, m is varied between 1.2 and 3.0 with increments of 0.2 [42]. We initialize the crown class with the average spectral response of the 25 brightest pixels that are located within a distance of 1 m from the 10 randomly selected treetop locations (i.e., local maxima in the CHM), while the ground class is initialized with the average spectral response of the 25 darkest pixels that are located within a distance of 1 m from 10 randomly selected local minima locations in the CHM. All parameter updates are subjected to the constraint 0 ≤ u ij ≤ 1, which ensures that the class membership values are effectively relaxed. Here, N ξ is the pixel and v 3 (w r , w r , w r ) represent the potential function corresponding to the single-site w r , pair-site w r , and triple-site w r cliques, respectively. A clique is a neighborhood pixel subset where individual members are mutual neighbours [41]. The first term in (3) estimates the spectral similarity of a pixel to individual classes, while the second term is an adaptive potential function that estimates the influence of a pixel with its neighbours in N ξ , where η is the pixel value variance in N ξ . A larger η results in lower neighbourhood influence, and a higher neighbour influence can be generated from smaller η. It is worth saying that higher η occurs at the crown boundaries and, hence, causes minimum neighbourhood influence in the estimation of the corresponding class membership values. The influence of the first and the second components in determining the class membership value is controlled by λ ∈ [0, 1]. The smoothening strength at the boundaries is controlled by γ ≥ 0. The global posterior energy E for the ith pixel and jth class in (3) is minimized using the simulated annealing optimization algorithm (Goffe, 1996) by iteratively modifying u ij and c i using (4) and (5), respectively.
The optimized fractional maps u c and u b represent the likelihood of a pixel to belong to the crown and the background class, respectively. Here, u c is identified as the one which has its mean spectral value most proximal to the mean spectral value of the referencecrown-class spectral value. The class membership associated with a class is often never zero, even in regions dominated by other classes. For example, the vegetation membership value associated with a soil pixel is not often zero. This low membership value is a source of noise and, hence, is undesirable for our analysis. Thus, we remove the undesirable variance in u c ∈ [0, 1] by assigning zero to all membership values u c ij where u c ij ≤ u b ij .

Crown Ridge Detection
Crown delineation using only u c becomes challenging when there is no detectable variation in the likelihood values u ij at the crown boundaries. Such challenges occur at those sections of tree crowns which overlap or touch the neighbouring crown(s). In this case, we exploit the canopy surface information in the CHM to identify ridges that correspond to the lowest height contour in the local valley around tree crowns. It is worth noting that the ridges only define the section of boundaries where tree crowns overlap or touch and not the boundary sections, which are already separated from neighbouring crown(s) by another class (e.g., the ground class). Individual pixels in CHM, b i ∈ [0, N] represent the ith pixel height and, hence, a binary ridge map u r derived from the CHM is used to locate the crown boundaries at the overlapping regions. We derive u rc by (a) performing the marker-controlled watershed algorithm with the treetop locations as the markers/seeds, (b) assigning maximum membership value (i.e., 1) to all the watershed pixels in u r , and minimum membership value (i.e., 0) to all the ridge pixels in u r . We perform a pixelwise multiplication of the ridge map u r and the u c to generate the ridgeintegrated fractional map u rc , which is used to perform the crown delineation. The ridges occur at the crown boundaries of all proximal trees with overlapping or touching crowns, and the pixelwise multiplication forces all the u c pixels at the watershed ridges to have the minimum membership value (i.e., 0). To this point, we assume that canopy overlap/fusion does not occur below the height of the local crown valley, as both branch and foliage density below the height is likely to be very low due to the minimal availability of sunlight.

Crown Segmentation
Delineation of individual tree crowns is performed on the ridge-integrated fractional map u rc . Here, the gradient vector Ffow (GVF) snake region-growing algorithm [43,44] is used to perform crown delineation in VHR multispectral data for its (a) tolerance to pixel heterogeneity (i.e., abrupt changes or edges) within a crown area and (b) ability to map complex crown shapes without causing overestimation of crown area. The GVF snake algorithm detects tree-crown boundaries by iteratively minimizing the energy E S of a seed curve f (s) = [x(s), y(s)], s ∈ [0, 1] in the spatial domain R 2 of the input tree-crown image. The objective energy function of the GVF snake algorithm is given by (6), At minimum energy state, (6) must satisfy the Euler equation as shown in (7).
This can be viewed as a force balance equation F int + F ext = 0, where F int = αx (s) − βx (s) and F ext = −∆E ext are the internal and external forces acting on the input seed curve. On the one hand, the internal force F int resists the stretching and bending of the curve, while on the other hand, the external force F ext pulls the curve towards the image edges/boundary. Here, the edge map e(x, y), derived from the image u c (x, y), is used as the −E ext .
where, the first and second terms represent the partial derivatives of the vector field, and the gradient field of the edge map f (x, y) = E i ext (x, y) where i = 1, 2, 3, 4, respectively. The regularization parameter µ controls the contributions from the first and the second terms. The GVF can be iteratively solved by treating v and w as time-variant parameters, using (9) and (10) v t (x, y, t) = µ∆ 2 v(x, y, t) − (v(x, y, t) − e x (x, y)).
(e x (x, y) 2 − e y (x, y) 2 ) The computed g is used as the external potential force in (11).
The parametric curve that solves (11) is referred to as the GVF snake contour. We initialize the parametric curve corresponding to the ith tree crown as a set of uniformly spaced 100 points at a distance of 0.1 m from the respective treetop (i.e., in a circular pattern).
The assumption here is that the region most proximal to the treetop is very likely to be part of the respective tree crown. The thin-plate energy β, membrane energy α, and balloon force δ are estimated in an empirical way to minimize crown segmentation error. Over a finite number of iterations, the vertices of the seed curve (which is has a circular shape in our case) shifts toward the crown boundary based on the membership values of the crown pixels. However, the shifting is constrained at the crown edges as the membership values abruptly fall to a minimum at the ridges. The boundary point updation is stopped when the average shift in vertices over successive iteration is less than 1 unit. The resulting contour captures the 2D crown boundary/span of a tree crown in the VHR multispectral data. The output of the GVF-snake-based region growing on a tree is a polygon vector shape v t that defines the crown boundary of the tree.

Results
The alignment of individual UAV images was performed using the AgiSoft Metashape Professional software and had a mean standard deviation error of 3 m for the camera locations and a mean error of 3.2 pixels for the tie points. The reprojection errors were approximately 0.35 pixels, and the root mean square error (RMSE) values of the GPS position residuals is 1.2 m. In our case, the point cloud P generated (Figure 3a) was of medium quality with a mean density of 96 points/m 2 , and is in turn used to generate the image orthomsaic (Figure 3b). The apex-height threshold t h that maximizes the crown detection accuracy is estimated to be 2 m. For all experiments, treetops are automatically detected by performing local-maxima detection on the CHM (Figure 4c) that is Gaussian smoothened using the 3 × 3 spatial mask. The kernel size N m is estimated to be 3 for the considered plots. We merged treetops detected by the local-maxima algorithm, which are more proximal than d g equal to 1 m to prevent multiple treetops for the same tree.  Both u c and u b are derived from the FCM-MRF classifier with the optimal fuzzification factor m as 2 and the number of clusters C set to 2. Figure 5a,b shows the fractional maps derived without and with the incorporation of the MRF-based spatial-contextual term in FCM, respectively. It can be clearly seen that the u c generated by the FCM-MRF classifier has relatively less intraclass membership variance. The thin-plate energy β, membrane energy α, and balloon force δ for the dataset are estimated as 1.5, 0.2, and 0.8, respectively. Figure 6a shows the marker-controlled watershed segments (coloured areas) obatined using the detected treetops (red dots) as markers, and ridges (white lines) in the CHM, for a sample plot. The fractional image of the crown class and the ridge-intergrated fractional map for the sample plot are show in Figure 6b,c, respectively. Figure 7a,b show the iterative region growing (starting from the seed circle placed around the treetop) performed by GVF snake on a fractional image with and without ridge integration, respectively. It can be clearly seen (Figure 7b) that crown span is overestimated when the ridge integration is not performed on the crown fractional map. This is because (a) the GVF snake can smoothly expand the seed curve as the intracrown class membership values tend to be more homogeneous, and (b) the abrupt fall in membership values at the crown boundary constrain the expansion of the seed curve. In other words, crown delineation error is maximum when considering only the spectral information but ignoring the spatial contextual and structural crown information. Figure 8 shows the crowns detected and delineated by the proposed method for the eight reference plots with various levels of complexities in terms of crown proximity, tree height, and crown span. (a) (b) Figure 7. The crown delineation by the gradient vector field snake (GVF snake) on fractional images with and without ridge integration are shown in (a,b), respectively. A circular seed contour whose centre is placed at the treetop (red dot) is iteratively grown (red lines) using the GVF snake to detect crown boundary (dotted-green line). The reference crown boundary is shown in white-dotted lines. Figure 8. The crown polygons (black lines) were derived using the spectral, spatial-contextual, and structural information-based individual tree crown delineation (S3-ITD) approach for the eight reference plots (Plot 1-Plot 8 (a-h)). The manually delineated reference crown boundaries and treetops are shown using dotted white lines and red dots, respectively.

Assessing Crown Delineation Accuracy
We assessed the performance of the S3-ITD method on the eight circular forest plots and also compared it with the performance of two other state-of-the-art (SoA) methods. The first SoA method considered is the widely used marker-controlled watershed segmentation (WS-ITD), which uses treetop locations in the image as the seeds for crown segmentation [20]. The second SoA method is the bias-field segmentation algorithm (BF-ITD), which performs watershed segmentation on the bias-field image generated by modelling the variance in the local neighbourhood of a pixel [25]. The absolute accuracy of individual crown polygons is obtained by comparing it with the reference crown polygons. The trees proximal to the plot boundaries with incomplete crowns were not considered for the accuracy assessment. The accuracy of delineation is quantified using two indices: (a) The intersection over union (IoU) ∈ (0, 1), which is the ratio of the common area and the total area, between the estimated and the reference crown polygons, respectively. The IoU value of less than 1 occurs when the estimated crown polygon does not exactly overlap the reference crown area, and an IoU value equal to 0 corresponds to no crown overlap. (b) The crown-area difference (CAD) ∈ (−∞, ∞), which is the difference in area between delineated crown polygons and reference polygons. A crown polygon larger than the reference polygon results in positive CAD values, and a polygon smaller than the reference polygon results in negative CAD values. The CAD value provides a quantitative estimate of the underestimation (i.e., positive CAD values) or overestimation (i.e., negative CAD values) in the crown area. The IoU and CAD for an accurately delineated crown are 1 and 0 m 2 , respectively. Table 3 shows the mean IoU and mean CAD for trees in the eight reference plots. It can be seen that the S3-ITD method delineated crowns with high IoU and small CAD for all the eight reference plots, compared with the WS-ITD and the BF-ITD methods. This shows the ability of the S3-ITD method to accurately delineate tree crowns by addressing issues resulting from large crown spectral variance and nonuniform illumination/shadowing. In the case of WS-ITD, the low IoU and high CAD occur due to overestimation of the crown boundaries by the watershed algorithm. This is because the watershed algorithm assigns all the pixels including the noisy and the shadowed ones to one of the nearest crown segments, and the background is selected by performing binary thresholding on the input image. The BF-ITD attempts to address the problem of large crown spectral variance with the help of field-bias maps. However, the algorithm relies only on the spectral crown data to achieve crown delineation, and, hence fails to distinguish crowns of neighbouring trees, where there is no or little difference in crown spectral properties. Although the BF-ITD method performs better than the WS-ITD methods, its limited ability to accurately delineate proximal and or overlapping crowns affects the delineation performance. Figure 9 shows the IoU and CAD distribution for all the trees in the eight reference plots represented as boxplots. Table 3. The mean intersection over union (IoU) and mean crown-area difference (CAD) for trees in the eight reference plots. For an accurately delineated crown, the IoU ∈ (0, 1) and CAD ∈ (−∞, ∞) will be 1 m 2 and 0 m 2 , respectively.

Performance Validation Using DBH Estimates
We validate the performance of crown delineation approaches based on its ability to accurately estimate the stem DBH. The crown parameters derived from the S3-ITD, WS-ITD, and BF-ITD methods are separately used to estimate the stem DBH in an allometric fashion using (1) and comparing with the field-collected reference DBH to qualify the DBH estimation performance. The input parameters of (1) include the crown diameter d i and tree height h i and are taken as the maximum span of crown boundary of polygon shape vector derived from the respective crown delineation method and the field-collected tree height, respectively. The DBH estimation accuracy quantifies the ability of the considered approach to accurately delineate spruce crowns. We also evaluate the DBH estimation accuracy of trees based on the spectral homogeneity within tree crown in order to quantify the effect of intracrown spectral variance in crown delineation. Thus, we divide the 383 trees from the eight plots into three spectral groups with different ranges of pixel homogeneity represented in terms of image entropy. The Group 1 and the Group 3 consist of trees with the lowest (i.e., homogeneous) and the highest (i.e., heterogeneous) intracrown spectral variance, while Group 2 has trees with intermediate intracrown spectral variance. Table 4 shows the mean error (ME), mean absolute error (MAE), and root mean squared error (RMSE) in DBH estimates for the three different groups. In all the investigated cases, increasing heterogeneity in the crown affected the crown delineation accuracy and is reflected as higher RMSE (see Table 4). In general, the lower DBH estimation error associated with the S3-ITD approach proves its ability to accurately delineate tree crowns in very-high-resolution multispectral data. This is evident from the minimum RMSE in DBH for S3-ITD in all three entropy groups compared with the WS-ITD and the BF-ITD methods. The ME provides an estimate of the underestimation or overestimation associated with the individual methods. i.e., a negative ME value implies that the DBH values for most of the trees in a plot were underestimated, and a positive ME corresponds to overestimation. The WS-ITD often results in large inaccurate segments and is evident from its positive ME in DBH estimation, obtained on the plots (Table 3). It is also worth noting that the ME is found to be minimum and the closest to zero for the S3-ITD method compared with the other methods for all the three groups. This infers the ability of the S3-ITD method to accurately estimated stem DBH.

Discussion
The proposed S3-ITD method merges the spectral, spatial-contextual and structural information derived from the photogrammetric multispectral data into to a fuzzy membership map, where trees are delineated using the GVF-snake-based region-growing technique. The MRF-based estimation of membership values based on the context of pixels minimizes the intracrown spectral variance resulting from nonuniform illumination/shadowing in VHR optical data. Such an estimation forces the membership values within a class (i.e., crown or ground) to be homogeneous, and this allows the GVF snake region-growing algorithm to detect tree crowns accurately. In addition, the crown ridges/boundaries (detected using the marker-controlled watershed algorithm) integrated into the fuzzy map restricts the growth of the GVF snake algorithm and prevents associating sections of spectrally similar neighbouring crown sections to the considered crown, while the WS-ITD and the BF-ITD rely on the Gaussian smoothing technique to minimize the undesirable effects of nonuniform illumination/shadowing on crown delineation. However, such smoothing results in loss of edge information in the optical data and causes delineation errors. It is evident from Table 3 that the S3-ITD method delineates crowns with high IoU and small CAD for all the eight reference plots, compared with the WS-ITD and the BF-ITD methods. Despite the improved performance of the S3-ITD method over the WS-ITD and the BF-ITD methods, the crown delineation performance of the S3-ITD method is found to vary across the eight plots. This variance in performance is attributed to the difference in forest complexity within these plots, largely in terms of crown proximity and tree density. In particular, plots with a lower number of trees (Plots 1, 5, and 7) are shown to have higher delineation errors compared with denser plots. This is because regions along the boundary of individual tree crowns, which are defined by the ridge lines, becomes small when (a) the trees' crowns are further apart from one another and (b) there is a fewer number of neighbouring trees. It is also worth noting that ridge lines act as good attractors of the GVF snake contour (as they produce prominent edges in the edge map), and thereby reduce both over and underestimation of crown span/boundary. For example, there are more ridge lines in Plots 2, 3, 4, 6, and 8 (due to the presence of relatively more trees and high crown proximity) and they are associated with higher performance in crown delineation. The proposed S3-ITD approach resulted in the highest average IoU of 0.83, while the average IoU for the WS-ITD and the BF-ITD are only 0.73 and 0.77, respectively. In addition, the lowest average CAD of 0.13 m 2 proves the ability of the S3-ITD approach to minimize both the under and overestimation of crowns compared with the WS-ITD and the BF-ITD approaches. In addition, it is a fact that the camera location and reprojection errors induce geometric distortions in tree crowns, and consequently result in tree parameter estimation errors. In our case, the average variance camera location and reprojection errors are as low as 3 m and 0.35 pixels, respectively. This means that that the average distortion in crown shape is also as low as 1.4 cm (for a GSD of 4 cm) and, hence, results in minimal or no impact on the estimated crown parameters.
We also evaluated the potential of the S3-ITD method to estimate stem DBH. The lowest mean error (ME), mean absolute error (MAE), and root mean squared error (RMSE) accuracy of estimated DBH in Table 4 proves the potential of the S3-ITD method to accurately estimate stem DBH over the state-of-the-art counterparts. The S3-ITD method achieves accurate DBH estimation by minimizing the errors associated with the crown-span and crown-height measurements. In forestry, low errors in the estimation of biophysical parameters such as DBH is very advantageous for effective monitoring and analysis of large forests. Most state-of-the-art ITD algorithms fail to maximally exploit spatial and spectral information, resulting in parameter estimation errors. In this context, the S3-ITD method has great potential to be used as a fully automatic parameter estimation approach in operational forestry.
Nevertheless, it is also important to note that the proposed approach performs well only with conifers where the crown shape is approximately symmetrical and, hence, it is open to improvements. Thus, prospective future research directions include enabling the method to be used for deciduous tree-crown delineation as well maximally exploiting the textural and spatial information in tree point clouds.

Conclusions
An approach for spruce crown delineation was developed by exploiting spectral, spatial-contextual, and structural information in very-high-resolution (VHR) photogrammetric multispectral remote sensing data. We refer to this approach as the S3-ITD method. Fractional images of the crown class are derived using the Markov random field based fuzzy C-means classifier (FCM-MRF) to minimize the effect of intracrown spectral variance and the crown illumination/shadowing. Region growing performed using the gradient Vector field snake (GVF snake) algorithm on the ridge-integrated fractional map enabled precise crown delineation by allowing the joint exploitation of the spectral, spatial-contextual, and structural information in the photogrammetric VHR multispectral data. The improved average intersection over union (IoU) of 0.1 and 0.05 and the lower average absolute crown-area difference (CAD) of 0.8 m 2 and 0.2 m 2 , respectively, compared with the markercontrolled watershed segmentation (WS-ITD) and the bias-field segmentation algorithm (BF-ITD) methods on the eight white-spruce forest plots proves the ability of the S3-ITD approach to accurately delineate individual crowns in the VHR multispectral data. The S3-ITD method reduces the overall RMSE in diameter at breast height (DBH) estimates by 2.95 cm and 1.5 cm over the WS-ITD and the BF-ITD methods. The DBH estimation accuracy decreased with an increase in intracrown spectral variance and shadowing effects. However, the S3-ITD method minimizes the inaccuracies in crown delineation compared with its WS-ITD and BF-ITD counterparts.