Cross-Comparison of Individual Tree Detection Methods Using Low and High Pulse Density Airborne Laser Scanning Data

: Numerous individual tree detection (ITD) methods have been developed for use with airborne laser scanning (ALS) data to provide tree-scale forest inventories across large spatial extents. Despite the growing number of methods, relatively few have been comparatively assessed using a single benchmark forest inventory validation dataset, limiting their operational application. In this study, we assessed seven ITD methods, representing three common approaches (point-cloud-based, raster-based, hybrid), across coniferous forest stands with diverse structure and composition to understand how ITD and height measurement accuracy vary with method, input parameters and data, and stand density. There was little variability in accuracy between the ITD methods where the average F-score and standard deviation ( ± SD) were 0.47 ± 0.03 using a lower pulse density ALS dataset with an average of 8 pulses per square meter (ppm 2 ) and 0.50 ± 0.02 using a higher pulse density ALS dataset with an average of 22 ppm 2 . Using higher ALS pulse density data produced higher ITD accuracies (F-score increase of 10–13%) in some of the methods versus more modest gains in other methods (F-score increase of 1–3%). Omission errors were strongly related with stand density and largely consisted of suppressed trees underneath the dominant canopy. Simple canopy height model (CHM)-based methods that utilized ﬁxed-size local maximum ﬁlters had the lowest omission errors for trees across all canopy positions. ITD accuracy had large intra-method variation depending on input parameters; however, the highest accuracies were obtained when parameters such as search window size and spacing thresholds were equal to or less than the average crown diameter of trees in the study area. All ITD methods produced height measurements for the detected trees that had low RMSE (<1.1 m) and bias (<0.5 m). Overall, the results from this study may help guide end-users with ITD method application and highlight future ITD method improvements.


Introduction
Accurate forest inventories are necessary for sustainable forest management and supply chain planning. For more than a century, manually collected subsampling of tree conditions across a landscape has driven forest inventory. These conventional methods are time-and accuracy-limited [1] and are often cost-prohibitive to execute at large scales [2]. Numerous methods utilizing airborne scanning light detection and ranging (LiDAR), commonly referred to as airborne laser scanning (ALS), have been developed to predict stand-level forest inventories and imputations of population-level summaries, and detect individual tree locations at various spatial extents [3][4][5][6]. The segmentation of single-trees from ALS is often referred to as individual tree detection (ITD) and multiple methods have been developed and broadly considered as potential tools for providing near-census forest inventories [7][8][9], improving growth and yield modeling [10][11][12], monitoring treescale growth and mortality [13], and optimizing harvest planning and operations [14]. A limited number of studies have assessed different ITD method approaches using a single benchmark dataset [15][16][17][18]. The study by [17] is notable as it evaluated eight ITD methods using high-density (5-121 ppm 2 ) ALS data in the Alps. However, cross-comparison of ITD methods to benchmark datasets associated with different forest structures remains relevant, especially given the rapid advances in ALS technology, computational power, and machine learning.
Individual tree detection methods using ALS data can generally be grouped into three major types based on what input data they use: point-cloud-based, raster-based, and hybrid approaches. Point-cloud-based approaches identify individual trees using a variety of methods that take advantage of tree crown shape and spacing (e.g., [19]) and vertical and horizontal return density changes in the ALS point cloud (e.g., [20]). Raster-based ITD methods rely on gridded data that provide summary information extracted from ALS point clouds, such as minimum and maximum height of all returns within a grid cell. Raster-based ITD methods typically use canopy height models (CHMs), which represent the normalized height of non-ground objects in each grid cell. Many ITD methods that use CHMs focus on techniques to find local maxima in the CHM, which are assumed to be treetops [21][22][23][24]. Hybrid ITD approaches utilize gridded point cloud data (e.g., CHMs) and information extracted from the original point cloud to identify individual trees. In some approaches, CHMs are used to initially identify trees/tree clusters and the point cloud is used to further separate those clusters into individuals [9,25].
Numerous ITD approaches have been proposed and discussed throughout the literature in both formal and informal reviews [3,23,[26][27][28][29][30]; however, there are a limited number of method comparison and benchmark studies available and these have been predominantly focused on forests in Europe (e.g., [15][16][17][18]31]). Many of these comparison studies have employed data ≥10 years old and have not assessed data from contemporary ALS sensors. These prior studies found that regardless of ITD method, dominant trees (trees in dominant and codominant canopy-height positions) have higher detection rates than subdominant trees (trees in intermediate and suppressed canopy-height positions), owing to smaller trees being occluded by the dominant forest canopy [32,33]. Some benchmarking studies found that choice of method was the dominant factor of ITD accuracy [16], while others found that stand density and structure were more important [17,31]. Furthermore, ALS pulse density has been found to improve ITD accuracy in some studies [18,34] but not others [16], with discrepancies potentially resulting from the sensor model, acquisition flight parameters, and/or laser-scanning pattern [3,35].
While prior benchmarking has provided valuable comparison information on ITD approaches, uncertainties remain for forests within North America (e.g., [4]), partly as a result of advances in ALS technology and processing tools. In particular, there is an outstanding need for testing and comparing ITD methods in forests with high tree species diversity where crown shapes and structure vary significantly, like many forests found in North America. Additionally, the effects of input-parameter selection and methodology on the resulting accuracy are not well-understood. As different ITD methods become incorporated into freely available processing tools and software (e.g., LidR [36], rLiDAR [24], FUSION [37]), it is increasingly important to assess how the accuracy of these methods compare with commercial products that may not provide the user with detailed descriptions of the underlying methodologies (e.g., grey-or black-box methods) to help guide end-user decisions and application.
The need for accurate, timely, and cost-effective forest inventory information has continued to increase globally as end-users decide actions to meet objectives on forested landscapes, and developers identify improvements to ITD methods. Therefore, the objective of this study was to understand how ITD accuracy varies by method, input parameters, and pulse density of the input ALS data across coniferous forest stands with diverse stand structure and composition within the inter-mountain West, USA. Seven ITD methods, representing three common approaches (point-cloud-based, raster-based, and hybrid), were assessed for ITD and height measurement accuracy via comparison with a single benchmark forest inventory dataset in north-central Idaho, USA. All methods were tested using a lower pulse density ALS dataset with an average of 8 pulses per square meter (ppm 2 ) and a higher pulse density ALS dataset with an average of 22 ppm 2 , and accuracies were reported using a suite of standard metrics.

Study Area
This study was conducted in the University of Idaho Experimental Forest (UIEF), which is located on the Palouse Range in north-central Idaho, USA ( Figure 1). The UIEF is a mixed-conifer temperate forest and dominant tree species include Pseudotsuga menziesii (Mirb.) Franco var. glauca (Beissn.) Franco (Douglas fir), Abies grandis (Douglas ex D. Don) Lindl. (grand fir), Thuja plicata Donn ex D. Don (western redcedar), and Larix occidentalis Nutt. (western larch). Other species within the study area include Pinus ponderosa C. Lawson var. scopulorum Engelm. (ponderosa pine), Pinus contorta Douglas ex Louden (lodgepole pine), Pinus monticola var. minima Lemmon (western white pine), and Picea engelmannii var. glabra Goodman (Engelmann spruce). The species within the study area have high crown shape and structure diversity ranging from narrow, pyramid-like crowns (e.g., Pseudotsuga menziesii, Larix occidentalis), to rounded cylindrical crowns (e.g., Pinus ponderosa, Pinus contorta), to irregularly shaped crowns (e.g., Thuja plicata). The UIEF has a diverse range of stand structures and composition ( Figure 1) resulting from its multi-use management for timber, research, and education purposes. Prior studies have used this location to evaluate ITD and stand-level forest structural attributes due to the variability of stand structure and species (e.g., [4,23,38,39] hybrid), were assessed for ITD and height measurement accuracy via comparison with a single benchmark forest inventory dataset in north-central Idaho, USA. All methods were tested using a lower pulse density ALS dataset with an average of 8 pulses per square meter (ppm 2 ) and a higher pulse density ALS dataset with an average of 22 ppm 2 , and accuracies were reported using a suite of standard metrics.

Study Area
This study was conducted in the University of Idaho Experimental Forest (UIEF), which is located on the Palouse Range in north-central Idaho, USA ( Figure 1). The UIEF is a mixed-conifer temperate forest and dominant tree species include Pseudotsuga menziesii (Mirb.) Franco var. glauca (Beissn.) Franco (Douglas fir), Abies grandis (Douglas ex D. Don) Lindl. (grand fir), Thuja plicata Donn ex D. Don (western redcedar), and Larix occidentalis Nutt. (western larch). Other species within the study area include Pinus ponderosa C. Lawson var. scopulorum Engelm. (ponderosa pine), Pinus contorta Douglas ex Louden (lodgepole pine), Pinus monticola var. minima Lemmon (western white pine), and Picea engelmannii var. glabra Goodman (Engelmann spruce). The species within the study area have high crown shape and structure diversity ranging from narrow, pyramid-like crowns (e.g., Pseudotsuga menziesii, Larix occidentalis), to rounded cylindrical crowns (e.g., Pinus ponderosa, Pinus contorta), to irregularly shaped crowns (e.g., Thuja plicata). The UIEF has a diverse range of stand structures and composition ( Figure 1) resulting from its multiuse management for timber, research, and education purposes. Prior studies have used this location to evaluate ITD and stand-level forest structural attributes due to the variability of stand structure and species (e.g., [4,23,38,39]). July 2020, is used as background to highlight the variability of forest stands within the study area. Pink-colored areas represent recently harvested areas and newly planted stands. Lightgreen areas represent younger, more homogenous stands. Dark-green speckled areas represent older, more heterogeneous stands.

ALS Data and Preprocessing
Two ALS datasets were used in this study. The first ALS dataset was acquired across the study area in the summer of 2018. These data were collected with a Teledyne Geospatial Optec Galaxy PRIME sensor (Teledyne Geospatial, Toronto, ON, Canada) mounted on a fixed-wing aircraft and collected under the guidance of the Teledyne Geospatial Swath-TRAK™ software to achieve a 50% flight-line overlap with respect to the 60-degree sensor field-of-view. SwathTRAK™ software uses plane pitch, roll, and yaw for optimization of flight line efficiencies by varying the field-of-view to account for areas with highly variable terrain. Elevation of the aircraft varied between 3550 and 4200 m above ground level and flight lines alternated orientations while maintaining an average pulse density of 8 pulses per square meter. The average per-pulse return rate over forested areas was two. These ALS data were processed using GeoCue LP360 (GeoCue, Huntsville, AL, USA) and Terrasolid TerraScan software (Terrasolid, Helsinki, Finland) and returns were classified as bare earth and vegetation following the 2019 United States Geological Survey, 3D elevation program (3DEP) specification [40] before being delivered as 1000 m 2 tiles.
The second ALS dataset was acquired across the study area in the summer of 2021, using a RIEGL VQ-1560II sensor (RIEGL, Horn, Austria) mounted on a fixed-wing aircraft fitted with a gyro-stabilized mount. Elevation of the aircraft was maintained between 1700 and 1900 m above ground level and flight lines alternated orientations while maintaining a 55% flight-line overlap with respect to a 58-degree sensor field-of-view. The average pulse density was 22 pulses per square meter and the average per-pulse return rate over forested areas was four. These ALS data were preprocessed by the supplier to normalize laser intensity using the RIEGL RiPROCESS software, and returns were classified as bare earth, vegetation, water, buildings, and noise before being delivered as 500 m 2 tiles.

Field Validation Dataset
Sixty-seven 13 m radius forest inventory plots were established in the summer of 2020 following a stratified sampling design ( Figure 1). The plot sampling strategy was designed to capture the variation in structure and composition across the study area and was guided by structural metrics derived from the ALS point clouds. All plots were precisely located using a Triumph-2 global navigation system (JAVAD, San Jose, CA, USA) and Samsung Galaxy tablet with a Rover pole. All trees > 1.5 m in height were stem-mapped using azimuth and distance from plot center using a Vertex Laser Geo 360 (Haglof, Sweden). In addition to location, each tree was measured for height, DBH, species, live/dead status, and canopy position (i.e., dominant, codominant, intermediate, and suppressed). Canopy position definitions were adapted from [41], where dominant trees receive light from 3-4 directions, codominant trees receive light from 1-2 directions, intermediate trees receive light mainly from the top, and suppressed trees are entirely shaded and underneath the stand canopy. Tree measurement summary statistics are presented in Table 1.

Individual Tree Detection Methods
Seven ITD methods, representing three common approaches (point-cloud-based, raster-based, hybrid), were assessed for individual tree detection and height measurement accuracy. Each method can produce widely varying results depending on the input data and parameters. To assess which parameter combinations were the most accurate, we tested each method using both ALS datasets and multiple parameter inputs. The following sections describe the input data and parameters used for each method approach and Table 2 provides a summary of the method approaches. For a detailed description of each method, the reader should consult each method's associated citation.

Point-Cloud-Based Approach
We tested a point-cloud-based approach developed by Li et al. [19]. This approach segments the ALS point cloud in a top-to-bottom manner, starting with the highest return to the lowest. Returns are segmented into individual trees by including and excluding returns based on user-defined horizontal spacing thresholds, suggested to be approximately equal to the average crown radius. The algorithm proceeds 'down' the tree from the highest return by including points that fall within the user-defined horizontal spacing thresholds and excluding those that fall outside. Once all points within the spacing threshold have been classified, the segmented points are removed, and the method moves to the next highest return in the point cloud. This method was assessed using both ALS datasets and adaptive spacing thresholds of 0.

Canopy-Height-Model-Based Approaches
Two of the assessed CHM-based methods employ local max filters (LMF) with fixed radius moving search windows to locate individual trees. The 'lmf' function in the 'lidR' package [36] and the 'FindTreesCHM' function in the 'rLiDAR' package [42] were tested in this paper. Both methods use a moving window of user-defined size to identify peaks in the CHM, which are assumed to be treetops. The rLiDAR LMF uses a square search window sized in pixels, whereas lidR LMF can use either a circular or square search window sized in map units. The lidR LMF method was assessed using circular and square search windows of 1.5 m, 2.5 m, and 3.5 m. The rLiDAR LMF method was assessed using square search windows of 1.5 m, 2.5 m, and 3.5 m. These search window sizes are centered on the average crown diameter of dominant and codominant trees on the Palouse Range [20].
A CHM-based method that uses a local max filter with a variable radius search window filter (VWF), developed by Popescu and Wynne [22], was also tested. This method was implemented using the 'ForestTools' R package in R [43]. This method passes a series of local max filters over the CHM, where the size of each local max filter is based upon the allometric relationship between tree height and tree crown diameter. We tested two allometric equations, the default in the 'ForestTools' package (crown diameter = (0.06 × height + 0.5) × 2) and an allometric relationship developed for the Palouse Range (crown diameter = 0.14 × height + 2.56) [4].
Spatial wavelet analysis (SWA), a method that has been applied in prior studies to identify individual trees using ALS-derived CHMs [4,23], was also tested. SWA has been widely applied to aerial photography and other imagery to assess crown widths across a range of scales from individual shrubs to large trees; the method is described in detail in the literature [44][45][46][47][48]. SWA passes a series of sequentially larger two-dimensional Mexican hat wavelet functions over the input CHM. Mexican hat wavelets are used as they are similar in shape to a conifer crown in a CHM. The maximum wavelet coefficient that is associated with a given step-size is then assumed to equal the crown width. After all wavelet sizes have been passed over the CHM, location, diameter, and height can be extracted from the CHM using the retained wavelets. We tested wavelet sizes from 1 to 8 m in 0.1 m increments.
We also tested a watershed segmentation method [49] implemented using the 'TreeSeg' program in FUSION [37]. Watershed algorithms invert CHMs so that trees/tree clusters appear as basins. Conceptually, these basins are 'filled' with water and edges of each basin are established where water overflows into the next basin. Pixels within each basin are assigned the same identification code and are assumed to be an individual tree or tree cluster. Location and height for each tree/tree cluster can then be extracted from each uniquely identified basin.
All CHM-based methods were assessed using 10 cm, 25 cm, and 50 cm resolution CHMs derived from the 22 ppm 2 ALS dataset and 25 cm and 50 cm CHMs derived from the 8 ppm 2 ALS dataset. The 8 ppm 2 ALS dataset had insufficient pulse density for producing 10 cm resolution CHMs, and thus only 25 cm and 50 cm CHMs were used. All CHMs were generated using pit-free methodology detailed in Khosravipour et al. [50]. While prior studies have found the pit-free CHM methodology to be superior to other methods in coniferous forests (e.g., [24,51]), other methods including Gaussian smoothing may produce better results in certain instances.

Hybrid Approach
A hybrid approach, the ForestView ® ITD algorithm developed by Northwest Management Incorporated (NMI, Moscow, Idaho) was also assessed [8,9]. This approach iterates through several CHM-based methods, similar to watershed segmentation and local max filtering, to detect peaks in the CHM, which are assumed to be treetops. To refine individual tree detection and derive tree attribute information, ForestView ® calculates structure-related metrics (e.g., height percentiles, stratified return densities, crown shape) from the ALS point cloud for each tree. Tree attributes for each detected tree are modeled using the ALS-derived structural metrics, LiDAR intensity values, and an internal database of field and ALS-measured stem mapped trees, each with species, height, DBH, crown condition, and taper information. Along with location and height, ForestView ® also provides individual tree DBH, volume, species, and live/dead status.

Matching ALS Detected and Reference Trees
An automated approach was used to match ITD method detected trees and reference trees in the field validation dataset. For each ALS detected tree, a candidate reference tree list was generated by selecting reference trees within 2.5 m of the detected tree, a distance that represents the average crown diameter of dominant and codominant trees on the Palouse Range [4]. The Euclidean distance and difference in height between each reference tree in the list and the ALS detected tree were calculated. Reference trees in the candidate list were removed from consideration if their height difference with the ALS detected tree was greater than 2 m. This height difference threshold corresponds to the observed RMSE in height between ALS estimated and field measured conifers [52]. The ALS detected tree was recorded as a commission error if no reference trees met the height difference criteria. The reference tree with the smallest combined error (Euclidean distance and height difference) was matched with the ALS detected tree. Given that each of the distance and height can be considered independent errors, the combined error was calculated as the square root of the sum of the squared errors: i.e., ∆k = √ [(∆k1) 2 + (∆k2) 2 ] following standard approaches [53]. In cases where multiple reference trees had the same combined error, one was randomly selected to be matched with the detected tree. Reference trees that were not matched with ALS detected trees were recorded as omission errors.

Accuracy Assessment
Individual tree detection accuracy for each of the methods was assessed using standard metrics that quantify commission and omission errors. Specifically, recall (r), precision (p), and F-score [19] were used: where TP = correct detection, FN = false negative (omission error, or trees missed by the ITD method), and FP = false positive (commission error, or trees identified by the ITD method but which do not exist). Recall is inversely related to omission error, precision is inversely related to commission error, and F-score is the harmonic mean of recall and precision. All three metrics range from 0 to 1, with values closer to 1 representing higher accuracy.
To quantify the effects of forest structure on method accuracy, the relationships between accuracy metrics and common stand density metrics were evaluated using linear regression. Standard metrics and coefficients including regression slope, coefficient of determination (r 2 ), and regression significance (p-value) were reported. Stand density index [54] and other metrics used in this analysis are presented in Table 3. Regression-based equivalence tests [55] were used to assess the accuracy of tree height derived from each ITD method. These tests evaluate regression slope and intercept equality between paired sets of data. Intercept equality implies that the mean of one dataset is not significantly different than the mean of another dataset. Slope equality implies that the regression slope is not significantly different than 1. Following similar studies [4,8,9,55], the region of equality was set to ±25% of the mean for the intercept and ±25% for the slope. The null hypothesis of dissimilarity is rejected if the region of equivalence contains two joint one-sided 95% confidence intervals for the intercept or slope. Additional accuracy measures, including average root mean square error (RMSE) and mean bias were calculated between ALS-derived and reference tree height measurements as follows: where (x) are the predicted values, x i are the observed values, and n is the number of observations. All statistical analyses were conducted in R [56], and the 'equivalence' R package [57] was used to conduct the regression-based equivalence tests.

Individual Tree Detection Accuracy
The individual tree detection results are presented in Table 4. Across all method approaches, recall had an average (±SD) value of 0.38 ± 0.07 using the 8 ppm 2 ALS data and an average of 0.41 ± 0.1 using the 22 ppm 2 ALS data. Precision was higher with an average value of 0.66 ± 0.1 using the 8 ppm 2 ALS data and an average of 0.65 ± 0.1 using the 22 ppm 2 ALS data. Across all method approaches, ITD accuracy in terms of F-score increased 2.6% on average using the higher pulse density ALS data versus the lower pulse density ALS data. ForestView ® , SWA, and watershed methods benefited the most from higher pulse density ALS data, with F-scores increasing over 10% on average. Methods including [19], lidR LMF, rLiDAR LMF, and VWF benefitted the least from higher pulse density ALS data, with F-scores increasing 1.2% on average.
The highest accuracy method approaches, in terms of F-score, are highlighted in Table 4. Variation in accuracy among the top seven method approaches was low (8 ppm 2 ALS data F-score SD = 0.03 and 22 ppm 2 ALS data F-score SD = 0.02). F-score ranged from 0.42 to 0.51 for the lower pulse density ALS data and ranged from 0.47 to 0.52 for the higher pulse density data. The CHM-based fixed search window methods (lidR LMF and rLiDAR LMF) were two of the highest accuracy methods for both the low and high pulse density ALS datasets. These methods used search windows smaller than the average crown diameter for the study area (i.e., <2.5 m). The most accurate CHM-based method approaches tested using the 8 ppm 2 ALS data used 25 cm CHMs whereas the most accurate approaches tested using the 22 ppm 2 ALS data used 50 cm CHMs. The observed differences in accuracy between the different CHM resolutions were small (F-score difference of 0.01-0.02).
Omission errors, or missed trees, for all methods, regardless of ALS data, were predominately associated with intermediate and suppressed trees. The distributions of correct detections and omissions or false negatives (FN) for the seven most accurate ITD method approaches highlighted in Table 4 and derived from the 8 ppm 2 and 22 ppm 2 ALS datasets are presented in Figures 2 and 3. Omission errors, or missed trees, for all methods, regardless of ALS data, were predominately associated with intermediate and suppressed trees. The distributions of correct detections and omissions or false negatives (FN) for the seven most accurate ITD method approaches highlighted in Table 4 and derived from the 8 ppm 2 and 22 ppm 2 ALS datasets are presented in Figures 2 and 3.

Effects of Stand Density on ITD Method Accuracy
The relationships between ITD accuracy metrics, averaged across the top seven methods for each ALS dataset, and stand density metrics are presented in Table 5. Recall for all ITD methods decreased with increasing stand density, implying higher omission errors

Effects of Stand Density on ITD Method Accuracy
The relationships between ITD accuracy metrics, averaged across the top seven methods for each ALS dataset, and stand density metrics are presented in Table 5. Recall for all ITD methods decreased with increasing stand density, implying higher omission errors in denser stands. Average recall for the top seven methods had negative relationships with all stand density metrics (Table 5). Recall for the top seven method approaches stratified by canopy cover is presented in Figure 4. Precision was poorly related (r 2 < 0.07) with all of the stand density metrics (Table 5).   Table 4 and derived the (a) 8 ppm 2 ALS dataset and (b) 22 ppm 2 ALS dataset and stratified by ALS-derived canopy c Red crosses represent median recall values across all plots.

ITD Method Height Measurement Accuracy
The average height measurement RMSE for the top ITD method approaches der from the 8 ppm 2 ALS dataset was 1.01 ± 0.05 m and average bias was −0.35 ± 0.09 m. H bias was positively correlated with both basal area per hectare (r 2 = 0.20, p < 0.01) and s density index (r 2 = 0.16, p < 0.01) ( Table 5). The average height measurement RMSE fo  Table 4 and derived from the (a) 8 ppm 2 ALS dataset and (b) 22 ppm 2 ALS dataset and stratified by ALS-derived canopy cover. Red crosses represent median recall values across all plots.

ITD Method Height Measurement Accuracy
The average height measurement RMSE for the top ITD method approaches derived from the 8 ppm 2 ALS dataset was 1.01 ± 0.05 m and average bias was −0.35 ± 0.09 m. Height bias was positively correlated with both basal area per hectare (r 2 = 0.20, p < 0.01) and stand density index (r 2 = 0.16, p < 0.01) ( Table 5). The average height measurement RMSE for the top method approaches derived from the 22 ppm 2 ALS dataset was 0.80 ± 0.03 m with an average bias of 0.12 ±0.17 m. Height RMSE was positively related to basal area per hectare (r 2 = 0.23, p < 0.01) and stand density index (r 2 = 0.20, p < 0.01) ( Table 5). Likewise, height bias had positive relationships with both basal area per hectare (r 2 = 0.20, p < 0.01) and stand density index (r 2 = 0.17, p< 0.01). Figures 5 and 6 present results from the regression-based equivalence tests. For all paired datasets (top ITD methods using 8 ppm 2 ALS data vs. reference and top methods using ALS 22 ppm 2 data vs. reference), the 95% confidence interval for the intercept (vertical grey bar) was within the ±25% region of equivalence (grey polygon), indicating that the null hypothesis of dissimilarity can be rejected (i.e., paired datasets are equivalent). Likewise, for all paired datasets, the 95% confidence interval for the slope (vertical black bar) was within the ±25% region of equivalence (grey dashed lines), indicating slopes were significantly similar to 1.

R PEER REVIEW
13 of 19 Figure 5. Equivalence test graphs for reference tree height versus tree height from top method approaches derived from the 8 ppm 2 ALS dataset highlighted in Table 4. The grey polygon represents the ±25% region of equivalence for the intercept. The ALS-derived and cruised heights are equivalent to the felled height when the vertical red bar is completely within the grey polygon. The grey dashed lines represent the ±25% region of equivalence for the slope. If the vertical black bar is within the grey dashed lines, then the regression slope is significantly similar to 1. The solid black line represents the best-fit linear regression model. The method shown in (b) refers to [19]. Figure 5. Equivalence test graphs for reference tree height versus tree height from top method approaches derived from the 8 ppm 2 ALS dataset highlighted in Table 4. The grey polygon represents the ±25% region of equivalence for the intercept. The ALS-derived and cruised heights are equivalent to the felled height when the vertical red bar is completely within the grey polygon. The grey dashed lines represent the ±25% region of equivalence for the slope. If the vertical black bar is within the grey dashed lines, then the regression slope is significantly similar to 1. The solid black line represents the best-fit linear regression model. The method shown in (b) refers to [19].
proaches derived from the 8 ppm 2 ALS dataset highlighted in Table 4. The grey polygon represents the ±25% region of equivalence for the intercept. The ALS-derived and cruised heights are equivalent to the felled height when the vertical red bar is completely within the grey polygon. The grey dashed lines represent the ±25% region of equivalence for the slope. If the vertical black bar is within the grey dashed lines, then the regression slope is significantly similar to 1. The solid black line represents the best-fit linear regression model. The method shown in (b) refers to [19]. Figure 6. Equivalence test graphs for reference tree height versus tree height from top method approaches derived from the 22 ppm 2 ALS dataset highlighted in Table 4. The grey polygon represents the ±25% region of equivalence for the intercept. The ALS-derived and cruised heights are equivalent to the felled height when the vertical red bar is completely within the grey polygon. The grey dashed lines represent the ±25% region of equivalence for the slope. If the vertical black bar is within the grey dashed lines, then the regression slope is significantly similar to 1. The solid black line represents the best-fit linear regression model. The method shown in (b) refers to [19].

Discussion
This study sought to cross-compare seven different ITD approaches with reference to a common validation dataset and as we expand on below, ITD accuracy metrics that account for both omission and commission errors (e.g., F-score) did not vary considerably by method. Notably, despite the diversity of the applied methods and parameters, the results were broadly similar with comparable accuracies obtained. While F-score was Figure 6. Equivalence test graphs for reference tree height versus tree height from top method approaches derived from the 22 ppm 2 ALS dataset highlighted in Table 4. The grey polygon represents the ±25% region of equivalence for the intercept. The ALS-derived and cruised heights are equivalent to the felled height when the vertical red bar is completely within the grey polygon. The grey dashed lines represent the ±25% region of equivalence for the slope. If the vertical black bar is within the grey dashed lines, then the regression slope is significantly similar to 1. The solid black line represents the best-fit linear regression model. The method shown in (b) refers to [19].

Discussion
This study sought to cross-compare seven different ITD approaches with reference to a common validation dataset and as we expand on below, ITD accuracy metrics that account for both omission and commission errors (e.g., F-score) did not vary considerably by method. Notably, despite the diversity of the applied methods and parameters, the results were broadly similar with comparable accuracies obtained. While F-score was similar across the methods, detection rates, or the proportion of reference trees that were detected, did vary by canopy position across methods. All methods had similar detection rates of dominant trees but ForestView ® , lidR LMF, rLiDAR LMF, and watershed methods had greater detection rates for codominant and subdominant trees (intermediate and suppressed). For applications where dominant trees are the focus, it could be argued that the selection of a specific ITD method is somewhat arbitrary and may rely more on user preference. This is important, as it means that more research can focus on exploring the applications regardless of the specific selected approach and cross-comparisons of application studies that use different ITD methods may be more readily achievable than previously considered. Such applications can include, but are not limited to: quantifying wildlife habitats [58][59][60]; assessing crown fire risk [61,62]; reducing uncertainties associated with estimating carbon stocks due to forest crown consumption through exploring the relative number of tree crowns that form standing dead biomass (snags) versus the formation of fallen large woody debris following fires [63][64][65]; developing new allometric relations using structural datasets and reducing uncertainties in the stand to landscape scale assessments in carbon monitoring, forest growth, and yield [10,66,67]; assessing structural metrics of fire severity [68][69][70][71]; assessing metrics of ecosystem vulnerability [72]; and mapping urban forests [73].
Overall, F-score accuracies varied little between the different methods, ranging from 0.42 to 0.51 for the 8 ppm 2 ALS data and 0.47 to 0.52 for the 22 ppm 2 ALS data with regard to the top seven best performing methods. This similarity is likely influenced by the consistent 'top-down' approach of the methods, where dominant overstory trees are identified but many subdominant trees are occluded by overstory canopies. ITD accuracy increased with increasing pulse density across all methods, regardless of method type (point-cloud-based, raster-based, hybrid). This is in contrast with several prior studies where increasing pulse density increased ITD accuracy for mainly point-cloud-based methods and not rasterbased methods [18] and those that did not observe a significant effect of increasing pulse density on ITD accuracy [16]. While ITD accuracy increased with increasing pulse density, results show that some methods (ForestView ® , SWA, watershed) benefited more than other methods ( [19], lidR LMF, rLiDAR LMF, VWF).
In part due to the top-down design of the assessed methods, most undetected trees (omission errors) were in intermediate and suppressed canopy positions (Figures 2 and 3). All seven top ITD methods detected most dominant trees (66% on average for 8 ppm 2 ALS dataset and 71% on average for the 22 ppm 2 ALS dataset). Only ForestView ® , lidR LMF, rLiDAR LMF, and watershed detected a majority of codominant trees using both ALS datasets. Other benchmarking studies using similar ITD methods observed higher detection for dominant (85%) and codominant trees (70%) [18], although stand density was lower (median: 353 trees hectare −1 ) than the stands assessed in this study (median: 509 trees hectare −1 ). Additionally, the stand composition, crown shape variation, and structural heterogeneity in the study area were likely greater than many prior benchmark studies given the higher number of species and diverse management history. Across all methods and for both ALS datasets, detection of intermediate trees was less than 31% and detection of suppressed trees was less than 13%. This is the direct result of trees in subdominant canopy positions being occluded by the dominant forest canopy [32,33]. Detection of dominant and codominant trees using the 22 ppm 2 ALS dataset was 4% higher on average, compared to methods using the 8 ppm 2 data. Higher detection rates using ALS datasets with higher pulse densities have been observed in other North American and European forests (e.g., [18,74]), but in some cases higher pulse density has not improved ITD [16]. CHM-based top ITD methods that used a fixed search window size (lidR LMF and rLiDAR LMF) achieved the highest detection across all canopy classes using the 22 ppm 2 ALS dataset and performed well using the 8 ppm 2 ALS dataset. Other ITD method comparisons have also found that simple local maximum-finding can be one of the most accurate and easy-to-implement methods [16,17]. However, these simple raster-based methods may not perform as well as point-cloud-based methods for subcanopy tree detection [18,74].
There was large intra-method accuracy variability depending on the input datasets and parameters. ITD methods that used spacing thresholds (i.e., [19]) or LMF search window sizes (i.e., lidR LMF, rLiDAR LMF) smaller than the average crown diameter tended to have higher commission error, while those with larger thresholds/windows tended to have higher omission error. Generally, ITD methods that used spacing thresholds or LMF search window sizes that were equal to or less than the average crown diameter for study area trees (2.5 m) were the most accurate. While most raster-based ITD methods applied to the 8 ppm 2 ALS data produced improved results with a 25 cm CHM over a 50 cm CHM, the difference was minimal (average F-score difference < 1%). Likewise, most CHM-based ITD methods using the 22 ppm 2 ALS data produced more accurate results when applied to a 50 cm CHM than to a 25 cm CHM, but the difference was small (average F-score difference < 4%). These particular accuracy results may be owing to differences in the pulse density created by the differing scanning patterns of the two ALS datasets. The 2018 dataset was acquired in a sawtooth, or 'zig-zag', pattern and likely has higher pulse density at the edges of flight paths where scans overlap and lower pulse density in between flight paths. The 2021 dataset was acquired in an equidistant pulse-spread, or 'matrix', scan that produces more consistent pulse densities across the flight path. We did not apply any point cloud thinning methods to homogenize the pulse density across either of the point clouds, but clearly this is an issue that should be explored in future work as pulse density can have significant effects on individual tree detection accuracy [18,34].
Like other ITD method comparison studies (e.g., [4,31]), stand density was a larger driver of method accuracy than the individual methods themselves. Typically, as stand density increases, space in between tree crowns decreases and more overlapping tree crowns are present, complicating identification of individual trees. Additionally, as the dominant forest canopy increases in density, more subdominant smaller trees are likely to become occluded, except where gaps within the canopy occur [32,33]. This suggests that non-raster-based ITD methods that operate directly on the ALS point cloud may be better suited for denser forest conditions [18,75].
All of the ITD methods produced height measurements for the detected trees that had low RMSE (<1.1 m) and bias (<0.5 m). Height bias was larger for methods using the 8 ppm 2 ALS dataset (mean bias: −0.35 m) than for the 22 ppm 2 dataset (mean bias: 0.12 m). The larger bias observed for the 8 ppm 2 ALS dataset is likely due in part to tree growth between the dates of ALS acquisition (2018) and reference-tree field data collection (2020). Generally, the mean height growth of mature mixed conifer species present within this study area is 0.4 m per year [39]. Regardless, all methods using either of the ALS datasets produced tree heights that were statistically similar to the reference tree heights.
This study primarily focused on assessing methods that use a 'top-down' approach to identify individual trees. The findings of this research outline the need for future benchmarking to include methods that utilize a 'bottom-up' approach (e.g., layer stacking methods in [20]) and methods that create separate CHMs for dominant and subdominant trees (e.g., [25]). However, it should be noted that detection rates of intermediate and suppressed trees (35% and 21%, respectively) reported in Duncanson et al. [25] are not substantially higher than those obtained by the 'top-down' methods in this study (31% of intermediate trees and 13% of suppressed trees).

Conclusions
This study assessed the individual tree detection and height measurement accuracy of different tree detection approaches via comparison with a benchmark dataset with diverse structure and composition. Generally, stand density and canopy cover were larger drivers of ITD accuracy than ITD method choice. However, results show that detection of trees in certain canopy positions can vary by method. All methods detected most trees with dominant canopy positions but ForestView ® , lidR LMF, rLiDAR LMF, and watershed methods had the highest detection proportion of codominant and subdominant trees and CHM-based local-maximum filter methods had the lowest omission errors across all tree canopy positions. This study additionally supports the value of ALS data with pulse densities > 8 ppm 2 for ITD, given higher pulse density data increased average F-scores by 10% for ForestView ® , SWA, and watershed methods and 1.2% for [19], lidR LMF, rLiDAR LMF, and the VWF methods. The results emphasize the importance of input parameter selection when applying ITD methods, as intra-method ITD accuracy variability was large depending on data and parameter selection. For CHM-based methods, ITD accuracy was generally highest using finer resolution CHMs (~25 cm) when lower pulse density ALS data (8 ppm 2 ) were available, which contrasted with CHM-based ITD results when higher pulse density ALS data were available (22 ppm 2 and a 50 cm resolution CHM). Likewise, CHM-based methods that utilized fixed size search windows achieved the highest accuracy with sizes equal to or less than the average tree crown diameter. All methods were comparable in terms of height measurements for the detected trees, but errors increased with increasing stand density. The results presented here should help end-users with the choice and application of these and similar ITD methods on available ALS datasets. Future benchmarking studies for ITD methods should consider more diverse methods (e.g., bottom-up approaches) that detect more subdominant trees and explore the influence of ALS scanning patterns (e.g., oscillating mirror vs. rotating polygonal mirror) on ITD accuracy.