Parameterization of the Individual Tree Detection Method Using Large Dataset from Ground Sample Plots and Airborne Laser Scanning for Stands Inventory in Coniferous Forest

: Highly accurate and extensive datasets are needed for the practical implementation of precision forestry as a method of forest ecosystem management. Proper processing of huge datasets involves the necessity of the appropriate selection of methods for their analysis and optimization. In this paper, we propose a concept for and implementation of a data preprocessing algorithm, and a method for the empirical veriﬁcation of selected individual tree detection (ITD) algorithms, based on Airborne Laser Scanning (ALS) data. In our study, we used ALS data and very extensive dendrometric ﬁeld measurements (including over 21,000 trees on 522 circular sample plots) in the economic and protective coniferous stands of north-eastern Poland. Our algorithm deals well with the overestimation problems of tree top detection. Furthermore, we analyzed segmentation parameters for the two currently dominant ITD methods: Watershed (WS) and Local Maximum Filter with Growing Region (LMF+GR). We optimized them with respect to minimizing the Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). Additionally, our results show the crucial importance of the quality of empirical data for the correct evaluation of the accuracy of ITD algorithms.


Introduction
It is believed that the conceptual scope of the term "precision forestry" refers to both the application of modern techniques and technologies in the performance of specific forestry operations, and to the collection of real-time and truthful information to support decision-making processes [1][2][3][4].
Some authors directly link the concept of "precision forestry" with highly accurate geographical information [5,6]. For instance, the localization of single trees with sub-meter precision can be determined using a GPS (GNSS) instrument [7,8] or other electronic devices for accurate relative localization using a distance measurement to several points with known coordinates (e.g., Haglöf PosTex Measuring Instrument, produced by Haglöf in Langsle, Sweden) [9,10]. As a direct consequence of the possibility of obtaining precise geographic coordinates for individual tree location or forest ground plants-including old, magnificent trees and valuable or rare plant species-it is possible to protect them from negative impacts, for example, by omitting clear-cutting in a specific location. In the literature, some studies can be found that indicate the effect of conducting clear cuts on the decline of vascular plant species [11], including some indicator species of Old Growth Forests (OGFs) [12]. Indirectly, therefore, the "precision forestry" methods studied in this paper can support the conservation of valuable elements of the forest environment including habitats, plant communities and preserved areas of OGF.
The last twenty years have seen a rapid development of techniques for the efficient acquisition of precise remote sensing data. This fact has given rise to a technological revolution in dendrometry (tree biometry) and has introduced significant changes in the understanding of the relationships that shape forests. This development concerns many technologies of data acquisition in different spectral and radiometric ranges, but the most significant progress has taken place in measuring space and the environment with laser scanners [13][14][15][16][17].
Surveys of forest areas using laser scanners involve the use of different scanning platforms. Currently, the technology with tremendous potential for forest inventory is Airborne Laser Scanning (ALS), which offers adequate parameters for data collected (allowing modeling of selected stand parameters, i.e., number of trees, height, crown size, stem volume) from a large geographic extent [18][19][20]. A notably rich tradition of the practical application of ALS technology to acquire precise tree and stand data is attributed to the Nordic countries [21][22][23][24]. Technological developments have meant that scanners can be successfully mounted on unmanned aerial vehicles (UAVs), providing very dense point clouds, but from areas of significantly smaller spatial extent than ALS [25,26]. Studies of forest areas using remote sensing technologies are also performed stationary from the ground level by terrestrial laser scanning (TLS). This approach allows the acquisition of very precise data regarding individual trees (e.g., trunk shape, diameter at breast height (DBH)) in the surveyed sample plots (SPs). However, this technology has not become the leading one due to its low spatial operability compared to aerial scanning and the presence of other interfering factors (e.g., undergrowth limiting visibility). Still, it can provide an additional source of data [27].
The highly accurate and objective data obtained with this technique allow us to record completely new features of trees and stands and to measure the hitherto elusive dynamics of vast populations of individuals (many millions of trees) in a short time, which is different from the statistical-representational methods used so far [28,29]. These developments allow the practical application of the idea of precision forestry to protect forest ecosystems by limiting timber harvesting in susceptible areas [30] or the complete discontinuation of economic activities in small, irregular areas of Natura 2000 priority habitats [31]. Often, these priority habitats can be precisely delineated by only using precise models generated from ALS data. For example, this concerns the exceptionally valuable wet forest habitat, Vaccinio uliginosi-Pinetum (Natura 2000, habitat code: 91D0-2), which occurs in the examined area. It is undoubtedly beneficial to learn accurate local biometric models of trees and stands, balance their use based on real and precise data, and track their response to human activity to modify conservation and management actions quickly. In protected forests (ecosystems), the implementation of the concept of precision forestry allows us to closely follow natural processes and mimic them through the detailed planning and implementation of economic and protective actions [30], which will increase the safety of valuable habitats and species, especially fragments of forests (forest phytocoenoses) of exceptional importance for the sustainability of forests in general, for example, OGFs [12]. The rapid development of remote sensing technologies has significantly increased the availability of spatial data, including those freely available, often acquired with standard parameters [32], without taking into account the complexity of the local forest structure or specific research scope, for example, the detection of lying tree stems [33].
This attitude limits the use of fully automated methods for ALS data analysis, and a necessary step in their processing is the calibration stage based on local data acquired during field inventory. Therefore, it is crucial that analysis methods of ALS data are resistant to the often unknown parameters of point clouds. Furthermore, field inventory data for the calibration of these methods should be acquired very precisely, considering the local variability of ecosystems in the sample [34].
One of the main goals of ALS data analysis according to precision forestry theory is to detect trees, estimate their number in the stand, and measure biometric features as accurately as possible. In the last decades, many methods and analytical approaches dedicated to detecting individual trees have been introduced, referred to as ITD methods (Individual Tree Detection). We can distinguish two main approaches in the ITD methods in the delimitation of individual trees of the upper storey; these are direct point cloud analysis (3D clustering) [35] and the use of point cloud derived products-Canopy Height Models (CHMs) [36,37]. Whereas, the Single Tree Detection procedure, itself using laser scanning data, is most often done in two stages. First, the locations of potential tree tops are detected using local maximum filter methods, where different crown size-tree height relationship functions are applied. Then, the identified tree top locations form the basis for the delimitation of crown contours using the Watershed (WS) algorithm or by using the nearest neighbors method [38].
The main goal of our study was to investigate the effects of the parameters of popular ITD methods on the estimation error of the number of trees in stands with a predominance of coniferous species (Scots pine, Norway spruce) based on a large ALS dataset. In our study, we focused on the following methods: the WS algorithm and Local Maximum Filter + the Growing Region (LMF+GR) algorithm. Many studies have considered relatively limited ground sample datasets (a few tens of SPs at most) [39][40][41]. In this study, we used a large ground sample dataset including measurements of 21,830 trees on 522 SPs to minimize the uncertainty of our results. The large sample dataset also allowed us to divide the study population into subsets and investigate the influence of other factors (plot size, stand type, forest district, proportion of gaps and crown size) on the accuracy of the detection methods used.

Study Area
Poland, located in Central Europe, is covered with 29.6% forest (as of the end of 2019) [42]. Most of the forested areas are governed by State Forests (over 75%). Over 65% of the forest stands in Poland are coniferous.
We chose two administrative units of the State Forests as our research object, which constitute a vast forest complex of the Knyszyńska Primeval Forest with semi-natural character-and two neighboring Forest Districts: Supraśl andŻednia ( Figure 1). Knyszyńska Primeval Forest is located in the lowlands of Northeastern Poland. In this area there are economic and protected forests, nature reserves and Natura 2000 sites [43], which means temporary or permanent exclusion from management and submission to natural processes. It is worth mentioning that a part of the studied stands was damaged by a hurricane in 2016, which significantly disrupted the forest structure, but adds value to the experiment. The choice of the study area is also supported by its coverage by other research projects, involving the acquisition and analysis of ALS data and very extensive ground surveys: the two-phase research project Zajma I and II in theŻednia Forest District-140 SPs [44,45] and part of the national project RemBioFor in the Supraśl Forest District-500 SPs [46].

Field Inventory Data
We parameterized the two chosen ITD methods based on an extensive empirical dataset, including dendrometric measurements on randomly distributed circular ground sample plots. The dendrometric measurements were collected in two independent projects using a homogeneous measurement methodology. Data for the RemBioFor project (Supraśl Forest District) [47] were acquired in a regular grid of circular plots. The Zajma I and Zajma II (Żednia Forest District) experiment data were acquired from sample plots distributed randomly, employing a stratification layer method with selection. Measurements of the centers of each SP were taken with a static-mode geodetic-class Global Navigation Satellite System Trimble SPS 882 (GNSS, mean position error = 0.32 m). The measurements for the RemBioFor project were obtained from September to November of 2015. The measurements for the Zajma I and Zajma II experiments were obtained from January to March of 2018. The features of each sample plot were determined, including sample plot location, tree count in each sample plot (with DBH greater than 70 mm), and so forth. For each tree in a sample plot, dendrometric features were gathered, along with the tree's location (measurement in polar coordinates relative to the center of the sample plot), species of the tree, DBH [mm], tree height [m], crown visibility in terms of remote sensing (1-full, 2-partial, 3-no crown visibility) [46], and additional features (sharing common crown, high trunk deflection, etc.). We parameterized the examined ITD methods in the context of its effective implementation in the detection of coniferous stands that constitute about 68% of all stands in Poland, irrespective of ownership form [48]. For this reason, we selected plots with the majority of coniferous species in the upper story of the stand (trees with full crown visibility). An empirical statistical sample was constructed by selecting 522 out of 640 SPs with different radii (81% of all SPs). In the Supraśl Forest District 410 SPs with a radius of 12.62 m (500 m 2 ) were located, whereas in theŻednia Forest District 112 SPs, with radii ranging from 5.64 to 25 m, were established. A total of 21,630 trees were measured on SPs, of which 11,806 were trees with full crown visibility (55%), 2393 were trees with partially crown visibility (11%), and 7431 were canopy trees with crowns not visible from the aerial level (34%).
A characteristic feature of the analyzed stands is a complex vertical structure characterized by the 3-level crown visibility index adopted in our study. On the analyzed SPs, the upper forest layer of the stand (crown visibility 1) is dominated by Scots pine (Pinus sylvestris L.) and Norway spruce (Picea abies (L.) H.Karst), with a smaller share of oak (Quercus robur L.), birch (Betula pendula Roth), and other species. Spruce is the most abundant in the second, and undergrowth forest layer (crown visibility 2 and 3), accounts for about 31% of all observations. A large share of spruce in the lower forest layers is a characteristic feature of the stands of north-eastern Poland. As presented in Figure 2, despite a considerable share of trees in the lower forest layers, they do not have a significant share in total stem volume. The stem volume of trees with a crown visibility of 2 and 3 accounts for about 14%, and the stem volume of trees with a crown visibility of 3 accounts for only about 7% of the total stem volume. Despite minor differences between sets of circular plots (draw method, radius of circular plot) at the analyzed sites, there were no significant discrepancies in average tree characteristics. The average tree height was 21.52 m in the Supraśl Forest District and 22.12 m in theŻednia Forest District. Larger differences between the forest districts were noted in the case of the share of gaps in CHM, which is a result of the hurricane damage of 2016. The average value of this parameter is 8.76% in the Supraśl Forest District and 14.36% in theŻednia Forest District.

Airborne Laser Scanning Point Clouds
In this study, we used two sets of ALS data acquired under separate orders and at different times. However, most of the parameters of the obtained point clouds are similar. One contractor conducted both flights. The first dataset covers the Supraśl Forest District and comes from 2015, while the second scanning dataset was acquired for the Zajma Subdistrict-part of theŻednia Forest District. Table 1 summarizes the analyzed point clouds.
Both flights were conducted using a Vulcanair P-68 "Observer 2" light aircraft equipped with a Riegl LiteMapper LMSQ-680i laser scanning system. The collected data were processed in RiProcess software. The point cloud was stored in LAS format version 1.4. After XYZ series alignment, the accuracy of height and the XY coordinates are characterized by mean error m h ≤ 0.15 m and m p ≤ 0.20 m, respectively. The next step was point cloud classification using the TerraScan tool of the Terrasolid package. Points were assigned to classes according to the ASPRS standard [49]: 1-unclassified, 2-ground, 3-low vegetation, 4medium vegetation, 5-high vegetation, 6-buildings and engineering structures, 7-noise. The points in the cloud were assigned spectral values of the pixels in the color-infrared (CIR) images. Table 1. Technical parameters of the Airborne Laser Scanning datasets used in this study: airing date, mean scanning density (beams density on the scanning surface), mean point cloud density, mean density of single point returns (density of beams that resulted in a single point return in the dataset), mean flight altitude (both above ground level (AGL) and above mean sea level (MSL)), scanning coverage, number of strips of scanning, length of strips, data coverage area, and scanning angle (field of view (FOV)).

Homogenization of Field Dendrometric Measurements with Remotely Determined Trees
A tree matching procedure for linking the detection results to the reference field inventory data is crucial in modeling some detection parameters correctly, for example, the relationship between crown diameter and tree height. Various automatic procedures have been proposed in this field, but the reference is always a manual interpretation by experienced human interpreters [50]. We performed a manual spatial linking of groundbased relative location measurements of trees (trunk locations at 1.30 m above ground level from field inventory) with the corresponding crown segments of trees automatically detected using the raster-based WS segmentation method. In this way, we used these databases to build parameterization models of the considered ITD methods. We carried out an additional manual correction of the tree crown segments, determined automatically with the WS algorithm running on the Gaussian filter smoothed CHM [51]. We performed the correction of segment outlines on the CHM overlay displayed in grayscale by allowing operations such as changing segment outlines, merging adjacent segments, or splitting segments encompassing more than one tree crown.
Simultaneously, we controlled the correctness of the position of the trunk location from ground measurements on the SP relative to the automatically generated crown segment outline. In most cases, it was necessary to correct the trunk position relative to the segment slightly. It was needed due to several factors, the most important of which were the natural inclination of the trees resulting in the eccentricity of the treetop projection on the ground relative to the actual position of the stem and the limited accuracy of locating the center of the plot using a GNSS receiver in forested areas [52]. We used a ten-point scale to evaluate the correctness of the segment's delineation in reference to the ground dendrometric and location measurements. If we found unambiguous correspondence, then we rated the segment a ten. Depending on the degree of correction needed, we gave the segments lower ratings. We assumed a rating threshold value of seven to build the set of observations for creating parameterizing functions. As a result, we built a database containing a total of 9451 observations for coniferous species (Pine, Spruce). Figure 3 presents an example of SP before and after the correction.

Individual Tree Detection on Sample Plots
We automatically determined the number of trees in each SP based on two analytical solutions commonly available and effectively used in the forestry field. The first of the considered algorithms includes the classical WS method, while the second approach is a hybrid method of searching for treetops on a normalized point cloud using Local Maximum Filter [53] and generating crown contours using interpolated CHM and the growing region method (LMF+GR) [54][55][56]. The main advantage of the adopted methods is the possibility of using the information potential of a large set of empirical measurements to build functions that parameterize the ITD analysis process. We extended the considered methods with the author's approach to CHM generation and the implementation of the function of the dependence of the crown diameter on the tree height, as well as the relation of the crown projection area on the plane and the tree height. Considering the previously described deviation of the virtual tree top position in the crown segment, we additionally propose an approach for the criterion of recognizing the membership of a crown segment to an SP. The schematic process of determining the number of trees per SP using both ITD methods is summarized in Figure 4. We implemented the analytical procedure using the SAGA GIS free software libraries [51], R language script [57] and QGIS software geoprocessing models [58].

Point Cloud Pre-Processing
Point cloud, classified according to the ASPRS standard, underwent a normalization process, that is, the transformation of point cloud absolute coordinates above mean sea level (MSL) to relative coordinates, above ground level (AGL). The process of point cloud normalization is important to extract the stand height. We implemented point cloud normalization using the LidR library [59]. We realized the removal of topographical information from the processed point cloud based on points classified as class 2 (ground) using the TIN algorithm [60]. Next, from the normalized point cloud, we selected the points of first return (single or first of many returns) belonging to the high vegetation class with a height relative to the ground exceeding 5 m. The threshold value of points cut-off was adopted on the basis of field empirical data indicating the lack of occurrence of trees of that height possessing sufficient volume in the context of Forest Management Instructions in force in Poland (DBH ≥ 70 mm).

Parameterization of Individual Tree Detection Methods
Due to the different nature of the implemented methods, we considered different sets of output parameters. We used a total of 94 sets of parameters in the comparative analysis of the considered ITD methods.
For the WS method, we considered 54 parameter sets based on raster analysis, including combinations of various spatial resolutions of CHM (pixel sizes) and Gaussian filter smoothing strength (kernel size and standard deviation). We analyzed the validity of using CHM in the following pixel sizes [m]: 0.35, 0.40, 0.45, 0.50, 0.55, 0.60. Pixel size ≤ 0.3 m was not included in the analysis due to insufficient point cloud density. We smoothed the resulting CHM with a Gaussian filter in a moving window (circular window) with a size range of one to three pixels, and a standard deviation of one to three.
Given the crucial importance of CHM quality in the WS method, we proposed a proprietary algorithm for filling empty CHM cells based on nearest neighbor analysis. We implemented the construction of CHM in R by means of the point to raster method (P2R) available in the lidR package, with assigning height to the raster cell from the highest point. When building a CHM with a pixel size less than 0.45 m, empty pixels in the height image of the canopy becomes a significant difficulty due to insufficient density of first returns in the point cloud. We propose an algorithm performing iterative filling of empty CHM cells with the principle of minimizing the artificial growth of tree crowns.
The algorithm begins with reclassifying the CHM to zero (empty pixels) and one (pixels with positive value of height) (A). Then, non-empty pixels are counted in a 3 × 3 moving window (B). In the last phase of the procedure, the empty pixels that have at least five neighbors are calculated as the average height from neighboring cells (C). The presented procedure is easy to implement in the R environment and can be run with a predefined number of iterations, or until there are no empty cells with a sufficient number of neighbors with height, as presented in Figure 5. In the LMF+GR method, in the first stage of this analysis, we examine the normalized point cloud for local extremes, identified as the tops of individual trees. A given point in the cloud is classified as a tree top if it has the highest Z coordinate in a circular search region of a size dependent on the height of that point. For low-diversity stands with high similarity in tree height and crown size, a search radius of constant size may be sufficient. As the variation in these characteristics increases, it is desirable to determine an appropriate relationship between height and crown diameter. This relationship is described by a continuous function. We propose to use the function, where w(x) is a quadratic function determined from the measured data.
In determining the function f LMF (x) of the relationship, we used a total of 9451 tree segments of coniferous species (pine, spruce) validated in the previous step, enriched with data from ground dendrometric measurements, and approximated as circles determined from crown segments. We grouped the collected observations into 1 m bins with respect to their height, ordered in ascending order, and within each of these height bins we determined percentiles of crown diameter p ∈ {0.5, 1.0, 1.5, . . . , 20.0}. Based on these percentiles for each p and the corresponding heights, we constructed a crown size-height dependence function w p (x) (Figure 6). In this way, we constructed 40 such functions, which we tested to indicate the optimal p. Having treetops identified, we performed crown delineation using the GR algorithm on a CHM with 0.45 m spatial resolution.

Post-Processing Tree Detection Outputs
A general problem of tree detection from ALS data is the general tendency to overestimate the tree number due to the characteristics of LiDAR data (structure complexity, noise) [61]. Many researchers have encountered this problem and have dealt with it in various ways, for example, by thresholding the distance between detected treetops [35,62,63]. Given the availability of a large empirical dataset, we proposed a method to eliminate segments with an inappropriate ratio of tree segment area to tree height. We determined this relationship based on a dataset of 9451 validated tree segments. We grouped these trees by height into one-meter subsets. For each such subset we took a minimal segment area. Given these observations, we fitted an exponential function which resulted in a relationship: where Amin is the minimal area of the segment representing a detected tree and H is the tree height obtained based on LiDAR measurements. Based on relationship (1) we constructed a rule: for segments to be merged with another adjacent segment that shared the longest common boundary. As we were dissolving the segments, which are too small for its height, given the criterion above, we corrected the number of detected trees, respectively. We performed the post-processing procedure of ITD results for both methods analyzed in our study. We ran the post-processing procedure using a workflow processing model in QGIS. Figure 7 shows an exemplary execution result of the post-processing correction procedure after automatic tree detection.

Number of Trees Estimation in Sample Plots
For the number of trees estimation in an SP, we need to define a criterion of tree containment in the SP. It is a vital aspect of validation of the methods as it may render an error to the assessment of the results. A common eccentricity of the treetops encountered during field dendrometric measurements prevented us from using the treetops containment as that criterion [41,64]. In this situation, we constructed another criterion based on the point cloud returns in the crown segment. The detected tree belongs to the SP when over half of its returns fall into a cylindrical shape over the SP. It is also worth mentioning that we calculated the proportion of points of a designated tree in the circular plot using the relative height filter of points. We discarded points at heights not exceeding 40 cm above ground level (low vegetation class according to the ASPRS standard).
We implemented the procedure for determining the membership of a tree to an SP using a script in R language. The developed process consisted of the classification of points in a filtered point cloud according to the affiliation to the unique ID of crown segments, and the affiliation to an area of a sample circular plot. As a result, we were able to determine the percentage of points located in the SP to points located outside this area, which can be seen in Figure 8.

Sample Plots Classification Based on Several Attributes
The SPs analyzed in this paper, despite the fact that they are located in a homogeneous complex of the Knyszyn Forest in neighboring forest districts, can be grouped due to the occurrence of some specific conditions of local character. In this respect, we have made a grouping of circular plots due to six main factors: administrative affiliation of the site (A), radius size of the circular plot (B), proportion of gaps in the circular plot determined as the spatial difference of the sample plot and the area from the crowns represented by the CHM (C), variance of the area of the rectangular projection of the crown in the circular plot (D), variance of the height of the trees in the sample plot (E), and proportion of the number of trees with apex visibility 2 to visibility 1 (F). The ranges of variation of the plot differentiation factors, that we used in our analysis, along with the sample size in classes, are presented in Table 2. Table 2. Ranges of the variability of features grouping the circular sample plots (the counts within the considered categories are given in brackets): administrative affiliation of the site (A), radius size of the circular plot (B), proportion of gaps in the circular plot (C), variance of the area of the rectangular projection of the crown in the circular plot (D), variance of the height of the trees in the sample plot (E), and proportion of the number of trees with apex partial visibility to full visibility (F).

Results
In the comparative study, we performed a detailed analysis of the error distribution for each of the considered parameter sets, which ultimately allowed us to identify optimal solutions for the parameterization of the methods within each category (A-F) and group (g1-g3) of SPs. The objective function was to minimize the Root Mean Square Error (RMSE) value. In particular categories and groups of SPs, we noticed a large variation in the number of measured trees (ranging from 5 to 157, on average 27) on SPs with various sizes. For the correct evaluation of this factor, we also minimized the Mean Absolute Percentage Error (MAPE). We used the number of trees from ground measurements (14,199) with full and partial crown visibility from the aerial ceiling as a reference in the error calculus.

Impact of Parameterization of ITD Methods on Estimation Errors
The results of the ITD on SPs obtained with the use of the considered methods indicate a very high efficiency of representing the variability of the tree count (R 2 = 0.90). Depending on the group and category of the circular plots, as well as on the adopted parameter sets, outliers appeared which resulted from a high variability of the number of trees per SP and a very heterogeneous horizontal and vertical structure of stands in the Knyszyn Primeval Forest.
A direct comparison of the number of trees from ground-based dendrometric measurements to the number of treetops detected by the LMF+GR algorithm allows us to conclude that using percentiles in the range (p ∈ [0.5, 2.0]) to construct the LMF function leads to overestimating the results. Subsequent percentiles result in a decreasing trend in the number of detected trees. Considering all circular plots, the optimal variant in the ITD analysis providing a symmetric distribution of errors is the percentile 3.0. On the other hand, considering the categorization of SPs shows very strong disproportions of the error distribution between plots with different radius sizes. A similar situation applies to categories involving trees with partially obscured crowns (shown in Figure 9).
Analysis of the estimation error distribution with respect to the WS method shows that the most symmetric distributions were obtained for the analysis variants with CHM in the range 0.35-0.40 m spatial resolution, and low smoothing strength. Increasing the pixel size and the strength of smoothing causes an underestimation of the detected trees over SPs. It is also worth pointing out that with this method there are no solutions that cause the overestimation of the results (as was the case with low percentiles in the LMF method). Moreover, the results are strongly influenced by taking into account differentiating factors such as the SP radius and the share of trees with partially obscured crowns (presented in Figures 10-12).    Using the LMF+GR method, we recorded the highest RMSE in the range of 0.5-3.0 percentile of the crown diameter used in the parameterization of the LMF function. Crown diameter percentiles, p ∈ [3.0, 6.0], resulted in a stabilization of the RMSE, while we noticed a slight increasing trend in the subsequent percentiles. Analyzing the curves in Figure 13, it can be seen that the highest values of the metric occur for the variant B-g2 covering SPs with r ∈ [20,25], which is dictated by a significantly higher number of ground measured trees (80 trees on average) on enlarged SPs, as compared to typical SPs with r ≤ 12.62 m, where 23 trees were recorded on average. The detailed decomposition of the metric's variations in SP groups and categories allows us to distinguish the characteristic features of stands. These qualities affect the propagation of the laser signal and detection quality. The method applied leads to a higher efficiency in the case of stands with a more than 20% share of crown gaps and is characterized by a greater variance of height and size of crowns (groups g2 and g3). Simultaneously, the analysis of variation of RMSE in group F (ratio of the number of trees with visibility 2 to visibility 1) indicates that the results deteriorate with the increasing share of partially obscured trees.  Table 2; the 'All' group represents the entire set of sample plots).
Analysis of the variability of the MAPE metric allows us to conclude that we obtained the lowest percentage errors for estimating the number of trees in the SPs with a radius in the range r ∈ [20,25]. In the optimal range of 3.0-6.0 percentile of crown diameter, errors on larger SPs are about 10 percentage points lower than on plots with radius r ≤ 12.62 m. In contrast to the RMSE metric, we obtained lower errors in stands characterized by less variation in height and crown size (Dg1, Eg1) ( Figure 14). The use of higher-order percentiles of crown diameter, on the one hand, obliterates inter-group differences and, on the other hand, reveals an increasing trend in percentage error.
In the case of the WS segmentation method, it is remarkable that the RMSE increases with increasing CHM pixel size and Gaussian filter smoothing strength (kernel size and standard deviation). Figure 15 shows that the optimal set to minimize the RMSE error is a CHM with a pixel size in the range of 0.35-0.45, smoothed in the closest neighborhood of the processed cell (kernel size 1). Analyzing the variation of RMSE in categories and groups of SPs, it can be seen that the algorithm used performs better with stands with a greater variation in height, crown diameter, and with a greater proportion of gaps between crowns. Similar to the LMF+GR method, we obtained worse results on SPs with a higher proportion of crowns partially obscured by crowns of taller trees.  Table 2; the 'All' group represents the entire set of sample plots).

Figure 15. Variability of the RMSE metric as a function of Watershed method parameters-Canopy
Height Model (CHM) spatial resolution (pixel size in meters), Gaussian smoothing strength (Gaussian filter window size (gs) and standard deviation (sd) in pixels)-and with respect to groupings of circular plots in different categories (A-F, g1-g3, see Table 2; the 'All' group represents the entire set of sample plots).
Referring to the effect of the parameterization of the WS method on the MAPE value (presented in Figure 16), the differences in subgroups Bg1 and Bg2 (variation of SP radius) disappears with increasing pixel size and degree of smoothing of the CHM. Nevertheless, in the range of optimal variants of parameterization, the errors on SPs with smaller radii are much higher, reaching the level of 20 percentage points with minimal smoothing of the CHM with a pixel size of 0.35 m. For the WS method, we noted only slight differences in percentage error values between stands in different groups of categories of gap proportion, height variance, and crown size. MAPE differences for these group categories are less pronounced than for the LMF+GR method.  Table 2; the 'All' group represents the entire set of sample plots).

Choosing Optimal Parameters for ITD Methods
Detailed error analysis allowed us to identify the optimal parameter sets in the considered ITD methods by determining the minimum values of the RMSE and MAPE metrics. Analyzing Table 3, it can be generally indicated that, when implementing the LMF+GR method, the minimization of RMSE is obtained for low values of crown diameter percentiles (3.5-4.5). At the same time, when analyzing stands with a greater variation in height and crown size, the optimal solution shifts toward higher-order percentiles (4.0-8.0). Similar relations occur in MAPE minimization with the difference that the optimal indications are percentiles higher than in RMSE minimization (percentile value on average + 3.0). In this case, the optimal solution for the entire set was estimated for percentile 6.5. On the other hand, the variation of optimal combinations of parameters for the WS method (CHM pixel size and the strength of smoothing with the Gaussian filter) is less pronounced than in the LMF+GR method. In this method, the optimum parameters oscillate around the CHM pixel size in the range of 0.35-0.40 m and the minimum CHM smoothing strength with Gaussian filter (window size 1). Besides, it is very difficult to indicate an unambiguous relation of the indicated parameter sets with the local character of the analyzed stands. This is due to the multidimensional variability of forests.
ITD methods are supposed to provide precise information on the number of trees in a given plot unit (and on the location of individual trees or their dense groups). Considering only the parameter of tree abundance, we believe that the applied ITD methods allow us to estimate, with very high accuracy, the total number of trees in the control SPs across all categories (in Figure 17). Analysis of the following graphs shows a slight tendency to underestimate the results in both methods; this is slightly stronger in the case of LMF+GR. In addition, a trend of greater underestimation in the MAPE minimization variant compared to the RMSE minimization is evident. When comparing the two methods, in most of the analyzed cases, the WS method produces a better accuracy of fitting the estimates to the empirical data. Table 3. Optimal parameter combinations in the ITD process that ensure minimum errors of the considered metrics for each category and group of sample plots. Window size and standard deviation parameters for Watershed (WS) method are presented in pixels, while pixel size is in meters. Categories and divisions into groups correspond to Table 2 Figure 17. Comparison of the number of trees detected using the tested ITD methods and ground survey across categories and groups of circular plots (A-F, g1-g3, see Table 2) in the variant of the experiment that ensures minimization of the RMSE metric (with reference to use of the MAPE metric).

Analysis of ITD Results on Some Sample Plots (Case Study)
As a case study, we performed an analysis using both ITD methods with optimal parameters to ensure the minimization of RMSE and MAPE on three selected enlarged circular plots (r = 25 m) that differed significantly in stand vertical structure, tree compactness, and spatial distribution of gaps. We ranked the plots according to the complexity of their analysis: SP1-plot with a relatively low variance in tree size and height; SP2-plot with a greater variance in height, size and the occurrence of partially obscured trees; SP3-plot with a high variance in tree size and height and proportion of artificial and natural gaps between crowns (Figure 18). Comparing the ITD methods used in these few sample SPs (Table 4), we can indicate that the WS method generally provided better results in estimating the total number of trees. We noted the largest discrepancy at SP2, where the LMF+GR method resulted in a visible overestimation of the number of trees. It is due to the peculiarities of LMF, which worked worse in the case of greater variation in crown size. It is worth pointing out that the variant of parameterization ensuring minimization of RMSE caused greater overestimation of the number of trees. On the other hand, analyzing basic statistics of height and size of automatically determined segments, the differences turned out to be small. Segments determined by the WS method generally have more correct borders in the context of CHM. Only in partially obscured trees (SP3), the WS algorithm tends to merge the segment of the towering tree and the obscured tree.

Discussion
Despite the availability of sophisticated analytical tools, the accurate determination of tree numbers from ALS data analysis for precision forestry remains a complex problem [65].
This study aimed to investigate the influence of the ITD method parameters on the quality of automatic trees detection using a large-scale ground sample dataset. Its realization required field data from two neighboring forest districts, studied in independently realized research projects. As a consequence, it was necessary to integrate databases obtained in different time intervals. Despite the adoption of a uniform measurement methodology, there is uncertainty about the homogeneity of the data. An instance of this problem is the stage of subjective evaluation of crown visibility by two independent teams. The determination of tree visibility in categories 1 (full crown visibility) and 3 (no crown visibility) was not a problem. However, in the case of category 2 (partial crown visibility), there was a difficulty in unambiguously determining the feature in individual cases, for example, trees with a crown growing into another crown or limited visibility of crowns from the ground level. For this reason, we used facility (forest district) as one of the variables differentiating the study population. We found no clear differences between the results for individual objects (Figure 17). For subdivision category B (area size), differences are noticeable (Figures 9 and 10). We obtained significantly better results for larger SPs, which is in line with other studies [66][67][68]. We, therefore, conclude that SPs with larger radius (r > 12.62 m) are worth using in the absence of economic constraints.
Another issue that can affect the results of the experiment is the edge effect. It is a problem associated with all natural geographic studies where spatial analyses are performed. It also concerns methods based on sample plots (also circular) [46]. It is related to the size of the plot (the larger the plot, the smaller the influence of the edge effect) [45] and to measurement errors (location of the plot center, location of trees on the plot, etc.). Among other things, the edge effect results in uncertainty in the membership of crown segments to SPs. Some researchers compensate for this effect by manually correcting the location of detected trees [50] or correcting the SP outline by matching it to crown segments [41].
We also used manual correction of segment outlines for our experiment to build an accurate reference sample for the parameterization of ITD methods. We used a 10-point scale to evaluate the correctness of the segment's delineation in reference to the ground dendrometric and location measurements. With such a large sample, it was possible to discard uncertain crown segments (with scores less than 7).
Another critical issue is the subjective choice of ITD methods for the experiment. Among the many existing ITD methods [50], we chose two of them based on different analytical approaches: the CHM-based WS method and the point cloud-based LMF+GR method. These methods also differ significantly in the number of parameters that need to be taken into account. In the case of the WS method, there are more parameters. This method also requires more steps, including the process of CHM preparation. We introduced an original algorithm for filling in empty CHM pixels to overcome the influence of the low-density point cloud on CHM quality. In contrast, the LMF+GR method, at the stage of filtering in the point cloud for local maximums corresponding to theoretical treetops, is parameterized mainly by the function of the tree height-crown diameter relationship. Therefore, we found that LMF+GR was a more effective method for practical applications. As indicated by other studies [69], the overall performance of single-tree detection methods is strictly dependent on the adopted parameters. One of the general problems was the overestimation of detected trees, especially with using the WS algorithm [10,70]. Therefore, our introduced post-processing procedure for the results of automatic tree detection led to a clear stabilization of the results (Figures 15 and 16). This results in an increase in the range of optimal values of tree detection parameters in predominantly coniferous stands. Only for circular plots characterized by a large variation in crown size (D-g3) and the LMF+GR method did we record optimal detection parameters that were clearly different from the others ( Table 3).
The agreement of the number of detected trees with the reference data obtained in our work is at a similar level to that of other authors [71,72]. It is worth mentioning that in this study we focused on the number of trees as a dependent variable, so we did not analyze the correctness of the determination of the crown reaches of individual trees. The carried out case study clearly shows that there are differences in the basic statistics of the crown segment area depending on the adopted set of analysis parameters. The segment size was conditioned by the number of detected trees as well as the CHM pixel size. The issue of the influence of the parameterization of ITD methods on the correctness of the representation of the crown extent of individual trees, therefore, seems interesting and will be addressed in further analyses. This is a key issue in the context of further research on single tree segment features. The selection of criteria related to the inclusion of a given segment into a circular plot, taking into account issues such as the accuracy of the location of the center of the circular plot and the location of individual trees, as well as the direction and degree of their deflection from the vertical, require further analysis. It is worth mentioning that an interesting aspect of future research is the verification of the effectiveness of the developed parameterizing functions in ITD analysis of coniferous stands from other geographical regions.
We can find many studies of individual tree crown delineation using passive and active remote sensing data in the literature. We chose the two most popular ITD methods, one based on point cloud analysis and another one utilizing CHM. We assessed the influence of the parametrization of these methods on the resulting number of detected trees. The value of our study comes from a large dataset of field measurements, which we used in the parameterization of ITD methods. We used these data in the postprocessing of the output of the ITD methods as well: we used the dendrometric features to describe a relationship between tree height and tree crown area to establish a criterion to choose which boundaries between tree segments should be dissolved. It was of particular importance when applying the WS method. The use of postprocessing had a beneficial effect on stabilizing the results in using CHM with various resolutions and the strength of its smoothing, thus expanding the optimal variants. It is also worth noting that our proprietary procedure for the iterative filling of empty CHM pixels is particularly important in Poland due to the availability of ALS data from the ISOK project (4 pt/m 2 scan density is available for most of the country). Our study confirms indications from other research teams about the need for the local parameterization of applied ITD methods based on field dendrometric measurements [62,[73][74][75]. Due to the local nature of the analysis and the multitude of methods for their local modification, it is not easy to directly compare the results obtained by studies conducted by other authors. Nevertheless, the accuracy we obtained for estimating the number of trees corresponds with other studies conducted in Poland [46,76,77].

Conclusions
In this paper, we proposed an approach to exploit the information potential of a large set of ground-based dendrometric measurements for the parameterization of selected ITD methods. We parameterized two popular approaches for single tree detection in conifer stands: Watershed and LMF+GR. For the WS method, we proposed a proprietary algorithm for iterative filling in empty CHM pixels based on direct neighborhood analysis. This allowed us to generate correct CHMs at resolutions of 0.35-0.40 m, which would not be possible with the recorded number of first returns in the processed point clouds. Subsequently, we post-processed the results of the ITD WS method for the considered sets of output parameters in order to eliminate crown segments with an incorrect ratio of tree height to its segment area. The procedure used allowed us to stabilize the results across the sets of considered parameter combinations, which was particularly important when analyzing CHM with smaller pixel sizes. To parameterize the LMF+GR method, we used crown diameter-to-height functions at the stage of finding extremes on the point cloud. The obtained results for each percentile for creating LMF window size functions were subjected to the same procedure for eliminating invalid segments as used in the WS method.
The analysis of semi-natural coniferous stands indicated the necessity of taking into account selected conditions related to, among others, variation in crown size and height or the share of trees with partially obscured crowns when selecting output parameters. Nevertheless, the applied parameterization increased the range of combinations of ITD analysis parameters in the considered methods. In our opinion, conducting continuous research on single tree detection from ALS data, despite the need to take into account many confounding factors, is a key element of the idea of precision forestry.