In Figure 5, the surveyed positions of tree trunks at breast height and the lidar-inferred tree tops obtained with the best-performing window sizes (3 × 3, 5 × 5 and 7 × 7) are plotted. Tree count and the corresponding density are extracted for all tested kernel sizes. The error with respect to ground truth, $dD = D_{\text{ground-truth}} - D_{\text{lidar-inferred}}$, is calculated and plotted in Figure 6. Figure 6 shows that the best estimate of tree density is obtained with the 5 × 5 kernel. For all areas except area C, interpolating the results between window sizes suggests that a theoretical 6 × 6 window would have performed even better. This is most probably related to the nearest neighbor distance, which shows that the average distance between trees in those three areas is ~2.3 m. In our 0.5 m resolution raster, the 5 × 5 window covers a 2.5 m × 2.5 m area and is therefore the size closest to this spacing. That it still underestimates the tree count is most probably due to trees whose canopy is completely embedded within that of another tree, which makes separation very difficult. Area C behaves slightly differently, showing a slight overestimation with the 5 × 5 window. Since this area has the same characteristics as the others in terms of mean diameter, height and density, the reason might lie in irregular canopy shape. This aspect requires further investigation, as it may provide useful information on the behavior of the method.
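The two quantities used in this paragraph, the average nearest neighbor distance between stems and the density error $dD$, can be computed as in the minimal Python sketch below. The function names, the brute-force nearest neighbor search and the use of trees per hectare as the density unit are illustrative assumptions, not the authors' implementation.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average distance (m) from each stem to its nearest other stem.

    Brute-force O(n^2) search, adequate for plot-sized point sets.
    `points` is a list of (x, y) coordinates in metres (at least two points).
    """
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    return total / len(points)


def density_error(n_ground_truth, n_lidar, area_ha):
    """dD = D_ground-truth - D_lidar-inferred, densities in trees per hectare.

    A positive value means the lidar method underestimates the tree count.
    """
    return n_ground_truth / area_ha - n_lidar / area_ha


# Hypothetical stems on a regular 2.3 m grid: every nearest neighbor is 2.3 m away.
stems = [(0.0, 0.0), (2.3, 0.0), (0.0, 2.3), (2.3, 2.3)]
print(mean_nearest_neighbor_distance(stems))  # approximately 2.3
print(density_error(120, 110, 0.5))          # 20.0 trees/ha (underestimation)
```

The sign convention follows the definition of $dD$ in the text: positive values indicate missed trees, negative values indicate spurious detections.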

**Figure 5.**
Comparison between surveyed stem position (crossed grey circles) and calculated stem position (smaller filled circles) for area C. **(a)** 3 × 3 size window, **(b)** 5 × 5 size window and **(c)** 7 × 7 size window.

**Figure 6.**
Density error for each kernel size: the difference between surveyed density and lidar-derived density as a function of kernel size.

The accuracy of tree-top position detection was evaluated with an agreement test. Agreement tests improve on simple agreement by also taking into account the agreement expected by chance, yielding an estimated coefficient of agreement. Cohen [28] and Fleiss [29] proposed different applications of the coefficient of agreement: Cohen for categorical items rated by two raters, and Fleiss for categorical ratings assigned by a fixed number of raters to a number of items. This case study considers the reliability of the correspondence between crown-top position and trunk position at breast height. Because these two elements can be spatially offset, the objective is not to estimate absolute position error but to estimate agreement in tree detection, accounting for expected chance agreement together with omission and commission errors (false negatives and false positives). The latter two are calculated as complements of the two point sets (ground-truth positions and lidar-inferred positions), where matching nearest neighbors between surveyed points and lidar-inferred points are treated as the points common to both sets:

$$C_{err} = I \setminus M, \qquad O_{err} = S \setminus M$$

where $S$ is the surveyed point set, $I$ is the lidar-inferred point set, $C_{err}$ is the set of commission errors, $O_{err}$ is the set of omission errors and $M$ is the matching point set. Matching was done by sorting the results of the cross nearest neighbor calculations from smallest to greatest distance; a point in $I$ is considered to match a point in $S$ if it lies within a threshold distance equal to the measured average tree distance (~2.3 m). Matches must be unique, in the sense that a tree in $S$ cannot match more than one tree in $I$. Once matching of the ground survey trees ($S$) is complete, any remaining surveyed points are considered omission errors, while any unmatched points from the lidar-inferred set ($I$) are considered commission errors. With this information, the kappa coefficient of agreement ($K$) can be calculated:

$$K = \frac{Pr_a - Pr_e}{1 - Pr_e}$$

where $Pr_a$ is the observed agreement, $Pr_e$ is the chance agreement, $N_m$ is the number of matching points, $N_S$ is the number of surveyed points, $N_c$ is the number of points assigned as commission errors and $N_o$ is the number of points assigned as omission errors. Results are reported in Table 3, and the calculated $K$ values are plotted as a function of correlation filter window size in Figure 7. The percentage of tree extraction is the ratio of matched to surveyed trees, $N_m / N_S$.
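The matching step and the summary quantities described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: "cross nearest neighbor calculations" is interpreted here as all surveyed-to-inferred distances, sorted in ascending order and matched greedily one-to-one within the ~2.3 m threshold. The `kappa` helper implements only the standard form $K = (Pr_a - Pr_e)/(1 - Pr_e)$; how $Pr_a$ and $Pr_e$ are derived from $N_m$, $N_S$, $N_c$ and $N_o$ follows the paper's own formula and is not reproduced. All function names are hypothetical.

```python
import math


def match_points(surveyed, inferred, threshold=2.3):
    """Greedy one-to-one matching of lidar-inferred tops to surveyed stems.

    Assumption: all cross distances are sorted from smallest to greatest and a
    pair is accepted only if both points are still free and the distance is
    within `threshold` (m). Returns (matches, n_m, n_o, n_c).
    """
    pairs = sorted(
        (math.hypot(sx - ix, sy - iy), s, i)
        for s, (sx, sy) in enumerate(surveyed)
        for i, (ix, iy) in enumerate(inferred)
    )
    used_s, used_i, matches = set(), set(), []
    for dist, s, i in pairs:
        if dist > threshold:
            break  # sorted order: no remaining pair is closer
        if s in used_s or i in used_i:
            continue  # uniqueness: each tree can be matched only once
        used_s.add(s)
        used_i.add(i)
        matches.append((s, i))
    n_m = len(matches)
    n_o = len(surveyed) - n_m  # unmatched surveyed stems: omission errors
    n_c = len(inferred) - n_m  # unmatched lidar tops: commission errors
    return matches, n_m, n_o, n_c


def extraction_percentage(n_m, n_s):
    """Percentage of tree extraction, 100 * N_m / N_S."""
    return 100.0 * n_m / n_s


def kappa(pr_a, pr_e):
    """Cohen's kappa from observed (pr_a) and chance (pr_e) agreement."""
    return (pr_a - pr_e) / (1.0 - pr_e)


# Hypothetical coordinates (m): the third stem is missed, the third top is spurious.
surveyed = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
inferred = [(0.5, 0.0), (5.2, 0.0), (20.0, 0.0)]
matches, n_m, n_o, n_c = match_points(surveyed, inferred)
print(matches, n_m, n_o, n_c)  # [(1, 1), (0, 0)] 2 1 1
```

Because pairs are consumed in ascending distance order, the closest candidate always wins a contested stem. This is one way of enforcing the uniqueness rule stated in the text.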

The agreement coefficient between tree extraction and measured tree positions identifies 5 × 5 as the best correlation window size for all four areas. This is in line with the above observation that the window size has to be proportioned to the scale of the canopy investigated. It is interesting to note that the percentage of tree extraction in Table 3 is always higher for smaller kernel sizes. Unlike the agreement coefficient, however, this measure does not account for false positives and false negatives, and is therefore misleading. In this context, *a priori* knowledge of the average tree nearest neighbor distance can be used to choose the kernel size. This holds for stands whose spatial distribution is not clustered, and should work particularly well for regularly spaced trees such as those in plantations. For example, poplar plantations are common in this area and in many regions of Italy, and are also encouraged by incentives from afforestation programs [30]. They could be ideal stands for the application of this method.
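Choosing the kernel size from *a priori* knowledge of tree spacing can be sketched as below. The ideal window width in cells is the average nearest neighbor distance divided by the raster resolution, and the closest odd size is selected so that the window has a central cell. The restriction to odd sizes, the tie-breaking toward the smaller window and the function name are assumptions for illustration, not part of the described method.

```python
import math


def kernel_size_from_spacing(mean_nn_distance_m, resolution_m, min_size=3):
    """Odd window size (cells) whose footprint best matches the tree spacing.

    ideal = spacing / resolution, e.g. 2.3 m / 0.5 m = 4.6 cells.
    Only odd sizes from `min_size` (assumed odd) upward are considered, so the
    window has a central cell; on a tie the smaller window is returned.
    """
    ideal = mean_nn_distance_m / resolution_m
    upper = max(min_size, math.ceil(ideal) + 2)  # nearest odd size is <= ceil(ideal) + 1
    candidates = range(min_size, upper + 1, 2)
    return min(candidates, key=lambda k: abs(k - ideal))


size = kernel_size_from_spacing(2.3, 0.5)
print(size, size * 0.5)  # 5 2.5 -> a 5 x 5 window covering 2.5 m x 2.5 m
```

With the ~2.3 m spacing measured in the study areas and the 0.5 m raster, this returns 5, the window size that gave the best agreement. The theoretical 6 × 6 window discussed earlier is excluded here only by the odd-size assumption.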

**Figure 7.**
Kappa coefficient of agreement of tree detection for each kernel size.