Impact of Calibrating Filtering Algorithms on the Quality of LiDAR-Derived DTM and on Forest Attribute Estimation through Area-Based Approach

: Ground point ﬁltering of the airborne laser scanning (ALS) returns is crucial to derive digital terrain models (DTMs) and to perform ALS-based forest inventories. However, the ﬁltering calibration requires considerable knowledge from users, who normally perform it by trial and error without knowing the impacts of the calibration on the produced DTM and the forest attribute estimation. Therefore, this work aims at calibrating four popular ﬁltering algorithms and assessing their impact on the quality of the DTM and the estimation of forest attributes through the area-based approach. The analyzed ﬁlters were the progressive triangulated irregular network (PTIN), weighted linear least-squares interpolation (WLS) multiscale curvature classiﬁcation (MCC), and the progressive morphological ﬁlter (PMF). The calibration was established by the vertical DTM accuracy, the root mean squared error (RMSE) using 3240 high-accuracy ground control points. The calibrated parameter sets were compared to the default ones regarding the quality of the estimation of the plot growing stock volume and the dominant height through multiple linear regression. The calibrated parameters allowed for producing DTM with RMSE varying from 0.25 to 0.26 m, against a variation from 0.26 to 0.30 m for the default parameters. The PTIN was the least a ﬀ ected by the calibration, while the WLS was the most a ﬀ ected. Compared to the default parameter sets, the calibrated sets resulted in dominant height equations with comparable accuracies for the PTIN, while WLS, MCC, and PFM reduced the models’ RMSE by 6.5% to 10.6%. The calibration of PTIN and MCC did not a ﬀ ect the volume estimation accuracy, whereas calibrated WLS and PMF reduced the RMSE by 3.4% to 7.9%. The ﬁlter calibration improved the DTM quality for all ﬁlters and, excepting PTIN, the ﬁlters increased the quality of forest attribute estimation, especially in the case of dominant height.


Introduction
The success of airborne laser scanning (ALS) on collecting accurate measurements of forest ecosystems consolidated this technique worldwide as a state-of-the-art approach in forest inventories. The term ALS refers to light detection and ranging system (LiDAR) onboard an aerial platform, aiming to quickly scan large areas to produce detailed three-dimensional point clouds of the surface [1,2]. These characteristics allow the ALS data to be used for many purposes including topographic-and filters was traced from the DTM generation to its impact on the performance of the models, where the multiple linear regression approach and a eucalyptus forest plantation was used as showcase.

Study Area
The study area is located in Northwest Portugal (40 • 36 N, 8 • 25 W), close to the city of Águeda, comprising 9 km 2 of forested landscapes. The terrain presents a heterogenic topography, with altitude varying from 70 to 220 m and slopes ranging from 2.5% to 34.2%. At the time of the forest data collection (July 2008), the forest area was mainly covered by pure even-aged Eucalyptus globulus Labill stands, harvested every 10-12 years during three or four rotations, with some stands of Pinus pinaster Aiton. The eucalyptus stands had a mean tree density around 1600 trees per hectare, with regular or irregular spacings, and they were composed by stands from seedling (first rotation) and regenerated by coppice (following rotations) for pulp supplying. Many stands were multi-layered, with E. globulus in the uppermost layer and suppressed trees, shrubby and herbaceous vegetation in the lowermost layer.

Field Data Collection
The forest and the topographic surveys occurred between 10 June and 3 July 2008 through 41 circular plots with 400 m 2 of area (11.28 m of radius). The plots were systematically installed over the area (Figure 1a) and covered a large range of terrain slopes (Figure 1b). Within each plot, the diameters at breast height (dbh, at 1.30 m height) were measured from all trees higher than 2 m, together with the height of the dominant and co-dominant trees. For each plot, a concentric subplot of 200 m 2 (7.98 m of radius) was used to collect the heights of all trees higher than 2 m. The missing tree heights of each plot were estimated using the Prodan's model [27] fitted with the respective subplot data. The individual tree volumes with bark were estimated using an equation provided by Tomé et al. [28], and the tree volumes were summed to obtain the ground reference volume for each plot (V, m 3 ). The 41 plots were used for the filter calibration assessment, from which 25 plots were selected to evaluate the impact of the calibration on the forest modeling. The selection of the 25 plots was needed to remove plots that were located in the transition boundary between two stands, or crossed by roads, that could bias the forest modeling (see [29]). The biometrical description of the forest content within plots is presented in Table 1.
Remote Sens. 2020, 12,918 3 of 19 the filters was traced from the DTM generation to its impact on the performance of the models, where the multiple linear regression approach and a eucalyptus forest plantation was used as showcase.

Study Area
The study area is located in Northwest Portugal (40°36′N, 8°25′W), close to the city of Águeda, comprising 9 km 2 of forested landscapes. The terrain presents a heterogenic topography, with altitude varying from 70 to 220 m and slopes ranging from 2.5% to 34.2%. At the time of the forest data collection (July 2008), the forest area was mainly covered by pure even-aged Eucalyptus globulus Labill stands, harvested every 10-12 years during three or four rotations, with some stands of Pinus pinaster Aiton. The eucalyptus stands had a mean tree density around 1600 trees per hectare, with regular or irregular spacings, and they were composed by stands from seedling (first rotation) and regenerated by coppice (following rotations) for pulp supplying. Many stands were multi-layered, with E. globulus in the uppermost layer and suppressed trees, shrubby and herbaceous vegetation in the lowermost layer.

Field Data Collection
The forest and the topographic surveys occurred between 10 June and 3 July 2008 through 41 circular plots with 400 m² of area (11.28 m of radius). The plots were systematically installed over the area (Figure 1a) and covered a large range of terrain slopes (Figure 1b). Within each plot, the diameters at breast height (dbh, at 1.30 m height) were measured from all trees higher than 2 m, together with the height of the dominant and co-dominant trees. For each plot, a concentric subplot of 200 m² (7.98 m of radius) was used to collect the heights of all trees higher than 2 m. The missing tree heights of each plot were estimated using the Prodan's model [27] fitted with the respective subplot data. The individual tree volumes with bark were estimated using an equation provided by Tomé et al. [28], and the tree volumes were summed to obtain the ground reference volume for each plot (V, m³). The 41 plots were used for the filter calibration assessment, from which 25 plots were selected to evaluate the impact of the calibration on the forest modeling. The selection of the 25 plots was needed to remove plots that were located in the transition boundary between two stands, or crossed by roads, that could bias the forest modeling (see [29]). The biometrical description of the forest content within plots is presented in Table 1.   The coordinates of each tree within each plot were recorded as well as the coordinates of prominent terrain points, like breaklines or spot heights (Figure 1c), resulting in 3240 ground control points, a mean of 79 points per plot (σ = 24). The accuracy of the coordinates of points using a geodetic Global Navigation Satellite Systems (GNSS) on land covered with dense vegetation is not reliable; therefore, the devised strategy for measuring the coordinates of ground control points was not straightforward. Firstly, those coordinates were measured in each plot by means of a topographic survey using the radial method [30]. As these coordinates are in a local system, they were converted to that of the LiDAR data by using GNSS receivers. To this end, it was decided to attach to each plot two GNSS-derived points, named GNSS base, whose coordinates were measured with two GNSS receivers. They allow for coordinating the surveyed points directly in the referred coordinate system. Two points are needed to orient the total station. These two points were placed as close as possible to the plot and as much as possible in an open space. The method used to measure the coordinates of the two GNSS-derived points was the relative positioning by using a fixed receiver on a geodetic pillar with known coordinates on the same system as the LiDAR data. This method, in post-processing, is the most precise and may reach levels of precision in the order of centimeters [31].

ALS Data Collection and Pre-Processing
The ALS survey was carried out in July 2008, a few days after the forest inventory, using a LiteMapper-5600 laser system from RIEGL (www.riegl.com), which has as main components the high-resolution laser scanner LMS-Q560, the positioning system AEROcontrol, and the digital camera DigiCAM (see [32] for more technical details on the system). The airplane flew 600 m above ground with a mean speed of 46.26 m s −1 . The parameters of the laser system were: 0.5 mrad of beam divergence, ±45º of scan angle and pulse rate of 150 kHz. The resulted swath was 497 m (60% of overlap) and the returns density was 9.5 returns m −2 . More details about the ALS survey is described in [31,33]. The ALS point clouds were inspected for outliers and further clipped using the plot center coordinates with a 15-m radius buffer (706.85 m 2 ) for each plot. The clipping process was carried out to avoid edge-effect over the ground returns during the filtering process. The FUSION software V3.60 [34] was used for this pre-processing.

Filtering Calibration
The used filters were chosen based on their recurrent utilization on the related literature [20,26]. We gave preference to those filters implemented into line-code-based software that allows being incorporated into programming routines; in our case the R environment [35] was used. Each filter was applied to the last returns of the point clouds of the 41 plots using different parameter sets. The choice of which parameters to be calibrated was done for each filter since they follow different principles described in the following sections. This work is focused on the filtering process for forest applications, so settings regarding urban terrain or smoothing filters were not considered. The values tested in the Remote Sens. 2020, 12, 918 5 of 18 calibration were defined considering the ones close to the software defaults and recommendations ( Table 2).

Progressive Triangulated Irregular Network (PTIN)
The PTIN is frequently applied in forest studies [36][37][38]. The algorithm starts with a sparse triangulated irregular network (TIN) created from seed points and then performs the densification of the TIN iteratively. In this process, the densification occurs by including the returns according to their distance to the TIN facets and their angles to the nodes. We used the adaptation of PTIN of the LASground software (rapidlasso.com), which comprises the following three assessed parameters: Spike, which is the threshold at which spikes get removed; step size, defines the size of the initial search window and it is dependent on the terrain roughness, where values around 5 are suggested for forest or mountain areas; and granularity, related to the computational effort invested into finding the initial ground estimate.

Weighted Linear Least-Squares Interpolation (WLS)
The applied WLS filter was the adaptation of the Kraus and Pfeifer's algorithm [14] into the FUSION software [34], which was used in many works [39][40][41]. This filter averages reiteratively the return heights inside a defined search window, assigning the weights according to their residuals in relation to the mean height. In this case, the weights are recalculated in each iteration according to the weighting function (Equation (1)), so that height values associated with lower residuals receive higher weights, whereas those with higher residuals receive lower weights. The parameter values a = 1 and b = 4 are commonly used and are recommended by the software developers for most applications so they were kept in the analysis [14,26]. The parameters g, w, window size and the number of iterations (iteration) are supposed to be defined by the user according to the data, thus they were chosen to be calibrated ( Table 2). Among these parameters, the window size is the only one without a specified default value, although the software's manual points to 5 m 2 . In this case, a window size equal to 5 m 2 was considered as a default value.
Remote Sens. 2020, 12, 918 where p i is the weight for the return i = 1, . . . , n; v i is the residual point height value from the average height, being i = 1, . . . , n; the parameters a and b determine the steepness of the weight function; g is negative and represents a threshold value after which the weights are set to 1 if v i ≤ g and to 0 if v i > g + w. Note that w ≤ |g| for all w values.

Multiscale Curvature Classification (MCC)
The MCC is a filter developed by Evans and Hudak [15] and implemented in MCC-LIDAR software (sourceforge.net/projects/mcclidar). It uses the thin-plate spline interpolation to produce surfaces in different resolutions and uses progressive curvature tolerances to eliminate non-ground returns. The software uses only two parameters that should be set by the user: the initial scale (λ), related to the search window size that is used to interpolate the points; and the initial curvature tolerance (t), which accounts for slope interaction between the interpolated surface and the returns. During the processing, both parameters are changed through three domains to address variable canopy configurations and their interaction with the terrain slope: the initial value set to λ is multiplied by 0.5, 1, and 1.5, while 0.1 is added to the initial t in each domain. Values for λ =1.5 and t = 0.3 are claimed by the developers as efficient to filter non-ground returns, so variations around those values were tested in this work ( Table 2). Applications of the MCC in forest studies can be found, for example, in [42][43][44].

The Progressive Morphological Filter (PMF)
The PMF filter was developed by Zhang et al. [16] and uses concepts of object identification in grey-scale images by applying mathematical morphology filters like opening and closing operators. The closing operator removes returns from objects of sizes smaller than the window size, while the opening operator keeps the returns from larger objects. The PMF was applied using the implementation of the lidR package [45], where the filter works at point cloud level without any rasterization process. The PMF has been commonly applied to ALS data [18,26], and the release of the lidR package has also promoted this filter to process photogrammetric point clouds [46]. The package uses two parameters: the sequence of windows size, and the sequence of threshold, which is the height value below which a return is classified as a ground return. In this study, the PMF was applied using not a sequence but a specific value for each parameter to better isolate the effect of each component in the processing of the ALS datasets (Table 2).

Filtering Accuracy Assessment
For each combination of filter parameters, the classified ground returns within each plot were interpolated into a DTM (1.0 m of cell size) from a TIN surface using the grid_terrain command implemented on the lidR package [45]. The choice of using the TIN interpolation was due to its efficiency and frequent application to generate DTM from ALS data [9,19,47]; thus, eventual errors in the interpolation were not considered in this analysis. Each combination of the filter parameters was assessed using the root mean squared error (RMSE, Equation (2)) computed with the height values interpolated from the derived DTM at the same planimetric positions as the ground control points. Note that the DTM was generated for each plot separately to allow detecting DTM erosion, which was defined here as the non-inclusion of all ground control points within the DTM extension of its respective plot (Figure 1d). The efficiency of a filtering process was then evaluated using the accuracy of the DTM (the lower the RMSE, the better) so as its integrity, where the erosions were not allowed.
where, y i andŷ i are, respectively, the observed and estimated value for the observation i = 1, . . . , n.

Forest Modeling Assessment
The calibrated parameter values for each filter were further benchmarked against the default values by assessing the impact of the corresponding derived DTM on the estimation of forest attributes through ABA. Each of these DTM was used further into the normalization process, which provided the height of points above ground. The growing-stock volume (V, m 3 ) and the dominant height (h d , m) were estimated for each plot using ALS metrics since these attributes are frequently assessed in the ALS applications. The ALS metrics were computed for each plot using its respective normalized point cloud and considering only points higher than 1 m above ground. The normalization and the computation of ALS metrics were performed using the lidR package [45]; the set of metrics (Table 3) were used as candidate predictors in forest modeling. Percentage of first (PFR 2m ) and all returns (PAR 2m ) above 2 m Cumulative percentage of returns in the C-th layer, i.e., C 10 = 100% Others CR Canopy relief ratio: Although there are several methodologies to perform a forest attribute modeling [48], the multiple linear regression fitted using ordinary least squares was used as a showcase for its simplicity, efficiency and frequent application to ALS data [49][50][51][52]. Thus, the multiple linear model (Equation (3)) was used to estimate each forest attribute (V and h d ), where the two metrics were selected through exhaustive search.
where, √ Y is the response variable; β i is the model parameter i = 0, 1, 2; x i is the predictor i = 1, 2; and ε is the random error.
Despite this analysis focusing on the estimative efficiency of the models (no inference made), the linear regression assumptions were taken into account. The response variable was square rooted to avoid heteroscedasticity, and the residual variance component was added to the back-transformed response variable when an equation was used to estimate a forest attribute: In the exhaustive search, all possible combinations of two candidate metrics were used to fit the models using the above-mentioned dataset with 25 plots. The best model was considered to be the one with the lowest RMSE (Equation (2)), all parameters significantly different from zero (t-test, α = 5%), and with variance inflation factors lower than 10 [54] computed using the car package [55]. The RMSE was chosen for this purpose for is has been proven to be a stable and robust measure to assess model performances [52]. Each model was selected and fitted using the data from the 25 plots referred above. The fitted equation was assessed through 5-fold cross-validation. In the cross-validation the dataset is split into five sets ("folds") of equal size to start an iterative process. In each iteration one fold is omitted from the model fitting and has their values estimated by the model; the estimated values of all omitted folds at the end of the process are used to compute the final error. The entire cross-validation was repeated 100 times to reduce the randomness involved in this process [56]. Thus, the resultant RMSE values from the calibrated filter were compared to the ones derived from the default values through the Wilcoxon-Mann-Whitney test [57,58], since preliminary analysis demonstrates that the RMSE values were not normally distributed. The accuracy of the models was thus represented by the median of the error values (RMSE med ) as well as its percentage from the observed mean values RMSE% med .

Filtering Parameters Calibration
As a general result, the calibration of the filter parameters allowed to produce more accurate DTMs than those produced using the software defaults, with RMSE from 0.25 to 0.26 m when calibrated, against a variation from 0.26 to 0.30 m with the default ( Table 4). The accuracy of the DTMs produced with the calibrated PTIN had the smallest improvement when compared to that obtained by using the default parameters (a reduction of 4% in the RMSE). On the other hand, the accuracy of the DTMs produced with calibrated WLS had the highest improvement on accuracy, decreasing 16% in terms of RMSE. Excepting the PMF, the filters have more than one calibrated parameter set, so the set that has less impact on the computational effort was chosen.  The tested parameters of PTIN resulted in a small variation in the accuracy of DTMs, with RMSE ranging from 0.25 to 0.29 m (Figure 2). The DTM accuracy was less affected by the spike and most influenced by the step size-the RMSE decreased continually as the step size increased. On the other hand, larger values for step size increased the susceptibility of the filter to border effect, producing eroded DTMs. This effect was intensified when the coarse granularity was applied so that smaller values of step size were needed to derive eroded DTMs. Despite the influence of the border effect, the granularity had marginal impact on the accuracy of the DTMs, and the fine and extra fine granularity had comparable results. Therefore, the fine granularity should be preferred since it requires less computational effort during data processing.
The WLS presented high variation in the DTM accuracy among the different parameter settings (RMSE between 0.25 and 0.46 m). It is strongly and similarly affected by the parameters g and w (Figure 3), where the RMSE decreased with their decreasing in value. Additionally, the RMSE decreased as the differences in the absolute values of these parameters decreased, which means that the filter is more accurate as |g| and w values are closer. The tested number of iterations and window cell sizes did not influence the accuracy of the DTMs, but they were important concerning the border effect. Eroded DTMs occurred when |g| = w, except when the cell size was set to 1 m 2 and using three or five iterations. By setting |g| = w the filter is forced to consider only the lowest residuals to compute the averages, and positive residuals (v) are no longer accepted. Consequently, the WLS becomes less tolerant of Remote Sens. 2020, 12, 918 9 of 18 variations in the ground surface within the search window so more points are removed if a higher cell size or a higher number of iterations are used. Furthermore, there was no difference in the filtering performance when the number of iterations was set to either three or five, so the lower value (i.e., three) should be preferred to speed up the computational process.
*Differences between the calibrated parameter values and the default ones.
The tested parameters of PTIN resulted in a small variation in the accuracy of DTMs, with RMSE ranging from 0.25 to 0.29 m (Figure 2). The DTM accuracy was less affected by the spike and most influenced by the step size-the RMSE decreased continually as the step size increased. On the other hand, larger values for step size increased the susceptibility of the filter to border effect, producing eroded DTMs. This effect was intensified when the coarse granularity was applied so that smaller values of step size were needed to derive eroded DTMs. Despite the influence of the border effect, the granularity had marginal impact on the accuracy of the DTMs, and the fine and extra fine granularity had comparable results. Therefore, the fine granularity should be preferred since it requires less computational effort during data processing. The WLS presented high variation in the DTM accuracy among the different parameter settings (RMSE between 0.25 and 0.46 m). It is strongly and similarly affected by the parameters g and w (Figure 3), where the RMSE decreased with their decreasing in value. Additionally, the RMSE decreased as the differences in the absolute values of these parameters decreased, which means that the filter is more accurate as |g| and w values are closer. The tested number of iterations and window cell sizes did not influence the accuracy of the DTMs, but they were important concerning the border effect. Eroded DTMs occurred when |g| = w, except when the cell size was set to 1 m² and using three or five iterations. By setting |g| = w the filter is forced to consider only the lowest residuals to compute the averages, and positive residuals (v) are no longer accepted. Consequently, the WLS becomes less tolerant of variations in the ground surface within the search window so more points are removed if a higher cell size or a higher number of iterations are used. Furthermore, there was no difference in the filtering performance when the number of iterations was set to either three or five, so the lower value (i.e., three) should be preferred to speed up the computational process.  The calibration of the MCC did not produce eroded DTMs, and their accuracy varied between 0.26 and 0.36 m (Figure 4). The tolerance had the greatest impact on the filter's performance (i.e., the higher its value, the higher the RMSE and thus the smaller the DTM accuracy). On the other hand, the scale appears to have only marginal impact, tending to reduce the RMSE as the values increase. The tested values for scale suggested that its effect on the DTM accuracy also depends on the tolerance; when the tolerance was set to 0.1, the effect of the scale on the accuracy was marginal (RMSE between 0.26 and 0.27 m), while a wider range was observed for intermediary tolerance values (e.g., RMSE between 0.29 and 0.34 m for tolerance value equal to 0.4). Despite the most accurate DTM being produced when setting tolerance to 0.1, many values could be used for the scale (0.1-4.5). This parameter controls the cell resolution of the thin-plate spline interpolation (see Section 2.4.3), so setting larger values increases the number of returns to be interpolated. For this reason, using lower values for the scale (i.e., between 1 to 2) is advisable to reduce the computational effort. The calibration of the MCC did not produce eroded DTMs, and their accuracy varied between 0.26 and 0.36 m (Figure 4). The tolerance had the greatest impact on the filter's performance (i.e., the higher its value, the higher the RMSE and thus the smaller the DTM accuracy). On the other hand, the scale appears to have only marginal impact, tending to reduce the RMSE as the values increase. The tested values for scale suggested that its effect on the DTM accuracy also depends on the tolerance; when the tolerance was set to 0.1, the effect of the scale on the accuracy was marginal (RMSE between 0.26 and 0.27 m), while a wider range was observed for intermediary tolerance values (e.g., RMSE between 0.29 and 0.34 m for tolerance value equal to 0.4). Despite the most accurate DTM being produced when setting tolerance to 0.1, many values could be used for the scale (0. 1-4.5). This parameter controls the cell resolution of the thin-plate spline interpolation (see Section 2.4.3), so setting larger values increases the number of returns to be interpolated. For this reason, using lower values for the scale (i.e., between 1 to 2) is advisable to reduce the computational effort. The calibration of the PMF parameters resulted in the widest range of DTM accuracies, with RMSE between 0.25 and 0.56 m ( Figure 5). Both tested parameters influenced the filtering efficiency. The RMSE of the DTMs increased with the increasing of the threshold. The changes in the accuracy due to the window size were more highlighted when its values shifted from 1 to 3, but a marginal effect was noted for values higher than three. The eroded DTMs were more frequent as the threshold value was lower and the window size higher. Since the PMF uses a sequence of window size values in the filtering, the use of higher values (i.e., ≥9) showed to be not effective for the filtering efficiency in the studied area.

Estimation of Forest Attributes
The default and the calibrated parameter values listed in Table 4 were used to estimate the forest attributes through ABA. The parameters with more than one calibrated value were set as follows: Fine granularity for PTIN; g = 0 and w = 0 for WLS; and scale = 1.0 for MCC. The calibration of the PMF parameters resulted in the widest range of DTM accuracies, with RMSE between 0.25 and 0.56 m ( Figure 5). Both tested parameters influenced the filtering efficiency. The RMSE of the DTMs increased with the increasing of the threshold. The changes in the accuracy due to the window size were more highlighted when its values shifted from 1 to 3, but a marginal effect was noted for values higher than three. The eroded DTMs were more frequent as the threshold value was lower and the window size higher. Since the PMF uses a sequence of window size values in the filtering, the use of higher values (i.e., ≥9) showed to be not effective for the filtering efficiency in the studied area. The calibration of the PMF parameters resulted in the widest range of DTM accuracies, with RMSE between 0.25 and 0.56 m ( Figure 5). Both tested parameters influenced the filtering efficiency. The RMSE of the DTMs increased with the increasing of the threshold. The changes in the accuracy due to the window size were more highlighted when its values shifted from 1 to 3, but a marginal effect was noted for values higher than three. The eroded DTMs were more frequent as the threshold value was lower and the window size higher. Since the PMF uses a sequence of window size values in the filtering, the use of higher values (i.e., ≥9) showed to be not effective for the filtering efficiency in the studied area.

Estimation of Forest Attributes
The default and the calibrated parameter values listed in Table 4 were used to estimate the forest attributes through ABA. The parameters with more than one calibrated value were set as follows: Fine granularity for PTIN; g = 0 and w = 0 for WLS; and scale = 1.0 for MCC.

Estimation of Forest Attributes
The default and the calibrated parameter values listed in Table 4 were used to estimate the forest attributes through ABA. The parameters with more than one calibrated value were set as follows: Fine granularity for PTIN; g = 0 and w = 0 for WLS; and scale = 1.0 for MCC.
The dominant height equations resulted in good accuracy for all filters, with RMSE% med between 4.9% and 5.2% for the calibrated values, and between 5.0% and 5.6% for the default values (Table 5). The calibrated and the default parameters values originated equations with comparable performances for the PTIN considering a confidence level (α) of 5%. The metrics used by the equations derived from PTIN were also similar; the equation derived from the calibrated parameters used Z 65 , while the one derived from the default parameters used Z 60 . The calibration of WLS, MCC and PFM resulted in a significant improvement of the estimated dominant height accuracy when compared to that one derived using the default parameter values. The decrease in the RMSE med values due to the calibration was 0.08 m points for WLS, 0.06 m for MCC and 0.10 for PMF, which are equivalent to an improvement of 8.5%, 6.5% and 10.6%, respectively. Although their respective equations used the metric Z 95 , the ones derived for the calibrated values used Z 60 instead of C 8 in the equations of the default values, which are metrics computed with different principles (see Table 3). Table 5. Dominant height equations with their associated variances (σ 2 ), and the p-value for the Wilcoxon-Mann-Whitney test obtained with different settings of the progressive triangulated irregular network (PTIN), weighted linear least-squares interpolation (WLS), multiscale curvature classification (MCC), and progressive morphological filter (PMF). ; Z x is the height of the x-th percentile of height distribution; C x is the cumulative percentage of returns in the x-thlayer. ** Median of root mean square error (RMSE) values computed through 100 repetitions of five-fold cross-validation. The RMSE% med is shown in parenthesis.

Filter
Regarding the volume estimation, the models presented RMSE% med between 16.3% and 16.7% when using the calibrated values and 16.5% and 17.7% when using the default values (Table 6). Although these errors are higher than those encountered in the estimation of the dominant height, they could be considered low in the case of volume modeling when assessed through cross-validation. The estimation efficiency when using the calibrated PTIN and MCC did not differ from those derived with filters using their respective default values. The use of WLS and PMF with calibrated parameters significantly improved the volume estimation when comparing to their respective default values. Despite the related equations used the same metrics, the decrease of the RMSE med values by using the calibrated parameters were 0.018 m 3 for WLS and 0.044 m 3 for PMF, which is equivalent to an improvement of 3.4% and 7.9%, respectively. Table 6.
Volume equations with their associated variances (σ 2 ), and the p-value for the Wilcoxon-Mann-Whitney test obtained with different settings of the progressive triangulated irregular network (PTIN), weighted linear least-squares interpolation (WLS), multiscale curvature classification (MCC), and progressive morphological filter (PMF).

Discussion
This study demonstrated that a DTM derived from ALS is more accurate when the parameters of the filtering process are calibrated. The DTM produced with the WLS was the most affected by the calibration, followed by that of the MCC, PMF and of the PTIN (less affected). This fact influenced the estimation of forest attributes, especially for dominant height. Except for PTIN, the estimation of the dominant height derived by using the calibrated parameter values was significantly more accurate than those that are derived with the default ones. In the case of the volume estimation, the calibration of WLS and PFM derived equations with significantly better accuracies, contrary to the PTIN and MCC filters that performed comparably when using calibrated and default parameter values.
The lower effect of the calibration for PTIN is justified by the similarity between the calibrated and the default parameters values, which differs only by 0.5 in the spike parameter value (see Tables 2 and 4). The calibrated parameters for the MCC differed the most with respect to the default ones, having a large impact on the dominant height estimation as opposed to the volume estimation. The result shows that a DTM can have different impacts on the modeling of these forest attributes and that the calibration will lead to the best results depending on the filter and on the attribute to be estimated.
Although these two forest attributes are highly correlated, they present different aspects when are estimated through ALS data. Many works have reported good correlations between the tree height attributes with upper-tail percentiles of the height distribution of ALS returns [59,60]. The literature is less consensual for the case of the volume estimation, for which good performances are found based on ALS metrics associated to intermediary percentiles (higher than 50%), density metrics (e.g., C 8 ) and/or height variability measures (e.g., height kurtosis) [6,44,50]. These characteristics were also observed in our work, where the metric Z 95 appears in all dominant height equations just as the Z max and C 2 in the volume equations. However, studies focusing on the effect of point density over the metrics demonstrated that those ones related to the tail ends of the return distribution (e.g., C 1 and C 9 , or Z min and Z max ) are more sensible and, therefore, less stable [61]. Despite the point density remaining constant in our analysis, it is possible that those same metrics are also more sensitive to variations in the normalized point cloud due to errors in the DTM. This fact is important in the case of ordinary least squares regression since the predictors are selected following several rules to match the regression assumptions, so small variations in the metric values have unpredictable effects in the final model.
It should be highlighted that the quality of the estimation of stand attributes is strongly dependent on the applied modeling approach [52]. Nonparametric models, such as k-nearest neighbors or random forest, have the advantage of being distribution-free and are normally used with more predictor variables to improve their accuracy [48,62,63]. In this case, it is reasonable to suspect that using more variables would turn the models less vulnerable against changes in the metrics and, thus, the effect of the filter calibration could be hidden by an improvement in the performance of the models. However, the non-parametric approaches require a higher amount of field data for the modeling, which is not often available (as is the case of this work). Besides, traditional parametric modeling approaches have shown to be less affected by biased normalized point clouds, for instance, due to co-registration errors, so that alternative combinations of ALS metrics models can result in similar estimative efficiency [51]. The co-registration effect could be ignored in this work given the high-accurate plot positioning throughout the data collection; therefore, it corroborates that DTM quality is the major factor affecting the performances of the model in the benchmark.
The researches focused on calibration of the above tested filters are scarce in the literature and the few examples were oriented to digital aerial photogrammetry (DAP, [64,65]). One of them is the study of Graham et al. [66] who analyzed the PTIN, WLS and the simple morphological filter (SMRF, see [67]), which works similarly to the PMF. Their results share some similarities with ours: the PTIN was mostly affected by the step size parameter, while the spike had minor or no effects over the accuracy of the derived DTM; the WLS had the best accuracy with the |g| close to w; and low threshold for the SMRF (analogous in PMF). On the other hand, the parameters related to the size of the search windows of these filters (i.e., window size and step size) were exceptionally higher (≥17 m 2 ). However, we have demonstrated that using larger values for these parameters over ALS point clouds increases the susceptibility of the filters to the border effect, resulting in eroded DTM.
The PTIN was also analyzed by Wallace et al. [68] using DAP-based data, where the calibration was performed for different ecosystems. However, in contrast to the study of Wallace et al. [68], our work did not distinguish the areas regarding the terrain conditions or forest covers due to a lack of data for this discretization, especially in the forest modeling assessment. Instead, the filters were analyzed considering all ALS data available so the results of the calibration can be applied to a wider range of forest conditions. Furthermore, many works have been applied the ABA to mountainous sites with success, demonstrating that the impact of the terrain slope over the forest attribute estimation is not significant as expected [60,69,70], thereby it is unlikely that this effect can also compromise the performances of our forest models.
Most of the ground filtering benchmarks assess the accuracy through visual inspection of the filtered ground returns, which allows accounting for omission and commission errors of the filtering [20,21,26]. Although such analysis produces detailed information about the filtering process, it is highly time-consuming and can be impracticable in terms of the calibration routines like the ones performed in this study. The analysis based on the quality of DTM is thus a good and practical alternative when high-accurate ground control data is available. Additionally, further benchmarks should also account for DTM erosion, since it is prohibitive when the goal is forest modeling.
Although this work did not aim at comparing filters, it should be highlighted that all tested filters had comparable performances after the calibration, considering the accuracy of the derived DTM. This fact suggests that more efforts should be given to calibrate the ground point filters instead of finding a better one. Therefore, the software developers must be encouraged to implement adaptive filters to reduce the number of parameters to be set to process the data (e.g., [24,71,72]).
The improvement in the estimation of dominant height is of great importance for forest management since it ensures a more accurate analysis of forest site productivity [59,73]. Likewise, ALS-based models play a key role in the valuation of growing stock inventory [74,75], so reducing the errors of the estimated attributes by calibrating the filters allows increasing the liability of the assessments. However, ALS-data users must preliminarily consider the potential improvement on the DTM accuracy and forest attribute estimation before deciding to calibrate it instead of using the software's default.
This work did not consider the impact of the errors originating from different interpolation methods on the accuracy of DTM nor on the forest attribute estimation. Previous studies demonstrated the difference among the efficiency of interpolators while deriving DTM from ALS data [19,47]; despite the TIN approach usually performed the best, Stereńczak et al. [19] showed that the differences among the interpolators were reduced after calibration. Additionally, Graham et al. [76] tested several interpolation methods using DAP-data and showed that they do not have significant differences regarding the accuracy of estimated forest attributes. The same may occur in the case of ALS-data, but proper research is needed to investigate such a hypothesis. Finally, our analysis was based on a massive ground control dataset that was collected using an exhaustive and high-accurate topographic survey, which supports the liability of the DTM accuracy assessment [77]. For this reason, our results can be used as a rule of thumb and the information we provided from the filter calibration can guide the user during the ALS data processing, especially if the estimation of forest attributes is the goal.

Conclusions
The calibration of four algorithms to filter ground returns of airborne laser scanning data were assessed, namely, the progressive triangulated irregular network (PTIN), weighted linear least-squares interpolation (WLS), multiscale curvature classification (MCC) and the progressive morphological filter (PMF). The impact of the calibration was assessed on the quality of the digital terrain models (DTM) and on the forest attribute estimation accuracy, where the area-based approach (ABA) was applied. The conclusions of this work are: - The calibration of the ground filter parameters improved the quality of the DTM. - The calibrated parameter values for WLS, MCC, and PMF allowed deriving more accurate estimated forest attributes than those obtained when filtering using their default counterparts, with a more highlighted impact on the estimation of dominant height than of growing stock. - The results derived when using the PTIN filter varied the least with the calibration of the parameters.