Accuracy of Ground Surface Interpolation from Airborne Laser Scanning (ALS) Data in Dense Forest Cover

A digital model of the ground surface has many potential applications in forestry. Nowadays, Light Detection and Ranging (LiDAR) is one of the main sources for collecting morphological data. Point clouds obtained via laser scanning are used for modelling the ground surface by interpolation, a process which is affected by various errors. Using LiDAR to collect ground surface data for forestry applications is a challenging scenario, because forest vegetation hinders the ability of laser pulses to reach the ground. The density of ground observations is therefore reduced and inhomogeneous, as it is affected by variations in canopy density. Furthermore, forests often cover mountainous terrain, where interpolation of the ground surface is more challenging. In this paper, we present a comparative analysis of interpolation accuracy for nine algorithms used for generating Digital Terrain Models (DTMs) from Airborne Laser Scanning (ALS) data, in mountainous terrain covered by dense forest vegetation. Most of the algorithms achieve similar overall accuracy, with RMSE values between 0.11 and 0.28 m (at a model resolution of 0.5 m). Five of the algorithms (Natural Neighbour, Delaunay Triangulation, Multilevel B-Spline, Thin-Plate Spline and Thin-Plate Spline by TIN) have vertical errors of less than 0.20 m for over 90 percent of validation points. Meanwhile, for most algorithms, major vertical errors (of over 1 m) affect less than 0.05 percent of validation points. DTM resolution, ground slope and point cloud density influence the quality of the ground surface model, while canopy density shows a less significant link with the quality of the interpolated DTMs.


Introduction
In the past, forestry research typically relied on two-dimensional representations (images) of forested areas [1][2][3]. The advent of Light Detection and Ranging (LiDAR) and the development of navigational systems such as Global Navigational Satellite System (GNSS) or Inertial Measurement Unit (IMU) enabled scientists to analyse the forest in a tri-dimensional space, using data formats such as point clouds or surface models [4]. Mounting a LiDAR sensor on an aerial platform (a method commonly referred to as Airborne Laser Scanning-ALS) allows efficient collection of high-precision data for relatively large areas. The result of LiDAR data collection is a data structure called point cloud, which can be used to obtain information related to the horizontal and vertical structure of the forest [5], from the top of the canopy to the ground level; in other words, laser scanning offers a new perception on forest structure and processes, at a local or regional level [6].
From a LiDAR point cloud, certain forest parameters can be determined, such as aboveground forest biomass [7][8][9], Leaf Area Index (LAI) [10][11][12][13][14] or canopy density [15,16].

To compare algorithm performance, we used the cross-validation method, which involves the random separation of observations into two datasets: one used for prediction and the other for validation. This method provides a pseudo-external dataset in situations where using an independent dataset for validation is not feasible. Any systematic error (e.g., the measurement error of the LiDAR system) will evidently not be taken into account, but the method can still provide an estimation of the relative accuracy of interpolation methods. External factors that were taken into account for the analysis of DTM accuracy are: model resolution, ground slope, ground point density and canopy cover density.
In addition, recognizing that free DEMs with worldwide coverage such as Advanced Spaceborne Thermal Emission Reflection Radiometer (ASTER) [52] or Shuttle Radar Topography Mission (SRTM) [53] are common products in forestry practice and research (especially at regional scales), we established the following as a secondary objective: a comparison between the most accurate model obtained from the interpolation of LiDAR data and the ASTER/SRTM datasets available for our test site. These models are global in coverage and operate at a different scale in terms of space (their resolution of 1-arcsecond corresponds to a cell-size of about 21 × 30 m at the latitude of our test area) and level of precision, so they are not suitable as validation data for ALS data or the interpolation methods that were tested. Therefore, results pertaining to this secondary objective are presented in Appendix A.

Test Site
The analysis was carried out in a test site with an area of 1.6 km², located in the Latoriței Mountains of Romania, along the Bora River valley (Figure 1). Most of the area is covered with forest vegetation, mainly spruce (90 percent) and pine (10 percent). The majority of tree stands have an age of 90-95 years, with younger stands (55-75 years) also being present to a smaller extent. The most recent forest management plan indicates a high canopy density, with an average value of 0.8. The area is characterised by steep, mountainous relief, with ground slope having an average value of 26 degrees.

LiDAR Data
LiDAR data was collected by ALS using a full-waveform Riegl LMS-Q560 sensor mounted on a Diamond Aircraft Industries DA42 MPP airplane. Parameters of the laser sensor and flight technical data are presented in Table 1. The resulting point cloud has an average density of 5.2 points/m², which was achieved by scanning the same area two or three times. LiDAR data was delivered in the Universal Transverse Mercator (UTM) projection (zone 35N), with matched scan strips but unclassified point data.
As previously discussed, the initial point cloud contains both ground-level returns (or ground points) and returns caused by various above-ground objects, such as forest vegetation [54]. To accurately model the ground surface, above-ground laser returns have to be eliminated from the initial point cloud, in a process commonly called ground filtering [26]. To achieve this, we used a combined method: initially, we applied an automatic ground filter using the lasground algorithm included in the LAStools software package (rapidlasso GmbH). The filtered point cloud was then visually checked, in order to manually correct any noticeable filtering errors. Therefore, the filtered dataset should theoretically contain only points located at (or close to) the ground level. This final point cloud has an average density of 0.9 points/m², corresponding to an average point spacing of 1.06 m.
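As a quick consistency check, the reported point spacing can be approximated from the density alone: for a roughly homogeneous point distribution, each point occupies 1/density square metres, so the expected spacing is the square root of that area. A minimal sketch of this check (the function name is ours):

```python
import math

def expected_spacing(density_pts_per_m2):
    """Approximate mean point spacing (m) for a homogeneous point
    distribution with the given density (points per square metre)."""
    return math.sqrt(1.0 / density_pts_per_m2)

# 0.9 points per square metre -> ~1.05 m, close to the reported 1.06 m
# (the small difference reflects the clustering of real ground returns)
print(round(expected_spacing(0.9), 2))
```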

Validation Data and Accuracy Assessment
To estimate the accuracy of interpolation, a validation dataset has to be used for comparison. To achieve this, we used the technique called cross-validation [55,56], which involves the separation of the initial dataset into two separate sets: one used for prediction and the other for validation. In some papers, this method is referred to as true validation [42,57]. In choosing the proportion between the subsets, two aspects have to be considered [38]: (1) the validation data should contain enough observations to ensure the statistical significance of the results and (2) the integrity of the prediction set should be kept intact. There are no widely accepted guidelines for establishing the percentage of initial observations to be used for validation. As examples, [42] uses 4 percent of input data for validation, [38] uses 3 percent, [45] uses 0.1 percent (but also carries out external validation using independent data) and [47] uses 10 percent of initial data to test a newly developed method of DTM interpolation from LiDAR data.
Taking into account previous research and the specifics of our data, we established the following proportions: 95 percent of the initial ground points are used for prediction (1.06 million points), while the remaining 5 percent (n = 52,358) are used for validation. The validation dataset has an average density of 1 point per 31.9 m² and an average distance between pairs of closest points of 2.48 m (SD = 1.58 m). It is reasonable to assume that this ensures sufficient reliability of the accuracy analysis.
In other words, we randomly select 95 percent of observations to represent the input data for DTM interpolation, with the remaining 5 percent used for estimating the accuracy of the resulting models.
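The random split described above can be sketched as follows (a toy illustration with synthetic points; function and variable names are ours, not part of the study's actual workflow):

```python
import random

def split_points(points, validation_fraction=0.05, seed=42):
    """Randomly split a list of (x, y, z) points into a prediction set
    and a validation set."""
    rng = random.Random(seed)
    shuffled = points[:]              # leave the original list intact
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]   # (prediction, validation)

# Synthetic stand-in for the filtered ground points
points = [(x, y, 600.0 + 0.1 * x) for x in range(100) for y in range(10)]
prediction, validation = split_points(points)
print(len(prediction), len(validation))  # 950 50
```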
The accuracy of a DTM is estimated by calculating the residual (or vertical error) for each validation point [58]:

E(i) = Z_DTM(i) - Z_ALS(i)

where E(i) is the vertical error (or residual value) for validation point i, Z_DTM(i) is the elevation value of the raster cell in which point i is located and Z_ALS(i) is the measured elevation for point i.
To clarify, Z_DTM(i) is the elevation value estimated via interpolation for the centre of the DTM cell in which point i is located, not for its exact location. Therefore, an extra error component is added, because of the distance between validation points and the cell centres of the raster model. This effect is evidently exacerbated in rougher terrain (where elevation has a higher local variance) and for lower modelling resolutions (larger cell sizes). Based on the residual values, accuracy estimates such as mean signed/unsigned error, standard deviation and Root Mean Square Error (RMSE) were calculated.
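The residual-based accuracy measures can be sketched as follows (toy elevation values; the function name is ours):

```python
import math

def accuracy_stats(z_dtm, z_als):
    """Mean signed error, mean unsigned error and RMSE from paired
    elevations, using the residual E(i) = Z_DTM(i) - Z_ALS(i)."""
    residuals = [d - a for d, a in zip(z_dtm, z_als)]
    n = len(residuals)
    mean_signed = sum(residuals) / n
    mean_unsigned = sum(abs(e) for e in residuals) / n
    rmse = math.sqrt(sum(e * e for e in residuals) / n)
    return mean_signed, mean_unsigned, rmse

# Interpolated (DTM) vs. measured (ALS) elevations for three points
stats = accuracy_stats([612.1, 613.0, 615.4], [612.0, 613.2, 615.4])
```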
To get a clearer picture of the overall DTM accuracy, absolute (unsigned) vertical errors were calculated and classified. Error classification is by necessity subjective, as there is no accepted standard in the field of forestry for this aspect. In our case, we took into account the accuracy that is usually strived for in practice. For example, we considered absolute elevation errors of less than 0.20 m to be very low. In the end, we defined the following classification for absolute residual values: very low (less than 0.20 m), low (between 0.21 and 0.50 m), significant (between 0.51 and 1.00 m) and extreme (over 1.00 m).
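The classification can be expressed as a simple lookup; note that the treatment of values falling exactly on a class boundary (0.20, 0.50, 1.00 m) is our choice, as the text does not specify it:

```python
def classify_error(abs_error_m):
    """Classify an absolute vertical error (m) into the four classes
    defined above; boundary values are assigned to the lower class."""
    if abs_error_m <= 0.20:
        return "very low"
    if abs_error_m <= 0.50:
        return "low"
    if abs_error_m <= 1.00:
        return "significant"
    return "extreme"

print([classify_error(e) for e in (0.05, 0.35, 0.80, 1.40)])
# ['very low', 'low', 'significant', 'extreme']
```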
We emphasize that the lack of independent data for external validation means that any of the reported accuracy estimates should be taken as a measure of the relative performance of interpolation methods, not as absolute indicators of the accuracy of elevation estimates. As previously stated, the main purpose of this paper is to compare the relative performance of different interpolation methods, not to estimate absolute geodetical errors, nor to model the error budget.

Interpolation Algorithms
In a general sense, an interpolation algorithm involves the definition of a prediction function, which can be used to estimate the values for unsampled points, based on the known values of neighbouring points.
Most methods of interpolation use weighted averages of known data points to predict values for unsampled points, the difference being the method by which these weights are calculated. Therefore, for many of these algorithms the general formula for prediction can be written as follows [59]:

ẑ(x0) = Σ λi · z(xi),  i = 1, …, n

where ẑ(x0) denotes the estimated value for point x0, z(xi) is the measured value for point xi, λi is the weight associated with observation z(xi) and n is the total number of reference points used for the estimation of ẑ(x0). Nine interpolation algorithms were tested:

1. Inverse Distance Weighted (IDW) is one of the most widely used algorithms for surface modelling. The weight attributed to a reference point is based on its distance from the unsampled point for which the estimation is made, with reference points further away having a lower weight [60]. IDW is an intuitive method of interpolation, in that it assumes that there is a degree of spatial autocorrelation of measured values; in other words, closer points are likely to have similar values.
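A minimal IDW sketch (our own illustration; the power parameter defaults to 2, a common but not universal choice):

```python
import math

def idw(points, x0, y0, power=2.0):
    """Inverse Distance Weighted estimate at (x0, y0) from a list of
    (x, y, z) reference points; weight = 1 / distance**power."""
    num = den = 0.0
    for x, y, z in points:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:               # exact hit: return the measured value
            return z
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den

pts = [(0, 0, 100.0), (2, 0, 104.0)]
print(idw(pts, 1, 0))  # equidistant references -> plain average, 102.0
```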

2. Nearest Neighbour (NeN) is a simple approach: each unsampled point will be given the value of its closest measured point. This is achieved by constructing Thiessen polygons for all reference points. The Thiessen polygon of a point is the area of influence associated with that point; all unsampled points inside a polygon are closer to its centre than to any other reference point. Therefore, all unsampled points inside a Thiessen polygon will be assigned the value of the reference point in the centre of that polygon.
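In practice, a Thiessen-polygon lookup is equivalent to simply taking the closest reference point, which makes NeN straightforward to sketch (our own illustration):

```python
import math

def nearest_neighbour(points, x0, y0):
    """Assign (x0, y0) the value of its closest (x, y, z) reference
    point, i.e., the point whose Thiessen polygon contains it."""
    return min(points, key=lambda p: math.hypot(p[0] - x0, p[1] - y0))[2]

pts = [(0, 0, 100.0), (10, 0, 110.0)]
print(nearest_neighbour(pts, 3, 1))  # closest reference is (0, 0) -> 100.0
```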

3. Natural Neighbour (NN) is similar to Nearest Neighbour, but more complex [61]. The first step is the generation of a network of Thiessen polygons. Unsampled points for which the prediction is made are then overlaid with the network and a new set of Thiessen polygons are generated for them. Each new polygon will overlap with one or more of the original polygons. The centre points of those polygons are called the "natural neighbours" of the unsampled point used to generate the Thiessen polygon. The weight each reference value has in the interpolation of an unsampled point is based on the degree of overlap between initial Thiessen polygons and the polygon associated with the unsampled point.

4. Delaunay Triangulation (DT) involves the generation of a Triangular Irregular Network (TIN) from the input point data. The general principle of a Delaunay triangulation is that no known point lies inside the circumcircle of any of the generated triangles [62]. By following this approach, a Delaunay network will maximise the minimum angle of all triangles and ensure that triangle edges are relatively short. Each unsampled point is located inside a triangle of the network and its predicted value is calculated via a simple linear or polynomial interpolation using the known values for that triangle's vertices.
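SciPy's LinearNDInterpolator follows the same approach (a Delaunay triangulation plus linear interpolation within each triangle), so it can serve as a stand-in sketch; it is not the implementation used in this study. For points sampled from a plane, the interpolation is exact:

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Reference points sampled from a tilted plane z = 2x + 3y + 5
xy = np.array([[0, 0], [4, 0], [0, 4], [4, 4], [2, 1]], dtype=float)
z = 2 * xy[:, 0] + 3 * xy[:, 1] + 5

# Builds a Delaunay triangulation internally and interpolates
# linearly inside each triangle
interp = LinearNDInterpolator(xy, z)
print(float(interp(1.0, 1.0)))  # exact for a plane: 2 + 3 + 5 = 10.0
```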

5. Spline-based interpolators are a class of global, non-convex interpolation algorithms, which involve the fitting of a flexible surface (commonly called a spline) through a set of measured observations. The interpolation function will pass through (or close to) all measured values. These algorithms work on the assumption that each measured value has an inherent measurement error, which can be reduced by generating a smooth spline surface of minimal tension [43]. The main difference between these algorithms is the mathematical form of the function used for surface generation. For this paper, we tested four spline-based algorithms:

• Multilevel B-Spline (BS), which uses a hierarchy of coarse-to-fine control lattices to generate a sequence of B-Spline functions, the sum of which is the final interpolation function [63].

• Cubic Spline (CS), which is based on the construction of a bivariate cubic spline function [64].

• Thin-Plate Spline (TPS), which uses the namesake function to generate surfaces. Thin-Plate Spline is the 2D generalization of the cubic spline function [65].

• Thin-Plate Spline by TIN (TPS TIN), a variant of Thin-Plate Spline applied over a Triangular Irregular Network built from the input points.
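As a sketch of exact thin-plate-spline interpolation, SciPy's RBFInterpolator supports a thin-plate-spline kernel; again, this is a stand-in, not the software used in the study. With zero smoothing the fitted surface passes through every sample:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Synthetic scattered ground points (x, y) with elevations z
rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(50, 2))
z = 100 + 0.5 * xy[:, 0] - 0.2 * xy[:, 1]

# Thin-plate spline surface; smoothing=0 forces exact interpolation
tps = RBFInterpolator(xy, z, kernel="thin_plate_spline", smoothing=0.0)
z_hat = tps(xy)                    # evaluate back at the sample points
print(bool(np.allclose(z_hat, z, atol=1e-6)))  # True
```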

6. Ordinary Kriging (OK) is a geostatistical interpolation procedure that aims to quantify the degree of spatial auto-correlation of measured values. This is done by generating a semi-variogram, which depicts the overall shape, magnitude and spatial scale for the variation of the measured variable [66]. Over this graph a model is then fitted, which is used to assign weights to observations. These weights used for interpolation are therefore a function of the spatial co-variance of the observations [46].
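A compact sketch of the ordinary kriging system for a single target cell (our own illustration; the exponential semivariogram and its parameters are arbitrary placeholders, whereas in practice the model is fitted to the empirical semivariogram):

```python
import numpy as np

def ordinary_kriging_weights(coords, x0, sill=1.0, vrange=10.0):
    """Solve the ordinary kriging system for one target location x0,
    using an exponential semivariogram g(h) = sill * (1 - exp(-h/vrange)).
    Returns the weights lambda_i, which sum to 1 by construction."""
    def gamma(h):
        return sill * (1.0 - np.exp(-h / vrange))

    n = len(coords)
    # Left-hand side: sample-to-sample semivariances, plus the
    # unbiasedness (Lagrange) row and column of ones
    A = np.ones((n + 1, n + 1))
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    # Right-hand side: sample-to-target semivariances
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - x0, axis=1))
    return np.linalg.solve(A, b)[:n]    # drop the Lagrange multiplier

coords = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
w = ordinary_kriging_weights(coords, np.array([1.0, 1.0]))
print(round(float(w.sum()), 6))  # 1.0 -- the unbiasedness constraint
```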
A summary of the interpolation methods, their main characteristics and the abbreviations used hereafter is presented in Table 2. Most algorithms are deterministic, with the exception of Ordinary Kriging, which is geostatistical. Furthermore, prediction methods vary in terms of: scale of analysis (some estimate values on a global rather than local scale), smoothing of predictions at sampled locations (exact vs. approximate methods) or shape (convex algorithms will always predict values inside the range of known values, while predictions of non-convex algorithms can fall outside the range of sampled values).

Factors Considered for DTM Accuracy
The variation of DTM accuracy was analysed by taking into account the following factors:

• Spatial resolution, which is the main characteristic of a raster data structure (such as a DTM); four model resolutions were tested: 0.5, 1.0, 1.5 and 2.0 m.

• Ground slope.

• Ground point density of the ALS point cloud.

• Canopy cover density.

Canopy density was determined in a raster structure, using the initial LiDAR point cloud and the formula proposed by [68], implemented in the FUSION LiDAR processing software:

δ(i) = n / N

where δ(i) is the canopy density for cell i of the model, n is the number of LiDAR returns inside cell i that are above a user-established height threshold and N is the total number of returns inside cell i. The height threshold was set to 3 m, assuming that LiDAR returns below this height are likely caused by objects such as understory vegetation, fallen tree trunks, boulders etc.
To ensure that the estimated canopy density is meaningful, the model resolution has to be larger than a typical tree crown diameter. We took into account the fact that, at the time of LiDAR data collection, most tree stands in our test area were close to harvesting age, in which case Romanian forestry guidelines recommend an average distance between trees of 6-8 m [69]. Therefore, we set a value of 15 m for the resolution of the canopy density raster model, on the assumption that most, if not all, tree crowns have a diameter below this.
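For a single cell of the canopy density raster, the formula above reduces to a ratio of counts; a minimal sketch (toy return heights; the function name is ours):

```python
def canopy_density(return_heights, threshold_m=3.0):
    """Canopy density for one raster cell: the fraction n/N of LiDAR
    returns above the height threshold (3 m in this study)."""
    if not return_heights:
        return 0.0
    n_above = sum(1 for h in return_heights if h > threshold_m)
    return n_above / len(return_heights)

# Toy cell: 8 canopy returns and 2 near-ground returns -> 0.8
heights = [22.5, 21.0, 19.8, 24.1, 18.7, 20.3, 23.0, 25.2, 0.4, 1.9]
print(canopy_density(heights))  # 0.8
```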

General Accuracy of Ground Surface Interpolation
By interpolation of a LiDAR point cloud containing ground returns, continuous surfaces that model the variation of ground elevation (or DTMs) were generated. A validation dataset was then overlaid with this surface in order to calculate residual values (or elevation errors). Statistical indices of these residuals, such as mean unsigned/signed error or RMSE, provide an estimate of the general accuracy of DTMs. Accuracy measures for DTMs generated from the filtered ALS point cloud are presented in Table 3, for the most accurate model resolution (0.5 m). In all cases, regardless of interpolation method or model resolution, mean signed error is less than 0.01 m. This indicates that there is no apparent bias of interpolation; in other words, no significant tendency to over/under-estimate elevation values. Mean unsigned error and RMSE values are relatively similar between algorithms, for a given resolution. As an example, mean unsigned error at a model resolution of 0.5 m varies between 0.08 m (for five of the algorithms) and 0.20 m (for NeN and OK).
The classification of absolute residual values is presented in Table 4. Since the relative distribution of errors across classes is very similar between model resolutions, only the results for the 0.5 m resolution are presented.

In addition to the quantitative analysis of accuracy, we propose that a qualitative (or visual) analysis of generated surfaces is also of interest, especially in cases where interpolation methods do not differ significantly in terms of RMSE values. A hillshade generated from the final DTMs (at the 0.5 m resolution) for a subset of the test area is presented in Figure 2. It is apparent that most interpolated surfaces contain some degree of visual artefacts. The most distinctive artefacts are present for interpolation with NeN (which has a significant noise level), DT (mostly in areas of lower point density, where the facets of the initial TIN are larger), IDW (also predominantly in areas of low point density) and CS. Visual analysis shows that spline-based methods generate smoother surfaces, closer in appearance to the real ground surface.

Effect of Model Resolution on DTM Accuracy
RMSE values for each model resolution (0.5, 1.0, 1.5 and 2.0 m) are presented in Figure 3. For each interpolation algorithm, a lower general accuracy is observed for coarser resolutions. At the same time, for a given resolution, the relative performance of the tested algorithms is for the most part similar.

Effect of External Conditions on DTM Accuracy
In order to analyse the manner in which accuracy varies depending on external factors, validation points were categorised according to established classes for each factor. Mean RMSE values for the resulting subsets of validation points were then calculated and, based on these mean RMSE values for each category, r correlation coefficients were calculated (Table 5). Three external factors were considered in the analysis: ground slope (Figure 4), ground point density of the ALS point cloud (Figure 5) and canopy cover density (Figure 6). As we find that RMSE values follow a similar variation regardless of model resolution, only results for the 0.5 m resolution are shown.
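The per-factor correlation test can be sketched with SciPy's pearsonr, which returns both r and its p-value; the RMSE values below are purely illustrative, not taken from Table 5:

```python
from scipy.stats import pearsonr

# Hypothetical mean RMSE (m) per ground-slope class for one algorithm
slope_class_midpoints = [5, 15, 25, 35, 45]     # degrees
mean_rmse = [0.10, 0.12, 0.15, 0.21, 0.34]

r, p_value = pearsonr(slope_class_midpoints, mean_rmse)
print(r > 0, p_value < 0.05)  # True True: significant positive trend
```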

Discussion
Results indicate that, for the most part, tested interpolation algorithms have a relatively similar performance, in terms of overall DTM accuracy. Mean signed errors are very close to zero in all cases, indicating the lack of a prediction bias. It is worth pointing out that some previous studies find a positive bias of elevation prediction, in the case of dense forest vegetation. For example, [70] reports a mean DTM error of 0.31 m for unlogged coniferous tree stands, while [45] report a mean DTM error of 0.23 m for uncut aspen tree stands. The effect (or lack thereof) of forest composition and/or harvesting practices on the bias of ground elevation estimates from ALS data is worth further investigation.
Accuracy estimation for different model resolutions shows that, regardless of interpolation algorithm, lower DTM resolutions are associated with higher RMSE values (Figure 3). This is in accordance with previous research [38,44,46], but it is worth pointing out that at lower resolutions the distance between raster cell centres (the points for which prediction is made during the interpolation) and validation points will likely increase.
With regard to ground slope, previous research indicates that it should have an impact on the accuracy of the ground surface representation [44,45,49,71]. This is indeed the case here, as we find that RMSE values for all algorithms increase with ground slope. However, not all algorithms are equally affected. This is especially observed in the case of very steep terrain (ground slope of over 40 degrees), where some of the algorithms have a comparatively poorer performance (Figure 4). This is confirmed by r correlation coefficient values, which highlight a very significant correlation between mean RMSE values and ground slope categories (p-value < 0.01) for these algorithms (Table 5). On the other hand, for most spline-based methods (which achieve a better overall performance) we find that the correlation between RMSE and ground slope is insignificant (p-value > 0.05). There is no clear provable explanation for this behaviour, but our assumption is that it has to do with what we called the 'shape' of algorithms (Table 2). As previously discussed, some algorithms (called convex) will always predict values inside the range of known values, while non-convex algorithms can predict values outside this range. In very rugged terrain (high slope values) it is unlikely that sampled points will cover the whole range of real elevation values. In other words, local maxima and minima will generally fall outside the sampled interval of elevation values and therefore will not be properly modelled by convex algorithms. On the other hand, non-convex algorithms (generally spline-based) are better suited to predict local maxima and minima even when those values fall outside the range of observations. This assumption is supported by the fact that three of the five convex algorithms that were tested have a degrading relative performance when ground slope increases (these being NeN, IDW and OK).
This is in line with previous research: [42] tested IDW and three spline-based interpolation methods and noted that IDW has a lower performance in rugged test plots because of its inability to properly model local maxima/minima; [43] tested four interpolation methods in a hilly area and reported that IDW produced a higher rate of error clustering near the top of the hill (local maxima) when compared to non-convex methods like TPS or Multi-Quadratic.
On the other hand, at least one of the algorithms that we tested (DT) follows the opposite trend: even if DT is convex, this method actually has the best accuracy at the highest slope class (relative to the other algorithms). Further research is necessary to establish if this is something that has to do with DT's approach to interpolation or random behaviour.
The next external factor that was considered is ground point density (Figure 5). While the reduction of point density has a noticeable effect on RMSE for the NeN, IDW and OK algorithms, the other interpolation methods that were tested have a lower variation of RMSE across point density classes. A higher variance of RMSE values between interpolation methods for lower point densities is also reported by [72]. For most algorithms, we find a significant negative correlation between mean RMSE values and point density categories (Table 5), with the sole exception of TPS (r = -0.87, p-value > 0.05).
With regard to canopy density, we find that the algorithms do not follow the same trend (Figure 6). Some of the methods that were tested (NN, DT and the spline-based methods except CS) have near-constant mean RMSE values across canopy density categories, while others (NeN, IDW and OK) have lower accuracy in dense canopy cover. In addition, OK is the only algorithm for which we find a significant positive correlation between canopy density and RMSE (r = 0.95, p-value < 0.05).
It is worth pointing out that the distribution of validation points across canopy density categories is strongly skewed toward high values: while only 4 percent of the validation points are assigned to the first canopy density category (0-10 percent canopy cover), the last category (80-100 percent canopy cover) includes 58 percent of the validation points. Therefore, mean RMSE values calculated for the lower canopy density classes have relatively wide 95 percent confidence intervals.
There is little research on the effect that vegetation conditions have on the accuracy of ground-surface modelling from ALS data, especially for the specific case of forest ecosystems. Furthermore, vegetation conditions can be examined from different perspectives: while in our study we only considered the density of the canopy cover, other researchers looked at DTM accuracy for different vegetation classes [45] or for varying understory vegetation heights [49]. As a result, it is difficult to put the results presented in Figure 6 into perspective.
Leaving aside the lowest canopy density class (0-10 percent) due to its unreliability (a wider 95 percent confidence interval caused by the low number of validation points), it is apparent that, generally speaking, canopy density has a similar effect to ground point density (Figure 5), only inverted (since an increase in canopy cover density should cause a decrease in ground point density). Therefore, if we look at these two factors in conjunction, we can see that: (1) several algorithms are not affected by changes in ground point/canopy density (NN, DT, BS, TPS, TPS TIN); (2) some algorithms perform worse at higher canopy density/lower ground point density (NeN, OK, IDW); and (3) the remaining algorithms seem to show random changes in accuracy between classes of canopy density/ground point density.
In other words, the results indicate that canopy density acts as a proxy for ground point density, in terms of its influence on the accuracy of ground-surface modelling in dense forest environments. This is contrary to our initial assumption, which was that canopy density (while evidently linked to a certain extent with ground point density) would exert a different influence on DTM interpolation. This assumption was based on the theoretical effect that canopy density would have on the spatial distribution of ALS ground points. While in open terrain a variation of ground point density (for example, caused by changing the altitude of the LiDAR sensor platform) would be evenly spread (point density is generally homogenous), increased canopy density would not only lower the overall ground point density, but also increase the clustering of these points (with most observations grouped under gaps in the canopy cover).
However, the results show that this expected clustering of ground points in areas of higher canopy density is either not significant, or does not have a noticeable influence on the interpolation of the ground surface model. Further research needs to be carried out before we can state that the effect of canopy density on DTM accuracy is fully explained by changes in ground point density. This additional research should ideally be carried out in areas with different forest compositions and with a distribution of canopy densities closer to normal (as stated, canopy density values for our test site are skewed towards the denser classes).
To sum up, results indicate that for our test conditions (mountainous areas with dense canopy cover), the relatively lower accuracy of some interpolation methods (namely Nearest Neighbour, Inverse Distance Weighted and Ordinary Kriging) is likely attributable to areas of steep slope and/or lower point densities caused by the presence of dense forest cover.

Conclusions
An accuracy assessment of DTM interpolation from LiDAR point cloud data in difficult terrain conditions (mountainous relief with dense forest cover) was carried out. In the absence of independent data of sufficient density and accuracy, we had to carry out the performance analysis of the interpolation algorithms using cross-validation, which involves setting aside a certain proportion of the initial dataset (in our case 5 percent) to use as validation. The limitation of this approach is that any systematic errors (which affect the initial dataset in its entirety) are not taken into account, so none of the reported accuracy estimates are to be taken as global estimates of modelling accuracy. In other words, the results that were presented can only be considered as indicators of the relative performance of the tested interpolation methods in a densely-forested area, which was indeed our main purpose.
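The cross-validation setup described above — holding out 5 percent of the ground points and scoring the interpolated surface by the vertical error at those points — can be sketched as follows. The point coordinates are placeholders (a real run would load the classified ALS ground returns), and the interpolation step itself is omitted:

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder ground points (x, y, z); values are synthetic.
points = rng.uniform(0.0, 100.0, size=(10_000, 3))

# Hold out 5 percent of the points for validation; the DTM would be
# interpolated from the remaining 95 percent and evaluated at the
# held-out locations.
n = len(points)
holdout = rng.choice(n, size=int(n * 0.05), replace=False)
mask = np.zeros(n, dtype=bool)
mask[holdout] = True
train_pts, val_pts = points[~mask], points[mask]

def rmse(predicted_z, observed_z):
    """Root mean square of vertical errors at the validation points."""
    predicted_z = np.asarray(predicted_z, dtype=float)
    observed_z = np.asarray(observed_z, dtype=float)
    return float(np.sqrt(np.mean((predicted_z - observed_z) ** 2)))
```

Because a single random split can be unlucky, repeating the split (or rotating folds) and averaging the RMSE gives a more stable estimate of relative algorithm performance.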
Nine interpolation methods were tested and results show that, as far as general DTM accuracy is concerned, algorithm performance is relatively similar. Therefore, we propose that a visual analysis of interpolated DTMs is warranted, alongside the quantitative one. In addition, such an analysis has the advantage that it can easily inform the end user about the quality of the ground surface model [73]. Significant visual artefacts are present for Inverse Distance Weighted, Nearest Neighbour and Ordinary Kriging. On the other hand, surfaces generated by spline-based methods are generally artefact-free; as they apply a degree of smoothing to the generated surface, DTMs obtained using these methods are more appealing (in terms of visualisation). This is not to say that visual analysis is as important as the quantitative one, since a better-looking model does not necessarily mean a more accurate prediction.
With regard to TPS and its variant TPS TIN, results do not highlight any significant difference between their respective DTMs. Therefore, the significantly longer processing time of TPS TIN (relative to TPS) is not justified.
The best overall accuracy is achieved by spline-based methods and the Natural Neighbour algorithm. These methods have a similar response to the variations of ground slope, ground point density and, to a lesser extent, canopy cover density. However, it is worth mentioning that spline-based algorithms generally involve a time-consuming process of parameter optimisation, while Natural Neighbour does not require any user input.
Overall, results indicate that ALS point clouds are a viable source for accurate ground surface modelling even in less than ideal conditions, such as steep slopes covered with dense forest vegetation. However, issues concerning model resolution, interpolation method and parameter optimisation need to be taken into account.
While the effect of model resolution, ground slope and the ground point density of the ALS point cloud have been established for our test area, the influence of canopy density on different interpolation algorithms is not entirely clear and warrants further research.
A secondary objective of this paper was to provide a comparison between the most accurate DTM we obtained and two freely available elevation models that are commonly used in forestry operations and research (especially at a regional scale): ASTER and SRTM (both at v3, the latest version available), which will be identified as DTM ASTER and DTM SRTM going forward. The DTM interpolated from ALS data that was chosen for this comparison was generated using the Natural Neighbour (NN) algorithm, and will be identified as DTM NN. This algorithm was chosen not only because of its comparatively good performance (on par with spline-based methods), but also because of its advantages from the end-user perspective: no parameters to set, an easy-to-understand approach to interpolation and faster DTM generation than most other algorithms. Elevation values of DTM NN were translated to the EGM96 vertical datum, in order to coincide with the datum of the ASTER/SRTM data.
The comparison was carried out at five spatial resolutions: 0.5, 1.0, 1.5, 2.0 and 21.0 m. DTM ASTER and DTM SRTM had an initial resolution of 1 arc-second (a cell size of approx. 21 × 30 m at the latitude of our test site), so they had to be resampled in order to obtain the appropriate resolutions. It is worth pointing out that resampling does not add any new information to the ASTER/SRTM models, but it is necessary in order to overlay and compare the models. Additionally, in order to test DTM ASTER and DTM SRTM at their native resolution, one of the DTM NN models (initially at a 0.5 m resolution) was downgraded to a resolution of 21.0 m by averaging cell values. Figure A1 presents the DTMs for a subset of the test site, for the three datasets that were compared. Visual analysis shows that DTM NN offers a much more detailed representation of the ground surface, highlighting local features such as gullies or forest tracks. Meanwhile, DTM ASTER and DTM SRTM only offer a more general perspective, representing larger changes in the landscape.
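The downgrading of the 0.5 m DTM NN to 21.0 m by averaging cell values amounts to a block mean over 42 × 42 cell windows. A minimal sketch, assuming the grid dimensions divide evenly by the factor (shown on a toy grid):

```python
import numpy as np

def downsample_mean(dtm, factor):
    """Downgrade a DTM by averaging square blocks of cells.

    Going from 0.5 m to 21.0 m resolution corresponds to factor = 42.
    Assumes both grid dimensions are exact multiples of the factor.
    """
    rows, cols = dtm.shape
    blocks = dtm.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Toy 4 x 4 grid averaged in 2 x 2 blocks:
grid = np.arange(16, dtype=float).reshape(4, 4)
coarse = downsample_mean(grid, 2)
```

Note that block averaging preserves the overall mean elevation of the grid, which is why it is preferable to nearest-neighbour decimation when downgrading for comparison purposes.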
For a quantitative comparison, raster models of vertical deviation ( Figure A2) were generated for the ASTER dataset (by subtracting cell values of DTM NN from DTM ASTER ) and for the SRTM dataset (by subtracting cell values of DTM NN from DTM SRTM ). Since the purpose of this appendix is to compare (rather than validate) SRTM/ASTER data with a DTM interpolated from ALS data, we will refer to the differences of elevation values between these models as vertical deviation or vertical displacement instead of vertical error. General statistics for vertical displacement between ASTER/SRTM data and DTM NN are presented in Table A1.
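The vertical deviation rasters and the summary statistics behind Table A1 boil down to a cell-wise subtraction followed by a mean (signed bias) and a root mean square (overall deviation). A sketch on a toy grid with invented values:

```python
import numpy as np

# Toy co-registered elevation grids (values invented for illustration);
# in the paper these would be DTM_ASTER/DTM_SRTM and DTM_NN.
dtm_coarse = np.array([[ 98.0, 103.0],
                       [110.0, 101.0]])
dtm_nn     = np.array([[100.0, 102.0],
                       [104.0, 106.0]])

# Vertical deviation raster: the ALS-derived reference model is
# subtracted cell-wise from the coarser model.
deviation = dtm_coarse - dtm_nn

mean_signed = float(deviation.mean())              # systematic bias
rms_dev = float(np.sqrt(np.mean(deviation ** 2)))  # overall deviation
```

The two statistics answer different questions: the signed mean can be near zero even when individual cells deviate strongly (positive and negative errors cancel), whereas the root mean square captures the typical magnitude of the deviation regardless of sign.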
While the mean signed displacement values show significant deviations from DTM NN for both DTM ASTER and DTM SRTM, the other statistics (for example, the root mean square of vertical displacement values) indicate that the two models have an overall similar deviation from DTM NN. No effect of model resolution on the overall displacement between the ground-surface models is noticeable. This is not surprising, as there are no functional differences between the original ASTER/SRTM models at a resolution of approx. 21.0 m and the models resampled at higher resolutions. Going by the root mean square of vertical deviations, an overall difference of around 40-42 m is to be expected between the ASTER/SRTM data and DTM NN, regardless of model resolution.
Since the vertical displacement raster models are very similar in appearance across resolutions, only the results for the 21.0 m resolution (closest to the native resolution of the ASTER/SRTM data) are presented (Table A2). Results show that, for both the ASTER and SRTM models, a large proportion of vertical displacement values are over 10.0 m. While the distribution of values per class is similar between the two models, DTM SRTM seems to deviate less from DTM NN, at least in some areas.
A representation of the two vertical displacement raster models is presented in Figure A2. Visual analysis of these representations indicates a significant effect of slope aspect on the direction of the vertical displacement, for both the ASTER and SRTM models. Slopes facing westward (on the eastern side of the test plot) have negative deviations (the ASTER/SRTM models are below DTM NN), with deviations seemingly increasing with altitude; deviations for DTM ASTER reach values as low as −82 m in the highest areas, while DTM SRTM deviations reach −90 m in the same areas. Meanwhile, slopes facing eastward (mostly on the western side of the test plot) have positive deviations (the ASTER/SRTM models are above DTM NN), with deviations also seemingly increasing with altitude; on this side, deviations for DTM ASTER reach values as high as 80 m in the highest areas, while DTM SRTM deviations reach 66 m in the same areas.
Figure A1. DTMs for a subset of the test site, generated from: (a) ASTER data, at the original 21.0 m resolution; (b) SRTM data, at the original 21.0 m resolution; and (c) ALS data, interpolated at 0.5 m resolution using the Natural Neighbour algorithm.