Assessing the Ability of Image Based Point Clouds Captured from a UAV to Measure the Terrain in the Presence of Canopy Cover

Point clouds captured from Unmanned Aerial Systems are increasingly relied upon to provide information describing the structure of forests. The quality of the information derived from these point clouds is dependent on a range of variables, including the type and structure of the forest, weather conditions and flying parameters. A key requirement to achieve accurate estimates of height based metrics describing forest structure is a source of ground information. This study explores the availability and reliability of ground surface points available within point clouds captured in six forests of different structure (canopy cover and height), using three image capture and processing strategies, consisting of nadir, oblique and composite nadir/oblique image networks. The ground information was extracted through manual segmentation of the point clouds as well as through the use of two commonly used ground filters, LAStools lasground and the Cloth Simulation Filter. The outcomes of these strategies were assessed against ground control captured with a Total Station. Results indicate that a small increase in the number of ground points captured (between 0 and 5% of a 10 m radius plot) can be achieved through the use of a composite image network. In the case of manually identified ground points, this reduced the root mean square error (RMSE) error of the terrain model by between 1 and 11 cm, with greater reductions seen in plots with high canopy cover. The ground filters trialled were not able to exploit the extra information in the point clouds and inconsistent results in terrain RMSE were obtained across the various plots and imaging network configurations. The use of a composite network also provided greater penetration into the canopy, which is likely to improve the representation of mid-canopy elements.


UAS Image-Based Point Clouds in Forestry
Unmanned Aerial Systems (UAS) equipped with imaging sensors are increasingly being used to capture 3D information describing forested areas [1][2][3][4][5][6][7]. Estimates of canopy height, canopy cover, tree count and biomass have all been successfully derived from this 3D information [1,2,8,9]. As such, the low cost of UAS and the flexible data collection strategies they allow have led to them being trialled in a range of applications requiring accurate characterisation of forests [7].
Most research making use of UASs equipped with imaging sensors to measure 3D forest structure follow similar workflows. Imagery is captured with high forward-and side-lap (typically >80%) along parallel flight lines [6]. This allows the use of Structure from Motion (SfM) algorithms to estimate the 3D structure of the environment based on the geometry of matching points found within two or more images [5]. This process produces a point cloud from which the required information can then be derived based on the vertical distribution or spatial arrangement of the points [4,5].
Several studies have evaluated the effect of varying the parameters used in this workflow on the canopy representation provided. Dandois et al. [5], for example, undertook a comprehensive examination of the flying parameters (image overlap, camera settings) and environmental conditions (cloud cover) on the outcomes of this workflow. This study showed that the information retrieved about the canopy can be optimised primarily through the use of high image overlap on clear sky days [5]. Similarly, Frey et al. [6] showed that high image overlap combined with a low ground sampling distance produces improved canopy information. Other studies have shown that the outcomes can vary with the use of different software packages and parametrisations used in the image reconstruction and metric extraction stages [5,10].

Describing the Terrain
Having accurate information describing the terrain is essential for forest measurement using 3D data [11]. Accurate terrain height estimation allows the point cloud to be normalized such that each points height attribute represents its nadir distance from the ground. Three main approaches have been employed within the literature to obtain this terrain information for use with UAS image-based point clouds: 1. using information contained in the point cloud to describe the terrain; 2.
using a description of the terrain derived from an alternate source; and 3.
extracting information from the point cloud without description of the a terrain.
Image-based reconstruction of a 3D scene relies on having multiple views of the same feature in order to provide an estimate of the relative 3D location of that feature with reference to other features in the scene [12]. When the target feature is the terrain, canopy elements (along with understorey/ground vegetation) can obscure the terrain reducing the likelihood of multiple views occurring within an image network. This results in limited terrain points being resolved in areas of high canopy cover. Nevertheless, several studies have extracted terrain description from UAS image-based point clouds, primarily in open canopy forests, to provide accurate terrain measurements. Wallace et al. [1], Giannetti et al. [2], Graham et al. [13], Goodbody et al. [14] for example showed that terrain information could be retrieved with similar accuracy (root mean square error (RMSE) of between 0.5-1.5 m) to that obtained using laser scanning datasets.
Given the reliance on having multiple views of the terrain area can not be resolved in all canopy conditions, alternative remote sensing datasets such as Airborne Laser Scanning (ALS) and RADAR have been used to provide a more complete source of terrain information [5,15]. In the case of ALS data terrain representations can be extracted with relative accuracies of between 0.1 and 1 m depending on the environment and the properties of the data capture (i.e., sensor, footprint size, point density and flying height) [16,17]. While using novel approaches to coregister the two datasets, studies such as Jayathunga et al. [18], and Puliti et al. [19] have successfully provided measures of forests metrics, timber volume, Lorey's mean height, stem number and basal area. Although the use of alternative sources of terrain information can be successful, often this data is not available and when it is, the co-registration of two datasets is complicated by the use of low cost positioning sensors onboard UAS [13].
A third approach to provide a description of the forest properties are modelled based on the metric's explanatory variables extracted from the unnormalized point cloud and from a Digital Surface Model [2]. Using this approach Giannetti et al. [2] showed that the growing stock volume of European mixed forests could be modelled with similar accuracy (relative RMSE less than 20%) to models using point clouds normalized using ALS data. Although this technique was successful in retrieving merchantable timber volume in the studied environments, the models used to capture variations in this metric were complex and employed different combinations of variables between the sites. This suggests the approach is non-transferable between sites and is likely to limit the use of this approach to local case studies.

Aims and Objectives
Given the limitations of alternative approaches to obtaining terrain information, understanding the limitation of stand-alone UAS surveys for capturing terrain information is important for their use in operational contexts. This paper seeks to investigate the information content describing the ground that can be achieved from a low cost UAS across a range of canopy conditions, and the accuracy at which this content can be retrieved using automatic classification algorithms. Furthermore, new approaches to data capture are proposed that may allow both information describing the terrain and the canopy to be enhanced within these environments.

Study Area and Reference Data
Data were captured at six sites, spanning a range of typical canopy cover conditions found in Chile (Table 1). Two sites in pine plantations in the Maule region of Central Chile: located in Pantanillos (PNT01) and Empedrado (EMP01). PNT01 was burnt in the 2016 Chilean wildfires and while having a tree height of approximately 10 m the canopy cover was minimal, consisting only of leafless stems. The trees at EMP01 were approximately 15 m tall with overlapping crowns along planting rows and space between rows. Both PNT01 and EMP01 were located in relatively flat terrain. A second site at Empedrado (EMP02) was located in a disturbed native forest with a 15-17 m overstorey consisting of Nothofagus glauca forest ("hualo") and understory with some sclerophyll species such as Lomatia hirsuta, Aristotelia chilensis and Baccharis sp. A final site at Frutillar (FRT01) consisting of natural vegetation is constituted by a relict of the Valdivian Laurifolio Forest with Eucryphia cordifolia (ulmo), Laureliopsis philippiana (tepa) and Aextoxicon punctatum (olivillo) as dominant species. Chusquea quila (quila or "chilean bamboo") is the main understory species. At each site, a 10 m radius plot was established within a representative area. Ground control points (GCPs) for linking the airborne data to terrestrial information comprised of white plastic chopping boards with a reference point marked in black electrical tape. These marks provided accurate identification of the GCP locations in the images, as well as being easy to locate for ground-based measurements. For each of the plots, between 10 to 15 GCPs were distributed around the plot and immediate surrounds, with emphasis placed on locating boards so they could be seen in the airborne imagery (i.e., below canopy gaps). The position of each GCP was measured within a local coordinate system with a Sokkia total station. The instrument was set as near as practicable to the central point of each plot, and an arbitrary local reference direction was set.
Reference terrain points were also captured using the same total station. If the density of vegetation at the near-surface level was low enough to allow for it, ground heights were recorded by taping out a 2.5 m × 2.5 m grid and measuring the height at each point on the grid. The understorey vegetation at EMP02 and ENS02 made taping a grid impractical. As such spot heights were measured across the plot where accessible, at a point density equal to or greater than a 2.5 m grid.

Data Collection and Point Cloud Generation
Images were captured from a DJI Phantom Pro-4 multirotor UAS using the platform's on-board 20 megapixel camera. The camera has a 25 mm CMOS sensor with an 84 • field of view and a nominal focal length of 8.8 mm/24 mm (35 mm format equivalent). At each plot two UAS flights were flown autonomously at a flying height of approximately 50 m above the ground surface. Vehicle speed was typically about 2 m/s, with shutter speeds at 1/200 s or faster, to reduce the possibility of image blur. The first flight was flown to capture nadir looking imagery, with images taken at 3 s intervals to achieve a nominal forward overlap of 90 %, while a 90 % side overlap was maintained for all flights. The second flight was flown along the same flight orientation with the same flying speed but this time capturing oblique imagery at 15 • off nadir. Flight lines for this image capture were extended to accommodate for the loss of image coverage on the trailing side of the platform.
From the two flights, three different image networks were constructed for use in generating a point cloud. The first and second point clouds were constructed with the nadir images and oblique images respectively. The third point cloud was generated by selecting two thirds of the nadir image and oblique imagery and merging these into a composite image network. The image selection was designed to maximize the average minimum distance between images within 5 by 5 m subsets across the image network space, where the average minimum distance d x was calculated as; where d x is the distance between each image and its nearest neighbour in the image set X [20]. This approach was taken to ensure there were sufficient images to achieve adequate image matching across the entire target area, whilst avoiding images captured from collocated positions being used in the SfM solution. Point clouds were generated for each image network using Photoscan Pro v1.4.1, Agisoft, Russia. The high quality setting was used for the align photo stage and the high setting with mild depth filtering for the build dense cloud phase. The camera calibration parameters solved in the composite solution were used and held fixed when processing the nadir and oblique image networks. This approach was taken as the composite solution provides a stronger imaging network and the control points were typically better distributed across the image space, resulting in a more robust set of camera calibration parameters.
To register each point cloud within the local coordinate system, each GCP was identified in a minimum of five images from each image set. Any GCPs which could not be identified in five images in any of the image sets for each site were eliminated from use as control. This resulted in between six and 12 GCPs available at each site. From this set of GCPs, a minimum of four were used to register the point cloud, while a consistent set of two to four GCPs were used as check points to ensure a similar registration accuracy was achieved for the point cloud from each image network.
After the point clouds were generated and registered they were clipped to a 25 m radius area surrounding the plot center. Commonly, image based point clouds within forests will contain a number of points below the expected ground level. To remove these points, a noise filter following Rusu et al. [21] was applied to the point cloud to identify any spurious points. This filter calculates the mean distance to the 100 nearest neighbours and then identifies any points with a mean distance 1 standard deviation (σ) greater than mean of the population as noise. Any points identified as noise were not used in the ground identification process, however, any points identified as noise but occurring above the ground were included in the calculation of vegetation profiles. Thus this process eliminated any below ground points from evaluation.

Ground Filtering
In this study, three approaches were used to identify ground points. The first approach involved manually identifying the ground points in each point cloud. This was completed using a custom user interface that displayed a 0.50 m wide transect of the point cloud, and the estimated ground surface based on the linear interpolation of the reference surface network. Ground points were then identified by an operator utilising the reference information, the point cloud geometry, the point colour, as well as knowledge of the site conditions.
In addition, two existing ground filters, developed for use with ALS data, were applied. The two filters trialled were the lasground filter implemented in LAStools, which is a Progressive Triangulated Irregular Network (TIN) densification algorithm [22], and the Cloth Simulation Filter (CSF) [23]. Both filters require parameterisation to achieve an accurate result. These parameterisations were achieved by running the filters multiple times across the parameter sets outlined in Table 2. The optimum result was determined based on the classification accuracy in comparison to the set of manually identified ground points. Mathews Correlation Coefficient (MCC) was used as the classification metric. MCC was used as it is a balanced measure of accuracy even in the presence of classes of different sizes and in this case it was expected that there would be significantly more vegetation points than ground points [24].
In order to account for differences in point density across each scene the points were projected into a 0.01 m resolution voxel space prior to determining the classification outcome. This approach was taken as the contribution of isolated ground points to improving the final accuracy of the terrain representation is likely to be greater than the inclusion of all points from a densely covered area.

DTM Creation
From the set of ground points obtained through filtering, a Digital Terrain Model was constructed. This is typically achieved by interpolating between the available ground points and generating a surface model. There are a number of different algorithms available to achieve this. Given the small size of the plot areas and the likelihood that these algorithms are all likely to perform similarly across such small areas [25], a Triangular Irregular Network (TIN) interpolation approach was adopted and applied. To achieve this, first the set of ground points were decimated by removing all points that were not the minimum within a 0.01 m radius of their location. This removed any noise within the ground definition and decreased the number of TIN facets required to be built. A TIN was then constructed based on this set of minimum points. The terrain height at the location of a point (non-ground or control) was then determined as the height of the TIN facet at that point.

DTM Evaluation
The ability to identify the ground from the point cloud result of each image set was determined based on the amount of ground present within a 15 m radius of the plot centre, and the distribution of these points classified as ground. The area observed was calculated as the percentage of DTM grid cells containing at least one point identified as ground. The quality of the point distribution was determined based on the mean minimum distance of each GCP to an identified ground point in four directional quadrants (d quad ). To achieve this, the angle and distance between the point being assessed and every other point was determined. Each point was then assigned to one of four 90 • quadrant (with quadrants defined between 1-90, 91-180, 181-270 and 271-360 • ) based on the calculated angle. The minimum distance in each quadrant was then determined, with d quad being determined as the mean of these four distances. If no point was present in a quadrant, a distance of 10 m was assigned to this quadrant. This approach was taken as interpolation outcomes are likely to be improved if there is a nearby point in each direction, particularly in variably sloped terrain. The accuracy of the DTM was also evaluated based on the RMSE between the height of the DTM and the GCPs at their location.

Changes to Forest Characteristics
The influence of the image capture strategy on structural forest metrics was also evaluated. For each point within the point cloud the above ground height (AGH) was calculated based on a TIN created using the reference grid points and on the identified ground points for each strategy as outlined in Section 2.4.
Several metrics describing the vertical distribution of vegetation elements were computed and compared at the plot level between image sets and ground reference sources. These metrics included plot level mean canopy height and canopy cover and AGH percentiles (25th, 50th, 95th). Canopy height and cover were calculated based on the maximum above ground height within each cell of a 0.05 m grid, where any cell with an AGH of greater than 1.3 m was considered to belong to the canopy. While AGH percentiles (25th, 50th, 95th) were calculated on the distribution of 0.01 m resolution voxels containing one or more points.

Pointcloud Co-Registration
The accuracy of the registration based on GCP checks was consistent across most of the sites and image network configurations (Table 3). ENS01 and FRT01 had comparatively high horizontal RMSE against the check points in all image network configurations. At these sites only a small number of control points were visible due to difficulties locating the boards in a sufficient number of images (ENS01: 6 and FRT01: 8) limiting the information available to determine the source of this increased error. Nevertheless, the accuracy achieved in all sites was still considered sufficient for the study. Table 3. The number of control and check points along with the horizontal (H) and vertical (V) root mean square error (RMSE) of the check points in each photoset. Although up to 15 ground control points were distributed in the area surrounding each plot these were not always present in four or more images of each photoset. As such a common set of control and check points were used at each site for each photo set.

Manual Ground Identification and Terrain Modelling
As expected, the amount of ground manually identified was strongly correlated with the canopy cover present at each site ( Figure 1). In the low canopy cover sites (ENS01 and PNT01), point clouds from each of the image networks were able to recover ground information for the majority of the plot area (greater than 80% of plot area) (Table 4). Conversely,in sites with high canopy cover (EMP02, FRT01 and ENS02), point clouds from each of the image network configurations recovered minimal ground information (points covering less than 5% of plot area). Two of the biggest differences in ground content retrieval were seen in the two plots at Empedrado (EMP01 and EMP02). At EMP01 the percentage of the area for which ground information was retrieved varied from 12 % with only oblique imagery to 26 % when the composite image set was used ( Table 4). The increased information was observed to be a widening of the ground area retrieved between the planting rows. Conversely, ground cover varied by only 2 % between image network configurations at EMP02. However, at this site, the point cloud produced from the composite image network contained ground observed in smaller canopy gaps (Figure 1). This increase in ground point patches and improved distribution at EMP02 was represented by a reduction in d quad ranging from 4.34 m to 3.26 m (Table 4). A similar effect was seen at the FRT01 and ENS02 sites, where d quad was reduced and only a minimal decrease in ground cover percent was found when using the composite image set. This is due to the increased likelihood of the ground being observed from multiple images through the small gaps. At the FRT01 site this improved distribution resulted in a decrease in RMSE of 0.11 m (Table 4). Whilst the RMSE typically decreased through the inclusion of both types of imagery across the other sites, only minimal variations (within the coregistration accuracy of the point clouds) were observed.

Ground Filter Performance and Effects on Terrain Modelling
Both the CSF and LAStools lasground filters were able to successfully identify ground points in point clouds, with ground points in more than 20 % of the plot area observed within the point cloud (Table 5). At the low canopy cover sites (ENS01 and PNT01) the RMSE calculated using the DTMs produced using the filtered ground points did not increase markedly above the RMSE found using the manually identified ground points (see Tables 4 and 5). The only exception to this is the lasground filtered result of the ENS01 nadir point cloud where the RMSE increased from 0.07 m to 0.43 m. This increase was the result of canopy points (up to 10 m above the ground) being identified as ground. A similar issue occurred for all point clouds at the EMP02, FRT01 and ENS02 sites and the oblique pointcloud captured at EMP01. In these cases, the sparseness of the ground information below canopy elements resulted in increases in RMSEs (of up to 14.78 m at FRT01) due to lasground identifying canopy points as ground. It is likely that the RMSE are inflated at these sites through the use of classification accuracy to optimise the parametrisation. Improved parameters would most likely be found using a large step size and foregoing the correct classification of dispersed patches. At the high canopy cover sites the DTM built on the CSF identified control points also resulted in increased RMSEs (up to 0.44 m) ( Table 5).

Effects on Vegetation Metrics
Variations in forest metrics were seen to be caused by both image capture strategy and ground source ( Table 6). Canopy cover and mean height were consistent (cover within 5 % and height within 0.5 m) between the nadir and composite datasets of all sites except for PNT01 and ENS01. The oblique pointcloud produced similarly consistent results, however differences in mean height (of 0.81 m and 0.64 m between the nadir and composite mean heights respectively) were observed at EMP01. The greatest variation in canopy cover was seen at the ENS01 site where the reference cover was estimated to be 10 % lower in the nadir dataset in comparison to the oblique and composite estimates. Examination of the point cloud suggests that canopy elements were not captured completely in the nadir point cloud (Figure 2a). This also resulted in significantly lower percentiles (Figure 3c).
At PNT01 the stems (which were both leafless and branchless) were not well represented in any of the image sets. This resulted in significant inconsistencies in canopy cover and height as well as AGH percentiles (Figure 3f).
Whilst canopy cover and mean canopy height at the high canopy cover sites EMP02, ENS02 and FRT01 were similar (within 5 % for cover and 0.05 m for height), small differences in the lower AGH percentiles were observed Figure 3. These metrics were typically lower in the composite image set suggesting that the vertical profile of the sites is being better observed by this point cloud. This is confirmed upon examination of the point clouds as seen in Figure 2b at the ENS02 site where the stems have improved definition in the composite (blue) pointcloud.
As expected the error in the DTM introduced by the ground filters also caused corresponding variations in the mean canopy height and AGH percentiles (Table 5). These differences were typically greater than seen in the RMSE of ground points. This is due to the ground being poorly defined in the point clouds under canopy and the DTM relying on interpolated results in these areas. Differences in the estimate of canopy cover only occurred in FRT01 with the issues caused through the optimisation of the lasground filter.  show the distribution of Above Ground Height (AGH) calculated using the reference ground points and the dashed lines to the right show the distribution of AGH calculated using the manually defined ground points. The AGH 95th (dots), 50th (triangles) and 25th (dashes) percentiles for the reference distribution (ρ r ) and manual ground distributions are given in the two side panels.

Discussion
The ability to represent the terrain is crucial in extracting structural information from UAS captured image based point clouds [13,14]. Whilst previous studies have developed strategies to overcome the lack of ground points present in this type of data, the availability of ground information in the presence of different canopy conditions has only been quantified on a few occasions (see Wallace et al. [1], Graham et al. [13], Goodbody et al. [14] for examples). In these studies reference ground information is based on an alternate remote sensing dataset. In this case, there is likely to be errors in the reference ground representation. For example, in ALS data these errors can be up to 1 m [17]. In these studies, errors in the co-registration of the two point clouds are also likely propagate into terrain height errors [13,14]. In order to counter these effects, this study utilised a local coordinate system. The coregistration of the image based point clouds in this system was achieved with low RMSE (less than 4 cm). Highly accurate reference terrain information was also measured using a total station within this coordinate system. This allows the best case scenario for assessing the accuracy of the ground representations in these point clouds.
While the amount of ground present in the point cloud was strongly correlated to canopy cover, the accuracy of the final terrain model, in terms of RMSE, was not as consistent. Using the manually identified ground points and nadir only imagery, RMSE in the three high canopy cover plots ranged between 0.07 and 0.30 m. There is likely to be a number of causes for these differences. FRT01, which had the highest RMSE, had a significantly taller canopy vegetation in comparison to the plots at EMP02 and ENS02. This resulted in images that had minimal information describing the ground and where canopy gaps were present the ground was in deep shadow and the dynamic range of the sensor was unable to capture information describing the terrain. Other factors that are likely to affect the accuracy of the final terrain model include understorey vegetation cover and terrain complexity (both highest at ENS02).
Whilst the filtering algorithms trialled in this study achieved high classification accuracies at a number of the sites, the RMSE typically increased significantly over the RMSE if the terrain models generated with manually identified ground points. The only exceptions to this were at the PNT01 and ENS01 sites where the minimal leaf area allowed the flat terrain to be easily identified by the filters. It is acknowledged within the literature that the filters often applied to identify ground with image based point clouds were developed for ALS datasets [13]. The PNT01 and ENS01 datasets contained even distributions of ground points across the plots which is similar to what ALS data can capture. As such, the filters determined a set of points that accurately represented the ground. The sparse set of ground points at the other sites present data eccentricities that these filters were not intended to overcome. A key step to fully exploit the ground information in image based point clouds is the development of a filter that can cope with the eccentricities of image based point clouds.
There are a number of factors that are likely to change the availability of ground within a point cloud. The interaction of other flight variables such as flying height and speed, image overlap camera quality and settings as well as software settings, wind and lighting conditions throughout the canopy profile all play a role in the ability of image-based point clouds to observe terrain features [5,6,14,26]. The standard approach to capturing forested areas using image based point clouds is to use highly overlapping nadir imagery [1,5,13]. In this study we introduced oblique imagery captured at 15 • off-nadir in an attempt to increase amount of information retrieved below the canopy. Whilst oblique imagery alone provided a less complete representation of the plots, using a composite strategy (combining vertical and oblique) optimized the spacing between image captures over two flight was shown to improve the availability of ground points as well as improve the representation of mid-canopy elements.
The most significant improvements from the inclusion of oblique imagery were seen at FRT01 in distribution of the ground area represented (Figure 1) and the canopy representation at ENS02 (Figure 2). However, the improvements shown at the other sites and in the final terrain model produced at FRT01 following ground filtering were only small. Furthermore, the composite image network strategy results in an increased number of images which may present issues due to poor image geometry caused by exposure at nearby locations. The minor improvements seen through the use of a composite strategy suggest that the extra effort required to undertake multiple flights and the increased processing time is in most cases unwarranted.