Comparison of 3D Point Clouds Obtained by Terrestrial Laser Scanning and Personal Laser Scanning on Forest Inventory Sample Plots

In forest inventory, trees are usually measured using handheld instruments; among the most relevant are calipers, inclinometers, ultrasonic devices, and laser range finders. Traditional forest inventory has been redesigned since modern laser scanner technology became available. Laser scanners generate massive amounts of data in the form of 3D point clouds. We have developed a novel methodology to provide estimates of the tree positions, stem diameters, and tree heights from these 3D point clouds. This dataset was made publicly accessible to test new software routines for the automatic measurement of forest trees using laser scanner data. Benchmark studies with performance tests of different algorithms are welcome. The dataset contains co-registered raw 3D point-cloud data collected on 20 forest inventory sample plots in Austria. The data were collected by two different laser scanning systems: (1) a mobile personal laser scanner (PLS) (ZEB Horizon, GeoSLAM Ltd., Nottingham, UK) and (2) a static terrestrial laser scanner (TLS) (Focus3D X330, Faro Technologies Inc., Lake Mary, FL, USA). The data also contain digital terrain models (DTMs), field measurements as reference data (ground truth), and the output of recent software routines for automatic tree detection and automatic stem diameter measurement.


Summary
The sustainable management of forest ecosystems requires the regular monitoring of the status of, and changes in, renewable natural resources. Data for these criteria are usually provided by forest inventories. The forest inventory data are not only used for decision making in forest management practice, but often serve as an empirical platform for various research activities. In traditional forest inventories with multiple sample plots, tree attributes are manually measured using mechanical or optical instruments, such as calipers, hypsometers, compasses, and measuring tapes [1][2][3][4]. The most relevant outcome from a forest inventory is the estimate of the growing stock timber volume. For this purpose, average per-area-unit values from the multiple sample plots are up-scaled to the entire survey region. The calculation of the per-area-unit values of the growing stock requires diameter and height measurements of single trees in each sample plot. These diameter and height measurements are used as input variables in stem taper models providing single-tree volume estimates. The dataset presented here contains co-registered raw 3D point cloud data collected on 20 forest inventory sample plots in Austria with two different laser scanning systems: (1) a mobile personal laser scanning (PLS) system (ZEB Horizon, GeoSLAM Ltd., Nottingham, UK [26]) and (2) a static terrestrial laser scanning (TLS) system (Focus 3D X330, Faro Technologies Inc., Lake Mary, FL, USA [27]). The dataset also includes digital terrain models (DTMs) and reference data. The latter were obtained by field measurements and can serve as ground truth data. The reference data comprise tree positions, tree species information, and measurements of diameter at breast height (dbh), crown base height, and tree height. Finally, we also included the results of our software routines for automatic tree detection and automatic stem diameter measurement, which were recently published in Gollob et al. [28].
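The role of the dbh and height measurements in the volume calculation can be illustrated with a simple form-factor approach. This is a deliberately simplified stand-in for the stem taper models mentioned above, and the form factor of 0.5 is an illustrative assumption, not a value from this dataset:

```python
import math

def stem_volume(dbh_cm: float, height_m: float, form_factor: float = 0.5) -> float:
    """Approximate single-tree stem volume (m^3) as v = f * g * h,
    where g is the basal area at breast height and f is a form factor
    correcting the cylinder volume for stem taper (f = 0.5 is only an
    illustrative default)."""
    g = math.pi / 4.0 * (dbh_cm / 100.0) ** 2  # basal area in m^2
    return form_factor * g * height_m

# e.g., a tree with dbh = 30 cm and height = 25 m:
v = stem_volume(30.0, 25.0)
```

A stem taper model refines exactly this calculation by describing how the diameter decreases along the stem instead of using a single constant factor.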
The entire dataset will enable fair comparisons of different algorithms using both PLS and TLS data collected from the same sample plots. The PLS device (ZEB Horizon, GeoSLAM Ltd., Nottingham, UK [26]) is a new technology that, to the best of our knowledge, has so far only been used by Gollob et al. [28] in a forest inventory context. The publication of the PLS point cloud data will help others to enhance their software routines using these novel data.

Data Description
The provided dataset includes PLS and TLS point clouds of forest inventory sample plots, digital terrain models (DTMs) for PLS/TLS, reference data of single trees standing at the sample plots, and a plot-wise and an overall evaluation of the algorithms presented in Gollob et al. [28] based on the reference data. The different data subsets are described in detail below.

PLS/TLS Point Clouds
Point clouds from 20 PLS scans and 17 TLS scans are provided in LAZ format, a lossless compression of the common LAS format. The 17 TLS point clouds have corresponding counterparts among the PLS clouds, meaning that they were collected on the same sample plots. Three TLS scans could not be used for the comparison with PLS due to harvest-related differences in the stand structure. The individual files are labeled with numeric sample plot IDs ranging from 1 to 20. The LAS format is commonly intended for the exchange and archiving of LiDAR point cloud data. It is an open, binary format specified by the American Society for Photogrammetry and Remote Sensing (ASPRS). According to standard conventions, a LAS file contains a header block, variable-length records, and the point cloud data. LAS files can be read, visualized, and processed with common software programs for point cloud processing (e.g., CloudCompare [29]), and they can also be handled with the free statistical software R (R Foundation for Statistical Computing, Vienna, Austria) [30], in particular using the lidR [31] and rlas [32] packages. Table 1 reports the number of 3D points and the required scan time for the PLS/TLS point clouds. Table 2 describes the point data header fields.
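The binary header block mentioned above can be illustrated with a minimal parser for a few fields of a LAS 1.2 public header. The byte offsets follow the ASPRS LAS 1.2 specification; in practice, readers such as lidR, rlas, or CloudCompare handle this transparently, so this sketch is only meant to show what such tools read:

```python
import struct

def parse_las_header(buf: bytes) -> dict:
    """Parse a few fields of a LAS 1.2 public header block.

    Offsets per the ASPRS LAS 1.2 specification:
    signature at byte 0, version at 24, point data format ID at 104,
    point record length at 105, number of point records at 107.
    """
    if buf[0:4] != b"LASF":
        raise ValueError("not a LAS file")
    version_major, version_minor = struct.unpack_from("<BB", buf, 24)
    point_format, record_length = struct.unpack_from("<BH", buf, 104)
    num_points = struct.unpack_from("<I", buf, 107)[0]
    return {
        "version": f"{version_major}.{version_minor}",
        "point_format": point_format,
        "record_length": record_length,
        "num_points": num_points,
    }
```

Note that LAZ files must be decompressed (e.g., with laszip or a LAZ-aware reader) before the raw header and point records can be accessed this way.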

Digital Terrain Models (DTMs)
The DTMs from the 20 PLS and 17 TLS 3D point clouds are provided in TIF format and use the same local reference system as the LAZ and LAS files of the point clouds. The individual files are named according to the sample plot IDs ranging from 1 to 20. The DTMs were generated using the grid_terrain() function in the R package lidR [31] with a 20 × 20 cm pixel resolution. The pixel values of the TIF files represent the estimated ground height in meters. Table 3 summarizes the DTMs of the 20 sample plots.
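As a rough sketch of how such a DTM grid is derived from classified ground points, the following applies k-nearest-neighbor inverse-distance weighting on a regular grid. The actual DTMs were produced with lidR's grid_terrain(); this numpy-only function is a simplified illustration of the interpolation idea, not the original routine, and the parameter defaults (k = 10, power p = 2) are assumptions:

```python
import numpy as np

def knn_idw_dtm(ground_xyz: np.ndarray, xmin: float, ymin: float,
                nx: int, ny: int, res: float = 0.2,
                k: int = 10, p: float = 2.0) -> np.ndarray:
    """Estimate ground height on a regular grid (cell size `res`, e.g.
    0.2 m for a 20 x 20 cm DTM) from classified ground points using
    k-nearest-neighbor inverse-distance weighting."""
    gx = xmin + (np.arange(nx) + 0.5) * res  # cell-center x coordinates
    gy = ymin + (np.arange(ny) + 0.5) * res  # cell-center y coordinates
    dtm = np.empty((ny, nx))
    for iy, y in enumerate(gy):
        for ix, x in enumerate(gx):
            d2 = (ground_xyz[:, 0] - x) ** 2 + (ground_xyz[:, 1] - y) ** 2
            idx = np.argsort(d2)[:k]  # k nearest ground points
            w = 1.0 / np.maximum(d2[idx] ** (p / 2.0), 1e-12)
            dtm[iy, ix] = np.sum(w * ground_xyz[idx, 2]) / np.sum(w)
    return dtm
```

For flat terrain the interpolated grid simply reproduces the ground height; on slopes, the weighting smooths between the surrounding ground returns.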

Reference Data and Results of an Automatic Tree Detection and dbh Estimation Algorithm
The reference dataset is provided in a comma-separated values (CSV) file (reference_data.csv) and contains the manual measurements of the single-tree attributes. This CSV file also contains binary (true/false) variables indicating whether each tree was automatically detected in the TLS and PLS point clouds, as well as the dbh estimates from various approaches; both were obtained with the methodology presented in Gollob et al. [28]. Missing values are represented by "NA" entries. Each row represents a single tree. Table 4 summarizes the reference measurements in the 20 sample plots using a lower dbh threshold of 5 cm. Table 5 describes the CSV header fields.
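Reading the file and mapping the "NA" entries to missing values can be sketched as follows. The column names used here are hypothetical placeholders; the actual header fields are listed in Table 5:

```python
import csv
import io

def read_reference_csv(f):
    """Read a reference CSV, converting "NA" strings to Python None.
    Each returned dict corresponds to one tree (one CSV row)."""
    rows = []
    for row in csv.DictReader(f):
        rows.append({k: (None if v == "NA" else v) for k, v in row.items()})
    return rows

# Illustrative input with made-up column names (see Table 5 for the real ones):
sample = io.StringIO(
    "plot_id,tree_id,species,dbh_cm,height_m\n"
    "1,1,spruce,32.4,NA\n"
)
trees = read_reference_csv(sample)
```

The same pattern applies to the plot-wise evaluation files; in R, `read.csv(..., na.strings = "NA")` achieves the equivalent conversion.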

Evaluation of Automatic Tree Detection and dbh Estimation
The detailed results of the automatic tree detection and diameter estimation presented in Gollob et al. [28] are provided as three comma-separated values (CSV) files. The plot-wise evaluations with the PLS and TLS data are contained in two separate files, PLS_plotwise.csv and TLS_plotwise.csv, respectively. An extra file, PLS_TLS_overall.csv, contains overall performance measures that were evaluated across the entire set of sample plots. The performance of the algorithms was assessed in comparison with the reference data (see Section 3.3) and in terms of the following characteristics: detection rate, commission error, overall accuracy, root-mean-square error (RMSE), and bias. Table 6 describes the CSV header fields.

Study Area, Sample Plots, and Reference Data
The data were collected in the training forest of the University of Natural Resources and Life Sciences, Vienna, Austria (BOKU), located in the forest district Ofenbach near the village of Forchtenstein. Throughout the entire forest area, the BOKU Institute of Forest Growth maintains a permanent forest inventory with repeated measurements. The measurements of the total of 554 sample plots started in 1989, and each year one-fifth (approx. 111 plots) of the total sample is remeasured. The sample plots are systematically aligned on a regular grid with a mesh width of 141.4 × 141.4 m. As the standard inventory method, Bitterlich relascope sampling [36][37][38] was conducted at each sample plot using a constant basal area factor of 4 m²/ha coupled with a lower dbh threshold of 5 cm.
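The mechanics of angle-count (Bitterlich relascope) sampling can be sketched as follows: with a basal area factor (BAF) of 4 m²/ha, every counted tree contributes 4 m²/ha to the plot's basal area estimate, and a tree is counted whenever it stands within a dbh-proportional limiting distance. This is a generic illustration of the Bitterlich principle, not code from the inventory system:

```python
import math

def limiting_distance_m(dbh_cm: float, baf: float = 4.0) -> float:
    """Maximum distance (m) at which a tree of the given dbh is counted.
    Derived from BAF = 10^4 * d^2 / (4 * R^2) with d in m, i.e.
    R = 50 * d / sqrt(BAF)."""
    return 50.0 * (dbh_cm / 100.0) / math.sqrt(baf)

def basal_area_per_ha(tally: int, baf: float = 4.0) -> float:
    """Angle-count estimate: each counted tree contributes BAF m^2/ha."""
    return tally * baf
```

For example, with BAF = 4 m²/ha, a tree of dbh 30 cm is counted only if it stands within 7.5 m of the sample point, which is why the method yields an incomplete, size- and distance-dependent selection of trees.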
The provided dataset contains a subsample of 20 plots representing all possible variations with respect to forest type (broadleaved, coniferous, and mixed), forest structure (from even-aged and mono-layered to uneven-aged and multi-layered stands), and terrain inclination (flat to steep). As the Bitterlich method yields an incomplete selection of trees owing to its size-related and distance-dependent sampling strategy, extra measurements were recorded in March 2019 to provide complete reference data for fixed-area sample plots with a 20 m radius. At each of these fixed-area plots, the position and dbh of all trees with a dbh of 5 cm or greater were measured. In addition, relevant extra trees with a dbh of slightly less than 5 cm were mapped to enable the precise discrimination of false-positive interpretations by the automatic stem-finding software routines.
Tree height and crown-base height were measured on a subsample of 422 randomly selected trees using a Vertex IV (© Haglöf Sweden AB, Långsele, Sweden). This sample represents the complete set of existing tree species and the entire range of occurring dbhs.

Instrumentation and Data Collection
The 20 sample plots were scanned in March 2019 using a GeoSLAM ZEB HORIZON [26] (see Figure 1a) personal mobile laser scanner (PLS) (GeoSLAM Ltd., Nottingham, UK); for further technical information on the scanner, the reader is referred to Gollob et al. [28].
The data acquisition with the ZEB HORIZON started with the initialization of the inertial measurement unit (IMU) to establish the coordinate reference system. For this purpose, the scanner was mounted on a tripod positioned at the sample point. When the initialization was accomplished, usually after approximately 15 s, the scanner was demounted from the tripod and carried in one hand along the walking path. During the scan process, the 3D measurement data were stored on the hard drive of the portable data logger in a highly compressed data format. After some initial trials, an optimal walking path was determined for which the SLAM algorithm achieved both relatively low scanner range noise and low drift. Following this optimal walking path, the surveyor started at the sample point and moved northwards to the sample plot boundary. Afterward, the sample plot was completely circumvented once plus an extra quarter of the circumference. Then, the surveyor crossed the sample plot through the center to reach the opposite boundary. Subsequently, the surveyor walked along an additional quarter of the boundary and afterward moved toward the center, where the scanner was finally fixed on the tripod again. A record of the walking path coordinates is shown in Figure 1b.
The entire walk, including the scan process and the data recording, took approximately 7-15 min. As a major advantage compared with the static terrestrial laser variant, the survey with the portable laser scanner does not require any target marks. Such target marks are mandatory for a precise co-registration of multiple scans with a static TLS system; in contrast, the spatial referencing of the PLS data is performed on-the-fly by the SLAM algorithm.
In February and March 2018, 17 out of the 20 sample plots surveyed with the PLS system were also scanned with the static FARO Focus 3D X330 TLS system (Figure 2a) using a multi-scan approach (MSA); for further information on the scanner and the hardware parameter settings, the reader is referred to Gollob et al. [6]. In our MSA setting, a single scan was conducted from the sample point position, and for three additional scans, the scanner was positioned at the corners of a triangle at a distance of 15 m from the center (Figure 2b). We found that this alignment provided the best compromise between labor cost and data quality. As the co-registration of the multiple scans required the extra positioning of target marks, nine Styrofoam spheres were placed on each sample plot. With static TLS and MSA, the total workload per sample plot was 49.6 min, of which 32.6 min were consumed by the pure scan time plus photography, 12 min by the scanner installation, and 5 min by the installation of the target marks.
When data collection with the portable laser scanner was accomplished, the measurement data were transferred from the data logger onto a desktop computer using a USB flash drive. GeoSLAM Hub 5.2.0 software [39] was used for data pre-processing. The implemented SLAM algorithm performs the geo-tracking of the scanner in an unknown environment and the generation of the raw 3D point cloud data using both IMU data and feature detection algorithms. In so doing, the scanner position is registered using a moving time window across the raw data [21]. As new data are added, the algorithm uses a linearized model to minimize the error of the IMU measurements together with the minimum search for the discrepancy between the 3D data points for each respective time segment [21]. For further details on the SLAM algorithm, the reader is referred to the manufacturer's website: https://geoslam.com/slam/. According to the manufacturer, the accuracy of the measured points of the registered point cloud is 1-3 cm under normal sunlight conditions. The coordinates of the registered point cloud are represented by a local reference system with the start position of the walking path (sample point) fixed at triple zero for the x, y, and z coordinates. The PLS data for each of the 20 sample plots were exported in LAS format, which is compatible with various point cloud software. For data export, the parameter settings "100% of points", "time stamp: scan", and "point color: none" were selected in GeoSLAM Hub 5.2.0.
The raw TLS scan data were co-registered with the FARO SCENE 6.2 program [40], using the coordinates of the Styrofoam spheres as reference points to merge the multiple scans into a single comprehensive point cloud. A constant cutoff distance of 30 m was chosen for each of the four scanner positions. The registered point cloud was represented by a local coordinate system using the first scan position (reference scan) at the plot center as the zero reference. The point cloud data were exported separately as CSV files for each of the 17 sample plots. The CSV TLS point clouds were then imported, transformed, and exported as LAS files using the functions fread() and write.las() in the R packages data.table [41] and rlas [32].
The PLS and TLS point clouds were imported with the readLAS() function of the lidR package [31] and clipped by an upright-oriented cylinder with a radius of 21 m centered at the plot center. Hence, a 1-m-wide outer buffer was constructed around the 20-m-radius sample plots. The lasground() function in the lidR package was then used to classify the data into ground points and vegetation points. The classification implemented in lasground() uses the cloth simulation filter algorithm introduced by Zhang et al. [42]. Finally, the classified ground points were interpolated with the k-nearest-neighbor method using the grid_terrain() function implemented in the lidR package.
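The clipping step can be sketched with plain numpy. Since the local reference systems place the plot center at the coordinate origin (see above), the cylinder is centered at x = y = 0 by default:

```python
import numpy as np

def clip_cylinder(points: np.ndarray, radius: float = 21.0,
                  center=(0.0, 0.0)) -> np.ndarray:
    """Keep only points inside an upright cylinder of the given radius.

    `points` is an (n, 3) array of x, y, z coordinates; the z coordinate
    is ignored because the cylinder is unbounded in height.
    """
    dx = points[:, 0] - center[0]
    dy = points[:, 1] - center[1]
    return points[dx * dx + dy * dy <= radius * radius]
```

In the actual workflow this is done directly at import time via lidR's filtering options; the sketch only makes the geometric criterion explicit.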

Accuracy of Tree Detection and dbh Measurement
In Gollob et al. [28], the quality of tree detection and dbh measurement was assessed in terms of three criteria: the detection rate d_r (%), the commission error c (%), and the overall accuracy acc (%):

d_r (%) = (n_match / n_ref) × 100% (1)

c (%) = (n_falsepos / n_extr) × 100% (2)

acc (%) = 100% − (o (%) + c (%)) (3)

where n_match is the number of correctly found reference trees, n_ref is the total number of reference trees, n_falsepos is the number of tree positions that could not be assigned to an existing tree in the reference data, n_extr is the number of automatically detected tree positions (n_match + n_falsepos), and o (%) is the omission error, defined as 100% − d_r (%). The detection rate d_r (%) measures the proportion of correctly detected tree locations, the commission error c (%) evaluates the proportion of falsely detected tree locations, and the overall accuracy acc (%) is a combination of the latter two metrics and represents a global quality criterion.

The precision of the automatic dbh measurements was assessed by means of the root-mean-square error (RMSE), calculated as the square root of the average quadratic distance between the automatic measurement dbĥ_i and the corresponding reference measurement dbh_i:

RMSE = sqrt( (1 / n_match) Σ_{i=1..n_match} (dbĥ_i − dbh_i)² ) (4)

The accuracy of the automatic dbh measurements was assessed in terms of bias:

bias = (1 / n_match) Σ_{i=1..n_match} (dbĥ_i − dbh_i) (5)

The latter two criteria were also calculated as relative measures, that is, as percentage RMSE (RMSE%) and percentage bias (bias%):

RMSE% = (RMSE / dbh̄) × 100% (6)

bias% = (bias / dbh̄) × 100% (7)

with

dbh̄ = (1 / n_match) Σ_{i=1..n_match} dbh_i (8)

being the average dbh of the reference data.
To guarantee meaningful comparisons with the results presented in Gollob et al. [28], future users of this dataset are likewise encouraged to evaluate their results using these performance measures. In this dataset, we provide the above-described performance measures on two different aggregation levels. Thus, besides the overall performance measures evaluated across the complete sample, these measures are also separately reported per sample plot (Section 2.4.).
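Assuming the matching of automatically detected trees to reference trees has already been done (so that n_match, n_ref, and n_falsepos are known), the performance measures can be computed as follows. In this sketch, the overall accuracy is taken as 100% minus the omission and commission errors, following the structure of the criteria described above:

```python
import numpy as np

def detection_metrics(n_match: int, n_ref: int, n_falsepos: int) -> dict:
    """Tree-detection criteria: detection rate d_r, commission error c,
    and overall accuracy (here: 100% minus omission and commission)."""
    n_extr = n_match + n_falsepos  # automatically detected tree positions
    d_r = 100.0 * n_match / n_ref
    c = 100.0 * n_falsepos / n_extr
    o = 100.0 - d_r  # omission error
    return {"d_r": d_r, "c": c, "acc": 100.0 - o - c}

def dbh_errors(dbh_auto: np.ndarray, dbh_ref: np.ndarray) -> dict:
    """RMSE and bias of automatic dbh measurements, absolute and as a
    percentage of the mean reference dbh."""
    diff = dbh_auto - dbh_ref
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    bias = float(np.mean(diff))
    mean_ref = float(np.mean(dbh_ref))
    return {"rmse": rmse, "bias": bias,
            "rmse_pct": 100.0 * rmse / mean_ref,
            "bias_pct": 100.0 * bias / mean_ref}
```

For example, 90 matched trees out of 100 reference trees with 10 false positives give a detection rate of 90%, a commission error of 10%, and an overall accuracy of 80%.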

User Notes
The chosen data repository supports versioning of datasets; this data descriptor refers to version 1.0. The dataset will be updated in the future when new 3D data and reference measurements are collected on this or other sample plots. Users can find updated and original versions of the dataset under the same doi:10.5281/zenodo.3698956 [25].

Discussion
Providing datasets is regarded as essential for benchmarking and comparative performance tests of different algorithms, especially in the field of forest inventory, where novel LiDAR-based measurement technology is currently being introduced. Only if the same data basis and the same criteria are used is it possible to reveal the true potential of PLS compared with TLS in particular. Beyond the approaches for tree detection and dbh measurement that were presented in Gollob et al. [28], the dataset also offers the possibility of automatically estimating tree heights and crown bases from PLS and TLS data and enables the comparison of the achieved results with the provided reference data. Instead of creating new DTMs, other developers can alternatively use the DTMs provided with this dataset. This will probably help to avoid possible confounding effects that may occur when different routines are used for vegetation/ground classification or for the spatial interpolation of DTM grids. However, a direct comparison of the DTMs from both sensors (static TLS and portable PLS) is not possible because the DTMs derived from the two sensors differ by an offset that varies per plot. It should be noted that the created DTMs are only valid locally. If DTMs are required for larger areas, it would be beneficial to use airborne laser scanning (ALS) data. In addition to the abovementioned possible use cases of the dataset, it is also worth noting that, to the best of our knowledge, it is the first publicly available dataset of the GeoSLAM ZEB HORIZON in a forest inventory context. This enables the reader to assess the data quality provided by the novel device.
Regarding the reference dataset, it is worth noting that these field measurements were recorded with pencil on paper and thereafter manually transcribed into an electronic database. Thus, possible data-entry errors cannot be excluded. Other possible errors in the manual measurements may result from imprecise height, diameter, and position measurements, or from tilted trees. For the comparison of algorithms, however, this is only a minor problem, since all algorithms are evaluated against the same reference data. Regarding the point cloud quality, it is worth noting that in the presented dataset, instrumental drift and registration inaccuracies were a problem for neither PLS nor TLS.
Initial tests on large experimental stands showed that for PLS, longer recording times (greater than 30 min) led to instrumental drift and registration problems, while for TLS, an increasing number of scans complicated co-registration and added noise. However, a list of tree positions and dimensions could be created more completely and efficiently with PLS, whereas with TLS, the tree dimensions (especially the diameters) were more precise.
Funding: The open access publishing was supported by BOKU Vienna Open Access Publishing Fund.