Predicting Forest Inventory Attributes Using Airborne Laser Scanning, Aerial Imagery, and Harvester Data

The aim of the study was to develop a new method to use tree stem information recorded by harvesters along operative logging in remote sensing-based prediction of forest inventory attributes in mature stands. The reference sample plots were formed from harvester data, using two different tree positions: harvester positions (XYH) in global satellite navigation system and computationally improved harvester head positions (XYHH). Study materials consisted of 158 mature Norway-spruce-dominated stands located in Southern Finland that were clear-cut during 2015–16. Tree attributes were derived from the stem dimensions recorded by the harvester. The forest inventory attributes were compiled for both stands and sample plots generated for stands for four different sample plot sizes (254, 509, 761, and 1018 m2). Prediction models between the harvester-based forest inventory attributes and remote sensing features of sample plots were developed. The stand-level predictions were obtained, and basal-area weighted mean diameter (Dg) and basal-area weighted mean height (Hg) were nearly constant for all model alternatives with relative root-mean-square errors (RMSE) roughly 10–11% and 6–8%, respectively, and minor biases. For basal area (G) and volume (V), using either of the position methods, resulted in roughly similar predictions at best, with approximately 25% relative RMSE and 15% bias. With XYHH positions, the predictions of G and V were nearly independent of the sample plot size within 254–761 m2. Therefore, the harvester-based data can be used as ground truth for remote sensing forest inventory methods. In predicting the forest inventory attributes, it is advisable to utilize harvester head positions (XYHH) and a smallest plot size of 254 m2. Instead, if only harvester positions (XYH) are available, expanding the sample plot size to 761 m2 reaches a similar accuracy to that obtained using XYHH positions, as the larger sample plot moderates the uncertainties when determining the individual tree position.


Introduction
Forest-and wood-procurement planning requires detailed information on forest stand structure [1,2]. Remote sensing methods are increasingly used for obtaining forest structural information covering large areas that are often also remote [3][4][5]. In Scandinavia, airborne laser scanning (ALS) and aerial imagery are used in conjunction with field-measured sample plots when forest inventory attributes such as basal area (G), stem volume (V), and basal-area weighted mean diameter (D g ) and basal-area weighted mean height (H g ) are predicted over inventory area for forestand wood-procurement planning purposes [3,6,7]. In this approach, also known as the area-based approach (ABA), prediction models are developed between features extracted from the remotely sensed data and field-measured forest inventory attributes from the sample plots [8,9]. Then, prediction models are applied at the grid level over the area of interest. The size of the prediction grid corresponds to the size of the field sample plot, typically varying from 100 to 900 m 2 . Finally, stand-level forest inventory attributes are obtained by aggregating grid-level predictions within each stand [9].
The amount and quality of the field plots is essential for ABA [10][11][12][13]. In Finland, the typical inventory area varies from 100,000 to 200,000 ha, and approximately 700 field sample plots are measured from the inventory area for enabling the development of the prediction models for forest inventory attributes (e.g., References [13][14][15]). In addition to the field-measured sample plots, required modeling data could be supplemented with data measured by sensors already integrated into the harvesters and recorded by the harvester's information system [16][17][18]. The harvester data is recorded according to the Standard for Forest Data and Communication (StanForD), which is used by all major manufacturers of cut-to-length (CTL) harvesters across the world [19]. Modern CTL harvesters are equipped with many useful features for measuring cut trees, such as the ability to automatically measure stem diameters and stem lengths, and to record the position of the harvester and possibly also the position of the harvester head each time a tree is cut [7,17,18,20]. Harvesters' measurement sensors record stem diameter at 10 cm intervals, and length for that part which goes through the harvester head. Additionally, tree species, timber assortments, and the quality of the logs are determined by the driver and recorded into the stem files (STM) and harvester-production (HPR) files. In a remote sensing-based prediction of forest inventory attributes, it is crucial to be able to link harvester data with the remote sensing data (see, e.g., Reference [10]). The location accuracy of the single-tree data recorded by harvester has limited the use of the recorded information [17,18,20,21]. The position of the harvester is determined based on the global satellite navigation system (GNSS) integrated into the harvester. It has been shown that the recorded location accuracy under the forest canopy varies from 4.2 to 9.3 m using GNSS receivers typically integrated into harvesters (e.g., Reference [22]). The position of the cut tree is determined either by using the harvester's GNSS position or by combining the harvester's GNSS position and specific position parameters for the harvester head [19]. In the latter system, the position of the harvester head during the time of the cut can be derived from the harvester bearing and boom angles (e.g., Reference [17,18]). In a clear-cut stand, species-specific forest inventory attributes can be derived from the harvester data for any given area within the stand. However, it should be noted that the harvester data may not be directly applicable for use in compiling sample plot information, and it requires further processing. For example, tree diameter-at-breast height (dbh) and height must be predicted using the recorded stem dimensions and lengths (e.g., Reference [7]). Currently, the recorded raw positions for the harvester as well as for the harvester head need to be further processed, taking into account the movements and bearing of the harvester [23].
ABA requires unbiased and precise mean and summary information on growing stock at the plot level to link sample plot data with remote sensing data. Therefore, applicable harvester data for ABA is only collected from clear-cut stands where it can be assumed that all or most of the trees are cut, and attributes for those stems are recorded to production files (STM or HPR files). However, if the accuracy of the predicted forest inventory attributes in mature stands could be improved with the harvester data currently available, it could be beneficial for wood-procurement planning. Therefore, the aim is to improve the understanding of how harvester data could be used in the remote sensing-based prediction of forest inventory attributes in mature stands.
In this study, harvester data were used to compile field sample plots when predicting the stand-level forest inventory attributes by an area-based approach. The effect of the positioning accuracy for individual cut trees was studied by using two alternative positions: the harvester's GNSS position (XY H ) during the time of cut and the computationally improved harvester head position (XY HH ) based on measurements of the harvester head position. At the same time, the effect of the sample plot size on the accuracy of the predicted forest inventory attributes was investigated. The results are expected to provide new knowledge for organizations that have access to harvester data and could use it for supporting wood-procurement planning.

Study Area
The study area is located in the Uusimaa region of Southern Finland (Figure 1). The landscape of the region is undulating, with elevation varying from 0 to 174 m above sea level and has a complex mosaic of agricultural, forest, and urban land use. The study area is one of the operational inventory areas from which the Finnish Forestry Centre (FFC) produces forest resource information to be used in forest management planning in privately-owned forests. The study area covers a total of 477,000 ha, of which 257,000 ha is forest land. The FFC had collected 614 field sample plots from the study area that had been allocated to the forests with varying structure based on their existing stand register information (excluding seedling stands). The field sample plots from the area of interest contain 21% mature (regeneration-ready) plots where the dominant tree species are Norway spruce (45% of the total volume) and Scots pine (37% of the total volume). In mature stand field samples, the basal area (G) is on average 29.6 m 2 /ha, varying from 6.4 to 72.2 m 2 /ha. Respectively, the stem volume (V) is on average 322 m 3 /ha, varying from 57.9 to 924 m 3 /ha, as the basal-area weighted mean height (H g ) is on average 23.7 m, varying from 15.3 to 32.7 m. Compared to other regions in Finland, the forests in the Uusimaa region have slightly more structural variation (see, e.g., References [6,24,25] In this study, harvester data were used to compile field sample plots when predicting the standlevel forest inventory attributes by an area-based approach. The effect of the positioning accuracy for individual cut trees was studied by using two alternative positions: the harvester's GNSS position (XYH) during the time of cut and the computationally improved harvester head position (XYHH) based on measurements of the harvester head position. At the same time, the effect of the sample plot size on the accuracy of the predicted forest inventory attributes was investigated. The results are expected to provide new knowledge for organizations that have access to harvester data and could use it for supporting wood-procurement planning.

Study Area
The study area is located in the Uusimaa region of Southern Finland ( Figure 1). The landscape of the region is undulating, with elevation varying from 0 to 174 m above sea level and has a complex mosaic of agricultural, forest, and urban land use. The study area is one of the operational inventory areas from which the Finnish Forestry Centre (FFC) produces forest resource information to be used in forest management planning in privately-owned forests. The study area covers a total of 477,000 ha, of which 257,000 ha is forest land. The FFC had collected 614 field sample plots from the study area that had been allocated to the forests with varying structure based on their existing stand register information (excluding seedling stands). The field sample plots from the area of interest contain 21% mature (regeneration-ready) plots where the dominant tree species are Norway spruce (45% of the total volume) and Scots pine (37% of the total volume). In mature stand field samples, the basal area (G) is on average 29.6 m 2 /ha, varying from 6.4 to 72.2 m 2 /ha. Respectively, the stem volume (V) is on average 322 m 3 /ha, varying from 57.9 to 924 m 3 /ha, as the basal-area weighted mean height (Hg) is on average 23.7 m, varying from 15.3 to 32.7 m. Compared to other regions in Finland, the forests in the Uusimaa region have slightly more structural variation (see, e.g., References [6,24,25]).

Remotely Sensed Data
ALS data were acquired between June and August 2015 using a Leica ALS60 SN6114 system (Leica Geosystems AG, Heerbrugg, Switzerland). The flying altitude was 2050 m with a speed of 311 kn, a scan angle of 20 • , a beam divergence of 0.22 mrad (1/e), and a pulse rate of 114.6 kHz. The density of the first pulse echoes returned was 1.77 hits per m 2 . A digital terrain model (DTM), and, accordingly, heights above ground level, were computed during the process for the operational forest inventory project. The expected accuracy of ALS-derived DTMs varies in boreal forest conditions by around 10-50 cm [26]. The aerial images were acquired between June and August 2015. The imaging sensors used were Vexcel (Denver, Colorado, United States) Ultra Cam UCXp -and S/N UC-SXp. The area was covered by 194 images in total. The flying height was 5 km and ground sample distance (GSD) was approximately 0.3 m. The images were delivered as 16-bit visible light (RGB) and Color Infrared (CIR) composites. Additionally, 8-bit orthorectified images were provided by the data vendor (Blom Kartta Oy, Helsinki, Finland).

Tree and Location Measurements
Harvester data were collected from six harvesters operating in the study area between August 2015 and September 2016 ( Figure 1). Four of the harvesters were made by Ponsse (Vieremä, Finland), one by Komatsu Forest (Umeå, Sweden), and one by John Deere (Moline, Illinois, United States). The used harvester data were recorded under StanForD standard [19] and included STM and HPR files. In total, the harvester data consisted from 635,350 stems. All the six harvesters collected data of the stems during the stem processing. For each harvested stem, the harvesters recorded the stem number, tree species, and diameters of the stem at 10 cm intervals along the stem, starting from the felling cut onwards for the rest of the stem going through the harvester head. For each log, the harvesters recorded the length, diameters (including top diameter of the log), volume, and timber assortment information. The length of the commercial part of the stem and the top diameter of the last cut were also recorded. The GNSS location of the harvester was recorded for each stem at the time of the felling cut. One of the harvesters (the Komatsu Forest 931.1) also collected information on the position of the harvester head during the time of the felling cut. In the Komatsu Forest 931.1, the harvester's position was determined using a Garmin 16x-HVS GPS GNSS receiver (Garmin Ltd., Olathe, Kansas, United States) and recorded for each stem. Additionally, this harvester recorded specific position parameters for the harvester head, including harvester direction (harvester bearing), boom angle relative to the harvester bearing, and the length of the boom, corresponding to the distance of the harvester head. The boom of the Komatsu Forest 931.1 harvester is extensible, reaching a maximum length of 10.1 m, however the amount of extension of the last part of the boom cannot be recorded. The gross length of the furthest part of the boom is 2.5 m. From the data from the Komatsu Forest 931.1 harvester, it was possible to derive two alternative positions for each of the cut trees: (1) GNSS position of the harvester (XY H ); and (2) position based on XY H and harvester head parameters (XY HH ). For obtaining improved XY HH positions for each tree, recorded GNSS positions and harvester head parameters were post-processed using an in-house algorithm developed in Metsäteho (Metsäteho Oy, Vantaa, Finland) [23]. In the processing, GNSS observations were averaged to obtain more realistic strip road, and harvester bearings were improved. Then, final XY HH positions were obtained by using averaged GNSS position, harvester bearing, direction of the boom, and distance of the harvester head relative to the harvester. It should be emphasized that the XY HH positions for each tree were first processed using a specific algorithm [23] from the parameters recorded by the information system, and as such the recorded raw values for parameter were not used. This was required since GNSS positioning data contain so much variation that deriving single tree positions with adequate accuracy is not directly feasible. It was also noticed that in the current dataset the harvester bearing needed slight modification.

Deriving Tree and Stand Attributes from Harvester Data
All the harvester data recorded by the six harvesters were processed for stand boundaries, stem-wise volumes, and stem dimensions using in-house processing tools and algorithms developed by Metsäteho Oy (see description of semi-automatic stand delineation in Reference [27]). First, harvester data were extracted into the database and transferred to the geographic information system. For each stand, a unique identifier based on contract number, stand identification number (ID), start time of logging, and sub-stand ID was generated. The logging type information was obtained from the data-providing companies, and that information was linked to the data. Then, the data were checked manually and, if the trees were located on the same stand or sub-stand but the ID was different, their current identity was corrected based on the location.
To obtain tree-wise attributes, the butt-end diameters of the stems were unified (diameters from felling cut to the 1.3 m) based on the existing butt-end functions for all stems [28]. During this stage, multi-branched stems were, when possible, recognized, re-built, and labeled. Then, volume was calculated for each entire stem by fitting the stem curve [29] into the stem profile measured by the sensors integrated into the harvester head. Stem curve fitting [29] was done for each stem separately, and the stump height was estimated separately for each stem during the curve fitting. Based on the fitted stem curve, the required stem parameters, such as dbh, stem volume, and height, were derived.
For determining the stand attributes, the stand delineations were generated based on the GNSS location of the harvester using a semi-automatic delineation algorithm [27]. Simultaneously to stand delineation, the strip roads leading to the stands were recognized and classified into their own category. After the pre-processing of the stem data, a Delaunay triangulation was generated over the harvester positions. The triangles that were inside the stand were chosen to form the stand boundaries as the other triangles were deleted. The triangles inside the stand boundaries were dissolved based on unique identifier resulting in a preliminary stand delineation. Subsequently, the preliminary stand boundaries and strip roads were buffered. The used buffer was 6.5 m for stands and 2 m for strip roads. After buffering, the individual trees or tree groups (small micro-stands) were combined with the main stands. Then, the area of each stand was determined, and stand-level forest inventory attributes were calculated based on the processed individual tree stem data. The harvester data included 158 clear-cut stands, including 207,073 stems, after removing strip roads, stands that were not clear-cut (e.g., thinnings and seed tree removals), and stands that were smaller than 0.5 ha. The stand-level forest inventory attributes included basal area per hectare (G), volume per hectare (V), basal-area weighted mean diameter (D g ), and basal-area weighted mean height (H g ).

Selection of Modeling and Validation Data
Harvester data from eight stands recorded by the Komatsu Forest 931.1 harvester were used as modeling data. From this harvester data, it was possible to use two alternative positions for each of the cut trees (XY H and XY HH ) when compiling sample plot information. Modeling data included 8820 trees in total. All the other 150 stands without the position of the harvester head were available to be used for stand-level validation. Unfortunately, the operator of the Komatsu Forest 931.1 harvester changed contractor during data recording, which interrupted the data recording earlier than expected. The area of the modeling stands varied from 0.5 to 3.1 ha, with a mean of 1.3 ha, while the area of the validation stands varied from 0.5 to 12.2 ha, with a mean of 2.3 ha. The variation in forest inventory attributes in modeling and validation stands based on harvester data is presented in Table 1. Table 1. Variation in basal-area weighted mean diameter (D g ), basal-area weighted mean height (H g ), basal area (G), and stem volume (V) in modeling and validation stands based on harvester data.

Generation of Modeling Plots
Harvester data from clear-cut stands included single-tree data covering the whole stand. Therefore, sample plots can be generated anywhere within the stand and forest attributes were compiled from the single-tree data within sample plot boundaries ( Figure 2). In this study, the uses of the four different circular sample plot sizes were tested. The used radiuses (areas) were 9.0 m (254 m 2 ), 12.73 m (509 m 2 ), 15.56 m (761 m 2 ), and 18 m (1018 m 2 ). Before sampling the plot center locations, the stand boundaries were buffered inwards to ensure that sample plots were located completely inside the stand. A total of 100 sample plot locations were generated as a random sample with replacement for each modeling stand and for each sample plot size. Therefore, 800 sample plots were used for model development for each sample plot size (eight modeling stands, 100 plots per stand). Two sets of forest inventory attributes were calculated for each sample plot. The first set was compiled using harvester measurements of trees with position XY H , and the second was compiled with positions XY HH . Figure 2 illustrates the impact of the tree positioning method on the allocation of trees within plots.

Generation of Modeling Plots
Harvester data from clear-cut stands included single-tree data covering the whole stand. Therefore, sample plots can be generated anywhere within the stand and forest attributes were compiled from the single-tree data within sample plot boundaries ( Figure 2). In this study, the uses of the four different circular sample plot sizes were tested. The used radiuses (areas) were 9.0 m (254 m 2 ), 12.73 m (509 m 2 ), 15.56 m (761 m 2 ), and 18 m (1018 m 2 ). Before sampling the plot center locations, the stand boundaries were buffered inwards to ensure that sample plots were located completely inside the stand. A total of 100 sample plot locations were generated as a random sample with replacement for each modeling stand and for each sample plot size. Therefore, 800 sample plots were used for model development for each sample plot size (eight modeling stands, 100 plots per stand). Two sets of forest inventory attributes were calculated for each sample plot. The first set was compiled using harvester measurements of trees with position XYH, and the second was compiled with positions XYHH. Figure 2 illustrates the impact of the tree positioning method on the allocation of trees within plots.

Extraction of Predictors from Remote Sensing Data
A prediction grid with a size of 16 × 16 m was placed over the validation stands (n = 150). The grid size was chosen to be the same as in operative ALS-based inventories in Finland and has the same area as the standard field plot (9-m radius circular plot) used in these inventories. Therefore, the results and the findings obtained here can be applied directly in operative inventories. The grid cells were clipped with the stand boundaries. Close to stand boundaries, there were grid cells that were not complete. If the area of a cell was below 128 m 2 (half of the area of grid cell) it was merged with the grid cell next to it. Then, the features from ALS data and aerial imagery were extracted for each modeling plot and for each grid cell to be used in the prediction models. The features were extracted and calculated using the ArboLiDAR software (Arbonaut Oy, Joensuu, Finland). The features included various point height metrics and laser return intensity features that were calculated from ALS data, as well as textural and spectral features that were derived from the aerial imagery based on previous research [8,[30][31][32][33][34][35].

Building Prediction Models for Forest Inventory Attributes
A non-parametric nearest-neighbor estimation approach was used for imputing forest inventory attributes from the sample plots generated within the modeling stands to each grid cell located within the validation stands. In total, eight separate prediction models were developed. First, four prediction models were developed with sample plot data that were generated using the XY H positions for each tree; the used sample plot radius varied between the models, being 9.0, 12.73, 15.56, or 18 m. Then, the last four prediction models were developed with sample plot data that was generated using the XY HH locations for each tree with respective four radiuses.
The total number of features from which the predictors were chosen was 271. In each modeling, the number of the used predictors was first reduced. Features extracted from the ALS data mainly describe tree height, variation in tree heights, as well as vegetation density at the sample plot. The reasoning for feature selection was based on the assumption that a sample plot with similar height, density, and height variation characteristics should correspondingly have similar values for forest inventory attributes, and therefore should be an adequate neighbor to be used in the imputations (see, e.g., References [36,37]). In addition to the features describing forest structure, spectral features were included in the imputation model for separating tree species. However, it should be noted that most of the stands (n = 131) were dominated by Norway spruce.
Linear correlation analyses were used in feature selection. First, ALS features were classified as features describing tree height, height variation, or canopy cover. The feature describing H g with the highest correlation with the plot-level H g was selected for the imputation model. Then, the feature describing G with the highest correlation with the plot-level G was selected. Before selection, the correlation between the two predictor features was investigated, and if it was above 0.9, the latter feature was replaced with the predictor having the second highest correlation with the G of plots. Then, the feature describing height variation with the highest correlation with the standard deviation of the harvester-data-derived single-tree heights at plot level were searched and added to the model after evaluating the correlation with the other two predictors. Finally, three spectral features that were able to separate spruce from pine, pine from deciduous trees, and spruce from deciduous trees were added to the model. The final predictors used in the imputations were selected for each sample plot size and positioning method separately. Nearest-neighbor plots for each grid cell in validation stands were defined by applying the most similar neighbor (k-MSN) method using the R package "yaImpute" [38], which computes the nearest-neighbor distances in projected canonical space. The number of neighbors was set to 5.

Aggregation of Forest Inventory Attributes for Validation Stands and Accuracy Evaluations
The prediction performance was tested at the stand level using validation-stand data. For each validation stand (n = 150), the forest inventory attributes were aggregated from the grid-level predictions. The attributes G and V were calculated as a grid-area weighted mean as D g and H g were calculated as a grid-area and basal-area weighted mean from the grid predictions. Then, predicted forest inventory attributes were compared to the reference values that were obtained from the processed harvester data for each validation stand. Bias and root-mean-square error (RMSE) were calculated using the following equations: where n is the number of validation stands,X i is the predicted value for stand i based on remotely sensed data, and X i is the reference value for stand i. The relative bias and RMSE were calculated as a relation to the mean of reference value X:

Features Selected for the Prediction Models
The results show that the best predicted attributes were rather independent of sample plot size and positioning method. Remote sensing features that were most correlated with H g included 90th height percentile calculated from the first returns, 70th height percentile calculated from the first returns, or 85th percentile of cumulative return height. Features that were most correlated with G were relation between low vegetation (first returns below 5 or 7 m) and all of the vegetation first returns. The standard deviation of first return heights above 5 m was always the feature that was most correlated with the variation in tree heights within the sample plots. In all of the sample plot combinations, the feature that was used for separating spruces from deciduous trees was the mean value of the near-infrared (NIR) channel calculated from the aerial images of the sample plot area. The features that were used for separating pines and spruces, or pines and deciduous trees, were always based on normalized difference vegetation index values that were assigned to ALS returns (25th, 50th, 75th, or maximum percentile).

Accuracy of the Stand-Level Predictions
The stand level results contain the root-mean-square errors (RMSEs) and biases of the attributes D g , H g , G, and V of the validation stands, indicating the accuracy of the predictions with respect to the harvester-based stand attributes. The results are presented in Table 2 and in Figures 3 and 4. The predictions for D g and H g were not sensitive to the used sample plot size or to the method that was used for deriving tree positions. The RMSEs for D g did not vary notably, and RMSEs for H g slightly increased as sample plot size increased. Biases for D g and H g varied in a similar manner to RMSEs. Overall, the accuracy of D g and H g was good, i.e., the RMSEs and biases were small. The RMSEs for D g varied from 2.9 to 3.2 cm, and H g varied from 1.3 to 1.9 m. The biases for D g and H g varied from -0.6 cm to 0.9 cm and from −1.0 m to −0.1 m, respectively.
In general, both position alternatives, XY H and XY HH , resulted in similar accuracies for G and V at best. For G and V prediction models compiled using XY H positions for single trees, the size of the sample plot improved the accuracy when sample plot size was increased from 254 to 761 m 2 . With XY HH positions, the sample plot size did not notably affect the predictions of G and V, since the three smallest plot sizes of 254-761 m 2 produced almost similar results for both relative RMSE and bias. In detail, for more accurate positioning (XY HH ), the relative RMSE of G varied from 23.2 to 23.9% for sample plot sizes of 254-761 m 2 . Respectively, the bias varied from −11.1 to −12.7%. Similarly, for V, the relative RMSE varied from 25.9 to 28.2% for sample plot sizes of 254-761 m 2 . Respectively, the bias varied from −14.5% to −15.7%.
It is worth noting that, for the largest sample plot size of 1018 m 2 , the accuracy of results is lower than those for smaller plot sizes. This decrease in accuracy is seen in both relative RMSE and bias, as shown in Figures 3 and 4. The effect is more pronounced when the more accurate harvester head position (XY HH ) is used in modeling, however it is also seen when using the harvester position (XY H ). This observation is judged to be due to the larger internal variation of sample plots-the species-wise tree distributions spread when the size of the sample plot increases. This in turn affects those features of the sample plots that are used in the k-MSN nearest-neighbor search and imputation.
detail, for more accurate positioning (XYHH), the relative RMSE of G varied from 23.2 to 23.9% for sample plot sizes of 254-761 m 2 . Respectively, the bias varied from −11.1 to −12.7%. Similarly, for V, the relative RMSE varied from 25.9 to 28.2% for sample plot sizes of 254-761 m 2 . Respectively, the bias varied from −14.5% to −15.7%.
It is worth noting that, for the largest sample plot size of 1018 m 2 , the accuracy of results is lower than those for smaller plot sizes. This decrease in accuracy is seen in both relative RMSE and bias, as shown in Figures 3 and 4. The effect is more pronounced when the more accurate harvester head position (XYHH) is used in modeling, however it is also seen when using the harvester position (XYH). This observation is judged to be due to the larger internal variation of sample plots-the species-wise tree distributions spread when the size of the sample plot increases. This in turn affects those features of the sample plots that are used in the k-MSN nearest-neighbor search and imputation.    It is worth noting that, for the largest sample plot size of 1018 m 2 , the accuracy of results is lower than those for smaller plot sizes. This decrease in accuracy is seen in both relative RMSE and bias, as shown in Figures 3 and 4. The effect is more pronounced when the more accurate harvester head position (XYHH) is used in modeling, however it is also seen when using the harvester position (XYH). This observation is judged to be due to the larger internal variation of sample plots-the species-wise tree distributions spread when the size of the sample plot increases. This in turn affects those features of the sample plots that are used in the k-MSN nearest-neighbor search and imputation.   . See also data from Table 2.

Discussion
The use of harvester data in the remote sensing-based estimation of forest inventory attributes in mature stands was investigated. Two alternative methods for deriving single tree positions and four different sample plot sizes were tested. Currently, the most widely used method for recording positions of the cut trees to harvester data is to assign the GNSS position of the harvester during the time of the cut for each tree. This kind of position (i.e., XY H positions) was recorded for each tree in the modeling and validation datasets used. For comparison, improved harvester head position (i.e., XY HH positions) during the time of tree cut was computationally derived from the harvester data for each tree by processing the GNSS positions and bearings of the harvester [23]. This was required since GNSS positioning data contain so much variation that deriving single tree positions with adequate accuracy is not directly feasible. It was also noticed that, in the current dataset, when the harvester was, e.g., moving backwards, the harvester bearing needed slight modification.
The optimal conditions for using harvester measurements in ALS inventories depend on the current results as well as background processes. Based on this study, both modeling approaches (XY H and XY HH ) obtained the same level of accuracy. Referring to the best practices in ALS-based forest inventories which are used in Nordic countries, sample plot size should correspond to the size of the prediction grid [9]. The current results show that for harvester head positions (XY HH ) the achieved accuracy of G and V is rather independent of the sample plot size, as expected given the better inherent single-tree positioning accuracy of the XY HH dataset when compared to XY H positions. The sample plot size began to affect the XY HH results when internal variation within the sample plots increased and took over the uncertainty at the largest plots, decreasing the accuracy via the k-MSN search results. The same effect of decreasing accuracy of G and V was also present when using the harvester positions (XY H ). However, for the XY H results, the accuracy also decreased when smaller sample plot sizes were used. This finding is logical, as the effect of a single incorrectly linked tree is more significant when smaller sample plots are used. Due to these facts, smaller sample plot sizes should be favored, when possible, in measuring the sample plots, and therefore, using the harvester head positions (XY HH ) with a smallest sample plot size of 254 m 2 is advisable to obtain the best achievable results. If the harvester position (XY H ) is the only position data that is recorded, the larger plot area of 761 m 2 is found to be necessary with this inherently less accurate dataset to achieve the same accuracy. The prediction accuracies of D g and H g were not sensitive to the used sample plot size or to the positioning method. This can be expected, as these attributes were calculated as weighted means from all the trees within a sample plot, instead of summing up single-tree attributes. Gobakken and Naesset [10] investigated the effect of the sample plot positioning error in the ALS-based prediction of forest inventory attributes and concluded that positioning errors affect the accuracy of G and V more than diameter and height attributes.
The amount of reference sample plots is crucial, and if the harvesters collect the reference data instead of manual measurements, the amount-cost ratio of the collected data is significantly better for the alternative of using harvesters, keeping in mind that the obtained accuracy has to be good enough for the purpose which is confirmed in the current study for its part (best obtained RMSE approximately 25% for G and V). Both alternatives of harvester position achieved roughly the same level of accuracy as manually measured reference sample plots [39]. It is also important to note that the obtained accuracy for harvester head positions combined with manually measured sample plot data improves accuracy [39]. However, in clear-cut stands, the costs do not increase if larger sample plot sizes are used with harvester data, and this is beneficial if only the harvester positions (XY H ) are available.
The obtained results should be compared to those of other studies with caution. The current results indicate that the basal-area weighted tree diameter and tree height are predicted to have roughly the same RMSE as that reported in other related works [40,41]. The RMSE of volume is higher than in related works [40,41], however the bias of the best predictions of this study is at the same level as in References [40,41]. Although the number of modeling plots was 800 for each sample plot size, it should be noted that these plots were generated using harvester data from only eight stands. In typical operational forest inventory projects, modeling data tends to be collected from a wider range of stand structures, and the prediction models are also applied to a wider range of forest conditions (e.g., Reference [9]); this was also aimed for in this study, however it was only partially completed, and therefore the amount of data was limited. Presumably, the predictions were biased (see Table 2) mainly because forest structural variation in the eight modeling stands did not capture all the variation that was present in the 150 validation stands. Therefore, it would be beneficial to collect a more comprehensive dataset from harvesters and repeat the study. Nevertheless, with the current study setup it was still possible to concentrate to the effect of the positioning method and generated sample plot size to the prediction accuracy with harvester data, which was the main aim of the study. In traditionally performed ALS studies, the RMSEs for G and V are lower (for example, in volume, at best 11-14%) than in these results, however, the number of sample plots and methods used to perform reference measurements differ [42].
In an area-based approach, it is essential that the trees measured in the field can be linked to the correct sample plot and that the derived forest inventory attributes can be further linked to the remote sensing features extracted from the respective area. Therefore, when harvester data are used to generate sample plot information, the prerequisite is that the measured trees can be linked to the correct sample plots. In theory, the prediction accuracy should improve when single trees are positioned using information from the harvester head and GNSS instead of using only GNSS information, as it should be possible to correctly link a higher proportion of the trees to the trees of sample plots, and correspondingly to remote sensing features. The geographical accuracy of the single-tree data collected by harvester has been widely seen as a limiting factor for many forestry applications including the generation of modeling data for remote sensing [17,18,20,21].
The positioning accuracy for single trees is essential for using harvester data to predict stand variables in ABA inventory methods. Based on this study, it appears that the use of larger sample plots reduces part of the negative effects of the positioning errors, as can be seen from results obtained by only using XY H positions, which have higher uncertainty than those obtained using XY HH positions. By combining real-time kinematic (RTK) GNSS locations and harvester head positions relative to the harvester, Hauglin et al. [18] were able to obtain a mean position error of 0.94 m for single trees. In the current study, this level of accuracy was not obtained for single trees, even with the XY HH method, since GNSS positions were obtained using a Garmin GPS 16x-HVS GNSS receiver, not an RTK receiver. Presumably, the accuracy of the used GNSS receiver was between 5 and 10 m, as such accuracies have been obtained with similar GNSS receivers under forest canopies (e.g., Reference [22,43]). Additionally, the harvester used for single-tree data collection in this work did not record the position of the last extensible part of the boom, which caused an additional positioning error component in the data. As shown in Reference [18], the positioning accuracy of the harvester head can be less than one meter if the GNSS system in the harvester can achieve a positioning accuracy of less than one meter. It will be more common in modern CTL harvesters to record specific position parameters for the harvester head according to StanforD2010 [19]. In general, this will improve the usability of the harvester data for many applications, including remote sensing-based forest mapping and monitoring of forest operations. To improve single-tree positioning using harvester data, it seems that it is still essential to use more accurate GNSS receivers in harvesters, in addition to harvester head positioning relative to the harvester. Therefore, in the present study, the obtained improvement in the recorded harvester head parameters remained limited, due to the positioning uncertainties and lack of data. In the future, if harvester data is collected automatically from every stand with modern accurate GNSS devices and mobile laser scanners connected to the harvester, the amount of reference sample plots can be significantly larger per inventory area, which also opens the possibility to validate more specifically how many and what kind of reference plots are appropriate to use as reference data. More study is needed, with larger amounts of data.

Conclusions
In this study, a new method was developed and tested to replace part of the manually measured sample plots from mature stands with the exact volume data produced by harvesters. The advantage of the method is that the data is accumulated along with the operative logging in a standardized form and the plots can be formed afterwards. Harvester data can therefore be used to support forestand wood-procurement planning by providing auxiliary information that is required in the remote sensing-based estimation of forest inventory attributes for mature stands. This research confirms that harvester measurements can be used for measuring tree dimensions as reference data for ALS inventories. Based on the current results, it is advisable to predict the forest inventory attributes by using harvester head positions (XY HH ) and a smallest plot size of 254 m 2 , which corresponds to the plot size of current ALS measurements of forest inventories in Nordic countries. Instead, if only harvester positions (XY H ) are available, expanding the sample plot size to 761 m 2 allows a similar accuracy to be reached as that obtained using harvester head positions (XY HH ). It is important to note that using a small dataset of harvester head positions (XY HH ) with current positioning uncertainties provides as good accuracy as the maximum obtained using harvester positions (XY H ). Additionally, the results obtained using XY HH positions are rather independent of the sample plot size, as the dataset has better inherent accuracy. Both position alternatives achieve the same level of accuracy as that achieved when using field sample plot data. This fact suggests that the sample plots from clear-cut stands could be replaced by harvester sample plots without decreasing the accuracy of inventory. Furthermore, combining harvester sample plots from clear-cut stands with reference sample plots from field inventories improves the accuracy of interpretation, as the amount of harvester sample plots can be increased with minor costs. In the future, more accurate positioning via harvester measurements at the tree level could help to further improve the accuracy of the positioning data, thus, enabling the use of empirical diameter distributions in the inventories. The data can also be extended to younger stands if the remaining trees can be mapped with mobile laser scanners in harvesters in the future.