Estimation and Spatial Mapping of Residue Biomass following CTL Harvesting in Pinus radiata Plantations: An Application of Harvester Data Analytics

: The utilization of forest harvest residues for renewable bioenergy production and bioproducts has increasingly become an integrated part of forestry that helps to meet the needs of climate change mitigation and a future carbon neutral economy. An essential element in the planning of any harvesting residue recovery operation is a reliable assessment of the quantity and quality of residue biomass and its composition over a harvest area. With the now widely adopted cut-to-length (CTL) at the stump harvesting system in Pinus radiata plantations in Australia, harvesting residues left on site are signiﬁcantly larger in quantity and spatially more dispersed over a harvest area in comparison to the more traditional whole-tree harvesting. The conventional approach of assessing forest harvest residues through sample plots, transects, or small study blocks has provided site-speciﬁc estimates of residue biomass. However, these estimates cannot be readily extrapolated over the plantation landscape, which varies in silviculture, site, and stand conditions. To overcome this limitation, this study relied on harvester data analytics to obtain spatially explicit estimates of residue biomass using an example data set from harvested plantations of three stand types: unthinned (T0), thinned once (T1), and thinned twice (T2). Three methods of integrating biomass equations with harvester data were compared for residue biomass estimation: (1) applying individual tree biomass equations to harvested stems, (2) applying stand-level biomass equations to gridded harvester data, and (3) integrating estimates from the ﬁrst approach with recorded and estimated waste volumes of harvested stems. The estimates of total residue biomass obtained using the three methods through harvester data analytics varied between 56.2 and 156.4 t/ha in green weight across the three stand types. These estimates were validated indirectly through ex situ sample plots and proved to be comparable to the quantities of residue biomass assessed using conventional sample plots, transects, or small blocks following CTL harvesting of rotation age P. radiata plantations elsewhere in Australia. Among the three methods, the third method made the most intensive use of the harvester data and provided the most realistic estimates of residue biomass. Spatial mapping of the estimated total and component residue biomass will assist the operational planning of residue recovery and site-speciﬁc nutrient management for the long-term sustainability of P. radiata plantations.

individual tree or stand level from tree or stand variables derived from pre-operational inventory plots. For rotation age P. radiata plantations grown under different thinning regimes, systems of equations have been developed specifically to estimate the biomass in commercial logs and harvest residues in fresh and dry weight at both individual tree and stand levels [23,29]. The inventory and sampling-based estimates are then scaled up on a pro rata basis to provide estimates of total and recoverable residue biomass for a harvest area or a plantation estate. Such scaled-up estimates, although useful for broad, high-level management planning, lack the spatial resolution for more detailed operational planning that considers the collection and transportation costs. At an operational level, reliable and spatially explicit estimation of residue biomass and its composition is needed to assist planning for clearfell operations that integrate conventional CTL harvesting with residue biomass collection-i.e., integrated biomass recovery (e.g., [30,31]).
Since CTL harvesters were introduced into Australia for plantation harvesting in the early 1990s [32], cut-to-length at the stump has become the predominant harvesting system used in the country [13]. The present-day CTL harvesters are equipped with an on-board computer and a GPS receiver together with a computerized harvester head to optimize cutting and thereby reduce volume of waste and simultaneously collect production data in real time during harvesting. A harvester dataset usually contains anywhere between tens of thousands and millions of stems. The assortment, length, and end diameters of each log and waste section cut from each stem were recorded together with its spatial location, time stamp, and many other variables associated with harvesting operation and log production (see [33,34]). However, stump height and total tree height are not available because the harvester head does not process the stump and the treetop beyond the last cutting point and so cannot measure their lengths. Nonetheless, the spatially explicit and comprehensive log production data, when combined with biomass equations developed specifically for rotation age stands (e.g., [23,29]), would help overcome the limitations of conventional plot or transect-based sampling in the estimation of residue biomass. This approach of using harvester data to estimate residue biomass was adopted for the most common Scandinavian tree species in Finland [35][36][37]. Most recently, Woo et al. [38] developed a software tool attempting to integrate harvester data and individual tree biomass equations for the estimation of residue biomass, but the estimates they demonstrated for a patch of P. radiata plantation in Tasmania, an island state of Australia, were out of the range in comparison with previously reported residue and total aboveground stand biomass for P. radiata plantations in Australia [21][22][23][39][40][41].
For more reliable estimations of log product and residue biomass, the DBHOB (diameter at breast height overbark) and total tree height of the bucked stems contained in a harvester dataset need to be estimated first using predictive equations developed specifically for this purpose (see [42,43]). Further, the spatial locations of stems and their estimated DBHOB and height can be used to calculate the stand level attributes such as stand density, basal area, and stand height of tessellated grid cells or otherwise defined areas (e.g., [44][45][46][47]). As predictors for biomass, these tree and stand attributes provide an essential linkage between harvester data and biomass equations at both the tree and stand levels. A case in point is the systems of equations developed by Wang et al. [29] and Qiao et al. [23] for predicting biomass in commercial logs and residue components for the rotation age P. radiata trees and stands. When such biomass equations are applied to harvester data from P. radiata plantations, their applicability and limitations for predicting component biomass need to be evaluated and recognised. The need to do so arises from the fact that biomass equations are usually based on a limited number of sample trees with stems largely free from defects and, consequently, their cutting patterns during destructive sampling tend to be more regular than those encompassed in a big harvester dataset. At least for some stems in harvester data, the proportional biomass allocated to commercial logs and waste sections, a major component of residue biomass, may differ substantially from the allocation patterns covered by the limited number of sample trees for developing biomass equations. This study aimed to introduce harvester data analytics for the estimation and spatial mapping of residue biomass in plantations following CTL harvesting and to demonstrate the usefulness of this novel approach in overcoming the limitations of the conventional inventory and sampling-based residue biomass estimation. In doing so, recently harvested P. radiata plantations that underwent three different thinning regimes were used as examples. Three approaches were adopted and compared for accomplishing the objective: (1) applying individual tree biomass equations to stems contained in the harvester data, (2) linking stand level biomass equations to gridded harvester data, and (3) refining estimates of component biomass for individual trees using the cutting pattern, log, and waste volumes of each bucked stem in the harvester data. As standing dead trees in a plantation are not logged in harvester data during harvesting, adjustment factors were determined and incorporated for thinned and unthinned stands when the refined individual tree estimates were aggregated and scaled up to the stand level. The residue biomass that was calculated through the three methods were validated indirectly and compared with previously reported conventional estimates of residue biomass in P. radiata plantations elsewhere. The comparative results led to recommendations on the best approach for the estimation and spatial mapping of residue biomass to assist harvest planning for operations that integrate CTL log production with residue biomass collection in P. radiata plantations.

Study Area
The study area was within the Bathurst management area of the Forestry Corporation of New South Wales (FCNSW), the largest manager of P. radiata plantations in Australia. This management area encompasses a total of 70,000 hectares of P. radiata plantations across 18 disjunct state forests, spreading across an east-west distance of about 130 km and a northsouth distance of about 190 km on the central west slopes of NSW. Its geographical location, geology, soil types, climatic and site conditions, history of plantation establishment, and the silvicultural regimes and rotation ages of the plantations were described in detail by Knott and Ryan [48], Wang et al. [29], and Qiao et al. [23]. Since the 1980s, the initial planting stocking has been set to 1000-1300 trees/ha, except for steeper slopes where stands are established with a lower stocking of 700-900 trees/ha and no thinning operation is intended [49]. For more productive sites, either one or two thinnings are prescribed to reduce the stocking down to 450-550 trees/ha at the age of first thinning, which is around 14 years, and down to 200-300 trees/ha at the age of second thinning, which is around the age of 23 years. However, the prescribed thinning regime may be applied to a plantation with some necessary variations because the decision to thin a particular area is also dependent on other factors, such as drought exposure, tree health, customer product requirements, and economic and market considerations [49]. For poorer sites, no thinning is specified before the final harvest, which is generally at the rotation age of 30 to 35 years.

Harvester Data
As part of a larger harvester data analytics project, harvester data from recently harvested P. radiata plantations across nine state forests in the Bathurst management area were screened, cleaned, and processed through a workflow based on previously established protocols and procedures [42,43]. The harvesters currently operating in the Bathurst management area are mostly tracked John Deere, Komatsu, Tigercat, and Timberpro base machines with a 9-10 m boom reach. They are fitted with a Waratah or Komatsu harvesting and processing head together with Waratah's TimberRite TM or Komatsu's MaxiXplorer TM control, measurement, and information system. These harvesters are still using the old ASCII-based StanForD (Standard for Forest Machine Data and Communication), which was originally developed by the Forestry Research Institute of Sweden in 1988 [50], and not the new xml-based StanForD2010 released in 2011 [33,51]. Machine operators are required to calibrate their harvester heads on a regular basis through standard procedures according to the guidelines of FCNSW. An unpublished internal study on the accuracy of calibrated harvester heads showed the mean error of log length measurements for a Forests 2022, 13, 428 5 of 24 5 m log was 0.11 cm and the percentage of logs within the 6 adjacent 1 cm error classes with the highest number of logs was 90.1%. For a log end with a DOB of 30 cm, the mean measurement error of DOB ranged from −0.3 to 0.05 cm depending on the log positions. Following harvesting, all information recorded and stored in the onboard computer of each harvester was uploaded to the harvester database management system of FCNSW through the STICKS software developed by ForestPHD Pty. Ltd., Rotorua, New Zealand, which is widely used in Australia at present.
The harvester data contained the length, small end diameter overbark (SEDOB), and volume overbark of every log and waste section cut from each stem. In addition, log assortment was recorded according to customer product specifications. Offcut sections were tagged as waste in product assortment regardless of their lengths. SEDOB was recorded in mm, log length in cm, and overbark log and waste volume in m 3 , but DBHOB was not contained in the harvester data. The latitude and longitude of each stem were in the harvester data. Given that the GPS receiver is mounted on the cabin of harvesters, the recorded spatial location was the position of the harvester cutting the tree but not that of the tree itself. The actual position of the tree relative to the harvester could be anywhere within the range of the boom in any direction as the boom orientation could not be measured. Such positional error for a tree was compounded with the GPS receiver error for the harvester. When the accuracy of the GPS receivers in a forest environment was taken into account, the positional error for the individual stems was expected to be in the typical range of 10-20 m, as assessed by Lindroos et al. [52]. For the objectives of this study, a small subset of data was selected and extracted from the larger harvester dataset to represent plantations of three stand types: unthinned (T0), thinned once (T1), and thinned twice (T2). These plantations were all established in 1988 and harvested over a period of ten months between April 2020 and February 2021. The plantations consisted of 19 disjunct patches with a total area of approximately 136 ha and 74,094 harvested stems. There were 9 patches of T0 plantations varying between 2.4 and 9.4 ha with a total area of 46.4 ha, 5 patches of T1 plantations varying between 2.5 and 31.5 ha with a total area of 48.0 ha, and 5 patches of T2 plantations varying between 5.8 and 11.4 ha with a total area of 41.4 ha. The disjunct patches of each stand type were within close proximity to each other as they belonged to the same State Forest in the study area.

Harvester Data Analytics
The entire process of harvester data analytics started from data exploration, diagnostics, tagging, and screening. This initial data processing was carried out in multiple rounds and was facilitated through descriptive as well as diagnostic data analysis and visualization. In each round, the exploratory analysis and data screening detected data errors, discovered duplicated records, and uncovered problems with variables in different ways. After this, necessary steps were taken to correct data errors and rectify problems. As this initial phase of data processing was carried out as part of the larger harvester data analytics project across the nine state forests in the Bathurst management area, detailed descriptions would be lengthy and beyond the scope of this paper. After the harvester data were cleaned and tidied up, the following major steps were carried out: (1) reconstructing the size of individual trees, (2) calculating stand-level attributes using gridded harvester data, (3) estimating residue biomass by applying biomass equations at both individual tree and stand levels and also through converting the recorded log and waste volumes of stems into biomass, and (4) producing residual biomass heat maps for the harvested areas.

Reconstructing the Size of Individual Trees
Following the screening and pre-processing of the harvester data, the DBHOB and total tree height of each bucked stem were estimated using equations previously developed by Lu et al. [42] and Shan et al. [43] for the purpose of processing harvester data from P. radiata plantations. DBHOB was estimated from the SEDOB and the height above groundlevel of the butt log, which was calculated as the length of the butt log plus a stump height of 0.15 m. This stump height was the average height of more than 4600 stumps measured in post-harvesting surveys over 31 sites as part of the routine residue assessment for clearfell operations using CTL harvesters in P. radiata plantations in NSW [42]. A set of simple linear regression equations and correction factors for log transformation bias was tabulated by Lu et al. [42] for estimating DBHOB at specified heights between 0.05 and 7.05 m above ground-level with an interval of 0.1 m and between 7.05 and 9.55 m with an interval of 0.5 m. The total tree height of bucked stems was estimated using the nonlinear equations developed by Shan et al. [43]: and where H is the total tree height from ground level to tip in m of a stem, L is its total log length in m-i.e., the sum of lengths of all logs and waste sections cut from the stem-H s represents the stump height of 0.15 m, SEDTL stands for the SEDOB of the top log in cm, d = (SEDTL DBHOB) is the relative diameter in the unit interval of (0, 1), T =(DBHOB − SEDTL)/L is the average taper of the cut stem in cm/m, and a i and b i are parameters. For stems with d ≤ 0.8, Equation (1) was used because its predictive performance was superior to that of Equation (2) within this relative diameter range. When d > 0.8, Equation (2) was used as its performance was slightly better than that of Equation (1) [43].

Delineation and Tessellation of Harvester Data for Calculating Stand Level Attributes
The spatial boundaries of each harvest area were delineated by computing in QGIS an alpha convex hull of all stem positions using the method of Asaeedi et al. [53], which uses polygons and angles to find the shape of the hull. The convex polygon was the minimum area that included all stems in the area, but it did not account for the crown extension and the positional error of stems on the polygon borders or forest edges. More than 99% of the harvested stems across the three stand types had an estimated DBHOB less than 65 cm. For open grown P. radiata of the same diameter, the maximum crown extension would be about 7 m, as calculated from the linear relationship between crown width and DBHOB derived by Leech [54]. The boom reach of harvesters operating in the management area ranged from 6.5 to 10.3 m, but was mostly in the range of 9-10 m. To account for the maximum crown extension of edge trees and the positional errors of their stems, a 7 m buffer was added to the polygon borders ( Figure 1). The areas covered with harvester data were tessellated initially into regular grid cells of 25 m × 25 m, 50 m × 50 m, and 100 m × 100 m in size. The three grid sizes were compared in a preliminary analysis of the harvester data. Based on the comparisons, the grid cell size of 50 m × 50 m was chosen for all three stand types (T0, T1, and T2) due to consideration of (1) the positional error of tree locations in the harvester data and (2) the number of trees in a grid cell that was adequate for calculating the stand level attributes. At rotation age, the density of thinned P. radiata stands in the management area could be less than 200 trees/ha in certain places [29]. In such cases, grid cells smaller than the chosen size would not contain enough number of trees for the calculation of stand density, basal area, and stand height. For other types of forests reported in the harvester data literature, segmentations and grid cells smaller or larger than 0.25 ha have been used (e.g., [44,46,55,56]).
All trees within a grid cell were allocated to that cell for the calculation of stand-level attributes including stand density, basal area, and stand height, which is customarily defined as the mean height of the 100 largest diameter trees/ha, calculated on a pro rata basis. These three stand variables together with stand age and stand type are the predictor variables in the systems of biomass equations that were recently developed for rotation-age P. radiata stands in the management area [23]. For grid cells that were completely within the delineated harvest area, a uniform area of 0.25 ha was used in the calculations. For partial grid cells along the boundaries of a delineated harvest area (Figure 1), the stand variables were calculated in two ways, depending on the number of harvested stems within the grid cell. For grid cells with at least 35 stems, the calculations were based on the partial area of each grid cell that was within the delineated harvest area. This number of trees was greater than the number of trees contained in a routine pre-operational inventory plot in the plantations. It was also above the minimum number of trees that was required for the reliable estimation of stand attributes (c.f. [57][58][59]). For grid cells with less than 35 trees, the width of the grid cell was stretched both vertically and horizontally by 5 m on each side to take in a greater number of stems. If still not enough, this step was repeated one or two times until there was enough of a number of trees contained in the stretched grid cell (Figure 1). The stand-level variables were then calculated using the partial area of the stretched grid cell that was within the harvest area.

Residue Biomass Estimation
Three approaches were compared for estimating residue biomass in both green and dry weight: (1) applying individual tree biomass equations to the estimated DBHOB and The areas covered with harvester data were tessellated initially into regular grid cells of 25 m × 25 m, 50 m × 50 m, and 100 m × 100 m in size. The three grid sizes were compared in a preliminary analysis of the harvester data. Based on the comparisons, the grid cell size of 50 m × 50 m was chosen for all three stand types (T0, T1, and T2) due to consideration of (1) the positional error of tree locations in the harvester data and (2) the number of trees in a grid cell that was adequate for calculating the stand level attributes. At rotation age, the density of thinned P. radiata stands in the management area could be less than 200 trees/ha in certain places [29]. In such cases, grid cells smaller than the chosen size would not contain enough number of trees for the calculation of stand density, basal area, and stand height. For other types of forests reported in the harvester data literature, segmentations and grid cells smaller or larger than 0.25 ha have been used (e.g., [44,46,55,56]).
All trees within a grid cell were allocated to that cell for the calculation of standlevel attributes including stand density, basal area, and stand height, which is customarily defined as the mean height of the 100 largest diameter trees/ha, calculated on a pro rata basis. These three stand variables together with stand age and stand type are the predictor variables in the systems of biomass equations that were recently developed for rotation-age P. radiata stands in the management area [23]. For grid cells that were completely within the delineated harvest area, a uniform area of 0.25 ha was used in the calculations. For partial grid cells along the boundaries of a delineated harvest area (Figure 1), the stand variables were calculated in two ways, depending on the number of harvested stems within the grid cell. For grid cells with at least 35 stems, the calculations were based on the partial area of each grid cell that was within the delineated harvest area. This number of trees was greater than the number of trees contained in a routine pre-operational inventory plot in the plantations. It was also above the minimum number of trees that was required for the reliable estimation of stand attributes (c.f. [57][58][59]). For grid cells with less than 35 trees, the width of the grid cell was stretched both vertically and horizontally by 5 m on each side to take in a greater number of stems. If still not enough, this step was repeated one or two times until there was enough of a number of trees contained in the stretched grid cell ( Figure 1). The stand-level variables were then calculated using the partial area of the stretched grid cell that was within the harvest area.

Residue Biomass Estimation
Three approaches were compared for estimating residue biomass in both green and dry weight: (1) applying individual tree biomass equations to the estimated DBHOB and total tree height values of harvested stems, (2) applying stand-level biomass equations to the gridded harvester data, and (3) integrating estimates from the first approach with waste volumes of stems in the harvester data. In the first approach, the biomass of commercial logs and harvest residues were predicted for individual stems using the systems of additive and allocative equations developed by Wang et al. [29] for individual trees of P. radiata at rotation age. These equations were based on data from 239 trees that were destructively sampled from among 61 sample plots established across the three stand types in the same study area. The predictor variables in the equations were DBHOB, total tree height, and two dummy variables for the three stand types (T0, T1, and T2). For each stem, the total residue biomass was estimated first through the three additive equations for log product, harvest residue, and total aboveground biomass. Then the estimated total residue biomass was allocated to the stump, branch, and waste components through the three allocative equations (see [29]). In the second approach, the systems of six additive equations developed by Qiao et al. [23] for predicting stand-level biomass in sawlogs, pulpwood, stumps, branches, waste sections, and total aboveground biomass were applied to each grid cell over the gridded harvester data as demonstrated in Figure 1. These additive equations were developed using stand-level biomass estimates obtained for the same 61 sample plots as described above. These stand-level biomass estimates included both live trees and the standing dead trees that were found primarily in unthinned stands. For standing dead trees, either whole or broken, the stem biomass was allocated to the stump and waste components, while branch biomass was regarded as negligible and not included in the biomass calculations as described in Qiao et al. [23]. The predictors of the six additive equations included stand age, stand density, basal area, stand height, and the dummy variables for stand types.
In the third approach, the total overbark volume of commercial logs and that of the waste sections cut from a stem were calculated using the product assortment and volume information contained in the harvester data. The calculated volumes were converted to biomass in green as well as dry weight for the stem by using volume-to-biomass conversion factors. Two conversion factors were derived from the units of kg/m 3 for each stand type for green and dry weight conversions, respectively. They were 789 and 413 for unthinned stands, 892 and 398 for T1 stands, and 876 and 408 for T2 stands. These conversion factors were the average biomass-to-volume ratios calculated using the fresh and dry weight and overbark volume of the whole merchantable stem (i.e., not including the stump and the top stem section beyond the last cutting point) of a total of 239 trees sampled across the three stand types in rotation age P. radiata plantations in the study area. The green and dry weight of the different biomass components of individual sample trees were calculated by Wang et al. [29] and described in detail therein. The overbark merchantable stem volume of each sample tree was derived using the overbark stem taper equation developed by Wang et al. [29] for each stand type using the trigonometric variable-form taper model form of Bi [60].
However, the waste sections and volumes recorded for a stem in the harvester data might not be all there was for the stem. This was because the harvester operators tended not to process a stem any further after cutting the last commercial log from the stem, leaving one or more potential waste sections in the upper stem unaccounted for in the harvester data. The SEDTL (SEDOB of the top log) distribution of stems in the harvester data had a lower quartile of 11.4 cm, a median of 13.3 cm, a mean of 15.8 cm, an upper quartile of 17.1 cm, and a 90th percentile of 23.2 cm, which were all greater than the minimum SEDOB of 8 cm that was specified for a pulp log. Such a positively skewed distribution indicated that a substantial proportion of trees had unprocessed stem tops that could be included as waste sections in the residue biomass calculation. To do so, the SEDTL of each bucked stem was evaluated against the minimum SEDOB for a pulp log. If it was greater than 8 cm, the overbark volume of the top stem section between the height of the last cut and the height of the minimum SEDOB was calculated through numerical integration of the overbark stem taper equation developed by Wang et al. [29] for each stand type. The volume was then converted to fresh as well as dry weight using the corresponding conversion factors as described above and included as the additional waste biomass for the stem.
As the second approach provided stand-level residue biomass estimates for individual grid cells, the three approaches for residue biomass estimation were compared at the scale of the grid cells. For the first and third methods, the residue biomass estimates of individual trees within the same grid cell were summed up and converted to stand-level estimates on a per hectare basis. Unlike the second approach, residue biomass estimates obtained through the first and third methods did not include standing dead trees, as they were not harvested and processed through the harvester head and so were not present in the harvester data. As a further refinement of the third method, its stand-level estimates of stump and waste biomass in both green and dry weight for unthinned stands in T0 grid cells were expanded by a factor of 1.10 and 1.13, respectively. These two factors were derived using stand-level biomass data calculated and compiled by Qiao [23], which contained the stump and waste biomass of both live and dead trees for the 61 sample plots across the three stand types. For each plot, a ratio of the total waste biomass of live and dead trees to that of live trees only was calculated in fresh weight. In the same way, a ratio for stump biomass was calculated. The average ratio for each residue component among the ten unthinned plots of comparable stand age to the plantations in this study was taken as the expansion factor for the T0 stands. As there were very few standing dead trees among the sample plots in the T1 and T2 stands, the calculated biomass expansion factors took the value of one after being rounded to two decimal places. Therefore, no adjustment was made to the residue biomass estimates for T1 and T2 grid cells. After the stand-level estimates were finalized, statistical graphics and simple descriptive statistics were used to compare the differences in the scaled-up estimates among the three approaches across the three stand types.

Spatial Mapping of Residue Biomass Estimates
Using the geographic locations of the harvested stems, the estimated total and component residue biomass were spatially mapped in the form of heatmaps for visualization of each harvest area. The heatmap renderer and its underlying processing algorithm in QGIS were used for this purpose. In addition, grid cell maps were made to show the quantity of total residue biomass and its proportional allocation to the stump, branch, and waste components for delineated harvest areas across the three stand types.

Evaluation of the Accuracy of Residue Biomass Estimation
As the harvester data analytics described above were performed for several months to a year after the routine harvesting operations, in situ data of residue biomass were not available for directly validating the calculated and spatially mapped residue biomass in the harvested plantations. In an ideal world, at least 30 survey plots of the same size as the 50 m × 50 m grid cells would need to be established and measured to obtain the tree and stand attributes for each of the three stand types prior to harvesting. During the operations, all harvest residues within a plot would have to be collected, sorted out into piles of different components, and weighed instantaneously after trees were felled and processed to avoid any substantial loss of moisture in the determination of their fresh weights. In addition, samples would have to be collected to determine the dry-to-fresh weight ratio of each component. This conventional approach of direct validation would require a large amount of time, manpower, and additional machinery working on site in synchronization with the harvester and forwarder in routine harvesting operations at a landscape scale. Such a requirement can prove to be restrictive or even prohibitive in terms of cost and operational complexity in practical situations, especially when research funding is limited. Consequently, this conventional approach has seldom been adopted. The only such example has been the case study reported by Palander et al. [36] and Vesa and Palander [37] in central Finland, where they used harvester data to estimate the stump biomass with attached roots of small to medium Norway spruce trees, mostly with DBH less than 40 cm in mixed-species stands.
To overcome the lack of in-situ data for direct validation, an alternative indirect approach was formulated to evaluate the accuracy of residue biomass estimation through using the 61 ex-situ sample plots established in rotation-age P. radiata stands in the same study area as described previously in the section on residue biomass estimation. Among the 61 plots, there were 31 plots in unthinned (T0) plantations and 15 plots each in T1 and T2 plantations. These plots were circular and generally had a radius of 15, 20, or 25 m for T0, T1, and T2 stands. The radius was varied in order to take in at least 30 to 40 trees in all the plots across the three stand types. For these plots, four stand attributes were tabulated and displayed graphically in Wang et al. [29] and Qiao et al. [23], including stand density, basal area, height, and age. The biomass in commercial logs and residues in both fresh and dry weight were calculated and scaled up to per hectare values by Qiao et al. [23] for developing the stand-level biomass equations. This data set would naturally serve as an alternative basis for the indirect validation of the residue biomass estimates obtained for the grid cells through harvester data analytics.
However, the quantities of biomass of the sample plots were not directly comparable to the estimated biomass of the grid cells. The sample plots and grid cells had different spatial scales as well as different stand attributes. To overcome these differences, two further steps of harvester data analytics were taken. Firstly, virtual plots of the same shape and size to the sample plots were drawn over the T0, T1, and T2 harvest areas using the spatial locations of individual stems as plot centres through a computer program. Not all virtual plots drawn in such a way fell completely within the boundaries of the harvested areas.
Once partial plots at the edges were excluded, the number of virtual plots was greater than 21,000 for T2 stands and greater than 31,000 for T0 and T1 stands, respectively. For every virtual plot, stand attributes and biomass quantities were calculated and scaled up to per hectare values in correspondence with the data content and structure of the 61 sample plots as described above. In the second step, for each sample plot of a particular stand type, a virtual plot was selected from among all the virtual plots of the same stand type to match its stand attributes as closely as possible. To facilitate the selection process, a normalized similarity index (SI) was calculated for every virtual plot as follows: where log stands for common logarithm; N s , G s , and H s represent the stand density, basal area, and stand height of the sample plot; N V , G V , and H V are the corresponding stand variables of a virtual plot; and N max , G max , and H max stands for the maximum values of the respective stand variables of all the virtual plots belonging to the stand type. Stand age was not incorporated in the formulation of SI in Equation (3), although it was one of the predictor variables in the stand-level biomass equations of Qiao et al. [23]. In comparison to the other three stand variables, differences in stand age between the sample and virtual plots only varied within a discrete and much narrower range of several years. The small differences in stand age generally resulted in a less than 1% difference in the predicted values of all the biomass components when holding constant the values of the other three stand variables in the biomass equations of Qiao et al. [23]. Including stand age in the formulation of SI would only add to the complexity of its calculation but make little difference in the calculated values. The normalization was intended to scale the values of SI between 0 and 1 within a unit cube where no single stand variable could exert a disproportional influence. For each sample plot, the virtual plot with the smallest SI, and so the closest similarity, was selected as its matching plot for the intended indirect validation (Figure 2). Although being the closest match, the sample plot and its virtual counterpart still differed somewhat in their stand attributes (Figure 3). The differences in stand attributes would naturally translate into differences in the quantities of all biomass components between the sample and virtual plot, which would consequently confound the validation results. To minimize the confounding effects, an adjustment factor was incorporated in calculating the difference in biomass between the sample and the virtual plot during validation: where and represents the quantities of a certain biomass component at stand level in t/ha for the sample and the virtual plot, respectively, is the adjusted difference between and , and is the adjustment factor, which was derived as follows: where (•) represents the stand level equation for the biomass component as given in Equation (7) of Qiao et al. [23], and are the stand age in years of the sample and the virtual plot, respectively, and the other predictors of (•) were previously defined in Equation (3). If the sample and the virtual plot had the same predictor values, would be 1 and no adjustment would be made. In other cases, would be either slightly greater or smaller than 1 depending on how the predicted value of the biomass for the sample plot in the numerator weighed against that for the virtual plot in the denominator.
The adjusted difference was taken as the 'error of estimation' of the biomass component in the indirect validation of the biomass estimates obtained through harvester data analytics for the virtual plot. The size, direction, and dispersion of the estimation errors were summarized through four benchmarking statistics, including the mean error of estimation (MEE), the mean percentage error of estimation (MPEE), the mean absolute error of estimation (MAEE), and the mean squared error of estimation (MSEE). Another simple Although being the closest match, the sample plot and its virtual counterpart still differed somewhat in their stand attributes (Figure 3). The differences in stand attributes would naturally translate into differences in the quantities of all biomass components between the sample and virtual plot, which would consequently confound the validation results. To minimize the confounding effects, an adjustment factor was incorporated in calculating the difference in biomass between the sample and the virtual plot during validation: where M S and M V represents the quantities of a certain biomass component at stand level in t/ha for the sample and the virtual plot, respectively, ε is the adjusted difference between M S and M V , and θ is the adjustment factor, which was derived as follows: where f (·) represents the stand level equation for the biomass component as given in Equation (7) of Qiao et al. [23], A S and A V are the stand age in years of the sample and the virtual plot, respectively, and the other predictors of f (·) were previously defined in Equation (3). If the sample and the virtual plot had the same predictor values, θ would be 1 and no adjustment would be made. In other cases, θ would be either slightly greater or smaller than 1 depending on how the predicted value of the biomass for the sample plot in the numerator weighed against that for the virtual plot in the denominator.
so the variance of the estimation error is larger than the variance of the observed data. Four sets of benchmarking statistics were calculated: a general set for all stand types combined by taking the number of sample plots = 61 and one set for each of the three stand types. In addition to the benchmarking statistics, the values of were plotted against that of with a line of unity slope passing through the origin for a graphical evaluation of the accuracy of residue biomass estimation.  The adjusted difference ε was taken as the 'error of estimation' of the biomass component in the indirect validation of the biomass estimates obtained through harvester data analytics for the virtual plot. The size, direction, and dispersion of the estimation errors were summarized through four benchmarking statistics, including the mean error of estimation (MEE), the mean percentage error of estimation (MPEE), the mean absolute error of estimation (MAEE), and the mean squared error of estimation (MSEE). Another simple statistic, the coefficient of estimation efficiency (R 2 E ) from [61] was also calculated to provide a simple measure of estimation performance. These statistics were calculated as follows: where ε i is the 'error of estimation' of the biomass component for the ith plot in t/ha, n is the number of sample plots, and M S is the average quantity of the biomass component for all n sample plots. These statistics have been commonly used in evaluating the predictive performance of forest and other empirical models. In particular, the MSEE is a common measure of estimation accuracy in the statistical literature since it incorporates both the bias and variance of estimation [62]. The Nash-Sutcliffe coefficient of estimation efficiency (R 2 E ) is sometimes referred to as 'modelling efficiency' or the 'prediction coefficient of determination' (e.g., [63][64][65][66]). It is analogous to the coefficient of determination R 2 , which ranges from 0 to 1. However, R 2 E can range from −∞ to 1 [61]. When R 2 E <0, the estimated values shift further away than the observed values from the observed mean and so the variance of the estimation error is larger than the variance of the observed data. Four sets of benchmarking statistics were calculated: a general set for all stand types combined by taking the number of sample plots n = 61 and one set for each of the three stand types. In addition to the benchmarking statistics, the values of M S were plotted against that of M V with a line of unity slope passing through the origin for a graphical evaluation of the accuracy of residue biomass estimation.

Tree and Stand Attributes Obtained through Harvester Data Analytics
For the 74,094 stems in the harvester data, the estimated DBHOB ranged from 8.3 to 74.7 cm and the estimated total tree height varied from 7.7 to 46.6 m. The range of tree size was similar across the three stand types because of the large number of trees in the harvester data. However, trees in unthinned plantations were generally smaller. The DBHOB distributions for T0, T1, and T2 plantations had a median of 23.6, 34.8, and 33.9 cm; an upper quartile of 28.0, 43.1, and 39.2 cm; and a 95th percentile of 38.1, 52.3, and 47.3 cm, respectively. The median and 95th percentile of the tree height distribution for T0 plantations were 26.5 and 32.7 m, respectively, which were about 3 and 4 m shorter than the same percentiles for T1 and T2 plantations ( Figure 4). The number of commercial logs cut from a stem ranged from 1 to more than 6, but 95% of the stems produced 2 to 6 logs. The number of recorded waste sections for a bucked stem varied mostly from 0 to 4-only with a smaller number of exceptions. For the 74094 stems in the harvester data, the estimated DBHOB ranged from 8.3 to 74.7 cm and the estimated total tree height was from 7.7 to 46.6 m. The range of tree size was similar across the three stand types because of the large number of trees in the harvester data. However, trees in unthinned plantations were generally smaller. The DBHOB distributions for T0, T1, and T2 plantations had a median of 23.6, 34.8, and 33.9 cm; an upper quartile of 28.0, 43.1, and 39.2 cm; and a 95th percentile of 38.1, 52.3, and 47.3 cm, respectively. The median and 95th percentile of the tree height distribution for T0 plantations were 26.5 and 32.7 m, respectively, which were about 3 and 4 m shorter than the same percentiles for T1 and T2 plantations ( Figure 4). The number of commercial logs cut from a stem ranged from 1 to more than 6, but 95% of the stems produced 2 to 6 logs. The number of recorded waste sections for a bucked stem varied mostly from 0 to 4-only with a smaller number of exceptions. The 19 delineated harvest areas contained in the harvester data were covered by a total of 704 grid cells. For each stand type, there were more than 200 grid cells. The standlevel attributes were calculated for the individual grid cells and showed a high degree of variation within and across stand types. Stand density varied from 100 to 2356 trees/ha across the three stand types, but only several grid cells had values greater than 1500 trees/ha ( Figure 5) The central 90% of the stand density distribution was in the range of 281-1312, 173-840, and 192-528 trees/ha for T0, T1, and T2 plantations, respectively. When grid cells with the lowest 5% and highest 5% of stand densities were excluded, the 5% trimmed mean was 698, 460, and 357 trees/ha for T0, T1, and T2 plantations. The stand basal area ranged from 10 to 112 m 2 /ha across the three stand types, but only 4 grid cells had values greater than 90 m 2 /ha and 19 grid cells were less than 15 m 2 /ha. The range was the widest for T0 plantations and narrowed to 11-92 and 12-57 m 2 /ha for T1 and T2 plantations. The mean stand basal area was 37, 48, and 34 m 2 /ha for T0, T1, and T2 plantations, respectively, while the corresponding 5% trimmed means had almost the same values. The stand height varied from 22 to 38 m among the 704 grid cells across the three stand types. The average stand height was 29.3 m for T0 plantations, 33.8 m for T1, and 32.5 m for T2 plantations. The T0 stands were the least uniform and T2 stands the most uniform, as shown by the boxplots of the three stand-level variables for the three stand types (Figure 5). For each stand type, the three stand-level variables calculated for the grid cells were closely correlated ( Figure 6).  Figure 4. Boxplots of estimated DBHOB and total tree height for the bucked stems contained in the harvester data set for each of the three stand types. The number next to each boxplot on the right indicates the number of stems for the corresponding stand type.
The 19 delineated harvest areas contained in the harvester data were covered by a total of 704 grid cells. For each stand type, there were more than 200 grid cells. The standlevel attributes were calculated for the individual grid cells and showed a high degree of variation within and across stand types. Stand density varied from 100 to 2356 trees/ha across the three stand types, but only several grid cells had values greater than 1500 trees/ha ( Figure 5) The central 90% of the stand density distribution was in the range of 281-1312, 173-840, and 192-528 trees/ha for T0, T1, and T2 plantations, respectively. When grid cells with the lowest 5% and highest 5% of stand densities were excluded, the 5% trimmed mean was 698, 460, and 357 trees/ha for T0, T1, and T2 plantations. The stand basal area ranged from 10 to 112 m 2 /ha across the three stand types, but only 4 grid cells had values greater than 90 m 2 /ha and 19 grid cells were less than 15 m 2 /ha. The range was the widest for T0 plantations and narrowed to 11-92 and 12-57 m 2 /ha for T1 and T2 plantations. The mean stand basal area was 37, 48, and 34 m 2 /ha for T0, T1, and T2 plantations, respectively, while the corresponding 5% trimmed means had almost the same values. The stand height varied from 22 to 38 m among the 704 grid cells across the three stand types. The average stand height was 29.3 m for T0 plantations, 33.8 m for T1, and 32.5 m for T2 plantations. The T0 stands were the least uniform and T2 stands the most uniform, as shown by the boxplots of the three stand-level variables for the three stand types ( Figure 5). For each stand type, the three stand-level variables calculated for the grid cells were closely correlated ( Figure 6).

Estimates and Spatial Maps of Residue Biomass
The three approaches resulted in similar estimates for the branch and stump biomas but not for the waste and total residue biomass across the three stand types. The estimated branch biomass in green weight, when averaged over the grid cells, was in the range o 23.2-23.6, 52.7-55.8, and 28.5-30.5 t/ha among the three methods for the T0, T1, and T2 plantations. The corresponding dry weight ranges were 9.8-9.9, 17.2-17.9, and 11.1-11. t/ha for the T0, T1, and T2 plantations, respectively (Figure 7). The average stump biomas estimates in green weight were in the range of 6.7-7.9, 11.0-12.1, and 5.3-7.2 t/ha among the three methods for the T0, T1, and T2 plantations. The corresponding dry weight range were 4.1-4.5, 5.6-5.7, and 3.7-3.9 t/ha among the three methods. The highest estimates o waste biomass came from the third method, while the other two methods resulted in lower estimates. The waste biomass estimates obtained through the third method, when averaged over the grid cells, were 65.2, 88.5, and 60.

Estimates and Spatial Maps of Residue Biomass
The three approaches resulted in similar estimates for the branch and stump biomas but not for the waste and total residue biomass across the three stand types. The estimated branch biomass in green weight, when averaged over the grid cells, was in the range o 23.2-23.6, 52.7-55.8, and 28.5-30.5 t/ha among the three methods for the T0, T1, and T plantations. The corresponding dry weight ranges were 9.8-9.9, 17.2-17.9, and 11.1-11. t/ha for the T0, T1, and T2 plantations, respectively (Figure 7). The average stump biomas estimates in green weight were in the range of 6.7-7.9, 11.0-12.1, and 5.3-7.2 t/ha among the three methods for the T0, T1, and T2 plantations. The corresponding dry weight range were 4.1-4.5, 5.6-5.7, and 3.7-3.9 t/ha among the three methods. The highest estimates o waste biomass came from the third method, while the other two methods resulted in lower estimates. The waste biomass estimates obtained through the third method, when averaged over the grid cells, were 65.2, 88.5, and 60.

Estimates and Spatial Maps of Residue Biomass
The three approaches resulted in similar estimates for the branch and stump biomass but not for the waste and total residue biomass across the three stand types. The estimated branch biomass in green weight, when averaged over the grid cells, was in the range of 23.2-23.6, 52.7-55.8, and 28.5-30.5 t/ha among the three methods for the T0, T1, and T2 plantations. The corresponding dry weight ranges were 9.8-9.9, 17.2-17.9, and 11.1-11.6 t/ha for the T0, T1, and T2 plantations, respectively (Figure 7). The average stump biomass estimates in green weight were in the range of 6.7-7.9, 11.0-12.1, and 5.3-7.2 t/ha among the three methods for the T0, T1, and T2 plantations. The corresponding dry weight ranges were 4.1-4.5, 5.6-5.7, and 3.7-3.9 t/ha among the three methods. The highest estimates of waste biomass came from the third method, while the other two methods resulted in lower estimates. The waste biomass estimates obtained through the third method, when averaged over the grid cells, were 65.2, 88.5, and 60. The total residue biomass estimated through the third method, when averaged over the grid cells, was 96.7, 156.4, and 97.9 t/ha in green weight and 48.6, 63.0, and 43.5 t/ha in dry weight for the T0, T1, and T2 plantations ( Figure 8, Table 1). In comparison, the average estimates of the total residue biomass obtained through the first and second methods were 66.0 and 62.6, 117.6 and 111.6, and 68.8 and 56.2 t/ha in green weight, and 31.4 and 29.3, 43.5 and 42.1, and 29.8 and 24.4 t/ha in dry weight for the T0, T1, and T2 plantations These estimates of the total residue biomass were 25-43% lower compared to those obtained through the third method ( Figure 8). Among the three stand types, the total residue biomass estimates were the highest for T1 plantations. Waste biomass estimates comprised the greatest proportion of total residue biomass in T0 plantations regardless of the method of estimation. For the third method in particular, the estimated waste biomass in green weight accounted for 67.7% of the total residue biomass in T0 plantations, which was 11.7% and 9.2% higher than in T1 and T2 plantations ( Table 1). The total and component residue biomass varied spatially within a harvest area and across stand types, as illustrated by the heatmaps and grid cell maps for the three example harvest areas in Figure 9. The total residue biomass estimated through the third method, when averaged over the grid cells, was 96.7, 156.4, and 97.9 t/ha in green weight and 48.6, 63.0, and 43.5 t/ha in dry weight for the T0, T1, and T2 plantations ( Figure 8, Table 1). In comparison, the average estimates of the total residue biomass obtained through the first and second methods were 66.0 and 62.6, 117.6 and 111.6, and 68.8 and 56.2 t/ha in green weight, and 31.4 and 29.3, 43.5 and 42.1, and 29.8 and 24.4 t/ha in dry weight for the T0, T1, and T2 plantations. These estimates of the total residue biomass were 25-43% lower compared to those obtained through the third method ( Figure 8). Among the three stand types, the total residue biomass estimates were the highest for T1 plantations. Waste biomass estimates comprised the greatest proportion of total residue biomass in T0 plantations regardless of the method of estimation. For the third method in particular, the estimated waste biomass in green weight accounted for 67.7% of the total residue biomass in T0 plantations, which was 11.7% and 9.2% higher than in T1 and T2 plantations ( Table 1). The total and component residue biomass varied spatially within a harvest area and across stand types, as illustrated by the heatmaps and grid cell maps for the three example harvest areas in Figure 9.    Table 1. Estimates of total aboveground stand biomass (Total), total residue biomass (Residue), and its proportional allocation to the three residue components in green and dry weight that were obtained through each of the three methods (M) for the three stand types when averaged over grid cells.

Accuracy of Residue Biomass Estimation through Indirect Validation
The benchmarking statistics of indirect validation for all the stand types combined showed small and negligible biases in the residue biomass estimates obtained through the first method ( Table 2). The mean error of estimation (MEE) in dry weight was −0.45 t/ha (i.e., a slight overestimation) for the total residue biomass and fell within ±0.35 t/ha for the stump, waste, and branch biomass. The MEE values in fresh weight showed a slightly larger overestimation, but still less than 3.3% for all residue components as indicated by the values of the mean percentage error of estimation (MPEE). The average size of estimation error, as shown by the mean absolute error of estimation (MAEE), varied between 0.40 and 4.75 t/ha in dry weight and between 2.0 and 8.7 t/ha in fresh weight for the three residue components. The Nash-Sutcliffe coefficient of estimation efficiency (R 2 E ) ranged between 0.86 and 0.96 for the residue components in dry weight, and between 0.78 and 0.91 in fresh weight. The benchmarking statistics for the second method indicated little bias and high precision in the estimation of all biomass components in both dry and fresh weight. The values of MAEE and MSEE were the smallest among the three methods. In comparison to the first two methods, the third method produced more waste biomass and consequently greater quantities of total residue biomass. The MEE values in fresh and dry weight were −68.8 and −37.1 t/ha for waste biomass and were −67.7 and −34.8 t/ha for total residue biomass. The corresponding R 2 E values were all negative ( Table 2).

Figure 8.
Boxplots of total residue biomass estimates in green and dry weight obtained through the three estimation methods for T0, T1, and T2 plantations. The numbers on the right-hand side of the left graphic panel indicate the total number of grid cells for each of the three stand types. Figure 9. Heatmaps of waste biomass in green weight for three harvested areas representing T0, T1, and T2 plantations (Left). Grid cell maps of total harvest residue biomass in green weight and its proportional allocation to the three residue components (Right). The biomass quantities were calculated using the third method through harvester data analytics.  Figure 9. Heatmaps of waste biomass in green weight for three harvested areas representing T0, T1, and T2 plantations (Left). Grid cell maps of total harvest residue biomass in green weight and its proportional allocation to the three residue components (Right). The biomass quantities were calculated using the third method through harvester data analytics.
The benchmarking statistics for the individual stand types showed that the first method had a slight underestimation of total residue biomass in dry weight of 0.31 and 1.2 t/ha for T0 and T1 stands, respectively, but an overestimation of 3.7 t/ha for T2 stands ( Figure 10). The MEE values for waste biomass in dry weight was 1.1, 1.6, and −2.9 t/ha for T0, T1, and T2 stands, while the corresponding MPEE values were 2.6%, 10.6%, and −14.0%, respectively. The second method again showed the least bias and highest precision among the three methods in the estimation of all biomass components in both dry and fresh weight as indicated by the benchmarking statistics ( Figure 10). In comparison to the first two methods, the third method produced greater amounts of waste biomass across the three stand types, with the MEE values being −79.5, −21.8, and −93.8 t/ha in fresh weight and −45.3, −11.6, and −45.7 t/ha for T0, T1, and T2 stands. The corresponding MPEE values in fresh as well as dry weight fell between −68.6% and −26.3% across the three stand types (Figure 10).  T0  T1  T2   T0  T1  T2   T0  T1  T2   T0  T1  T2 Dry weight M1 M2 M3 Figure 10. Dot plots displaying five benchmarking statistics for indirect validation of residue bio mass estimates in fresh and dry weight (upper and lower halves) that were obtained through har vester data analytics for three types of stands: unthinned (T0), thinned once (T1), and thinned twice (T2). Total residue biomass is labelled as 'Residue' in the yellow strip at the right end of each graph M1, M2, and M3 in the legend represent the first, second, and third method of residue biomass calculation used in data analytics (see text). Due to space limitations, dot points showing < 0 were indicative only, and do not represent the actual values.

Discussion
Among the three methods of residue biomass estimation used in harvester data analytics, the first method provided the easiest and most straightforward link to the integration of harvester data with biomass equations. The DBHOB and total tree height of harvested stems are the only variables to be estimated from harvester data for the estimation of their residue biomass. This method could be easily implemented in a software tool, such as that developed by Woo et al. [38] for real-time monitoring and mapping of residue biomass during or soon after harvesting operations. However, this method cannot take into account the biomass in standing dead trees as they were usually fallen for safety reasons-if not knocked over by harvesting of surrounding trees-but not processed through the harvester head and thus were not contained in the harvester data. The biomass in standing dead trees could be a substantial part of the total residue biomass in unthinned plantations, particularly in areas with heavy self-thinning, drought induced mortality, or wind damage (e.g., [22,67,68]). If this biomass is left unaccounted for, the total residue biomass will be underestimated.
Unlike the first method, which relies on tree-level variables for residue biomass estimation, the second method was more involved as it required the stand-level attributes to be derived first from rasterised or gridded harvester data within a GIS system. The size of the grid cells would have to be determined first by considering (1) the stand density and its local-scale variations within plantations of different stand types that were present in the harvester data and (2) the extent of positional errors of the georeferenced stems. A grid of 50 m × 50 m was overlaid onto the harvester data in this study. Too small a grid cell size would result in the derived stand-level attributes varying outside of the applicable ranges of the stand-level biomass equations developed by Qiao et al. [23]. On the other hand, a very large cell size would smooth over local-scale variations in the estimated weight of residue biomass. Besides the grid size, whether to include partial grid cells along the boundary of a delineated forest area deserves careful consideration. In some other studies, grid cells more than half or fully within the delineated forest area were included in the analysis [44,46,55]. In this study, all grid cells, no matter full or partial, were included in the calculation of the stand-level attributes. The calculated density, basal area, and stand height were lower on average for the partial than for the full grid cells for all three stand types. These differences were likely due to the inclusion of 7 m buffer strips around the perimeter of the initially delineated plantation area (Figure 1). They also explained the slightly lower estimates of residue biomass than that obtained through the first method, as shown in Table 1. Therefore, buffer width presented a further factor to be considered when applying stand-level biomass equations to gridded harvester data. As all these issues in the gridding of the harvester data and the data analytics thereafter need to be resolved within a GIS system, this second method cannot be easily implemented in a software application for real-time stem-by-stem processing and analysis of the harvester data.
The third method was similar to but more refined than the first method and the refinement provided the most realistic and reliable estimates of waste biomass. The feature of this method lies in its more intensive use of harvester data in estimating the biomass of waste and offcut sections. Harvester recorded waste volumes were converted to biomass for individual trees using purposely derived conversion factors. For stems with a SEDTL greater than 8 cm, the specified minimum SEDOB for a pulp log, an offcut stem section above the top log that was not processed through the harvester head, was accounted for in the calculation of waste biomass. The volume of this unprocessed stem section was calculated using a taper function and converted to biomass using the same conversion factors. This approach of estimating waste biomass overcame certain limitations of the existing residue biomass equations of Wang et al. [29] and Qiao et al. [23] that were based on a finite number of sample trees. The samples included 59 trees from unthinned stands and 90 trees for each of the two thinned stand types. Although the sample size of 239 trees was relatively large in the parlance of the literature on biomass equations, the cutting patterns of this finite number of sample trees could not fully represent all actual cutting patterns of the large and almost infinite number of bucked stems captured by harvesters. Another limitation arose from the fact that the sample trees for developing biomass equations were usually selected to represent normal stems, while trees with malforms, defects, damages, or other anomalies were avoided. These trees would produce more and/or longer waste sections and greater waste volumes than similar sized trees with normal stems. Their stems were more likely not to be processed right through the harvester head up to the specified minimum SEDOB for commercial logs, as in the case of normal stems. Therefore, their recorded SEDTL would be far greater than the specified minimum SEDOB. The SEDTL distribution of stems contained in the harvester data had a much wider range than that of biomass sample trees, with a median of 13.5 cm, an upper quartile of 17.3 cm, and a 95th percentile of 28.7 cm. These distributional characteristics highlighted the necessity of including the unprocessed stem tops in the calculation of residue biomass for a substantial proportion of the harvested stems. Although the third method was more complicated than the first, it can still be implemented in a software application for real-time stem-by-stem processing and analysis of the harvester data. However, the estimated stump and waste biomass for T0 plantations would have to be adjusted to account for standing dead trees at stand level.
Due to the expanse of routine harvesting operations and the spatial scale of harvester data analytics, it was difficult to obtain in situ data for residue biomass to validate the calculated and spatially mapped residue biomass in the harvested plantations. To overcome this difficulty, this study tried indirect validation through ex-situ sample plots previously established in rotation-age P. radiata stands in the same study area. Although the benchmark statistics of indirect validation were informative and reassuring to a certain extent, they must be understood and interpreted with a clear understanding of the context and limitations of such indirect validation. Firstly, the sample plots and their matching virtual plots, albeit mostly similar in key stand attributes, might still differ somewhat in diameter and height distributions, in the stem from and branch architecture of individual trees, in site and stand conditions, and in the history of natural disturbances. Differences in these factors would also lead to the sample and the matching virtual plots having different quantities of residue biomass, thereby confounding, to a certain extent, the 'error of estimation' ε as calculated in Equation (4). Secondly, when residue biomass was estimated through the first two methods, i.e., using individual tree and stand-level biomass equations, an unstated implicit assumption was that the trees in the sample and the virtual plots had the same or similar cutting patterns. Consequently, the benchmarking statistics for the first two methods looked far better than those for the third method (Table 2, Figure 10). In the case of the second method, where stand-level biomass equations were used, the calculated residue biomass for a virtual plot was M V = f (N V , G V , H V , A V ). When this relationship is substituted into Equation (4), the adjusted difference between M S and M V becomes ε = M S − f (N S , G S , H S , A S ), i.e., the prediction error of the biomass equation. Therefore, the benchmark statistics would reflect the predictive performance of the biomass equations over the sample data. In comparison to the first two methods, the third method reflected the real cutting patterns of stems in the virtual plots and generated a greater amount of waste biomass than the sample plots. Although the most realistic, this method had the worst benchmarking statistics (Table 2, Figure 10). The two contrasting interpretations highlighted the limitations of such indirect validation through ex-situ sample plots. Future validation studies are being planned at larger operational scales in another plantation area where log production and residue biomass recovery are integrated into plantation harvesting.
The estimates of total residue biomass obtained using the three methods through harvester data analytics varied between 56.2 and 156.4 t/ha in green weight across the three stand types in this study. These estimates were comparable to the quantities of residue biomass assessed using conventional sample plots, transects, or small blocks following CTL harvesting of rotation age P. radiata plantations elsewhere in Australia [22,28,69]. These quantities largely fell within the range of 50-150 GMT (Green Metric Tonnes)/ha, with an average that was slightly higher than 100 GMT/ha, except for one case study in a southern Tasmania plantation where an exceptionally high level of 238.6 GMT/ha was left after CTL harvesting [12]. However, there were differences between our estimates and the previously reported quantities of residue biomass. Firstly, stumps were not assessed and included as harvest residues in those previous studies. Although not a major residue component, stump biomass would account for 8-11% of the total residue biomass in green weight, as shown in Table 1. Secondly, the harvest residues in the previous studies were not always assessed at the time of harvesting, but 3-4 months following harvesting (e.g., [28,69]). Over this time, harvest residues could lose between 10% and 25% of their moisture depending on the season of harvesting, according to the natural drying curves for P. radiata shown in Strandgard et al. [13]. If this loss of moisture content were accounted for, some of the previously reported green weight quantities would have been at least 10% greater.
The conventional approach of assessing harvest residues through sample plots, transects, or small study blocks has provided site-specific estimates of residue biomass. For example, Strandgard and Mitchell [22] set up a study block of approximately 3 ha in a 29-year-old P. radiata plantation in Western Australia and found 104 GMT/ha of harvest residue biomass-not including stumps-following harvesting through the conventional CTL method that used a harvester-processor to fell/process and a forwarder to extract. This plantation had been thinned once at age 15 years and had subsequently suffered sporadic windthrow over approximately 80% of its area. Walsh et al. [69] assessed a 100-tree plot of approximately 0.5 ha in a 34-year-old T2 plantation of P. radiata near Tumut in NSW. The stand was high-yielding with trees of good form and quality, for which the total residue biomass was found to be 151 GMT/ha following CTL harvesting-about 24.3% of the total aboveground stand biomass not including stumps. Although such site-specific assessments of harvest residue biomass were informative, they could not be readily extrapolated over the plantation landscape to examine how harvest residue biomass and its components would vary with different thinning regimes and site and stand conditions. This limitation of the conventional approach can be easily overcome through harvester data analytics as demonstrated in this study.
Furthermore, the estimated residue biomass from harvester data analytics can be easily mapped for all residue components individually to reveal their spatial variation at different spatial scales. Such maps will assist operational planning of biomass recovery, particularly when residue harvesting is run as a separate operation. They will also assist in understanding the spatial variation of available residue biomass and in refining preharvest predictions of residue biomass that are based on LiDAR and inventory metrics when planning for integrated log harvesting and residue biomass recovery. For both types of operations, these maps will provide the basis for estimating site-specific nutrient removal for the composition and rate of the planned residue biomass recovery. A good understanding of biomass and nutrient pools at harvest and the impact of any planned residue biomass recovery on these pools is essential to maintain site productivity and nutrient balance in site-specific management for the long-term sustainability of P. radiata plantations (see [70][71][72][73]).

Conclusions
Pinus radiata plantations are a major source of harvest residue biomass for renewable bioenergy production in Australia. The utilization of this resource requires reliable assessments and estimation of the quantity and quality of residue biomass available for collection in rotation-age plantations over large geographical areas across the southern half of the Australian continent. This study demonstrated a novel approach through harvester data analytics to estimate and spatially map harvest residue biomass in plantations following CTL harvesting. Three methods of integrating biomass equations with harvester data at the tree and stand level were employed and compared for the estimation of stump, waste, branch, and total residue biomass in harvested plantations of three stand types: unthinned (T0), thinned once (T1), and thinned twice (T2). The estimates of total residue biomass obtained using the three methods varied between 56.2 and 156.4 t/ha in green weight and between 24.4 and 63.0 t/ha in dry weight across the three stand types. Among the three methods of residue biomass estimation employed in the harvester data analytics, the third method made the most intensive use of harvester data, relied on the cutting patterns of stems in routine harvesting, and thus provided more realistic estimates of waste biomass. The spatial mapping of the estimated total and component residue biomass will assist operational planning of biomass recovery for bioenergy production and site-specific nutrient management to ensure the long-term sustainability of P. radiata plantations.