The prediction of forest biomass at the landscape scale can be achieved by integrating field plot data with satellite imagery, in particular data from the Landsat archive, using k-nearest neighbour (kNN) imputation models. While studies have demonstrated different kNN imputation approaches for estimating forest biomass from remote sensing data and forest inventory plots, there is no general agreement on which approach is most appropriate for biomass estimation across large areas. In this study, we compared several imputation approaches for estimating forest biomass using Landsat time-series and inventory plot data. We evaluated 18 kNN models to impute three aboveground biomass (AGB) variables (total AGB, AGB of live trees and AGB of dead trees). These models were developed using different distance techniques (Random Forest or RF, Gradient Nearest Neighbour or GNN, and Most Similar Neighbour or MSN) and different combinations of response variables (model scenarios). Direct biomass imputation models were trained on the biomass variables themselves, whereas indirect biomass imputation models were trained on combinations of forest structure variables (e.g., basal area, stem density and stem volume of live and dead-standing trees). We also assessed the ability of our imputation method to spatially predict biomass variables across large areas in relation to forest disturbance history over a 30-year period (1987–2016). Our results show that RF consistently outperformed the MSN and GNN distance techniques across different model scenarios and biomass variables. The lowest error rates were achieved by RF-based models, with generalized root mean squared difference (gRMSD, the RMSE divided by the standard deviation of the observed values) ranging from 0.74 to 1.24, whereas gRMSD values for MSN-based and GNN-based models ranged from 0.92 to 1.36 and from 1.04 to 1.42, respectively.
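To make the gRMSD values easier to interpret, the metric as defined here (RMSE divided by the standard deviation of the observed values) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; it assumes the population standard deviation, and the function name `grmsd` is hypothetical. Under that assumption, predicting the observed mean for every plot gives a gRMSD of exactly 1, so values below 1 indicate an improvement over a mean-only prediction.

```python
import math
from statistics import pstdev


def grmsd(observed, predicted):
    """Hypothetical helper: RMSE of predictions divided by the SD of observations.

    Assumes the population standard deviation (pstdev); other definitions
    may use the sample standard deviation instead.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse) / pstdev(observed)


# Illustrative AGB values (not data from the study).
obs = [10.0, 20.0, 30.0, 40.0]
print(grmsd(obs, obs))                       # perfect prediction
print(grmsd(obs, [25.0, 25.0, 25.0, 25.0]))  # predicting the mean everywhere
```

With these toy values, the first call returns 0.0 and the second returns 1.0 (up to floating-point rounding).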
The indirect imputation method generally yielded better biomass predictions than the direct imputation method. In particular, the kNN model trained on the combination of basal area and stem density variables was the most robust for estimating forest biomass, with gRMSD values of 0.89, 0.95 and 1.08 for total AGB, AGB of live trees and AGB of dead trees, respectively. In addition, spatial predictions of biomass showed trends that were relatively consistent with disturbance severity and time since disturbance across the time series. As kNN imputation is increasingly used by land managers and researchers to map forest biomass, this work can help users of these methods optimize their modelling and mapping practices.
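The idea behind indirect imputation with an RF distance can be illustrated with a small sketch. One common formulation defines the RF distance between two observations as one minus their proximity, i.e. the share of trees in which both fall into the same terminal node. In the sketch below, the forest is assumed to have been trained on structure variables only (e.g., basal area and stem density), so AGB is never a training response; it is simply carried over from the nearest reference plots. The leaf indices are hypothetical precomputed inputs, and the function names `rf_distance` and `impute` are illustrative, not part of any specific package.

```python
from statistics import mean


def rf_distance(leaves_a, leaves_b):
    """1 - proximity: proximity is the share of trees where both land in the same leaf."""
    shared = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return 1.0 - shared / len(leaves_a)


def impute(target_leaves, reference_leaves, reference_agb, k=1):
    """Impute AGB for a target pixel as the mean AGB of its k nearest reference plots.

    Ties in distance keep the original reference order (sorted() is stable).
    """
    ranked = sorted(
        range(len(reference_leaves)),
        key=lambda i: rf_distance(target_leaves, reference_leaves[i]),
    )
    return mean(reference_agb[i] for i in ranked[:k])


# Hypothetical terminal-node indices from a 4-tree forest trained on structure
# variables; AGB values are illustrative only.
reference_leaves = [[3, 7, 1, 4], [3, 7, 2, 5], [9, 8, 2, 5]]
reference_agb = [120.0, 95.0, 40.0]
target = [3, 7, 1, 4]

print(impute(target, reference_leaves, reference_agb, k=1))  # 120.0
print(impute(target, reference_leaves, reference_agb, k=2))  # 107.5
```

Swapping the distance function is what distinguishes RF from MSN or GNN imputation; the neighbour search and the carry-over of AGB from reference plots stay the same.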
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.