1. Introduction
Tropical forests have received increasing attention from scientists in the past couple of decades because of their significant contribution to the global carbon cycle. By sequestering and storing great quantities of carbon, forests act as natural ‘brakes’ on global climate change [
1,
2]. The Amazon rainforest is notable as the largest continuous area of tropical forest, covering approximately 400 million hectares. Given its size, the volumes of carbon dioxide that it can emit and sequester are significant: it stores one-fifth of the total carbon in global terrestrial vegetation and is the largest carbon reservoir in the form of biomass [
3,
4].
Forest stand development and mortality are subject to natural and anthropogenic disturbances that alter carbon fluxes over time [
5,
6]. Consequently, economic incentives such as REDD+ exist to shift these fluxes in favour of sequestration in forests, and they depend on reliable monitoring, reporting, and verification (MRV) protocols [
7]. Owing to the potential of tropical forests for sequestration, especially in comparison to other terrestrial ecosystems, an accurate estimate of the forest structure and biomass is necessary to better understand the global carbon cycle [
8,
9]. However, monitoring in tropical regions is a resource-intensive challenge resulting in infrequent and limited field surveys [
10]. Thus, there is a need for reliable LiDAR-based AGB models, an area that is still developing.
Selective logging has been an important activity affecting land use [
11] and modifying carbon fluxes within the Amazon, and can degrade the forest environment if logging exceeds the sustainable forest yield [
12]. Consequently, reduced impact logging (RIL) techniques are being introduced to permit sustainable resource use of the Amazonian forest. RIL involves intensive planning and monitoring techniques, such as mapping and tree inventories, to minimize negative environmental impacts. It has been shown that well-planned logging can allow close to full recovery of carbon stocks [
13,
14].
Forest management, REDD+ MRV, and carbon cycle modelling all rely upon accurate estimates of forest aboveground biomass (AGB) stocks and their changes over time [
15]. Previous studies have aimed at improving the accuracy of AGB estimation and forest inventory from LiDAR in regional and national level MRV systems such as those for REDD+ programs [
16]. Such studies seek to adhere to the common interpretation of the IPCC guidelines stating that the uncertainty of the AGB should not be greater than 20% of the mean [
17]. Airborne LiDAR data collection has become recognised in several forest ecosystems as the most reliable technique for estimating AGB [
17,
18,
19]. LiDAR can be used to observe and facilitate the study of biomass carbon change at multiple scales [
20], and to observe the impact of activities such as selective logging [
15,
21]. Research on modelling methods for identifying these low-intensity logging practices using LiDAR is still in its infancy, though it is gaining momentum [
22].
AGB can be estimated from LiDAR-derived attributes using a variety of statistical modeling approaches, ranging from linear regression techniques to state-of-the-art non-parametric methods such as random forest (RF), k-nearest neighbour (k-NN), and support vector machine (SVM), each with its own underlying assumptions and complexity [
16,
23,
24,
25]. Additionally, a recent study by Shao et al. [
26] on temperate hardwood forests highlighted the applicability of multiplicative nonlinear regression models for estimating AGB. In this case, the authors leveraged information on soil-based site productivity classes along with LiDAR-derived metrics to build an optimized model that could account for variations in site productivity; including an index of site productivity improved their model’s ability to explain the overall variability by 14%.
In recent years, a number of studies have compared the accuracy and precision of multiple machine learning approaches for estimating biomass. For instance, Domingo et al. [
27] compared a multiple linear regression (MLR) model with four non-parametric models, namely SVM, RF, locally weighted linear regression (LWLR), and a linear model with a minimum length principle (MDL), to estimate total biomass (tree and shrub biomass fractions) in Pinus halepensis Miller forest stands using low-density LiDAR and field data. MLR was found to outperform the nonparametric methods in terms of RMSE (15.14 tons/ha) and bias (0.01), though no statistically significant differences existed between the methods considered. Similarly, Domingo et al. [
28] compared the performance of nine regression models in quantifying biomass losses and CO₂ emissions due to combustion in an Aleppo pine forest using LiDAR data. Here too, the best model for pre-fire AGB estimation was found to be MLR, and no statistically significant differences were observed among the high-performing models. Latifi et al. [
29], on the other hand, made use of a wide range of forest variables extracted from multiple remotely sensed data sources, such as orthorectified colour infrared (CIR) images, medium-resolution Thematic Mapper (TM) imagery, and high-density normalized LiDAR point clouds, for estimating the total volume and biomass in a mixed temperate forest landscape. When comparing plot-level nonparametric predictions based on three distance measures (Euclidean, Mahalanobis, and Most Similar Neighbour) and on RF across multiple remotely sensed datasets, the authors showed the superior predictive capability of the combination of LiDAR-based metrics and RF. Evolutionary genetic algorithms were also tested to prune the original high-dimensional dataset and improve the performance of the modeling techniques; however, intercorrelation-related issues proved to be a major hurdle, causing unstable results across multiple runs. Meanwhile, Gagliasso et al. [
30], examining the predictive performance of linear regression, geographically weighted regression (GWR), gradient nearest neighbor (GNN), most similar neighbor (MSN), random forest imputation, and k-nearest neighbor (k-NN), observed that k-NN (k = 5) had the lowest RMSE and the least bias when predicting biomass across 19,000 acres of the Malheur National Forest. Notwithstanding the ever-increasing interest in modeling paradigms, comparative modeling studies for AGB change prediction in selectively logged tropical forests remain scarce.
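The studies above rank models by RMSE and bias. As a minimal sketch of how these two measures are commonly computed (assuming bias is defined as the mean of predicted minus observed values; the plot values below are hypothetical and are not taken from any cited study):

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def bias(observed, predicted):
    """Mean difference (predicted minus observed); assumed definition."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(predicted - observed))


# Hypothetical plot-level biomass values (Mg/ha), for illustration only.
observed_agb = [200.0, 250.0, 300.0, 180.0]
predicted_agb = [210.0, 240.0, 310.0, 170.0]

model_rmse = rmse(observed_agb, predicted_agb)
model_bias = bias(observed_agb, predicted_agb)
# RMSE expressed as a percentage of the observed mean.
relative_rmse = 100.0 * model_rmse / float(np.mean(observed_agb))
```

In this illustration the errors alternate in sign, so the bias is zero even though the RMSE is not, which is why the two measures are usually reported together.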
Even though airborne LiDAR can facilitate spatially explicit and timely estimates of tropical forest structure, trade-offs still exist among modeling techniques for estimating AGB stocks and AGB change. For instance, it is unclear how much the models can be simplified while still maintaining an adequate level of accuracy for estimating AGB stocks and, through the differences between estimates, AGB change in tropical forests. Thus, in this study, we aimed to estimate AGB stocks and report their changes at the plot and landscape levels using multi-temporal LiDAR data for a selectively logged tropical forest in Amazonia, Brazil. Specifically, we compared nine machine learning approaches with traditional linear regression, with the following objectives:
- (i) Evaluate the performance of ordinary least squares (OLS) regression modelling and nine machine learning algorithms: random forest (RF), several variations of k-nearest neighbour (k-NN), support vector machine (SVM), and artificial neural networks (ANN).
- (ii) Estimate AGB stocks and report AGB change at the landscape level using the best model from the previous step and multi-temporal LiDAR datasets.
4. Discussion
Tracking change in AGB is vital for monitoring, reporting, and verification (MRV) protocols in support of REDD+. For accurate and satisfactory estimations, proper modelling techniques and data acquisition procedures are necessary. In our study, we developed maps of AGB stocks using multi-temporal LiDAR data and advanced modelling techniques that show the variation of AGB stocks over the years for logged forests in eastern Amazonia. Because the changes occurring in them are subtle and short-term, logged forests are among the hardest forests in which to detect change. Results from our study highlight the robustness of our framework, the potential of multi-temporal LiDAR, and the importance of appropriate modelling techniques in support of climate change mitigation initiatives. By comparing AGB change between logged and intact forests, we gained insight into tropical forest resilience to disturbance. Specifically, our findings indicate that tropical forests have great potential for AGB recovery even after disturbances such as selective logging.
To predict biophysically important forest attributes such as basal area, mean stem diameter, and AGB, LiDAR measurements derived from point clouds can be used in empirical models [
57,
58,
59,
60]. In our study, the metrics selected to compose the models are corroborated by previous studies [
59,
61]. For instance, numerous AGB estimation studies [
62,
63,
64] have indicated the metric ‘mean canopy height’ to be one of the most significant attributes, and this is reflected in our PCA results. Likewise, metrics such as the standard deviation and coefficient of variation of height were found to provide information on the vertical complexity and heterogeneity of canopy components [
65]. In addition, our results support the findings of some previous studies that assessed the capacity of LiDAR data point-based metrics to describe forest biophysical parameters, using results obtained from point density [
61] and the Canopy Cover metric [
66,
67]. Nonetheless, it might be possible to estimate AGB with the help of several other ALS-derived metrics [
68,
69] as well as with more simplified model structures, while attaining levels of accuracy similar to those reported in our study. This would be an interesting area to explore in the near future.
In studies aiming to estimate AGB stock and AGB change, the selection of the appropriate modelling approach is one of the most critical steps [
59]. We found, through the use of LOOCV, that OLS performed better than non-parametric approaches; a finding which has been reported in other studies comparing modelling methods in predicting various forest attributes [
70,
71,
72]. By comparing the performance of OLS with other methods, we further evaluated how much AGB estimates vary and what trade-offs may be associated with different methods when working with logged forests. Additionally, we demonstrate that methods such as RF and SVM, which performed close to OLS, can be used for estimation and inference when necessary, that is, in situations where non-linear or diverse relationships exist between dependent and independent variables [
73]. The performance of RF and SVM, the best among the non-parametric approaches, may have been affected not only by the number of field plots but also by other factors, such as the bootstrapping of data to avoid overfitting R² values [74].
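To make the leave-one-out cross-validation (LOOCV) comparison concrete, the following is a minimal sketch of LOOCV applied to an OLS model. It is not the authors' implementation: the data are synthetic, and the predictor name `mean_height`, the coefficients, and the noise level are assumptions chosen only for illustration.

```python
import numpy as np


def loocv_ols_predictions(X, y):
    """Predict each plot from an OLS model fitted on all the other plots."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # add an intercept column
    predictions = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # leave plot i out of the fit
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        predictions[i] = design[i] @ coef
    return predictions


def r_squared(observed, predicted):
    """Coefficient of determination computed from held-out predictions."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic example: 30 hypothetical plots with one LiDAR height metric.
rng = np.random.default_rng(0)
mean_height = rng.uniform(10.0, 35.0, size=30)  # hypothetical metric (m)
agb = 8.0 * mean_height + 20.0 + rng.normal(0.0, 15.0, size=30)  # Mg/ha

loocv_pred = loocv_ols_predictions(mean_height.reshape(-1, 1), agb)
loocv_rmse = float(np.sqrt(np.mean((agb - loocv_pred) ** 2)))
loocv_r2 = r_squared(agb, loocv_pred)
```

Because every prediction comes from a model that never saw the plot being predicted, the resulting RMSE and R² are less optimistic than in-sample values, which matters when the number of field plots is small. The same loop can wrap any other learner in place of the least-squares fit.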
In the case of k-NN-based methods, we noticed comparatively less satisfactory results, even after feature scaling. Hudak et al. [
46] compared different k-NN imputations to simultaneously impute the basal area and plot density per species from topographic variables and LiDAR-derived canopy structure. They concluded that k-NN was inferior to RF, which is consistent with our results. This can be tied to the fact that the algorithm stores the training data and uses it directly for prediction rather than learning a model from it, and is very sensitive to noisy data, missing values, outliers, and dimensionality; an additional difficulty lies in determining the value of the parameter k on a case-by-case basis. We also found the computational cost to be quite high, as the distance from each instance under consideration to all the training samples had to be calculated. In general, apart from the dilemma of attribute selection (which might have contributed to the poor performance in our case, as we selected the same metrics for all the models built), another critical issue when employing k-NN is the uncertainty in choosing the appropriate distance measure. In our study, we included six different types of distances and were able to compare their performances. Given the low performance and minimal variation, in terms of RMSE and MD, among the different distance-based learning methods, further research is encouraged before considering k-NN-based techniques for similar studies. As previous studies have recommended, k-NN-based approaches can be more reliable when a design-based forest inventory framework with non-parametric estimators is involved, because this method accounts for dependence and heteroscedasticity in the data [
73,
75].
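The computational cost and the role of the distance measure can be seen in a minimal k-NN regression sketch. This is an illustration under stated assumptions, not the study's code: only two distance measures (Euclidean and Manhattan) are shown, they are not claimed to be among the six used in the study, and the toy data are hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=5, metric="euclidean"):
    """Predict by averaging the responses of the k nearest training plots."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise differences: every query plot against every training plot.
    diff = X_query[:, None, :] - X_train[None, :, :]
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=2))
    elif metric == "manhattan":
        dist = np.abs(diff).sum(axis=2)
    else:
        raise ValueError(f"unsupported metric: {metric}")
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest plots
    return y_train[nearest].mean(axis=1)


# Hypothetical toy data: one predictor and four training plots.
X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [10.0, 20.0, 30.0, 100.0]

pred_euclidean = knn_predict(X_train, y_train, [[1.0]], k=3)
pred_manhattan = knn_predict(X_train, y_train, [[1.0]], k=3, metric="manhattan")
```

The distance matrix has one entry per query and training plot pair, so the cost grows with the product of the two sample sizes and with the number of metrics. In practice the predictors would be scaled first, since unscaled metrics with large ranges dominate either distance.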
Asner et al. [
76] compared the accuracies of non-parametric AGB models integrating LiDAR and optical data in a forest in northwestern China and found that among non-parametric approaches, RF performed best, followed by Back Propagation Neural Networks and SVR. The results of Asner et al. [
76] show improvements in AGB estimates by integrating LiDAR and optical data and present a pattern similar to that of this study. Görgens et al. [
77] conducted a study with very similar results to our own, in which the authors found superior RF performance as compared with other machine learning approaches such as ANN and SVR. On the other hand, there have been studies conducted based on non-linear regression models as well. For instance, Shao et al. [
26] estimated AGB using a multiplicative non-linear regression model that took into consideration both LiDAR-derived metrics and soil-based site productivity class data. In doing so, the authors were able to address a few critical issues associated with mixed forests, such as the often overlooked differences in height-diameter relationships among sites and species, resulting from varied site productivities, and the similarity of vertical height profiles among stands of differing tree volume or density, arising from the deliquescent growth form of hardwood trees. These concerns are ubiquitous and extremely challenging in tropical forests. The authors reported the relationship between AGB and LiDAR-based metrics to be nonlinear on low-productivity sites and predominantly linear on high-productivity sites.
Comparing our results with those of other studies, we found that our R² values for AGB estimates fall within the range reported for tropical forest areas [
18,
76]. Asner et al. [
76], in four tropical regions located in Madagascar, Peru, Panama, and Hawaii, reported R² varying between 0.68 and 0.85. A study in selectively logged tropical forest by d’Oliveira et al. [
78] found values of R² ranging from 0.63 to 0.72 for linear regression models, as the authors expected, owing to their restriction to a single allometric AGB equation based exclusively on diameter for all species, akin to our study. Englhart et al. [
16] emphasized multi-temporal LiDAR’s power in accurately quantifying tree height change and associated AGB, necessary for REDD+, even for very small areas/plots.
Regarding the mean AGB densities in Mg/ha, the values we found agree with findings from other studies conducted in tropical forests. Authors such as d’Oliveira et al. [
78] used airborne LiDAR data to estimate AGB and to identify regions impacted by selective logging across tropical forests in the western Brazilian Amazon, and the mean AGB they found was 231.6 Mg/ha. Andersen et al. [
25] estimated the AGB for two years, 2010 and 2011, and obtained mean values of 232.1 and 223.0 Mg/ha, respectively, and an AGB change of −9.1 Mg/ha for the period evaluated. The mean change in AGB stocks observed in Andersen et al. [
25] is similar to the values found in this study. It should be noted that the higher values estimated in this study reflect the longer analysis intervals. Furthermore, in our results, the largest decrease in AGB stocks (−30.24 Mg/ha, or about −10.0 Mg/ha per year) occurred between 2014 and 2017 in a logged area; however, the analysis of the entire period (2012 to 2017) shows a gain of approximately 8 Mg/ha. In other words, over the entire evaluated period, biomass stocks increased rather than decreased as first observed, perhaps reflecting the balance between increased growth and increased mortality in the explored locations [
35]. For this reason, we believe that studies in this scope need longer assessment times as they may otherwise result in hasty conclusions. Rangel Pinagé et al. [
35] cited a series of studies on mortality after logging and also commented on the need for investigations with longer time spans and different logging intensities to determine the persistence of logging impacts on the canopy. In addition, we estimated slightly higher gains in forests logged before the first LiDAR acquisition (2012) than in intact forests (mean biomass gains of 20.0 Mg/ha in logged areas and 7.0 Mg/ha in unlogged forests, both for the period 2012–2014). Moreover, the logged and unlogged areas do not differ widely in terms of values and appear to be gaining biomass at similar rates.
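The change figures quoted above follow from simple differences between acquisition dates. As a small worked sketch using only values stated in the text (the function names are illustrative):

```python
def agb_change(agb_start, agb_end):
    """AGB change between two dates (Mg/ha); negative values indicate loss."""
    return agb_end - agb_start


def annualized_change(total_change, start_year, end_year):
    """Average AGB change per year over the interval (Mg/ha per year)."""
    return total_change / (end_year - start_year)


# Andersen et al.: mean AGB of 232.1 Mg/ha (2010) and 223.0 Mg/ha (2011).
andersen_change = agb_change(232.1, 223.0)  # approximately -9.1 Mg/ha

# Largest decrease in this study: -30.24 Mg/ha between 2014 and 2017.
logged_rate = annualized_change(-30.24, 2014, 2017)  # about -10.1 Mg/ha per year
```

Dividing by the interval length puts changes measured over different periods on a common per-year basis, which is needed before comparing a one-year change with a three-year one.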
Based on our results, there are four areas where future studies could focus: scaling approaches; quantifying the impacts of other phases of logging on AGB; exploring other concomitant factors that affect carbon release; and the influence of site productivity variations and the presence of multiple tree species on AGB change. For instance, scaling up could be done through the use of full-waveform LiDAR, which can cover larger areas. Recent studies have shown that the results from discrete return and full-waveform LiDAR are of comparable accuracy [
79]. Since LiDAR surveys are expensive, another option is to scale up regional estimations of AGB and AGB change with satellite imagery [
16,
80]. By merging data sources, it is also possible to compare classification and regression approaches, for example using modelling techniques such as random forest, first to identify and isolate logged areas and then to compare their AGB estimation capabilities. It should be noted that the primary changes in AGB in this study were caused by felling trees; however, future studies should also focus on quantifying the impacts on AGB caused by other phases of selective logging, such as the construction of roads and log landings. Additionally, it would be worthwhile to explore how other concomitant factors, such as increased forest fires due to logging, damage from logging machinery, and forest degradation, affect carbon. Lastly, we recommend more research on the influence of tree species composition and forest type on AGB change, as previous studies [
16] have reported that logged forests experience higher growth rates and accumulate more AGB than unaffected primary forests. On a similar theme, extending data collection, whenever possible, to include direct or indirect measures of site productivity could substantially improve the predictive capability of AGB models, as issues associated with LiDAR height-based metrics can be kept minimal, as discussed in Shao et al. [
26].