The Effect of Synergistic Approaches of Features and Ensemble Learning Algorithms on Aboveground Biomass Estimation of Natural Secondary Forests Based on ALS and Landsat 8

Although the combination of Airborne Laser Scanning (ALS) data, optical imagery, and machine learning algorithms has been shown to improve the estimation of aboveground biomass (AGB), synergistic approaches combining different data sources and ensemble learning algorithms have not been fully investigated, especially for natural secondary forests (NSFs) with complex structures. This study aimed to explore the effects of these two factors on AGB estimation of NSFs based on ALS data and Landsat 8 imagery. The synergistic method of extracting novel features (i.e., COLI1 and COLI2) from the optimal Landsat 8 features and the best-performing ALS feature (i.e., elevation mean) yielded higher AGB estimation accuracy than either optical-only or ALS-only features. However, neither of them improved the accuracy compared to the simple combination of the untransformed features that generated them. The convolutional neural network (CNN) model was markedly superior to the other classic machine learning algorithms regardless of the features used. The stacked generalization (SG) algorithms, a kind of ensemble learning, greatly improved the accuracies compared to their corresponding base models, and the SG with the CNN meta-model performed best. This study provides technical support for wall-to-wall AGB mapping of the NSFs of northeastern China using efficient features and algorithms.


Introduction
The Asian temperate mixed forest in northeastern China is one of the three major temperate mixed forests in the world (i.e., northeastern North America, Europe, and East Asia) [1] and is of great strategic importance to the carbon trading of China. The forests of Northeast China experienced three periods of excessive timber harvesting in the last century: the period of Russian and Japanese aggression, the period of encouraging excessive harvesting for timber production, and the period of national economic reforms and the broadening of international relations [2]. The excessive logging and neglected cultivation of forests nearly exhausted the exploitable forest reserves in the region [3]. Since the Natural Forest Conservation Program (NFCP) was put into practice in 1998, there has been a profound shift in focus from timber production to environmental protection by rehabilitating damaged forest ecosystems, afforesting desertified and degraded areas, and banning logging in natural forests [2]. In this context, natural secondary forests (NSFs) are gradually expanding and gaining importance. NSFs, which account for as much as 70% of the forests of northeastern China, are forests that regenerated naturally after stand-replacing disturbances of primary forests by anthropogenic activities or extreme natural events [4,5].
Parametric models, such as linear regression, have been the most widely used methods in previous AGB studies due to their simplicity and interpretability (e.g., [53,64,65]). Other parametric models, like non-linear regression (e.g., an exponential, power, or polynomial fitting function), have also been applied for AGB estimation (e.g., [59,66,67]). Unlike parametric models, non-parametric models are distribution-free methods in which the predictor does not take a predetermined form but is constructed according to information derived from the data.
Most machine learning models are non-parametric, such as artificial neural networks (ANN), random forest (RF), k-nearest neighbor (KNN), support vector machine (SVM), cubist (CB), classification and regression tree (CART), and convolutional neural networks (CNN). Without distributional assumptions, non-parametric machine learning models are extremely flexible, capable of capturing the complex relationships between remote sensing variables and AGB, and widely applied in AGB estimation (e.g., [43,68-73]).
Ensemble learning, a branch of machine learning, is designed to learn tasks by constructing and then integrating multiple learners to produce a strong learner with improved accuracy [74,75]. There are three basic categories of ensemble learning: bagging, boosting, and stacking. RF and adaptive boosting (AdaBoost) algorithms are classic representatives of bagging and boosting, respectively. RF builds trees using subsamples and a random subset of predictors and can be very effective for estimating AGB due to its robustness to overfitting and to noise in the training dataset [43,76,77]. Adaptive boosting is an iterative boosting algorithm that adaptively changes the distribution of the training set based on the performance of previous learners. Another boosting algorithm, extreme gradient boosting (XGBoost), has been demonstrated to show great advantages in decreasing the overestimation of low AGB values and the underestimation of high AGB values in a forest type-based biomass estimation using continuous forest inventory data and Landsat 8 imagery [54]. Stacking, first proposed by Wolpert [78], is another method for combining multiple models but is less used than bagging and boosting. Unlike the RF algorithm, in which the base learners are homogeneous (e.g., regression trees), stacking is a heterogeneous ensemble algorithm that can integrate diverse base learners to generate a stronger learner. The stacking algorithm has been used to estimate canopy height in forestry (e.g., [79]); however, its potential has not been fully explored in AGB estimation.
Although the synergistic utilization of ALS and optical passive imagery has been shown to improve AGB estimation [48], the synergistic approach (i.e., the features) has not been fully investigated, especially for NSFs with complex structures. For example, is it more efficient to apply a novel feature extracted from passive imagery and LiDAR data (e.g., COLI1 and COLI2 in [48]) or to directly combine all the features from the two data sources (as in [61])? In addition, will ensemble learning algorithms improve the accuracy of AGB estimation for NSFs? Inspired by these questions, this study aimed at exploring the effects of different synergistic approaches of features and ensemble learning algorithms on AGB estimation of the NSFs of northeastern China based on ALS and Landsat 8 OLI (Operational Land Imager) imagery. Specifically, the objectives of this study were (1) to investigate the effects of different data sources and classic machine learning algorithms on AGB estimation of a natural secondary forest; (2) to identify a highly effective approach for combining ALS and Landsat 8 OLI imagery for AGB estimation of a natural secondary forest; (3) to explore the performance of ensemble learning algorithms in estimating the AGB of a natural secondary forest; and (4) to generate an accurate wall-to-wall AGB map of a natural secondary forest for future forest resources management.

Study Area
The study area is located in the Maoershan Experimental Forest Farm of Northeast Forestry University (NEFU), Shangzhi, Heilongjiang Province, China, ranging from 127°29′ to 127°44′ E and 45°14′ to 45°29′ N (Figure 1). The landform of the forest farm is a low mountain and hilly area. The terrain gradually rises from south to north, with an average elevation of 300 m. The highest mountain is Maoer Mountain, with an elevation of 805 m. The total area of the forest farm is 26,496 ha, which is typical natural secondary forest of northeastern China. The vegetation in the Maoershan area is a part of the Changbai plant flora, with the original zonal climax community of Korean pine broad-leaved forest. Due to the destruction in the last century, the original vegetation has undergone reverse succession. A forest landscape has formed in which natural secondary forests, dominated by precious broad-leaved forests, poplar and birch forests, oak forests, and so on, are interspersed with plantations such as red pine and larch. The main species include Betula platyphylla, Quercus mongolica, Populus davidiana, Larix olgensis, Pinus sylvestris, and Pinus koraiensis. The average forest coverage rate is 95%, and the total growing stock is approximately 3.5 million m³.

Remotely Sensed Data
The remotely sensed data utilized in this study include ALS data and Landsat 8 OLI imagery. The ALS data were obtained in September 2015 with a LiDAR sensor (Riegl LMS-Q680i) carried by the LiCHY system of the Chinese Academy of Forestry. The maximum laser pulse frequency of the LiDAR sensor is 400 kHz, with a wavelength of 1550 nm, a scanning angle of ±30°, a sampling interval of 1 ns, and a vertical accuracy of 0.15 m. The sidelap of the flight strips was designed to be greater than 60%, with an average point cloud density of 3.6 points·m⁻².
To be temporally consistent with the ALS data, the Landsat 8 OLI image acquired on 13 September 2015 was applied in this study (downloaded from https://earthexplorer.usgs.gov/ (accessed on 1 September 2021)). The scene ID is LC81170282015256LGN01 (L1T-level product), with cloudiness of 1.35%, a sun elevation angle of 45.28°, and a sun azimuth angle of 154.91°. Seven multispectral bands (bands 1-7) with a 30 m nominal spatial resolution were utilized in this study. The radiometric resolution of the imagery is 12 bits, and each scene covers 185 km × 185 km.

Reference Data
Data from 195 fixed plots of the continuous forest resources inventory obtained in 2016 were used as reference data in this study (see Figure 1b). The plot size was 20 m × 30 m, and the center of each plot was located using GPS (accuracy ±5 m). The diameter at breast height (DBH) of trees larger than 5 cm and the tree species were recorded for each plot.
The AGB of individual trees was calculated using species-specific allometric growth equations based on DBH. In this study, the allometric growth models developed in [80,81] for the major tree and understory species of northeastern China were employed to calculate the AGB of individual trees. The allometric growth equation is shown as Equation (1), and the parameters for the major tree and understory species are listed in Table 1.

W = a·D^b, (1)

where W represents aboveground biomass (kg), D represents DBH (cm), and a and b are the estimated parameters of different species in [80,81]. The AGB of a plot was the cumulative sum of the AGB of the individual trees in that plot.
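As a minimal sketch, the per-tree calculation of Equation (1) and the plot-level summation can be expressed as follows; the (a, b) values below are illustrative placeholders, not the fitted parameters of [80,81] listed in Table 1.

```python
# Plot-level AGB from per-tree DBH using the allometric form W = a * D**b
# (Equation (1)). The parameter values are hypothetical placeholders.
PARAMS = {
    "Betula platyphylla": (0.12, 2.40),  # hypothetical (a, b)
    "Quercus mongolica": (0.15, 2.35),   # hypothetical (a, b)
}

def tree_agb(species: str, dbh_cm: float) -> float:
    """Aboveground biomass (kg) of a single tree from its DBH (cm)."""
    a, b = PARAMS[species]
    return a * dbh_cm ** b

def plot_agb(trees) -> float:
    """Cumulative AGB (kg) of a plot; `trees` is a list of (species, DBH) pairs.
    Only trees with DBH >= 5 cm were recorded in the inventory."""
    return sum(tree_agb(sp, d) for sp, d in trees if d >= 5.0)
```
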

Methods
To investigate the effects of different synergistic approaches of features and ensemble learning algorithms on AGB estimation of NSFs, a five-step methodology with three feature experiments (Feature experiments I-III) was implemented in this study: (1) data preprocessing; (2) feature extraction and selection; (3) establishment and evaluation of classic machine learning models; (4) establishment and evaluation of ensemble learning models; and (5) wall-to-wall AGB prediction using the most effective algorithm and features. Feature experiment I was designed to explore the effects of features from different data sources (ALS, optical imagery, and combined data) on AGB estimation based on a variety of machine learning algorithms. Feature experiment II was designed to investigate how to efficiently combine the best-performing ALS feature (a single feature) with several spectral features for AGB estimation: is it better to use novel extracted features or to directly combine all the features? Feature experiment III compared the performance of combining all features for AGB estimation. The feature experiment design and the logic of this study are shown in Table 2 and Figure 2, respectively.

Preprocessing of Remotely Sensed Data
The preprocessing of the ALS data included three steps. (1) Noise elimination (air points, low points, and isolated points): the radius of the fitting plane and the multiple of the standard deviation were set to 0.5 m and 1, respectively. The algorithm automatically calculates the standard deviation of the fitting plane around each point; if the distance from a point to that plane is less than the specified multiple of the standard deviation, the point is kept.
(2) Classification of ground and non-ground points: the ground points were classified with the improved progressive triangulated irregular network densification (IPTD) filtering algorithm developed in [82]. The maximum building size and maximum terrain angle were set to 20 m and 88°, respectively. (3) Normalization of the point clouds: a digital terrain model (DTM) with a resolution of 0.5 m was generated from the ground points using inverse distance weighted (IDW) interpolation. The power of the distance between sampling points and an unknown point was set to 2, and the minimum number of points used for interpolation was 12. The point clouds were then normalized by subtracting the DTM value from the elevation of all points. The preprocessing of the ALS data was implemented in LiDAR 360 V3.2 (GreenValley International).
Preprocessing of the Landsat 8 OLI imagery, including radiometric calibration, atmospheric correction, and topographic correction, was implemented using ENVI 5.3 software. The Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) radiative transfer model was applied for atmospheric correction and conversion to surface reflectance in the ENVI environment. The topographic correction was conducted with the well-known Sun Canopy Sensor + C correction (SCS + C) approach using the extension tool "Topographic Correction_V5.3_4_S1". The SCS + C approach reduces overcorrection and is an effective topographic correction method in forested and mountainous terrain [83,84]. The SCS + C topographic correction model can be expressed by Equation (2).
L_t = L·(cos α·cos θ + C)/(cos i + C), (2)

where L_t is the corrected pixel radiance value of the image; L is the uncorrected pixel radiance value; i is the local incidence angle (between the solar beam and the surface normal of the slope); θ is the solar zenith angle; α is the slope angle; and C is the semi-empirical parameter. The DTM generated from the ALS data was applied for the topographic correction in this study.
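Under the common practice of estimating C per band from the linear regression of radiance against cos i (C = b/m), the correction of Equation (2) can be sketched as follows; the C-estimation step is an assumption here, since the paper does not detail how the extension tool derives it.

```python
import numpy as np

# SCS+C topographic correction (Equation (2)):
#   L_t = L * (cos(alpha) * cos(theta) + C) / (cos(i) + C)
# C is estimated empirically per band from the fit L = m*cos(i) + b as C = b/m.
def scs_c_correct(L, cos_i, slope_deg, sun_zenith_deg):
    """Apply SCS+C correction to one band.

    L: uncorrected radiance/reflectance array
    cos_i: cosine of the local incidence angle, same shape as L
    slope_deg, sun_zenith_deg: slope and solar zenith angles in degrees
    """
    m, b = np.polyfit(np.ravel(cos_i), np.ravel(L), 1)  # empirical fit
    C = b / m
    alpha = np.deg2rad(slope_deg)
    theta = np.deg2rad(sun_zenith_deg)
    return L * (np.cos(alpha) * np.cos(theta) + C) / (cos_i + C)
```

On flat terrain (α = 0), a pixel whose cos i equals cos θ is left unchanged, which is a quick sanity check for the implementation.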

Feature Extraction and Selection
• Feature Extraction
Four categories comprising 101 features related to forest, height, density, and intensity were derived from the normalized ALS point cloud data. The forest features include canopy cover, leaf area index (LAI), and gap fraction. Canopy cover refers to the proportion of the forest floor covered by the vertical projection of the tree crowns [85]. LAI is one of the most significant variables for representing canopy structure, defined as half the total foliage area per unit ground surface area [86]. The gap fraction can be calculated as the ratio of the number of ground points whose elevation is lower than the height threshold (i.e., 2 m in this study) to the total return number. All 101 ALS features, including three forest metrics, 46 elevation metrics, 10 density metrics, and 42 intensity metrics, were extracted using LiDAR 360 V3.2 of GreenValley International. The feature details are listed in Table A1 of Appendix A.
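The gap-fraction calculation described above (and its complement, canopy cover) can be sketched as follows; in the study these metrics were computed by LiDAR 360, so the functions below are only illustrative stand-ins that operate on normalized return heights and simplify the ground-point definition to "returns below the threshold".

```python
import numpy as np

def gap_fraction(heights_m, threshold_m=2.0):
    """Proportion of returns below the height threshold (2 m in this study).

    heights_m: normalized return heights (m) of one plot.
    """
    h = np.asarray(heights_m, dtype=float)
    return np.count_nonzero(h < threshold_m) / h.size

def canopy_cover(heights_m, threshold_m=2.0):
    """Complement of the gap fraction: proportion of returns at/above threshold."""
    return 1.0 - gap_fraction(heights_m, threshold_m)
```
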
A variety of features could be derived from optical imagery. According to previous studies (e.g., [48,54,73,87]), band combinations, vegetation indices, textures (e.g., gray-level co-occurrence matrix (GLCM)) of each band, and image transformations (e.g., principal component analysis, tasseled cap, minimum noise fraction) were extracted as potential predictors for AGB modeling. Therefore, 98 features were selected or extracted from Landsat 8 imagery in this study, including seven original bands (band 1-7), ten band combinations, ten image enhancement features (i.e., three principal components, three tasseled-cap features, and four minimum noise fractions), 56 GLCM features, and 15 vegetation indices. The details of the 98 features derived from Landsat 8 were listed in Table A2 of Appendix A.

Feature Selection
To avoid the "curse of dimensionality", it is a prerequisite to select the most effective features for AGB estimation. In this study, a two-step feature selection procedure was implemented: (1) preliminary selection using the Pearson correlation coefficient, and (2) further selection based on the variable importance measure of random forest. In the first step, the Pearson correlation coefficient of each feature with AGB was calculated, and the features significantly correlated with AGB (p-value less than 0.05) were selected. The selected features were then ranked according to the variable importance measures calculated with random forest. Due to the randomness of the algorithm, the ranking procedure was repeated 10 times to find the most stable set of highly ranked features.
The two-step feature selection was implemented for ALS and Landsat 8 data, respectively, to select two sets of best-performing features. Among the selected ALS features, the best-performing ALS variable was determined by establishing and evaluating the univariate models of each ALS feature and AGB. The feature selection procedure was implemented using R version 4.0.4 (https://www.r-project.org/ (accessed on 1 September 2021)).
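A sketch of the two-step procedure, assuming plain arrays of candidate features and plot AGB values; the RF settings below (tree count, number of repeats passed in) are illustrative, not the paper's exact configuration, which was run in R.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor

def two_step_selection(X, y, names, n_runs=10, top_k=9):
    """Two-step feature selection.

    Step 1: keep features significantly correlated with AGB (Pearson p < 0.05).
    Step 2: rank survivors by accumulated RF variable importance over
    `n_runs` repetitions to average out the algorithm's randomness.
    """
    keep = [j for j in range(X.shape[1]) if pearsonr(X[:, j], y)[1] < 0.05]
    importance = np.zeros(len(keep))
    for run in range(n_runs):
        rf = RandomForestRegressor(n_estimators=500, random_state=run)
        rf.fit(X[:, keep], y)
        importance += rf.feature_importances_
    order = np.argsort(importance)[::-1][:top_k]
    return [names[keep[j]] for j in order]
```
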
According to [48], two types of indices (COLI1 and COLI2) incorporating optical imagery and ALS information were established by combining the best-performing LiDAR variable with each optical spectral vegetation index. The best-performing LiDAR variable was determined by the univariate model of AGB and the LiDAR variable with the highest R². The best-performing spectral features of Landsat 8 were selected by the two-step feature selection procedure described above. Thus, the generation of COLI1 and COLI2 based on the best-performing LiDAR variable (a single feature) and the best-performing Landsat 8 features (possibly several features) involved both feature selection and extraction procedures. For convenience, we retained the notation of [48] but adjusted the equations as follows.
where BLV is the best-performing LiDAR variable (a single feature) and SFi is a set of best-performing features derived from the Landsat 8 imagery (several features). Thus, the number of COLI1 or COLI2 indices is identical to the number of best-performing spectral features (SFi).

Classic Machine Learning Algorithms
In this study, seven classic machine learning algorithms were employed to estimate the AGB of NSFs: extreme learning machine (ELM), backpropagation (BP) neural network, regression tree (RegT), RF, support vector regression (SVR), KNN, and CNN. Traditional multiple linear regression (MLR) was applied as a baseline for model comparison.
• ELM
ELM is a class of machine learning methods built on the feedforward neural network (FNN) for supervised and unsupervised learning problems [88]. ELM is an improvement of the FNN and its backpropagation algorithm, characterized by randomly or manually assigned hidden-layer weights that do not need to be updated. Compared to the single-layer perceptron and SVM, ELM is considered to have potential advantages in terms of learning rate and generalization ability [88].
• BP
The BP neural network, proposed by Rumelhart et al. in 1986 [89], is a multilayer feedforward network trained by the error backpropagation algorithm and is one of the most widely used neural network models [90]. Its learning rule uses the steepest descent method, continuously adjusting the weights and thresholds of the network through backpropagation to minimize the sum of squared errors. Based on trial and error, the BP algorithm was implemented with 1000 epochs in this study.

RegT and RF
A regression tree is a basic method built on the principle of minimizing a loss function for regression problems. The major advantages of the regression tree are the readability of the model and its fast computational speed, which make it particularly suitable for ensemble learning, such as random forests. RF, proposed by Leo Breiman [76], is based on multiple regression trees and is capable of capturing the complicated relationship between a response and a set of explanatory variables, with the following advantages: robustness against overfitting, the ability to determine variable importance, higher accuracy, few parameters to tune, low sensitivity to parameter tuning, fast training speed, and resistance to noise. The number of regression trees and the random state of the RF algorithm were set to 1000 and 10, respectively, in this study.
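With the stated settings (1000 trees, random state 10), the RF configuration can be sketched as follows; the arrays below are synthetic stand-ins for the 195 plots and their selected features, not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in: 195 "plots" with 9 "selected features" and an
# AGB-like response driven by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(195, 9))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=195)

# RF as configured in this study: 1000 trees, random_state=10.
rf = RandomForestRegressor(n_estimators=1000, random_state=10)
rf.fit(X, y)
pred = rf.predict(X)
```
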
• SVR
SVM is a class of generalized linear algorithms that performs classification in a supervised learning manner, where the decision boundary is the maximum-margin hyperplane solved for the learned samples. SVR is an adaptation of SVM for regression problems and can handle nonlinear problems via the kernel method. A linear kernel and a penalty factor of 1 were applied for SVR in this study.

• KNN
The KNN method is a multivariate nonparametric algorithm that uses a set of predictors (Xs) to match each target pixel to a number (K) of most similar (nearest neighbors) reference pixels for which values of response variables (Y) are known. The number of nearest neighbors was set to 5 and uniform weights were utilized in this study.
• CNN
CNN, first developed in 1995 for the classification of handwritten images [91], is one of the most representative algorithms of deep learning. A CNN interprets spatial data by scanning it with a series of trainable moving windows and, owing to its hierarchical structure, is capable of representation learning in a translation-invariant manner. In this study, the CNN model had a simple structure with an input layer, a hidden layer, and an output layer, and was trained with 1000 epochs and a batch size of 30.
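A minimal sketch of such a simple CNN in TensorFlow/Keras; the Conv1D filter count and kernel size below are assumptions, since the exact hidden-layer configuration is not reported, and the fit call in the comment reflects the stated training schedule (1000 epochs, batch size 30).

```python
import numpy as np
import tensorflow as tf

def build_cnn(n_features: int) -> tf.keras.Model:
    """Input layer -> one convolutional hidden layer -> output layer.

    Filter count (32) and kernel size (3) are illustrative assumptions.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features, 1)),
        tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),  # hidden layer
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1),                                      # AGB output
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# In the paper's setup: model.fit(X[..., None], y, epochs=1000, batch_size=30)
```
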

Ensemble Learning Algorithms
Stacked generalization (SG), a layered ensemble learning algorithm [92], was applied in this study. The SG algorithm used here has two layers: base models and a meta-model. The input of the base models is the original training set, and their output is used as the training set for the meta-model [93]. The meta-model can be a single model or an ensemble model [93,94], such as RF. To obtain a better-performing SG, the base models should be as accurate and as diverse as possible. Thus, the four best-performing machine learning algorithms described in Section 2.3.3, selected according to leave-one-out cross-validation, served as the base models and meta-models for establishing the SG algorithms in this study, resulting in four SG algorithms. The flowchart of the SG algorithm used in this study is presented in Figure 3.
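One of the four SG configurations can be sketched with scikit-learn's StackingRegressor, which trains the meta-model on cross-validated base-model predictions, approximating the two-layer design described above. The CNN base and meta-models would require a Keras wrapper and are omitted here; the tree counts are reduced for brevity, while the SVR and KNN settings follow Section 2.3.3.

```python
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Base models: three of the four best-performing learners (CNN omitted).
base_models = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=10)),
    ("svr", SVR(kernel="linear", C=1.0)),
    ("knn", KNeighborsRegressor(n_neighbors=5, weights="uniform")),
]

# SG(RF): base-model predictions feed an RF meta-model; swapping
# final_estimator yields SG(SVR), SG(KNN), etc.
sg_rf = StackingRegressor(
    estimators=base_models,
    final_estimator=RandomForestRegressor(n_estimators=100, random_state=10),
)
```
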

Model Evaluation
This study adopted leave-one-out cross-validation to evaluate model accuracy. Since 195 sample plots were used in this study, the training and testing data were 194 plots and 1 plot, respectively, and 195 iterations were run for each model. Due to the problems of the coefficient of determination (R²) for nonlinear models [95], we avoided applying R² to the machine learning models established from the selected features and AGB. However, the R² of actual versus predicted AGB can be used as an indicator, since the relationship between actual and predicted AGB can be described by a simple linear model. Therefore, six indices were applied for model evaluation: R² of actual and predicted AGB, root mean squared error (RMSE), relative root mean squared error (rRMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and precision measure (PM). In these indices, n represents the number of observation samples, y_i represents the actual AGB of the ith plot, ŷ_i represents the predicted AGB of the ith plot, and ȳ represents the mean of the actual AGB. All the model fitting and evaluation procedures in this study were implemented with Python 3.7 (https://www.python.org/downloads/ (accessed on 1 September 2021)), TensorFlow 2.2 (https://tensorflow.google.cn/ (accessed on 1 September 2021)), and scikit-learn (https://scikit-learn.org/stable/ (accessed on 1 September 2021)).
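The leave-one-out evaluation can be sketched as follows, using the standard formulas for RMSE, rRMSE, MAE, MAPE, and the R² of actual versus predicted AGB; PM is omitted because its formula is not reproduced here, and the helper assumes any scikit-learn-style estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut

def loocv_metrics(model, X, y):
    """Predict each plot once from a model trained on the remaining plots,
    then compute the evaluation indices on the pooled actual/predicted pairs."""
    pred = np.empty_like(y, dtype=float)
    for train, test in LeaveOneOut().split(X):
        model.fit(X[train], y[train])
        pred[test] = model.predict(X[test])
    rmse = np.sqrt(np.mean((y - pred) ** 2))
    return {
        "RMSE": rmse,
        "rRMSE": rmse / y.mean(),
        "MAE": np.mean(np.abs(y - pred)),
        "MAPE": np.mean(np.abs((y - pred) / y)) * 100.0,
        "R2": np.corrcoef(y, pred)[0, 1] ** 2,  # R² of actual vs. predicted AGB
    }
```
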

Feature Selection
Due to the large number of extracted features (199 in total), the two-step feature selection was implemented in this study: preliminary selection using the Pearson correlation coefficient, and further selection based on the variable importance measure of random forest. Finally, nine ALS features were selected and sorted from highest to lowest variable importance: elev_mean, int_AII_5th, elev_cv, density_7th, int_max, int_AII_40th, int_per_60th, int_per_80th, and int_AII_50th; nine features extracted from Landsat 8 were selected and ranked in descending order of variable importance: MVI5, B1, B76, B65, B53, Entr_B5, B2, ND563, and MVI7. The selected features and their descriptions are listed in Table 3. To identify the best-performing ALS feature, simple linear regressions were established between AGB and each ALS feature. The univariate models showed that elevation mean outperformed the other ALS features, with a higher R² and lower RMSE, rRMSE, MAE, MAPE, and PM (Table 4). Thus, elevation mean was selected as the best-performing ALS feature for generating COLI1 and COLI2 using Equations (3) and (4).

Experiment I
The goal of feature experiment I was to explore the effects of features from different data sources (optical imagery, ALS, and combined data) on AGB estimation based on seven classic machine learning algorithms: ELM, BP, RegT, RF, SVR, KNN, and CNN. MLR was implemented as a baseline for model comparison. Table 5 shows the performance of the eight models using the three sets of features designed in Experiment I. In general, the optimal ALS features (Feature 1) performed significantly better than the optimal Landsat 8 features (Feature 2) for AGB estimation, regardless of the algorithm, whereas the combination of the optimal ALS and Landsat 8 features (Feature 1 + 2) performed differently across algorithms.
For each data source, the accuracy of CNN was substantially higher than that of the other algorithms, especially when both ALS and Landsat 8 features were applied (R² = 0.97, RMSE = 12.6, rRMSE = 0.08, MAE = 6.43, MAPE = 4.02, PM = 0.13). However, it is worth noting that the accuracies of the other algorithms (except CNN) based on the two data sources (Feature 1 + 2) were not significantly improved compared with those based on the optimal ALS features (Feature 1), which suggests that the accuracy of AGB estimation depends not only on the data sources but also on the algorithm. Some algorithms (like RF and SVR) provided very similar accuracy with both the optimal ALS and Landsat 8 features as with only the optimal ALS features, making the inclusion of optical imagery pointless in those cases. Thus, ALS data are of particular significance to AGB estimation.

Experiment II
After determining the best-performing ALS feature (i.e., elevation mean), we designed feature experiment II to investigate how to efficiently combine this single feature with the optimal Landsat 8 features for AGB estimation: is it better to use novel features extracted from elevation mean and the optimal Landsat 8 features (i.e., COLI1 and COLI2) or to directly combine all the features? The similar feature sizes in experiment II (i.e., 9 or 10) avoided unfair comparisons due to large differences in feature number. Table 6 presents the accuracy assessment of the classic machine learning algorithms with the three sets of features designed in experiment II. The results show that the addition of elevation mean significantly improved the accuracies of AGB estimation compared to those using optical features only (Feature 2), no matter how it was added. The models except CNN performed very similarly for the three feature combinations in experiment II. CNN again showed a clear advantage, as in Experiment I, especially for the simple combination of the optimal Landsat 8 features and elevation mean (Feature 2 + 3), with an accuracy of R² = 0.88, RMSE = 24.48, rRMSE = 0.16, MAE = 10.19, MAPE = 7.23, and PM = 0.24, followed by all COLI2 (Feature 5) and then all COLI1 (Feature 4). Thus, it seems unnecessary to generate the new features (i.e., COLI1 or COLI2) when CNN is applied for AGB estimation of NSFs based on the optimal Landsat 8 features and the best-performing ALS feature.

Experiment III
To investigate the effect of combining the optimal ALS and Landsat 8 features with the two types of novel features (COLI1 or COLI2) using classic machine learning algorithms, experiment III was implemented (Table 7). Compared to the result of applying the optimal ALS and Landsat 8 features (Feature 1 + 2) in Table 5, the addition of the novel features, whether COLI1 or COLI2, slightly improved the accuracies of most models, such as MLR, BP, RegT, RF, and KNN. In addition, the accuracies of all models except RF using the optimal ALS and Landsat 8 features plus all COLI2 (Feature 1 + 2 + 5) were slightly higher than those using the optimal ALS and Landsat 8 features plus all COLI1 (Feature 1 + 2 + 4), indicating that COLI2 was more efficient than COLI1 for AGB estimation of NSFs. CNN remained much superior to the other algorithms and reached the highest accuracies (R² = 0.99, RMSE = 6.85, rRMSE = 0.04, MAE = 2.95, MAPE = 1.02, PM = 0.03) when the optimal ALS and Landsat 8 features plus all COLI2 (Feature 1 + 2 + 5) were applied.

Experiment I
To explore the performance of ensemble learning algorithms in estimating AGB based on different feature combinations, Experiments I, II, and III were also implemented using the designed SG algorithms. According to the results of the classic machine learning algorithms (Tables 5-7), the four best-performing models, that is, RF, SVR, KNN, and CNN, were selected as base models for the SG algorithm. The predictions of the base models were used as the input of the meta-model of the SG algorithms, which was also RF, SVR, KNN, or CNN. Thus, there were four SG algorithms corresponding to the four meta-models: SG(RF), SG(SVR), SG(KNN), and SG(CNN). Table 8 presents the accuracy assessment of the ensemble learning algorithms with the three sets of features designed in experiment I. Compared to the results of the base models (Table 5), the SG algorithms greatly improved the accuracy of AGB estimation using the optimal Landsat 8 features (Feature 2) and the combined optimal features (Feature 1 + 2).
However, for the optimal ALS features (Feature 1), the SG algorithms had slightly lower accuracies than those of the base models, except for CNN. In general, CNN still performed best as the meta-model of the SG algorithm, followed by the SG algorithm with the SVR meta-model, and finally those with the RF and KNN meta-models. Although CNN remained an outstanding meta-model in all cases, the drastic improvements in accuracy brought by SG(SVR), SG(RF), and SG(KNN) over their corresponding base models were noteworthy, especially for Feature 2 and Feature 1 + 2. For example, the R² of SG(SVR), SG(RF), and SG(KNN) increased by approximately 30-40% and 60-70% for Feature 2 and Feature 1 + 2, respectively, whereas the R² of SG(CNN) increased by only 49% and 0%, respectively. The other indices (RMSE, rRMSE, MAE, MAPE, and PM) showed similar trends, but in the opposite direction. Thus, the SG algorithms left more room for improvement for relatively weaker learners (like SVR, RF, and KNN) than for strong deep learning learners (like CNN).

Experiment II
Feature experiment II was also implemented to investigate how to integrate elevation mean and the optimal Landsat 8 features for AGB estimation based on ensemble learning algorithms (Table 9). The SG algorithms greatly improved the accuracies in all cases except SG(CNN) for Feature 5 and Feature 2 + 3, compared to the accuracies of the corresponding base models (Table 6). When the SG algorithms were utilized, the trend that the simple combination of the optimal Landsat 8 features and elevation mean (Feature 2 + 3) performed best, followed by all COLI2 (Feature 5) and finally all COLI1 (Feature 4), was much more obvious than with the classic machine learning algorithms (Table 6 vs. Table 9). The advantage of applying the deep learning algorithm CNN as the meta-model decreased with the dramatic increase in the accuracies of the other three algorithms (i.e., RF, SVR, and KNN), especially for Feature 5 and Feature 2 + 3. In other words, when the feature set of all COLI2, or that of the optimal Landsat 8 features and elevation mean, was applied for AGB estimation, SG(RF), SG(SVR), and SG(KNN) had accuracies comparable to SG(CNN).

Experiment III
The effect of combining the optimal ALS and Landsat 8 features with the two types of novel features (COLI1 or COLI2) on AGB estimation using ensemble algorithms was investigated in experiment III (Table 10). Unlike with classic machine learning algorithms, the addition of COLI1 to the ensemble algorithms did not improve the accuracy of AGB estimation compared to applying only the optimal ALS and Landsat 8 features (Feature 1 + 2) in Table 8. SG(SVR) and SG(KNN) with COLI1 added even lowered R² by about 10%-20% relative to SG(SVR) and SG(KNN) with Feature 1 + 2 alone (Table 8). However, the addition of COLI2 slightly increased the accuracies of most models except SG(KNN), even though the SG algorithms with Feature 1 + 2 had already performed well (Table 8). In general, the SG algorithms with the optimal ALS and Landsat 8 features and all COLI2 (Feature 1 + 2 + 5) had more stable accuracies than those with the optimal ALS and Landsat 8 features and all COLI1 (Feature 1 + 2 + 4), regardless of the meta-model, indicating that COLI2 was more efficient than COLI1 for AGB estimation of NSFs. The SG model with the CNN meta-model again achieved the highest accuracy (R² = 0.99, RMSE = 2.02, rRMSE = 0.01, MAE = 0.87, MAPE = 0.73, PM = 0.02) when the optimal ALS and Landsat 8 features and all COLI2 (Feature 1 + 2 + 5) were applied. In addition, the ensemble algorithms greatly improved the accuracies relative to the corresponding base models with the same features (Table 10 vs. Table 7). For example, when the combination of the optimal ALS and Landsat 8 features and all COLI1 (Feature 1 + 2 + 4) was utilized, the R² of SG(RF) increased by more than 60% compared with that of the RF model, while the RMSE, rRMSE, MAE, MAPE, and PM of SG(RF) decreased by 75%, 73%, 71%, 76%, and 81%, respectively.
Although the CNN base model had already achieved high accuracy, especially with the combination of the optimal ALS and Landsat 8 features and all COLI2 (Feature 1 + 2 + 5 in Table 7), SG(CNN) still reduced RMSE, rRMSE, and MAE by about 70% and MAPE and PM by about 30%.

Wall-to-Wall AGB Predictions
Based on the above results and algorithm efficiency, the CNN model and the feature set of the optimal ALS and Landsat 8 features and all COLI2 (Feature 1 + 2 + 5) were selected for a wall-to-wall AGB prediction of the entire Maoershan Experimental Forest Farm of NEFU (Figure 4). The predicted AGB varied from 0 to 491.04 Mg/ha, with a mean of 59.9 Mg/ha and a standard deviation of 48.69 Mg/ha. Areas with zero or low AGB values were located along rivers, roads, or residential regions, whereas areas with high AGB values were located in the central part of Maoershan (e.g., the Zhonglin, Yuejin, Beiling, Donglin, and Xinken working districts) (Figure 4a). However, a mosaic pattern of high and low AGB values was obvious across most of the study area, as shown in the enlarged area of the Zhonglin working district (Figure 4b). Figure 5 shows the relationship between actual and estimated AGB (Mg/ha) of the 195 plots using the CNN algorithm based on different feature sets. For experiment I, it was better to apply ALS than Landsat 8 to predict AGB if only one data source could be used, indicating that vertical forest structure was more vital than spectral information for AGB estimation of NSFs. The synergy of optical imagery and ALS markedly increased the accuracy over a single data source (Figure 5c vs. Figure 5a or Figure 5b), since it effectively alleviated the underestimation of high AGB values. Even when only one ALS feature (i.e., elevation mean) was added to the Landsat 8 features (experiment II), the improvement was obvious and significant. However, it was unnecessary to generate novel features such as COLI1 or COLI2 from the optimal Landsat 8 features and elevation mean. It was evident that directly combining them performed much better than the new features (Figure 5f vs. Figure 5d or Figure 5e), but worse than all the optimal ALS and Landsat 8 features (Figure 5f vs. Figure 5c) due to the smaller number of features (i.e., 10 vs. 18).
The effectiveness of COLI1 was very limited, because Feature 1 + 2 provided results comparable to Feature 1 + 2 + 4 (Figure 5c vs. Figure 5g). Combining all the optimal ALS, Landsat 8, and COLI2 features was the most efficient, especially for estimating high AGB values (Figure 5h).
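In an area-based workflow, wall-to-wall prediction amounts to applying the plot-trained model to every pixel of the co-registered feature rasters. The numpy sketch below is a hedged illustration: the 27-band feature stack and the `predict_fn` callable are placeholders, not the authors' implementation.

```python
import numpy as np

def predict_wall_to_wall(feature_stack, predict_fn, nodata_mask=None):
    """feature_stack: (n_features, rows, cols) raster of co-registered features.
    predict_fn: callable mapping (n_pixels, n_features) -> (n_pixels,) AGB."""
    n_feat, rows, cols = feature_stack.shape
    # Flatten the raster to a (pixels, features) table, predict, reshape back.
    table = feature_stack.reshape(n_feat, -1).T
    agb = predict_fn(table).reshape(rows, cols)
    if nodata_mask is not None:
        agb = np.where(nodata_mask, np.nan, agb)  # mask rivers, roads, etc.
    return agb

# Toy usage with a dummy linear "model" over a tiny 4 x 5 pixel raster:
stack = np.random.default_rng(0).random((27, 4, 5))  # 27 features (1 + 2 + 5)
agb_map = predict_wall_to_wall(stack, lambda X: X.sum(axis=1))
```

In practice the mask would cover no-data pixels (e.g., water and residential areas), and `predict_fn` would be the trained CNN applied in tiles to keep memory bounded.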

AGB Estimation Using Different Features
The differences in features reflect the characteristics of the different data sources. In this study, we extracted a variety of features and investigated the effects of different synergistic approaches of features derived from ALS data and Landsat 8 OLI imagery on AGB estimation of the NSFs of northeastern China. For ALS data, besides elevation features, density-related (e.g., density_metrics7) and intensity-related (e.g., int_AII_5th, int_max, int_AII_40th, int_per_60th, int_per_80th, and int_AII_50th) metrics also had great potential for AGB estimation; for Landsat 8 imagery, band combinations and texture were more efficient than vegetation indices, especially MVI5 (i.e., the combination of bands 5, 4, and 2). Unfortunately, some traditional vegetation indices commonly applied in previous studies [48], such as the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), atmospherically resistant vegetation index (ARVI), and soil-adjusted vegetation index (SAVI), were excluded due to their low correlations with AGB. Only one vegetation index (i.e., ND563) was selected. This might be because the study area is a natural secondary forest with high canopy density, which easily results in saturation (insensitivity to AGB) of the traditional vegetation indices, as also confirmed in [96,97]. The low accuracies (e.g., R² < 0.3) of AGB estimation using the optimal Landsat 8 features (Feature 2), regardless of algorithm, also indicated the difficulty of AGB estimation for NSFs. Due to the vegetation characteristics, the near-infrared and shortwave infrared bands (i.e., bands 5, 6, and 7) were more related to AGB than the other bands.
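The saturation of normalized-difference indices in dense canopies follows directly from their form: once the index is already high, even large changes in the NIR signal barely move it. A small illustrative sketch (the reflectance values are hypothetical, not measurements from this study):

```python
import numpy as np

def normalized_difference(b_a, b_b):
    """Generic normalized-difference index, e.g. NDVI = ND(NIR, red)."""
    b_a, b_b = np.asarray(b_a, float), np.asarray(b_b, float)
    return (b_a - b_b) / (b_a + b_b)

# Saturation illustration: doubling the NIR reflectance over a dense
# canopy with low red reflectance barely changes the index, so the
# index becomes insensitive to further increases in biomass.
nir = np.array([0.30, 0.60])  # hypothetical NIR reflectances
red = np.array([0.03, 0.03])  # low red reflectance under a closed canopy
ndvi = normalized_difference(nir, red)  # both values already near 1
```

This asymptotic behavior is why the traditional indices correlated poorly with AGB in the high-canopy-density NSFs, while band combinations and texture metrics retained more sensitivity.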
Similar to previous studies [48,52,98], it was beneficial to combine ALS data and optical imagery, even when only one significant ALS-derived feature (such as elevation mean) was combined. The synergistic method of extracting novel features (i.e., COLI1 and COLI2) using the optimal Landsat 8 features and the best-performing ALS feature (i.e., elevation mean) yielded higher AGB estimation accuracy than either optical-only or ALS-only features when the same model was implemented. Experiments II and III showed that COLI2 had more advantages than COLI1 in AGB estimation of NSFs, which differs from [48] due to the different forest types (NSFs of northeastern China vs. mixed forests of southern China). However, it was surprising to find that the novel extracted features (COLI1 and COLI2) did not improve the accuracy compared to the simple combination of the untransformed features (optimal Landsat 8 features + BLV), which indicates the great convenience and effectiveness of simply adding the best-performing ALS feature (i.e., elevation mean) to the original set of Landsat 8 features for AGB estimation of NSFs. The number of features was also a vital factor influencing AGB accuracy. To ensure a fair comparison of the synergistic approaches of features, we kept the number of features as consistent as possible within each experiment. In general, the accuracy of AGB estimation rises with the number of involved features under otherwise identical conditions (e.g., the same algorithm). Thus, it was not surprising that the combinations with 27 features (i.e., Feature 1 + 2 + 4 or Feature 1 + 2 + 5) in experiment III provided the best performance in this study, from a feature-size perspective.
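The "simple combination of the untransformed features" needs no derived index at all: it is plain column-wise concatenation of the plot-level feature tables. A hedged sketch, in which the feature counts and values are illustrative stand-ins rather than the study's data:

```python
import numpy as np

# Hypothetical plot-level feature tables for the 195 plots: nine optimal
# Landsat 8 features (Feature 2) and the single best ALS feature,
# elevation mean (Feature 3). The counts match the 10-feature set
# mentioned for Feature 2 + 3; the values are random placeholders.
rng = np.random.default_rng(1)
landsat_features = rng.random((195, 9))   # Feature 2
elev_mean = rng.random((195, 1))          # Feature 3 (ALS elevation mean)

# "Feature 2 + 3": the untransformed simple combination is just
# column-wise concatenation -- no COLI-style transformation required.
feature_2_3 = np.hstack([landsat_features, elev_mean])
```

The convenience noted above is exactly this: one extra column appended to the optical feature table outperformed the transformed COLI features.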

AGB Estimation Using Machine Learning Algorithms
The effects of classic machine learning and ensemble learning algorithms on AGB estimation using different features were explored in this study. The RF algorithm, one of the most commonly used algorithms in forestry, only provided modest accuracy here, since it constantly overfits the data, often with poorer predictions [33]. CNN, a deep learning algorithm first developed in 1995 for the classification of handwritten images [91], showed absolute advantages over the other classic algorithms (e.g., ELM, BP, RF, KNN, and SVR). As a representative of deep learning, a branch of machine learning, a large and deep CNN (consisting of many layered convolutions) was further developed in 2012 and achieved a winning top-5 test error rate of 15.3% in the ImageNet ILSVRC-2012 competition [99]. In recent years, the CNN model has been increasingly applied in forestry, for example, for the prediction of forest inventory parameters and the identification of tree species [100,101]. A CNN interprets spatial data by scanning it with a series of trainable moving windows and sufficiently complex artificial neural networks, and in essence does not require human-derived feature selection [100]. However, to ensure a fair comparison of the different models, we kept the feature selection procedure consistent for all models. This means that the CNN model was applied to two-dimensional data of AGB and a set of human-derived features instead of a three-dimensional image. Although the CNN model thus lost the advantage of automatically extracting and selecting features, it was still sensitive to changes in features and significantly superior to the other models (e.g., ELM, BP, RF, KNN, and SVR).
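For the one-dimensional vectors of human-derived features used here, the trainable moving-window scan that a CNN layer performs reduces to a 1-D convolution (implemented, as in most CNN frameworks, as cross-correlation). The numpy sketch below shows only that core operation; it is not the authors' network architecture, and the kernel weights are illustrative rather than trained.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """Single-channel 1-D 'valid' convolution (cross-correlation, as in
    CNN layers): slide the trainable window over the feature vector."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias
                     for i in range(len(x) - k + 1)])

def relu(v):
    """The usual nonlinearity applied after a convolutional layer."""
    return np.maximum(v, 0.0)

# One plot's feature vector of 27 values (as in Feature 1 + 2 + 5),
# scanned by a single window of width 3 (a discrete second-difference
# detector here). A real CNN learns many such windows per layer and
# follows them with pooling and dense layers.
features = np.linspace(0.0, 1.0, 27)
activation = relu(conv1d_valid(features, np.array([0.5, -1.0, 0.5])))
```

A "valid" scan of a length-27 vector with a width-3 window yields 25 activations; stacking many such learned windows is what makes the model sensitive to local patterns among adjacent features.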
The SG algorithms, a kind of ensemble learning algorithm, applied a heterogeneous ensemble strategy with different base models and greatly improved the AGB estimation accuracy in this study. RF, KNN, SVR, and CNN were selected as base models, since SG algorithms can take advantage of good and stable predictions from the base models.
The good predictions of the CNN base model made the accuracy of the SG algorithms high and stable regardless of the meta-model, indicating that SG has stronger generalization ability than the base models. In other words, SG algorithms are more beneficial for turning weak learners (e.g., RF, KNN, and SVR) into strong learners than for already strong learners (e.g., CNN).
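The SG procedure can be sketched as follows: each base model is trained on K-1 folds and predicts the held-out fold; the out-of-fold predictions form the level-1 training data; and the meta-model is then fit on those predictions. Below is a minimal numpy sketch with two toy base learners and a least-squares meta-model standing in for the study's RF/SVR/KNN/CNN stack; the data are synthetic.

```python
import numpy as np

def kfold_indices(n, k):
    """Yield (train_idx, test_idx) splits for k-fold cross-validation."""
    idx = np.arange(n)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold

def stacked_predictions(X, y, base_fits, k=5):
    """Out-of-fold base-model predictions -> level-1 (meta) features."""
    meta = np.zeros((len(y), len(base_fits)))
    for tr, te in kfold_indices(len(y), k):
        for j, fit in enumerate(base_fits):
            predict = fit(X[tr], y[tr])   # train the base model on k-1 folds
            meta[te, j] = predict(X[te])  # predict the held-out plots
    return meta

# Two toy heterogeneous base learners (placeholders for RF/SVR/KNN/CNN):
def mean_learner(Xtr, ytr):
    mu = ytr.mean()
    return lambda Xte: np.full(len(Xte), mu)

def linear_learner(Xtr, ytr):
    coef, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return lambda Xte: Xte @ coef

rng = np.random.default_rng(2)
X = rng.random((40, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.standard_normal(40)

Z = stacked_predictions(X, y, [mean_learner, linear_learner])
# Meta-model: least squares on the level-1 predictions.
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
y_stacked = Z @ w
```

The meta-model effectively learns how much to trust each base model, which is why a stack of heterogeneous weak learners can approach the accuracy of a single strong learner.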
However, although the SG algorithms are superior to their corresponding base models, we still recommend employing the CNN model for AGB estimation in practice due to its comparable accuracy and good efficiency. Table 11 summarizes the efficiency (i.e., runtime) of all the algorithms with the combination of the optimal ALS and Landsat 8 features and all COLI2 (Feature 1 + 2 + 5) for the AGB estimation of the 195 plots on a computer with an AMD RX3700x CPU, 16 GB RAM, and a GTX960 4 GB GPU. The runtime of the ensemble algorithms (i.e., SG(RF), SG(KNN), SG(SVR), and SG(CNN)) was dramatically higher than that of their corresponding base models (i.e., RF, KNN, SVR, and CNN). For example, the efficiency of SG(CNN) was only half that of the CNN model, and the other SG algorithms (i.e., SG(RF), SG(KNN), and SG(SVR)) increased the runtime of their corresponding base algorithms even more. Among the classic machine learning algorithms, the CNN model had the longest runtime but yielded the highest accuracy (see Tables 5-7) due to its most complex structure. Thus, to balance workload and accuracy, the wall-to-wall AGB prediction map was generated using the CNN model with the combination of the optimal ALS and Landsat 8 features and all COLI2 (Feature 1 + 2 + 5).
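A runtime comparison of this kind can be reproduced with simple wall-clock timing. The sketch below uses toy workloads, not the study's models: an "ensemble" that refits its base model K times is roughly K-fold slower, mirroring the runtime inflation observed for the SG models.

```python
import time

def timed(fn, *args, repeats=3):
    """Best-of-n wall-clock runtime of fn(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

# Toy stand-ins: the "base model" is a fixed amount of work, and the
# k-fold "stacked" version refits that base model k times.
def base_model(n):
    return sum(i * i for i in range(n))

def stacked_model(n, k=5):
    return [base_model(n) for _ in range(k)]

t_base = timed(base_model, 200_000)
t_stacked = timed(stacked_model, 200_000)
```

Taking the best of several repeats reduces timer noise; for GPU models, framework-specific synchronization would also be needed before reading the clock.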

Comparison of Estimated Forest AGB and Current Publications
From the perspective of AGB accuracy, the highest accuracy in this study (R² = 0.99, RMSE = 2.02, rRMSE = 0.01, MAE = 0.87, MAPE = 0.73, PM = 0.02) was yielded by the SG(CNN) algorithm with the combination of the optimal ALS and Landsat 8 features and all COLI2 (Feature 1 + 2 + 5), which was better than other similar AGB studies that applied both LiDAR and optical imagery (e.g., [48,61,69,98,102]). Besides the features and algorithms, the high accuracy of this study also benefited from it being a local study over a relatively small area; accuracy tends to decrease at national and global scales. For example, Su et al. [69] reported an R² of 0.75 and an RMSE of 42.39 Mg/ha for the AGB estimation of China based on ICESat GLAS laser altimetry data, MODIS, and forest inventory data. Yang et al. [103] produced a global forest AGB map with an R² of 0.90 and an RMSE of 35.87 Mg/ha using a gradient boosted regression trees algorithm based on multiple data sources (e.g., LiDAR-derived forest AGB datasets, field measurements, and high-level products from optical satellite imagery).
Furthermore, we examined the predicted AGB values of the wall-to-wall map of the entire Maoershan area and compared the distributions of AGB values from the wall-to-wall prediction map and the 195 sample plots (Figure 6). Although the spatial distribution of AGB values in the wall-to-wall prediction map seemed reasonable (Figure 4), there was still a large difference between the two distributions, especially for the ranges of 0-50 Mg/ha and >200 Mg/ha (Figure 6), indicating the underestimation of high AGB values and the overestimation of low AGB values. This suggests that the data saturation in Landsat imagery was not fully eliminated in this study of natural secondary forests. For Heilongjiang province, the average forest AGB density estimated by [69,104] was 81 Mg/ha and 85 Mg/ha, respectively (using a ratio of 50% for the conversion between forest AGB and carbon stock); for the whole of northeastern China, the average forest AGB density estimated by [57,105] was 83.50 Mg/ha and 89.30 Mg/ha, respectively. All these values are significantly higher than the average AGB of 59.9 Mg/ha in this study. The first reason could be the different study areas: both Heilongjiang province and northeastern China are much larger than the Maoershan Experimental Forest Farm and include areas with high AGB values, such as the Daxing'an Mountains, Xiaoxing'an Mountains, or Changbai Mountains, which results in a higher average AGB value. The second reason could be that data saturation in this study substantially lowered the average AGB, although the range of predicted AGB (0-491.04 Mg/ha) is reasonable. Thus, how to eliminate data saturation and quantitatively determine the saturation level for NSFs still needs further investigation.

Limitations and Recommendations
Retrieving AGB with high accuracy from remotely sensed data is not an easy task. Every procedure or factor can greatly influence the accuracy, including the data sources, feature extraction and selection, estimation models, and model evaluation. Although high accuracies of AGB estimation were yielded by the CNN and SG(CNN) models based on the combination of the optimal ALS and Landsat 8 features and all COLI2 (Feature 1 + 2 + 5), this study still has limitations. First, we only tested the features (COLI1 and COLI2) proposed by [48] and compared them with the direct combination of the original features that generated them for the AGB estimation of NSFs. A more effective approach than the COLIs for combining ALS data and Landsat 8 imagery may exist for NSFs. Thus, it is still valuable to propose novel features or explore other synergistic approaches based on multiple data sources for various forest types.
The second limitation is that the underestimation of high AGB values and the overestimation of low AGB values were not eliminated from the wall-to-wall prediction map, although the CNN model had good efficiency and high accuracy according to the model evaluation results. Data saturation might be responsible for this phenomenon and lead to a much lower average AGB estimate for the entire study area than the values reported in similar studies [57,69,104,105]. The high risk of overfitting inherent in data-driven models could be another reason for the large discrepancy between the model evaluation results and the final wall-to-wall prediction. Thus, the development of models with good generalizability for biomass estimation and the interpretation of the physical meaning of the models are strongly recommended for further research [17].
In addition, the model evaluation procedure based on leave-one-out cross-validation may be another contributor to the high accuracy of the CNN model on the reference data. Leave-one-out cross-validation is a special case of K-fold cross-validation in which the number of folds equals the number of records in the data set [106]. Since the evaluated model is applied once for each record, using all other records as the training set and the selected record as a single-item test set, it tends to yield higher accuracy due to overfitting than, for example, ten-fold cross-validation, which uses only 90% of the records to train the model. However, the quantitative effects of different cross-validation procedures on AGB estimation still need to be investigated. There can be a large difference between the accuracy obtained in the model evaluation procedure using reference data and the wall-to-wall prediction values. Thus, besides the traditional model evaluation procedure, we strongly suggest assessing the spatial distribution of AGB estimates based on the wall-to-wall prediction map and comparing the histogram of AGB estimates against existing data.
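The contrast between leave-one-out and ten-fold cross-validation is easy to make concrete: with the 195 plots of this study, leave-one-out produces 195 folds, each trained on 194 plots (about 99.5% of the data), whereas ten-fold produces 10 folds, each trained on roughly 90%. A minimal sketch:

```python
import numpy as np

def loo_splits(n):
    """Leave-one-out: K-fold CV with K equal to the number of records."""
    idx = np.arange(n)
    return [(np.delete(idx, i), idx[i:i + 1]) for i in range(n)]

def kfold_splits(n, k):
    """Plain K-fold CV: each record is held out exactly once."""
    idx = np.arange(n)
    folds = np.array_split(idx, k)
    return [(np.setdiff1d(idx, f), f) for f in folds]

n_plots = 195                        # sample plots in this study
loo = loo_splits(n_plots)            # 195 folds, each trained on 194 plots
tenfold = kfold_splits(n_plots, 10)  # 10 folds, each trained on ~90% of plots
train_frac_loo = len(loo[0][0]) / n_plots      # ~0.995
train_frac_ten = len(tenfold[0][0]) / n_plots  # ~0.90
```

The larger training fraction and single-record test sets are what make leave-one-out prone to optimistic accuracy estimates relative to ten-fold cross-validation.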
The AGB estimation in this study was based on an area-based approach (ABA), which develops models relating AGB to features derived from remotely sensed data at the plot level and applies the models over the whole study area [17]. The fixed plots of the continuous forest resources inventory obtained in 2016 had an area of 20 m × 30 m with a geolocation error of 5 m, while the pixel size of Landsat 8 is 30 m × 30 m. Thus, the geolocation mismatch between the remotely sensed data (i.e., Landsat 8 imagery) and the field measurements is another source of uncertainty in AGB estimation [107]. Fortunately, the large number of plots (i.e., 195) in this study could greatly reduce the geolocation errors according to [107]: the geolocation errors stabilize below 5 m with 20 measurement points and below 3 m with 50 measurement points. Another drawback of this study is the lack of an assessment of biomass uncertainty under the ABA. It is difficult for AGB estimation using the ABA to capture biomass uncertainties at different spatial scales [108]. In recent years, with the development of automatic individual tree crown delineation algorithms in precision forestry (e.g., [109,110]), AGB estimation based on the individual-tree approach (ITA) has received more and more attention, because field data are needed only for a sample of trees instead of a sample of plots or stands [17]. In addition, the ITA allows AGB estimation at the tree and plot levels and the propagation of errors in an up-scaling framework [108]. Thus, it is appealing and worthwhile in subsequent research to estimate AGB based on the ITA for large forest areas and to quantify its uncertainty from the tree level to the plot level and then the stand level in an up-scaling framework.

Conclusions
Accurate quantification of AGB plays a vital role in forest carbon sequestration in the context of climate change. In this study, we investigated the effects of different synergistic approaches of features and of ensemble learning algorithms on the AGB estimation of natural secondary forests of northeastern China based on ALS data and Landsat 8 OLI imagery. Combining active and passive data was conducive to improving the accuracy of AGB estimation. Unlike the previous study implemented in southeastern China [48], we found that COLI2 features were more effective than COLI1 features for AGB estimation of NSFs. In some cases, it may be more convenient and efficient to adopt the simple combination of the untransformed features (e.g., the optimal Landsat 8 features + BLV) than the novel features (i.e., COLI1 or COLI2), especially for the NSFs of northeastern China. The CNN model was much superior to multiple linear regression and the other classic machine learning algorithms (i.e., ELM, BP, RegT, RF, SVR, and KNN) regardless of feature set, and reached its highest accuracy (R² = 0.99, RMSE = 6.85, rRMSE = 0.04, MAE = 2.95, MAPE = 1.02, PM = 0.03) when the optimal ALS and Landsat 8 features and all COLI2 (Feature 1 + 2 + 5) were applied. The ensemble learning algorithms (SG(RF), SG(SVR), SG(KNN), and SG(CNN)), which took advantage of the good and stable predictions of the base models (i.e., RF, SVR, KNN, and CNN), greatly improved the accuracy of AGB estimation and had stronger generalization ability than their corresponding base models. Ensemble learning is exceedingly adept at turning weak learners into strong learners, especially when a heterogeneous ensemble strategy is applied. The SG model with the CNN meta-model performed best (R² = 0.99, RMSE = 2.02, rRMSE = 0.01, MAE = 0.87, MAPE = 0.73, PM = 0.02) with the feature combination of the optimal ALS and Landsat 8 features and all COLI2 (Feature 1 + 2 + 5).
However, considering both efficiency (i.e., runtime) and accuracy, the wall-to-wall AGB prediction map of Maoershan was generated using the CNN model and Feature 1 + 2 + 5 instead of the SG(CNN) model. The mean and standard deviation of the estimated AGB of the Maoershan Experimental Forest Farm in 2015 were 59.9 Mg/ha and 48.69 Mg/ha, respectively, with a range of 0 to 491.04 Mg/ha. The lower average value compared with similar studies for northeastern China may result from the different study areas, data saturation, overfitting of the algorithms, and leave-one-out cross-validation. Estimating data saturation, developing advanced algorithms, understanding the effects of different cross-validation procedures, and quantifying the sources of error remain fundamental and significant for AGB estimation at all levels.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.