Estimation of Forest Above-Ground Biomass by Geographically Weighted Regression and Machine Learning with Sentinel Imagery

Accurate forest above-ground biomass (AGB) is crucial for sustaining forest management and mitigating climate change to support REDD+ (reducing emissions from deforestation and forest degradation, plus the sustainable management of forests, and the conservation and enhancement of forest carbon stocks) processes. Recently launched Sentinel imagery offers a new opportunity for forest AGB mapping and monitoring. In this study, texture characteristics and backscatter coefficients of Sentinel-1, in addition to multispectral bands, vegetation indices, and biophysical variables of Sentinal-2, based on 56 measured AGB samples in the center of the Changbai Mountains, China, were used to develop biomass prediction models through geographically weighted regression (GWR) and machine learning (ML) algorithms, such as the artificial neural network (ANN), support vector machine for regression (SVR), and random forest (RF). The results showed that texture characteristics and vegetation biophysical variables were the most important predictors. SVR was the best method for predicting and mapping the patterns of AGB in the study site with limited samples, whose mean error, mean absolute error, root mean square error, and correlation coefficient were 4 × 10−3, 0.07, 0.08 Mg·ha−1, and 1, respectively. Predicted values of AGB from four models ranged from 11.80 to 324.12 Mg·ha−1, and those for broadleaved deciduous forests were the most accurate, while those for AGB above 160 Mg·ha−1 were the least accurate. The study demonstrated encouraging results in forest AGB mapping of the normal vegetated area using the freely accessible and high-resolution Sentinel imagery, based on ML techniques.


Introduction
As the largest carbon sinks on land, forest ecosystems account for about 80% of terrestrial biosphere carbon storage, and play a pivotal role in mitigating climate change [1,2].Above-ground biomass (AGB), accounting for between 70% and 90% of total forests biomass [3], is one of the important carbon pools in forest ecosystems, and it is a key indicator of forest vegetal health, as well as related seral stages [4,5].The spatially explicit measurement of forests' AGB also supports REDD+ (reducing emissions from deforestation and forest degradation, plus the sustainable management of forests, and the conservation and enhancement of forest carbon stocks) processes [6].Therefore, the rapid and accurate estimation and monitoring of AGB over various scales of space and time are crucial for greatly reducing the uncertainty in carbon stock assessments, and for informing strategic forest management plans [7][8][9].Traditional field-based measurements provide the most accurate AGB values, but they are destructive and spatially limited [10,11].Uncertainty and bias in field measurements obviously exist, particularly those with large trees and tropical issues [4,5].Combining remote sensing and sample plot data has become a popular method to generate spatially explicit estimations of forest AGB [12,13].Various types of remote-sensing data are used for forest biomass estimation such as optical sensor data, radio detection and ranging (radar) data, and light detection and ranging (LiDAR) data, with each one having certain advantages over the others [14,15].Optical sensors were first applied to retrieve the horizontal forest structure and AGB assessments through field sampling, due to their aggregate spectral signatures (reflectance or vegetation indices) with global coverage, repetitiveness, and cost-effectiveness [16,17].Optical remote sensing data from a number of platforms, such as IKONOS, Quickbird, Worldview, ZY-3, systeme probatoire d'observation de la terre (SPOT), Sentinel, Landsat, and moderate-resolution imaging spectroradiometer (MODIS), with spatial resolutions varying from less than one meter to hundreds of meters, have been used by numerous researchers for biomass estimation [18][19][20].However, the widespread usage of optical data is limited by its poor penetration capacity through clouds and forest canopies, as well as data saturation problems [21].Radar data, available internationally from airborne or space-borne systems with different frequency bands, polarizations, and variable imaging geometries, such as Terra-SAR (Terra-Synthetic Aperture Radar), advanced land observing satellite phased array L-band synthetic aperture radar (ALOS PALSAR), and Sentinel, have gained prominence for AGB estimation because of their better penetration ability and detailed vegetation structural information, but these still suffer from signal saturation problems [6,14,22].LiDAR, an active remote-sensing technology, captures forests' vertical structures in great detail and provides 3D information, such as the geoscience laser altimeter system (GLAS), which has found favor in biomass estimation with an improved accuracy, but with complex data-processing, and the lack of space continuity problems [23,24].Additionally, some research using 3-D terrestrial LiDAR has shown bias in biomass, especially from tropical species, mainly because of the underestimation of tree height [25,26].In other words, the accuracy of forest AGB estimates could be improved by a combined use of multi-source remote sensing data.The above-mentioned Sentinel satellite constellation series, including the Sentinel-1 C-band Synthetic Aperture Radar (SAR) and the Sentinel-2 multispectral instrument by the European Space Agency (ESA), provide new capabilities for AGB mapping with wide coverage, a short return cycle, and a long lifespan as the same data format [27][28][29].In other words, the Sentinel series may have a high synergetic potential for overcoming the limitation of single remote sensing techniques for forest AGB estimation.The Sentinel imagery have been applied in a number of previous vegetation studies, focusing on classification [30][31][32], vegetation parameters on agricultural fields [33][34][35], grassland [36][37][38], and forests [39,40], as well as the damage extent of disasters [41][42][43], while forest AGB mapping based on Sentinel imagery is still insufficient.
The techniques for estimating forests' AGB based on remote sensing data have allowed for 'scale-up' or extrapolation of the field data collected for larger scales [8,44].It is a predictive mapping process for an estimation of the value at a location without direct observation.It depends on the values of points at nearby sites where observations were made, and/or values of other factors at the sites, through various methods.Those methods can be divided into two categories: parametric and non-parametric algorithms [45,46].The former refers to statistical regression methods, such as the stepwise regression models (SWR) and geographically weighted regression (GWR), by which the expression relating to the dependent variable (AGB) and the independent variables is easy to calculate [2].However, there is no simple global linear relationship like the SWR model, between remote sensing data and forest AGB, because it is affected by many factors.The GWR method explores spatial heterogeneity, as well as the non-stationarity of relationships, and it estimates the parameters for each sample location, which makes it a very attractive tool for remotely-sensed biomass modeling [47,48].Non-parametric techniques, including machine learning (ML) methods such as the k-nearest neighbor (KNN), artificial neural network (ANN), support vector machine for regression (SVR), and random forest (RF), have a better ability for identifying complex relationships between predictors and the forest AGB [2,49].Despite a variety of forest AGB models, quite a few research studies merely focused on one parametric or non-parametric model are unpersuasive.Thus, a systematic comparison of GWR, ANN, SVR, and RF for mapping forest AGB based on Sentinel imagery is fairly urgent, but also rare in the literature.
In this study, the ability of Sentinel-1 and Sentinel-2 imagery for the retrieval and predictive mapping of forests' AGB estimation was evaluated.The specific objectives included the following: (1) to determine and model the relationship between field-measured forests AGB and Sentinel-based predictors, including Sentinel-1 SAR backscatter information and Sentinel-2 multispectral indices based on GWR and ML; (2) to evaluate and compare the accuracy of the biomass prediction models, including GWR, ANN, SVR, and RF models; and (3) to map forest AGB spatial distribution by four optimal models.The novelty of this paper is the use of Sentinel-1 (texture characteristics and backscatter coefficients) and Sentinel-2 (multispectral bands, vegetation indices, and biophysical variables) imagery in the mapping of forest AGB and AGB model development, as well as their comparison.This study attempted to contribute to the development of remote sensing-based predictive mapping techniques for forest AGB using freely accessible multi-source remote sensing data with a relatively high spatial resolution.

Study Site
This study was conducted in a sample area (42  20 E), which spanned over 2500 km 2 .It is located in the southeast region of Jilin Province, northeast China, in the center of the Changbai Mountains (Figure 1).The study area consists of five towns: Yanjiang and Lushuihe towns of the Fusong County of Baishan City; and Yongqing, Liangjiang, and Erdaobaihe towns of the Antu County of the Yanbian Korean Autonomous Prefecture.This area has a northern temperate continental monsoon climate, with an annual average temperature of 2.8 • C and an annual precipitation of 8000 mm [50,51].Characterized by high forest cover, the spatial distribution of forest types in the study area obtains obvious vertical zonality, with Pinus koraiensis Sieb.et Zucc., Larix gmelinii var.japonica, Betula platyphylla Suk., Fraxinus mandschurica Rupr., and Juglans mandshurica Maxim.as typical tree species [51].
Forests 2018, 9, x FOR PEER REVIEW 3 of 20 regression (SVR), and random forest (RF), have a better ability for identifying complex relationships between predictors and the forest AGB [2,49].Despite a variety of forest AGB models, quite a few research studies merely focused on one parametric or non-parametric model are unpersuasive.Thus, a systematic comparison of GWR, ANN, SVR, and RF for mapping forest AGB based on Sentinel imagery is fairly urgent, but also rare in the literature.In this study, the ability of Sentinel-1 and Sentinel-2 imagery for the retrieval and predictive mapping of forests' AGB estimation was evaluated.The specific objectives included the following: (1) to determine and model the relationship between field-measured forests AGB and Sentinel-based predictors, including Sentinel-1 SAR backscatter information and Sentinel-2 multispectral indices based on GWR and ML; (2) to evaluate and compare the accuracy of the biomass prediction models, including GWR, ANN, SVR, and RF models; and (3) to map forest AGB spatial distribution by four optimal models.The novelty of this paper is the use of Sentinel-1 (texture characteristics and backscatter coefficients) and Sentinel-2 (multispectral bands, vegetation indices, and biophysical variables) imagery in the mapping of forest AGB and AGB model development, as well as their comparison.This study attempted to contribute to the development of remote sensing-based predictive mapping techniques for forest AGB using freely accessible multi-source remote sensing data with a relatively high spatial resolution.

Study Site
This study was conducted in a sample area (42°17′-42°49′ N, 127°35′-128°20′ E), which spanned over 2500 km 2 .It is located in the southeast region of Jilin Province, northeast China, in the center of the Changbai Mountains (Figure 1).The study area consists of five towns: Yanjiang and Lushuihe towns of the Fusong County of Baishan City; and Yongqing, Liangjiang, and Erdaobaihe towns of the Antu County of the Yanbian Korean Autonomous Prefecture.This area has a northern temperate continental monsoon climate, with an annual average temperature of 2.8 °C and an annual precipitation of 8000 mm [50,51].Characterized by high forest cover, the spatial distribution of forest types in the study area obtains obvious vertical zonality, with Pinus koraiensis Sieb.et Zucc., Larix gmelinii var.japonica, Betula platyphylla Suk., Fraxinus mandschurica Rupr., and Juglans mandshurica Maxim.as typical tree species [51].

Field Observations
The field campaign was conducted from July to August, 2017.Before that, the distribution of sampling plots was generated using ArcGIS (version 10.0, ESRI, RedLands, CA, USA), with the non-forest area being masked out.Non-forested areas were derived from 2015 land use and a land cover map [52] by visual interpretation and manual modification based on Sentinel-2 images (Table 1).In the field, nearby preset plots, a total of 56 sampling quadrats measuring 10 m × 10 m were laid out at each representative sampling plot (Figure 1), including two evergreen coniferous forests, 30 broadleaved deciduous forests, five deciduous coniferous forests, and 19 mixed broadleaf-conifer forests.
The sampling equipment were the diameter at breast height ruler for measuring the diameter at 1.3 m from the ground and the laser altimeter (TruPulse200, Laser Technology Inc., Norristown, PA, USA) for the tree height measurement.Based on allometric equations (Table 2), some AGB are the sum of tree trunks, branches, and leaf biomass, and others are directly calculated by diameters at breast height (cm) and tree height (m).The AGB of 56 samples ranged from 14.64~317.40Mg•ha −1 , with mean and median values of 80.44 and 66.00 Mg•ha −1 , respectively.
AGB, T, B, L, D, and H represent above-ground biomass, tree trunk biomass, branch biomass, leaf biomass, diameter at breast height, and tree height, respectively.

Satellite Data Pre-Processing and Predictors Derived
Sentinel-1 Synthetic Aperture Radar (SAR) and cloud-free Sentinel-2 multispectral imagery from the European Space Agency used in this study (Table 1) were downloaded from the agency's Copernicus Sentinel Scientific Data Hub (https://scihub.copernicus.eu/dhus/#/home).Sentinel-1 C-band images adopted in this study were collected in the interferometric wide swath mode of the VH (vertical transmit-Horizontal receive) and VV (vertical transmit-vertical receive) polarizations.With a pixel size of 10 m, the SAR images are at a high-resolution (HR) Level-1 ground range detected (GRD) processing level.The Sentinel-2 Level 1C data involved were top-of-atmosphere reflectance images, and they were processed for orthorectification and spatial registration.The imagery had 13 spectral bands in the visible, near infrared, and short-wave infrared regions, and had 10 m (band 2-4, 8), 20 m (band 5-7, 8a, 11-12), and 60 m (band1, 9-10) spatial resolutions.In addition, elevation data (30 m) Forests 2018, 9, 582 5 of 20 from the Shuttle Radar Topography Mission (SRTM) product was acquired from the United States Geological Service's Earth Explorer (https://earthexplorer.usgs.gov/)for inclusion in the analysis of the Sentinel imagery [55].
The processing steps used in the study are summarized in Figure 2. Sentinel application platform (SNAP) software (version 6.0, European Space Agency) was used to pre-process Sentinel-1 and Sentinel-2 imagery.The steps for the SAR imagery based on the Sentinel-1 Toolbox consisted of image calibration, speckle reduction using the Refined Lee Filter, and terrain correction using the Range-Doppler to acquire an accurate radar intensity backscatter coefficient with a map projection [56,57].Multispectral imagery was atmospherically corrected and processed by the radiative transfer model-based Sen2Cor atmospheric correction processor (version 2.5.5,European Space Agency) to a Level-2A product, a bottom-of-atmosphere-corrected reflectance image.The pre-processed Sentinel images, as well as the elevation data, were brought into a common map projection, universal transverse mercator (UTM) Zone 52 WGS84, and resampled to 10 m pixel sizes.Subsetting and mosaicking were done to cover the study area.for inclusion in the analysis of the Sentinel imagery [55].
The processing steps used in the study are summarized in Figure 2. Sentinel application platform (SNAP) software (version 6.0, European Space Agency) was used to pre-process Sentinel-1 and Sentinel-2 imagery.The steps for the SAR imagery based on the Sentinel-1 Toolbox consisted of image calibration, speckle reduction using the Refined Lee Filter, and terrain correction using the Range-Doppler to acquire an accurate radar intensity backscatter coefficient with a map projection [56,57].Multispectral imagery was atmospherically corrected and processed by the radiative transfer model-based Sen2Cor atmospheric correction processor (version 2.5.5,European Space Agency) to a Level-2A product, a bottom-of-atmosphere-corrected reflectance image.The pre-processed Sentinel images, as well as the elevation data, were brought into a common map projection, universal transverse mercator (UTM) Zone 52 WGS84, and resampled to 10 m pixel sizes.Subsetting and mosaicking were done to cover the study area.In this study, 44 predictors were selected and extracted according to previous research [55,58].Shown in Table 3, 23 predictors were from Sentinel-1 and 18 variables were from Sentinel-2, as well In this study, 44 predictors were selected and extracted according to previous research [55,58].Shown in Table 3, 23 predictors were from Sentinel-1 and 18 variables were from Sentinel-2, as well as Forests 2018, 9, 582 6 of 20 elevation (H) from SRTM.Additionally, the other two variables were longitude and latitude.The first and second part derived from the Sentinel-1 imagery consisted of relating the field AGB with Sentinel SAR polarization channels, and their calculation and texture characteristics.The third to fifth parts based on the Sentinel-2 images proceeded with relating the field AGB to the multispectral bands, vegetation indices, and biophysical variables.The biophysical variables were also calculated in SNAP from its biophysical processor, which uses a neural network algorithm based on the PROSPECT+SAIL (PROSAIL) radiative transfer model [59].Except for serving as a predictor, the elevation data from SRTM digital elevation model (DEM) were supplemented with Sentinel imagery processing to improve the accuracy (Figure 2).  4  (Band 5 − Band 4)/(Band 5 + Band 4) IRECI 5  (Band 7 − Band 4)/(Band 5/Band 6) TNDVI 6  [(Band 8 − Band 4)/(Band 8 + Band 4) + 0.5]

Modeling the Relationship between Field AGB and Satellite Data
Firstly, the pairwise Pearson's product-moment correlation analysis was conducted to determine the correlation of observed above-ground biomass and Sentinel-based predictors, as well as the collinearity between predictors.Predictors that were highly correlated to each other (r ≥ 0.8), and had high variance inflation factors (VIFs ≥ 10) in regression analysis, were excluded from modeling [65,66].These analysis steps were performed using SPSS (version 21.0, IBM, Armonk, NY, USA).Then, all of the explanatory variables were transformed to a Z-score to eliminate the effect of index dimension and quantity of data.The formula was defined as x* = (x − µ)/σ, where µ is the mean value of a specific explanatory variable and σ is its standard deviation [67].After that, GWR, ANN, SVR, and RF were used in this study to model ABG based on Sentinel-derived predictors (Figure 2).

Geographically Weighted Regression
Originally proposed by Brunsdon et al. (1998), GWR is a powerful approach for modeling spatially heterogeneous processes at a local scale [68][69][70].It estimates the individual parameters for each estimation location, and the parameter estimation at any location obeys the distance decay.In other words, the closer to the location of an observation, the greater the weight that is allotted for the observation.The GWR form is regularly expressed as [47]: where ∧ y i is the dependent variable value of observation i considered in the parameter estimation at the location (u i , v i ); β 0 (u i , v i ) is the intercept; β k (u i , v i ) is the coefficient of k explanatory variables, indicating a parameter estimate that explains the relationship around location (u i , v i ), which varies with the location; x* ik represents the independent variables of observation i; p is the total number of explanatory variables; and ε i is the error term that is generally assumed to be explanatory and normally distributed with zero mean and constant variance.The parameter estimator β k (u i , v i ) is identical to the weighted least squares regression, where the weights are computed based on the distance between the observations [48].The parameters are estimated as [71]: where X is the matrix formed by x* ik ; Y is the vector formed by values of the dependent variables; W(u i , v i ) is a weight matrix to ensure that those observations near the point i have more influence on the results than those farther away; and the weights are calculated based on a kernel weighting scheme such as fixed Gaussian, fixed bi-square, adaptive bi-square, and adaptive Gaussian [72].In this study, we used the fixed Gaussian kernel as [73]: where d i (u i , v i ) is the distance between the observation i and the location (u i , v i ); and h is a quantity called bandwidth, which controls the effect of the distance on the weight value.GWR was conducted using GWR (version 4.0, Ritsumeikan University, Kyoto, Japan), by which the weight function (a geographic kernel type) and the minimum value of the corrected Akaike information criterion (AICc, small sample bias corrected AIC) are determined to find the optimal bandwidth by a golden section search [74].

Machine Learning Methods
The machine learning methods adopted in this study were modeled in WEKA software (version 3.8, The University of Waikato, Hamilton, New Zealand).The models defined the best parameters with the highest correlation coefficient (r) and the lowest root mean square error (RMSE) for the prediction of AGB, and then AGB mapping was implemented in ArcGIS.

Multi-Layer Perception Neural Network
As a nonparametric mathematical model, ANN is inspired by biological neural networks and it has strong abilities for linear and nonlinear fitting [75,76].The ANN considered in this study was the multi-layer perception neural network (MLPNN).The architecture of the MLPNN consists of an input layer containing predictors, one or more hidden layers, and an output layer containing the response variable, along with interconnection weights characterizing the connection strength between these layers (Figure 3).The algorithm chosen was the back-propagation (BP) learning rule, an iterative gradient descent algorithm that was designed to minimize the mean square error between the desired target and the actual output vectors [77,78].The initial weights were assigned randomly, and when developing the network, the interconnection weights were adjusted to minimize the prediction error [79,80].
multi-layer perception neural network (MLPNN).The architecture of the MLPNN consists of an input layer containing predictors, one or more hidden layers, and an output layer containing the response variable, along with interconnection weights characterizing the connection strength between these layers (Figure 3).The algorithm chosen was the back-propagation (BP) learning rule, an iterative gradient descent algorithm that was designed to minimize the mean square error between the desired target and the actual output vectors [77,78].The initial weights were assigned randomly, and when developing the network, the interconnection weights were adjusted to minimize the prediction error [79,80].

Support Vector Machines for Regression
SVR is a regression version of support vector machines that project the training dataset from a lower dimensional space into a higher dimensional feature space using kernel functions to separate groups of input data in a linearized manner, based on the Vapnik-Chervonenkis (VC) dimension theory and structural risk minimization [81][82][83].An SVR function for AGB estimation is defined as [84]: where x is a vector of the input predictors; K(xk, xj) is a kernel function; b is a constant threshold; and αk and * k α are the weights (Lagrange multipliers), with the constraints given in Equation ( 5): where C is the regularization parameter for balancing between the training error and model complexity.The sequential minimal optimization (SMO) algorithm was used to solve the quadratic programming optimization problem step-by-step and to update Equation (4) to reflect the new values until the Lagrange multipliers converged [66,85].
Among the various kernel functions, the radial basis function (RBF) shows a superior performance and robust results [86,87], and was used in this study [88]:

Support Vector Machines for Regression
SVR is a regression version of support vector machines that project the training dataset from a lower dimensional space into a higher dimensional feature space using kernel functions to separate groups of input data in a linearized manner, based on the Vapnik-Chervonenkis (VC) dimension theory and structural risk minimization [81][82][83].An SVR function for AGB estimation is defined as [84]: where x is a vector of the input predictors; K(x k , x j ) is a kernel function; b is a constant threshold; and α k and α * k are the weights (Lagrange multipliers), with the constraints given in Equation ( 5): where C is the regularization parameter for balancing between the training error and model complexity.
The sequential minimal optimization (SMO) algorithm was used to solve the quadratic programming optimization problem step-by-step and to update Equation ( 4) to reflect the new values until the Lagrange multipliers converged [66,85].Among the various kernel functions, the radial basis function (RBF) shows a superior performance and robust results [86,87], and was used in this study [88]: where σ is a scale parameter chosen based on the training data, and a unit vector could be concatenated with kernels as the intercept.In a word, the training of the SVR model required finding the best values for the two meta-parameters: the regularization parameter (C) and the kernel width (σ).

Random Forests
As a classification and regression tree (CART) technique proposed by Breiman (2001), RF combines bagging [89,90] with random variable selections at each node [91] to iteratively generate a large group of CARTs.The classification output represents a majority vote (classification), or an average (regression) from the whole ensemble, and hence achieves a more robust model than a single classification tree that is produced by a single model run [89,92,93].A number of decision trees in RF choose their best splitting attributes from a random subset of predictors at each internal node without pruning.Based on the bootstrap sampling procedure, RF ensures at the same time the smallest obtainable bias and very low data variance [94].There are two main important parameters: numFeatures, which means the number of features for splitting the nodes, whose default value is int[ln(numbers of predictors) + 1] in WEKA; and numIterations, which means the number of trees to be optimized in the modeling process, depending on specific application objectives [95].

Evaluation of ABG Models
Based on measured AGB samples, the mean error (ME), mean absolute error (MAE), and RMSE as defined by Isaaks and Srivastava (1989), with r between the measured and estimated ABG, were used to evaluate the performances of different interpolation methods [55,96].

Statistics Analysis
The poor correlation of the observed AGB and the predictors was acquired from the low value of r (−0.288-0.263).Among predictors, VV_MAX (r = −0.288),VV_ENE (r = −0.284),VV_HOM (r = −0.277),VV_ASM (r = −0.276),and VV_ENT (r = 0.263) were significantly correlated (p-values < 0.05) with AGB.Predictors from the texture characteristics of Sentinel-1 VV polarization were significantly associated with AGB, as similarly found by Pan and Sun (2018) [58].Those r values represent an average global correlation; thus, it also indicated that the global linear regression was inappropriate for AGB modeling in this study.

Models of GWR and ML
The optimal bandwidth for GWR in this study was 0.023, with the minimum value of AICc being 694.91, and the adjusted R 2 of this GWR model being 0.79.For a given environmental variable, its coefficient from GWR varied across the study area.The top five predictors for the mean magnitude of the coefficients were VV_MEA (16.5, negative), VV_VAR (16.1, positive), LAI (1.6, positive), Cab (0.9, positive), and FVC (Fraction of Vegetation Cover) (0.8, positive).Predictors from the texture characteristics of Sentinel-1 VV polarization and the vegetation biophysical variables of Sentinel-2 showed a relatively strong association with AGB at a local regression.
As for the MLPNN model, the accuracy for various numbers of neurons in the hidden layer is shown in Figure 4.The results revealed that the optimized MLPNN architecture was 40-10-1, indicating that there were 40 input nodes in the input layer, 10 nodes in the hidden layer with the unipolar sigmoid as the transfer function, and one node in the output layer.Using the Levenberg-Marquardt learning algorithm, the best learning rate, momentum, and training time (iterations) obtained were determined to be 0.2, 0.3, and 500, respectively.In the SVR model, the best parameters C and σ obtained were 1 and 2, respectively.With iteration (trees) numbers of 50 and feature numbers of 5, the top five predictors for attribute importance in the selected RF model were VV_CON, VH_HOM, VH_DIS, VH_ASM, and VH_ENT.The texture characteristics of Sentinel-1 VV and VH polarizations were relatively vital for modeling AGB by RF in this study.(iterations) obtained were determined to be 0.2, 0.3, and 500, respectively.In the SVR model, the best parameters C and σ obtained were 1 and 2, respectively.With iteration (trees) numbers of 50 and feature numbers of 5, the top five predictors for attribute importance in the selected RF model were VV_CON, VH_HOM, VH_DIS, VH_ASM, and VH_ENT.The texture characteristics of Sentinel-1 VV and VH polarizations were relatively vital for modeling AGB by RF in this study.

Models Assessment by Evaluation Indices
Table 4 presents the accuracies of four models for estimating the AGB of 56 forest quadrats.All four models overestimated the AGB.The ANN model resulted in an ME of 0.84 Mg•ha −1 , and had the highest tendency for overestimation; the SVR model with an ME of 0.004 Mg•ha −1 showed the lowest tendency for overestimation.The MAE, indicating the extent to which the process leads to error, was lower with SVR (0.07 Mg•ha −1 ), and higher with the other methods, ranging from 1.21 (ANN) to 4.01 Mg•ha −1 (GWR).The values of the RMSE suggested that ML methods, whose RMSE ranged from 0.08 (SVR) to 4.43 (RF) Mg•ha −1 , produced less error than GWR.A better consistency between the measured AGB and the estimated one was discovered by the r values from the ML models (0.999 of RF to 1 of SVR and ANN) compared to the GWR model (0.995).The SVR model gave rise to the lowest RMSE and the closest-to-zero ME and MAE values, as well as the highest r value.Hence, in this study, the SVR model was the most accurate model for estimating AGB.Besides, the accuracy ranking of the four methods from high to low was SVR, ANN, RF, and GWR.To further analyze the modeling accuracy, the estimated values of AGB were plotted against the measured values (Figure 5).An estimation from the SVR model showed the best agreement along the 1:1 line, followed by those from ANN, GWR, and RF.

Models Assessment by Evaluation Indices
Table 4 presents the accuracies of four models for estimating the AGB of 56 forest quadrats.All four models overestimated the AGB.The ANN model resulted in an ME of 0.84 Mg•ha −1 , and had the highest tendency for overestimation; the SVR model with an ME of 0.004 Mg•ha −1 showed the lowest tendency for overestimation.The MAE, indicating the extent to which the process leads to error, was lower with SVR (0.07 Mg•ha −1 ), and higher with the other methods, ranging from 1.21 (ANN) to 4.01 Mg•ha −1 (GWR).The values of the RMSE suggested that ML methods, whose RMSE ranged from 0.08 (SVR) to 4.43 (RF) Mg•ha −1 , produced less error than GWR.A better consistency between the measured AGB and the estimated one was discovered by the r values from the ML models (0.999 of RF to 1 of SVR and ANN) compared to the GWR model (0.995).The SVR model gave rise to the lowest RMSE and the closest-to-zero ME and MAE values, as well as the highest r value.Hence, in this study, the SVR model was the most accurate model for estimating AGB.Besides, the accuracy ranking of the four methods from high to low was SVR, ANN, RF, and GWR.To further analyze the modeling accuracy, the estimated values of AGB were plotted against the measured values (Figure 5).An estimation from the SVR model showed the best agreement along the 1:1 line, followed by those from ANN, GWR, and RF.

Mapping of Four AGB Models
The predicted values of AGB from the four models ranged from 11.80 to 324.12 Mg•ha −1 .For a better comparison, the values were divided into five levels by equal intervals of 62.46 Mg•ha −1 (Figure 6).Maps show the various spatial distributions of AGB.All of the six maps show that the western part of Lushuihe town was a high AGB region, with values ranging from 199.19 to 324.12 Mg•ha −1 , while low AGB regions were located near the highway connecting Lushuihe and Yangjiang towns, with values ranging from 11.80 to 136.72 Mg•ha −1 .The map resulting from ANN was characterized by more explicit spatial variation than the others.Comparing the estimated and measured AGB (14.64~317.40Mg•ha −1 , mainly ranging from 11.80 to 136.72 Mg•ha −1 with 87.5%) resulted in the following performance ranking for the four algorithms from strong to weak: SVR, ANN, RF, and GWR, meaning that ML performed better than GWR.These maps can guide resource allocation for carbon sequestration and forest management.The evaluation of the forest AGB mapping results by the four models was insufficient for this study as we were limited by the sample size.Future verification work should be conducted; this is conventionally done by independent sample sets or by acknowledged high-accuracy results such as airborne data, especially unmanned aerial vehicle LiDAR data [17,97].

Mapping of Four AGB Models
The predicted values of AGB from the four models ranged from 11.80 to 324.12 Mg•ha −1 .For a better comparison, the values were divided into five levels by equal intervals of 62.46 Mg•ha −1 (Figure 6).Maps show the various spatial distributions of AGB.All of the six maps show that the western part of Lushuihe town was a high AGB region, with values ranging from 199.19 to 324.12 Mg•ha −1 , while low AGB regions were located near the highway connecting Lushuihe and Yangjiang towns, with values ranging from 11.80 to 136.72 Mg•ha −1 .The map resulting from ANN was characterized by more explicit spatial variation than the others.Comparing the estimated and measured AGB (14.64~317.40Mg•ha −1 , mainly ranging from 11.80 to 136.72 Mg•ha −1 with 87.5%) resulted in the following performance ranking for the four algorithms from strong to weak: SVR, ANN, RF, and GWR, meaning that ML performed better than GWR.These maps can guide resource allocation for carbon sequestration and forest management.The evaluation of the forest AGB mapping results by the four models was insufficient for this study as we were limited by the sample size.Future verification work should be conducted; this is conventionally done by independent sample sets or by acknowledged high-accuracy results such as airborne data, especially unmanned aerial vehicle LiDAR data [17,97].

Sentinel-Derived Predictors
By comparing the results of the correlation analysis, the coefficients from GWR, and the attribute importance from RF, it was indicated that texture characteristics of Sentinel-1 had great potential for estimating AGB, which was also shown in previous studies [98,99].Additionally, it was a pioneering finding that the vegetation biophysical variables of Sentinel-2 were very helpful for AGB estimation using a local regression, which was found previously by non-parametric prediction [55].The backscatter coefficient of Sentinel-1 and the vegetation indices of Sentinel-2 were useful and common predictors, as confirmed by other researchers [55,[99][100][101], but their roles were assisted and not apparent for forest AGB mapping in this study.This may have resulted from a mixture of forest types in the study area, while previous studies mainly aimed at a certain type of forest, or modeling by forests types.Besides, the texture characteristics and backscatter coefficients of Sentinel-1, and the multispectral bands, vegetation indices, and biophysical variables of Sentinel-2, were first applied simultaneously in AGB modeling, so that the texture characteristics and biophysical variables were

Sentinel-Derived Predictors
By comparing the results of the correlation analysis, the coefficients from GWR, and the attribute importance from RF, it was indicated that texture characteristics of Sentinel-1 had great potential for estimating AGB, which was also shown in previous studies [98,99].Additionally, it was a pioneering finding that the vegetation biophysical variables of Sentinel-2 were very helpful for AGB estimation using a local regression, which was found previously by non-parametric prediction [55].The backscatter coefficient of Sentinel-1 and the vegetation indices of Sentinel-2 were useful and common predictors, as confirmed by other researchers [55,[99][100][101], but their roles were assisted and not apparent for forest AGB mapping in this study.This may have resulted from a mixture of forest types in the study area, while previous studies mainly aimed at a certain type of forest, or modeling by forests types.Besides, the texture characteristics and backscatter coefficients of Sentinel-1, and the multispectral bands, vegetation indices, and biophysical variables of Sentinel-2, were first applied simultaneously in AGB modeling, so that the texture characteristics and biophysical variables were outstanding compared to the other kinds of Sentinel-based predictors in this study.In a word, this study dug out vital and new information from the Sentinel series about forest AGB estimation.

The Comparison of Models
Similarly, previous researchers have used these four methods to estimate forest AGB and achieve good accuracies, while results of the models' comparison vary compared with this study.Cao et al. (2018), integrating airborne LiDAR and optical data, compared the accuracies of forest AGB models in the upper Heihe River Basin in northwest China, and found that RF was the best (R 2 = 0.9, RMSE = 13.4Mg•ha −1 ), following by ANN and SVR [17].Based on Landsat satellite imagery, Wu et al. (2016) implemented the optimal spatial forest AGB estimation in northwestern Zhejiang Province, China, and RF (R 2 = 0.6, RMSE = 26.4Mg•ha −1 ) also performed better than SVR [11].Liu et al. (2017) developed forest AGB models using GLAS and Landsat data, and also found that RF (R 2 = 0.95, RMSE = 17.73 Mg•ha −1 ) had a better estimation than SVR [2].Gao et al. (2018) concluded that ANN performed (RMSE = 27.6 Mg•ha −1 ) better than SVR and RF, by conducting a comparative analysis of algorithms for forest AGB estimation with ALOS PALSAR and Landsat data [49].In a word, ANN and SVR models all showed a close performance for forest AGB modeling in this study, and in previous research.The RF models generally obtained the highest accuracies among the three ML methods, while SVR showed the best performance in this study.This may be due to the smaller sample sizes in this study, and the uniform random distribution of samples in the study area.It also highlighted the powerful capacity of SVR for limited samples.Additionally, due to direct evaluation and the accuracy of the training data, rather than the independent validation set or by cross-validation from limited samples, the models of this study were obviously much more accurate than in previous research.Because the GWR and ML models have not been compared in any previous research, this finding can provide a reference for mapping forest AGB in the future.

Model Evaluation by Forest Types and Measured AGB
The mean errors of AGB prediction using the four models for different forest types and the measured values of AGB were also calculated to analyze the prediction accuracy of each forest type, and the data saturation in the Sentinel data was discussed (Figure 7).Generally, the estimated AGB values of the four forest types of 56 quadrats by the four models were all higher than the measured values (Figure 7a).Among that, the AGB estimation of the deciduous coniferous forests obtained the maximum error of 0.7 Mg•ha −1 , and all models, excluding the GWR with ME values of −0.6 Mg•ha −1 , performed the worst for deciduous coniferous forests compared to the other three forest types, with ME values ranging from 0.04 (SVR) to 1.8 (RF) Mg•ha −1 .The AGB estimation of mixed broadleaf-conifer forests had the second-large error, with 0.6 Mg•ha −1 , followed by that of broadleaved deciduous forests (0.2 Mg•ha −1 ) and evergreen coniferous forests (0.02 Mg•ha −1 ).As for the four models, the GWR performed best in the AGB estimation of mixed broadleaf-conifer forests, and the worst for evergreen coniferous forests.The ANN, SVR, and RF all gave the most accurate assessments for broadleaved deciduous forests.The ANN showed the least precise assessments for evergreen coniferous forests, but SVR and RF gave the worst for deciduous coniferous forests.In a word, the AGB estimation of broadleaved deciduous forests was the most accurate and stable in the study area using the four models based on Sentinel imagery.Among the five levels of AGB values, the last level with AGB above 160 Mg•ha −1 (from 160 to 320 Mg•ha −1 ) had the least accuracy and the most fluctuated errors, based on Sentinel imagery, while AGB from 40 to 120 Mg•ha −1 obtained relatively higher accuracies (Figure 7b).The saturation level shown in this study was much higher than other studies (at around 60-70 Mg•ha −1 ), using SAR C band data [102].This could be attributed to the integration of abundant predictors from Sentinel-1 and Sentinel-2 in the study area with normal forest coverage, which was human-dominated zones with nearby towns and villages.different AGB using geographically weighted regression (GWR), the multi-layer perception neural network (ANN), support vector machines for regression (SVR), and random forests (RF).EC, BD, DC, and MBC represent evergreen coniferous, broadleaved deciduous, deciduous coniferous, and mixed broadleaf-conifer forests, respectively.

Conclusions
To map the distribution of forest AGB at a regional scale, Sentinel SAR and multispectral imagery were selected for a group of field quadrats with a resolution of 10 m.Four AGB models, one GWR model, and three ML models were built using these field measurements and remote-sensing datasets.The results demonstrated that SVR with SMO algorithms are the best for spatially predicting and mapping the patterns of AGB in the study site.The results also showed that the texture characteristics of Sentinel-1 and the vegetation biophysical variables of Sentinel-2 were the most relative and important predictors for explaining the observed variability of AGB in the area, and that the contributions of the other Sentinel-derived factors were only marginal.The AGB estimation of broadleaved deciduous forests was the most accurate, while the AGB above 160 Mg•ha −1 had the least accuracy, indicating data saturation of Sentinel imagery.Overall, the performance of the models in this study will inform the selection of predictive mapping techniques for forest AGB modeling, while the map that is generated will be instrumental for formulating spatially-targeted climate change mitigation and sustainable land management strategies.In the future, the model performance will be improved by incorporating other important environmental data (e.g., distance to the city center and roads, as well as human disturbance) and other up-to-date Figure 7.The mean errors of the above-ground biomass predictions for (a) different forest types, (b) different AGB using geographically weighted regression (GWR), the multi-layer perception neural network (ANN), support vector machines for regression (SVR), and random forests (RF).EC, BD, DC, and MBC represent evergreen coniferous, broadleaved deciduous, deciduous coniferous, and mixed broadleaf-conifer forests, respectively.

Conclusions
To map the distribution of forest AGB at a regional scale, Sentinel SAR and multispectral imagery were selected for a group of field quadrats with a resolution of 10 m.Four AGB models, one GWR model, and three ML models were built using these field measurements and remote-sensing datasets.The results demonstrated that SVR with SMO algorithms are the best for spatially predicting and mapping the patterns of AGB in the study site.The results also showed that the texture characteristics of Sentinel-1 and the vegetation biophysical variables of Sentinel-2 were the most relative and important predictors for explaining the observed variability of AGB in the area, and that the contributions of the other Sentinel-derived factors were only marginal.The AGB estimation of broadleaved deciduous forests was the most accurate, while the AGB above 160 Mg•ha −1 had the least accuracy, indicating data saturation of Sentinel imagery.Overall, the performance of the models in this study will inform the selection of predictive mapping techniques for forest AGB modeling, while the map that is generated will be instrumental for formulating spatially-targeted climate change mitigation and sustainable land management strategies.In the future, the model performance will be improved by incorporating other important environmental data (e.g., distance to the city center and roads, as well as human

Figure 1 .
Figure 1.The location of the study area and surveyed forest quadrats.

Figure 1 .
Figure 1.The location of the study area and surveyed forest quadrats.

Figure 2 .
Figure 2. Flowchart of steps used for forest above-ground biomass mapping using Sentinel SAR and multispectral imagery.DBH is the abbreviation of diameter at breast height.

Figure 2 .
Figure 2. Flowchart of steps used for forest above-ground biomass mapping using Sentinel SAR and multispectral imagery.DBH is the abbreviation of diameter at breast height.

Figure 3 .
Figure 3. Schematic representation of an example multi-layer perception neural network (MLPNN) model structure to predict forest above-ground biomass.Shown are the inputs with their neurons oi, and interlinked connections from each input to all hidden layer neurons oj, along with the selected weightings.The weighted outputs were then merged and fed into the output neuron ok to form the output values.

Figure 3 .
Figure 3. Schematic representation of an example multi-layer perception neural network (MLPNN) model structure to predict forest above-ground biomass.Shown are the inputs with their neurons o i , and interlinked connections from each input to all hidden layer neurons o j , along with the selected weightings.The weighted outputs were then merged and fed into the output neuron o k to form the output values.

Figure 4 .
Figure 4. Training errors associated with a given number of neurons in the hidden layer.

Figure 4 .
Figure 4. Training errors associated with a given number of neurons in the hidden layer.

ForestsFigure 7 .
Figure7.The mean errors of the above-ground biomass predictions for (a) different forest types, (b) different AGB using geographically weighted regression (GWR), the multi-layer perception neural network (ANN), support vector machines for regression (SVR), and random forests (RF).EC, BD, DC, and MBC represent evergreen coniferous, broadleaved deciduous, deciduous coniferous, and mixed broadleaf-conifer forests, respectively.

Table 1 .
List of Sentinel imagery acquired for the study.

Table 3 .
Sentinel-based imagery data predictors of above-ground biomass.

Table 4 .
Performance evaluation of the GWR, ANN, SVR, and RF models.
error, and correlation coefficient, respectively.The p-values of r were all below 0.01.

Table 4 .
Performance evaluation of the GWR, ANN, SVR, and RF models.