Forest Aboveground Biomass Estimation Using Multisource Remote Sensing Data and Deep Learning Algorithms: A Case Study over Hangzhou Area in China

Abstract: The accurate estimation of forest aboveground biomass is of great significance for forest management and carbon balance monitoring. Remote sensing instruments have been widely applied to forest parameter inversion because of their wide coverage and high spatiotemporal resolution. In this paper, the capability of different remote-sensed imagery for aboveground forest biomass estimation was investigated, including multispectral images (GaoFen-6, Sentinel-2 and Landsat-8) and various SAR (Synthetic Aperture Radar) data (GaoFen-3, Sentinel-1 and ALOS-2). In particular, based on the forest inventory data of Hangzhou in China, the Random Forest (RF), Convolutional Neural Network (CNN) and Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) algorithms were used to construct forest biomass estimation models. The estimation accuracies were evaluated under different configurations of images and methods. The results show that, for the SAR data, ALOS-2 has a higher biomass estimation accuracy than GaoFen-3 and Sentinel-1. Moreover, the GaoFen-6 data perform slightly worse than the Sentinel-2 and Landsat-8 optical data in biomass estimation. Compared with a single source, integrating multisource data can effectively enhance accuracy, with improvements ranging from 5% to 10%. The CNN-LSTM generally performs better than CNN and RF, regardless of the data used. The combination of CNN-LSTM and multisource data provided the best results in this case, with a maximum R² value of 0.74. The majority of the biomass values in the study area in 2018 ranged from 60 to 90 Mg/ha, with an average value of 64.20 Mg/ha.


Introduction
Zhejiang Province, with a forest cover of 61.15%, ranks among the five most forested provinces in China. The subtropical monsoon climate over this region leads to rich forest resources [1]. As the mainstay of terrestrial ecosystems, forests regulate the regional ecological environment and play a crucial role in maintaining the Earth's carbon balance, with their carbon sequestration capacity accounting for 76-98% of that of global vegetation. Any changes in the carbon stocks of forests could cause changes in global atmospheric CO2 concentrations. A rapid, accurate and macroscopic understanding of the spatial distribution of forest resources, biomass and carbon stock values is important for supporting efforts to balance the Earth's carbon cycle, purify the ecosystem and reduce the rate of global warming [2].
As a key measure of forest carbon stocks, forest biomass is a prominent indicator of the carbon sequestration capacity of forests and the basis for assessing the regional forest carbon balance [3]. Field measurement, which combines in situ data with allometric growth equations, is one of the most effective methods to measure forest biomass. It is more accurate to combine these data with multiple inventory results for better modeling [4]. However, the high cost and low coverage of this method limit its application on regional and global scales. Remote sensing instruments are regarded as a good complement, as they can provide consistent data sources with high frequency and global coverage at multiple scales. In particular, extensive overviews have demonstrated the value of using optical and SAR sensors to assess the damage to forests and to investigate the distribution, structure and dynamics of forest resources [5,6]. The spectral information related to surface features in optical images and the electromagnetic information (e.g., slope, shape and surface roughness) in SAR images suggest combining both datasets for forest parameter estimation, although heterogeneous data may also introduce errors [7,8].
From the technical viewpoint, multiple linear regression is commonly used to estimate biomass [9]. For example, Zheng et al. [10] extracted vegetation indices from ETM images and added forest age information, which is highly correlated with biomass, to determine the biomass threshold for each forest age category. Multiple linear regression was then used to estimate the biomass of pine and broadleaf forests, with a validated R² value of 0.67. However, linear regression cannot represent nonlinear relationships well, so machine learning has been introduced to improve estimation accuracy. Yue et al. [11] used RADARSAT-2 fully polarimetric SAR data, GF1-WFV multispectral data and the biomass of winter wheat to construct a biomass estimation model for winter wheat using Random Forest (RF), and the results revealed that the model that combined the correlation coefficient analysis with the forest data had the higher accuracy. Zhou et al. [12] extracted 34 features from Landsat-8 imagery, together with in situ data, to build a biomass estimation model using Support Vector Machine (SVM) and evaluated the estimation accuracy of the model using 32 samples, obtaining an R² of 0.5858 after setting the optimal parameters. Aguirre-Salado et al. [13] associated satellite-derived, climatic and topographic predictor variables with national forest inventory data to map biomass along the northern border of Mexico by means of K-Nearest Neighbor (KNN). Hong et al. [14] used airborne LiDAR data, ground-based monitoring and optical remote sensing techniques to investigate the Larix olgensis plantation in Heilongjiang Province and proposed a set of rapid, universal, multiscale (single tree, stand, management unit and region) and high-precision continuous monitoring methods for forest biomass components. Singh et al. [15] proposed a framework to monitor aboveground biomass (AGB) at finer scales using open-source satellite data. The framework integrated four machine learning (ML) techniques with field surveys and satellite data, and its application was exemplified in a case study of a dry deciduous tropical forest in India. The results revealed that, for wet season Sentinel-2 satellite data, the Random Forest (adjusted R² = 0.91) and Artificial Neural Network (adjusted R² = 0.77) ML models were better suited for estimating AGB in the study area. Researchers thus tend to favor frequently used machine learning methods with proven results, such as RF and SVM. More advanced machine learning approaches (e.g., gradient boosting and convolutional neural networks (CNNs)) are still underutilized, and subsequent studies need to increase the exploration and application of new methodologies.
From the data viewpoint, researchers typically combine multiple optical and SAR images from different sensors. Guerra-Hernández et al. [16] combined recent Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2), Sentinel-1, Sentinel-2 and ALOS2/PALSAR2 data for the extrapolation of AGB estimates and AGB mapping. Nandy et al. [17] integrated ICESat-2 and Sentinel-1 data to map forest canopy height, and used RF with the forest canopy height and Sentinel-2 derived variables to map the spatial distribution of AGB. Shendryk [18] proposed a machine learning method that fuses open access Global Ecosystem Dynamics Investigation (GEDI), Sentinel-1, Sentinel-2, elevation and land cover data for large-area aboveground biomass density (AGBD) mapping, and the model performs well. Currently, the Sentinel satellites are the main data source for many related studies, while data from other satellites, such as China's Gaofen series, remain little studied and applied. Many researchers have concluded that the fusion of multisource remote sensing data can lead to a more accurate prediction of aboveground forest biomass. Can the accuracy of predictions be further improved by adding more sources of remote sensing data to the study? This is a question worth exploring.
In the past five years, a few studies have demonstrated that deep learning brings a significant opportunity for predicting forest parameters [19][20][21]. The estimation accuracy benefits significantly from the capability of deep learning to extract invariant and abstract features automatically from remote-sensed imagery. The trained models are also likely to generalize to other forest scenarios with similar characteristics. However, the complexity of the models and their data requirements (e.g., the availability of forest inventory data) limit the application of deep learning algorithms to biomass estimation. It is therefore important to carefully assess the advantages and disadvantages of such algorithms on specific datasets and scenes [22].
To this end, in response to the current state of data and methods, this study uses advanced machine learning methods and more diverse remote sensing data. This paper aims to explore the potential of different remote-sensed imagery and deep learning algorithms for forest aboveground biomass estimation. In particular, various remote-sensed data, including optical data (GaoFen-6, Sentinel-2 and Landsat-8) and SAR data (GaoFen-3, Sentinel-1 and ALOS-2), are used together with three algorithms: RF, the Convolutional Neural Network (CNN) and the combined CNN and Long Short-Term Memory network (CNN-LSTM). We first select the feature variables from individual sources by maximizing the correlation between remote-sensed data and in situ data. Then, the aboveground biomass of the forest is estimated using different configurations of methods and feature variables. The best configuration was finally selected to map the spatial distribution of forest biomass over the Lin'an district of Hangzhou, northwestern Zhejiang Province, China.

Study Area
The study area shown in Figure 1 is located in the Lin'an district of Hangzhou, in northwestern Zhejiang Province, at a longitude of 118°51′ to 119°52′ E and a latitude of 29°56′ to 30°23′ N. The area has a subtropical monsoon climate with four distinct seasons, an annual average temperature of 16.4 °C, annual precipitation of 1500.0 to 1628.6 mm, 1847.3 annual sunshine hours and a frost-free period of 237 days. The area has an altitude of 60 to 120 m, low hills and a large area of forest coverage. The region is rich in species, and its main forest types include mixed coniferous forests and mixed broad-leaved forests, with the dominant species being horsetail pine, fir, etc.

Field Data
The National Forest Resources Continuous Inventory is a forest resource survey that aims to understand the quantity, quality, and patterns of growth. It is an important part of the comprehensive monitoring of China's forest resources. Forest resource inventory data are the most exhaustive and exact data reflecting the forest resources in China. The main survey contents include forest type, accumulation, growth, and harvesting data [23].
The ground data used in this study are from the ninth forest inventory in 2018, and all sample plots in the Lin'an district were plotted in Figure 2 to select those with forestland and uniform forest stands; those with agricultural land, construction land and other non-forest land types and zero storage volume were excluded [24]. The study area is rich in forest resources, and the dominant tree species in the screened sample plots mainly include four species: fir, horsetail pine, hard broad and soft broad. The data include the location, date, origin and species composition. For each sample plot, the main investigation includes diameter at breast height, tree height and species.

The aboveground data in Figure 2 are used to calculate forest aboveground biomass. In this paper, the aboveground biomass of individual standing trees was estimated by species using the documented growth models and parameters for each tree species or species group (Table 1). Each dominant tree species was classified as one of four species categories, according to the species distribution of the forest: fir (Cunninghamia lanceolata (Lamb.) Hook., belonging to the Cupressaceae), horsetail pine (Pinus massoniana Lamb., belonging to the Pinaceae), hard broad and soft broad. ArcGIS was used to calculate the extremes and standard deviations at each sample site, and sample sites with high data dispersion and abnormal data were deleted. Finally, 160 sample sites were obtained, of which 130 were used for modeling and 30 were used for testing.
Table 1. Allometric growth equations for major tree species in the study area.

Optical and SAR Data Processing
Successfully launched on 2 June 2018 as China's first precision agricultural observation satellite, the Gaofen-6 (GF-6) satellite is mainly used for monitoring crop growth, observing soil conditions and forestry applications [25]. This satellite has eight-band Complementary Metal Oxide Semiconductor (CMOS) detectors, and it is the first domestic satellite to carry the red-edge band, which can effectively monitor the growth of vegetation. According to the timing and area of the ground data sampling, a GF-6 WFV image with a spatial resolution of 16 m from 5 September 2018 was acquired.
Sentinel-2 is a multispectral high-resolution imaging satellite that is primarily used to provide monitoring information for agricultural and forestry crops [26]. The Sentinel-2 wide-field, high-resolution multispectral imager (MSI) covers 13 spectral bands (443 nm-2190 nm) with a swath width of 290 km, encompassing the visible, near-infrared and shortwave infrared bands, with spatial resolutions of 10 m, 20 m and 60 m. Two Level-1C Sentinel-2 multispectral images acquired on 29 October and 10 November 2018 were selected.
The Landsat-8 satellite has a total of 11 bands, of which the Operational Land Imager (OLI) provides nine, with a swath width of 185 km. The range of the panchromatic band has been narrowed compared to that of the Landsat-7 ETM sensor to better distinguish vegetation areas from other areas. Two images covering the study area were downloaded from the USGS website, and the image information is shown in Table 2. The L1T product is the data product obtained after radiometric correction and geometric refinement using ground control points and digital elevation models.
The Gaofen-3 (GF-3) satellite, launched on 10 August 2016, is the first C-band multi-polarization synthetic aperture radar (SAR) satellite in China, with a resolution of 1 m and 12 imaging modes [27,28]. A dual-polarization image acquired in Stripmap mode on 17 November 2018 was selected according to the test site and the timing of ground data acquisition. Two dual-polarimetric Sentinel-1 images were obtained in interferometric wide (IW) imaging mode, with imaging dates of 1 October and 13 October 2018, as shown in Table 3. The L-band ALOS-2 SAR image acquired on 8 November 2018 in Fine Beam Dual Polarization (FBD) mode was selected.
During SAR data pre-processing, radiometric terrain correction and calibration were carried out for all SAR images, followed by co-registration to the SRTM digital elevation model (DEM) and geocoding into the geographic coordinate system (EPSG: 4326) using the ESA open-source software SNAP 9.0.0. Adaptive Lee filtering was applied to suppress the speckle in the SAR images. Polarization decomposition provides a reasonable physical explanation of the scattering mechanism of the target, while incoherent polarization target decomposition can extract the scattering characteristics of the feature more effectively according to different scattering mechanisms [29]. Because the SAR data used in this study were all in dual-polarization mode, a dual-polarization Cloude decomposition was used to extract the polarization decomposition parameters, including entropy, anisotropy and the mean scattering angle.
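The decomposition parameters named above can be illustrated with a minimal sketch. It assumes the common eigenvalue-based H/A/α formulation applied to a 2x2 dual-polarization covariance matrix; the function name `dual_pol_h_a_alpha` and the example matrices are hypothetical, and the actual processing in this study was carried out in SNAP, whose internal implementation may differ.

```python
import numpy as np

def dual_pol_h_a_alpha(c2):
    """Entropy, anisotropy and mean alpha angle (degrees) of a 2x2 covariance matrix.

    Assumption: eigenvalue-based H/A/alpha for dual-pol data, not SNAP's exact code.
    """
    eigvals, eigvecs = np.linalg.eigh(c2)            # eigenvalues in ascending order
    eigvals = np.clip(eigvals[::-1], 0.0, None)      # descending, forced non-negative
    eigvecs = eigvecs[:, ::-1]                       # reorder eigenvectors to match
    p = eigvals / eigvals.sum()                      # pseudo-probabilities
    nz = p > 0                                       # skip zero terms in the entropy sum
    entropy = float(-(p[nz] * np.log2(p[nz])).sum()) # log base 2 for two channels
    anisotropy = float((eigvals[0] - eigvals[1]) / (eigvals[0] + eigvals[1]))
    # alpha_i is derived from the first component of each eigenvector
    alphas = np.degrees(np.arccos(np.clip(np.abs(eigvecs[0, :]), 0.0, 1.0)))
    alpha = float((p * alphas).sum())
    return entropy, anisotropy, alpha

# Hypothetical examples: fully random scattering versus one pure mechanism.
h_rand, a_rand, _ = dual_pol_h_a_alpha(np.eye(2))
h_pure, a_pure, alpha_pure = dual_pol_h_a_alpha(np.diag([1.0, 0.0]))
```

With two equal eigenvalues the entropy reaches its maximum of 1 and the anisotropy is 0, whereas a single dominant mechanism gives an entropy of 0 and an anisotropy of 1.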
For multispectral images, the meridian convergence angle (the angle between true north and grid north) was calculated for each pixel, and the UTM projection was transformed into latitude and longitude coordinates. All optical images required atmospheric correction before analysis. ENVI (the Environment for Visualizing Images) is a complete remote sensing image processing platform that includes the FLAASH atmospheric correction module. All images were atmospherically corrected using the FLAASH tool of ENVI 5.3.

Characteristic Variable Selection
Multispectral images contain band information, vegetation indices and other physical variables [30]; SAR images provide backscatter features, interferometric information and polarization decomposition features [31,32]. Several features related to aboveground biomass were extracted from the optical images, including the Normalized Difference Vegetation Index (NDVI), Difference Vegetation Index (DVI), Green Normalized Difference Vegetation Index (GNDVI), Ratio Vegetation Index (RVI) and statistics. In addition, texture information and polarization decomposition features were extracted from the SAR images. Details of the features are shown in Tables 4-6.
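As an illustration of the vegetation indices listed above, the following sketch computes them for a single pixel. It assumes the commonly used definitions (in particular RVI taken as NIR/Red) and surface reflectance inputs; the function name and the reflectance values are hypothetical and do not come from the study data.

```python
def vegetation_indices(nir, red, green):
    """Return NDVI, DVI, GNDVI and RVI for one pixel of surface reflectance.

    Assumption: standard textbook definitions; the paper's Tables 4-6 may differ.
    """
    eps = 1e-12  # guards against division by zero over very dark pixels
    return {
        "NDVI": (nir - red) / (nir + red + eps),
        "DVI": nir - red,
        "GNDVI": (nir - green) / (nir + green + eps),
        "RVI": nir / (red + eps),
    }

# Hypothetical reflectances for a densely vegetated pixel.
idx = vegetation_indices(nir=0.45, red=0.05, green=0.08)
```

The same expressions apply element-wise to whole band arrays if the inputs are replaced with array types.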

Experimental Models
The flowchart of the research for this article is shown in Figure 3.

Random Forest
Random Forest (RF) is an algorithm that builds multiple decision trees by random sampling and combines them into an ensemble, the forest. The random component refers to building the model using random sampling, and the forest represents an ensemble of mutually independent decision trees [33]. The basic principle of the RF model is the random selection (with replacement) of n training datasets from the initial data, followed by the random selection of m features from every training set. Then, decision trees are built using these m features. Each decision tree generates and saves a prediction. Finally, the predictions of all trees are aggregated, by majority vote for classification or by averaging for regression, as in biomass estimation. We used an ensemble of bagged decision trees in MATLAB to produce the RF model. In the preparation process, the number of decision trees (n) and the number of variables considered at the tree nodes of each decision tree (m) were set to 500 and 5, respectively, to minimize errors.
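The bootstrap sampling, random feature selection and aggregation steps described above can be sketched in miniature. This is not the MATLAB bagged-tree ensemble used in the study: it is a simplified stand-in that uses depth-1 regression trees (stumps), so that drawing the feature subset once per tree is the same as drawing it at the single node. All function names and the toy data are hypothetical.

```python
import random

def fit_stump(X, y, features):
    """Best single-split regression tree over the given candidate features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X})[:-1]:   # thresholds leaving both sides non-empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, ml, mr)
    if best is None:                     # no usable split: fall back to the sample mean
        m = sum(y) / len(y)
        return (0, float("inf"), m, m)
    return best[1:]

def fit_forest(X, y, n_trees=50, m_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]             # bootstrap sample
        feats = rng.sample(range(len(X[0])), m_features)     # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    preds = [ml if row[f] <= t else mr for f, t, ml, mr in forest]
    return sum(preds) / len(preds)       # regression: average over all trees

# Toy data: only the first feature carries signal (values are illustrative).
X = [[float(i), 0.0, float((i * 7) % 5)] for i in range(20)]
y = [10.0 if i < 10 else 50.0 for i in range(20)]
forest = fit_forest(X, y, n_trees=50, m_features=2, seed=0)
low = predict(forest, [2.0, 0.0, 0.0])
high = predict(forest, [17.0, 0.0, 0.0])
```

With full-depth trees, n = 500 and m = 5, the same three steps correspond to the configuration reported above.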

Convolutional Neural Network (CNN)
The convolutional neural network (CNN) consists of convolutional layers, pooling layers and fully connected layers, and the number of each is not fixed. Convolutional operations are used to discover and extract features in the input. A single convolutional layer may only be able to extract basic features such as corners and edges, but deeper layers can combine these basic features so that more complex features are extracted. The role of the pooling layer is to subsample the feature maps produced by the convolutional layer. The convolutional layer, activation layer and pooling layer can be seen as the feature learning (feature extraction) part of the CNN, while the fully connected layer applies the learned features (feature maps) to the model task.
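The roles of the convolutional, activation and pooling layers can be made concrete with a small one-dimensional sketch. The kernel and the input vector are hypothetical and fixed by hand here, whereas a trained CNN learns its kernels from data.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation, the operation used in CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Activation layer: negative responses are set to zero."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

features = [0.1, 0.5, 0.9, 0.4, 0.2, 0.8, 0.3]   # hypothetical input feature vector
edge_kernel = [-1.0, 0.0, 1.0]                    # responds to local increases
feature_map = relu(conv1d(features, edge_kernel))
pooled = max_pool1d(feature_map, size=2)
```

Here the convolution turns seven inputs into a five-element feature map, and pooling subsamples it to two values that keep the strongest local responses.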

CNN-LSTM
CNN-LSTM is a hybrid neural network model based on CNN and LSTM (Long Short-Term Memory). Combining CNN and LSTM is one of the more widespread approaches in deep learning [34]. The convolution and pooling operations of the CNN can mine abstract features from the data and better extract high-dimensional features, while the LSTM network has a strong memory that works better for extracting information from serialized data.
Based on the features and advantages of the two models for data processing, a composite of the two networks is considered in Figure 4. The CNN is used as an encoder to extract local features of the data and build up a complete feature vector, and the LSTM is used as a decoder to capture the correlation between the data through its memory unit and to produce the prediction value of the model.
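The encoder-decoder arrangement can be sketched as a forward pass only. The layer sizes, the gate ordering and the random, untrained weights below are all assumptions made for illustration; the architecture actually used is the one in Figure 4, and the output of this sketch has no physical meaning.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cnn_encoder(x, kernels):
    """1-D convolution, ReLU and max pooling (size 2).

    x: (length,), kernels: (n_filters, k) -> sequence of shape (steps, n_filters).
    """
    n_filters, k = kernels.shape
    windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    fmap = np.maximum(windows @ kernels.T, 0.0)
    steps = fmap.shape[0] // 2
    return fmap[:steps * 2].reshape(steps, 2, n_filters).max(axis=1)

def lstm_decoder(seq, W, U, b):
    """Run one LSTM layer over seq and return the last hidden state.

    W: (4h, n_in), U: (4h, h), b: (4h,); gates ordered input, forget, cell, output.
    """
    h = np.zeros(U.shape[1])
    c = np.zeros(U.shape[1])
    for x_t in seq:
        i, f, g, o = np.split(W @ x_t + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # memory cell update
        h = sigmoid(o) * np.tanh(c)                    # exposed hidden state
    return h

def cnn_lstm_predict(x, params):
    seq = cnn_encoder(x, params["kernels"])
    h = lstm_decoder(seq, params["W"], params["U"], params["b"])
    return float(params["w_out"] @ h + params["b_out"])   # dense regression head

rng = np.random.default_rng(0)
n_filters, k, hidden = 4, 3, 8            # hypothetical layer sizes
params = {
    "kernels": rng.normal(size=(n_filters, k)),
    "W": rng.normal(size=(4 * hidden, n_filters)) * 0.1,
    "U": rng.normal(size=(4 * hidden, hidden)) * 0.1,
    "b": np.zeros(4 * hidden),
    "w_out": rng.normal(size=hidden),
    "b_out": 0.0,
}
x = rng.normal(size=12)                   # 12 hypothetical standardized features
agb_hat = cnn_lstm_predict(x, params)
```

In practice the weights would be fitted to the modeling plots with a deep learning framework; the sketch only shows how the pooled feature maps are passed to the LSTM as a sequence.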

Model Accuracy Assessment
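The model results in this study were evaluated using several indicators: pseudo R-squared (pseudo-R²), root mean square error (RMSE), relative root mean square error (rRMSE), Bias and rBias. The indicators can be expressed as follows:

\[ \text{pseudo-}R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{1} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{2} \]
\[ \mathrm{rRMSE} = \frac{\mathrm{RMSE}}{\bar{y}} \cdot 100 \tag{3} \]
\[ \mathrm{Bias} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \tag{4} \]
\[ \mathrm{rBias} = \frac{\mathrm{Bias}}{\bar{y}} \cdot 100 \tag{5} \]

where $y_i$ is the actual biomass value, $\hat{y}_i$ is the predicted biomass value, $\bar{y}$ is the average of the actual measured biomass and $n$ is the number of samples.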
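A minimal sketch of these indicators is given below. It assumes the standard definitions, with RMSE as the root of the mean squared residual, Bias as the mean of predicted minus observed values, and the relative forms expressed as percentages of the observed mean; the function name and the sample values are hypothetical.

```python
import math

def accuracy_metrics(y_true, y_pred):
    """pseudo-R2, RMSE, rRMSE (%), Bias and rBias (%) under standard definitions."""
    n = len(y_true)
    y_bar = sum(y_true) / n                                   # mean observed biomass
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    rmse = math.sqrt(ss_res / n)
    bias = sum(yp - yt for yt, yp in zip(y_true, y_pred)) / n  # assumed sign: predicted - observed
    return {
        "pseudo_R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "rRMSE": rmse / y_bar * 100.0,
        "Bias": bias,
        "rBias": bias / y_bar * 100.0,
    }

# Hypothetical observed and predicted biomass values in Mg/ha.
m = accuracy_metrics([50.0, 60.0, 70.0, 80.0], [55.0, 58.0, 73.0, 78.0])
```

A positive Bias under this sign convention indicates that the model overestimates biomass on average.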

Predicted Variables
In Section 2.3, the backscatter coefficients of the two polarization channels were extracted from the GF-3, Sentinel-1 and ALOS-2 radar images, eight texture feature factors were extracted for each polarization channel, and the H/A/α dual-polarization decomposition parameters were obtained. Twenty-one features were thus extracted for each SAR dataset. Table 7 shows the Pearson correlation coefficients between the remotely sensed features and biomass, calculated from the GF-3, Sentinel-1 and ALOS-2 radar data, to support the feature selection.
Based on the correlation analysis, we collected the features with higher correlation for each SAR dataset, as shown in Table 8. The backscatter coefficients and texture features of the L-band ALOS-2 data were more important for biomass inversion than those of the C-band radar data. The cross-polarized backscatter coefficients, such as HV and VH, had a greater degree of influence than the co-polarized ones. Since cross-polarization is more sensitive to vegetation moisture, canopy roughness, volume scattering of standing trees and vertical structure, it has a greater potential for forest biomass inversion [35]. The sensitivity to forest biomass after texture analysis was improved compared to both backscatter coefficients, as texture analysis reduces the stochastic heterogeneity of backscatter and improves the correlation with forest biomass [36]. Due to the physical significance of the decomposition parameters, the dual-polarization decomposition parameters showed the highest correlation with biomass.
To assess the performance of the GF-6, Sentinel-2 and Landsat-8 optical images, band information, vegetation indices, texture features and principal components were considered. Nineteen feature variables were extracted for each dataset. Table 9 shows the Pearson correlation coefficients between the features and biomass to support feature selection for the optical data. The feature variables with higher correlation in individual sensors were collected, as shown in Table 10. During the correlation analysis, we found that the red-edge band, vegetation indices and principal components dominated the correlation. In particular, the band information and vegetation indices showed generally higher correlations with biomass than the texture features.
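The correlation-based screening described in this section can be sketched as follows. The threshold of 0.3, the feature names and all values are hypothetical, since the selection rule is reported here only as choosing the features with higher correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(features, biomass, threshold=0.3):
    """Keep features whose |r| with biomass reaches `threshold`, strongest first."""
    scored = {name: pearson_r(vals, biomass) for name, vals in features.items()}
    kept = [(name, r) for name, r in scored.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical plot-level values: biomass in Mg/ha and three candidate features.
biomass = [40.0, 55.0, 62.0, 78.0, 90.0]
features = {
    "HV_backscatter": [-18.0, -16.5, -15.8, -14.9, -14.0],
    "NDVI": [0.60, 0.72, 0.70, 0.80, 0.85],
    "noise": [0.3, 0.9, 0.1, 0.8, 0.2],
}
selected = select_features(features, biomass)
```

Ranking by the absolute value of r keeps strongly negative predictors as well as positive ones.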

Model Test Results
The backscatter coefficients, texture features and polarization decomposition parameters of the three SAR datasets were retained. The forest aboveground biomass was then estimated with RF, CNN and CNN-LSTM.
Among the biomass estimates from the three SAR images in Table 11, the estimate from the ALOS-2 image had the highest R² value and the lowest RMSE; the results from the GF-3 image had the second-best accuracy, and the results from the Sentinel-1 image had the lowest accuracy for all methods. As stated in previous studies, the L-band is more sensitive to forest biomass than the C-band because the penetration of radar into the forest canopy increases with wavelength [37,38]. The short wavelength of the C-band cannot penetrate the dense canopy and basically interacts with the canopy, while the longer wavelength of the L-band can penetrate the vegetation canopy and obtain more vertical information. The better spatial resolutions of the GF-3 and ALOS-2 data may be another crucial factor that affects the estimation accuracy. Comparing the different estimation methods, the composite model (CNN-LSTM) performed best, and the CNN model also performed well. Overall, the two deep learning algorithms examined in the paper had better inversion results than the RF machine learning model. Scatter plots of the prediction results for the different methods are shown in Figure 5, in which RF also worked well in this case, probably because RF yields an importance ranking of the factors that provides a better underlying nonparametric model. The composite CNN-LSTM model used the CNN network to obtain deep information about the data and to mine the characteristics of the data, while the LSTM network had a strong memory for obtaining data associations, fully integrating the advantages of both networks in prediction and improving biomass inversion accuracy.
As shown in Figure 6, among the biomass estimates from the three types of optical images, the results from the Landsat-8 images had relatively high R² and low RMSE values, giving the highest estimation accuracy. The Sentinel-2 images had the second-highest accuracy, and the GF-6 images produced the lowest accuracy estimates.
The results demonstrate the better performance of the composite model (CNN-LSTM), which combines the advantages of both networks, using the convolution operation of the CNN network to obtain features from the data and the LSTM to obtain the data associations, thus improving the model estimation accuracy. Overall, the two deep learning algorithms had better inversion results than the RF machine learning model.

Mapping Spatial Distribution of Forest
There is a nonlinear relationship between forest biomass and remote sensing feature variables.Optical instruments can provide finer vegetation spectral spectrums, and SAR sensors are sensitive to structural and electromagnetic information related to slope, shape and surface roughness [39].By combining optical and radar images, the advantages of each image can be fully explored to achieve data complementarity, thus enhancing the accuracy of the inversion of forest biomass.Considering that data redundancy reduces the accuracy of model inversion, the joint active-passive remote sensing inversion of forest biomass needs to select variables with high correlations with biomass and to fully utilize the individual strong points of different data.To do this, the variables in Table 12 were used.Figure 7 shows the biomass inversion based on different models and data classes.Compared with the single dataset, the combination of SAR and optical images generally significantly improves estimation accuracy and model fit, regardless of the methods used.The R 2 of the CNN-LSTM prediction from the multisource data reached 0.7405, and the RMSE reached 26.4314 Mg/ha, indicating the advantage of applying a combined dataset.Second, for the estimation results of the three synergistic optical datasets, the CNN-LSTM model estimated an R 2 value of up to 0.7289 and an RMSE of up to 26.9166 Mg/ha.Finally, the inversion results based on SAR datasets and the CNN-LSTM resulted in an R 2 of 0.5882 and an RMSE of 30.0384Mg/ha.The combination of the SAR data did not significantly improve the biomass estimation accuracy, probably because the contrast of SAR features was relatively low [8].
The composite algorithm (CNN-LSTM) had a better accuracy regardless of the combination of multisource data, showing a good-fitting trend to the data.The deep learning estimation results with CNN-LSTM outperformed the RF approach, indicating that combining multisource data with deep learning is feasible for predicting forest biomass.
Given that the CNN-LSTM model with the multisource dataset performed best, we further developed a biomass estimation model with the in situ data. Figure 8 maps the spatial distribution of forest biomass. The majority of the biomass values in the study area in 2018 ranged from 60 to 90 Mg/ha, with an average value of 64.20 Mg/ha. Low biomass values were mainly concentrated in the more densely populated eastern and northern regions, whereas high biomass values were mainly located in the sparsely populated remote forest areas and in the western regions far from the towns, with a relatively scattered distribution.

Discussion
This study explored the potential of multiple remotely sensed images and deep learning algorithms for forest aboveground biomass estimation. In addition to data from the widely used Sentinel series, Landsat-8 and ALOS-2 satellites, the experiment incorporated data from the seldom-used Chinese Gaofen series satellites, and it applied the recently popular CNN-LSTM method. Somewhat unexpectedly, the results obtained from the Gaofen series data were encouraging, which demonstrates the potential of China's Gaofen series of satellites for assessing the aboveground biomass of forests. In addition, this work achieved good results with a small number of samples by integrating multisource remote sensing data and using the CNN-LSTM method.

Variable Selection
Because involving too many remotely sensed features in modeling can cause information redundancy, the correlation between each feature variable and forest biomass must be analyzed so that only the variables strongly correlated with biomass are retained. In this paper, the Pearson correlation coefficient was used for this analysis. Among the feature variables derived from the C-band GF-3 and Sentinel-1 data and the L-band ALOS-2 data, the dual-polarization decomposition parameters were the most sensitive to forest biomass, followed by the texture features of the backscatter and the backscatter coefficient. The backscatter coefficient and texture features of the L-band ALOS-2 data were more sensitive to biomass than those of the C-band data because of the penetration capability of L-band wavelengths. Among the features of the optical images, the red-edge band, vegetation indices and principal components were strongly correlated with biomass, and the band information and vegetation indices were overall more strongly correlated with forest biomass than the texture features were. The optical features were both more numerous and more strongly correlated with biomass than the radar features, possibly because the high-resolution imagery carries finer spatial texture information and because the vegetation red-edge band is well suited to monitoring plant growth conditions on the ground.
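The screening step described above can be illustrated with a minimal sketch. It assumes the features are held in a NumPy array with one column per candidate variable. The function name `pearson_screen`, the threshold of 0.3 and the synthetic data are hypothetical choices for illustration only; they are not the values or features used in this study.

```python
import numpy as np

def pearson_screen(features, biomass, names, threshold=0.3):
    """Keep features whose |Pearson r| with biomass is at least `threshold`.

    Returns (name, r) pairs sorted by |r| in descending order.
    The threshold is an illustrative assumption, not a value from the paper.
    Constant feature columns are not handled (they give a zero denominator).
    """
    x = np.asarray(features, dtype=float)   # shape (n_samples, n_features)
    y = np.asarray(biomass, dtype=float)    # shape (n_samples,)
    xc = x - x.mean(axis=0)                 # center each feature column
    yc = y - y.mean()                       # center the biomass values
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = (xc * yc[:, None]).sum(axis=0) / denom
    kept = [(n, float(v)) for n, v in zip(names, r) if abs(v) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)

# Synthetic example: one feature tied to biomass, one pure-noise feature.
rng = np.random.default_rng(0)
biomass = rng.uniform(10, 120, size=200)
index_like = 0.005 * biomass + rng.normal(0, 0.05, size=200)
noise = rng.normal(0, 1, size=200)
features = np.column_stack([index_like, noise])

selected = pearson_screen(features, biomass, ["index_like", "noise"])
print(selected)
```

In this toy case only the biomass-related feature survives the screening. Because the Pearson coefficient captures linear association only, a feature with a strongly nonlinear relationship to biomass could still be discarded by such a filter.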

Comparison of Different Sensors
In Figure 5, the results for the SAR datasets show that the biomass estimation accuracy of the L-band ALOS-2 data was slightly greater than that of the C-band GF-3 and Sentinel-1 data. This difference was mainly attributed to penetration and spatial resolution. The L-band can obtain more vertical information because of its capacity to penetrate the crown, while a finer spatial resolution provides more detail and therefore has a positive effect. Meanwhile, Gaofen-3 and Sentinel-1 achieved similar aboveground biomass estimation accuracies under each model, with Gaofen-3 even slightly outperforming Sentinel-1. However, it is worth noting that in Figure 7a-c the combination of the three sources of radar data did not significantly improve the accuracy of aboveground biomass estimation. The most accurate model was the CNN-LSTM model, with an R² of 0.5882 and an RMSE of 30.0384 Mg/ha. This result falls well short of the aboveground biomass estimates obtained from the combination of the three optical datasets. The gap may be due to the lower biomass saturation point of the radar features.
In Figure 6, which compares the accuracy of the aboveground biomass estimates from the optical data, the performance of Gaofen-6 is relatively mediocre. Landsat-8 is the most accurate, with a high R² and a low RMSE in both the RF and CNN-LSTM models. In Figure 7d-f, the combination of the three optical datasets with the CNN-LSTM model achieves the best result among the optical data, with an R² of 0.7289 and an RMSE of 26.9166 Mg/ha.
In Figure 7g-i, the accuracy of the aboveground biomass estimated by fusing all optical and radar data is generally higher than that obtained from any single dataset. The CNN-LSTM estimate has the highest accuracy among the 27 results, with an R² of 0.7405 and an RMSE of 26.4314 Mg/ha. In addition, among the CNN results, the fused multisource remote sensing data gave higher accuracy than any of the other datasets. In the RF model, the accuracy obtained with the fused multisource data was lower only than that obtained with the Landsat-8 data. These results demonstrate the advantage of fused multisource remote sensing data in estimating aboveground biomass, as well as the potential of China's Gaofen series of satellites for this task.
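For reference, the two accuracy metrics quoted throughout this comparison can be computed as in the sketch below. It assumes that R² is the coefficient of determination, 1 - SS_res/SS_tot. If R² was instead taken as the squared Pearson correlation of the fitted line in the scatter plots, the values would differ. The function names and the four sample values are hypothetical and do not come from the study's data.

```python
import numpy as np

def rmse(observed, predicted):
    """Root-mean-square error, in the same unit as the inputs (e.g., Mg/ha)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((p - o) ** 2)))

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    ss_res = np.sum((o - p) ** 2)           # residual sum of squares
    ss_tot = np.sum((o - o.mean()) ** 2)    # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Illustrative plot-level values in Mg/ha (made up for this example).
observed = [20.0, 50.0, 80.0, 110.0]
predicted = [30.0, 50.0, 75.0, 95.0]

print(f"RMSE = {rmse(observed, predicted):.4f} Mg/ha")
print(f"R2   = {r_squared(observed, predicted):.4f}")
```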

Model Comparison
We deployed three methods, i.e., RF, CNN and CNN-LSTM, to estimate forest aboveground biomass. The deep learning methods generally performed better than the conventional machine learning method (RF). In particular, CNN-LSTM combines the advantages of both the CNN and LSTM algorithms and showed the best ability to fit complex relationships and to reduce the misestimation of biomass. When forest aboveground biomass was estimated using each of the nine types of data separately, the CNN-LSTM model achieved the best results for eight of them, with accuracies better than those of both the RF model and the CNN model. CNN-LSTM therefore has considerable potential for estimating aboveground biomass and deserves follow-up research.
It should also be noted that the models constructed from the different data and methods generally underestimate high values and overestimate low values. As shown in Figures 5-7, when the measured forest aboveground biomass is greater than 80 Mg/ha, the predicted aboveground biomass differs considerably from the measured value and is smaller than it, i.e., high values are underestimated. Conversely, when the measured forest biomass is less than 30 Mg/ha, the predicted aboveground biomass is generally larger than the measured value, i.e., low values are overestimated. When the measured aboveground biomass was in the range of 30-80 Mg/ha, there was less variation among the estimates of the various models. A possible explanation is that in areas with low aboveground biomass, the vegetation cover is lower and the surface is more exposed. The remote sensing images then record more surface information, which results in mixed pixels and produces the overestimation of low values. In areas with high aboveground biomass, the vegetation cover is higher, and the signal recorded in the remote sensing image tends to saturate, so that very high aboveground biomass cannot be resolved, which results in the underestimation of high values.
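The pattern described above can be checked numerically by computing the mean prediction bias within each biomass range. The sketch below is a minimal illustration that uses the 30 and 80 Mg/ha breakpoints mentioned in the text. The function name `bias_by_range` and the sample values are hypothetical and are not taken from the study's plots.

```python
import numpy as np

def bias_by_range(observed, predicted, edges=(30.0, 80.0)):
    """Mean bias (predicted - observed) for low, mid and high biomass ranges.

    A positive value indicates overestimation and a negative value
    indicates underestimation. Empty ranges return NaN.
    """
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    low, high = edges
    masks = {
        "low": o < low,                     # measured biomass below 30 Mg/ha
        "mid": (o >= low) & (o <= high),    # measured biomass of 30-80 Mg/ha
        "high": o > high,                   # measured biomass above 80 Mg/ha
    }
    return {
        name: (float(np.mean(p[m] - o[m])) if m.any() else float("nan"))
        for name, m in masks.items()
    }

# Illustrative values in Mg/ha (made up for this example).
observed = [10.0, 20.0, 40.0, 60.0, 90.0, 120.0]
predicted = [25.0, 30.0, 42.0, 58.0, 75.0, 95.0]

bias = bias_by_range(observed, predicted)
print(bias)
```

With these made-up values the low range shows a positive bias and the high range a negative bias, which is the signature of the low-value overestimation and high-value underestimation discussed in the text.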

Conclusions
In this paper, forest aboveground biomass was estimated by remote sensing modeling based on multisource remote sensing data from Chinese and international satellites, including mainstream multispectral images (GF-6, Sentinel-2 and Landsat-8) and various SAR data (GF-3, Sentinel-1 and ALOS-2 PALSAR-2). Remote sensing features were extracted, and the Pearson correlation coefficient method was used to select the modeling factors. Forest biomass estimation models were constructed with various machine learning and deep learning methods, and the estimation accuracies of the different models were compared and evaluated. The results revealed that for the SAR datasets, the biomass estimation accuracy of the L-band ALOS-2 data was higher than that of the C-band GF-3 and Sentinel-1 data. Among the optical data, GF-6 performed slightly worse than Sentinel-2 and Landsat-8. Among the models, CNN-LSTM, which combines the advantages of both the CNN and LSTM algorithms, showed a better ability to fit complicated relationships. Integrating data from different sources to estimate biomass makes full use of the characteristics of each sensor, so that the sensors complement one another and the accuracy of the models improves.

Figure 1. (Left) Location map of the study area in Lin'an district (yellow boundary) of Hangzhou (purple boundary), northwestern Zhejiang Province, China. (Right) Google Maps image of the study area (yellow boundary).

Figure 2. Map of sample plots in the study area.

Figure 3. The flowchart of the research.

Figure 5. Scatter plot of the biomass prediction results for the RF, CNN-LSTM and CNN algorithms based on radar data. The horizontal coordinates indicate the biomass observations, the vertical coordinates are the predicted values, the dashed black line is the 1:1 straight line, and the red line is the fitted line.

Figure 6. Scatter plot of biomass prediction results from the RF, CNN-LSTM and CNN algorithms based on multispectral data. The horizontal coordinates indicate the observed biomass values, the vertical coordinates are the predicted values, the dashed black line is the 1:1 straight line, and the red line is the fitted line.
Figure 7. Scatter plots of the biomass prediction results of the synergistic inversion of multisource remote sensing data based on the RF, CNN-LSTM and CNN models. Each column shows the results of one model with different data sources, and each row shows the results for one dataset with different models. Panels (a)-(c): RF, CNN-LSTM and CNN with SAR data.

Figure 8. Spatial distribution of the forest aboveground biomass in the study area.

Table 2. Information on the optical data used in the study.

Table 3. SAR data parameters used in the study.

Table 4. Extraction of waveform information and vegetation indices from multispectral images.

Table 5. Extraction of texture feature variables from multispectral images.

Table 6. Predictors extracted based on SAR data.

Table 12. Results of the screening of biomass predictors from combined multisource remote sensing data.