Estimating Forest Aboveground Carbon Storage in Hang-Jia-Hu Using Landsat TM / OLI Data and Random Forest Model

: Dynamic monitoring of carbon storage in forests resources is important for tracking ecosystem functionalities and climate change impacts. In this study, we used multi-year Landsat data combined with a Random Forest (RF) algorithm to estimate the forest aboveground carbon (AGC) in a forest area in China (Hang-Jia-Hu) and analyzed its spatiotemporal changes during the past two decades. Maximum likelihood classiﬁcation was applied to make land-use maps. Remote sensing variables, such as the spectral band, vegetation indices, and derived texture features, were extracted from 20 Landsat TM and OLI images over ﬁve di ﬀ erent years (2000, 2004, 2010, 2015, and 2018). These variables were subsequently selected according to their importance and subsequently used in the RF algorithm to build an estimation model of forest AGC. The results showed the following: of classiﬁcation results maximum likelihood can extract land information e ﬀ ectively. Our land cover classiﬁcation overall between 86.86% and 89.47%. (2) Additionally, our RF models showed good performance in predicting forest AGC, with R 2 from 0.65 to 0.73 in the training and testing phase and a RMSE range between 3.18 and 6.66 Mg / ha. RMSEr in the testing phase ranged from 20.27 to 22.27 with a low model error. (3) The estimation results indicated that forest AGC in the past two decades increased with density at 10.14 Mg / ha, 21.63 Mg / ha, 26.39 Mg / ha, 29.25 Mg / ha, and 44.59 Mg / ha in 2000, 2004, 2010, 2015, and 2018. The total forest AGC storage had a growth rate of 285%. (4) Our study showed that, although forest area decreased in the study area during the time period under study, the total forest AGC increased due to an increment in forest AGC density. However, such an e ﬀ ect is overridden in the vicinity of cities by intense urbanization and the loss of forest covers. Our study demonstrated that the combined use of remote sensing data and machine learning techniques can improve our ability to track the forest changes in support of regional natural resource management practices.

Hang-Jia-Hu is located in the northwestern part of Zhejiang Province, China, ranging from 118°50′15′′ E to 121°19′6′′ E and from 29°42′52′′ N to 31°11′53′′ N ( Figure 1). Its climate is subtropical monsoon and the annual average temperature is 15-18 °C, with an average annual precipitation of about 1100 mm. The 18.1 million hectares study area administratively covered the entire Huzhou City and Jiaxing City and the northeastern part of Hangzhou City. Most forest in this place is distributed in the southwest of the study area with the main forest types being broad-leafed forest (BLF), coniferous forest (CNF), and bamboo forest (BMF).

Processing Landsat Times Series Products
30-m multispectral data of Landsat5 TM (2000,2004,2010) and Landsat8 OLI (2015, 2018) were downloaded from the United States Geological Survey (USGS). We selected cloud-free images from the years with ground observations for four scenes that cover our study area (Table 1). Few scenes image contain a high amount of cloud, but the region covering the study area is cloudless. Additionally, due to the poor image quality from 2014, we used the images in 2015 to correspond with field data in 2014.

Processing Landsat Times Series Products
30-m multispectral data of Landsat5 TM (2000,2004,2010) and Landsat8 OLI (2015, 2018) were downloaded from the United States Geological Survey (USGS). We selected cloud-free images from the years with ground observations for four scenes that cover our study area (Table 1). Few scenes image contain a high amount of cloud, but the region covering the study area is cloudless. Additionally, due to the poor image quality from 2014, we used the images in 2015 to correspond with field data in 2014.
Satellite remote sensed data are easily influenced by water vapor, aerosol, bidirectional reflection, and data transmission, which will result in serious fluctuations of time series data and influence the desired effect in data analysis [50,51]. Therefore, this study applied the Fast line-of-Sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) [52][53][54] to eliminate such atmospheric interference in each image. A digital elevation model (DEM) [55,56] was used to make terrain corrections, as terrain factors may affect the brightness values of original imagery. In addition to the original image bands, we calculated vegetation indices and texture variables to use as input data to the forest AGC model. The vegetation indices [57][58][59] included in this study included: NDVI (Normalized Difference Vegetation Index), SAVI (Soil Adjusted Vegetation Index), SR (Simple Ratio Index), DVI (Difference Vegetation Index), and EVI (Enhanced Vegetation Index). Additionally, the texture variables based on the gray-level co-occurrence matrix (GLCM) [60,61] included Mean, Variance, Homogeneity, Contrast, Dissimilarity, Entropy, Angular second moment, and Correlation [27,58,59,[62][63][64] with different windows (3 × 3, 5 × 5, 7 × 7, 9 × 9, and 11 × 11) [27,65,66]. There were 251 total variables derived with the details that are shown in Table 2.
is the ith row of the jth column in the Nth moving  [20,27] in Zhejiang province. The investigation method of NFI was systematic sampling, which is usually evenly placed at the intersection of kilometer grids of 1:50,000 topographic maps. Each plot size is 28.5 m × 28.5 m. Forest AGC data that were used in 2018 were derived from a field survey in July, 2019. Each plot of 30 m × 30 m was placed to cover a homogenous area of bamboo, inside which the number of trees and average diameter at breast height (DBH) were measured. After calculating the bamboo biomass of each plot by using the growth model, bamboo AGC was calculated by using the conversion coefficients between biomass and carbon stocks. Classification verification data in 2018 were manually and evenly selected from the result of unsupervised classified data and visual interpretation.
Land use in this paper was classified into six types: urban, water, cultivated land (CTL), BLF, CNF, and BMF based on the data of the NFI and field survey [26]. Maximum likelihood [68,69] was applied to make the land use classification map. The classification training samples in this paper are uniformly and evenly selected by the visual interpretation based on the land use spectral reflectance characteristics. The training samples of each land use type were consistent (400 pixels) and they could cover the study area. The total number of classification verification samples in 2000, 2004, 2010, 2014, and 2018 were 171, 175, 191, 156, and 240, respectively, and the numbers of forest verification samples were 120, 116, 137, 106, and 160. Table 3 details forest AGC plots used to construct the AGC estimation model from 2000 to 2018.

Random Forest and AGC Estimation
Breiman first made formal definition of the RF in 2001, which is a bagging of uncorrelated CART trees learned with randomized node optimization [70][71][72]. First, the algorithm generates N bootstrap samples for the training dataset. Second, a regression tree model is built for each bootstrap sample. Finally, the predicted results are obtained by averaging the predictions from all individual regressions [20]. RF has three important parameters: (1) the number of random regression trees (Ntree); (2) number of variables to be randomly sampled at each node in a tree (Mtry), used to search for the variable that best partitions samples in the training data set and the default number is 1/3 of input variables [73]; and, (3) the minimum number of terminal nodes (Nodesize) where the default value is 5 in regression analysis [8,49].
Forest AGC samples of different years were divided into two groups to optimize the RF model in this study (Table 3), with one group of data accounting for 70% samples represented training samples of the RF model and the remaining 30% represented test samples to test the model accuracy. Training data were also used to evaluate the importance of input variables to forest AGC and select input variables for constructing the RF model. Forest in this paper includes three types and each of them has corresponding forest AGC plots in 2000, 2004, and 2010. In 2014 and 2018, the AGC plots only have the forest type of BMF. Therefore, we decided to use data in 2000, 2004, and 2010 to construct individual AGC estimation models by using the forest non-stratification method [65] and applied the corresponding model to simulate forest AGC in 2000, 2004 and 2010. Subsequently, these three models are applied to predict forest AGC in 2015 and 2018 and the BMF AGC plots in 2014 and 2018 are used for verification. The estimation result in 2015 and 2018 with the highest accuracy in the verification phase will be chosen as the final simulation result. In this paper, we used "RandomForest" package in the R statistical software [74,75] to construct the forest AGC model. During the process of applying the model established, we put the entire image data into the model and run it, and use Python to read the model result into image format, and then perform spatial analysis.

Accuracy Assessment
The determination coefficient (R 2 ), root mean square error (RMSE), and relative RMSE (RMSEr) [27,65] were used to evaluate the accuracy of the RF model. A higher accuracy is indicated by the higher values of R 2 and lower values of RMSE and RMSEr. R 2 , RMSE, and RMSEr were calculated, as follows: In Equations (1)   The determination coefficient (R 2 ), root mean square error (RMSE), and relative RMSE (RMSEr) [27,65] were used to evaluate the accuracy of the RF model. A higher accuracy is indicated by the higher values of R 2 and lower values of RMSE and RMSEr. R 2 , RMSE, and RMSEr were calculated, as follows:  is the average value of observed plots; is the average value of forest AGC in testing samples; and, i is the number of samples.    Table 4 shows the overall accuracy of classification by using confusion matrix. It illustrates that the overall classification accuracies in different years were above 86.86%, and the highest was 89.47% with a kappa coefficient above 0.84 and a high of 0.87. The classification accuracies of the BLF, BMF, and CNF were between 86.67% and 89.62%. Classification verification results show a high level of precision. Based on this high precision classification result, we extracted BLF, BMF and CNF separately and use it to mask the forest AGC estimation results.

Parameters Optimization of RF
Training data were used to input into RF to traverse all of the variables values and finally obtain the optimal parameters. Figure 3 shows the results. Figure 3a presents Mtry, which is used to determine the minimum amounts of variables in each tree to construct RF. As can be seen from Figure 3a, when Mtry is a certain value, the model error is the smallest, which is the minimum Mtry value that is required. Ntree in Figure 3b shows that model error tends to be stable when the Mtry is big enough. Additionally, we used a 10-fold cross-validation to observe the effect of the number of variables on the model error. Results in Figure 3c represent the variation trend of average model errors in this process. It shows that the variables in the model are not as good as possible. When the model traverses all possibilities, the model error will tend to be stable until a certain amount, when the value of the model error reaches a minimum. Table 5 lists specific settings for different parameter values of different models.  Table 4 shows the overall accuracy of classification by using confusion matrix. It illustrates that the overall classification accuracies in different years were above 86.86%, and the highest was 89.47% with a kappa coefficient above 0.84 and a high of 0.87. The classification accuracies of the BLF, BMF, and CNF were between 86.67% and 89.62%. Classification verification results show a high level of precision. Based on this high precision classification result, we extracted BLF, BMF and CNF separately and use it to mask the forest AGC estimation results.

Parameters Optimization of RF
Training data were used to input into RF to traverse all of the variables values and finally obtain the optimal parameters. Figure 3 shows the results. Figure 3 (a) presents Mtry, which is used to determine the minimum amounts of variables in each tree to construct RF. As can be seen from Figure  3 (a), when Mtry is a certain value, the model error is the smallest, which is the minimum Mtry value that is required. Ntree in Figure 3 (b) shows that model error tends to be stable when the Mtry is big enough. Additionally, we used a 10-fold cross-validation to observe the effect of the number of variables on the model error. Results in Figure 3 (c) represent the variation trend of average model errors in this process. It shows that the variables in the model are not as good as possible. When the model traverses all possibilities, the model error will tend to be stable until a certain amount, when the value of the model error reaches a minimum. Table 5 lists specific settings for different parameter values of different models.

Variable Importance and Autocorrelation
The RF model in different years was ran 100 times each to observe changes in variables importance. We listed the first 20 variables with importance scores of different models in Figure 4. Among all of the variables used in the three models, most of the variables are texture features, while the original band and vegetation index account for a small proportion. Upon viewing of variable selection in different models, we found that 80% of the variables are the mean of the texture features in 2000 model. It reveals that the texture mean at different windows of the original band is important for estimating forest AGC. In the 2004 model, the sixth band of the original band occupies 45% of the overall variables, followed by the third band at 40%. Additionally, the texture window of 11*11 accounts for 65% of the total variables. It illustrated the importance of the texture features of the third and sixth band at 11 × 11 windows to the estimation. In the 2010 model, the mean of the texture features occupies 55% of the total and the sixth band has a selection probability of 35%, which exceeds other variables. As for the importance scores, in the model of 2000, the influence of Tb2w7ME on forest AGC estimation is the largest at an average of 14.83, followed by Tb2w5ME at 14.73. For 2004, Tb311HO has the largest influence of 8.66, followed by Tb3w9DI with score at 8.61. Tb6w7VA has the largest effect on forest carbon prediction at a score of 13.67 and is followed by Tb6w11ME at 12.01.

Variable Importance and Autocorrelation
The RF model in different years was ran 100 times each to observe changes in variables importance. We listed the first 20 variables with importance scores of different models in Figure 4. Among all of the variables used in the three models, most of the variables are texture features, while the original band and vegetation index account for a small proportion. Upon viewing of variable selection in different models, we found that 80% of the variables are the mean of the texture features in 2000 model. It reveals that the texture mean at different windows of the original band is important for estimating forest AGC. In the 2004 model, the sixth band of the original band occupies 45% of the overall variables, followed by the third band at 40%. Additionally, the texture window of 11*11 accounts for 65% of the total variables. It illustrated the importance of the texture features of the third and sixth band at 11 × 11 windows to the estimation. In the 2010 model, the mean of the texture features occupies 55% of the total and the sixth band has a selection probability of 35%, which exceeds other variables. As for the importance scores, in the model of 2000, the influence of Tb2w7ME on forest AGC estimation is the largest at an average of 14.83, followed by Tb2w5ME at 14.73. For 2004, Tb311HO has the largest influence of 8.66, followed by Tb3w9DI with score at 8.61. Tb6w7VA has the largest effect on forest carbon prediction at a score of 13.67 and is followed by Tb6w11ME at 12.01. Figure 4. Importance of the first 20 variables measured in %InMSE (the percentage increase in the mean squared error) from 100 runs of the RF. Note: Tbiwjxx, a texture image developed using the texture measure xx (xx can be such texture measures as ME, VA, HO, CON, DI, EN, SE, COR) on spectral band i (bandi) with a window size of j × j pixel (wj).
We calculated the frequency of the top 100 variables in the different RF models to further understand the proportion of variables in the models. In the 2000 RF model, texture information accounted for 92%, of which the fourth band has the most texture information, while the highest window size is w9, and ME accounts for 28.3% of all texture information. Texture information in the 2004 model occupies 98% of the total variables. The number of textures in the third band is up to 24.5%. W7 and ME have a major advantage in terms of window size and texture value. In 2010, 93% of the texture information in the first band accounted for 27.6%, and w7 is also the window with the most selection. The ME value has the same status as the previous two models in selection to establish forest AGC models.
RF, which is widely used to estimate forest parameters while using remote sensing data, can effectively solve the multicollinearity problems of complex variables in traditional statistical regression models [20,76]. Figure 5 shows the collinear relationship between the first 20 variables. A weak correlation indicates that the RF model could solve the collinearity between variables and the overfitting problem to ensure the accuracy of the forest AGC estimation. We selected the optimal Figure 4. Importance of the first 20 variables measured in %InMSE (the percentage increase in the mean squared error) from 100 runs of the RF. Note: Tbiwjxx, a texture image developed using the texture measure xx (xx can be such texture measures as ME, VA, HO, CON, DI, EN, SE, COR) on spectral band i (bandi) with a window size of j × j pixel (wj).
We calculated the frequency of the top 100 variables in the different RF models to further understand the proportion of variables in the models. In the 2000 RF model, texture information accounted for 92%, of which the fourth band has the most texture information, while the highest window size is w9, and ME accounts for 28.3% of all texture information. Texture information in the 2004 model occupies 98% of the total variables. The number of textures in the third band is up to 24.5%. W7 and ME have a major advantage in terms of window size and texture value. In 2010, 93% of the texture information in the first band accounted for 27.6%, and w7 is also the window with the most selection. The ME value has the same status as the previous two models in selection to establish forest AGC models.
RF, which is widely used to estimate forest parameters while using remote sensing data, can effectively solve the multicollinearity problems of complex variables in traditional statistical regression models [20,76]. Figure 5 shows the collinear relationship between the first 20 variables. A weak correlation indicates that the RF model could solve the collinearity between variables and the overfitting problem to ensure the accuracy of the forest AGC estimation. We selected the optimal number of variables and the optimal variables to build models for different years based on the optimized parameters. number of variables and the optimal variables to build models for different years based on the optimized parameters.

Estimation and Evaluation of Forest AGC
We extracted the AGC estimation for the forest covers based on the classification results of the maximum likelihood method and simulation results of the RF model established above ( Figure 6).

Estimation and Evaluation of Forest AGC
We extracted the AGC estimation for the forest covers based on the classification results of the maximum likelihood method and simulation results of the RF model established above ( Figure 6). number of variables and the optimal variables to build models for different years based on the optimized parameters.

Estimation and Evaluation of Forest AGC
We extracted the AGC estimation for the forest covers based on the classification results of the maximum likelihood method and simulation results of the RF model established above ( Figure 6).  The model performance is evaluated with scatterplots of predicted AGC against the observed data (Figure 7). In the training phase (A) using data from 2000, 2004, and 2010, the models yielded R 2 that ranged from 0.69 to 0.73 (p < 0.01) and RMSE from 3.18 Mg/ha to 4.84Mg/ha. In the testing phase

Spatiotemporal Evolution of Forest AGC
The total forest area in the study region decreased in the past two decades according to the land use classifications in different years. However, forest AGC has gradually increased from 2000 to 2018 in many areas, especially in the west of Hangzhou and southwest of Huzhou. We calculated statistics of forest area, forest AGC, and total AGC storage to understand the change and relationship between forest area, forest AGC, and the total forest AGC storage in Hang-Jia-Hu during the past two dacades (Figure 8). The total forest area only increased from 2000 to 2004 and then gradually decreased afterwards. The total amount of forest area was reduced by 77,713.92 ha with the largest declines

Spatiotemporal Evolution of Forest AGC
The total forest area in the study region decreased in the past two decades according to the land use classifications in different years. However, forest AGC has gradually increased from 2000 to 2018 in many areas, especially in the west of Hangzhou and southwest of Huzhou. We calculated statistics of forest area, forest AGC, and total AGC storage to understand the change and relationship between forest area, forest AGC, and the total forest AGC storage in Hang-Jia-Hu during the past two dacades (Figure 8). The total forest area only increased from 2000 to 2004 and then gradually decreased afterwards. The total amount of forest area was reduced by 77,713.92 ha with the largest declines occurring following 2004 and 2010. The forest AGC density increased 3.4 times, from 10.14 Mg/ha in 2000 to 44.59 Mg/ha in 2018, and the growth rate was the highest between 2015 and 2018. The total forest AGC storage did not decline due to the reduction of forest area, but it followed the same trend as forest AGC density. It increased from 641.38 Mg C in 2000 to 2472.51 Mg C in 2018, with a growth rate of 285%. This reflects that forest AGC density is more influential on to the total forest AGC storage when compared to the forest area. occurring following 2004 and 2010. The forest AGC density increased 3.4 times, from 10.14 Mg/ha in 2000 to 44.59 Mg/ha in 2018, and the growth rate was the highest between 2015 and 2018. The total forest AGC storage did not decline due to the reduction of forest area, but it followed the same trend as forest AGC density. It increased from 641.38 Mg C in 2000 to 2472.51 Mg C in 2018, with a growth rate of 285%. This reflects that forest AGC density is more influential on to the total forest AGC storage when compared to the forest area.    occurring following 2004 and 2010. The forest AGC density increased 3.4 times, from 10.14 Mg/ha in 2000 to 44.59 Mg/ha in 2018, and the growth rate was the highest between 2015 and 2018. The total forest AGC storage did not decline due to the reduction of forest area, but it followed the same trend as forest AGC density. It increased from 641.38 Mg C in 2000 to 2472.51 Mg C in 2018, with a growth rate of 285%. This reflects that forest AGC density is more influential on to the total forest AGC storage when compared to the forest area.   Overall, forest AGC in most of the region is increasing, and the increase is mainly distributed in the western part of the study area. Furthermore, from the above data analysis, forest AGC is increasing during the past two decades. However, some areas showed a decrease in forest AGC, the reduction phenomenon. Such decline in forest AGC mainly occurred near the center of Huzhou and towards the south of Hangzhou in Fuyang District. The magnitude of decline was mainly between −10 Mg/ha and −20 Mg/ha. The reduction of forest AGC in these areas was mounted to 240,660.36 ha.   Overall, forest AGC in most of the region is increasing, and the increase is mainly distributed in the western part of the study area. Furthermore, from the above data analysis, forest AGC is increasing during the past two decades. However, some areas showed a decrease in forest AGC, the reduction phenomenon. Such decline in forest AGC mainly occurred near the center of Huzhou and towards the south of Hangzhou in Fuyang District. The magnitude of decline was mainly between −10 Mg/ha and −20 Mg/ha. The reduction of forest AGC in these areas was mounted to 240,660.36 ha.  Moreover, urban forests have received increasing attention in recent years due to their large C sequestration capability and the positive impact on the environment, especially in areas with rapid urban expansion. According to our land use classification, the urban area in the Hang-Jia-Hu region has expanded into suburbs of the cities over the past 20 years, especially around the metropolitan Moreover, urban forests have received increasing attention in recent years due to their large C sequestration capability and the positive impact on the environment, especially in areas with rapid urban expansion. According to our land use classification, the urban area in the Hang-Jia-Hu region has expanded into suburbs of the cities over the past 20 years, especially around the metropolitan areas, such as Hangzhou and Jiaxing. Extensive decreases of forest AGC were particularly evident in Huzhou City, but not clearly distinguishable in other cities. Therefore, to reveal more details regarding the urban forest AGC change, we used 15-m pan-sharpened Landsat images to analyze the changes of land use in Hangzhou City and Jiaxing City (Figure 11). We found that the urban growth took place in the periphery of Jiaxing and mainly towards the northwest of Hangzhou. In Hangzhou, forest AGC decreased along with intense urban growth in the northwest plains of the City. The decline of forest area due to urbanization reached 3157 ha, which corresponded to an AGC storage loss of 25,309 Mg C. In Jiaxing, forest AGC reduced in all surrounding areas, where the pattern is consistent with the direction of the urban growth. The forest coverage in Jiaxing was reduced by 3078 ha and the forest AGC by 20,701 Mg C as a result of urban growth. It is clearly seen that, although the urban expansion mainly replaced the cultivated lands, it also significantly affected forests.

Discussion
In our modeling exercise using RF, we found that land surface texture information is the most important variables of forest AGC. We calculated the correlation between the variables of the corresponding optimal number of variables and forest AGC in different models, and presented a scatter plot of the first eight variables and forest AGC in three models of the training phase to further understand the specific correlation between variables and forest AGC ( Figure 12). The results revealed that R 2 between the input variables in RF and forest AGC was in the range of 0.13 and 0.32 in the model of 2000, 2004, and 2010. It further reflected that the variables that are involved in RF are closely related with forest AGC, which means that texture information is one of the most essential factors in the process of simulating forest AGC. Furthermore, it also reflected that remote sensing technology, in combination with machine learning algorithms, are an effective means to monitor forest resources in a precise and timely manner at large spatial scales.
However, there are still defects and deficiencies. Firstly, the temporal mismatch of remote sensing and field survey data due to the lack of cloud free Landsat scenes corresponding to the exact times of ground observation can lead to estimation error in the results. Furthermore, different types of forest surfaces have similar spectral signatures, which could have resulted in misclassification [77]. The verification accuracy of the classification results indicates that the BMF is the main reason for the precision declining. Secondly, in the verification phase of simulation results, accuracy in 2015 and 2018 were all below the accuracy in the other three models by a degree. Parameter settings for different modeling due to different raw data vary from year to year. Additionally, different parameter settings will result in the model being only applicable to the simulation of forest AGC in the input data year, but not in other years. Finally, RF also has certain flaws in simulation due to the great requirements for input data. Spatiotemporal distribution of the simulation results show that forest AGC in different periods is concentrated in a certain range (fluctuating around the average of training data), which means that RF has a poor simulation effect on the data at both ends of a dataset, which is consistent with the research result of Gao et al. [65]. High simulation accuracy seems to show that RF is best suited to simulating AGC at certain range, and making the simulation results have the same data distribution as the input data is a critical step. Studies have shown that a combination of multi-source remote sensed data can improve the accuracy of estimation in forest biomass or forest AGC [78]. Furthermore, forest AGC can be simulated with other types of data, such as meteorological data, topography, and soil data [79]. The accuracy of forest AGC simulation can be further improved when considering multivariate remote sensing data and other various impact factors. technology, in combination with machine learning algorithms, are an effective means to monitor forest resources in a precise and timely manner at large spatial scales.

Conclusions
Based on the remotely sensed and ground survey data, we used the optimized RF models to simulate the distribution patterns and temporal changes of forest AGC in the Hang-Jia-Hu region. Classification by using a maximum likelihood method can present well the land use cover. Additionally, our models estimated forest AGC for the study area in different years with relatively high accuracies. The standardized residuals of the testing samples in different models (Figure 13) are all in the range of −2 to 2, which further illustrates that the optimized RF model has good stability and reliability in predicting forest AGC. With the estimation results, we found that, during the study period from 2000 to 2018, although the total forest area has decreased, the total AGC storage has significantly increased as a result of the increased AGC density. However, forest AGC has decreased near the cities due to the urban expansion in spite of the increase of forest AGC over the entire region.
is consistent with the research result of Gao et al. [65]. High simulation accuracy seems to show that RF is best suited to simulating AGC at certain range, and making the simulation results have the same data distribution as the input data is a critical step. Studies have shown that a combination of multi-source remote sensed data can improve the accuracy of estimation in forest biomass or forest AGC [78]. Furthermore, forest AGC can be simulated with other types of data, such as meteorological data, topography, and soil data [79]. The accuracy of forest AGC simulation can be further improved when considering multivariate remote sensing data and other various impact factors.

Conclusions
Based on the remotely sensed and ground survey data, we used the optimized RF models to simulate the distribution patterns and temporal changes of forest AGC in the Hang-Jia-Hu region. Classification by using a maximum likelihood method can present well the land use cover. Additionally, our models estimated forest AGC for the study area in different years with relatively high accuracies. The standardized residuals of the testing samples in different models (Figure 13) are all in the range of −2 to 2, which further illustrates that the optimized RF model has good stability and reliability in predicting forest AGC. With the estimation results, we found that, during the study period from 2000 to 2018, although the total forest area has decreased, the total AGC storage has significantly increased as a result of the increased AGC density. However, forest AGC has decreased near the cities due to the urban expansion in spite of the increase of forest AGC over the entire region.