1. Introduction
The steady increase in atmospheric CO₂ concentration contributes significantly to rising global temperatures and climate change [1,2,3]. Forests play a crucial role in both global climate change [4] and the carbon cycle [5]. Through photosynthesis [1], forest vegetation sequesters approximately one-twelfth of the CO₂ in the atmosphere annually [6], and the majority of this carbon is stored in forest biomass. Therefore, the accurate estimation of regional forest biomass is crucial for climate change studies, global carbon cycle analysis, and the realization of a country’s commitments toward the ‘carbon neutrality’ goal [7]. Typically, forest biomass includes both aboveground and underground components. However, because underground biomass data are difficult to collect in sample plots, current forest biomass estimates focus primarily on aboveground biomass (AGB) [8].
Currently, remote sensing technology [8] is the primary method for estimating AGB at the regional scale. Optical remote sensing and synthetic aperture radar (SAR) data are widely used for this purpose [9]. SAR overcomes the limitations of traditional optical remote sensing [10]: it can penetrate the forest canopy and capture information about the deeper canopy, branches, and trunks, thus providing valuable insight into the vertical structure of the forest [11]. Consequently, SAR offers distinct advantages over optical remote sensing by raising the saturation point and improving the potential for estimating forest AGB.
Dual-frequency SAR holds even greater potential for estimating AGB [12], surpassing the limitations of optical remote sensing and single-frequency SAR in terms of sensor capabilities and penetration. Studies have demonstrated that forest AGB retrieved from the backscatter coefficients of multi-frequency SAR data is more accurate than that retrieved from single-frequency SAR data [3,13,14,15]. SAR polarization decomposition parameters are highly sensitive to changes in forest canopy biomass and AGB, significantly enhancing the accuracy and saturation point of biomass estimation from SAR data [16,17,18,19]. Notably, however, these previous studies did not consider the dominant scattering mechanisms and complementarity of dual-frequency SAR in the estimation of forest AGB, nor did they select features on the basis of the respective advantages of the two frequency bands in terms of backscatter coefficients, polarization decomposition parameters, and scattering mechanisms.
Moreover, the wavelength and penetration capacity of SAR data vary across frequency bands. Consequently, the part of the forest that interacts with the SAR signal, and the prevailing scattering mechanism within the forest, also differ between bands. Short-wave bands (K-, X-, and C-band) primarily capture smaller elements of the forest canopy, including leaves (needles) and small branches [12], while long-wave bands such as the L-band and P-band have strong penetration capabilities that enable them to reach the ground through the canopy. The L-band and P-band interact with different parts of the forest, including the main branches of the canopy, the trunk, and the ground [20,21,22]. The polarization target decomposition method can describe the forest’s scattering mechanism. However, few studies have used polarization decomposition results to analyze the main scattering mechanism of SAR data in different frequency bands in coniferous forests. Since the AGB saturation point estimated from SAR data is influenced by the band, polarization mode, and forest structure [23], L-band data have emerged as the optimal choice for the accurate estimation of forest biomass [24,25]. The C-band is suitable for estimating the canopy biomass of small structures (fine branches and needles) [26]. Therefore, we used fully polarimetric ALOS-2 data in the L-band and GF-3 data in the C-band, combining the advantages of the two bands to improve the potential of dual-frequency AGB estimation.
As the number of frequency bands and SAR biomass sensitivity factors increases, the limited number of forest plots inevitably leads to a small-sample problem and the ‘curse of dimensionality’, which introduces information redundancy and reduces model accuracy [27]. Therefore, feature selection and appropriate modeling methods are crucial for robust, high-precision estimation of forest AGB using multi-band SAR. The stepwise regression method effectively selects the parameters that enter the biomass estimation model and outperforms unary linear models and simple logarithmic and exponential models [22,28]. However, as the number of feature parameters increases, stepwise regression becomes inadequate for capturing the complex relationship between forest AGB and the features. Therefore, some studies have applied non-parametric models based on machine learning algorithms, such as fast iterative feature selection with K-nearest neighbors (KNN), support vector machine (SVM), artificial neural networks (ANNs), and random forest (RF) models, to estimate forest AGB using multi-band spaceborne SAR data [3,15,17,29], and the estimation results are expected to outperform the multivariate linear regression model [30]. However, relying only on machine learning algorithms for adaptive feature selection and modeling can lead to locally optimal feature selection. This study employed feature selection based on the scattering mechanism and sensitivity analysis and used a non-parametric model that combines the advantages of dual-frequency SAR data. The sensitivity analysis explored the potential of the C-band and L-band SAR data for estimating AGB in Larix principis-rupprechtii Mayr forests. Its three main objectives were to investigate the sensitivity to AGB of the backscatter coefficients and polarization decomposition parameters extracted from GF-3 and ALOS-2 SAR data, to determine whether these correlations could help identify the dominant scattering mechanisms within coniferous forests, and to evaluate the efficacy of GF-3 and ALOS-2 SAR data for estimating AGB specifically in pure Larix principis-rupprechtii Mayr forests. The non-parametric random forest adaptive genetic algorithm (RF-AGA) model [31] was used, which combines feature selection and modeling into a single step. By using the minimum root mean square error (RMSE) of the model to control the optimal feature selection and modeling results, this approach effectively avoids becoming trapped in locally optimal solutions.
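The idea of letting a genetic algorithm search over feature subsets while a model error (such as RMSE) acts as the fitness can be sketched as follows. This is a minimal illustration, not the RF-AGA implementation of [31]: the function name `ga_select`, the truncation selection, the one-point crossover, and the rule that raises the mutation rate when the search stalls are all assumptions made for this sketch, and `toy_rmse` is a hypothetical stand-in for the cross-validated RMSE of a random forest.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def ga_select(n_features: int,
              fitness: Callable[[Mask], float],
              pop_size: int = 20,
              generations: int = 30,
              seed: int = 0) -> Tuple[Mask, float]:
    """Minimise `fitness` over binary feature masks (needs n_features >= 2, pop_size >= 4)."""
    rng = random.Random(seed)
    # Initial population: the all-features mask plus random non-empty masks.
    pop: List[Mask] = [tuple([1] * n_features)]
    while len(pop) < pop_size:
        mask = [rng.randint(0, 1) for _ in range(n_features)]
        if not any(mask):
            mask[rng.randrange(n_features)] = 1
        pop.append(tuple(mask))
    best = min(pop, key=fitness)
    best_fit = fitness(best)
    base_rate = 1.0 / n_features
    mut_rate = base_rate
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        parents = ranked[: pop_size // 2]           # truncation selection
        children: List[Mask] = [ranked[0]]          # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(n_features):
                if rng.random() < mut_rate:
                    child[i] = 1 - child[i]         # bit-flip mutation
            if not any(child):                      # never allow an empty feature set
                child[rng.randrange(n_features)] = 1
            children.append(tuple(child))
        pop = children
        gen_best = min(pop, key=fitness)
        gen_fit = fitness(gen_best)
        if gen_fit < best_fit:
            best, best_fit = gen_best, gen_fit
            mut_rate = base_rate                    # progress: return to the base rate
        else:
            mut_rate = min(0.5, mut_rate * 1.5)     # stalled: explore more widely
    return best, best_fit


def toy_rmse(mask: Mask) -> float:
    # Hypothetical fitness: features 0 and 3 are informative, the others only add error.
    missing = (1 - mask[0]) + (1 - mask[3])
    noise = sum(mask) - mask[0] - mask[3]
    return 10.0 * missing + 0.5 * noise


best_mask, best_score = ga_select(8, toy_rmse)
print(best_mask, best_score)
```

In a real application the fitness function would train the chosen regressor on the masked feature matrix and return its validation RMSE, so that feature selection and modeling are evaluated together.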
In existing studies on estimating the forest biomass using machine learning algorithms based on dual-frequency SAR data, the analysis of the advantages, complementarity, and dominant scattering mechanisms of dual-frequency SAR in forests has been neglected. In this context, this paper aims to achieve the following objectives:
- (1) Estimate the dominant scattering mechanisms of the C-band and L-band data in the coniferous forest in north China.
- (2) Assess the advantages and complementarity of the C-band and L-band data in estimating the AGB.
- (3) Improve the potential and saturation point of forest AGB estimation from dual-frequency SAR data by combining feature selection, based on the dominant scattering mechanisms, advantages, and sensitivity analysis of the C-band and L-band, with stepwise regression models and the non-parametric RF-AGA model.
4. Results
4.1. Result of Dominant Scattering Mechanism in C-Band and L-Band Data
In the three-component decompositions, the dominant scattering mechanism of the L-band and C-band in the Larix principis-rupprechtii Mayr forest was volume scattering (across the decomposition methods, the proportion of volume scattering ranged from 44% to 86% in the L-band and from 40% to 72% in the C-band), followed by surface scattering, with double-bounce scattering accounting for the smallest proportion; the exception was the VanZyl3 component decomposition result in the C-band (Figure 3). For all three decomposition methods used in this study, the proportions of volume scattering and double-bounce scattering were smaller in the C-band than in the L-band, and the proportion of surface scattering was larger in the C-band than in the L-band. In the C-band VanZyl3 component decomposition results, surface scattering was the dominant mechanism (47%), followed by volume scattering (40%) and double-bounce scattering (13%). This is because the VanZyl3 component decomposition method corrects the overestimation of volume scattering in the Freeman three-component decomposition method [41], which reduces the proportion of volume scattering and increases the proportion of surface scattering in the C-band, making the VanZyl3 results differ from those of the other decomposition methods.
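The proportions quoted above amount to normalizing the power of each scattering component by the total decomposed power and taking the largest share as the dominant mechanism. A minimal sketch follows; the function name `scattering_shares` is hypothetical, and the example simply reuses the C-band VanZyl3 percentages reported in the text rather than real decomposition rasters.

```python
def scattering_shares(volume, surface, double_bounce):
    """Return each mechanism's share of the total decomposed power and the dominant one."""
    powers = {"volume": volume, "surface": surface, "double_bounce": double_bounce}
    total = sum(powers.values())
    if total <= 0:
        raise ValueError("total scattered power must be positive")
    shares = {name: power / total for name, power in powers.items()}
    dominant = max(shares, key=shares.get)
    return shares, dominant


# C-band VanZyl3 proportions from the text: 40% volume, 47% surface, 13% double-bounce.
shares, dominant = scattering_shares(volume=40.0, surface=47.0, double_bounce=13.0)
print(dominant)  # surface
```

With per-pixel decomposition outputs, the inputs would be the component powers averaged over the forest plots or the forest mask.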
4.2. Result of Sensitivity Analysis
Among the backscatter coefficients of the C-band, only those of the cross-polarization channels (HV and VH) were sensitive to changes in the AGB (Table 1).
In the L-band, the backscatter coefficients of HV and VH were sensitive to changes in the forest biomass (Table 2), as was the backscatter coefficient of the co-polarized HH channel (R = 0.6). Among the single-channel backscatter coefficients of GF-3 and ALOS-2, the HV polarization channel had the highest correlation with AGB (R = 0.49 and R = 0.68, respectively). The sensitivity of backscatter to forest AGB depends on the frequency, polarization, and angle of incidence of the SAR data [47]. Among the combinations of C-band backscatter coefficients, (HV + VH)/2 had the highest correlation with AGB (R = 0.48); combining backscatter coefficients did not significantly improve the correlation, and the HV channel alone remained the most strongly correlated. Among the combinations of backscatter coefficients in the L-band, HH × VV × X (X = (HV + VH)/2) had the highest correlation with the forest AGB (R = 0.7). This combination likely performs best because it merges the backscatter information of all four channels and therefore contains more information about the interaction with AGB. The backscattering of the HV and VH channels in the C-band came almost entirely from volume scattering in the forest canopy, and the backscattering coefficient was highly dependent on the canopy parameters [48]. The backscattering of the L-band HV, VH, and HH channels came mainly from scattering in the canopy and trunks (the contribution of twigs and needles was deemed insignificant) [21,22].
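The sensitivity analysis described above can be illustrated by building a few of the channel combinations and ranking them by the absolute Pearson correlation with plot AGB. The sketch below makes several assumptions: the helper names are hypothetical, only a subset of the combinations is shown, the combinations are formed directly on the supplied values (the source does not state whether dB or linear units were used), and the five plots of data are invented purely to make the example run.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def backscatter_features(hh, hv, vh, vv):
    """Single channels plus a few combinations, with X = (HV + VH)/2."""
    x = [(a + b) / 2 for a, b in zip(hv, vh)]
    return {
        "HH": hh, "HV": hv, "VH": vh, "VV": vv,
        "(HV+VH)/2": x,
        "HH+X": [a + b for a, b in zip(hh, x)],
        "HH*VV*X": [a * b * c for a, b, c in zip(hh, vv, x)],
    }


def rank_by_sensitivity(features, agb):
    """Sort features by |R| with AGB, strongest first."""
    scored = [(name, pearson_r(values, agb)) for name, values in features.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical plot data (not from the study).
agb = [40.0, 80.0, 120.0, 160.0, 200.0]
hh = [0.10, 0.12, 0.13, 0.15, 0.16]
hv = [0.020, 0.026, 0.031, 0.035, 0.040]
vh = [0.021, 0.025, 0.030, 0.036, 0.039]
vv = [0.09, 0.10, 0.12, 0.12, 0.14]

for name, r in rank_by_sensitivity(backscatter_features(hh, hv, vh, vv), agb):
    print(f"{name:10s} R = {r:.3f}")
```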
In the C-band, the volume scattering component exhibited the strongest correlation with the forest AGB across the various decomposition methods, followed by the surface scattering component, while the secondary scattering component showed the lowest correlation with biomass (Table 3). Specifically, the volume scattering components from Freeman3, Anyang4, Yamaguchi4, and Pauli3 were found to be sensitive to changes in the forest AGB, with the volume scattering component from Freeman3 exhibiting the highest correlation with an R value of 0.38. Additionally, the C-band radar vegetation index (RVI) demonstrated a higher sensitivity to changes in the AGB with a correlation coefficient of 0.36.
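The text does not state how the RVI was computed. For orientation only, one widely used definition for fully polarimetric data is given below; it is an assumption that this form applies here, and the backscatter coefficients are taken in linear power units rather than dB.

```latex
\mathrm{RVI} = \frac{8\,\sigma^{0}_{HV}}{\sigma^{0}_{HH} + \sigma^{0}_{VV} + 2\,\sigma^{0}_{HV}}
```

Under this definition the index grows with the cross-polarized return relative to the total power, which is consistent with its sensitivity to canopy volume scattering.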
Comparatively, the correlation between each scattering component of the L-band decomposition methods and forest AGB was higher than that observed for the C-band decomposition methods. However, the volume scattering components in both the C-band and L-band exhibited better sensitivity to forest AGB compared to other scattering components.
Table 4 shows the results of the sensitivity analysis between the L-band polarization decomposition parameters and forest AGB. In the L-band, the volume scattering components (vol) from the Freeman2, Freeman3, AnYang3, and Yamaguchi3 decomposition methods exhibited the strongest correlation with forest AGB. Specifically, the Freeman2 volume scattering component showed a higher correlation with biomass than those of Freeman3 and Yamaguchi3, with an R value of 0.43.
For the L-band VanZyl method, all three components were significantly correlated with the forest AGB, indicating their sensitivity to changes in the AGB. Among these components, the secondary scattering component was the most sensitive to biomass changes (R = 0.43), followed by the volume scattering component (R = 0.39) and the surface scattering component (R = 0.33). After logarithmic transformation and combination of the polarization decomposition parameters, their correlation with AGB improved to varying degrees. After the logarithmic transformation, the volume scattering parameter of the Freeman2 method exhibited the highest correlation with AGB (R = 0.46) and passed the significance test. Furthermore, the L-band backscattering coefficients were more sensitive to AGB than the polarization decomposition parameters.
4.3. Results of Feature Selection
The results of selecting the backscatter coefficients based on correlation analysis were as follows. In the L-band, a total of 13 backscatter coefficients and their combinations were retained: HH, HV, VH, VV, (HV + VH)/2, HH + X, HH + VV + X, VV + X, X − VV, HH − VV − X, HH × X, HH × VV × X, and VV × X (X = (HV + VH)/2). In the C-band, a total of 11 backscatter coefficients and their combinations were retained: HH, HV, VH, VV, (HV + VH)/2, HH/X, VV/X, HH + X, HH − X, X − VV, and HH − VV − X.
Based on the principle of selecting the dominant scattering mechanism, the results of selecting the polarization decomposition parameters were as follows. The dominant scattering mechanisms were retained for each decomposition method in the C-band and L-band, while the secondary scattering parameters of the Freeman3 decomposition method in both the C-band and L-band were removed. Additionally, the helix scattering component in the Anyang4 and Yamaguchi4 decomposition methods as well as the secondary scattering parameters in the C-band Anyang3 decomposition method were excluded.
The results of selecting the polarization decomposition parameters based on correlation analysis were consistent with the results of screening parameters based on the dominant scattering mechanism: the parameters that dominated the scattering were also biomass-sensitive factors and passed the significance test. A total of 91 parameters were selected based on prior knowledge (the dominant scattering mechanism and sensitivity analysis), including 51 C-band parameters and 40 L-band parameters (Table 5).
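The two-stage screening described in this subsection (keep the components of the dominant scattering mechanisms, then require a significant correlation with AGB) can be sketched as below. Everything except the reported R = 0.38 for the C-band Freeman3 volume component and the 50 sample plots is an assumption for illustration: the `Candidate` structure, the other correlation values, the mapping of dominant mechanisms per band, and the use of a two-tailed 5% t-test with a critical value of about 2.011 for 48 degrees of freedom.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    band: str        # "C" or "L"
    mechanism: str   # "volume", "surface", "double_bounce", "helix" or "backscatter"
    r: float         # Pearson correlation with plot AGB


def is_significant(r: float, n_plots: int, t_crit: float) -> bool:
    """t-test for a correlation coefficient: t = |r| * sqrt((n - 2) / (1 - r^2))."""
    if abs(r) >= 1.0:
        return True
    t_stat = abs(r) * math.sqrt((n_plots - 2) / (1.0 - r * r))
    return t_stat > t_crit


def select_features(candidates, dominant, n_plots, t_crit):
    """Keep candidates that match a dominant mechanism and pass the significance test."""
    kept = []
    for cand in candidates:
        mechanism_ok = (cand.mechanism == "backscatter"
                        or cand.mechanism in dominant[cand.band])
        if mechanism_ok and is_significant(cand.r, n_plots, t_crit):
            kept.append(cand.name)
    return kept


# Hypothetical mapping of retained mechanisms per band.
dominant = {"C": {"volume", "surface"}, "L": {"volume", "surface"}}
candidates = [
    Candidate("Freeman3_vol", "C", "volume", 0.38),         # R reported in the text
    Candidate("Freeman3_dbl", "C", "double_bounce", 0.30),  # hypothetical R
    Candidate("weak_surface", "C", "surface", 0.20),        # hypothetical R
]
print(select_features(candidates, dominant, n_plots=50, t_crit=2.011))
# ['Freeman3_vol']
```

The first candidate is kept, the second is dropped because double-bounce is not in the dominant set, and the third is dropped because its correlation is not significant for 50 plots.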
4.4. Results of AGB Model and Validation
The improved forest aboveground biomass model was created by combining the non-parametric model (random forest adaptive genetic algorithm model) with feature selection based on the dominant scattering mechanism and sensitivity analysis, using the GF-3 data in the C-band and ALOS-2 data in the L-band. To test whether combining the C-band and L-band is superior to using either band alone, we fed the C-band data, the L-band data, and the combined C- and L-band data into the non-parametric and stepwise regression models. To test whether first selecting parameters based on the scattering mechanism and sensitivity analysis improves AGB estimation, we separately fed the full combined C- and L-band parameter set and the selected C- and L-band parameter set into the non-parametric and stepwise regression models.
Figure 4 shows the validation results of AGB estimation using the random forest adaptive genetic algorithm (RF-AGA) model and the stepwise regression model. The RF-AGA model that took as input the 91 C-band and L-band parameters selected through the dominant scattering mechanism and sensitivity analysis achieved the highest accuracy in estimating the forest AGB, with the lowest root mean square error (RMSE = 10.42 t/ha) and the best fit (R² = 0.93). Moreover, this model achieved the most effective parameter simplification: the final model included only eight parameters, namely the Yamaguchi3 volume scattering component, entropy 1 of the H/A/alpha eigenvalue set decomposition, lg(Freeman3_vol), and VanZyl3_Dbl for the L-band and
,
, Freeman2_vol, and VanZyl_Odd for the C-band. Compared to models that estimated the AGB using only the C-band or only the L-band parameters, the model combining the C-band and L-band parameters exhibited superior estimation performance within the biomass range of 0–200 t/ha. At biomass levels exceeding 200 t/ha, the model underestimated AGB, although no significant saturation was observed. Feeding the 91 C-band and L-band parameters selected based on prior knowledge (dominant scattering mechanism and sensitivity analysis) into the stepwise regression model (R² = 0.72) yielded better forest AGB estimates than the stepwise regression model without feature selection (176 parameters from the C-band and L-band), particularly at high biomass levels exceeding 150 t/ha. Overall, feature selection based on scattering mechanisms and sensitivity analysis improved the accuracy of the biomass estimates for both the linear and non-parametric models using C- and L-band data.
The model that used both C-band and L-band data, including the backscattering and polarization parameters, estimated biomass better than models using the C-band or L-band data alone. It effectively resolved the significant overestimation within the 50–150 t/ha biomass range, as well as the notable underestimation within the 150–220 t/ha range, that occurred when using individual C-band or L-band data.
This study concluded that the non-parametric RF-AGA model based on feature selection using the C- and L-band data was the best model, outperforming both the stepwise regression model and the RF-AGA model whose inputs were not selected using the dominant scattering mechanisms and sensitivity analysis.
Figure 5 shows the forest biomass map using the RF-AGA model based on feature selection.
The results of model validation (Table 6) indicate that the AGB estimation model using both the C-band and L-band achieved a better fit and higher accuracy than the models using either the C-band or L-band separately.
In general, the AGB values predicted by the models were higher than the ground-measured AGB values within the biomass range of 0–100 t/ha, resulting in positive bias values. However, for the RF-AGA algorithm’s CL model, the individual predicted values were lower than the ground-measured AGB values within the 0–50 t/ha range, yielding negative bias values. For the biomass range of 100–220 t/ha, the predicted values of the stepwise regression model incorporating backscattering and polarization decomposition, as well as those of the RF-AGA models, were consistently lower than the ground-measured biomass values. The exception was the RF-AGA model using the C- and L-band data, which slightly overestimated the ground-measured values within the 100–150 t/ha biomass range, with a bias of 0.08. At the 200–220 t/ha level, the model bias was larger than at other biomass levels because there were fewer training samples, resulting in a poorer fit (Table 7).
These results indicate that the models using features selected with prior knowledge exhibited smaller biases than models without parameter selection. This suggests that parameter selection based on scattering mechanisms and sensitivity analysis, followed by the machine learning RF-AGA algorithm, improves fitting performance for aboveground biomass estimation.
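The validation statistics discussed in this section (RMSE, R², and bias per biomass level) can be computed as in the sketch below. The formulas are assumptions, since the source does not spell them out: R² is taken as the coefficient of determination, and bias as the mean of predicted minus measured AGB within each measured-biomass interval, which matches the sign convention in the text (overestimation gives a positive bias). The plot values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between measured and predicted AGB."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def bias_by_bin(observed, predicted, edges):
    """Mean (predicted - observed) for plots whose measured AGB falls in [lo, hi)."""
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        diffs = [p - o for o, p in zip(observed, predicted) if lo <= o < hi]
        if diffs:
            result[(lo, hi)] = sum(diffs) / len(diffs)
    return result


# Hypothetical measured and predicted AGB (t/ha) for five plots.
measured = [35.0, 80.0, 120.0, 160.0, 210.0]
predicted = [41.0, 86.0, 119.0, 152.0, 190.0]
print(rmse(measured, predicted), r_squared(measured, predicted))
print(bias_by_bin(measured, predicted, [0, 50, 100, 150, 200, 220]))
```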
6. Conclusions
In this study, the backscattering coefficients and polarization decomposition parameters of C- and L-band SAR data were fed into a non-parametric model (RF-AGA) after feature selection based on the dominant scattering mechanism and sensitivity analysis, in order to improve the accuracy and efficiency of dual-frequency SAR estimation of forest AGB with a limited number of sample plots. In the Freeman3, AnYang3, and Yamaguchi3 component decompositions, the dominant scattering mechanism of both the L-band and C-band was volume scattering, followed by surface scattering, with double-bounce scattering accounting for the smallest proportion. In the VanZyl three-component decomposition, the dominant C-band scattering mechanism was surface scattering, followed by volume scattering, with double-bounce scattering again accounting for the smallest proportion. The advantage of the L-band in North China larch forests is that it can fully interact with the main branches and trunks of the trees. The backscattering coefficients of the L-band HV, VH, and HH polarization channels and the parameters of the VanZyl3 component decomposition method are more suitable for estimating the aboveground biomass. The advantage of the C-band is that it can fully interact with the needles and small branches in the canopy. The HV and VH backscatter coefficients, the Pauli three-component decomposition method, and the VanZyl3 component decomposition method are more suitable for C-band estimation of the forest aboveground biomass. Combining the advantages of the C-band and L-band reflects the fine structures, main branches, and trunks of the forest more comprehensively and can raise the saturation point of forest AGB estimation.
The fit and accuracy of the biomass model built with the machine learning RF-AGA algorithm from C-band and L-band parameters selected using prior knowledge (forest dominant scattering mechanism and biomass sensitivity analysis) were better than those of the model built from unselected SAR parameters. After selection, the parameters entered directly into the model produced a good fit, which was verified by leave-one-out cross validation (RMSE = 10.42 t/ha and R² = 0.93). However, this study used only 50 sample plots. Although the results are convincing, more observational data are needed to verify them. The method introduced in this study was applied to coniferous plantations in northern China and has the potential to be adapted to other forest types, pending further validation and analysis. For a specific forest type, the scattering-mechanism advantages of different frequency bands and the biomass-sensitive factors can be analyzed together to improve the potential of the non-parametric model for estimating biomass. The method can also be applied in the future to regional-scale biomass estimation in different climate zones and regions, and the dual-band SAR data can be extended to multi-band SAR data by selecting appropriate frequency bands according to the forest biomass level and forest density in the study area. Combining feature selection with a non-parametric model of dual-frequency SAR to estimate temperate forest biomass can lay the foundation for the quantitative estimation of regional forest biomass and can also become an important part of global carbon cycle modeling and climate change research.
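The leave-one-out cross validation mentioned above holds out each plot in turn, fits the model on the remaining plots, and accumulates the prediction error on the held-out plot. The sketch below shows the procedure with a one-variable least-squares line as a deliberately simple, hypothetical stand-in for the RF-AGA model; the function names and the data are assumptions for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def loocv_rmse(x, y):
    """Leave-one-out RMSE: each plot is predicted by a model fitted without it."""
    squared_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        prediction = slope * x[i] + intercept
        squared_errors.append((prediction - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predictor (e.g. a backscatter combination) and measured AGB in t/ha.
feature = [0.02, 0.03, 0.04, 0.05, 0.06, 0.07]
agb = [38.0, 71.0, 95.0, 128.0, 149.0, 186.0]
print(loocv_rmse(feature, agb))
```

With only 50 plots, as in this study, leave-one-out validation uses nearly all of the data for each fit, which is why it is a common choice for small samples.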