4.1. Variable Selection
Selecting appropriate input variables is an important step before modeling forest AGB. The extracted variables fell into four main types: spectral indices, polarization indices, topographic factors, and texture features.
Our work found that the HH/HV backscattering ratio of ALOS-2 PALSAR-2 was highly sensitive to plant growth owing to its associated scattering mechanisms, which agrees with the study by Golshani et al. [43]. They demonstrated that ALOS-2 PALSAR-2 L-band data are sensitive to forest AGB because the signal penetrates deeply into the canopy and interacts with large woody branches, trunks, and the ground surface, and that these data can be used to estimate biomass accurately. Their results also confirmed that SAR data are valuable for forest biomass mapping and are more efficient when employed together with optical images. Topographic factors (elevation) also played a vital role in this study because hydrothermal conditions varied across different regions. The topographic factors derived from the 12.5 m ALOS DEM yielded high AGB estimation accuracy, which is consistent with the study by Karabork et al. [44].
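The HH/HV ratio discussed above is straightforward to derive once the HH and HV bands are radiometrically calibrated. A minimal sketch follows, assuming both bands are co-registered sigma-nought values in dB; the array values and the helper names `db_to_linear` and `hh_hv_ratio` are hypothetical, and calibration and speckle filtering are not shown.

```python
import numpy as np


def db_to_linear(db):
    """Convert backscatter values from decibels to linear power units."""
    return 10.0 ** (np.asarray(db, dtype=float) / 10.0)


def hh_hv_ratio(hh_db, hv_db, eps=1e-12):
    """Return the HH/HV backscatter ratio in linear units and in dB.

    hh_db, hv_db: co-registered sigma-nought arrays already calibrated to dB
    (an assumption; radiometric calibration is not shown here).
    """
    hh_lin = db_to_linear(hh_db)
    hv_lin = db_to_linear(hv_db)
    # Guard against division by zero in no-data or shadow pixels
    ratio_lin = hh_lin / np.maximum(hv_lin, eps)
    # In dB the ratio reduces to a difference: HH_dB minus HV_dB
    ratio_db = 10.0 * np.log10(ratio_lin)
    return ratio_lin, ratio_db


# Hypothetical 2 x 2 patches of calibrated backscatter (dB)
hh = np.array([[-7.0, -8.5], [-6.2, -9.1]])
hv = np.array([[-13.0, -15.5], [-12.2, -16.1]])
ratio_lin, ratio_db = hh_hv_ratio(hh, hv)
print(np.round(ratio_db, 2))
```

Because the ratio is formed in linear power units, its dB form equals HH_dB minus HV_dB, so either representation can be fed to the model as long as it is used consistently.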
In our study, the texture variables (PC1 and mean77) performed well in AGB estimation, which is consistent with the studies by Zhou et al. [45] and Li et al. [46]. We found that texture extraction from PC1 images can reduce spectral saturation problems and performs excellently in forest AGB estimation, which is consistent with the results of Su et al. [47]. Zhou et al. [45] pointed out that texture information is affected by edge effects in image classification, which may reduce classification accuracy, and that texture information yields higher accuracy as the window size increases, up to a threshold. Li et al. [46] reported that texture variables extracted with a 7 × 7 window performed well for AGB estimation in their study area in Hunan Province, which aligns with our findings.
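The texture workflow described above (a first principal component followed by a grey-level co-occurrence matrix (GLCM) statistic in a 7 × 7 window) can be sketched as follows. This is an illustration under stated assumptions, not the software used in this study: we read "mean77" as the GLCM mean in a 7 × 7 window, and the quantization to 16 grey levels, the single horizontal offset, the random input bands, and all helper names are hypothetical choices.

```python
import numpy as np


def first_principal_component(bands):
    """Return the PC1 image of a (n_bands, rows, cols) stack; the sign of PC1 is arbitrary."""
    n, rows, cols = bands.shape
    X = bands.reshape(n, -1).T.astype(float)  # pixels x bands
    X -= X.mean(axis=0)                       # centre each band
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return (X @ vt[0]).reshape(rows, cols)    # project onto the leading axis


def quantize(img, levels=16):
    """Rescale an image to integer grey levels 0 .. levels - 1."""
    lo, hi = img.min(), img.max()
    q = np.floor((img - lo) / (hi - lo + 1e-12) * levels).astype(int)
    return np.clip(q, 0, levels - 1)


def glcm_mean(window, levels=16, dr=0, dc=1):
    """GLCM mean of one window for a single non-negative offset (dr, dc), symmetric and normalized."""
    P = np.zeros((levels, levels))
    rows, cols = window.shape
    for i in range(rows - dr):
        for j in range(cols - dc):
            a, b = window[i, j], window[i + dr, j + dc]
            P[a, b] += 1
            P[b, a] += 1  # count each pair in both directions
    P /= P.sum()
    return float((np.arange(levels)[:, None] * P).sum())  # sum_i sum_j i * P(i, j)


def glcm_mean_map(q, win=7, levels=16):
    """Slide a win x win window over a quantized image (edges reflected)."""
    half = win // 2
    padded = np.pad(q, half, mode="reflect")
    out = np.empty(q.shape)
    for r in range(q.shape[0]):
        for c in range(q.shape[1]):
            out[r, c] = glcm_mean(padded[r:r + win, c:c + win], levels)
    return out


# Hypothetical 4-band reflectance patch standing in for Sentinel-2 bands
rng = np.random.default_rng(0)
bands = rng.random((4, 20, 20))
pc1 = first_principal_component(bands)
texture_mean77 = glcm_mean_map(quantize(pc1, levels=16), win=7, levels=16)
```

Production tools usually average several offsets and run much faster than these explicit loops; a single offset keeps the sketch short. Because the sign of PC1 is arbitrary, scenes processed separately may need a consistent sign convention before their textures are combined.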
4.2. Comparison Between Models
In this study, we first compared the basic RF (R² = 0.59) and CNN-Transformer (R² = 0.69) models for AGB estimation in Lishui City, Zhejiang Province. Based on the validation metrics from ten-fold cross-validation, CNN-Transformer was selected and combined with subsequent Kriging interpolation, yielding CNN-Transformer-OK (R² = 0.69) and CNN-Transformer-CK (R² = 0.72). As shown in Table 10, the CK model outperformed both the OK model and the basic CNN-Transformer model. Hence, co-Kriging was adopted as an additional step to improve the AGB prediction accuracy of the basic CNN-Transformer model, and CNN-Transformer-CK was used to produce the final forest AGB distribution map of Lishui.
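The R² values compared above come from ten-fold cross-validation. The sketch below shows one way to compute out-of-fold R² and RMSE, assuming R² is pooled over all out-of-fold predictions (averaging per-fold R² is an equally common alternative). Because the RF and CNN-Transformer models cannot be reproduced here, an ordinary least-squares model (`linear_fit_predict`, hypothetical) stands in, and the data are synthetic.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def kfold_oof_predictions(X, y, fit_predict, k=10, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    oof = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        oof[test_idx] = fit_predict(X[train_idx], y[train_idx], X[test_idx])
    return oof


def linear_fit_predict(X_train, y_train, X_test):
    """Hypothetical stand-in model: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


# Synthetic data with 398 samples, matching the plot count in this study
rng = np.random.default_rng(42)
X = rng.random((398, 5))
y = 50.0 + X @ np.array([30.0, 20.0, 10.0, 5.0, 0.0]) + rng.normal(0.0, 5.0, 398)
oof = kfold_oof_predictions(X, y, linear_fit_predict, k=10)
print(f"10-fold R2 = {r2_score(y, oof):.2f}, RMSE = {rmse(y, oof):.2f} t/ha")
```

Any model exposing the same `fit_predict(X_train, y_train, X_test)` signature could be plugged in, which keeps the fold assignment identical across the models being compared.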
The residuals from each fold of the CNN-Transformer model showed a clear trend: small AGB observations tended to be overestimated, whereas large observations tended to be underestimated. For plots with observed AGB above 100 t/ha, the predicted values were smaller than the observed AGB, which is consistent with the results of several studies [48,49,50]. Golshani et al. [43] reported that the saturation point of ALOS-2 PALSAR-2 was approximately 300 t/ha. Because the underestimation in our study already appeared above about 100 t/ha, well below this point, the saturation problem may be attributed mainly to the optical images. Su et al. [47] and Gao et al. [51] found that the saturation points of optical images range from 15 to 70 t/ha and that vegetation indices can mitigate the saturation problem. We therefore infer that combining Sentinel-2 with ALOS-2 PALSAR-2 in our study avoided a wide margin of underestimation and kept saturation within an acceptable range.
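A simple way to check for the over/underestimation pattern described above is to average the residuals (predicted minus observed) within AGB classes. The sketch below uses hypothetical class edges (50 and 100 t/ha) and synthetic predictions that shrink toward the mean, mimicking regression toward the mean; it is illustrative and not the analysis performed in this study.

```python
import numpy as np


def binned_bias(observed, predicted, edges):
    """Mean residual (predicted - observed) within AGB classes defined by edges."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    classes = np.digitize(observed, edges)  # class 0 lies below edges[0], and so on
    bias = []
    for c in range(len(edges) + 1):
        in_class = classes == c
        if in_class.any():
            bias.append(float(np.mean(predicted[in_class] - observed[in_class])))
        else:
            bias.append(float("nan"))  # empty class
    return bias


# Synthetic plots: predictions shrink toward the mean by a factor of 0.7
observed = np.array([20.0, 40.0, 60.0, 80.0, 100.0, 120.0, 150.0])
predicted = observed.mean() + 0.7 * (observed - observed.mean())
bias = binned_bias(observed, predicted, edges=[50.0, 100.0])
print([round(b, 2) for b in bias])
```

A positive bias in the lowest class and a negative bias in the highest class is the signature of the shrinkage described above.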
In this study, both Kriging methods (CK and OK) slightly improved prediction accuracy compared with the basic CNN-Transformer model, and they account for the spatial correlation that the basic model leaves unexplained. Co-Kriging of the CNN-Transformer model's residuals performed better than Ordinary Kriging. Although the residuals of the CNN-Transformer model showed weaker spatial correlation than those reported in previous studies, such as Jiang et al. [23] and Chen et al. [52], using Kriging methods to optimize prediction accuracy was still necessary, similar to the study by Li et al. [7].
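The residual-Kriging idea can be sketched as follows: Kriging interpolates the model residuals (observed minus predicted AGB) from the plots to unsampled pixels, and the interpolated residual is added back to the deep learning prediction. The sketch implements Ordinary Kriging only, with an exponential semivariogram whose effective range is set to the 35.03 km reported in this study; the model form, nugget, sill, coordinates, and predictions are hypothetical. Co-Kriging, which additionally uses elevation through cross-semivariograms, is not shown.

```python
import numpy as np


def exp_semivariogram(h, nugget, sill, eff_range):
    """Exponential semivariogram; eff_range is the practical (effective) range.

    gamma(0) is defined as 0, so any nugget appears as a jump at the origin.
    """
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / eff_range))
    return np.where(h > 0, gamma, 0.0)


def ordinary_kriging(xy, z, xy_new, nugget, sill, eff_range):
    """Ordinary Kriging of values z observed at xy onto locations xy_new."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Pairwise distances between observation points
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # OK system [[Gamma, 1], [1^T, 0]]; the extra row/column forces weights to sum to 1
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_semivariogram(d, nugget, sill, eff_range)
    A[n, n] = 0.0
    preds = np.empty(len(xy_new))
    for k, p in enumerate(np.asarray(xy_new, dtype=float)):
        b = np.ones(n + 1)
        b[:n] = exp_semivariogram(np.linalg.norm(xy - p, axis=1), nugget, sill, eff_range)
        w = np.linalg.solve(A, b)  # n Kriging weights followed by the Lagrange multiplier
        preds[k] = w[:n] @ z
    return preds


# Hypothetical example: plot coordinates in km and model residuals in t/ha
rng = np.random.default_rng(1)
plot_xy = rng.uniform(0.0, 100.0, size=(30, 2))
residuals = rng.normal(0.0, 10.0, size=30)      # observed - predicted at plots
grid_xy = rng.uniform(0.0, 100.0, size=(5, 2))  # unsampled pixels
dl_pred = rng.uniform(40.0, 120.0, size=5)      # model predictions at those pixels

kriged_res = ordinary_kriging(plot_xy, residuals, grid_xy, nugget=0.0, sill=100.0, eff_range=35.03)
corrected = dl_pred + kriged_res  # hybrid estimate = DL prediction + kriged residual
```

Because the weights sum to one, a spatially constant residual field is reproduced exactly, and at a sampled plot the Kriged residual equals the observed residual; far from all plots the correction relaxes toward a weighted mean of the residuals.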
This study showed that the distribution of AGB values is closely related to elevation, as shown in Figure 11: high AGB values coincide with high-altitude areas. Previous studies have shown that AGB and elevation follow similar trends [53]. The R² and RI of the CK models outperformed those of the OK models in this study, indicating that incorporating elevation in mountainous regions can improve interpolation accuracy. Additionally, the CK model can mitigate the saturation problem in regions with high AGB values, as seen in Figure 10a,c: adding the CK model to the original CNN-Transformer model increased the maximum predicted AGB value and decreased the minimum. Nevertheless, the number of sample points (398) is relatively small for such a large study area, and the sample plots were not completely free from human interference, both of which may limit estimation accuracy.
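RI is not redefined in this section. Assuming it denotes the relative improvement (%) in RMSE of a Kriging-corrected model over its base model, one common definition in residual-Kriging studies, it can be computed as sketched below. The four plot values are invented for illustration; they also show how a residual correction can widen the range of predicted AGB, as described above for the CK model.

```python
import numpy as np


def rmse(obs, pred):
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def relative_improvement(obs, pred_base, pred_hybrid):
    """RI (%) of a hybrid model over its base model; the RMSE-based definition is an assumption."""
    base, hybrid = rmse(obs, pred_base), rmse(obs, pred_hybrid)
    return 100.0 * (base - hybrid) / base


# Invented plot values (t/ha): observed AGB, base model, and Kriging-corrected model
obs = np.array([20.0, 60.0, 100.0, 140.0])
base = np.array([30.0, 62.0, 95.0, 120.0])
hybrid = np.array([26.0, 61.0, 98.0, 128.0])

ri = relative_improvement(obs, base, hybrid)
print(f"RI = {ri:.1f}%")
print("prediction range:", (base.min(), base.max()), "->", (hybrid.min(), hybrid.max()))
```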
The original CNN-Transformer model considers only the extracted modeling variables. Integrating co-Kriging into it additionally takes spatial autocorrelation into account and can thus improve AGB estimation accuracy. Hence, incorporating the CK model into the original deep learning model is of great importance.
Cai et al. reported that integrating spatial weights, a geostatistical method, with deep learning techniques such as CNN-LSTM improves the accuracy of NDVI prediction, because spatial weighting helps analyze spatial nonstationarity when modeling complex spatial relationships [54,55]. A recent study from the University of Maryland developed a Geo-RF framework to address spatial variability and improve crop classification accuracy [56]. In future research, this technique could be introduced into biomass estimation to explore its potential.
The sample plots in this study follow a systematic sampling design based on a fixed coordinate grid (typically 4 km × 6 km in this region). Theoretical semivariogram models fitted to the residuals in the CK analysis revealed a substantial effective range of 35.03 km. This indicates that the spatial autocorrelation of the residuals operates at a broad regional scale (likely driven by macro-topography or climatic gradients) rather than at a micro-scale. Because the sampling design is based on the NFI, with an average sampling interval of approximately 6.6 km (derived from 398 plots over 17,300 km²), the sampling interval is much smaller than the detected range. The design therefore resolves spatial structure at the regional scale: the plot grid is dense enough to reliably capture and model this spatial structure, supporting the validity of the Kriging interpolation.
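The sampling argument above rests on simple arithmetic that can be checked directly. The sketch below recomputes the mean plot spacing from the quoted area and plot count and, assuming uniform plot density and ignoring study-area edge effects, estimates how many plots fall within one effective range of a given plot.

```python
import math

area_km2 = 17_300.0         # study-area extent quoted in the text
n_plots = 398               # number of sample plots
effective_range_km = 35.03  # fitted effective range of the residual semivariogram

# Mean spacing if plots were spread evenly: side length of the area each plot represents
spacing_km = math.sqrt(area_km2 / n_plots)

# Expected number of other plots within one effective range of a given plot,
# assuming uniform density and ignoring study-area edge effects
neighbours = math.pi * effective_range_km ** 2 / (area_km2 / n_plots)

print(f"mean spacing ~ {spacing_km:.2f} km")
print(f"range / spacing ~ {effective_range_km / spacing_km:.1f}")
print(f"plots within range ~ {neighbours:.0f}")
```

With the quoted figures, the spacing is about 6.6 km, the effective range is roughly five times the spacing, and a typical plot has on the order of 89 neighbours within the range, which suggests enough close plot pairs to estimate the semivariogram below its range.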
The residual correction is most effective in continuous forest areas where residuals exhibit consistent spatial structure. In contrast, in highly fragmented zones or areas with extreme topographic discontinuity, the spatial autocorrelation of residuals may be disrupted. Lishui is characterized by strong elevation gradients and heterogeneous disturbance patterns, which may introduce non-stationarity and abrupt transitions in residuals. Such conditions can violate the intrinsic stationarity and continuity assumptions implicit in the Kriging method, reducing Kriging effectiveness and potentially leading to overly smoothed corrections in fragmented landscapes. Co-Kriging with elevation can partially account for terrain-driven gradients, but it does not fully resolve non-stationarity caused by management and disturbance mosaics. We will explore this issue in our future research.
The hybrid approach allows the deep learning (DL) model to handle complex non-linear trends, while geostatistics captures the remaining spatially structured errors that the DL model fails to explain; this complementary relationship corrects local biases. However, the data requirement is a critical constraint: geostatistical integration strictly requires precise geographic coordinates (X, Y) for every sample plot to calculate the distance matrix and semivariogram.
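To illustrate why plot coordinates are indispensable, the sketch below computes the classical (Matheron) empirical semivariogram from pairwise plot distances. It assumes projected coordinates in kilometres (for example UTM); latitude and longitude would first need projecting or a geodesic distance. The function name, the three-plot example, and the lag bins are hypothetical.

```python
import numpy as np


def empirical_semivariogram(xy, z, lag_edges):
    """Matheron estimator: gamma(h) = 1 / (2 N(h)) * sum over pairs of (z_i - z_j)^2."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)           # every unordered pair once
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)  # pairwise distances (needs X, Y)
    half_sq = 0.5 * (z[i] - z[j]) ** 2
    centers, gammas, counts = [], [], []
    for lo, hi in zip(lag_edges[:-1], lag_edges[1:]):
        in_bin = (dist >= lo) & (dist < hi)
        centers.append(0.5 * (lo + hi))
        counts.append(int(in_bin.sum()))
        gammas.append(float(half_sq[in_bin].mean()) if in_bin.any() else float("nan"))
    return np.array(centers), np.array(gammas), np.array(counts)


# Hypothetical plots on a line (km) with residual values (t/ha)
xy = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
z = [0.0, 1.0, 3.0]
lags, gamma, n_pairs = empirical_semivariogram(xy, z, [0.5, 1.5, 2.5])
print(lags, gamma, n_pairs)
```

In practice, lag bins would extend to roughly half the maximum plot separation, and a theoretical model (for example spherical or exponential) would then be fitted to the (lag, gamma) pairs, typically weighted by the number of pairs per bin.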
While the proposed CNN-Transformer-CK model successfully reduces the estimation error for AGB, we recognize that reducing errors for all forest attributes via remote sensing remains a challenge. Remote sensing maps often cannot capture the full range of detailed information required by inventories, such as understory species composition or precise diagnoses of forest health and pathology. Therefore, field surveys and remote sensing mapping should be viewed as complementary rather than competing approaches.
Traditional field surveys provide high-fidelity data on complex forest attributes that are difficult to invert remotely. In contrast, the wall-to-wall maps generated by our approach bridge the spatial gaps between discrete sample plots. By integrating the high-quality AGB estimates from this study into operational inventories, forest managers can achieve a hybrid monitoring system that combines the detailed accuracy of field plots with the comprehensive spatial coverage of remote sensing maps.
4.3. The Effects on Policy
Lishui City experienced substantial human disturbance between 2008 and 2017, mainly in its southwestern and northeastern parts. According to Xiong et al. [57], these two parts of the city contain many residential areas, are more susceptible to landslides, and are experiencing relatively serious forest loss. The areas surrounding densely populated towns, such as Longquan City and Suichang County, show a high degree of forest disturbance. In these areas, urbanization, industrialization, and transportation infrastructure development have exerted significant pressure on forests and caused considerable damage. As the most economically developed area in Lishui City, Liandu District has also experienced severe forest disturbance in parts of its territory.
To maintain forest area and conserve the ecosystem, the Lishui government implemented a three-year restoration project (2021–2023) with a total investment of 5.53 billion RMB. Since then, the ecological environment has significantly improved, forest quality has been effectively enhanced, biodiversity has become increasingly rich, and economic benefits continue to grow. Over these three years, the city's forest carbon storage increased by 8% to a total of 62 million tons, ranking first in Zhejiang Province in terms of carbon sequestration capacity. Lishui was selected by the National Forestry and Grassland Administration as one of the first national forestry carbon sink pilot cities.
Beyond mapping the spatial distribution of forest resources, the high-quality AGB estimates achieved in this study hold significant practical value for forest management and policy implementation. Precise biomass quantification is a prerequisite for accurate accounting of forest carbon sinks, which is particularly relevant because Lishui City is a national pilot city for forestry carbon sinks. In the context of carbon trading and ecological compensation mechanisms, reducing the estimation error directly reduces the uncertainty in carbon stock calculations. This greater confidence allows carbon credits to be priced and traded more accurately, supporting the economic realization of ecological values.
The direct effect of altitude is a key mechanism influencing the biomass of Masson pine forests [58], which are the main forest type in Lishui City. Masson pine biomass and the Shannon diversity index interact indirectly [59]. Biomass is also related to canopy density. High forest biomass usually indicates high species richness, and increasing species diversity can enhance the overall resource-use efficiency of a community, thereby improving productivity [60]. Regions with high AGB values showed high species richness. The AGB spatial distribution pattern obtained in this study can therefore provide ecologists with valuable information. Moreover, regions with high AGB values are likely to host intricate ecological interactions, making them particularly suitable for scientific exploration.
Based on these results, the spatial distribution of forest AGB in Lishui City can be effectively predicted using diverse predictor variables extracted through appropriate image processing: Sentinel-2 bands, vegetation indices, and textures; ALOS-2 PALSAR-2 polarization features; and topographic factors. The findings of this study can therefore support the development of science-based local forest management strategies and provide a foundation for regional land-use planning and forest resource management.
Although this study focuses primarily on assessing the status of forest AGB in 2020, we recognize that forest ecosystems change continuously due to natural growth and anthropogenic disturbances. The high-quality AGB baseline established here provides a solid foundation for further research. We are actively extending this work to long-term time-series analysis in Lishui City and are currently developing a cascade-based deep learning model designed to enhance change detection.