Estimating Forest Carbon Stock Using Enhanced ResNet and Sentinel-2 Imagery

Ren, Jintong; Liu, Lizhi; Wu, You; Ouyang, Lijian; Yu, Zhenyu

doi:10.3390/f16071198


by Jintong Ren ¹,², Lizhi Liu ³,*, You Wu ⁴, Lijian Ouyang ¹ and Zhenyu Yu ⁵

¹ School of Ecological Engineering, Guizhou University of Engineering Science, Bijie 551700, China
² Guizhou Key Laboratory of Plateau Wetland Conservation and Restoration, Bijie 551700, China
³ Chinese Academy of Forestry, Beijing 100091, China
⁴ School of Mathematics and Systems Science, Shenyang Normal University, Shenyang 110034, China
⁵ Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia
* Author to whom correspondence should be addressed.

Forests 2025, 16(7), 1198; https://doi.org/10.3390/f16071198

Submission received: 13 June 2025 / Revised: 15 July 2025 / Accepted: 18 July 2025 / Published: 20 July 2025

(This article belongs to the Special Issue Mapping and Modeling Forests Using Geospatial Technologies)


Abstract

Accurate estimation of forest carbon stock is critical for understanding ecosystem carbon dynamics and informing climate mitigation strategies. This study presents a deep learning framework that integrates Sentinel-2 multispectral imagery with an enhanced residual neural network for estimating aboveground forest carbon stock in the Liuchong River Basin, Bijie City, Guizhou Province, China. The proposed model incorporates multiscale residual blocks and channel attention mechanisms to improve spatial feature extraction and spectral dependency modeling. A dataset of 150 ground inventory plots was employed for supervised training and validation. Comparative experiments with Random Forest, Gradient Boosting Decision Trees (GBDT), and Vision Transformer (ViT) demonstrate that the enhanced ResNet achieves the best performance, with a root mean square error (RMSE) of 23.02 Mg/ha and a coefficient of determination ($R^{2}$) of 0.773 on the test set. Spatial mapping results further reveal that the model effectively captures fine-scale carbon stock variations across mountainous forested landscapes. These findings underscore the potential of combining multispectral remote sensing and advanced neural architectures for scalable, high-resolution forest carbon estimation in complex terrain.

Keywords:

carbon sink; forest; estimation; remote sensing; multi-spectral image

1. Introduction

Forests play a pivotal role in the global carbon cycle by serving as major carbon sinks, thereby contributing to climate regulation, ecological stability, and terrestrial carbon sequestration [1,2,3,4,5]. Through photosynthesis, forests absorb atmospheric CO₂ and store it as organic carbon in biomass and soils, mitigating anthropogenic greenhouse gas emissions [6,7,8,9,10]. According to Pan et al. [11,12,13,14], global forests sequester approximately 2.4 ± 0.4 PgC annually, equivalent to over one-fifth of global fossil fuel emissions, underscoring their essential role in international climate mitigation efforts [15,16,17,18,19]. As nations strive toward carbon neutrality under the Paris Agreement, accurate and spatially explicit forest carbon stock estimation has become increasingly vital for national emission inventories, REDD+ implementation, and sustainable forest management [20,21,22,23].

China, which aims to peak carbon emissions by 2030 and achieve carbon neutrality by 2060, faces growing demands for scalable, high-resolution carbon monitoring solutions, particularly in ecologically fragile and topographically complex regions such as Guizhou Province [24,25,26,27,28]. The Liuchong River Basin, located in Bijie City, Guizhou, exemplifies this challenge, with pronounced elevation gradients, heterogeneous vegetation patterns, and significant spatial variability in forest structure [29,30,31,32,33]. These factors complicate traditional forest inventory and modeling approaches [34,35,36,37,38]. Historically, forest carbon stock estimation has depended on field-based inventories and allometric equations [39,40,41,42], which, despite their plot-level accuracy, are labor-intensive and lack scalability. The advent of remote sensing has enabled broader spatial monitoring, with vegetation indices and topographic features derived from medium-resolution imagery (e.g., Landsat) commonly used in conjunction with statistical and machine learning algorithms (e.g., multiple linear regression, Random Forest, GBDT) for biomass estimation [43,44,45,46,47,48]. However, the 30 m spatial resolution of Landsat limits its capacity to capture fine-scale canopy heterogeneity in rugged mountainous terrain, thereby constraining model precision [49,50].

The Sentinel-2 satellite constellation offers new opportunities for high-resolution forest carbon mapping, delivering multispectral imagery at 10–20 m resolution with a 5-day revisit cycle [51,52,53,54]. Its broad spectral range—including visible, red-edge, and shortwave infrared bands—has proven effective for monitoring vegetation structure and health [55,56,57]. While several studies have leveraged Sentinel-2 imagery for forest biomass estimation using machine learning methods [58,59,60,61], challenges remain regarding model generalizability across heterogeneous landscapes and complex topography.

Deep learning has emerged as a powerful tool for addressing these challenges in geospatial modeling [62,63,64,65]. Convolutional Neural Networks (CNNs), especially residual networks (ResNet), have demonstrated strong performance in remote sensing tasks due to their capacity for multiscale feature learning and end-to-end modeling [12,66,67,68,69]. Nevertheless, the application of deep architectures to forest carbon estimation remains underexplored, particularly with respect to spectral–spatial optimization in mountainous ecosystems.

In this study, we propose a deep learning framework for forest carbon stock estimation that integrates Sentinel-2 multispectral data with an enhanced ResNet architecture. Our objective is to improve estimation accuracy and spatial adaptability under complex terrain conditions. The main contributions of this work are as follows:

Enhanced deep regression model: We develop a modified ResNet incorporating multiscale residual blocks and channel attention mechanisms to strengthen spatial context learning and spectral feature extraction for carbon modeling.
Multisource dataset construction: We assemble a regional forest carbon stock dataset by integrating Sentinel-2 spectral variables with field-based carbon density measurements from 150 inventory plots in the Liuchong River Basin, Guizhou.
Comprehensive performance evaluation: We benchmark the proposed model against a suite of classical and advanced models (RF, GBDT, ViT, Swin Transformer), demonstrating its superiority in accuracy and generalization across heterogeneous forest conditions.

To guide this study, we formulated the following four research questions:

Can an enhanced ResNet architecture improve the accuracy of forest carbon stock estimation compared to conventional machine learning and deep learning models?
How effectively can multispectral Sentinel-2 data represent structural and spectral heterogeneity of forest ecosystems in mountainous terrain?
Is the proposed model generalizable across different forest types and elevation zones within the study area?
What is the spatial mapping capability of the proposed framework in producing continuous, high-resolution carbon stock estimates?

2. Materials

2.1. Study Area

This study focuses on the Liuchong River Basin, located in Bijie City, in the northwestern part of Guizhou Province, China (Figure 1). The basin lies on the eastern edge of the Yunnan–Guizhou Plateau and is part of the Wujiang River system, which ultimately drains into the Yangtze River. The Liuchong River itself is a major tributary of the northern source of the Wujiang River, with a total length of 273 km and a drainage area of approximately 10,874 km². The river has a multi-year average discharge of 176 cubic meters per second. The study area is characterized by typical karst mountainous topography, with elevations ranging from approximately 1200 to over 2800 m. It experiences a subtropical monsoon climate with distinct wet and dry seasons, an annual average temperature of 12–17 °C, and total annual precipitation ranging from 900 to 1300 mm, with most rainfall concentrated between May and September.

The diverse terrain and microclimatic variability support a wide array of forest types, including subtropical evergreen broadleaf forests, coniferous forests, and mixed forest ecosystems. In recent years, the Liuchong River Basin has seen large-scale ecological restoration and reforestation under national greening programs, resulting in a complex and heterogeneous forest structure. The area’s rich vegetation, pronounced elevation gradients, and availability of ground-based forest inventory data make it a representative site for evaluating remote sensing-based methods for forest carbon stock estimation in mountainous environments.

2.2. Data Sources

This study integrates ground-based forest inventory measurements with Sentinel-2 multispectral imagery to estimate plot-level aboveground carbon stock in the Liuchong River Basin, Guizhou Province.

(1) Ground-Based Forest Inventory Data: Field data were collected from 150 circular sample plots (15 m radius, approximately 707 m²) strategically distributed across the study area’s representative forest ecosystems. Tree-level attributes such as species, diameter at breast height (DBH ≥ 5 cm), total height, and crown width were recorded using standardized protocols (see Figure A1). GNSS receivers with sub-meter accuracy were used for geolocation.

Aboveground biomass (AGB) per tree was estimated using species-specific allometric equations tailored to regional forest types. Plot-level AGB was then aggregated and converted to carbon stock (Mg/ha) using the standard IPCC biomass-to-carbon conversion factor of 0.5. Each plot was spatially matched to a 640 m × 640 m Sentinel-2 patch centered on its geolocation, which also served as the input region for model training. The sampled plots cover a broad range of vegetation types, including Pinus yunnanensis, Quercus variabilis, and mixed deciduous stands, as well as an elevation gradient from 2000 m to 2800 m, thus ensuring ecological diversity and modeling robustness.
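The plot-level conversion described above can be sketched as follows. This is a minimal Python illustration, not the study's code: the species-specific allometric equations are not given here, so the power-law form and the coefficients in `HYPOTHETICAL_COEFFS` are placeholders, and all function names are hypothetical. Only the 15 m plot radius, the DBH ≥ 5 cm threshold, and the 0.5 carbon fraction come from the text.

```python
import math

# HYPOTHETICAL allometric coefficients for AGB_kg = a * (DBH_cm**2 * H_m) ** b.
# Placeholder values for illustration only; NOT the equations used in the study.
HYPOTHETICAL_COEFFS = {
    "Pinus yunnanensis": (0.05, 0.95),
    "Quercus variabilis": (0.06, 0.97),
}

CARBON_FRACTION = 0.5   # IPCC biomass-to-carbon conversion factor (from the text)
PLOT_RADIUS_M = 15.0    # circular plot radius (from the text)
MIN_DBH_CM = 5.0        # trees below this DBH were not recorded (from the text)


def tree_agb_kg(species, dbh_cm, height_m):
    """Aboveground biomass of one tree (kg) from a hypothetical power-law equation."""
    a, b = HYPOTHETICAL_COEFFS[species]
    return a * (dbh_cm ** 2 * height_m) ** b


def plot_carbon_stock_mg_ha(trees, radius_m=PLOT_RADIUS_M):
    """Sum tree AGB in a circular plot and convert it to carbon stock in Mg/ha.

    trees: iterable of (species, dbh_cm, height_m) tuples.
    """
    area_ha = math.pi * radius_m ** 2 / 10_000.0  # 15 m radius -> about 0.0707 ha
    total_kg = sum(
        tree_agb_kg(sp, dbh, h) for sp, dbh, h in trees if dbh >= MIN_DBH_CM
    )
    agb_mg_ha = total_kg / 1000.0 / area_ha       # kg -> Mg, then per hectare
    return agb_mg_ha * CARBON_FRACTION
```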

The 150 sample plots cover a representative range of forest types in the Liuchong River Basin, including mixed deciduous–coniferous stands, evergreen coniferous forests (e.g., Pinus yunnanensis), and broadleaf deciduous forests (e.g., Quercus variabilis). The estimated aboveground biomass (AGB) values derived from the inventory plots span from approximately 40 to 150 Mg/ha, ensuring that the training dataset captures both low-density and high-density forest conditions. This diversity supports robust model learning across ecological gradients.

(2) Sentinel-2 Multispectral Imagery: Sentinel-2 Level-2A surface reflectance data were retrieved from the Copernicus Open Access Hub. Four cloud-free scenes captured between May and October 2022 were selected to minimize seasonal and atmospheric effects and composited into a seamless mosaic of the study area.

Preprocessing included atmospheric correction (Sen2Cor), cloud and shadow masking using the Scene Classification Layer (SCL), and bilinear resampling of all bands to a consistent 20 m resolution. A total of 10 spectral bands with high relevance to forest structure and biomass estimation were retained, including visible, red-edge, near-infrared (NIR), and shortwave infrared (SWIR) bands, as detailed in Table 1. To reduce illumination differences caused by the rugged terrain, we also applied topographic correction with a Cosine Correction (C-correction) model driven by the 30 m SRTM Digital Elevation Model (DEM). This method normalizes the reflectance of each pixel by computing the cosine of the solar incidence angle from the local terrain slope and aspect, thereby compensating for varying illumination conditions.
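The illumination correction can be sketched as below, assuming per-pixel slope and aspect derived from the DEM and scene-level solar zenith ($\theta_z$) and azimuth angles. The text names the method both "Cosine Correction" and "C-correction", which are usually distinct: the cosine correction scales reflectance by $\cos\theta_z / \cos i$, where $i$ is the local solar incidence angle, while the C-correction adds an empirical constant $c = b/m$ obtained by regressing reflectance on $\cos i$. Both are shown; which variant the study used is not stated, and the function names are hypothetical.

```python
import math


def cos_incidence(solar_zenith_deg, solar_azimuth_deg, slope_deg, aspect_deg):
    """Cosine of the local solar incidence angle i on a tilted surface."""
    sz, sa = math.radians(solar_zenith_deg), math.radians(solar_azimuth_deg)
    s, a = math.radians(slope_deg), math.radians(aspect_deg)
    return math.cos(sz) * math.cos(s) + math.sin(sz) * math.sin(s) * math.cos(sa - a)


def cosine_correction(rho, cos_i, solar_zenith_deg):
    """Cosine correction: rho_h = rho * cos(zenith) / cos(i)."""
    return rho * math.cos(math.radians(solar_zenith_deg)) / cos_i


def estimate_c(rhos, cos_is):
    """Fit rho = m * cos(i) + b by least squares over sample pixels; return c = b / m."""
    n = len(rhos)
    mean_x, mean_y = sum(cos_is) / n, sum(rhos) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cos_is, rhos))
    sxx = sum((x - mean_x) ** 2 for x in cos_is)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return b / m


def c_correction(rho, cos_i, solar_zenith_deg, c):
    """C-correction: rho_h = rho * (cos(zenith) + c) / (cos(i) + c)."""
    return rho * (math.cos(math.radians(solar_zenith_deg)) + c) / (cos_i + c)
```

On flat terrain $\cos i = \cos\theta_z$ and both corrections leave reflectance unchanged. Pixels with $\cos i$ close to zero (steep slopes facing away from the sun) are amplified strongly by the pure cosine correction, which is the usual motivation for the $c$ term; a practical pipeline would also mask or clamp such pixels.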

2.3. Data Preprocessing

To ensure spatial consistency and model-readiness, both ground-based and satellite data underwent the following preprocessing procedures:

(1) Ground Truth Preprocessing: Carbon stock values were standardized to Mg/ha and spatially aligned to the center of a 32 × 32 pixel region (640 m × 640 m) extracted from the Sentinel-2 mosaic. This patch size provides sufficient spatial context for learning forest structure. Only plots with high data quality and complete field attributes were retained. Plots with evident outliers or georegistration errors were excluded.

(2) Sentinel-2 Imagery Preprocessing: All satellite imagery was preprocessed using the following steps:

Resampling: All 10 selected spectral bands were resampled to a uniform 20 m resolution via bilinear interpolation.
Cloud and Shadow Masking: The Scene Classification Layer (SCL) was used to mask cloud and shadow pixels. Only SCL classes 4–7 (vegetation, not vegetated/bare soil, water, and unclassified) were retained.
Mosaicking: A median composite was generated from four cloud-free Sentinel-2 scenes to produce a radiometrically consistent mosaic.
Normalization: Pixel values were normalized to the [0, 1] range using min-max normalization:

$$x^{\prime} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
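The masking and median-compositing steps above might look like the following pure-Python sketch for a single band. It assumes the four scenes are already co-registered on the same 20 m grid and that each scene comes with its SCL layer on that grid; the function name and the use of `None` for no-data are illustrative choices, not part of the described pipeline.

```python
from statistics import median

# SCL classes retained in the study: 4 vegetation, 5 not vegetated,
# 6 water, 7 unclassified.
VALID_SCL = {4, 5, 6, 7}


def masked_median_composite(scenes, scl_layers, nodata=None):
    """Per-pixel median over scenes, ignoring pixels whose SCL class is not valid.

    scenes:     list of 2-D lists (one band of one scene each) with reflectance values
    scl_layers: list of 2-D lists with the matching SCL class codes
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    out = [[nodata] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            valid = [
                scene[r][c]
                for scene, scl in zip(scenes, scl_layers)
                if scl[r][c] in VALID_SCL
            ]
            if valid:  # keep nodata where every scene is masked
                out[r][c] = median(valid)
    return out
```

A median is less sensitive than a mean to residual haze or unflagged shadow in any single scene, which suits a composite built from only four acquisitions.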

(3) Patch Extraction: From the preprocessed Sentinel-2 mosaic, a 32 × 32 × 10 cube was extracted for each sample plot, covering an area of 640 m × 640 m with 10 spectral bands. These patches were used as input features, with the corresponding carbon stock values serving as regression targets. All preprocessing was conducted in Python 3.9 using the rasterio, GDAL, and NumPy libraries. The final dataset was split into training (70%), validation (15%), and test (15%) subsets through stratified sampling to ensure balanced representation across forest types and elevations.
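Patch extraction reduces to converting each plot's projected coordinates to a pixel index and slicing a 32 × 32 window from every band. The sketch below assumes a north-up raster with no rotation terms and a known upper-left origin; libraries such as rasterio provide the same coordinate-to-pixel mapping (e.g., `dataset.index(x, y)`). Because 32 is even, the plot falls on the pixel just below and to the right of the geometric window center. The helper names are hypothetical.

```python
PATCH = 32            # pixels per side (32 x 20 m = 640 m)
PIXEL_SIZE_M = 20.0   # common resolution after resampling


def world_to_pixel(x, y, origin_x, origin_y, pixel_size=PIXEL_SIZE_M):
    """Map projected coordinates to (row, col) for a north-up raster without rotation."""
    col = int((x - origin_x) // pixel_size)
    row = int((origin_y - y) // pixel_size)
    return row, col


def extract_patch(mosaic, row, col, size=PATCH):
    """Cut a bands x size x size cube around pixel (row, col).

    For an even size the window spans rows row - size//2 .. row + size//2 - 1
    (likewise for columns). Returns None if the window leaves the mosaic.
    """
    half = size // 2
    r0, c0 = row - half, col - half
    n_rows, n_cols = len(mosaic[0]), len(mosaic[0][0])
    if r0 < 0 or c0 < 0 or r0 + size > n_rows or c0 + size > n_cols:
        return None
    return [[band_row[c0:c0 + size] for band_row in band[r0:r0 + size]] for band in mosaic]
```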

3. Methodology

This study proposes a deep learning framework for estimating forest carbon stock using Sentinel-2 multispectral imagery and an enhanced residual network. The overall workflow is illustrated in Figure 2 and consists of three main components:

(1) Input Preparation: 32 × 32 multispectral image patches are extracted around each field plot from preprocessed Sentinel-2 composites. Each patch includes 10 spectral bands and aligns spatially with the corresponding ground measurements.

(2) Model Architecture: An enhanced ResNet is constructed by integrating multiscale residual blocks and channel attention modules. This design improves the network’s ability to model both spatial heterogeneity and spectral relevance for carbon estimation.

(3) Model Training and Evaluation: The model is trained in a supervised regression setting using ground-based carbon stock values. Performance is assessed using multiple evaluation metrics and compared against conventional machine learning and deep learning baselines.

3.1. Input Feature Construction

The model input comprises multispectral image patches extracted from Sentinel-2 Level-2A surface reflectance products. For each ground inventory plot, a patch of size $32 \times 32 \times 10$ was generated, corresponding to a 640 m × 640 m area centered on the plot location. The spatial resolution of all bands was unified to 20 m through bilinear resampling to ensure consistency.

Ten spectral bands were selected based on their relevance to vegetation and biomass estimation: visible (B2–B4), red-edge (B5–B7), near-infrared (B8, B8A), and shortwave infrared (B11, B12). These bands capture key canopy attributes such as structure, pigment concentration, and moisture content.

To normalize reflectance values and improve model convergence, min-max scaling was applied to each spectral band independently, transforming pixel values to the range $[0, 1]$ based on their global minimum and maximum within the study area.
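The study-area-wide, per-band scaling of Equation (1) can be sketched as below: statistics are computed once over the whole mosaic and then reused for every patch, so a given reflectance value maps to the same scaled value in all patches. Function names are hypothetical, and mapping a constant band to 0 is an assumed convention for avoiding division by zero.

```python
def band_min_max(mosaic):
    """Global (min, max) per band over all valid pixels; None marks no-data."""
    stats = []
    for band in mosaic:
        values = [v for row in band for v in row if v is not None]
        stats.append((min(values), max(values)))
    return stats


def normalize_patch(patch, stats):
    """Apply x' = (x - x_min) / (x_max - x_min) band by band using global stats."""
    out = []
    for band, (lo, hi) in zip(patch, stats):
        span = hi - lo
        out.append([
            [None if v is None else ((v - lo) / span if span > 0 else 0.0) for v in row]
            for row in band
        ])
    return out
```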

Cloud and shadow pixels were masked using the Scene Classification Layer (SCL). Only cloud-free samples were retained. The final input set ensures accurate spatial alignment between Sentinel-2 imagery and ground-truth plots, providing reliable input for carbon stock modeling. It is important to note that geographic coordinates (latitude and longitude) were not included as input features in the model. They were used solely for spatial alignment between ground inventory plots and Sentinel-2 image patches during the preprocessing stage. This design choice ensures that the model relies entirely on spectral information rather than spatial proxies, thus improving ecological interpretability and avoiding the inclusion of location-dependent artifacts.

3.2. Enhanced ResNet Architecture

The proposed model builds upon ResNet-18 [67], a lightweight convolutional neural network known for its residual learning capabilities, which help mitigate gradient degradation in deep networks. To adapt ResNet-18 for the task of carbon stock estimation in complex forest environments, we introduce two key enhancements: multiscale residual blocks and a channel attention mechanism.

3.2.1. Multiscale Residual Block (MSRB)

Forested landscapes, especially in mountainous regions, exhibit substantial spatial heterogeneity due to variations in canopy structure, forest type, elevation, and topographic complexity. Standard residual blocks in ResNet typically rely on a single convolutional scale (e.g., $3 \times 3$) and may struggle to capture such diverse spatial patterns. To address this, we introduce a Multiscale Residual Block (MSRB) (see Figure 3) that explicitly extracts features at multiple spatial resolutions. Each MSRB contains three parallel convolutional branches:

A standard $3 \times 3$ convolution for fine-grained local details (e.g., contrasts between adjacent 20 m pixels);
A $5 \times 5$ convolution for medium-scale patterns such as tree crowns and canopy gaps;
A dilated $3 \times 3$ convolution (dilation rate = 2) for capturing broader contextual cues (e.g., tree clusters, terrain contours).
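The spatial footprint of each branch follows from the kernel size, the dilation rate, and the 20 m pixel size. A quick calculation, assuming a single convolution applied directly to the 20 m grid (deeper layers see larger footprints after stacking and any downsampling), shows that the dilated $3 \times 3$ branch covers the same $5 \times 5$ pixel extent as the $5 \times 5$ branch but samples only 9 of its 25 positions. The helper names are hypothetical.

```python
PIXEL_SIZE_M = 20.0  # Sentinel-2 bands resampled to 20 m


def effective_kernel(k, dilation=1):
    """Spatial extent (in pixels) covered by one k x k convolution with the given dilation."""
    return dilation * (k - 1) + 1


def sampled_offsets(k, dilation=1):
    """Relative (row, col) offsets actually read by the kernel."""
    half = k // 2
    steps = range(-half, half + 1)
    return [(dilation * i, dilation * j) for i in steps for j in steps]


for name, k, d in [("3x3", 3, 1), ("5x5", 5, 1), ("dilated 3x3 (d=2)", 3, 2)]:
    ext = effective_kernel(k, d)
    print(f"{name}: {ext}x{ext} px = {ext * PIXEL_SIZE_M:.0f} m, {len(sampled_offsets(k, d))} taps")
```

Running it prints footprints of 60 m, 100 m, and 100 m with 9, 25, and 9 taps respectively; the branches therefore differ mainly in how densely they sample a neighborhood, and the receptive field grows as blocks are stacked.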

The outputs of the three branches are concatenated along the channel dimension and fused using a $1 \times 1$ convolution to reduce dimensionality and enable feature interaction:

$$F_{ms} = \mathrm{Conv}_{1 \times 1}\left(\left[\mathrm{Conv}_{3 \times 3}(X), \mathrm{Conv}_{5 \times 5}(X), \mathrm{DilatedConv}_{3 \times 3}(X)\right]\right) \tag{2}$$

where $X$ is the input feature map and $[\cdot]$ denotes channel-wise concatenation. This multiscale structure offers several advantages:

It allows the network to capture both local and global spatial patterns, improving robustness to canopy size variation and forest fragmentation.
It enhances the residual pathway with multi-receptive field aggregation, which improves spatial representation without a significant increase in parameters.
It contributes to stable gradient propagation and faster convergence due to preserved residual connections.

In summary, MSRB empowers the model to effectively extract hierarchical spatial features under diverse ecological conditions, thereby improving the accuracy and generalization of carbon stock estimation across heterogeneous forested landscapes.

3.2.2. Channel Attention Mechanism

In multispectral remote sensing imagery, not all spectral bands contribute equally to the estimation of forest carbon stock. Certain bands—such as the red-edge (B5–B7), near-infrared (B8, B8A), and shortwave infrared (B11, B12)—are particularly sensitive to vegetation structure, biomass, and moisture content. To enable the model to focus on these informative bands while suppressing irrelevant or noisy ones, we integrate a channel-wise attention mechanism based on the Squeeze-and-Excitation (SE) block [70] (see Figure 4) after each MSRB module.

The SE block performs two main operations: squeeze and excitation. The squeeze operation aggregates spatial information from each channel using global average pooling (GAP), producing a compact channel descriptor that captures global contextual information:

$$z_{c} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_{ms}^{(c)}(i, j), \quad c = 1, \dots, C \tag{3}$$

where $H$ and $W$ denote the height and width of the feature map, and $F_{ms}^{(c)}$ is the $c$-th channel output from the MSRB. Next, the excitation step passes the squeezed descriptors through two fully connected (FC) layers with a ReLU activation followed by a sigmoid gating function:

$$s = \sigma\left(W_{2} \cdot \mathrm{ReLU}(W_{1} \cdot z)\right) \tag{4}$$

Here, $s \in \mathbb{R}^{C}$ is the learned attention weight vector for all $C$ channels, and the sigmoid function $\sigma(\cdot)$ keeps each weight within $(0, 1)$. Finally, the attention weights are applied to the MSRB output through channel-wise multiplication:

$$F_{att}^{(c)} = F_{ms}^{(c)} \cdot s_{c} \tag{5}$$

This mechanism allows the network to dynamically recalibrate the importance of each feature channel based on global contextual cues. Because feature channels are learned combinations of the input bands, this recalibration indirectly emphasizes the most informative spectral information, enhancing the representation of critical biophysical signals while suppressing redundancy or noise. Importantly, this design preserves spatial information and introduces minimal computational overhead, making it well-suited for high-dimensional multispectral inputs. By combining spatial richness from MSRB with spectral selectivity from SE modules, the model is better equipped to extract meaningful vegetation characteristics for accurate and robust forest carbon stock estimation.
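Equations (3)–(5) can be traced end to end in a small pure-Python sketch operating on a $C \times H \times W$ feature map stored as nested lists. The reduction ratio $r$ that sets the hidden width of $W_1$ ($C/r \times C$) and $W_2$ ($C \times C/r$) is not given in the text (16 is a common default in SE networks), and the FC layers are written without biases to mirror Equation (4); in a real model these would be framework layers such as PyTorch's `nn.Linear`. Function names are hypothetical.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(W, v):
    """Dense layer without bias: W @ v for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def squeeze_excite(feature_map, W1, W2):
    """SE recalibration of a C x H x W feature map; returns (rescaled map, weights s)."""
    # Squeeze (Eq. 3): global average pooling per channel -> z in R^C
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation (Eq. 4): FC -> ReLU -> FC -> sigmoid -> s in (0, 1)^C
    s = [sigmoid(x) for x in matvec(W2, relu(matvec(W1, z)))]
    # Scale (Eq. 5): multiply every value in channel c by s_c
    rescaled = [[[v * s_c for v in row] for row in ch] for ch, s_c in zip(feature_map, s)]
    return rescaled, s
```

With all-zero weights every channel receives $s_c = 0.5$, so the block starts as a uniform rescaling and learns during training which channels to amplify or suppress.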

3.2.3. Regression Head

The final stage of the Enhanced ResNet is the regression head, which transforms the high-dimensional attention-enhanced feature maps into a scalar estimate of forest carbon stock for each input patch. This head is designed to be both efficient and expressive, ensuring that spatially and spectrally processed features are effectively aggregated for prediction. Specifically, the feature map $F_{att} \in \mathbb{R}^{C \times H \times W}$ is first passed through a global average pooling (GAP) layer to generate a compact vector representation. This operation collapses the spatial dimensions by computing the average of each channel, resulting in a feature vector $f \in \mathbb{R}^{C}$:

$$f_{c} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_{att}^{(c)}(i, j), \quad c = 1, \dots, C \tag{6}$$

This step ensures that the model aggregates spatial context from the entire input region while preserving the learned importance of each channel from the attention mechanism. The resulting vector $f$ serves as a compact global descriptor of the input patch, summarizing both structural and spectral information. Next, the descriptor is passed to a fully connected (FC) layer for final prediction. This linear transformation maps the feature vector to a scalar output $\hat{y}$, representing the estimated carbon stock (in Mg/ha):

$$\hat{y} = w^{T} \cdot f + b \tag{7}$$

where $w \in \mathbb{R}^{C}$ and $b \in \mathbb{R}$ are the learnable weights and bias of the regression layer. This minimalist yet effective regression design allows the model to remain lightweight while capturing rich information for precise estimation. The use of GAP also helps mitigate overfitting by reducing the number of parameters and enforcing global spatial aggregation, which is particularly useful when dealing with limited field inventory data.
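Equations (6) and (7) amount to a per-channel mean followed by a dot product. The sketch below also counts the parameters of this head against a head that flattens the whole feature map, which is the basis of the compactness argument; the channel count and map size in the usage line are hypothetical, and the function names are illustrative.

```python
def regression_head(f_att, w, b):
    """GAP over H x W per channel (Eq. 6), then y_hat = w^T f + b (Eq. 7)."""
    f = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in f_att]
    return sum(wc * fc for wc, fc in zip(w, f)) + b


def head_params(c, h, w):
    """(parameters of the GAP + linear head, parameters of a flatten + linear head)."""
    return c + 1, c * h * w + 1


print(head_params(512, 4, 4))  # (513, 8193) for a hypothetical 512 x 4 x 4 feature map
```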

Overall, the regression head completes the end-to-end mapping from raw multispectral image patches to carbon stock values, enabling pixel- or patch-level prediction suitable for large-scale forest monitoring and carbon accounting applications.

3.2.4. Model Efficiency

The Enhanced ResNet contains approximately 13.2 million parameters, striking a practical balance between model complexity and computational efficiency. While incorporating both multiscale residual blocks and channel attention modules increases its representational capacity compared to the standard ResNet-18, the architecture remains lightweight enough to be deployed in real-world remote sensing scenarios. From a computational perspective, the model avoids deepening the network unnecessarily. Instead, it leverages architectural innovations—such as parallel convolutions and efficient squeeze-and-excitation units—that add only a modest overhead in terms of parameters and FLOPs (floating-point operations). These components selectively enrich feature representations without incurring the high memory costs typically associated with deeper or transformer-based architectures.
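The parameter overhead of the two enhancements can be estimated with simple arithmetic. The configuration below is an assumption made for illustration: each MSRB branch maps $c$ channels to $c$, the $1 \times 1$ fusion maps $3c$ back to $c$, the SE reduction ratio is $r = 16$, convolution biases are omitted because BatchNorm usually follows, and BatchNorm parameters are ignored. The study's exact layer widths are not given, so these numbers do not reconstruct the reported 13.2 million.

```python
def conv_params(c_in, c_out, k, bias=False):
    """Weights (+ optional biases) of a k x k convolution; dilation adds no parameters."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def msrb_params(c, bias=False):
    """Assumed MSRB: c->c branches (3x3, 5x5, dilated 3x3) plus a 1x1 fusion 3c->c."""
    branches = conv_params(c, c, 3, bias) + conv_params(c, c, 5, bias) + conv_params(c, c, 3, bias)
    fusion = conv_params(3 * c, c, 1, bias)
    return branches + fusion


def se_params(c, r=16, bias=False):
    """SE block: FC c -> c/r followed by FC c/r -> c."""
    hidden = c // r
    return c * hidden + hidden * c + ((hidden + c) if bias else 0)


def plain_block_conv_params(c, bias=False):
    """Two 3x3 c->c convolutions, as in a standard ResNet-18 basic block."""
    return 2 * conv_params(c, c, 3, bias)


c = 64
print(plain_block_conv_params(c), msrb_params(c), se_params(c))  # 73728 188416 512
```

Under these assumptions the SE module is almost free, while the $5 \times 5$ branch accounts for most of the MSRB's additional weights.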

The use of global average pooling and a single linear regression head further contributes to model compactness. Compared with the dense fully connected layers commonly used in high-capacity models, this design reduces the number of trainable parameters, which lowers overfitting risk and supports stable convergence during training, even with moderate-sized ground truth datasets. Moreover, the model is compatible with standard hardware, including mid-range GPUs and edge devices, making it feasible for large-scale inference on satellite images or integration into operational forest monitoring systems. This is particularly important for applications in developing regions or mountainous forest zones, where computational resources may be limited.

In summary, the Enhanced ResNet achieves high predictive performance without sacrificing efficiency, making it well-suited for scalable carbon stock estimation in remote sensing workflows that demand both accuracy and deployability.

3.3. Regression Strategy and Optimization

The goal of this framework is to estimate forest carbon stock (Mg/ha) for each inventory plot based on input multispectral image patches. The model is trained to minimize the discrepancy between predicted and observed carbon values.

3.3.1. Loss Function

We use the Mean Squared Error (MSE) as the regression loss function:

$$L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(y_{i} - \hat{y}_{i}\right)^{2} \tag{8}$$

where $y_{i}$ is the ground-truth carbon stock, $\hat{y}_{i}$ is the predicted value, and $N$ is the number of samples. MSE is suitable for continuous regression tasks and penalizes larger prediction errors more heavily.
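A direct transcription of Equation (8), together with its gradient with respect to each prediction (the quantity back-propagated into the network), makes the "penalizes larger errors" point concrete: two error patterns with the same MAE of 10 Mg/ha receive different MSE values. Function names are hypothetical.

```python
def mse_loss(y_true, y_pred):
    """Equation (8): mean of squared residuals."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


def mse_grad(y_true, y_pred):
    """dL/d(y_hat_i) = -2 * (y_i - y_hat_i) / N for each prediction."""
    n = len(y_true)
    return [-2.0 * (y - p) / n for y, p in zip(y_true, y_pred)]


# Same MAE (10 Mg/ha), different MSE: the single large error costs twice as much.
print(mse_loss([0.0, 0.0], [10.0, 10.0]), mse_loss([0.0, 0.0], [0.0, 20.0]))  # 100.0 200.0
```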

3.3.2. Optimization

Training is performed using the Adam optimizer, with an initial learning rate of 0.0005 and a batch size of 64. To avoid overfitting, early stopping is applied with a patience of 20 epochs, and training is capped at 200 epochs.
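The optimizer and stopping rule can be sketched for a single scalar parameter. The learning rate of 0.0005, the patience of 20 epochs, and the 200-epoch cap come from the text; the Adam moment coefficients ($\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$) are the commonly used defaults and are assumed here, since the study does not report them. Which checkpoint is restored after stopping is also not stated; the helper below simply reports the best epoch. Class and function names are hypothetical.

```python
import math


class Adam:
    """Adam optimizer for one scalar parameter (standard bias-corrected update)."""

    def __init__(self, lr=0.0005, beta1=0.9, beta2=0.999, eps=1e-8):
        # lr from the text; beta1, beta2 and eps are assumed common defaults
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # first-moment (mean) estimate of the gradient
        self.v = 0.0  # second-moment (uncentred variance) estimate
        self.t = 0    # step counter used for bias correction

    def step(self, theta, grad):
        """Return the updated parameter after one Adam step."""
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return theta - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def early_stopping(val_losses, patience=20, max_epochs=200):
    """Return (stop_epoch, best_epoch), both 1-based, for a sequence of validation losses."""
    best, best_epoch, waited = math.inf, 0, 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate regardless of the gradient's scale.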

3.3.3. Dataset Split

The dataset is divided into training (70%), validation (15%), and test (15%) sets. Stratified sampling ensures proportional representation of forest types and elevation zones in each subset, supporting fair performance evaluation and generalization.
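Stratified splitting can be sketched as shuffling within each stratum and cutting it at the 70/15/15 proportions. The stratum labels (e.g., forest type combined with elevation zone) and the seed are assumptions, because the text does not define the strata precisely. With 150 plots, 15% is 22.5, so per-stratum rounding necessarily makes the validation and test sets differ slightly from the nominal sizes. The function name is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(strata, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/val/test, preserving each stratum's proportions.

    strata: one hashable label per sample, e.g. (forest_type, elevation_zone).
    The test share is whatever remains after rounding the train and val counts.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, label in enumerate(strata):
        groups[label].append(idx)
    train, val, test = [], [], []
    for members in groups.values():  # insertion order keeps the result deterministic
        members = members[:]
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_val = round(len(members) * fractions[1])
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]
    return train, val, test
```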

3.3.4. Implementation

Experiments are conducted on a workstation equipped with an NVIDIA RTX 4090 GPU. Training metrics and model checkpoints are recorded automatically to support reproducibility and performance monitoring.

3.4. Evaluation Metrics

To quantitatively evaluate model performance in carbon stock estimation, we employ three standard regression metrics: Coefficient of Determination ($R^{2}$), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). These metrics collectively capture the model’s explanatory power, prediction accuracy, and error robustness.

3.4.1. Coefficient of Determination ($R^{2}$)

$R^{2}$ measures the proportion of variance in the ground truth explained by the model:

$$R^{2} = 1 - \frac{\sum_{i=1}^{N} \left(y_{i} - \hat{y}_{i}\right)^{2}}{\sum_{i=1}^{N} \left(y_{i} - \bar{y}\right)^{2}} \tag{9}$$

where $\bar{y}$ is the mean of the ground truth values. A higher $R^{2}$ indicates a better model fit.

3.4.2. Root Mean Squared Error (RMSE)

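RMSE quantifies the typical magnitude of prediction errors in the units of the target (Mg/ha); it equals the standard deviation of the residuals only when the mean residual is zero: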

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_{i} - \hat{y}_{i}\right)^{2}} \tag{10}$$

It is sensitive to large errors and reflects the overall prediction accuracy.

3.4.3. Mean Absolute Error (MAE)

MAE captures the average magnitude of absolute errors:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_{i} - \hat{y}_{i} \right| \tag{11}$$

Unlike RMSE, MAE treats all errors equally and is less influenced by outliers.

All metrics are computed on the test set and used to compare the proposed model against baseline methods. Higher $R^{2}$ and lower RMSE/MAE values indicate better predictive performance.
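Equations (9)–(11) can be computed together in a few lines. The second case in the checks below illustrates two properties worth keeping in mind when reading the results: a constant bias inflates RMSE even though the residuals have zero spread, and $R^{2}$ becomes negative when predictions are worse than simply predicting the mean. The function name is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """R^2, RMSE and MAE as defined in Equations (9)-(11)."""
    n = len(y_true)
    residuals = [y - p for y, p in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(r ** 2 for r in residuals)             # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)     # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


m = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print(m)  # R2 about 0.948, RMSE about 2.55, MAE 2.5
```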

4. Results

4.1. Performance Comparison

To evaluate the effectiveness of the proposed Enhanced ResNet architecture for forest carbon stock estimation, we conducted a comprehensive comparison with three representative baseline models: Random Forest (RF), Gradient Boosting Decision Trees (GBDT), and Vision Transformer (ViT). All models were trained and tested on the same dataset splits described in the methodology section and evaluated using three widely adopted regression metrics: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination ($R^{2}$).

As shown in Table 2, the Enhanced ResNet achieved the best overall performance across all evaluation metrics. Specifically, it yielded an RMSE of 23.02 Mg/ha, an MAE of 17.15 Mg/ha, and an $R^{2}$ of 0.773 on the test set. In contrast, the ViT baseline reached an RMSE of 29.55 Mg/ha and an $R^{2}$ of 0.719. Compared to ViT, the Enhanced ResNet reduced RMSE by approximately 22.1% and improved $R^{2}$ by 7.5% in relative terms (0.054 in absolute terms). Similar improvements were observed over the GBDT and RF models, with particularly large reductions in MAE. These gains are consistent with the contributions of the multiscale residual blocks, which enhance spatial feature representation, and the Squeeze-and-Excitation (SE) modules, which enable adaptive weighting of informative spectral channels such as red-edge and SWIR bands.
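The relative figures quoted above follow directly from the test-set values in Table 2; the short check below reproduces them. The helper name is hypothetical.

```python
def relative_change_pct(baseline, proposed):
    """Percentage change of `proposed` relative to `baseline` (negative = reduction)."""
    return (proposed - baseline) / baseline * 100.0


rmse_vit, rmse_ours = 29.55, 23.02  # Mg/ha, test set
r2_vit, r2_ours = 0.719, 0.773

print(f"RMSE change: {relative_change_pct(rmse_vit, rmse_ours):.1f}%")  # -22.1%
print(f"R^2 change: {relative_change_pct(r2_vit, r2_ours):.1f}% (absolute {r2_ours - r2_vit:.3f})")  # 7.5% (absolute 0.054)
```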

Overall, these results highlight the superior capability of the Enhanced ResNet in capturing complex spatial–spectral dependencies within high-resolution Sentinel-2 imagery, enabling more accurate and reliable estimation of forest carbon stocks in topographically and ecologically heterogeneous regions like the Liuchong River Basin. The consistent performance gain across all metrics suggests that deep learning models, when carefully designed to incorporate both spatial and spectral priors, can offer substantial advantages over traditional machine learning and transformer-based alternatives in ecological remote sensing applications.

4.2. Spatial Distribution of Estimates

To qualitatively assess the spatial prediction performance of each model, we visualized the estimated aboveground carbon stock (Mg/ha) over representative subregions in the Liuchong River Basin. Figure 5 presents a visual comparison across six panels: (a) true-color satellite imagery, (b) ground-truth carbon stock maps derived from field measurements, and predictions from (c) Enhanced ResNet (ours), (d) ViT, (e) GBDT, and (f) RF. Each subregion was sampled to reflect different landscape types, including mountainous forests, mixed vegetation zones, and anthropogenic areas.

The visual comparison highlights clear differences in spatial fidelity and predictive coherence. Traditional models like RF and GBDT tended to produce overly coarse and spatially fragmented predictions. These models frequently failed to detect fine-scale variability and exhibited systematic underestimation in high-carbon areas, especially along ridgelines and densely forested valleys. The ViT model offered better spatial continuity than tree-based methods but introduced unnatural block artifacts due to its fixed patch-based encoding, which lacked explicit spatial overlap.

In contrast, the Enhanced ResNet generated predictions that more closely matched the ground truth, both in value distribution and spatial structure. It effectively delineated boundaries between high- and low-carbon zones, preserved continuous forest patterns, and responded well to variations in canopy density and elevation. Notably, our model showed robustness in both high-resolution and complex terrain subregions, likely due to the multiscale residual blocks and channel-wise attention mechanisms that enhanced spatial representation and spectral discrimination.

Overall, the Enhanced ResNet provides visually coherent and topographically consistent carbon stock maps, supporting its suitability for fine-scale forest carbon monitoring in ecologically diverse and mountainous landscapes.

4.3. Error Analysis

To further assess model robustness and generalization, we conducted an in-depth error analysis from three perspectives: the relative importance of input features, quantitative comparisons across regression metrics, and individual cases with extreme prediction errors.

4.3.1. Feature Importance Ranking

Figure 6 presents the ranked feature importance derived from the Enhanced ResNet model using permutation-based evaluation. Each bar represents the relative contribution of an input variable to the model's predictive accuracy. Among the 17 input features, Tree Cover, Slope, and Longitude emerged as the three most influential variables, with importance scores of 0.050, 0.046, and 0.045, respectively. Tree Cover and Slope are directly linked to vegetation structure and terrain, which aligns with the ecological drivers of aboveground biomass variation in mountainous regions such as the Liuchong River Basin, while Longitude likely captures broader east–west spatial gradients within the basin. The importance of EVI and B4 further highlights the model's reliance on spectral signals related to chlorophyll content and canopy reflectance in the red band.
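Permutation importance measures how much a model's error grows when the values of one feature are shuffled across samples, which breaks that feature's link to the target while leaving its distribution intact. The sketch below is a minimal, hypothetical illustration and is not the authors' implementation. It assumes a generic `predict` callable and uses the raw increase in RMSE as the score. The scores in Figure 6 (e.g., 0.050) may be normalized or computed with a different error measure. The toy model and data are invented for the example.

```python
import math
import random
from typing import Callable, List, Sequence

Row = List[float]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict: Callable[[List[Row]], List[float]],
                           X: List[Row], y: Sequence[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean increase in RMSE when each feature column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = rmse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row and overwrite only feature j with the shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy model: output depends on feature 0 only; feature 1 is ignored.
def toy_model(rows: List[Row]) -> List[float]:
    return [3.0 * r[0] for r in rows]


X = [[float(i), float(i % 3)] for i in range(20)]
y = toy_model(X)
imp = permutation_importance(toy_model, X, y)  # imp[1] is exactly 0.0
```

A feature that the model ignores scores zero, because shuffling it cannot change the predictions. Correlated inputs, such as adjacent red-edge bands, can share or mask each other's importance under this scheme.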

Red-edge (B5–B7) and shortwave infrared (B11, B12) bands also showed notable importance, suggesting that they contribute substantially to representing canopy structure, moisture content, and vegetation density. These bands are less prone to saturation than visible bands and are well known for their sensitivity to forest biophysical properties, which supports the model's capacity to capture structural heterogeneity in complex landscapes.

Interestingly, traditional vegetation indices such as NDVI and common spectral bands (e.g., B3, B7) showed relatively low importance, suggesting that more advanced architectures can shift reliance from conventional indices toward more discriminative spatial–spectral cues. This result is consistent with the design motivation for channel attention, which allows the network to prioritize informative bands and suppress redundant inputs. The balanced distribution of importance across topographic, spectral, and vegetation-related inputs also suggests that the model does not depend on a narrow set of variables, which may improve its generalization in heterogeneous landscapes.

4.3.2. RMSE, MAE, and $R^{2}$ Comparisons

Figure 7 summarizes the comparative performance of all models on the test set across three widely used regression metrics: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination ($R^{2}$). This unified bar chart provides a holistic view of model accuracy and consistency. The proposed Enhanced ResNet outperforms the baseline models on all three metrics. Specifically, it achieves the lowest RMSE of 23.02 Mg/ha, a substantial improvement over ViT (29.55 Mg/ha), GBDT (29.95 Mg/ha), and RF (30.27 Mg/ha). Replacing the standard residual blocks with multiscale residual blocks (MSRBs) alone reduces RMSE to 25.36 Mg/ha. Adding the Squeeze-and-Excitation (SE) module lowers it further to 23.02 Mg/ha, indicating that attention-based spectral weighting adds to the gain from multiscale spatial modeling.
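For reference, the three metrics can be computed as in the minimal sketch below. $R^{2}$ is taken here as $1 - SS_{res}/SS_{tot}$, which is one common definition. The section does not state which definition was used; a squared Pearson correlation, for example, would give different values. The plot values are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, and R^2 (as 1 - SS_res / SS_tot) for paired values in Mg/ha."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),              # penalizes large errors more heavily
        "mae": sum(abs(r) for r in residuals) / n,  # average absolute deviation
        "r2": 1.0 - ss_res / ss_tot,                # share of variance explained
    }


# Hypothetical plot values (Mg/ha), not data from the study
metrics = regression_metrics([100.0, 120.0, 80.0, 60.0], [110.0, 115.0, 85.0, 50.0])
# -> rmse about 7.91, mae 7.5, r2 0.875
```

RMSE squares the residuals, so it is always at least as large as MAE. The gap widens when a few large errors dominate, as in the high-biomass underestimation cases examined in Section 4.3.3.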

For MAE, the Enhanced ResNet achieves 17.15 Mg/ha, a substantial reduction compared with 21.52 Mg/ha for ViT and over 27 Mg/ha for RF and GBDT. This reflects improved stability and lower average deviation, both of which are crucial for operational carbon accounting. In terms of $R^{2}$, the Enhanced ResNet attains 0.773, indicating the strongest agreement with ground-truth data. This represents a relative improvement of 7.5% over ViT (0.719) and of about 20.2% over GBDT (0.643). Even without the SE module, the MSRB-only variant achieves a solid $R^{2}$ of 0.741, underscoring the benefits of multiscale spatial modeling in capturing forest heterogeneity.
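The relative improvements in $R^{2}$ can be checked directly from the reported values. The sketch assumes that "improvement" means the relative change $(R^{2}_{new} - R^{2}_{old}) / R^{2}_{old}$. Under that definition the ViT comparison reproduces 7.5%, and the GBDT comparison works out to about 20.2%.

```python
def relative_gain(new: float, old: float) -> float:
    """Relative change of `new` over `old`, in percent (assumed definition)."""
    return (new - old) / old * 100.0


r2_enhanced = 0.773  # R^2 values as reported in the text
gain_vs_vit = relative_gain(r2_enhanced, 0.719)
gain_vs_gbdt = relative_gain(r2_enhanced, 0.643)
print(f"{gain_vs_vit:.1f}% vs ViT, {gain_vs_gbdt:.1f}% vs GBDT")  # 7.5% vs ViT, 20.2% vs GBDT
```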

These results collectively validate that each architectural component contributes meaningfully to model accuracy and robustness. The consistent improvements across RMSE, MAE, and $R^{2}$ highlight the advantage of combining spatial feature diversity and adaptive spectral weighting for forest carbon estimation in complex and heterogeneous landscapes.

4.3.3. Case Study of Extreme Prediction Error

To further assess model reliability in challenging ecological contexts, we analyzed individual test cases with large prediction deviations (>30 Mg/ha). One representative case involved a high-elevation plot at 2350 m with an observed carbon stock of 128.1 Mg/ha. The baseline models substantially underestimated this value, with predictions from RF, GBDT, and ViT all falling below 80 Mg/ha. In contrast, the Enhanced ResNet predicted 124.6 Mg/ha, closely matching the ground truth.

Field records indicated that this plot was located in a dense, mature coniferous–broadleaf mixed stand with a multi-layer canopy structure. Such complex spatial and spectral patterns are typically difficult to capture with hand-crafted features or shallow models. The close estimate produced by the Enhanced ResNet illustrates its ability to leverage multiscale spatial cues and adaptive spectral attention to model forest heterogeneity. This case underscores the practical advantage of the proposed method in ecologically complex, high-biomass stands, where accurate carbon accounting is often most critical.
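Screening for such cases amounts to filtering plot-level predictions by absolute error. The sketch below is hypothetical. The `PlotPrediction` record type and the baseline value of 78.0 Mg/ha are illustrative inventions; only the observed 128.1 Mg/ha and the Enhanced ResNet's 124.6 Mg/ha come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlotPrediction:
    plot_id: str
    model: str
    observed: float   # Mg/ha, from field inventory
    predicted: float  # Mg/ha, model output

    @property
    def abs_error(self) -> float:
        return abs(self.predicted - self.observed)


def flag_extreme_errors(preds: List[PlotPrediction], threshold: float = 30.0) -> List[PlotPrediction]:
    """Return predictions whose absolute error exceeds the threshold (Mg/ha)."""
    return [p for p in preds if p.abs_error > threshold]


preds = [
    PlotPrediction("P-2350m", "Enhanced ResNet", 128.1, 124.6),          # values from the text
    PlotPrediction("P-2350m", "RF (hypothetical value)", 128.1, 78.0),   # illustrative only
    PlotPrediction("P-other", "Enhanced ResNet (hypothetical)", 55.0, 61.0),
]
flagged = flag_extreme_errors(preds)  # only the hypothetical RF prediction exceeds 30 Mg/ha
```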

4.4. Ablation Study

To evaluate the contribution of each architectural component in the proposed model, we conducted an ablation study comparing three configurations:

Baseline ResNet: The original ResNet-18 architecture without any modifications.
ResNet + MSRB: A variant in which the standard residual blocks are replaced with multiscale residual blocks (MSRBs) to capture spatial patterns across multiple receptive fields.
Enhanced ResNet (MSRB + SE): The full version combining MSRB with Squeeze-and-Excitation (SE) modules to introduce channel-wise spectral attention.
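The internal layout of the MSRB is not specified in this section. The NumPy forward pass below is therefore a hypothetical sketch of one common multiscale design, not the authors' block. Its assumptions are:

- two parallel branches with 3×3 and 5×5 same-padded convolutions, each followed by ReLU;
- channel concatenation of the branch outputs, then a 1×1 fusion convolution;
- an identity skip connection and a final ReLU.

Channel count, patch size, and weights are all invented for illustration.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stride-1, zero-padded 2D cross-correlation (as used in CNN layers).

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) with odd k.
    Returns (C_out, H, W), i.e. the spatial size is preserved.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Shifted window of the padded input, weighted by kernel tap (i, j).
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def multiscale_residual_block(x, w3, w5, w_fuse):
    """Illustrative MSRB: parallel 3x3 and 5x5 branches, 1x1 fusion, identity skip."""
    b3 = relu(conv2d_same(x, w3))                      # smaller receptive field
    b5 = relu(conv2d_same(x, w5))                      # larger receptive field
    stacked = np.concatenate([b3, b5], axis=0)         # (2C, H, W)
    fused = np.einsum("oc,chw->ohw", w_fuse, stacked)  # 1x1 conv back to C channels
    return relu(x + fused)                             # residual connection


rng = np.random.default_rng(0)
C, H, W = 4, 8, 8  # hypothetical channel count and patch size
x = rng.normal(size=(C, H, W))
w3 = rng.normal(scale=0.1, size=(C, C, 3, 3))
w5 = rng.normal(scale=0.1, size=(C, C, 5, 5))
w_fuse = rng.normal(scale=0.1, size=(C, 2 * C))
out = multiscale_residual_block(x, w3, w5, w_fuse)  # same shape as x
```

Because the branches preserve spatial size and the fusion restores the channel count, the block can replace a standard residual block without changing the surrounding architecture.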

Table 3 presents the evaluation results using RMSE, MAE, and $R^{2}$ on the test set. All models were trained under identical experimental conditions to ensure a fair and consistent comparison. The results show that both architectural enhancements substantially improve performance. Introducing MSRB alone reduces RMSE by 4.05 Mg/ha relative to the baseline, confirming the importance of multiscale spatial feature extraction in heterogeneous forest environments. Incorporating SE modules yields further gains by adaptively reweighting spectral channels, particularly enhancing the influence of informative bands such as red-edge and SWIR.
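The channel reweighting performed by an SE module follows three steps:

- **Squeeze:** global-average-pool each channel to a single value.
- **Excitation:** pass the channel descriptors through a bottleneck of two fully connected layers, with ReLU after the first and a sigmoid after the second.
- **Scale:** multiply each channel by its resulting weight.

The NumPy sketch below shows this standard SE computation for one feature map. The reduction ratio, channel count, and weights are hypothetical, since the paper's configuration is not given in this section.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def squeeze_excitation(x: np.ndarray, w1: np.ndarray, b1: np.ndarray,
                       w2: np.ndarray, b2: np.ndarray):
    """Channel attention on a (C, H, W) feature map.

    w1: (C // r, C) and w2: (C, C // r) for a reduction ratio r.
    Returns the reweighted map and the per-channel weights in (0, 1).
    """
    s = x.mean(axis=(1, 2))             # squeeze: one descriptor per channel
    e = np.maximum(w1 @ s + b1, 0.0)    # excitation: bottleneck layer + ReLU
    weights = sigmoid(w2 @ e + b2)      # per-channel gates in (0, 1)
    return x * weights[:, None, None], weights  # scale each channel


rng = np.random.default_rng(42)
C, r = 8, 4  # hypothetical channel count and reduction ratio
x = rng.normal(size=(C, 16, 16))
w1, b1 = rng.normal(size=(C // r, C)), np.zeros(C // r)
w2, b2 = rng.normal(size=(C, C // r)), np.zeros(C)
scaled, weights = squeeze_excitation(x, w1, b1, w2, b2)
```

In a trained network the weights are learned. Past the first layer, the channels are learned feature maps rather than raw Sentinel-2 bands, so the link to specific bands such as red-edge or SWIR is indirect.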

4.5. Regional Carbon Stock Mapping

To assess the spatial estimation capabilities of different models, we generated basin-wide carbon stock maps for the Liuchong River Basin using four representative approaches: Random Forest (RF), Gradient Boosting Decision Trees (GBDT), Vision Transformer (ViT), and the proposed Enhanced ResNet. As shown in Figure 8, each model produces spatially continuous carbon density estimates at a resolution of 640 m × 640 m, based on patch-wise predictions. While all models capture the general spatial trend of higher carbon densities in mountainous forested regions and lower values in valleys or agricultural zones, the level of detail and spatial coherence varies. The RF and GBDT outputs appear more fragmented and noisy, with abrupt transitions that lack ecological plausibility. ViT shows improved smoothness but tends to underestimate carbon stock in dense forest zones.
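Patch-wise mapping can be sketched as tiling the input raster into non-overlapping windows, predicting one value per window, and placing the results on a coarse grid. This sketch assumes all bands are resampled to 10 m, so a 640 m cell corresponds to a 64 × 64 pixel patch. The text does not state the patch size in pixels or how incomplete edge tiles are handled; here they are simply dropped. `mean_value` is a hypothetical stand-in for a trained model.

```python
from typing import Callable, List

Raster = List[List[float]]  # single-band raster, rows x cols


def patchwise_map(raster: Raster, patch: int,
                  predict_patch: Callable[[Raster], float]) -> List[List[float]]:
    """Predict one value per non-overlapping patch x patch window.

    Incomplete tiles at the right and bottom edges are skipped (an assumption).
    """
    n_rows, n_cols = len(raster) // patch, len(raster[0]) // patch
    grid = []
    for gi in range(n_rows):
        row_out = []
        for gj in range(n_cols):
            window = [r[gj * patch:(gj + 1) * patch]
                      for r in raster[gi * patch:(gi + 1) * patch]]
            row_out.append(predict_patch(window))
        grid.append(row_out)
    return grid


PIXEL_M = 10               # assumed input resolution (m)
CELL_M = 640               # map cell size reported in the text (m)
PATCH = CELL_M // PIXEL_M  # 64 pixels per side


def mean_value(window: Raster) -> float:
    """Hypothetical stand-in for a trained model: returns the patch mean."""
    values = [v for row in window for v in row]
    return sum(values) / len(values)


# 130 x 200 pixel toy raster -> 2 x 3 grid of 640 m cells (edge remainders dropped)
toy = [[float(i + j) for j in range(200)] for i in range(130)]
carbon_map = patchwise_map(toy, PATCH, mean_value)
```

Because each cell receives a single value, neighboring cells can meet in abrupt steps. Predicting on overlapping windows and averaging the overlaps is one common way to soften such transitions.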

In contrast, the Enhanced ResNet yields the most coherent and ecologically consistent spatial patterns. High-carbon areas (e.g., >150 Mg/ha) are clearly identified in the northeast and central highlands—regions dominated by evergreen and mixed forests. The spatial distribution aligns well with known topography and forest cover, indicating the model’s superior ability to integrate spectral and spatial information. These maps demonstrate that deep learning models, particularly with architectural enhancements like multiscale and attention modules, are well-suited for large-scale carbon stock estimation in heterogeneous landscapes. The outputs are suitable for supporting applications in forest monitoring, climate policy design, and land management planning.

5. Discussion

5.1. Effectiveness of the Enhanced ResNet Architecture

The proposed enhancements to the ResNet backbone, namely multiscale residual blocks and channel attention modules, played a key role in improving forest carbon stock estimation, particularly in complex mountainous terrain. The multiscale convolutions captured spatial features across varying receptive fields, which is essential for representing canopy heterogeneity and topographic complexity [66]. Meanwhile, the Squeeze-and-Excitation (SE) modules enabled the model to learn adaptive spectral weighting, emphasizing informative bands such as the red-edge and shortwave infrared (SWIR), which are known to be sensitive to biomass, leaf area index, and water content [71,72]. These architectural improvements led to more accurate and spatially coherent predictions than those of the standard ResNet and traditional baseline models. The model's reliance on red-edge and SWIR bands, as shown by the feature importance analysis, further supports the value of Sentinel-2's structurally sensitive bands for modeling forest heterogeneity.

It is worth noting that although the RMSE values across models are relatively close (ranging from 23 to 30 Mg/ha), the visual differences in spatial predictions are substantial. This is because RMSE is a global, aggregate metric that captures average predictive accuracy, but it lacks sensitivity to spatial consistency, edge preservation, and fine-scale variation. In contrast, deep learning models such as the Enhanced ResNet not only minimize overall error but also better preserve spatial structure, transitions between vegetation zones, and topographic gradients. These qualities are especially valuable in ecological applications where spatial coherence is critical for decision making and mapping accuracy.
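The point that RMSE ignores spatial arrangement can be shown with a toy grid. RMSE depends only on the set of residual values, not on where they occur. Consider two predictions, one with a uniform +5 offset and one with a speckled ±5 pattern: both have an RMSE of exactly 5, yet only the first preserves the spatial structure of the reference map. The roughness measure below (mean absolute difference between adjacent cells) is a simple illustrative choice, not a metric used in the study, and all values are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def rmse_grid(truth: Grid, pred: Grid) -> float:
    errs = [p - t for tr, pr in zip(truth, pred) for t, p in zip(tr, pr)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def roughness(grid: Grid) -> float:
    """Mean absolute difference between horizontally and vertically adjacent cells."""
    diffs = []
    for i, row in enumerate(grid):
        for j, v in enumerate(row):
            if j + 1 < len(row):
                diffs.append(abs(v - row[j + 1]))
            if i + 1 < len(grid):
                diffs.append(abs(v - grid[i + 1][j]))
    return sum(diffs) / len(diffs)


n = 4
truth = [[10.0 * j for j in range(n)] for _ in range(n)]  # smooth east-west gradient
offset = [[t + 5.0 for t in row] for row in truth]        # uniform +5 error
speckled = [[t + (5.0 if (i + j) % 2 == 0 else -5.0) for j, t in enumerate(row)]
            for i, row in enumerate(truth)]               # checkerboard of +5 / -5
# Both have RMSE 5.0, but roughness is 5.0 (matches truth) vs 10.0 (speckled).
```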

Compared with traditional machine learning models such as Random Forest and GBDT, which rely heavily on handcrafted features and lack spatial awareness, the Enhanced ResNet provides a unified end-to-end solution capable of capturing spatially heterogeneous patterns and nonlinear spectral dependencies. This advantage is especially pronounced in complex mountainous terrain, where conventional models tend to produce fragmented predictions [62]. Transformer-based architectures such as ViT often require large training datasets and rely on rigid patch encoding. Our approach, by contrast, offers a more flexible and lightweight alternative while achieving higher accuracy and spatial coherence [69].

5.2. Comparison with Traditional Machine Learning Models

Compared with classical machine learning methods such as Random Forest (RF) and Gradient Boosting Decision Trees (GBDT), the proposed Enhanced ResNet performed better on all evaluation metrics. These traditional models rely on handcrafted features and do not explicitly model local spatial dependencies [73], making them less effective for high-resolution multispectral imagery. Convolutional neural networks (CNNs), by contrast, automatically learn hierarchical spatial patterns, which is critical for modeling the structural and spectral variability of forest ecosystems [74]. The Enhanced ResNet's strong performance in heterogeneous subtropical forests such as those of the Liuchong River Basin demonstrates the potential of deep learning approaches for operational forest carbon estimation. These results highlight the benefit of learning spatial and spectral features jointly in a hierarchical framework, which shallow learners or rule-based indices cannot easily replicate.

5.3. Limitations and Future Work

Despite its effectiveness, the proposed method has some limitations. First, the model was trained and validated solely within the Liuchong River Basin. Although this region exhibits diverse forest types and elevation gradients, geographic transferability to other regions remains untested. Domain adaptation or pretraining with large-scale regional data could help address this limitation [75]. Second, the use of single-date Sentinel-2 imagery may lead to temporal biases due to seasonal phenology or recent land-cover changes [76]. Future work should consider multi-temporal data fusion and the integration of complementary modalities such as LiDAR, radar, or high-resolution topographic information [77].

Third, Sentinel-2 multispectral bands are known to saturate in high-biomass conditions, which may reduce the model’s sensitivity and accuracy when estimating carbon stocks in dense or mature forest stands. This spectral saturation effect limits the representation of subtle structural variation beyond certain thresholds. Future studies may address this limitation by integrating active remote sensing data such as LiDAR or SAR, which provide direct measurements of canopy height and structure, and are less affected by saturation.
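The saturation effect can be made concrete with a simple illustrative model. Assume, purely for illustration and not as a relationship fitted in this study, that a spectral signal $S$ (e.g., a band reflectance or vegetation index) approaches an asymptote $S_{\max}$ as aboveground biomass $B$ increases, with a rate constant $k > 0$:

$$
S(B) = S_{\max}\left(1 - e^{-kB}\right), \qquad
\frac{dS}{dB} = k\,S_{\max}\,e^{-kB}, \qquad
\frac{dB}{dS} = \frac{e^{kB}}{k\,S_{\max}}
$$

The sensitivity $dS/dB$ decays toward zero at high biomass, while $dB/dS$ grows exponentially. A fixed amount of noise $\delta S$ in the signal therefore maps to a biomass error of roughly $\delta B \approx \delta S\,e^{kB}/(k\,S_{\max})$, which increases with $B$. Under this assumed form, dense mature stands are exactly where spectral estimates become least reliable, which is consistent with the motivation for adding direct canopy height or structure information from LiDAR or SAR.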

Lastly, incorporating uncertainty quantification and explainability tools (e.g., SHAP, Grad-CAM) would enhance the interpretability and decision support capabilities of forest carbon models.
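One simple form of uncertainty quantification is to train several models (e.g., with different random seeds or bootstrap resamples of the field plots) and report the spread of their predictions at each location. The sketch below summarizes hypothetical ensemble predictions as a mean with an approximate 95% interval. It assumes a roughly normal spread across members.

```python
import statistics
from typing import List, Tuple


def ensemble_summary(member_preds: List[List[float]],
                     z: float = 1.96) -> List[Tuple[float, float, float]]:
    """Per-location (mean, lower, upper) from several ensemble members.

    member_preds[m][i] is member m's prediction (Mg/ha) at location i.
    The interval assumes a roughly normal spread across members.
    """
    summaries = []
    for values in zip(*member_preds):
        mu = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation across members
        summaries.append((mu, mu - z * sd, mu + z * sd))
    return summaries


# Hypothetical predictions from three members at two map cells
members = [
    [120.0, 45.0],
    [130.0, 47.0],
    [125.0, 43.0],
]
summary = ensemble_summary(members)  # cell 0: 125 +/- 9.8, cell 1: 45 +/- 3.92
```

A wider interval flags cells where members disagree, such as the first cell here. Ensemble spread reflects model variability only; it does not capture field measurement or allometric error in the reference plots.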

In summary, while our study demonstrates promising results within a regional context, future efforts should aim to build more generalized and transferable frameworks that can scale across biomes and temporal settings. In addition, integrating biophysical priors and leveraging explainable learning mechanisms will be critical to bridging the gap between predictive performance and ecological insight.

6. Conclusions

This study proposed an enhanced ResNet-based deep learning framework for estimating forest aboveground carbon stock from Sentinel-2 multispectral imagery in the mountainous Liuchong River Basin, Guizhou Province. By incorporating multiscale residual blocks and channel attention modules, the model effectively captured spatial heterogeneity and spectral relevance across forest types. Experimental results demonstrated superior accuracy and spatial consistency compared with conventional machine learning methods, a Vision Transformer, and a standard ResNet baseline. This work thus contributes a practical deep learning framework that advances the state of the art in remote sensing-based forest carbon estimation, particularly under conditions of ecological complexity and data limitations. The findings highlight the potential of integrating satellite data and deep learning for scalable, high-resolution carbon estimation, offering practical implications for ecological monitoring and climate policy support in complex forested landscapes.

Author Contributions

Conceptualization, methodology, validation, formal analysis, visualization, writing—original draft preparation, J.R.; writing—review and editing, L.L., Y.W., L.O. and Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by High-level innovative talents in Guizhou Province (BKRCH [2024] No.7), The Science and Technology Project of Bijie city of open competition mechanism to select the best candidates (Grant No: BKHZDZX[2023]1), Dongfeng Lake and Liuchong River Basin of Observation and Research Station of Guizhou Province (Grant No: QKHPT YWZ[2025]002), Bijie Scientist Workstation Project (BKHPT[2025]NO.2), Karst Plateau Resources and Environment Remote Sensing Talent Team (BWRLT [2023] No.14), Intelligent Geospatial Information Application Engineering Center (BKLH [2023] No. 8), and the Fundamental Research Funds for the Liaoning Universities (LJ202410166034).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. Boxplot of key forest structural variables (DBH, tree height, crown width, and carbon stock) derived from 150 field plots in the Liuchong River Basin. The plot summarizes the distribution of each variable, including minimum, maximum, median, and interquartile range, and reflects the ecological variability of the sampled forest stands.

References

Pan, Y.; Birdsey, R.A.; Phillips, O.L.; Houghton, R.A.; Fang, J.; Kauppi, P.E.; Keith, H.; Kurz, W.A.; Ito, A.; Lewis, S.L.; et al. The enduring world forest carbon sink. Nature 2024, 631, 563–569.
Wang, J.; Deng, J.; Ren, S.; Qu, G.; Wang, C.; Guo, R.; Zhao, X. Acoustic wave propagation characteristics and spontaneous combustion warning of coal during oxidative warming of loose coal. Fuel 2025, 398, 135528.
Wang, Y.F.; Li, X.; Datta, R.; Chen, J.; Du, Y.; Du, D.L. Key factors shaping prokaryotic communities in subtropical forest soils. Appl. Soil Ecol. 2022, 169, 104162.
Nazir, M.J.; Li, G.; Nazir, M.M.; Zulfiqar, F.; Siddique, K.H.; Iqbal, B.; Du, D. Harnessing soil carbon sequestration to address climate change challenges in agriculture. Soil Tillage Res. 2024, 237, 105959.
Yan, H.; Li, M.; Zhang, C.; Zhang, J.; Wang, G.; Yu, J.; Ma, J.; Zhao, S. Comparison of evapotranspiration upscaling methods from instantaneous to daytime scale for tea and wheat in southeast China. Agric. Water Manag. 2022, 264, 107464.
Luo, Y.; Deng, Q.F.; Yang, K.; Yang, Y.; Shang, C.X.; Yu, Z.Y. Spatial-Temporal Change Evolution of PM 2.5 in Typical Regions of China in Recent 20 Years. Huan Jing Ke Xue = Huanjing Kexue 2018, 39, 3003–3013.
Gao, H.; Gong, J.; Liu, J.; Ye, T. Effects of land use/cover changes on soil organic carbon stocks in Qinghai-Tibet plateau: A comparative analysis of different ecological functional areas based on machine learning methods and soil carbon pool data. J. Clean. Prod. 2024, 434, 139854.
Han, S.H.; Kim, S.; Chang, H.; Li, G.; Son, Y. Increased soil temperature stimulates changes in carbon, nitrogen, and mass loss in the fine roots of Pinus koraiensis under experimental warming and drought. Turk. J. Agric. For. 2019, 43, 80–87.
Shao, Z.; Xing, C.; Xue, M.; Fang, Y.; Li, P. Selective removal of Pb (II) from yellow rice wine using magnetic carbon-based adsorbent. J. Sci. Food Agric. 2023, 103, 6929–6939.
Wan, L.; Li, H.; Li, C.; Wang, A.; Yang, Y.; Wang, P. Hyperspectral sensing of plant diseases: Principle and methods. Agronomy 2022, 12, 1451.
Zhu, W.; Feng, Z.; Dai, S.; Zhang, P.; Wei, X. Using UAV multispectral remote sensing with appropriate spatial resolution and machine learning to monitor wheat scab. Agriculture 2022, 12, 1785.
Chen, S.; Chen, Z.; Zhang, X.; Luo, Z.; Schillaci, C.; Arrouays, D.; Richer-de Forges, A.C.; Shi, Z. European topsoil bulk density and organic carbon stock database (0–20 cm) using machine-learning-based pedotransfer functions. Earth Syst. Sci. Data 2024, 16, 2367–2383.
Yakov, K.; Sajjad, R.; Annie, I.; Andrew, M.; Kazem, Z.; Nan, L.; Sami, U.; Khalid, M.; Muhammad, A.K.; Nadeem, S.; et al. Inorganic carbon is overlooked in global soil carbon research: A bibliometric analysis. Geoderma 2024, 443, 116831.
Liu, C.; Li, Y.; Chen, T.; Meng, S.; Liu, D.; Dong, D.; You, T. Electric field-induced specific preconcentration to enhance DNA-based electrochemical sensing of Hg2+ via the synergy of enrichment and self-cleaning. J. Agric. Food Chem. 2022, 70, 7412–7419.
Zhao, Y.; Yang, K.; Luo, Y.; Yu, Z. Spatial–temporal characteristics of surface thermal environment and its effect on Lake surface water temperature in Dianchi Lake basin. Front. Ecol. Evol. 2022, 10, 984692.
Xu, W.; Cheng, Y.; Luo, M.; Mai, X.; Wang, W.; Zhang, W.; Wang, Y. Progress and Limitations in Forest Carbon Stock Estimation Using Remote Sensing Technologies: A Comprehensive Review. Forests 2025, 16, 449.
Wang, J.; Manning, D.A.; Werner, D. The limited potential of soil and vegetation in urban greenspace for nature-based offsetting of institutional carbon emissions. Soil Use Manag. 2024, 40, e13081.
Raza, A.; Hu, Y.; Lu, Y. Improving carbon flux estimation in tea plantation ecosystems: A machine learning ensemble approach. Eur. J. Agron. 2024, 160, 127297.
Liu, S.; Meng, S.; Wang, M.; Li, W.; Dong, N.; Liu, D.; Li, Y.; You, T. In-depth interpretation of aptamer-based sensing on electrode: Dual-mode electrochemical-photoelectrochemical sensor for the ratiometric detection of patulin. Food Chem. 2023, 410, 135450.
Li, L.; Xu, H.; Zhang, Q.; Zhan, Z.; Liang, X.; Xing, J. Estimation methods of wetland carbon sink and factors influencing wetland carbon cycle: A review. Carbon Res. 2024, 3, 50.
Zeng, K.; Wei, W.; Jiang, L.; Zhu, F.; Du, D. Use of carbon nanotubes as a solid support to establish quantitative (centrifugation) and qualitative (filtration) immunoassays to detect gentamicin contamination in commercial milk. J. Agric. Food Chem. 2016, 64, 7874–7881.
Qin, C.; Guo, W.; Liu, Y.; Liu, Z.; Qiu, J.; Peng, J. A novel electrochemical sensor based on graphene oxide decorated with silver nanoparticles–molecular imprinted polymers for determination of sunset yellow in soft drinks. Food Anal. Methods 2017, 10, 2293–2301.
Xu, S.; Xu, X.; Zhu, Q.; Meng, Y.; Yang, G.; Feng, H.; Yang, M.; Zhu, Q.; Xue, H.; Wang, B. Monitoring leaf nitrogen content in rice based on information fusion of multi-sensor imagery from UAV. Precis. Agric. 2023, 24, 2327–2349.
Luo, Q.; Bai, X.; Zhao, C.; Luo, G.; Li, C.; Ran, C.; Zhang, S.; Xiong, L.; Liao, J.; Du, C.; et al. Unexpected response of terrestrial carbon sink to rural depopulation in China. Sci. Total Environ. 2024, 948, 174595.
Leng, Y.; Li, W.; Ciais, P.; Sun, M.; Zhu, L.; Yue, C.; Chang, J.; Yao, Y.; Zhang, Y.; Zhou, J.; et al. Forest aging limits future carbon sink in China. One Earth 2024, 7, 822–834.
Liu, Y.; Jing, Z.; Zhang, T.; Chen, Q.; Qiu, F.; Peng, Y.; Tang, S. Fabrication of functional biomass carbon aerogels derived from sisal fibers for application in selenium extraction. Food Bioprod. Process. 2018, 111, 93–103.
Zhang, T.; Yuan, D.; Guo, Q.; Qiu, F.; Yang, D.; Ou, Z. Preparation of a renewable biomass carbon aerogel reinforced with sisal for oil spillage clean-up: Inspired by green leaves to green Tofu. Food Bioprod. Process. 2019, 114, 154–162.
Memon, M.S.; Chen, S.; Niu, Y.; Zhou, W.; Elsherbiny, O.; Liang, R.; Du, Z.; Guo, X. Evaluating the efficacy of Sentinel-2B and Landsat-8 for estimating and mapping wheat straw cover in rice–wheat fields. Agronomy 2023, 13, 2691.
Lou, H.; Shi, X.; Ren, X.; Yang, S.; Cai, M.; Pan, Z.; Zhu, Y.; Feng, D.; Zhou, B. Limited terrestrial carbon sinks and increasing carbon emissions from the Hu Line spatial pattern perspective in China. Ecol. Indic. 2024, 162, 112035.
Hu, Y.; Li, Y.; Zhang, H.; Liu, X.; Zheng, Y.; Gong, H. The trajectory of carbon emissions and terrestrial carbon sinks at the provincial level in China. Sci. Rep. 2024, 14, 5828.
Ma, S.; Wang, M.; You, T.; Wang, K. Using magnetic multiwalled carbon nanotubes as modified QuEChERS adsorbent for simultaneous determination of multiple mycotoxins in grains by UPLC-MS/MS. J. Agric. Food Chem. 2019, 67, 8035–8044.
Jing, Z.; Ding, J.; Zhang, T.; Yang, D.; Qiu, F.; Chen, Q.; Xu, J. Flexible, versatility and superhydrophobic biomass carbon aerogels derived from corn bracts for efficient oil/water separation. Food Bioprod. Process. 2019, 115, 134–142.
Zhang, Y.; Zhang, B.; Shen, C.; Liu, H.; Huang, J.; Tian, K.; Tang, Z. Review of the field environmental sensing methods based on multi-sensor information fusion technology. Int. J. Agric. Biol. Eng. 2024, 17, 1–13.
Yue, X.; Zhou, H.; Cao, Y.; Liao, H.; Lu, X.; Yu, Z.; Yuan, W.; Liu, Z.; Lei, Y.; Sitch, S.; et al. Large potential of strengthening the land carbon sink in China through anthropogenic interventions. Sci. Bull. 2024, 69, 2622–2631.
Cong, C.; Guangqiao, C.; Yibai, L.; Dong, L.; Bin, M.; Jinlong, Z.; Liang, L.; Jianping, H. Research on monitoring methods for the appropriate rice harvest period based on multispectral remote sensing. Discret. Dyn. Nat. Soc. 2022, 2022, 1519667.
Zhou, Q.; Cheng, K.W.; Xiao, J.; Wang, M. The multifunctional roles of flavonoids against the formation of advanced glycation end products (AGEs) and AGEs-induced harmful effects. Trends Food Sci. Technol. 2020, 103, 333–347.
Zhang, C.; Yu, X.; Shi, X.; Han, Y.; Guo, Z.; Liu, Y. Development of carbon quantum dot–labeled antibody fluorescence immunoassays for the detection of morphine in hot pot soup base. Food Anal. Methods 2020, 13, 1042–1049.
Chen, X.; Zhao, C.; Zhao, Q.; Yang, Y.; Yang, S.; Zhang, R.; Wang, Y.; Wang, K.; Qian, J.; Long, L. Construction of a colorimetric and near-infrared ratiometric fluorescent sensor and portable sensing system for on-site quantitative measurement of sulfite in food. Foods 2024, 13, 1758.
Wei, L.; Yang, H.; Niu, Y.; Zhang, Y.; Xu, L.; Chai, X. Wheat biomass, yield, and straw-grain ratio estimation from multi-temporal UAV-based RGB and multispectral images. Biosyst. Eng. 2023, 234, 187–205.
Jing, Y.; Zhang, Y.; Han, I.; Wang, P.; Mei, Q.; Huang, Y. Effects of different straw biochars on soil organic carbon, nitrogen, available phosphorus, and enzyme activity in paddy soil. Sci. Rep. 2020, 10, 8837.
Jiang, N.J.; Wang, Y.J.; Chu, J.; Kawasaki, S.; Tang, C.S.; Cheng, L.; Du, Y.J.; Shashank, B.S.; Singh, D.N.; Han, X.L.; et al. Bio-mediated soil improvement: An introspection into processes, materials, characterization and applications. Soil Use Manag. 2022, 38, 68–93.
Dong, X.; Huang, A.; He, L.; Cai, C.; You, T. Recent advances in foodborne pathogen detection using photoelectrochemical biosensors: From photoactive material to sensing strategy. Front. Sustain. Food Syst. 2024, 8, 1432555.
Yin, L.; Zhang, Y.; Azi, F.; Zhou, J.; Liu, X.; Dai, Y.; Wang, Z.; Dong, M.; Xia, X. Inhibition of biofilm formation and quorum sensing by soy isoflavones in Pseudomonas aeruginosa. Food Control 2022, 133, 108629.
Anees, S.A.; Mehmood, K.; Khan, W.R.; Sajjad, M.; Alahmadi, T.A.; Alharbi, S.A.; Luo, M. Integration of machine learning and remote sensing for above ground biomass estimation through Landsat-9 and field data in temperate forests of the Himalayan region. Ecol. Inform. 2024, 82, 102732.
Yu, Z. AI for Science: A Comprehensive Review on Innovations, Challenges, and Future Directions. Int. J. Artif. Intell. Sci. (IJAI4S) 2025, 1.
Yuan, J.; Zhu, Y.; Wang, J.; Gan, L.; He, M.; Zhang, T.; Li, P.; Qiu, F. Preparation and application of Mg–Al composite oxide/coconut shell carbon fiber for effective removal of phosphorus from domestic sewage. Food Bioprod. Process. 2021, 126, 293–304.
Hu, X.; Li, Y.; Xu, Y.; Gan, Z.; Zou, X.; Shi, J.; Huang, X.; Li, Z.; Li, Y. Green one-step synthesis of carbon quantum dots from orange peel for fluorescent detection of Escherichia coli in milk. Food Chem. 2021, 339, 127775.
Tang, C.; He, Y.; Yuan, B.; Li, L.; Luo, L.; You, T. Simultaneous detection of multiple mycotoxins in agricultural products: Recent advances in optical and electrochemical sensing methods. Compr. Rev. Food Sci. Food Saf. 2024, 23, e70062.
Tanase, M.A.; Mihai, M.; Miguel, S.; Cantero, A.; Tijerín, J.; Ruiz-Benito, P.; Domingo, D.; García-Martín, A.; Aponte, C.; Lamelas, M.T. Long-term annual estimation of forest above ground biomass, canopy cover, and height from airborne and spaceborne sensors synergies in the Iberian Peninsula. Environ. Res. 2024, 259, 119432.
Wang, X.; Xu, Y.; Li, Y.; Li, Y.; Li, Z.; Zhang, W.; Zou, X.; Shi, J.; Huang, X.; Liu, C.; et al. Rapid detection of cadmium ions in meat by a multi-walled carbon nanotubes enhanced metal-organic framework modified electrochemical sensor. Food Chem. 2021, 357, 129762.
Yu, Z.; Wang, J.; Idris, M.Y.I. IIDM: Improved implicit diffusion model with knowledge distillation to estimate the spatial distribution density of carbon stock in remote sensing imagery. arXiv 2024, arXiv:2411.17973.
Li, W.; Zhang, C.; Ma, T.; Li, W. Estimation of summer maize biomass based on a crop growth model. Emir. J. Food Agric. (EJFA) 2021, 33.
Hassan, M.M.; Zareef, M.; Jiao, T.; Liu, S.; Xu, Y.; Viswadevarayalu, A.; Li, H.; Chen, Q. Signal optimized rough silver nanoparticle for rapid SERS sensing of pesticide residues in tea. Food Chem. 2021, 338, 127796.
Awais, M.; Li, W.; Hussain, S.; Cheema, M.J.M.; Li, W.; Song, R.; Liu, C. Comparative evaluation of land surface temperature images from unmanned aerial vehicle and satellite observation for agricultural areas using in situ data. Agriculture 2022, 12, 184.
Xu, L.; He, N.; Li, M.; Cai, W.; Yu, G. Spatiotemporal dynamics of carbon sinks in China’s terrestrial ecosystems from 2010 to 2060. Resour. Conserv. Recycl. 2024, 203, 107457.
Estévez, J.; Salinero-Delgado, M.; Berger, K.; Pipia, L.; Rivera-Caicedo, J.P.; Wocher, M.; Reyes-Muñoz, P.; Tagliabue, G.; Boschetti, M.; Verrelst, J. Gaussian processes retrieval of crop traits in Google Earth Engine based on Sentinel-2 top-of-atmosphere data. Remote Sens. Environ. 2022, 273, 112958.
Liu, R.; Ali, S.; Haruna, S.A.; Ouyang, Q.; Li, H.; Chen, Q. Development of a fluorescence sensing platform for specific and sensitive detection of pathogenic bacteria in food samples. Food Control 2022, 131, 108419.
Rusňák, T.; Kasanický, T.; Malík, P.; Mojžiš, J.; Zelenka, J.; Sviček, M.; Abrahám, D.; Halabuk, A. Crop mapping without labels: Investigating temporal and spatial transferability of crop classification models using a 5-year sentinel-2 series and machine learning. Remote Sens. 2023, 15, 3414.
Li, H.; Zhang, G.; Zhong, Q.; Xing, L.; Du, H. Prediction of urban forest aboveground carbon using machine learning based on landsat 8 and Sentinel-2: A case study of Shanghai, China. Remote Sens. 2023, 15, 284. [Google Scholar] [CrossRef]
Kang, W.; Lin, H.; Jiang, H.; Yao-Say Solomon Adade, S.; Xue, Z.; Chen, Q. Advanced applications of chemo-responsive dyes based odor imaging technology for fast sensing food quality and safety: A review. Compr. Rev. Food Sci. Food Saf. 2021, 20, 5145–5172.
Okeke, E.S.; Ezeorba, T.P.C.; Okoye, C.O.; Chen, Y.; Mao, G.; Feng, W.; Wu, X. Analytical detection methods for azo dyes: A focus on comparative limitations and prospects of bio-sensing and electrochemical nano-detection. J. Food Compos. Anal. 2022, 114, 104778.
Cheng, F.; Ou, G.; Wang, M.; Liu, C. Remote sensing estimation of forest carbon stock based on machine learning algorithms. Forests 2024, 15, 681.
Yu, Z.; Wang, J.; Tan, Z.; Luo, Y. Impact of climate change on SARS-CoV-2 epidemic in China. PLoS ONE 2023, 18, e0285179.
Zhihua, L.; Xue, Z.; Xiaowei, H.; Xiaobo, Z.; Jiyong, S.; Yiwei, X.; Xuetao, H.; Yue, S.; Xiaodong, Z. Hypha-templated synthesis of carbon/ZnO microfiber for dopamine sensing in pork. Food Chem. 2021, 335, 127646.
Yu, Z.; Idris, M.Y.I.; Wang, H.; Wang, P.; Chen, J.; Wang, K. From Physics to Foundation Models: A Review of AI-Driven Quantitative Remote Sensing Inversion. arXiv 2025, arXiv:2507.09081.
Li, J.; Hong, D.; Gao, L.; Yao, J.; Zheng, K.; Zhang, B.; Chanussot, J. Deep learning in multimodal remote sensing data fusion: A comprehensive review. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102926.
Shafiq, M.; Gu, Z. Deep residual learning for image recognition: A survey. Appl. Sci. 2022, 12, 8972.
Illarionova, S.; Tregubova, P.; Shukhratov, I.; Shadrin, D.; Efimov, A.; Burnaev, E. Advancing forest carbon stocks’ mapping using a hierarchical approach with machine learning and satellite imagery. Sci. Rep. 2024, 14, 21032.
Yang, K.; Zhou, P.; Wu, J.; Yao, Q.; Yang, Z.; Wang, X.; Wen, Y. Carbon stock inversion study of a carbon peaking pilot urban combining machine learning and Landsat images. Ecol. Indic. 2024, 159, 111657.
Jin, X.; Xie, Y.; Wei, X.S.; Zhao, B.R.; Chen, Z.M.; Tan, X. Delving deep into spatial pooling for squeeze-and-excitation networks. Pattern Recognit. 2022, 121, 108159.
Jiao, Y.; Wang, D.; Yao, X.; Wang, S.; Chi, T.; Meng, Y. Forest emissions reduction assessment using optical satellite imagery and space LiDAR fusion for carbon stock estimation. Remote Sens. 2023, 15, 1410.
Tang, Y.; Song, S.; Gui, S.; Chao, W.; Cheng, C.; Qin, R. Active and low-cost hyperspectral imaging for the spectral analysis of a low-light environment. Sensors 2023, 23, 1437.
Fassnacht, F.E.; White, J.C.; Wulder, M.A.; Næsset, E. Remote sensing in forestry: Current challenges, considerations and directions. For. Int. J. For. Res. 2024, 97, 11–37.
Thapa, A.; Horanont, T.; Neupane, B.; Aryal, J. Deep learning for remote sensing image scene classification: A review and meta-analysis. Remote Sens. 2023, 15, 4804.
Wang, X.; Zhu, X.X. Deep domain adaptation in remote sensing: A meta-review and future directions. ISPRS J. Photogramm. Remote Sens. 2021, 171, 274–290.
Griffiths, P.; Zuccarini, P.; Kennedy, R.E.; Gorelick, N.; Cohen, W.B.; Healey, S.P.; Yang, Z. LandTrendr: A decade of land cover change detection using Landsat time series. Remote Sens. Environ. 2020, 246, 111898.
Dalponte, M.; Coomes, D.A. Tree-centric mapping of forest carbon density from airborne laser scanning and hyperspectral data. Methods Ecol. Evol. 2016, 7, 1236–1245.

Figure 1. Location and topographic features of the study area (Liuchong River Basin, Guizhou, China).

Figure 2. Overview of the proposed deep learning framework for forest carbon stock estimation.

Figure 3. Structure of the proposed Multiscale Residual Block (MSRB), combining standard, large-kernel, and dilated convolutions to capture spatial features at multiple receptive fields.
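The caption names the ingredients of the MSRB but not how they are wired together. The NumPy sketch below shows one plausible arrangement under stated assumptions: the kernel sizes (3×3, 5×5, and 3×3 with dilation 2), concatenation followed by a 1×1 fusion convolution, and the identity skip with a final ReLU are illustrative choices, not the authors' published configuration, and `msrb` and `conv2d_same` are hypothetical helper names. It mainly shows why the branches differ: a 3×3 kernel with dilation 2 spans the same 5×5 window as the large-kernel branch while using 9 weights instead of 25.

```python
import numpy as np


def conv2d_same(x, w, dilation=1):
    """Naive zero-padded 'same' 2D convolution (cross-correlation, as in DL frameworks).

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernel with odd k.
    Dilation spaces the kernel taps `dilation` pixels apart.
    """
    c_out, _, k, _ = w.shape
    pad = dilation * (k - 1) // 2                     # keeps H and W unchanged for odd k
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding on spatial axes only
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
            # (C_out, C_in) . (C_in, H, W) -> (C_out, H, W)
            out += np.tensordot(w[:, :, i, j], patch, axes=([1], [0]))
    return out


def relu(x):
    return np.maximum(x, 0.0)


def msrb(x, p):
    """Hypothetical MSRB: three parallel branches, 1x1 fusion, identity skip, ReLU."""
    b_std = relu(conv2d_same(x, p["w3"]))               # standard 3x3 branch
    b_large = relu(conv2d_same(x, p["w5"]))             # large-kernel branch (5x5 assumed)
    b_dil = relu(conv2d_same(x, p["w3d"], dilation=2))  # 3x3 dilated by 2: 5x5 span, 9 weights
    fused = conv2d_same(np.concatenate([b_std, b_large, b_dil]), p["w1"])  # 1x1 conv: 3C -> C
    return relu(fused + x)                              # residual connection


rng = np.random.default_rng(0)
C, H, W = 10, 16, 16   # illustrative: C feature channels on a 16x16 patch (assumed sizes)
p = {
    "w3": rng.normal(0, 0.1, (C, C, 3, 3)),
    "w5": rng.normal(0, 0.1, (C, C, 5, 5)),
    "w3d": rng.normal(0, 0.1, (C, C, 3, 3)),
    "w1": rng.normal(0, 0.1, (C, 3 * C, 1, 1)),
}
x = rng.normal(size=(C, H, W))
y = msrb(x, p)
print(y.shape)  # (10, 16, 16)
```

Because each branch pads by `dilation * (k - 1) // 2`, all three outputs keep the input's height and width, which is what makes the channel-wise concatenation and the residual addition possible.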

Figure 4. Structure of the channel-wise Squeeze-and-Excitation (SE) module, which adaptively reweights spectral features through global average pooling and two fully connected layers.
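The squeeze-and-excitation step in Figure 4 reduces each channel to a single number by global average pooling, passes that vector through two fully connected layers, and uses the result to rescale the channels. A minimal NumPy sketch follows; the ReLU between the two layers, the sigmoid gate, and the reduction ratio `r = 4` follow the common SE design and are assumptions here, since the caption does not specify them, and `squeeze_excitation` is a hypothetical name.

```python
import numpy as np


def squeeze_excitation(x, w1, b1, w2, b2):
    """Channel-wise SE reweighting of a (C, H, W) feature map.

    w1: (C // r, C) and w2: (C, C // r) are the two fully connected layers;
    the reduction ratio r is an assumption (not given in the caption).
    Returns the reweighted map and the per-channel gates.
    """
    z = x.mean(axis=(1, 2))                    # squeeze: global average pooling -> (C,)
    s = np.maximum(w1 @ z + b1, 0.0)           # FC 1 + ReLU -> (C // r,)
    a = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))   # FC 2 + sigmoid -> gates in (0, 1)
    return x * a[:, None, None], a             # excite: scale each channel by its gate


rng = np.random.default_rng(0)
C, r = 16, 4
x = rng.normal(size=(C, 8, 8))
y, a = squeeze_excitation(
    x,
    rng.normal(0, 0.1, (C // r, C)), np.zeros(C // r),
    rng.normal(0, 0.1, (C, C // r)), np.zeros(C),
)
print(y.shape, a.shape)  # (16, 8, 8) (16,)
```

With all weights set to zero every gate equals 0.5, so the module simply halves its input; training moves the weights so that informative spectral channels receive gates closer to 1.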

Figure 5. Spatial distribution of estimated forest carbon stock (Mg/ha) across the Liuchong River Basin using different models: (a) Satellite, (b) Ground Truth, (c) Enhanced ResNet (Ours), (d) ViT, (e) GBDT, and (f) RF. Each row shows a different representative subregion at varying spatial scales.

Figure 6. Feature importance scores of input variables based on permutation analysis. Top features include Tree Cover, Slope, and Red-edge bands.
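Permutation importance measures how much a fitted model's error grows when the values of one input variable are shuffled across samples, which breaks that variable's link to the target while keeping its distribution. The model-agnostic NumPy sketch below assumes tabular inputs of shape (samples, features) and RMSE as the error measure; the study's exact scoring metric and number of repeats are not stated in the caption, and the data are synthetic. For fitted scikit-learn estimators, `sklearn.inspection.permutation_importance` implements the same idea.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column of X is randomly shuffled.

    predict: callable mapping an (n_samples, n_features) array to (n_samples,) predictions.
    Larger scores mean the model relies more on that feature.
    """
    rng = np.random.default_rng(seed)

    def rmse(pred):
        return np.sqrt(np.mean((y - pred) ** 2))

    base = rmse(predict(X))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between feature j and y
            scores[j] += rmse(predict(X_perm)) - base
    return scores / n_repeats


# Synthetic check (not study data): y depends strongly on feature 0, weakly on 1, not on 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]


def predict(A):
    return 3.0 * A[:, 0] + 0.5 * A[:, 1]  # stand-in for a fitted model


scores = permutation_importance(predict, X, y)
print(scores)  # feature 0 scores far above feature 1; feature 2 scores exactly 0
```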

Figure 7. Performance comparison of different models across RMSE, MAE, and $R^{2}$ metrics.

Figure 8. Basin-level forest carbon stock maps (Mg/ha) across the Liuchong River Basin generated by (a) RF, (b) GBDT, (c) ViT, and (d) Enhanced ResNet. Higher values are concentrated in mountainous forest regions.
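The maps in Figures 5 and 8 express carbon density in Mg/ha. Turning a density map into a basin-wide stock requires multiplying each pixel by its area. The sketch below assumes the maps sit on the 20 m grid of Table 1, so each pixel covers 400 m², or 0.04 ha, and that no-data cells are stored as NaN; `total_carbon_mg` is a hypothetical helper and the toy map is invented.

```python
import numpy as np


def total_carbon_mg(density_mg_per_ha, pixel_size_m=20.0, mask=None):
    """Convert a carbon density map (Mg/ha) into a total stock (Mg).

    Each square pixel covers pixel_size_m**2 m^2 and 1 ha = 10,000 m^2,
    so a 20 m pixel is 0.04 ha. NaN cells (no data) are skipped;
    `mask` optionally restricts the sum to selected (e.g. forest) pixels.
    """
    pixel_area_ha = pixel_size_m ** 2 / 10_000.0
    d = np.asarray(density_mg_per_ha, dtype=float)
    if mask is not None:
        d = np.where(mask, d, np.nan)  # drop pixels outside the mask
    return float(np.nansum(d) * pixel_area_ha)


demo = np.array([[100.0, 50.0], [np.nan, 150.0]])  # toy 2x2 map, not study data
print(total_carbon_mg(demo))  # 12.0  (300 Mg/ha summed x 0.04 ha per pixel)
```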

Table 1. Spectral bands of Sentinel-2 MSI used in this study.

Description	Band	Central wavelength (nm)	Native resolution (m)	Resampled resolution (m)
Blue	B2	490	10	20
Green	B3	560	10	20
Red	B4	665	10	20
Red-edge 1	B5	705	20	20
Red-edge 2	B6	740	20	20
Red-edge 3	B7	783	20	20
NIR	B8	842	10	20
Narrow NIR	B8A	865	20	20
SWIR 1	B11	1610	20	20
SWIR 2	B12	2190	20	20
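Table 1 shows the four 10 m bands (B2, B3, B4, B8) resampled to 20 m so that all ten bands share one grid. The table does not state the resampling method. The NumPy sketch below uses 2 × 2 block averaging, which keeps the mean reflectance of each 20 m cell; this is an assumption, and bilinear or nearest-neighbor resampling would be equally plausible readings of the table. The helper name and the toy tile are hypothetical.

```python
import numpy as np


def block_mean_downsample(band, factor=2):
    """Average non-overlapping factor x factor blocks of a 2D band.

    factor=2 takes a 10 m Sentinel-2 band (B2, B3, B4, B8 in Table 1) onto a 20 m grid.
    Rows/columns that do not fill a complete block are dropped.
    """
    h, w = band.shape
    h2, w2 = h // factor, w // factor
    trimmed = band[:h2 * factor, :w2 * factor]
    # element [a, b, c, d] is band[a*factor + b, c*factor + d]; average over b and d
    return trimmed.reshape(h2, factor, w2, factor).mean(axis=(1, 3))


b4_10m = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 tile at 10 m
b4_20m = block_mean_downsample(b4_10m)
print(b4_20m)  # values [[2.5, 4.5], [10.5, 12.5]]
```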

Table 2. Performance comparison of different models on the test set. The Enhanced ResNet gives the best value on every metric (lowest RMSE and MAE, highest $R^{2}$).

Model	RMSE (Mg/ha)	MAE (Mg/ha)	$R^{2}$
Random Forest (RF)	30.27	28.31	0.622
Gradient Boosting (GBDT)	29.95	27.88	0.643
Vision Transformer (ViT)	29.55	21.52	0.719
Enhanced ResNet (Ours)	23.02	17.15	0.773
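The three columns of Table 2 can be computed from paired observed and predicted plot values as in the NumPy sketch below. It assumes $R^{2}$ is defined as `1 - SS_res / SS_tot` (some studies instead report the squared Pearson correlation), and the four toy plots are invented for illustration.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2); RMSE and MAE share the units of y (Mg/ha here).

    R^2 is computed as 1 - SS_res / SS_tot (assumed definition).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return rmse, mae, r2


# Toy plot values in Mg/ha, invented for illustration (not from the study).
obs = [100.0, 120.0, 80.0, 140.0]
pred = [110.0, 115.0, 90.0, 130.0]
rmse, mae, r2 = regression_metrics(obs, pred)
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.4f}")  # RMSE=9.01  MAE=8.75  R2=0.8375
```

Because RMSE squares residuals before averaging, it can never be smaller than MAE, and the gap between them widens when errors are concentrated in a few plots. Read this way, the ViT row (29.55 vs. 21.52 Mg/ha) suggests more uneven errors than the Random Forest row (30.27 vs. 28.31 Mg/ha).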

Table 3. Ablation study of structural components in the proposed ResNet architecture. The full Enhanced ResNet (MSRB + SE) gives the best value on every metric.

Model Variant	RMSE (Mg/ha)	MAE (Mg/ha)	$R^{2}$
ResNet-18 (Baseline)	29.41	21.87	0.683
+MSRB only	25.36	18.72	0.741
+MSRB + SE (Enhanced ResNet)	23.02	17.15	0.773
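The contribution of each component in Table 3 is easier to read as a relative change against the ResNet-18 baseline. The short Python snippet below recomputes those changes from the table values; it performs no new experiment, and `gains_vs_baseline` is a hypothetical helper name.

```python
# Table 3 values: (RMSE, MAE) in Mg/ha, then R^2.
ABLATION = {
    "ResNet-18 (Baseline)": (29.41, 21.87, 0.683),
    "+MSRB only": (25.36, 18.72, 0.741),
    "+MSRB + SE (Enhanced ResNet)": (23.02, 17.15, 0.773),
}


def gains_vs_baseline(table, baseline="ResNet-18 (Baseline)"):
    """Percent RMSE/MAE reduction and absolute R^2 gain of each variant over the baseline."""
    b_rmse, b_mae, b_r2 = table[baseline]
    return {
        name: (100.0 * (b_rmse - rmse) / b_rmse,
               100.0 * (b_mae - mae) / b_mae,
               r2 - b_r2)
        for name, (rmse, mae, r2) in table.items()
        if name != baseline
    }


for name, (d_rmse, d_mae, d_r2) in gains_vs_baseline(ABLATION).items():
    print(f"{name}: RMSE -{d_rmse:.1f}%, MAE -{d_mae:.1f}%, R2 +{d_r2:.3f}")
# +MSRB only: about 13.8% lower RMSE, 14.4% lower MAE, R2 +0.058
# +MSRB + SE: about 21.7% lower RMSE, 21.6% lower MAE, R2 +0.090
```

Of the 6.39 Mg/ha total RMSE reduction, 4.05 Mg/ha is already obtained with the MSRB alone, and the remaining 2.34 Mg/ha comes from adding the SE module.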

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ren, J.; Liu, L.; Wu, Y.; Ouyang, L.; Yu, Z. Estimating Forest Carbon Stock Using Enhanced ResNet and Sentinel-2 Imagery. Forests 2025, 16, 1198. https://doi.org/10.3390/f16071198

AMA Style

Ren J, Liu L, Wu Y, Ouyang L, Yu Z. Estimating Forest Carbon Stock Using Enhanced ResNet and Sentinel-2 Imagery. Forests. 2025; 16(7):1198. https://doi.org/10.3390/f16071198

Chicago/Turabian Style

Ren, Jintong, Lizhi Liu, You Wu, Lijian Ouyang, and Zhenyu Yu. 2025. "Estimating Forest Carbon Stock Using Enhanced ResNet and Sentinel-2 Imagery" Forests 16, no. 7: 1198. https://doi.org/10.3390/f16071198

APA Style

Ren, J., Liu, L., Wu, Y., Ouyang, L., & Yu, Z. (2025). Estimating Forest Carbon Stock Using Enhanced ResNet and Sentinel-2 Imagery. Forests, 16(7), 1198. https://doi.org/10.3390/f16071198

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
