1. Introduction
Forests are the cornerstone of terrestrial ecosystems, playing a vital role in maintaining ecological balance, regulating climate, and improving the overall environment [
1,
2]. Forest volume (FV) serves as a critical indicator for reflecting the abundance of forest resources and evaluating the quality of forest ecosystems [
3,
4,
5,
6]. Consequently, accurately estimating forest volume at a regional scale has been a significant area of focus within forestry research. Global initiatives such as the United Nations Sustainable Development Goals (SDG 13: Climate Action) and the Paris Agreement have highlighted the urgent need for improved forest monitoring systems, particularly in relation to carbon sequestration and climate adaptation. Forest volume estimation, as a precursor to carbon accounting, is thus increasingly recognized not only as a scientific endeavor but also as a policy-relevant tool for evidence-based environmental governance [
2,
4,
7]. Precise management of forest resources and the enhancement of their information management are indispensable for formulating scientifically sound forest carbon sequestration strategies [
8,
9]. Traditional methods for estimating forest volume mainly rely on field sampling and forest growth equations. However, these approaches are marred by long update cycles, low efficiency, and high costs [
10]. Field-based surveys, while precise, are logistically challenging in remote or rugged terrains, making scalable remote sensing solutions particularly valuable in such contexts. Thanks to the remarkable progress in next-generation spaceborne and airborne remote sensing technologies, it is now feasible to obtain timely and accurate spatial information for forest areas, which is of great value for investigating terrestrial ecological balance and carbon storage potential [
2,
11].
Despite advancements in forest volume estimation, a persistent knowledge gap remains in accurately assessing biomass and volume in structurally complex, multi-species forests located in mountainous terrains. Most existing studies have focused on either lowland homogeneous forests or relied on single-source data, which struggle to capture the combined spectral and structural heterogeneity of diverse ecosystems. Future research must prioritize external validation of such frameworks using independent datasets and rigorous uncertainty quantification to ensure reliability for policy applications. Particularly in dense secondary forests with varied slope aspects, conventional optical methods face saturation, while microwave and LiDAR approaches encounter technical limitations due to terrain effects or acquisition constraints. This lack of integrated modeling strategies, specifically tailored for regions like the Saihanwula Nature Reserve, highlights the need for a synergistic framework that combines complementary remote sensing sources.
Multi-source remote sensing data have been widely adopted in volume estimation models, such as spaceborne and airborne optical data, microwave remote sensing, and LiDAR data. Traditional optical remote sensing for forest volume estimation emerged in the 1970s–1980s, with early studies leveraging multispectral data from Landsat MSS to estimate biomass parameters. It utilizes vegetation reflectance, vegetation indices, image transformation components, and high-resolution texture features to construct volume models, enabling estimations with relatively high temporal and spatial resolutions [
12,
13,
14]. Nevertheless, spectral saturation in high-biomass forests primarily affects near-infrared (NIR) bands, where vegetation reflectance plateaus due to canopy light interception, whereas visible bands remain relatively unaffected. Although previous studies have attempted to mitigate saturation through improved sensor radiometric calibration and atmospheric correction [15] and through data compression algorithms, it remains challenging to distinguish spectral differences among tree species based solely on vegetation indices and texture features [
16]. This limitation is particularly evident in dense secondary forests and heterogeneous canopies, where spectral reflectance tends to saturate beyond a certain volume threshold. As a result, models based solely on optical inputs often fail to capture vertical structural variability, leading to systematic underestimation in high-volume regions. Unlike spectral indices that capture both surface reflectance and canopy subsurface characteristics, Canopy Height Models derived from LiDAR or stereoscopic imagery provide direct structural measurements of vertical forest heterogeneity. This complementary dimension addresses one of the most persistent challenges in volume modeling: the inability of 2D spectral data to accurately represent 3D forest structure, particularly in mature, multi-layered canopies [
17,
18,
19]. Microwave remote sensing, which is immune to weather conditions and possesses penetration capabilities, can offer comprehensive information regarding forest structure and functions by leveraging backscatter coefficients and interferometric data. For instance, spaceborne synthetic aperture radar (SAR) provides coarse-scale forest structural information, primarily reflecting canopy volume and biomass density through radar backscatter mechanisms, serving as an effective predictor for volume estimation [
19,
20,
21]. However, in complex forest environments like mountainous regions, geometric distortions caused by terrain and temporal decorrelation can introduce substantial errors in the estimation process. LiDAR can generate digital surface models (DSM) and precise digital elevation models (DEM) from which vertical structure information is extracted, and LiDAR-derived Canopy Height Models (CHM) effectively mitigate saturation in high-density forests, though their performance varies with forest structure and density [
22,
23]. However, LiDAR data alone cannot provide sufficient spectral context for discriminating between species or assessing vegetation health. Therefore, integrating CHM with multispectral and texture features offers a complementary perspective that captures both the horizontal and vertical heterogeneity of forest canopies, enabling a more comprehensive estimation framework [
24]. Nevertheless, airborne LiDAR systems have limited coverage areas and come with high costs. Therefore, spaceborne LiDAR, including systems like GEDI and ICESat-2, is often combined with other remote sensing data to improve estimation accuracy [
25,
26]. In practice, however, even with these combined efforts, achieving satisfactory volume estimates for forests with complex characteristics remains a challenge. The Saihanwula Nature Reserve (43°59′–44°27′ N, 118°18′–118°55′ E), located in Chifeng City, Inner Mongolia Autonomous Region, China, has extremely complex, predominantly mountainous terrain. Although interest in forest volume estimation across temperate ecosystems is growing and regional forest studies exist for northern China, few have focused on complex mountainous ecosystems with heterogeneous topography like the Saihanwula Nature Reserve. The unique topographic variability and species composition of this area pose significant challenges for conventional remote sensing techniques, necessitating a more tailored and integrated approach. Moreover, the reserve features mixed forests of multiple tree species, and its secondary forests are so dense that they are difficult to segment accurately. Under these conditions, previous approaches have shown significant deficiencies. Traditional optical remote sensing struggles to distinguish tree species precisely because of spectral similarity in the complex forest environment. Microwave remote sensing is affected by terrain-induced geometric distortions, leading to errors in volume estimates. For LiDAR, the dense secondary forests make it hard to access the area and acquire comprehensive data, limiting its effectiveness. As a result, the accuracy of volume estimation in this area using existing methods is far from satisfactory. While the incorporation of hybrid learning frameworks, including deep learning architectures, holds promise for improving generalizability in forest volume modeling, this study focuses on interpretable machine learning algorithms due to data size constraints and the need for model transparency [
27,
28]. Future work may explore deep learning approaches to further enhance accuracy, particularly in large-scale or data-rich forest environments.
Given the complexity of the Saihanwula Nature Reserve’s forest conditions, including its mountainous terrain, multi-species mixed forests, and dense secondary forests, there is an urgent need for more suitable and accurate volume estimation methods. Volume estimation usually involves both parametric and non-parametric empirical models. Parametric models, such as linear, logarithmic, or power functions, struggle to capture complex nonlinear relationships [
29]. Non-parametric models, including neural networks, k-nearest neighbors (kNN), random forest (RF), and support vector machines (SVM), face challenges, including data noise, model overfitting, and the curse of dimensionality. In remote sensing applications, spectral bands often exhibit high inter-correlation, increasing the risk of overfitting when all features are included without dimensionality reduction, which can undermine model accuracy [
30]. Machine learning techniques, such as gradient boosting regression trees (GBRT), have shown promise in forest volume estimation. Their ability to handle non-linear relationships makes them suitable for complex ecological data. Future research could explore deep learning frameworks, which have demonstrated superior performance in spatial pattern recognition for similar remote sensing tasks. Despite this progress, there remains a lack of integrative approaches that combine multispectral data, stereoscopic imagery, and LiDAR-derived canopy metrics into a unified predictive framework. Bridging this gap is critical not only for enhancing estimation accuracy but also for informing broader applications such as carbon stock accounting, biodiversity assessment, and sustainable forest management. In this context, the present study proposes an integrated modeling strategy. A unified, multi-source approach that integrates structural signals derived from LiDAR-generated Canopy Height Models (CHM) with spectral metrics from Sentinel-2 multispectral imagery through advanced machine learning models is needed to improve estimation reliability and generalizability in forest ecosystems with complex physiographic and floristic characteristics [
31,
32,
33].
Volume estimation in the complex, mountainous, and multi-species forests of the Saihanwula Nature Reserve remains challenging, largely due to limitations in data sources and modeling approaches. To address these challenges, this study integrates forest resource plot data with multi-source remote sensing datasets—including Sentinel-2 spectral imagery, Resource-3 stereoscopic data, and airborne LiDAR—for improved volume estimation [
34,
35]. However, uncertainty quantification for volume estimation was not conducted herein, and future research is recommended to incorporate this for enhanced forest monitoring reliability. We analyze the contributions of spectral features, texture features, and canopy height to volume and use recursive feature elimination (RFE) with 10-fold cross-validation to identify the optimal variables, ranking features by impurity reduction from random forest models. We then construct and compare four models: random forest (RF), k-nearest neighbors (kNN), gradient boosting regression trees (GBRT), and multiple linear regression (MLR) to evaluate their effectiveness for volume estimation in the Saihanwula Nature Reserve. This comparative framework aims to support dynamic forest volume monitoring in protected areas through multi-source data fusion. Accurate volume estimation not only contributes to ecological research but also supports national forest inventory programs, carbon trading mechanisms, and the implementation of REDD+ strategies. This underscores the practical significance of advancing robust and scalable volume estimation methods, particularly for ecologically sensitive and structurally complex forest regions such as Inner Mongolia [
33,
36].
In summary, this research is not only crucial for improving the accuracy of volume estimation in the specific context of Saihanwula Nature Reserve but also has the potential to offer valuable references for similar forest areas with complex characteristics across the region. Moreover, the proposed methodology can serve as a transferable framework for other forested regions facing similar topographic and structural complexity, contributing to global efforts in ecosystem monitoring and climate resilience.
2. Materials and Methods
2.1. Study Area Overview
Saihanwula Nature Reserve is located in the middle and lower mountainous regions of the Southern Daxing’anling Mountains (43°59′–44°27′ N, 118°18′–118°55′ E), covering a total area of 1004.62 km² (
Figure 1). It spans across Suobojiga Town, Han Mountain Forest Farm, and Xingfu Village. The mountains in this region generally extend from northeast to southwest, with an average elevation of around 1000 m. The region has an average annual temperature of 2 °C and average annual precipitation of 400 mm, based on 1991–2020 climatological normals from the China Meteorological Data Service Center. The reserve’s forest vegetation is highly diverse, shaped by topographic variation and microclimatic differences. Vegetation distribution in the reserve is governed by interactions among slope aspect, elevation, soil moisture, and microclimate. Mid-elevation shaded slopes support larch, while sunny, lower-elevation slopes host birch. Higher shaded slopes feature
Populus davidiana and
Betula platyphylla, whereas sunny slopes harbor
Quercus mongolica and
Betula dahurica. These patterns, consistent with regional studies linking aspect and elevation to precipitation and growing season length, highlight the reserve’s distinct plant communities and ecological significance. Saihanwula is one of the few comprehensive mountain nature reserves in China, focused on protecting forest, grassland, and wetland ecosystems, biodiversity, and key water conservation areas at the source of the Xiliao River. Its ecological complexity (mixed-species forests, varied slope aspects, and dense, multi-layered canopies), combined with a well-distributed network of ground inventory plots, makes Saihanwula a data-rich reference site for testing advanced remote sensing-based volume estimation frameworks.
2.2. Ground Data Acquisition
Systematically collected forest inventory data served as the primary reference for model development and validation in this study. These plot-level measurements were obtained from a region-wide monitoring program designed to capture species composition, structural attributes, and temporal variability in forest conditions. This program offered several advantages, including extensive coverage, representation of various forest types, diverse survey subjects, standardized acquisition of key forest structural parameters, and strong temporal continuity. Following the principles of systematic sampling, plots were laid out on a 1 km × 1 km fishnet generated in ArcGIS with a systematic random start to avoid bias. Edge effects were mitigated by applying a 500 m buffer and excluding cells with less than 50% of their area inside the reserve, and the resulting layout was checked with spatial autocorrelation tests. In addition, forest stand characteristics were collected, such as the dominant tree species, average tree height, diameter at breast height (DBH), canopy cover, tree species origin, and age group. A total of 415 plots were surveyed, consisting of 187 plots of
Betula platyphylla, 83 plots of
Populus davidiana, and 145 plots of
Quercus mongolica, in 2023 (
Figure 1). To ensure a robust performance evaluation, model validation was conducted via spatially stratified 10-fold cross-validation, with stratification by dominant tree species to maintain inter-class balance. All 415 field plots were included in the cross-validation framework; no plots were reserved for independent validation due to sample size limitations. Prior to modeling, however, the dataset was split into training (80%) and testing (20%) subsets. Min-Max normalization was then applied to both subsets using scaling parameters derived solely from the training data, so that test-set statistics did not influence the scaling. Summary statistics of the field-measured forest volumes are presented in
Table 1.
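As an illustration of the splitting and normalization step described above, the following minimal sketch fits the Min-Max scaling parameters on the training subset only and then reuses them for the test subset. The function name `split_and_scale`, the random seed, and the synthetic predictor matrix are assumptions made for this example and are not taken from the study's code.

```python
import numpy as np

def split_and_scale(X, train_fraction=0.8, seed=0):
    """Shuffle rows, split them, and min-max scale using training statistics only."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[0])
    n_train = int(round(train_fraction * X.shape[0]))
    train, test = X[order[:n_train]], X[order[n_train:]]

    # Scaling parameters come from the training subset alone.
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant features

    # The same (lo, span) pair is applied to both subsets.
    return (train - lo) / span, (test - lo) / span

# Hypothetical data: 415 plots with 5 placeholder predictors.
X = np.random.default_rng(42).normal(size=(415, 5))
X_train, X_test = split_and_scale(X)
print(X_train.shape, X_test.shape)
```

With this scheme the scaled training features span exactly [0, 1], whereas scaled test values may fall slightly outside that range; this is expected when the test subset is kept out of the fitting step.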
2.3. Remote Sensing Data Acquisition
2.3.1. Acquisition and Processing of Sentinel-2 Data
Sentinel-2 imagery was used to obtain high-resolution multispectral information characterizing vegetation structure, condition, and spatial heterogeneity across the study area. Sentinel-2 data acquired during peak vegetation conditions in September 2022 were selected for optimal volume signal retrieval (
Table 2). Two scenes of L1C-level multispectral imagery were downloaded from the European Space Agency’s website (
https://dataspace.copernicus.eu/). Using the Sen2cor plugin, radiometric calibration and atmospheric correction were applied [
37,
38], generating L2A surface reflectance products. Based on previous studies, the three 60 m resolution bands (aerosol, water vapor, and SWIR cirrus) were removed. The remaining four 10 m bands and six 20 m bands were resampled to a common 10 m grid using bilinear interpolation in SNAP (version 8.0), ensuring spatial consistency. This resampling scheme balances spatial fidelity with spectral accuracy, as recommended for multi-resolution remote sensing data fusion. To address illumination effects in complex terrain, the 2 m LiDAR DTM was resampled to 10 m via bilinear interpolation for compatibility with Sentinel-2 while maintaining topographic accuracy. All datasets were integrated in the UTM Zone 50N projection. Terrain correction was performed using the SCS+C model, and spectral variables extracted from the corrected data were used for volume modeling [
39,
40,
41]. The SCS+C correction model, which combines slope and aspect effects with sensor geometry, was selected for its proven effectiveness in correcting topographic-induced illumination variability in mountainous forest environments.
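The study does not list its SCS+C implementation, so the sketch below uses the commonly published form of the correction, in which reflectance is scaled by (cos α · cos θz + C) / (cos i + C), with α the slope, θz the solar zenith angle, i the local solar incidence angle, and C the ratio of intercept to slope from a band-wise linear regression of reflectance on cos i. The function names, the sun angles, and the synthetic terrain are assumptions made purely for illustration.

```python
import numpy as np

def cos_incidence(slope, aspect, sun_zenith, sun_azimuth):
    """Cosine of the local solar incidence angle (all angles in radians)."""
    return (np.cos(sun_zenith) * np.cos(slope)
            + np.sin(sun_zenith) * np.sin(slope) * np.cos(sun_azimuth - aspect))

def scs_c_correction(refl, slope, aspect, sun_zenith, sun_azimuth):
    """Apply the SCS+C topographic correction to a single band."""
    cos_i = cos_incidence(slope, aspect, sun_zenith, sun_azimuth)
    # Empirical C = intercept / slope of the regression refl ~ cos_i.
    m, b = np.polyfit(cos_i.ravel(), refl.ravel(), 1)
    c = b / m
    return refl * (np.cos(slope) * np.cos(sun_zenith) + c) / (cos_i + c)

# Synthetic example: reflectance driven linearly by illumination.
rng = np.random.default_rng(0)
slope = np.deg2rad(rng.uniform(0, 35, size=(50, 50)))
aspect = np.deg2rad(rng.uniform(0, 360, size=(50, 50)))
sun_zenith, sun_azimuth = np.deg2rad(45.0), np.deg2rad(160.0)  # assumed sun position
observed = 0.10 + 0.25 * cos_incidence(slope, aspect, sun_zenith, sun_azimuth)

corrected = scs_c_correction(observed, slope, aspect, sun_zenith, sun_azimuth)
print(observed.std(), corrected.std())
```

In this synthetic case the spread of the corrected band is much smaller than that of the observed band, which is the intended effect of removing aspect-dependent illumination differences. In practice the regression for C is usually fitted per band and often per land-cover class.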
2.3.2. Canopy Height Acquisition
The ZY-3 (Resource-3) satellite, known for its efficient stereoscopic mapping and high-resolution coverage, is China’s first civil stereoscopic remote sensing satellite. In this study, ground control points and tie points were selected to generate stereoscopic images with a spatial resolution of 2.1 m, which were then used to calculate the digital surface model (DSM) for the Saihanwula forest area. First, the elevation information of the control points was obtained from airborne LiDAR-based DTM data. The LiDAR point clouds were collected using the LiAir220 lightweight LiDAR system, which incorporates a 40-channel Pandar40 laser sensor to acquire accurate 3D point cloud data. Data acquisition took place in September 2022, with a density of 3 points per square meter. Next, the DSM was precisely retrieved from the ZY-3 satellite by calculating the tie points and control points’ DTM. The regional network adjustment was used to construct the topological geometric relationship of matching image points in the 3D stereoscopic image. The point cloud model was reconstructed from the stereoscopic image pair. High-precision, low-noise front, and backward-facing images were used, and the DSM was derived after matching validation based on core-line images (Xie et al. 2019) [
27]. Finally, a 12.5 m spatial resolution open-source digital elevation model (DEM) was retrieved from NASA’s Earth Data Portal (
https://search.asf.alaska.edu/) (accessed on 20 September 2022). The Canopy Height Model (CHM) was generated by subtracting the digital elevation model (DEM) from the digital surface model (DSM). A 3 × 3 pixel median filter was then applied to reduce noise while preserving canopy structure, and a continuous surface was created using inverse distance weighting (IDW) interpolation with a power parameter of 2 and a 3 m search radius to balance spatial influences. After filtering and replacing anomalous pixels, the final CHM for the study area was obtained and resampled to a 30 m resolution using bicubic interpolation, a method that preserves height gradient continuity while reducing spatial resolution, to ensure compatibility with field plot data and minimize spatial scale mismatch. The CHM served as a key structural variable in the subsequent volume estimation models. The CHM, which captured vertical complexity at fine spatial scales, was subsequently combined with spectral and textural metrics to enhance the model’s responsiveness to volume variation across forest types. The integration of stereo-derived CHM with spectral indices aimed to mitigate limitations inherent in single-source datasets, especially in structurally complex forest regions. A summary of all remote sensing and ground survey datasets is provided in
Table 2.
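The core of the CHM derivation, differencing the DSM and DEM and then applying a 3 × 3 median filter, can be sketched as follows. This is a simplified illustration that assumes both rasters have already been co-registered on the same grid; clipping negative heights to zero is an assumption of this sketch, and the IDW interpolation and bicubic resampling steps are omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def canopy_height_model(dsm, dem, window=3):
    """CHM = DSM - DEM, clipped at zero and smoothed with a median filter."""
    chm = np.clip(dsm - dem, 0.0, None)       # assumption: negative heights are ground
    pad = window // 2
    padded = np.pad(chm, pad, mode="edge")    # replicate edges so the shape is kept
    windows = sliding_window_view(padded, (window, window))
    return np.median(windows, axis=(-2, -1))  # median over each window

# Toy rasters: flat terrain at 1000 m, a 10 m canopy, and one spurious spike.
dem = np.full((5, 5), 1000.0)
dsm = dem + 10.0
dsm[2, 2] = 1060.0
chm = canopy_height_model(dsm, dem)
print(chm)
```

The isolated 60 m spike is removed by the median filter while the surrounding 10 m canopy surface is left unchanged, which is why a median rather than a mean filter is a natural choice for suppressing matching errors in stereo-derived surfaces.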
Together, these multi-source datasets—combining optical, structural, and in situ field observations—provided a robust, multidimensional basis for advanced feature engineering, hybrid model development, and cross-scale validation in volume estimation workflows.
2.4. Feature Extraction and Selection
To enhance model robustness and minimize overfitting risks arising from multicollinearity and high dimensionality, this study extracted a comprehensive set of spectral, textural, and structural predictors, followed by rigorous feature selection to mitigate redundancy and enhance model reliability. Spectral and texture features relevant to forest stock volume inversion were extracted from Sentinel-2 data with a 10 m spatial resolution. The central pixel of each 3 × 3 pixel window overlapping the plot center was extracted, with a 15 m buffer around the plot boundary applied to minimize edge effects. This approach reduces mixed-pixel effects and improves the spatial fidelity of feature-to-plot correspondence. The extracted remote sensing factors are as follows:
- (1)
Original Single-Band Factors:
Includes blue (B2), green (B3), red (B4), red-edge (B5, B6, B7), near-infrared (NIR) (B8, B8A), and shortwave infrared (SWIR) (B11, B12).
- (2)
Texture Feature Factors:
Includes mean (ME), variance (Var), homogeneity (Hom), contrast (Con), correlation (Cor), dissimilarity (Dis), angular second moment (ASM), and entropy (Ent), which were extracted using a 7 × 7 window (70 × 70 m) with the gray level co-occurrence matrix method. The analysis considered four primary directions to capture anisotropic canopy patterns, and gray-level quantization was set to 16 levels to balance computational efficiency and textural detail. Each original single-band factor corresponds to eight second-order texture metrics, capturing spatial arrangement and radiometric contrast within canopy patterns—attributes closely linked to forest stand density, structural complexity, and species diversity.
- (3)
Vegetation Index Factors:
Includes chlorophyll index (CI), red-edge chlorophyll index (CIRE), difference vegetation index (DVI), enhanced vegetation index (EVI), red-edge enhanced vegetation index (EVIRE1, EVIRE2, EVIRE3), narrow near-infrared enhanced vegetation index (EVINNIR), green normalized difference vegetation index (GNDVI), inverted red-edge chlorophyll index (IRECI), modified soil-adjusted vegetation index (MSAVI), morphological shadow index (MSI), normalized difference vegetation index (NDVI, NDVI705), plant senescence reflectance index (PSRI), renormalized difference vegetation index (RDVI), soil-adjusted vegetation index (SAVI), and vegetation stress index (STVI). These indices capture different aspects of canopy vigor, chlorophyll content, water stress, and senescence, which are highly relevant for volume variation.
- (4)
Canopy Height Model (CHM):
Includes the maximum value, mean, standard deviation, variance, and various height percentiles (5%, 20%, …, 90%) within the plot area, used as modeling variables. These metrics characterize vertical heterogeneity and canopy complexity, which are well-established predictors of aboveground volume, particularly in heterogeneous, multi-strata, and species-diverse forest ecosystems.
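To make two of the predictor groups listed above concrete, the sketch below computes NDVI from NIR and red reflectance and summarizes the CHM pixels falling inside a plot. The function names are hypothetical, the use of sample (ddof = 1) statistics is an assumption, and the percentile list passed here is only illustrative because the full set of percentiles used in the study is not spelled out.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def chm_plot_metrics(heights, percentiles=(5, 20, 50, 90)):
    """Summary statistics of CHM pixel heights within one plot."""
    heights = np.asarray(heights, dtype=float)
    metrics = {
        "max": heights.max(),
        "mean": heights.mean(),
        "std": heights.std(ddof=1),   # sample standard deviation (assumed)
        "var": heights.var(ddof=1),
    }
    for p in percentiles:             # illustrative percentile set
        metrics[f"p{p}"] = np.percentile(heights, p)
    return metrics

# Placeholder values for a single plot.
plot_ndvi = ndvi(nir=0.5, red=0.1)
plot_chm = chm_plot_metrics([2.0, 4.0, 6.0, 8.0, 10.0])
print(round(plot_ndvi, 3), plot_chm)
```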
In total, 53 candidate features were extracted, including 10 original spectral bands, 8 GLCM texture metrics, 20 vegetation indices, and 15 structural descriptors derived from CHM. Given the substantial redundancy among the features extracted from Sentinel-2 and ZY-3, using all variables directly would degrade model performance and computational efficiency due to the curse of dimensionality. Therefore, feature selection was performed before modeling to improve training accuracy and interpretability. The selection process was designed around the requirements of the machine learning models, so that only the most informative variables were retained for the downstream learning algorithms. Recursive feature elimination (RFE) is an iterative wrapper-based technique that ranks features by their predictive contribution and progressively removes the least informative ones. In this study, a spatially stratified 10-fold cross-validation was used during RFE, with folds created via k-means clustering of plot coordinates to ensure geographic contiguity. This approach reduced the influence of spatial autocorrelation between training and validation folds while preserving regional variability in forest types and topographic gradients. RFE has proven to be highly effective in machine learning and was employed in this study to optimize remote sensing variables, enhancing model performance and efficiency [
41,
42]. The RFE methodology was introduced by Guyon, Weston, and colleagues in 2002. It follows a wrapper approach combined with cross-validation, iteratively removing features that do not contribute to predictive accuracy or that hinder model performance until the best feature set is found [
43,
44]. This process not only simplifies model complexity but also enhances computational scalability for large-scale volume applications. The refined feature subset obtained through RFE was subsequently used as input to train and evaluate multiple machine learning models, enhancing computational efficiency, model interpretability, and generalization capability across structurally diverse forest environments and unseen spatial domains.
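The selection procedure described in this section can be approximated with scikit-learn as sketched below: plot coordinates are clustered with k-means, the clusters define the cross-validation folds, and RFE with cross-validation is driven by random forest importances. The synthetic data, the number of clusters, the forest size, and the scoring metric are all assumptions chosen to keep the example small; they are not the study's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GroupKFold

# Synthetic stand-in data: only the first two features carry signal.
rng = np.random.default_rng(0)
n_plots, n_features = 200, 8
X = rng.normal(size=(n_plots, n_features))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n_plots)
coords = rng.uniform(0, 10_000, size=(n_plots, 2))  # hypothetical plot coordinates (m)

# Spatial blocks from k-means on coordinates; each block is held out once.
blocks = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(coords)
splits = list(GroupKFold(n_splits=10).split(X, y, groups=blocks))

# Recursive feature elimination, scored on the spatial folds.
selector = RFECV(
    estimator=RandomForestRegressor(n_estimators=25, random_state=0),
    step=1,
    cv=splits,
    scoring="neg_root_mean_squared_error",
)
selector.fit(X, y)
print(selector.n_features_, np.flatnonzero(selector.support_))
```

Holding out whole spatial clusters, rather than random plots, gives a more conservative estimate of how a feature subset generalizes to unsampled parts of the reserve.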
2.5. Forest Stock Volume Model Construction and Evaluation
In forestry research, the integration of multi-source remote sensing and machine learning techniques for estimating forest stock volume has proven to be reliable. In this study, four algorithms were employed: random forest (RF), k-nearest neighbors (kNN), gradient boosting regression trees (GBRT), and multiple linear regression (MLR). Each was optimized via Bayesian hyperparameter optimization with 50 iterations and 5-fold cross-validation. RF, an ensemble method that constructs multiple decision trees and averages their predictions, was chosen for its high accuracy, adaptability to large datasets, and resistance to overfitting; it is widely used in forestry for variable prediction. kNN, a non-parametric supervised algorithm, bases its predictions on the similarity of nearby data points. It is valued for its straightforward implementation and reliable accuracy and has previously been applied in forestry to species classification and timber volume estimation. GBRT, which builds an ensemble of weak decision-tree models, was selected for its capacity to capture complex nonlinear relationships and has been successfully applied in forestry for accurate parameter estimation. MLR, a foundational statistical method for modeling linear relationships between variables, was chosen as a baseline for comparison thanks to its simplicity and interpretability. It has also been widely used in forestry for volume estimation, such as in modeling forest stock volume in secondary forests of the Saihanwula region. These algorithms were not only selected based on prior forestry applications but were also systematically optimized using cross-validated hyperparameter tuning strategies (Bayesian and grid search), reinforcing their methodological integration into the modeling framework.
Before the training process, hyperparameters like the number of estimators (n_estimators), learning rate (learning_rate), subsample size (subsample), maximum depth (max_depth), and minimum samples per leaf (min_samples_leaf) were fine-tuned [
23,
45]. Ten-fold cross-validation was used: the 415-sample dataset was partitioned into ten folds, and each fold (41 or 42 samples) served in turn as the validation set while the remaining nine folds served as the training set. This approach provides a thorough evaluation, reduces random bias, and produces stable results. The evaluation metrics adopted in this research, computed from the cross-validation results, are the coefficient of determination (R²), root mean square error (RMSE), and relative root mean square error (rRMSE) [
46]. All the models were implemented using the scikit-learn library in Python version 3.12. To ensure a robust performance evaluation, model performance was evaluated via the spatially stratified 10-fold cross-validation, with stratification by dominant tree species to maintain inter-class balance. Prior to modeling, to ensure a robust performance evaluation, model performance was evaluated via the spatially stratified 10-fold cross-validation, with stratification by dominant tree species to maintain inter-class balance. Prior to modeling, the dataset was first split into training (80%) and testing (20%) subsets. Min-Max normalization was then applied independently to each subset, with scaling parameters derived solely from the training data. This approach ensures test set statistics do not influence the normalization process. For hyperparameter optimization, a grid search with 5-fold cross-validation was carried out separately for each model. Specifically, the GBRT model was tuned for the number of estimators (n = 100–500), learning rate (0.01–0.1), and maximum tree depth (3–8), while the RF model was optimized for the number of trees, maximum features, and minimum samples per split. Hyperparameters were optimized using Bayesian search with 50 iterations and 10-fold cross-validation. The gradient boosting regression trees (GBRT) achieved optimal performance with n_estimators = 110, learning_rate = 0.05, max_depth = 9, subsample = 0.5, and min_samples_leaf = 0.6. For the random forest (RF) model, n_estimators = 200, max_depth = 11, and min_samples_split = 4 were selected. The k-nearest neighbors (kNN) model used n_neighbors = 7 with distance-weighted predictions. These configurations were chosen to optimize model complexity and generalization across diverse forest types. Ultimately, the performance of all models was quantitatively assessed using the coefficient of determination (R
2) and root mean square error (RMSE) on the independent test dataset.
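The leakage-free normalization step described above can be sketched as follows. This is a minimal illustration in plain Python, not the scikit-learn code used in the study; the helper names (`split_rows`, `fit_min_max`, `apply_min_max`) and the feature values are hypothetical.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split the rows into training and testing sets."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


def fit_min_max(train_rows):
    """Learn the per-feature minimum and range from the training rows only."""
    columns = list(zip(*train_rows))
    mins = [min(c) for c in columns]
    # A constant feature gets range 1.0 so the division below stays safe.
    ranges = [(max(c) - min(c)) or 1.0 for c in columns]
    return mins, ranges


def apply_min_max(rows, mins, ranges):
    """Scale rows with previously fitted parameters (nothing is refitted)."""
    return [[(v - lo) / rg for v, lo, rg in zip(row, mins, ranges)]
            for row in rows]


# Hypothetical feature rows: [canopy height (m), red-edge index]
features = [[2.0, 0.10], [6.0, 0.30], [10.0, 0.50], [14.0, 0.70], [18.0, 0.90]]
train, test = split_rows(features, test_fraction=0.2, seed=42)
mins, ranges = fit_min_max(train)          # parameters come from training data
train_scaled = apply_min_max(train, mins, ranges)
test_scaled = apply_min_max(test, mins, ranges)  # reuses training parameters
```

Because the test rows are scaled with the training minimum and range, a test value can fall slightly outside [0, 1]; that is expected and is the price of keeping test statistics out of the fit.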
The machine learning workflow, encompassing feature scaling, recursive selection, model tuning, and stratified validation, constitutes a robust pipeline aimed at capturing nonlinear interactions between canopy structure and spectral indicators. This pipeline ensures reliable prediction performance even in ecologically heterogeneous forest settings such as Saihanwula.
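As an illustration of the stratified validation step, the sketch below assigns plots to folds so that each dominant species is spread evenly across them. It covers only the species stratification; the spatial component of the study's scheme is not reproduced here. The function name `stratified_folds` and the species labels are hypothetical.

```python
from collections import defaultdict
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, balancing every label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)
        # Deal members round-robin; the running offset keeps fold sizes even
        # when a class size is not a multiple of k.
        for position, index in enumerate(members):
            folds[(offset + position) % k].append(index)
        offset += len(members)
    return folds


# Hypothetical dominant-species labels for 40 plots
labels = ["larch"] * 20 + ["birch"] * 10 + ["oak"] * 10
folds = stratified_folds(labels, k=10, seed=1)

for fold_id, validation_idx in enumerate(folds):
    held_out = set(validation_idx)
    training_idx = [i for i in range(len(labels)) if i not in held_out]
    # A model would be fitted on training_idx and scored on validation_idx.
```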
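The evaluation metrics are defined as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{rRMSE} = \frac{\mathrm{RMSE}}{\bar{y}} \times 100\%$$

where $y_i$ represents the measured stock volume, $\hat{y}_i$ is the estimated stock volume, $\bar{y}$ is the mean of the observed stock volume of the stand, and $N$ is the actual sample size.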
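The three metrics (R², RMSE, and rRMSE) can be computed as in the sketch below. It uses the standard definitions and assumes that rRMSE is the RMSE divided by the mean observed volume, expressed as a percentage; the plot values are made up for illustration.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    squared = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def rrmse(observed, predicted):
    """RMSE relative to the mean observed value, in percent (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    return rmse(observed, predicted) / mean_obs * 100.0


# Hypothetical measured and estimated stock volumes (m³/hm²)
observed = [60.0, 80.0, 100.0, 120.0]
predicted = [65.0, 75.0, 105.0, 115.0]

print(f"R2    = {r_squared(observed, predicted):.2f}")   # 0.95
print(f"RMSE  = {rmse(observed, predicted):.2f}")        # 5.00
print(f"rRMSE = {rrmse(observed, predicted):.2f}%")      # 5.56%
```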
4. Discussion
Feature factor selection is crucial for forest volume estimation. Previous studies have mostly relied on satellite spectral data and texture features for model construction [48]. However, the saturation of spectral data severely limits the accuracy of remote sensing-based volume estimation. To explore the potential of multi-source data collaboration in improving estimation performance, this study extracted high-precision Canopy Height Model (CHM) data (RMSE = 4.10 m) by differencing open-source digital elevation models (DEM) and ZY-3 digital surface models (DSM), thus expanding the feature selection range and enriching data dimensions while mitigating the risk of overfitting. Previous research has shown that CHM plays a structurally complementary role in overcoming spectral saturation, particularly when combined with red-edge indices, and holds great potential for improving forest volume estimation accuracy. For instance, Zhang et al. used Landsat 8 OLI and ZY-3 data to estimate Eucalyptus forest volume and confirmed that CHM helps alleviate optical data saturation [49,50]. Similarly, researchers successfully extracted high-precision information from ZY-3 data by differentiating between growing and dormant seasons of Larix forest plantations [51]. These results align with the findings of this study, confirming the strong correlation between CHM and forest volume. Moreover, the scatter plots of the model demonstrated that combining CHM with spectral and texture data substantially improved volume estimation accuracy (R² = 0.75). In this study, the CHM extracted from ZY-3 stereo pairs demonstrated a strong positive correlation with observed AGB, particularly when combined with red-edge and texture indices. However, a closer inspection of residuals revealed that the model tended to underestimate volume in high-density plots. This behavior indicates a potential limitation of the CHM in capturing canopy complexity in multi-strata forests. Given that the RMSE of the CHM was approximately 4.10 m, errors may propagate in areas where tree height gradients are subtle but volume accumulation is substantial, especially in mature stands. Comparable saturation effects have been reported in high-volume forests where structural height no longer scales with volume, typically observed at volumes >180 m³/ha [52]. This aligns with the upper quartile of our study’s volume distribution (110–150 m³/hm²), where wood density variations decouple height and volume relationships in mature stands. Our findings support this interpretation, as the model’s underestimation was more pronounced in the upper quartile of plot AGB. These observations emphasize the need for structural inputs beyond single-layer canopy height in regions with old-growth characteristics. While similar patterns were reported by researchers in subtropical forests, our results highlight that even mid-resolution CHMs such as those derived from ZY-3 can reach saturation thresholds that must be accounted for in operational volume mapping [14,53].
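The CHM derivation discussed above amounts to a cell-by-cell difference between a surface model and a terrain model. The sketch below shows that differencing on small nested lists; the elevation values are invented, and clamping negative differences to zero is an assumption made here for illustration, not a step stated in the study.

```python
def canopy_height_model(dsm, dem, nodata=None):
    """Return CHM = DSM - DEM per cell for two grids of equal shape.

    Negative differences are clamped to 0.0 (an illustrative assumption),
    and cells flagged as nodata in either grid stay nodata.
    """
    chm = []
    for dsm_row, dem_row in zip(dsm, dem):
        row = []
        for surface, ground in zip(dsm_row, dem_row):
            if nodata is not None and (surface == nodata or ground == nodata):
                row.append(nodata)
            else:
                row.append(max(surface - ground, 0.0))
        chm.append(row)
    return chm


# Hypothetical co-registered 2 x 2 elevation grids (metres)
dsm = [[1012.5, 1020.0],
       [1005.0, 998.0]]
dem = [[1000.0, 1004.0],
       [1005.5, 998.0]]

chm = canopy_height_model(dsm, dem)
print(chm)  # [[12.5, 16.0], [0.0, 0.0]]
```

In practice the two grids must first be resampled to a common resolution and co-registered, since any misalignment shows up directly as height error.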
In this study, prior to RFE, multicollinearity was assessed using the variance inflation factor (VIF). This preprocessing step ensured model stability by mitigating collinear effects. RFE was then applied to the refined feature set to select optimal predictors, with spectral indices dominating importance. While CHM features showed modest single-feature importance, their combined contribution and the ablation test results confirm their indispensable role in mitigating spectral saturation in high-biomass stands. This aligns with the theory that spectral indices capture physiological signals while CHM provides structural context, a synergy that only multi-source fusion enables. Although CHM-related features ranked only moderately in the RFE-derived individual importance scores, their combined effect, especially when fused with red-edge vegetation indices, contributed significantly to the model’s predictive performance. This observation is supported by the ablation results, where the exclusion of CHM features led to a notable reduction in R² and an increase in RMSE. Hence, although CHM may not emerge as the top-ranked single variable, it offers crucial complementary structural information that enhances volume prediction in complex canopy conditions. This nuanced role underscores the importance of interpreting feature contributions in a multidimensional context rather than relying solely on isolated rankings.
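A VIF screen of the kind mentioned above can be sketched in plain Python. The sketch relies on the identity that the VIF of feature j equals the j-th diagonal element of the inverse correlation matrix. The feature values are invented, and the study does not state which VIF cut-off it applied, so the commonly quoted threshold of 10 appears here only as an assumption.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length feature columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical features: the first two are nearly collinear, the third is not.
index_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
index_b = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
texture = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

vifs = variance_inflation_factors([index_a, index_b, texture])
VIF_THRESHOLD = 10.0  # assumed rule-of-thumb cut-off, not from the study
flagged = [name for name, v in zip(["index_a", "index_b", "texture"], vifs)
           if v > VIF_THRESHOLD]
```

A common follow-up is to drop one feature from each flagged group and recompute the VIFs before running RFE on what remains.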
This finding aligns with [54], who demonstrated that optimized feature selection techniques, including VIF-Lasso and Boruta, significantly boosted AGB prediction accuracy from multi-source data (R² = 0.75, RMSE = 16.5 m³/hm²), reinforcing the critical role of variable screening in model robustness. The analysis of feature factor interpretability showed that the inverted red-edge chlorophyll index (IRECI) and the enhanced vegetation index in the red-edge region (EVIRE3) from Sentinel-2 imagery had high explanatory power for volume, indicating that red-edge band vegetation indices could influence estimation accuracy. This is consistent with the characteristic sensitivity of optical imagery’s red-edge bands to vegetation [55]. Additionally, high-resolution optical imagery texture features also contributed positively to forest volume estimation, partially mitigating spectral data saturation [56]. However, in this study, texture features had a relatively low correlation with volume and did not effectively improve model interpretability. This may be due to a mismatch between the sample plot size and pixel resolution, leading to the influence of mixed pixels on volume estimation accuracy. Moreover, the complex terrain in mountainous areas makes it difficult to capture forest structural information, and there are spectral and textural differences between tree species that affect remote sensing performance. Future research should focus on extracting high-resolution texture information and improving scale matching techniques to enhance forest volume estimation [17,57,58,59,60].
Although several texture metrics were included in the modeling process, their overall contribution to volume prediction remained limited in this study. This finding, supported by the recursive feature elimination analysis, points to low correlation and interpretability of texture features across the full range of volume plots. A probable explanation lies in the resolution mismatch between texture derivation and ground reference scale: Sentinel-2 imagery, from which most texture features were extracted, has a spatial resolution of 10–20 m, whereas field plots were substantially smaller. This mismatch likely led to pixel mixing effects, especially in edge-dominated plots or areas with high species diversity. Moreover, complex topography and sun–canopy geometry in the Saihanwula Reserve may have introduced directional artifacts into texture layers, further reducing their reliability. As texture features are known to be sensitive to directional lighting and shadowing, their effective use in heterogeneous forest landscapes may require image preprocessing steps such as topographic normalization or directional filtering, which were beyond the scope of this study. Future research should explore these enhancements to unlock the full potential of texture-based structure proxies.
From a modeling perspective, the results of this study showed that models constructed using the gradient boosting regression tree (GBRT) algorithm outperformed models built using multiple linear regression (MLR), k-nearest neighbors (KNN), and random forests (RF) in terms of estimation accuracy. The testing set accuracy (R² = 0.86) aligns with similar multi-source fusion studies (R² = 0.82–0.89), but this similarity reflects shared methods (optical-LiDAR fusion, ensemble models) and limitations (spectral saturation, CHM errors in rugged terrain). However, when the GBRT algorithm is used for volume estimation based on CHM data, volume is underestimated in high-volume areas. This phenomenon may be attributed to two reasons. First, high-volume forests often have multi-layer canopy structures, and ZY-3's CHM data may struggle to accurately distinguish different canopy layers, particularly where trees are densely distributed. The registration errors in such areas may lead to a higher degree of underestimation [61]. Second, the model may not sufficiently account for tree species diversity during volume estimation, and dense vegetation can complicate signal reflection, masking the true forest structure and reducing estimation accuracy [62]. Comparable results have been reported by [63], who reviewed remote sensing-based AGB models and emphasized that ensemble methods consistently outperformed parametric models, particularly when combined with well-curated remote sensing variables. Although this study did not explicitly stratify models by forest type, the variation in residual patterns across plots suggests that structural heterogeneity and species composition influence estimation performance. In particular, plots dominated by broadleaf species with irregular canopy layering showed more dispersed residuals, even in moderate volume ranges (e.g., 50–100 Mg/ha), compared to relatively uniform coniferous stands. These results, while not conclusive, imply that allometric diversity and inter-species variation in spectral response might limit the ability of a single model to generalize across complex landscapes. Previous studies in boreal and temperate forests [64,65] have identified similar trends, where species-specific models yielded lower error rates. In future work, our model could be further improved by including species indicator variables or developing forest-type-specific versions of GBRT, particularly for regions like Saihanwula, where ecological composition is diverse [60].
Overall, the GBRT model demonstrates good robustness and stability, significantly improving the accuracy of large-scale forest volume estimations. Future research should further explore parameter adjustments and feature engineering in the GBRT model to optimize its performance and better meet practical application needs [66]. The results also demonstrated that the fusion of CHM and spectral–textural features significantly enhanced model performance relative to single-source datasets. Compared to CHM-only (ZY-3) and spectral-only (Sentinel-2) configurations, the fusion dataset yielded the highest R² (0.75) and lowest RMSE (8.16 m³/hm²), with a reduction in estimation error of up to 13.36%. This improvement reflects the complementary value of structural and spectral information: while canopy height metrics capture forest vertical complexity, spectral indices, especially those from the red-edge region, offer physiological and biochemical insights. Their integration allows the model to better resolve volume variability across different forest stand types and structural gradients, particularly in heterogeneous environments like the Saihanwula Reserve. Similar improvements in model accuracy through the integration of CHM and spectral features were also reported by [67,68], who found that combining ZY-3 stereo data with Sentinel-2 and DEM inputs significantly enhanced AGB estimation in coniferous forests of northern China (R² = 0.77, RMSE = 13.2 Mg/ha), supporting the effectiveness of data fusion observed in our study.
Beyond accuracy metrics, the spatial distribution of predicted forest stock volume further supports the superiority of the fusion model. While Sentinel-2 estimates were mainly clustered in the 50–90 m³/hm² range and ZY-3 predictions were more variable but skewed, the fusion-based outputs showed a more uniform and realistic spread across the 50–110 m³/hm² range. This pattern better aligns with expected ground conditions and field plot distributions, especially in the high-volume southeastern zone of the reserve. Such improvements in spatial consistency are essential for practical applications like stand-level forest management, sustainable yield planning, and spatially explicit carbon accounting.
An often-overlooked source of error in volume modeling arises from scale mismatches and geolocation discrepancies among multi-source datasets. In this study, CHM was constructed from ZY-3 stereo imagery, spectral features were derived from Sentinel-2, and LiDAR plots provided field-based references. Despite careful co-registration, inherent differences in spatial resolution and temporal acquisition can introduce localized inconsistencies, particularly along terrain transitions or forest–nonforest boundaries [5,50,69,70]. For example, Sentinel-2's coarser resolution may obscure fine-scale heterogeneity captured by ZY-3, leading to spectral homogenization in mixed-pixel zones. In turn, this can propagate through the model, introducing residual errors unrelated to the predictive features themselves. Additionally, the time lag between data acquisitions may exacerbate differences in vegetation phenology, which affects both spectral indices and CHM measurements. While these effects are partially mitigated through plot buffering and normalization techniques, their influence remains non-negligible, especially in rugged, seasonally dynamic environments like Saihanwula.
Although this study demonstrates the efficacy of multi-source fusion for mountain forest volume estimation, operational deployment requires external validation across ecoregions, uncertainty quantification with confidence intervals for forest volume reporting, and multi-decadal temporal analysis using satellite archives to resolve disturbance impacts [71,72,73]. These advances would help bridge scientific innovation with policy-ready forest management. Incorporating co-temporal, high-resolution datasets or applying cross-sensor fusion techniques may help reduce the cross-scale distortions noted above in future studies [74,75,76].
Beyond model selection, this study underscores the importance of targeted feature engineering. The superior performance of GBRT in this context was driven not only by its ensemble structure but also by the integration of well-curated variables, particularly CHM, IRECI, EVIRE3, and selected texture bands. Nonetheless, residual dispersion in high-volume plots suggests that further tuning of model depth and learning rate, or the integration of temporal features, may enhance robustness. More broadly, these findings suggest that even with minimal ground data, multi-source remote sensing, if properly harmonized, can deliver volume estimates comparable to those reported in UAV-LiDAR-supported studies [77,78,79,80]. This scalability offers tangible value for regional forest inventory systems and supports emerging needs in carbon monitoring and REDD+ implementation, especially in montane and semi-humid forest zones. These challenges, while technical, have important implications for the transferability and operational use of the model in regional forest monitoring systems [81,82,83].
5. Conclusions
This study on forest volume estimation via multi-source data fusion enhances the understanding of forest resources and offers practical value for management and policy. Integrating CHM with optical data helps overcome the saturation of optical imagery and improves estimation accuracy, which is crucial in complex terrain with diverse species. However, model accuracy is subject to data scale mismatches and the limitations of single-temporal data, and future work should include rigorous external validation using independent datasets. The approach supports sustainable logging quotas and conservation strategies. While showcasing the potential of multi-source fusion, the study acknowledges these limitations and urges future research to prioritize scale matching, temporal dynamics, and mechanistic modeling for improved biomass estimation.
We extracted 45 features from the multi-source fusion data, which were refined using recursive feature elimination (RFE) to a set of 14 key features for volume estimation. Among these, CHM data emerged as a critical variable, reflecting the forest canopy’s vertical structure and confirming its importance in accurately estimating volume. The multi-source fusion outperformed both Sentinel-2 and ZY-3 datasets, achieving R² values ranging from 0.51 to 0.75 and reducing RMSE values to between 6.90 and 9.29 m³/hm². These improvements highlight the synergistic value of integrating canopy height information with spectral data in a machine learning framework, where structural features address the limitations of spectral saturation, particularly in areas with challenging terrain. The enhanced accuracy of stand volume estimation provides critical inputs for operational forestry planning, enabling data-driven decisions on thinning intensity and harvest scheduling. This study provides more reliable carbon stock assessments, offering valuable data for carbon trading, forest conservation policies, and greenhouse gas inventory reporting. Moreover, these findings provide not only technical validation but also practical relevance for national forest inventories and international climate commitments. Accurate and scalable volume maps can provide a regional template for the monitoring of REDD+ in temperate forest–steppe regions. The approach is applicable to the global MRV framework but needs to be further verified in other forest biomes as part of the future research agenda. The approach demonstrated here thus contributes toward aligning remote sensing-based volume assessment with global forest carbon accounting standards and supports the forestry sector’s role in climate change mitigation efforts.
The gradient boosting regression tree (GBRT) algorithm outperformed other machine learning models, such as k-nearest neighbors (KNN), multiple linear regression (MLR), and random forest (RF), by effectively handling nonlinear relationships and integrating multi-source data. This underscores the potential of machine learning techniques in forest parameter prediction and encourages their broader use in forest monitoring and management systems.
In summary, this study bridges the gap between scientific research and practical forest management by providing methodologies that can be integrated into policy frameworks to promote sustainable forest management and environmental sustainability. Future research should focus on incorporating additional remote sensing technologies, such as LiDAR and SAR, to further refine volume estimation models. Furthermore, applying deep learning techniques like convolutional neural networks (CNNs) could significantly enhance the accuracy of volume predictions, addressing challenges such as underestimating volume in high-density forests and improving overall model performance.
Future improvements should also consider structural refinements of the CHM itself—e.g., by generating seasonal CHMs or using UAV-based height calibration to minimize underestimation in high-volume zones. Incorporating phenological variation into input features could further address temporal mismatches and improve sensitivity in regions with deciduous or mixed-species canopies.
Ultimately, the methodological framework, built on high-resolution remote sensing and scalable model architectures, provides a proof of concept for operational monitoring platforms in temperate forest–steppe zones, with cross-regional validation required to confirm broader applicability. The volume estimation methodology provides data support for forest resource monitoring under dynamic conditions, with integration into eco-economic modeling identified as a critical step toward adaptive management.