Accuracy Assessment of Remote Sensing Forest Height Retrieval for Sustainable Forest Management: A Case Study of Shangri-La

Xu, Haoxiang; Zuo, Xiaoqing; Li, Yongfa; Yang, Xu; Zhang, Yuran; Li, Yunchuan

doi:10.3390/su172210067


by Haoxiang Xu ¹, Xiaoqing Zuo ¹, Yongfa Li ¹,*, Xu Yang ²,*, Yuran Zhang ¹ and Yunchuan Li ¹

¹ Faculty of Land Resources Engineering, Kunming University of Science and Technology, Kunming 650093, China

² School of Architecture and Civil Engineering, Kunming University, Kunming 650500, China

* Authors to whom correspondence should be addressed.

Sustainability 2025, 17(22), 10067; https://doi.org/10.3390/su172210067

Submission received: 23 September 2025 / Revised: 6 November 2025 / Accepted: 7 November 2025 / Published: 11 November 2025

(This article belongs to the Section Sustainable Forestry)


Abstract

Forest height is a critical parameter for understanding ecosystem functions, assessing carbon stocks, and supporting sustainable forest management. Its accurate measurement is essential for climate change mitigation and for understanding the global carbon cycle. Traditional methods such as field surveys and airborne LiDAR provide accurate measurements, but their high costs and limited spatial coverage make them impractical for the large-scale, dynamic monitoring that effective sustainability initiatives require. This research presents a multi-source remote sensing fusion approach to address this problem. For regional forest height inversion, it integrates Sentinel-1 SAR, Sentinel-2 multispectral imagery, ICESat-2 lidar, and SRTM DEM data. Three data combinations were built, with Shangri-La Second-class Category Resource Survey data as ground truth: Sentinel-1 + ICESat-2 + SRTM, Sentinel-2 + ICESat-2 + SRTM, and Sentinel-1 + Sentinel-2 + ICESat-2 + SRTM. Accuracy was assessed with three machine learning models: Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), and Random Forest (RF). The best configuration, the LightGBM model with Sentinel-1, Sentinel-2, ICESat-2, and SRTM, yields a correlation coefficient ($r$) of 0.72, an $RMSE$ of 5.52 m, and an $MAE$ of 4.08 m. With the same data combination, the XGBoost model obtained $r$ = 0.716, $RMSE$ = 5.55 m, and $MAE$ = 4.10 m, and the Random Forest model produced $r$ = 0.706, $RMSE$ = 5.63 m, and $MAE$ = 4.16 m. Comparisons across data combinations show that the full multi-source fusion produced the best results, and that including either Sentinel-1 or Sentinel-2 enhances model performance. This work presents an efficient technological strategy for monitoring forest height in complex terrain, providing a scalable and robust methodological reference for sustainable forest management and large-scale ecological assessment. The proposed multi-source spatiotemporal fusion framework, coupled with systematic model evaluation, demonstrates significant potential for operational applications, especially in regions with limited LiDAR coverage.

Keywords:

forest height retrieval; multi-source data fusion; ICESat-2; Sentinel-1; Sentinel-2; precision monitoring; sustainable forest management

1. Introduction

Forest height is a key parameter characterizing forest vertical structure and ecosystem function, playing a vital role in sustainable forest management, carbon stock estimation, stand structure analysis, and global carbon cycle research. Precise measurement and monitoring of forest height provide essential support for evaluating forest ecosystem services, advancing sustainable development, and addressing climate change [1,2,3,4]. Traditional forest height measurement primarily relies on ground plot surveys, which, despite high accuracy, suffer from low efficiency, high costs, and limited coverage, making them unsuitable for large-scale, high-timeliness monitoring requirements [5]. Therefore, achieving accurate large-scale forest height inversion using remote sensing technology has become a major research focus in forestry remote sensing [6,7].

Lidar (Light Detection and Ranging) measures the canopy directly and reveals its vertical structure, which makes it well suited to accurate forest height estimation [8,9,10]. Early research such as Lefsky et al. [11] demonstrated that forest height can be sensed remotely using Geoscience Laser Altimeter System (GLAS) data. Simard et al. [12] developed a global forest height map by merging GLAS, Shuttle Radar Topography Mission (SRTM), and Moderate Resolution Imaging Spectroradiometer (MODIS) data. Forest height retrieval has benefited greatly from the photon-counting LiDAR technology aboard the recently launched Ice, Cloud, and land Elevation Satellite-2 (ICESat-2), owing to the satellite’s superior spatial resolution and signal-to-noise ratio [13,14,15,16]. Nevertheless, because LiDAR samples the surface only sparsely, it must still be combined with other remote sensing data for use at the regional scale [17,18].

Multi-source remote sensing data fusion provides a robust foundation for forest height modeling by leveraging the complementary strengths of different sensors [19,20]. For instance, synthetic aperture radar (SAR) data (e.g., Sentinel-1) are sensitive to the three-dimensional structure of forests and provide all-weather observation capabilities [21]; optical data (e.g., Sentinel-2) offer rich spectral information for identifying vegetation types and canopy conditions; SRTM terrain data aid in explaining how topography affects height distribution. Researchers have shown that using optical and radar data together greatly improves the accuracy of forest parameter estimation [22,23], and adding lidar samples further increases the generalizability of the model to other kinds of forests [24]. As a result, the feature basis for forest height modeling is more extensive and robust when multi-source remote sensing fusion is used [25,26,27].

Multi-source fusion also faces considerable obstacles. Differences in spatial and temporal resolution among remote sensing datasets complicate data registration and fusion [28,29]. Additionally, optical, radar, and lidar data each require different processing approaches, which makes it technically challenging to integrate heterogeneous data efficiently and exploit their full information potential [30,31]. Data quality issues such as cloud cover, atmospheric conditions, and noise interference may also affect model performance and stability, especially in complex environments [32].

Regarding modeling approaches, traditional linear regression struggles to capture the complex nonlinear relationships among remote sensing features. In recent years, machine learning methods have gained prominence due to their advantage in handling high-dimensional data. Random Forest (RF) is widely applied in forest parameter estimation for its stability and interpretability [33,34]; Extreme Gradient Boosting (XGBoost) efficiently processes large-scale data through its gradient boosting framework [35]; Light Gradient Boosting Machine (LightGBM) further optimizes training speed and memory usage, making it suitable for modeling high-dimensional remote sensing features [36]. Research consistently demonstrates that these ensemble learning methods outperform traditional models in forest biomass and height inversion. It should be noted that relying solely on remote sensing data without effective machine learning methods makes it difficult to fully explore the complex relationships between features. Conversely, relying solely on machine learning models without multi-source data support makes it difficult for the model to accurately capture the spatial heterogeneity of forest structure.

Compared to previous studies, this work makes several distinct contributions: First, it systematically benchmarks three distinct multi-source data fusion schemes (incorporating Sentinel-1, Sentinel-2, ICESat-2, and SRTM) for forest height inversion, explicitly delineating their complementary characteristics in complex terrain. Second, under a unified geographical framework, it provides an empirical comparison of three advanced ensemble learning algorithms (LightGBM, XGBoost, and RF), offering practical guidance for model selection. Furthermore, and of significant practical value, this study pioneers the exploration of feasible inversion pathways in the absence of ICESat-2 data, thereby proposing a viable and scalable alternative for large-scale operational monitoring.

2. Materials and Methods

2.1. Technical Approach

The research pipeline comprises five primary stages, as shown in Figure 1. First, forest inventory data and multi-source remote sensing data from Sentinel-1, Sentinel-2, ICESat-2, and SRTM were collected and organized. Second, all remote sensing datasets were preprocessed using noise reduction, atmospheric correction, geometric correction, and radiometric calibration. Third, feature variables were systematically extracted from the preprocessed data: backscatter coefficients and interferometric coherence from Sentinel-1; spectral bands, vegetation indices, and texture features from Sentinel-2; altimetry variables from ICESat-2; and elevation and terrain metrics from SRTM. Fourth, forest height inversion experiments with the various data combinations were run using three machine learning models: Random Forest, XGBoost, and LightGBM. Finally, the performance of each model and data combination was assessed by calculating the correlation coefficient ($r$), the root mean square error ($RMSE$), and the mean absolute error ($MAE$).

2.2. Overview of the Study Area

The research area is located in Shangri-La City, Diqing Tibetan Autonomous Prefecture, Yunnan Province, China (26°52′–28°52′ N, 99°20′–100°19′ E) (Figure 2), near the southern border of the Qinghai–Tibet Plateau and inside the core region of the Hengduan Mountains. The region is characterized by an alpine canyon landscape with steep relief, with elevations ranging from 1800 to 5500 m. The climate follows a plateau monsoon pattern shaped by the South Asian monsoon and the complicated topography. Annual precipitation ranges from 600 to 800 mm, most of it falling between June and September, and temperatures range from 5 to 12 °C.

The region boasts high forest coverage, serving as a crucial natural forest distribution area in Southwest China. Primary vegetation types include Abies spp., Picea spp., Pinus yunnanensis, and Quercus spp. Distinct vertical forest zonation spans from sparse Yunnan pine forests at lower elevations to pristine fir forests at higher altitudes, forming a complex and diverse forest spatial structure. Due to its typical terrain–vegetation combinations and complete vertical zonation spectrum, this region has become an ideal area for forest remote sensing monitoring and ecosystem research.

2.3. Research Data

2.3.1. Spatial–Temporal Data Fusion Strategy

A systematic spatiotemporal fusion strategy was employed to harmonize the multi-source remote sensing data. All datasets were first resampled to a consistent 10-m spatial resolution and unified to the WGS84 global coordinate system to establish a common spatial baseline. Precise geometric correction was then applied to achieve sub-pixel co-registration (targeting registration error < 0.5 pixels). To mitigate phenological influences, the acquisition window for all remote sensing data was confined to the period of October through December 2018. Finally, to integrate the discrete ICESat-2 LiDAR data with the continuous satellite imagery, the median canopy height value of all ICESat-2 footprints within a 10-m radius buffer was assigned to each corresponding Sentinel pixel.
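The last step of this strategy, assigning the median ICESat-2 canopy height within a 10-m buffer to a pixel, can be illustrated with a minimal sketch. It assumes that coordinates have already been projected to metres and that footprints are plain `(x, y, height)` tuples; the function name `median_height_near_pixel` and the sample values are hypothetical and are not taken from the study.

```python
import math
from statistics import median


def median_height_near_pixel(pixel_xy, footprints, radius_m=10.0):
    """Median canopy height of the footprints within radius_m of a pixel centre.

    pixel_xy   : (x, y) pixel-centre coordinates in metres (projected CRS assumed)
    footprints : iterable of (x, y, canopy_height_m) tuples
    Returns None when no footprint falls inside the buffer.
    """
    px, py = pixel_xy
    heights = [
        h for x, y, h in footprints
        if math.hypot(x - px, y - py) <= radius_m
    ]
    return median(heights) if heights else None


# Hypothetical footprints: three lie within 10 m of the origin, one does not.
footprints = [(3.0, 4.0, 12.0), (6.0, 8.0, 20.0), (0.0, 9.0, 16.0), (30.0, 0.0, 40.0)]
print(median_height_near_pixel((0.0, 0.0), footprints))  # 16.0
```

The median is less sensitive than the mean to a single anomalous photon-derived height inside the buffer, which is presumably why it was preferred here.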

2.3.2. Sentinel-1 Data

This research used Sentinel-1 (S-1) data supplied by the European Space Agency (ESA). The C-band SAR data comprise Ground Range Detected (GRD) and Single Look Complex (SLC) products acquired in Interferometric Wide Swath (IW) mode. The GRD products were used to extract backscatter coefficients and to compute derived radar indices. The SLC products were used to characterize the temporal stability of surface targets through interferometric coherence features derived from bi-temporal acquisitions.

In total, eight features were extracted from the S-1 data: two backscatter coefficients, two interferometric coherence features, and four derived indices (see Table 1). The entire processing workflow was performed using ESA SNAP software (version 12.0.0) and corresponding plugins. Acquisition dates are listed in Table 2.

2.3.3. Sentinel-2 Data

This investigation used the European Space Agency’s Sentinel-2 (S-2) Level-2A surface reflectance products. Six images acquired between October and December 2018 were selected for analysis. After all bands were resampled to a spatial resolution of 10 m, 28 predictive variables were extracted in three groups. First, 12 spectral bands were chosen as the primary spectral features: B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B11, and B12. Second, six vegetation indices were computed using the SNAP software’s biophysical processor: the Normalized Difference Vegetation Index (NDVI), defined as (B8 − B4)/(B8 + B4), the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), vegetation cover (FCOVER), leaf area index (LAI), chlorophyll content (CAB), and canopy water content (CW). Third, ten texture features were generated from the NDVI layer using a gray-level co-occurrence matrix (GLCM) with a 5 × 5 window (see Table 3).
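The NDVI definition given above is simple enough to state as code. The following sketch applies (B8 − B4)/(B8 + B4) to individual reflectance pairs and to small nested-list grids; the function names and the reflectance values are illustrative assumptions, and the study itself computed its indices in SNAP.

```python
def ndvi(b8, b4):
    """NDVI = (B8 - B4) / (B8 + B4) for one pair of surface reflectances.

    Returns None when both bands are zero, where the ratio is undefined.
    """
    denom = b8 + b4
    if denom == 0:
        return None
    return (b8 - b4) / denom


def ndvi_grid(b8_rows, b4_rows):
    """Apply ndvi() cell by cell to two equally shaped nested lists."""
    return [
        [ndvi(nir, red) for nir, red in zip(nir_row, red_row)]
        for nir_row, red_row in zip(b8_rows, b4_rows)
    ]


# Hypothetical reflectances: dense canopy (high NIR, low red) and bare ground.
b8 = [[0.5, 0.3]]
b4 = [[0.1, 0.3]]
print(ndvi_grid(b8, b4))
```

A cell with equal NIR and red reflectance gives an NDVI of zero, while dense vegetation pushes the value toward 1.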

2.3.4. ICESat-2 Data

In this investigation, forest height data were derived from ICESat-2 ATL08 Version 5 data (hereafter ICE-2) using the PhoREAL_v3.26 software [37]. ATL03 (global geolocated photon data) and ATL08 (land vegetation elevation data) products acquired at the same time were supplied together as inputs during processing. The parameters “Terrain Best Fit (m)” and “Max Canopy (m)” were extracted with this tool, and the difference between the two values was taken as the forest canopy height directly observed by ICESat-2. Using this height value as an independent variable yielded 12,047 valid laser footprints. These footprints were distributed uniformly across the research region and provided reliable vertical structural references for forest height inversion. Figure 3 shows the spatial distribution of the footprints within the research region, and Table 4 lists the acquisition dates.

2.3.5. SRTM Data

This investigation used the digital elevation model (DEM) produced by the Shuttle Radar Topography Mission (SRTM), which was jointly released by the National Geospatial-Intelligence Agency (NGA) and NASA. With a spatial resolution of around 30 m, this dataset accurately describes the topographic relief of the research region [38]. During data processing, two key topographic factors, slope gradient and aspect, were derived from the DEM using GIS terrain analysis tools. SRTM elevation (DEM), slope, and aspect were then incorporated into the forest height inversion model as independent variables describing terrain characteristics. This approach mitigates terrain-induced interference in remote sensing signals, thereby enhancing inversion accuracy.
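The text does not specify which terrain algorithm the GIS tools applied. As an illustration only, the sketch below derives slope and aspect for the centre cell of a 3 × 3 elevation window using Horn's weighted finite differences, a common choice in GIS packages; the function name `slope_aspect`, the 30 m default cell size, and the sample windows are assumptions.

```python
import math


def slope_aspect(window, cell_size_m=30.0):
    """Slope and aspect (degrees) at the centre of a 3 x 3 elevation window.

    Rows run north to south and columns west to east. Aspect is the compass
    bearing of the downslope direction (0 = north, 90 = east) and is None on
    flat ground.
    """
    (a, b, c), (d, _, f), (g, h, i) = window
    # Horn's weighted finite differences toward the east and toward the south.
    dz_east = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size_m)
    dz_south = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size_m)
    slope_deg = math.degrees(math.atan(math.hypot(dz_east, dz_south)))
    if dz_east == 0 and dz_south == 0:
        return slope_deg, None
    # The downslope vector has east component -dz_east and north component
    # +dz_south; atan2(east, north) converts it to a compass bearing.
    aspect_deg = math.degrees(math.atan2(-dz_east, dz_south)) % 360.0
    return slope_deg, aspect_deg


# Hypothetical windows: terrain rising toward the east, then toward the north.
rising_east = [[0, 30, 60], [0, 30, 60], [0, 30, 60]]
rising_north = [[60, 60, 60], [30, 30, 30], [0, 0, 0]]
print(slope_aspect(rising_east))
print(slope_aspect(rising_north))
```

A surface that rises 30 m per 30 m cell toward the east has a 45° slope that faces west, and the same surface rising toward the north faces south.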

2.3.6. Ground Sample Data

This study uses the 2016 Shangri-La Second-class Category Resource Survey data (hereafter SL-SCRS) as ground truth to assess the performance of forest canopy height estimates generated by integrating ICE-2, S-1, S-2, and SRTM remote sensing data. The dataset contains 12,047 valid sample points, which cover the primary forest types and topographic gradients in the region and are well-distributed spatially, thus providing a good representation of the regional forest structural variation. All samples were randomly split into a training set (8433 samples) and a test set (3614 samples) using a 7/3 ratio, ensuring no statistically significant differences in canopy height distribution or spatial location between them. Professionally collected through field plot measurements, the SL-SCRS data encompass key forest characteristics—including forest type, stand structure, and mean top height—equipping it to reliably benchmark the results derived from remote sensing.
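The 7/3 split can be reproduced in outline with the standard library. This sketch assumes a simple shuffled split with a fixed seed; the seed value and the function name `split_train_test` are hypothetical, and the study does not state how its split was implemented.

```python
import random


def split_train_test(samples, train_fraction=0.7, seed=42):
    """Randomly split samples into training and test lists.

    The seed is an illustrative assumption that makes the split repeatable.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = round(len(samples) * train_fraction)
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test


# Stand-in sample IDs for the 12,047 valid sample points.
train, test = split_train_test(list(range(12047)))
print(len(train), len(test))  # 8433 3614
```

With 12,047 samples, a 0.7 training fraction gives the 8433 and 3614 sample counts reported above.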

To enable a systematic evaluation of inversion accuracy, the validated canopy heights from the SL-SCRS dataset were used as a reference. Statistical metrics were then employed to compare this ground truth against the ICESat-2 ATL08 product as well as the heights estimated from the integrated ICE-2, S-1, S-2, and SRTM data.

2.4. Modeling Approach

This study models forest height retrieval from multi-source remote sensing data using three state-of-the-art ensemble learning algorithms: LightGBM, XGBoost, and Random Forest (RF). All three use decision trees as base learners, so each can capture the intricate nonlinear relationships between features and forest height. They differ, however, in their tree growth strategies and optimization objectives.

To improve training efficiency, LightGBM uses a histogram-based algorithm with a leaf-wise growth strategy, combined with Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). The model parameters were tuned to balance accuracy and efficiency; the main settings were a feature fraction of 0.8, 31 leaves (num_leaves), and a minimum of 20 samples per leaf node (min_data_in_leaf). The model determines feature importance from the information gain and from the number of times a feature is used as a split point across all trees, and it handles high-dimensional features and large datasets well.

XGBoost builds decision trees within a gradient boosting framework, iteratively minimizing a regularized objective function. The learning rate, max depth, and subsample parameters were set to 0.1, 6, and 0.8, respectively. Feature importance takes into account both how often a feature occurs across all trees and the reduction in the loss function it provides. This approach is well suited to complicated nonlinear interactions.

Random Forest builds numerous decision trees with a Bagging ensemble technique, using Bootstrap sampling and random feature selection. The parameters that regulate model complexity and diversity were set as follows: n_estimators = 100, max_features = sqrt, and random_state = 42. Feature importance is computed by summing the impurity decrease that each feature contributes across all trees. The method trains stably and is robust to overfitting.
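The reported hyperparameters can be gathered in one configuration sketch. The keys below follow the parameter names used by the LightGBM, XGBoost, and scikit-learn Random Forest libraries; pairing the reported values with these particular libraries is an assumption, and all unlisted parameters are left at whatever defaults the libraries provide.

```python
# Hyperparameters reported in the text, keyed by library parameter name.
LIGHTGBM_PARAMS = {
    "feature_fraction": 0.8,   # fraction of features sampled per tree
    "num_leaves": 31,          # maximum leaves per tree (leaf-wise growth)
    "min_data_in_leaf": 20,    # minimum samples in a leaf node
}

XGBOOST_PARAMS = {
    "learning_rate": 0.1,      # shrinkage applied to each boosting step
    "max_depth": 6,            # maximum tree depth
    "subsample": 0.8,          # fraction of rows sampled per tree
}

RANDOM_FOREST_PARAMS = {
    "n_estimators": 100,       # number of trees in the forest
    "max_features": "sqrt",    # features considered at each split
    "random_state": 42,        # seed for reproducibility
}

MODEL_PARAMS = {
    "LightGBM": LIGHTGBM_PARAMS,
    "XGBoost": XGBOOST_PARAMS,
    "RF": RANDOM_FOREST_PARAMS,
}

for model_name, params in MODEL_PARAMS.items():
    print(model_name, params)
```

Each dictionary could be unpacked into the corresponding regressor constructor (for example `**RANDOM_FOREST_PARAMS`), provided the relevant library is installed.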

2.5. Evaluation of Model Accuracy

To systematically evaluate the accuracy of forest height inversion results, this study employs three widely used quantitative metrics: Root Mean Square Error ($RMSE$), Mean Absolute Error ($MAE$), and Pearson’s Correlation Coefficient ($r$). The definitions of each metric are as follows:

1. Root Mean Square Error ($RMSE$)

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}} \tag{1}$$

Here $y_{i}$ is the observed value, $\hat{y}_{i}$ is the predicted value, and $n$ is the number of samples. The $RMSE$ evaluates the overall deviation between the predicted and observed values; a lower $RMSE$ indicates a more accurate model.

2. Mean Absolute Error ($MAE$)

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_{i} - \hat{y}_{i}| \tag{2}$$

The $MAE$ measures the average absolute discrepancy between the predicted and observed values and gives a direct sense of how far off the model is. Performance improves as the value decreases.

3. Correlation Coefficient ($r$)

$$r = \frac{\sum_{i=1}^{n} (y_{i} - \bar{y})(\hat{y}_{i} - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} \, \sqrt{\sum_{i=1}^{n} (\hat{y}_{i} - \bar{\hat{y}})^{2}}} \tag{3}$$

Here $\bar{y}$ and $\bar{\hat{y}}$ are the means of the observed and predicted values. This statistic takes values between −1 and 1 and represents the linear relationship between the predicted and observed values. Values closer to 1 indicate a stronger positive correlation.

Each of the three measures above assesses model inversion performance from a unique angle: $r$ primarily indicates model fit, while $RMSE$ and $MAE$ reflect error magnitude. Using these metrics collectively enables a more comprehensive comparison of different model-data combinations in forest height inversion.
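The three metrics translate directly into a few lines of code. The sketch below is a plain standard-library rendering of the RMSE, MAE, and Pearson correlation definitions; the function names and the four observed and predicted heights are made up for illustration and do not come from the study.

```python
import math


def rmse(y, y_hat):
    """Root mean square error between observed y and predicted y_hat."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error between observed y and predicted y_hat."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def pearson_r(y, y_hat):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_y = sum(y) / len(y)
    mean_hat = sum(y_hat) / len(y_hat)
    cov = sum((a - mean_y) * (b - mean_hat) for a, b in zip(y, y_hat))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_hat = sum((b - mean_hat) ** 2 for b in y_hat)
    return cov / (math.sqrt(var_y) * math.sqrt(var_hat))


# Hypothetical observed and predicted canopy heights in metres.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 17.0]
print(rmse(observed, predicted), mae(observed, predicted), pearson_r(observed, predicted))
```

In this toy case every prediction is off by exactly 1 m, so RMSE and MAE coincide; in general RMSE is at least as large as MAE because squaring weights large errors more heavily.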

3. Results

3.1. Evaluation of Forest Height Retrieval Results Under Different Data Combinations

To systematically evaluate the performance of integrated satellite data in estimating forest height, this study used SL-SCRS ground survey data as the dependent variable and established three data combination schemes: DC1 (S-1 + S-2 + ICE-2 + SRTM), DC2 (S-2 + ICE-2 + SRTM), and DC3 (S-1 + ICE-2 + SRTM). Three ensemble algorithms—LightGBM, XGBoost, and Random Forest (RF)—were employed for modeling to comprehensively evaluate each data combination’s performance in forest height inversion.
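The experimental design amounts to a grid of three data combinations by three models. A minimal sketch of that grid follows; the variable names are hypothetical, and the source labels match the abbreviations used in the text.

```python
from itertools import product

# Data combinations as defined in the text.
DATA_COMBINATIONS = {
    "DC1": ("S-1", "S-2", "ICE-2", "SRTM"),
    "DC2": ("S-2", "ICE-2", "SRTM"),
    "DC3": ("S-1", "ICE-2", "SRTM"),
}
MODELS = ("LightGBM", "XGBoost", "RF")

# Every (data combination, model) pair is one inversion experiment.
experiments = list(product(DATA_COMBINATIONS, MODELS))

for combination, model in experiments:
    sources = " + ".join(DATA_COMBINATIONS[combination])
    print(f"{combination} ({sources}) with {model}")
```

Enumerating the grid this way makes it explicit that nine model-data pairings are compared.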

Figure 4 displays the scatter point density distributions corresponding to different data combinations and algorithms. Overall, scatter points for all combinations clustered near the 1:1 line, indicating that every data combination possesses a certain level of forest height inversion capability. Notably, the DC1 combination achieved the optimal accuracy metrics under the LightGBM model ($r$ = 0.72, $RMSE$ = 5.52 m). Its scatter points exhibited good fitting consistency across both high and low value ranges, with the smallest systematic bias. This superior performance can be attributed to the comprehensive feature set provided by the full integration of radar, optical, lidar, and topographic data, which collectively capture the multi-faceted characteristics of the forest canopy. Although the scatter point distributions of DC2 and DC3 are similar to DC1 in range, quantitative accuracy assessments confirm that the DC1 combination, which integrates multi-source features, holds a significant advantage in inversion performance. These findings attest to the value of a multi-sensor fusion approach for achieving superior accuracy in forest canopy height retrieval.

Table 5 presents the accuracy evaluation results for the different data and model combinations. The DC1 combination yielded the best inversion performance across all three machine learning models, particularly with the LightGBM algorithm, which delivered superior accuracy ($r$ = 0.72, $RMSE$ = 5.52 m, $MAE$ = 4.082 m). The XGBoost and Random Forest models performed slightly worse with the DC1 combination, though their accuracy metrics (XGBoost: $r$ = 0.716, $RMSE$ = 5.551 m; RF: $r$ = 0.706, $RMSE$ = 5.629 m) remained marginally superior to those of the DC2 and DC3 combinations.

In a cross-comparison of the data combinations, DC2 (S-2 + ICE-2 + SRTM) outperformed DC3 but fell short of DC1. With DC2, the LightGBM model likewise attained the greatest accuracy, with an $r$ of 0.714 and an $RMSE$ of 5.559 m. The DC3 combination (S-1 + ICE-2 + SRTM) has the lowest accuracy metrics of the three, which underscores that multi-source data fusion is crucial for improving inversion performance.

LightGBM performs better in the DC1 combination because its Gradient-based One-Side Sampling technique and leaf-wise growth strategy allow it to process high-dimensional features efficiently and to exploit the complementary information among optical, radar, and laser altimetry features. Synergistic use of data from multiple remote sensing sources greatly enhances the accuracy of forest height inversion. The optimal performance is achieved when the DC1 data set is used with the LightGBM model, which offers a reliable technical approach for accurate monitoring of forest characteristics at the regional scale.

3.2. Feature Importance Analysis

Based on the feature importance analysis results across different models using three data combinations (black bars in Figure 5 represent the top 90% cumulative contribution rate of feature variables), the following findings emerge: The ICE-2 canopy height feature and DEM maintain high importance across all data combinations and models, highlighting the core role and irreplaceability of laser altimetry and terrain data in forest height inversion.
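The 90% cumulative contribution threshold mentioned above can be expressed as a short selection routine. The sketch below ranks features by importance and keeps them until their running share reaches the threshold; the function name and the importance scores are invented for illustration and are not the values behind Figure 5.

```python
def features_for_cumulative_share(importances, share=0.9):
    """Return the top-ranked features whose cumulative importance reaches share.

    importances : dict mapping feature name to a non-negative importance score
    share       : target fraction of the total importance (0.9 = 90%)
    """
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for name, value in ranked:
        selected.append(name)
        running += value
        if running / total >= share:
            break
    return selected


# Hypothetical importance scores (arbitrary units) for five features.
example_importances = {
    "ICE-2 canopy height": 40,
    "DEM": 30,
    "aspect": 15,
    "B11": 10,
    "coh_VV": 5,
}
print(features_for_cumulative_share(example_importances))
```

With these made-up scores the first three features cover 85% of the total, so a fourth is needed to pass the 90% threshold.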

The DC1 combination exhibits the most balanced feature importance distribution. Beyond ICE-2 and SRTM features, S-2’s shortwave infrared band (B11) ranked fourth in average importance, while vegetation texture (Variance) ranked seventh. S-1’s VV polarization coherence (coh_VV) also placed sixth. This multi-feature contribution pattern fully reflects the synergistic and complementary advantages of multi-source data fusion.

Comparative analysis reveals distinct complementary relationships among different data types. In the DC2 combination, the absence of radar data significantly elevates the importance of optical features, resulting in higher overall contribution from S-2 features compared to DC1. Concurrently, the importance of slope among terrain factors rises from fifth to fourth place, while DEM and aspect maintain their first and second positions respectively. This indicates terrain factors play a more critical compensatory role under data-constrained conditions.

The DC3 combination exhibits a unique pattern of feature importance distribution. In the absence of optical data, the contribution of S-1 radar features significantly increases. Among terrain factors, DEM, slope, and aspect rank as the top three in importance, reflecting a significant synergistic effect between radar data and terrain features.

Feature importance rankings vary across different machine learning models. LightGBM exhibits a more balanced feature importance distribution, achieving equilibrium across different data sources and avoiding over-reliance on any single feature. XGBoost and Random Forest (RF) show similar feature importance distributions, revealing comparable feature dependency patterns, particularly with high consistency in core feature rankings. This difference indicates that different models exhibit distinct sensitivities and utilization strategies for features when processing multi-source data.

Based on the results of the feature importance analysis, this study identified five key feature categories that contribute significantly to forest height inversion: vegetation indices, S-2 bands, vegetation biophysical parameters, gray-level co-occurrence matrix (GLCM) textures, and S-1 interferometry. One representative feature from each category was visualized spatially and compared with the SL-SCRS measured canopy height data (Figure 6a). The selected features are: S-2 Normalized Difference Vegetation Index (NDVI) (Figure 6b), shortwave infrared band B11 (Figure 6c), canopy water content (CW) (Figure 6d), the GLCMMean texture feature calculated from NDVI (Figure 6e), and S-1 VH polarization interferometric coherence (coh_VH) (Figure 6f).

Comparative analysis of representative areas revealed significant correlations between the spatial distributions of these features and forest height. Specifically, NDVI and its derived texture feature GLCMMean showed positive correlations with forest height, indicating enhanced vegetation greenness and canopy spatial heterogeneity with increasing tree height. CW similarly exhibits a positive correlation, reflecting rising canopy moisture content with increasing forest height. Conversely, B11 reflectance and VH polarization interferometric coherence (coh_VH) show negative correlations with forest height, reflecting the shortwave infrared band’s response to vegetation biomass and the reduced radar coherence in tall canopies caused by enhanced volume scattering. Together, these features reveal the spatial distribution patterns of forest height across spectral, textural, moisture, and structural dimensions, providing multi-perspective remote sensing evidence for understanding regional forest vertical structure.

3.3. Removal of ICE-2 Data to Mitigate Its Impact on Forest Height Retrieval

Although ICE-2 data provide high-accuracy photon point-cloud measurements of forest canopies, their spatial coverage is relatively sparse, which significantly limits large-scale forest height estimation. This experimental section therefore excludes ICE-2 data and conducts forest height inversion based entirely on S-1, S-2, and SRTM remote sensing sources, to explore estimation schemes better suited to large-area applications.

We adjusted the data combinations to construct the following three configurations: DC4 (S-1 + S-2 + SRTM), DC5 (S-2 + SRTM), and DC6 (S-1 + SRTM). Comparing the accuracy of the new combinations in Table 6 with the original combinations (including ICE-2) in Table 5 reveals that, except for a slight improvement in DC5 under the Random Forest (RF) model, the rest of the combinations show a minor decrease in inversion accuracy across different models, though the overall decline is not significant.

To visually demonstrate the inversion performance of different data combinations and models, Figure 7 presents spatial distribution maps of forest height for the DC4, DC5, and DC6 combinations under LightGBM, XGBoost, and Random Forest (RF) models within a representative area. The figures reveal that while numerical accuracy is slightly lower compared to combinations including ICE-2, all combinations effectively capture the spatial distribution trends of forest height, demonstrating particular stability in areas with continuous vegetation structures.

The results demonstrate that effective forest height inversion capability is maintained even without ICE-2 data, significantly expanding the spatial scope and feasibility of inversion operations. Therefore, in large-scale applications balancing efficiency and coverage requirements, multi-source remote sensing inversion strategies independent of ICE-2 data hold practical potential and application value.

4. Discussion

4.1. Feature-Importance-Based Path Optimization

Based on the above findings, a multi-level optimization strategy is recommended for feature selection: First, prioritize the inclusion of SRTM terrain factors to supplement spatial distribution and terrain-related details. Second, integrate ICE-2 canopy height products to ensure effective utilization of key forest stand structural information, thereby further enhancing overall model performance.

This data screening strategy, grounded in feature importance analysis, provides a scientifically sound and flexible optimization pathway for forest height inversion under varying data conditions. It not only enhances model accuracy but also strengthens the reliability and applicability of inversion results across diverse application scenarios.

4.2. Performance Comparison and Mechanism Analysis of Different Models

The LightGBM model achieved the highest accuracy across most data combinations. This performance advantage stems from its distinctive leaf-wise growth strategy and efficient handling of high-dimensional features through Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). These characteristics allow LightGBM to more effectively leverage the complementary information from the heterogeneous remote sensing sources in our fused feature set, leading to a more robust and accurate model. In contrast, while Random Forest exhibits high stability, it lacks flexibility, whereas XGBoost strikes a balance between computational efficiency and accuracy. Differences in feature sensitivity across models also reflect their distinct learning mechanisms: LightGBM utilizes information from various data sources more evenly, while XGBoost and RF rely more heavily on a few core features (such as ICE-2 elevation and DEM).

4.3. The Key Role of S-1 and S-2 in Forest Height Estimation

In this study, S-1 and S-2 satellite data played a pivotal role in forest height estimation. As a synthetic aperture radar (SAR) system, S-1 possesses all-weather observation capabilities, enabling continuous acquisition of surface information under complex meteorological conditions such as cloud cover and precipitation. Its VV and VH polarization data exhibit strong sensitivity to forest canopy structure and vertical distribution, offering unique advantages for extracting structural parameters like forest density and height.

S-2 multispectral imagery provides rich spectral and textural information. With its high spatial resolution, it accurately characterizes vegetation cover, canopy biochemical properties, and vegetation growth condition. The vegetation indices and texture features derived from S-2 data are important for inverting forest biophysical parameters such as canopy water content, leaf area index, and canopy cover.

By combining structure-sensitive features from S-1 with spectral and textural information from S-2, this work improves the accuracy and robustness of forest height estimation and makes full use of the complementary capabilities of multi-source remote sensing data. To further clarify the contribution of each data type in the inversion process, this study generated feature importance heatmaps for both S-1 and S-2 data (Figure 8). By averaging the feature importance across each feature set, these heatmaps visually demonstrate the relative contribution of each feature set in the prediction process, thereby identifying the variables most influential for forest height estimation. This analysis not only validated the synergistic effect of S-1 and S-2 in forest parameter inversion but also provided a basis for subsequent feature selection and model optimization.
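The averaging step behind such a heatmap can be sketched as follows. This is a minimal illustration, not the authors' code: the grouping of features into sets, the function names, and any importance values passed in are assumptions made for the example.

```python
from collections import defaultdict


def mean_importance_by_group(importances, feature_groups):
    """Average per-feature importance within each feature set.

    importances:    {feature name: importance} for one model
    feature_groups: {feature name: feature-set label} (hypothetical grouping)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for feature, value in importances.items():
        group = feature_groups[feature]  # every feature must have a group
        sums[group] += value
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


def heatmap_matrix(per_model_importances, feature_groups):
    """Arrange group means as rows (feature sets) by columns (models)."""
    groups = sorted(set(feature_groups.values()))
    models = list(per_model_importances)
    means = {
        model: mean_importance_by_group(per_model_importances[model], feature_groups)
        for model in models
    }
    rows = [[means[model].get(group, 0.0) for model in models] for group in groups]
    return groups, models, rows
```

The returned rows can be passed to any plotting routine that draws a matrix as a heatmap; a feature set that a model did not use is shown as 0.0.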

4.4. Outlook

This study retrieved forest height with an r of 0.72 and an RMSE of 5.52 m in its best configuration, using multi-source remote sensing data fusion and machine learning methods. Model performance improved after incorporating ICESat-2 data, yet certain limitations remain. The primary issue lies in the discrete point-sampling nature of ICESat-2: while it provides a high-precision forest height benchmark, it cannot by itself support spatially continuous retrieval over large areas. This results in low computational efficiency and limited scalability when applying the model at regional scales. In addition, the preprocessing and fusion of massive multi-source remote sensing data impose significant demands on computational resources and algorithmic efficiency [39]. Finally, the heavy reliance on Sentinel and ICESat-2 data may restrict application in areas with poor data coverage or persistent cloud contamination.

To address these challenges, future research will focus on enhancing the efficiency and feasibility of large-scale forest height retrieval while exploring ways to reduce dependence on ICESat-2 data in appropriate scenarios. Key research directions include: developing efficient dimensionality reduction and feature compression methods to reduce the data processing burden; systematically investigating fusion mechanisms between ICESat-2 and continuous-coverage data such as Sentinel-1/2, while developing large-area inversion modeling methods independent of ICESat-2 to expand the application scope without sacrificing accuracy; building lightweight machine learning models with strong generalization capabilities to enhance adaptability across diverse environments and data conditions; and integrating validation data from multiple sources, such as UAV remote sensing and ground observations, to improve the reliability and stability of satellite inversion results [40].

Through these improvements, future work can significantly enhance the spatial coverage and computational efficiency of forest height estimation while maintaining high inversion accuracy, providing more practical and scalable technical support for regional and global forest resource monitoring.

5. Conclusions

This research established a machine learning-driven framework to retrieve forest height in the Shangri-La area of Yunnan Province by integrating data from multiple remote sensing platforms. The forest height retrieval model was built by combining Sentinel-1 radar, Sentinel-2 multispectral, ICESat-2 altimetry, and SRTM terrain data with three machine learning algorithms: Random Forest, XGBoost, and LightGBM. The findings show that the highest inversion accuracy was achieved by combining Sentinel-1, Sentinel-2, ICESat-2, and SRTM data, which illustrates the synergistic benefit of feature complementarity among multi-source data.

LightGBM attained the best inversion performance among the machine learning models examined. Under the DC1 data combination, its predictions achieved a correlation coefficient r of 0.72 with observed values and an RMSE of 5.52 m, showing the model's capacity to capture patterns of spatial variation in forest height. Further analysis of feature importance demonstrated substantial contributions from ICESat-2 altimetry data. While Sentinel-1 backscatter features characterized the vertical structure of the forest, Sentinel-2 spectral features and vegetation indices supplied vital canopy spectral information. In addition, the SRTM terrain component improved the model's stability and adaptability in rugged mountainous areas. Beyond the specific case study, this research validates a scalable methodological framework that synergizes multi-source remote sensing fusion with ensemble learning for forest parameter retrieval in topographically complex regions. The demonstrated feasibility of achieving satisfactory estimation accuracy without ICESat-2 data significantly broadens the potential for large-scale, operational application of this approach.

This research confirms that combining machine learning algorithms with data from multiple remote sensing sources is a viable way to retrieve forest parameters and to overcome the difficulties of estimating forest height in areas with complex terrain. The findings provide empirical evidence and practical guidance for future studies on forest carbon stock assessment, sustainable forest management, regional resource surveys, and dynamic monitoring.

Author Contributions

Conceptualization, H.X.; methodology, H.X., X.Z., Y.L. (Yongfa Li) and X.Y.; software, H.X.; validation, H.X., Y.L. (Yunchuan Li) and Y.Z.; investigation, Y.L. (Yunchuan Li) and Y.Z.; resources, X.Z. and Y.L. (Yongfa Li); data curation, H.X.; writing—original draft preparation, H.X.; writing—review and editing, X.Z., Y.L. (Yongfa Li) and X.Y.; supervision, X.Z., Y.L. (Yongfa Li), X.Y., Y.L. (Yunchuan Li) and Y.Z.; project administration, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by (1) the National Natural Science Foundation of China (Grant Nos. 42471483 and 42161067); (2) the Open Fund Program of Yunnan Key Laboratory of Intelligent Monitoring and Spatiotemporal Big Data Governance of Natural Resources (Grant No. 202449CE340023); (3) the Pilot Cooperation Project between the Ministry of Natural Resources of China and Yunnan Province (Grant No. 2023ZRBSHZ048); (4) the Yunnan Fundamental Research Projects (Grant Nos. 202501AT070310 and 202401AU070173); (5) the Scientific Research Fund of Yunnan Provincial Department of Education (Grant No. 2024J0067); and (6) the Talent Development Program of Kunming University of Science and Technology (Grant No. KKZ3202421128).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Some of the data that support the findings of this article are openly available in the Copernicus Data Space Ecosystem at https://dataspace.copernicus.eu/ (accessed on 5 September 2025), NASA's Earthdata platform at https://www.earthdata.nasa.gov (accessed on 5 September 2025), and the USGS Earth Resources Observation and Science (EROS) Center via EarthExplorer at https://earthexplorer.usgs.gov/ (accessed on 5 September 2025). The ground truth data, the Shangri-La Second-class Category Resource Survey data, cannot be made publicly available because they contain commercially sensitive information. However, the data will be made available by the authors upon reasonable request.

Acknowledgments

The authors would like to express their sincere gratitude to Qingtai Shu from the Forestry College of Southwest Forestry University for generously providing the Shangri-La Second-class Category Resource Survey data, which was crucial for the validation of this study. We also acknowledge the European Space Agency (ESA), the National Aeronautics and Space Administration (NASA), and the United States Geological Survey (USGS) for providing free and open access to the Sentinel, ICESat-2, and SRTM data, respectively. The authors would also like to extend their appreciation to the Academic Editor and the anonymous reviewers for their valuable comments and constructive suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Le Quéré, C.; Andrew, R.M.; Canadell, J.G.; Sitch, S.; Korsbakken, J.I.; Peters, G.P.; Manning, A.C.; Boden, T.A.; Tans, P.P.; Houghton, R.A.; et al. Global Carbon Budget 2016. Earth Syst. Sci. Data 2016, 8, 605–649.
2. Hu, Y.; Xu, X.; Wu, F.; Sun, Z.; Xia, H.; Meng, Q.; Huang, W.; Zhou, H.; Gao, J.; Li, W.; et al. Estimating Forest Stock Volume in Hunan Province, China, by Integrating In Situ Plot Data, Sentinel-2 Images, and Linear and Machine Learning Regression Models. Remote Sens. 2020, 12, 186.
3. Ma, T.; Zhang, C.; Ji, L.; Zuo, Z.; Beckline, M.; Hu, Y.; Li, X.; Xiao, X. Development of forest aboveground biomass estimation, its problems and future solutions: A review. Ecol. Indic. 2024, 159, 111653.
4. Zhao, N.; Wang, K.; Yuan, Y. Toward the carbon neutrality: Forest carbon sinks and its spatial spillover effect in China. Ecol. Econ. 2023, 209, 107837.
5. Ferreira, M.P.; Zortea, M.; Zanotta, D.C.; Shimabukuro, Y.E.; de Souza Filho, C.R. Mapping tree species in tropical seasonal semi-deciduous forests with hyperspectral and multispectral data. Remote Sens. Environ. 2016, 179, 66–78.
6. Asner, G.P. Biophysical and Biochemical Sources of Variability in Canopy Reflectance. Remote Sens. Environ. 1998, 64, 234–253.
7. Potapov, P.; Li, X.; Hernandez-Serna, A.; Tyukavina, A.; Hansen, M.C.; Kommareddy, A.; Pickens, A.; Turubanova, S.; Tang, H.; Silva, C.E.; et al. Mapping global forest canopy height through integration of GEDI and Landsat data. Remote Sens. Environ. 2021, 253, 112165.
8. Zhu, X.; Nie, S.; Wang, C.; Xi, X.; Lao, J.; Li, D. Consistency analysis of forest height retrievals between GEDI and ICESat-2. Remote Sens. Environ. 2022, 281, 113244.
9. Dubayah, R.; Blair, J.B.; Goetz, S.; Fatoyinbo, L.; Hansen, M.; Healey, S.; Hofton, M.; Hurtt, G.; Kellner, J.; Luthcke, S.; et al. The Global Ecosystem Dynamics Investigation: High-resolution laser ranging of the Earth’s forests and topography. Sci. Remote Sens. 2020, 1, 100002.
10. Coops, N.C.; Tompalski, P.; Goodbody, T.R.H.; Queinnec, M.; Luther, J.E.; Bolton, D.K.; White, J.C.; Wulder, M.A.; van Lier, O.R.; Hermosilla, T. Modelling lidar-derived estimates of forest attributes over space and time: A review of approaches and future trends. Remote Sens. Environ. 2021, 260, 112477.
11. Lefsky, M.; Cohen, W.; Parker, G.; Harding, D. Lidar Remote Sensing for Ecosystem Studies. Bioscience 2009, 52, 19–30.
12. Simard, M.; Pinto, N.; Fisher, J.B.; Baccini, A. Mapping forest canopy height globally with spaceborne lidar. J. Geophys. Res. Biogeosci. 2011, 116, 4021.
13. Sun, G.; Ranson, K.J.; Kimes, D.S.; Blair, J.B.; Kovacs, K. Forest vertical structure from GLAS: An evaluation using LVIS and SRTM data. Remote Sens. Environ. 2008, 112, 107–117.
14. Li, W.; Niu, Z.; Shang, R.; Qin, Y.; Wang, L.; Chen, H. High-resolution mapping of forest canopy height using machine learning by coupling ICESat-2 LiDAR with Sentinel-1, Sentinel-2 and Landsat-8 data. Int. J. Appl. Earth Obs. Geoinf. 2020, 92, 102163.
15. Magruder, L.; Brunt, K.; Neumann, T.; Klotz, B.; Alonzo, M. Passive Ground-Based Optical Techniques for Monitoring the On-Orbit ICESat-2 Altimeter Geolocation and Footprint Diameter. Earth Space Sci. 2021, 8, e2020EA001414.
16. Narine, L.; Popescu, S.; Malambo, L. Using ICESat-2 to Estimate and Map Forest Aboveground Biomass: A First Example. Remote Sens. 2020, 12, 1824.
17. Sothe, C.; Gonsamo, A.; Lourenço, R.B.; Kurz, W.A.; Snider, J. Spatially Continuous Mapping of Forest Canopy Height in Canada by Combining GEDI and ICESat-2 with PALSAR and Sentinel. Remote Sens. 2022, 14, 5158.
18. Qi, W.; Saarela, S.; Armston, J.; Ståhl, G.; Dubayah, R. Forest biomass estimation over three distinct forest types using TanDEM-X InSAR data and simulated GEDI lidar data. Remote Sens. Environ. 2019, 232, 111283.
19. Liu, A.; Chen, Y.; Cheng, X. Improving Tropical Forest Canopy Height Mapping by Fusion of Sentinel-1/2 and Bias-Corrected ICESat-2–GEDI Data. Remote Sens. 2025, 17, 1968.
20. Liu, C.; Gong, W.; Shi, S.; Wang, T.; Xu, T.; Shi, Z.; Niu, J. Deep learning-driven forest canopy height mapping in boreal regions through multi-source remote sensing fusion: Integrating Sentinel-1/2, PALSAR, and ICESat-2/LVIS data. Int. J. Appl. Earth Obs. Geoinf. 2025, 143, 104766.
21. Campbell, M.J.; Dennison, P.E.; Kerr, K.L.; Brewer, S.C.; Anderegg, W.R.L. Scaled biomass estimation in woodland ecosystems: Testing the individual and combined capacities of satellite multispectral and lidar data. Remote Sens. Environ. 2021, 262, 112511.
22. Neuenschwander, A.; Pitts, K. The ATL08 land and vegetation product for the ICESat-2 Mission. Remote Sens. Environ. 2019, 221, 247–259.
23. Ge, M.; Dan, Z.; Cong, X.; Junhua, C.; Xiuwen, L.; Zhaoju, Z.; Yuan, Z. Forest aboveground biomass estimation combining ICESat-2 and GEDI spaceborne LiDAR data. Natl. Remote Sens. Bull. 2024, 28, 1632–1647.
24. Lang, N.; Kalischek, N.; Armston, J.; Schindler, K.; Dubayah, R.; Wegner, J.D. Global canopy height regression and uncertainty estimation from GEDI LIDAR waveforms with deep ensembles. Remote Sens. Environ. 2022, 268, 112760.
25. Nandy, S.; Srinet, R.; Padalia, H. Mapping Forest Height and Aboveground Biomass by Integrating ICESat-2, Sentinel-1 and Sentinel-2 Data Using Random Forest Algorithm in Northwest Himalayan Foothills of India. Geophys. Res. Lett. 2021, 48, e2021GL093799.
26. Zhang, Q.; Ge, L.; Hensley, S.; Isabel Metternicht, G.; Liu, C.; Zhang, R. PolGAN: A deep-learning-based unsupervised forest height estimation based on the synergy of PolInSAR and LiDAR data. ISPRS J. Photogramm. Remote Sens. 2022, 186, 123–139.
27. Ma, H.; Song, J.; Wang, J.; Hua, Y. Comparison of the Inversion Ability in Extrapolating Forest Canopy Height by Integration of LiDAR Data and Different Optical Remote Sensing Products. In Proceedings of the 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, Germany, 22–27 July 2012; pp. 3363–3366.
28. Silva, C.A.; Duncanson, L.; Hancock, S.; Neuenschwander, A.; Thomas, N.; Hofton, M.; Fatoyinbo, L.; Simard, M.; Marshak, C.Z.; Armston, J.; et al. Fusing simulated GEDI, ICESat-2 and NISAR data for regional aboveground biomass mapping. Remote Sens. Environ. 2021, 253, 112234.
29. Xi, Z.; Xu, H.; Xing, Y.; Gong, W.; Chen, G.; Yang, S. Forest Canopy Height Mapping by Synergizing ICESat-2, Sentinel-1, Sentinel-2 and Topographic Information Based on Machine Learning Methods. Remote Sens. 2022, 14, 364.
30. Ghamisi, P.; Rasti, B.; Yokoya, N.; Wang, Q.; Hofle, B.; Bruzzone, L.; Bovolo, F.; Chi, M.; Anders, K.; Gloaguen, R.; et al. Multisource and Multitemporal Data Fusion in Remote Sensing: A Comprehensive Review of the State of the Art. IEEE Geosci. Remote Sens. Mag. 2019, 7, 6–39.
31. Pascual, C.; García-Abril, A.; Cohen, W.B.; Martín-Fernández, S. Relationship between LiDAR-derived forest canopy height and Landsat images. Int. J. Remote Sens. 2010, 31, 1261–1280.
32. Yan, J.; Zhou, G.; Zhou, D. Research on Forest Canopy Height Inversion from Long Time Series and Multi Source Remote Sensing Data. For. Eng. 2024, 40, 1–10.
33. Fang, K.-n. A Review of Technologies on Random Forests. Int. J. Comput. Sci. Issues 2011, 9, 272.
34. Weidong, Z.; Li, Y.; Luan, K.; Qiu, Z.; He, N.; Zhu, X.; Zou, Z. Forest Canopy Height Retrieval and Analysis Using Random Forest Model with Multi-Source Remote Sensing Integration. Sustainability 2024, 16, 1735.
35. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
36. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
37. Zhao, R.; Hu, Q.; Liu, Z.; Li, Y.; Zhang, K. A Pseudo-Waveform-Based Method for Grading ICESat-2 ATL08 Terrain Estimates in Forested Areas. Forests 2024, 15, 2113.
38. Su, Y.; Guo, Q.; Ma, Q.; Li, W. SRTM DEM Correction in Vegetated Mountain Areas through the Integration of Spaceborne LiDAR, Airborne LiDAR, and Optical Imagery. Remote Sens. 2015, 7, 11202–11225.
39. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Goulden, T. State-wide forest canopy height and aboveground biomass map for New York with 10 m resolution, integrating GEDI, Sentinel-1, and Sentinel-2 data. Ecol. Inform. 2024, 79, 102404.
40. Liu, X.; Su, Y.; Hu, T.; Yang, Q.; Liu, B.; Deng, Y.; Tang, H.; Tang, Z.; Fang, J.; Guo, Q. Neural network guided interpolation for mapping canopy height of China’s forests by integrating GEDI and ICESat-2 data. Remote Sens. Environ. 2022, 269, 112844.

Figure 1. Study Workflow Diagram.

Figure 2. Schematic map of the study area.

Figure 3. ICE-2 Footprint Distribution Map.

Figure 4. LightGBM, XGBoost, and RF Model Inversion of Canopy Height. (a–c), (d–f), (g–i) represent DC1, DC2, and DC3 scatter plots of the SL-SCRS dataset under LightGBM, XGBoost, and RF models, respectively (the red line is the fitted line).

Figure 5. Feature Importance. (a–c), (d–f), (g–i) represent the feature importance of DC1, DC2, and DC3 in the LightGBM, XGBoost, and RF models, respectively.

Figure 6. Feature Visualization Analysis. (a) SL-SCRS canopy height (b) NDVI (c) Band B11 (d) CW index (e) GLCMMean texture feature (f) VH interferogram.

Figure 7. Typical Forest Canopy Height in the Region. (a–c), (d–f), and (g–i) represent the forest heights inverted by DC4, DC5, and DC6, respectively, under the LightGBM, XGBoost, and RF models.

Figure 8. Feature Importance Heatmaps for Each Feature Set Across Different Models.

Table 1. Summary of S1 Feature Variables.

Category	Feature Name	Calculation Formula or Description
Interferometric Coherence	VV-Polarized Interferometric Coherence (coh_VV)	Interferometric coherence under VV polarization
Interferometric Coherence	VH-Polarized Interferometric Coherence (coh_VH)	Interferometric coherence under VH polarization
Backscatter Coefficient	VV-Polarized Backscatter Coefficient (Gamma 0_VV)	Radiometrically calibrated and terrain-corrected backscatter intensity
Backscatter Coefficient	VH-Polarized Backscatter Coefficient (Gamma 0_VH)	Radiometrically calibrated and terrain-corrected backscatter intensity
Derived Index	Total Scattering Intensity (Total)	Total = VV + VH
Derived Index	Polarization Ratio (Ratio)	Ratio = VV/VH
Derived Index	Radar Vegetation Index (RVI)	RVI = 4 * VH/(VV + VH)
Derived Index	Normalized Difference Index (NDI)	NDI = (VV − VH)/(VV + VH)
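
The derived indices in Table 1 can be computed directly from the two backscatter channels. The sketch below assumes that VV and VH are supplied in linear power units rather than decibels, which the table does not state; the function names are illustrative.

```python
def db_to_linear(value_db):
    """Convert a backscatter value from decibels to linear power."""
    return 10 ** (value_db / 10)


def s1_derived_indices(vv, vh):
    """Derived Sentinel-1 indices from Table 1.

    vv, vh: backscatter in linear power units (an assumption; convert
    dB values with db_to_linear first).
    """
    total = vv + vh
    if total == 0 or vh == 0:
        raise ValueError("VV + VH and VH must be non-zero")
    return {
        "Total": total,             # Total = VV + VH
        "Ratio": vv / vh,           # Ratio = VV/VH
        "RVI": 4 * vh / total,      # RVI = 4 * VH/(VV + VH)
        "NDI": (vv - vh) / total,   # NDI = (VV - VH)/(VV + VH)
    }
```

In practice the same arithmetic would be applied per pixel to the VV and VH rasters; the scalar version is shown only to make the formulas explicit.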

Table 2. S1 data acquisition time.

Data Type	Acquisition Time	Number of Documents
Sentinel-1 (SLC)	1 October 2018–30 December 2018	6
Sentinel-1 (GRD)	1 October 2018–30 December 2018	3

Table 3. Description of GLCM Texture Features.

Parameter Symbol	Name	Description
ASM	Angular second moment	Measures the uniformity of image texture
Contrast	Contrast	Measures local gray-level variation in the image
Dissimilarity	Dissimilarity	Measures local gray-level differences
Energy	Energy	Square root of the angular second moment (ASM)
Entropy	Entropy	Measures the randomness or complexity of image texture
Correlation	Correlation	Measures the linear dependence of gray values in the image
Mean	Mean	Average gray value of pixels within the window
Variance	Variance	Dispersion of pixel gray values relative to the mean within the window
Homogeneity	Homogeneity	Measures the uniformity of local variations in image texture
MAX	Maximum probability	Maximum value in the gray-level co-occurrence matrix
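
To make the descriptions in Table 3 concrete, the following sketch builds a gray-level co-occurrence matrix (GLCM) for a small quantized image and derives several of the listed features. The pixel offset, the symmetric normalization, and the specific formulas (for example, homogeneity as the inverse difference moment) are common textbook conventions assumed here; the source does not state which variants or window sizes were used.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized GLCM for one pixel offset (dx, dy).

    image:  2D list of integer gray levels in [0, levels)
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both directions (symmetric GLCM)
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image too small for the chosen offset")
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Texture features computed from a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    asm = sum(v * v for _, _, v in cells)
    return {
        "ASM": asm,
        "Energy": math.sqrt(asm),
        "Contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "Dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        "Homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "Entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "MAX": max(v for _, _, v in cells),
        "Mean": sum(i * v for i, _, v in cells),
    }
```

A perfectly uniform window gives ASM and Energy of 1 with zero Contrast and Entropy, while a window that alternates between gray levels raises Contrast and Entropy, which is why these features respond to canopy texture.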

Table 4. ICESat-2 data acquisition time.

Data Type	Acquisition Time	Number of Documents
ICESat-2 ATL08	1 October 2018–30 December 2018	4
ICESat-2 ATL03	1 October 2018–30 December 2018	4

Table 5. Accuracy Comparison of Different Data Combinations and Models.

	LightGBM			XGBoost			RF
Data Combination	$r$	$RMSE$ (m)	$MAE$ (m)	$r$	$RMSE$ (m)	$MAE$ (m)	$r$	$RMSE$ (m)	$MAE$ (m)
DC1	0.72	5.52	4.082	0.716	5.551	4.099	0.706	5.629	4.163
DC2	0.714	5.559	4.12	0.711	5.586	4.159	0.677	5.655	0.494
DC3	0.659	5.972	4.567	0.66	5.968	4.576	0.655	6.005	4.611
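
The three accuracy measures reported in Table 5 can be reproduced with a few lines of code. The sketch below assumes the standard definitions of the Pearson correlation coefficient, root mean square error, and mean absolute error; the function name and inputs are illustrative.

```python
import math


def accuracy_metrics(observed, predicted):
    """Return (r, RMSE, MAE) for paired observed and predicted heights."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Pearson correlation coefficient r
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("r is undefined when either sequence is constant")
    r = cov / math.sqrt(var_o * var_p)

    # Root mean square error and mean absolute error (same unit as the input)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    mae = sum(abs(p - o) for o, p in zip(observed, predicted)) / n
    return r, rmse, mae
```

Because RMSE squares the residuals before averaging, it is never smaller than MAE for the same predictions, which is a quick consistency check when reading the table.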

Table 6. Accuracy Comparison of Different Data Combinations and Models (Excluding ICE-2).

	LightGBM			XGBoost			RF
Data Combination	$r$	$RMSE$ (m)	$MAE$ (m)	$r$	$RMSE$ (m)	$MAE$ (m)	$r$	$RMSE$ (m)	$MAE$ (m)
DC4	0.715	5.56	4.132	0.705	5.634	4.193	0.702	5.66	4.2
DC5	0.709	5.605	4.171	0.706	5.628	4.178	0.701	5.672	0.4219
DC6	0.609	6.303	4.908	0.614	6.274	4.888	0.614	6.271	4.863


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
