Article

Improving Tropical Forest Canopy Height Mapping by Fusion of Sentinel-1/2 and Bias-Corrected ICESat-2–GEDI Data

1
College of Geography and Environment, Shandong Normal University, Jinan 250014, China
2
Key Laboratory of Comprehensive Observation of Polar Environment (Sun Yat-sen University), Ministry of Education, Zhuhai 519082, China
3
School of Geospatial Engineering and Science, Sun Yat-sen University, and Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519082, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(12), 1968; https://doi.org/10.3390/rs17121968
Submission received: 4 May 2025 / Revised: 27 May 2025 / Accepted: 4 June 2025 / Published: 6 June 2025
(This article belongs to the Special Issue Machine Learning in Global Change Ecology: Methods and Applications)

Abstract

Accurately estimating the forest canopy height is essential for quantifying forest biomass and carbon storage. Recently, the ICESat-2 and GEDI spaceborne LiDAR missions have significantly advanced global canopy height mapping. However, due to inherent sensor limitations, their footprint-level estimates often show systematic bias: canopy height in tall forests tends to be underestimated, while that in short forests is often overestimated. To address this issue, we used coincident G-LiHT airborne LiDAR measurements to correct footprint-level canopy heights from both ICESat-2 and GEDI, aiming to improve the canopy height retrieval accuracy across Puerto Rico’s tropical forests. The bias-corrected LiDAR dataset was then combined with multi-source predictors derived from Sentinel-1/2 and the 3DEP DEM. Using these inputs, we trained a canopy height inversion model based on the AutoGluon stacking ensemble method. Accuracy assessments show that, compared to models trained on uncorrected single-source LiDAR data, the new model built on the bias-corrected ICESat-2–GEDI fusion performed better in both overall accuracy and consistency across canopy height gradients. The final model achieved a correlation coefficient (R) of 0.80, with a root mean square error (RMSE) of 3.72 m and a relative RMSE of 0.22. The proposed method offers a robust and transferable approach for high-resolution canopy structure mapping and provides valuable support for carbon accounting and tropical forest management.


1. Introduction

Tropical forests have rich carbon stocks and high biodiversity and play an important role in mitigating climate change and regulating global biogeochemical cycles [1,2]. The accurate and large-scale mapping of forest canopy heights is essential for monitoring forest conditions and quantifying the forest biomass and carbon stocks [3,4,5,6,7]. However, due to the vertically stratified vegetation structure and high canopy closure typical of tropical forests, estimating the canopy height in these ecosystems remains a major challenge [8,9,10,11]. Traditional remote sensing techniques, including passive optical and radar sensors, suffer from signal saturation and a limited ability to penetrate dense canopy layers, which reduces the reliability of their structural estimates in complex tropical forests [12,13].
LiDAR (Light Detection and Ranging) technology can directly capture the three-dimensional structure of forests, providing vertical profiles of the canopy and sub-canopy layers and allowing a relatively accurate estimation of the canopy height even under dense foliage [14,15,16]. However, ground-based and airborne LiDAR systems have limited spatial coverage and are costly to operate, making them unsuitable for routine large-scale monitoring. To overcome these limitations, NASA launched two complementary spaceborne LiDAR missions in 2018: the Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) [17] and the Global Ecosystem Dynamics Investigation (GEDI) [18]. Together, they provide unprecedented opportunities for global forest canopy height mapping.
GEDI is a full-waveform, multi-beam laser altimeter installed on the International Space Station, providing more than 10 billion vertical structure waveform measurements at a footprint resolution of approximately 25 m [18]. Compared to the previous generation LiDAR system ICESat-1, GEDI offers denser observations, smaller footprints, and an improved estimation accuracy, significantly enhancing the availability of high-resolution forest structure and biomass products [19]. Unlike GEDI and ICESat-1, which both use full-waveform LiDAR technology, ICESat-2 carries the Advanced Topographic Laser Altimeter System (ATLAS), which employs a photon-counting technology characterized by a high repetition frequency, high peak power, narrow pulse width, and high sensitivity [20]. ATLAS collects denser photon cloud data with a smaller footprint size (about 11 m), enabling the detailed detection of vertical forest structures [21].
Due to the sparse sampling nature of spaceborne LiDAR systems, generating spatially continuous canopy height maps requires integrating footprint-level height observations from ICESat-2 and GEDI with auxiliary variables derived from optical and SAR imagery. These variables provide spatially exhaustive information on vegetation structures and spectral properties, enabling the extrapolation of canopy heights from discrete LiDAR footprints to continuous wall-to-wall maps [22,23]. In recent years, several studies have explored the fusion of LiDAR-derived canopy height estimates with satellite imagery to support high-resolution forest mapping [24,25,26,27,28]. Potapov et al. [24] combined GEDI data with multi-temporal Landsat-8 indicators and developed a bagged regression tree ensemble model to produce a global canopy height map at a 30 m resolution. Building on this progress, Lang et al. [25] employed a deep probabilistic model to fuse GEDI and Sentinel-2 imagery, generating a global canopy height map at a 10 m resolution, thus achieving finer spatial detail than previous products. At the national scale, Sothe et al. [26] combined GEDI and ICESat-2 with PALSAR and Sentinel imagery to map canopy heights across Canada, while Liu et al. [27] fused GEDI and ICESat-2 estimates with Sentinel-2 features to produce a 30 m canopy height map for China. These studies demonstrate the value of GEDI and ICESat-2 integration for large-area canopy height mapping.
Despite these advances, a critical limitation remains largely unaddressed in the literature: both ICESat-2 and GEDI exhibit systematic, height-dependent biases [29,30]. Specifically, both sensors tend to underestimate the canopy height in tall forests and overestimate it in short or sparse forests [30]. These biases arise from sensor design differences, footprint geometry, and return energy sensitivity and can introduce residual trends that degrade model generalization across canopy height gradients [29]. Most existing studies have not explicitly corrected for these sensor-specific biases prior to model development or data fusion, which limits the accuracy, transferability, and ecological interpretability of their results, particularly in structurally complex tropical forests [31].
To address these gaps, we developed a novel bias-corrected data fusion approach for tropical forest canopy height mapping. Our study focuses on Puerto Rico, a tropical island with diverse forest types and varying canopy structures. We first corrected footprint-level canopy height estimates from ICESat-2 and GEDI using co-located airborne LiDAR measurements from NASA’s G-LiHT campaign [32], aiming to reduce systematic biases. The corrected LiDAR observations were combined with multi-source predictors derived from Sentinel-1, Sentinel-2, and 3DEP DEM data. Using an AutoGluon stacking ensemble model [33], we generated a 10 m resolution canopy height map that captures fine-scale structural variations across Puerto Rico’s diverse forest landscapes. The validation against airborne LiDAR data shows that the bias correction significantly improves the model accuracy and reduces residual errors compared to models based on uncorrected spaceborne LiDAR data. This study provides new insights into the benefits of correcting systematic LiDAR biases prior to data fusion and demonstrates the potential of AutoML-based modeling for generating high-resolution, spatially explicit canopy height products in tropical ecosystems. Our approach is scalable and transferable, supporting broader efforts in forest monitoring, biodiversity assessment, and carbon accounting under changing climate conditions.

2. Materials and Methods

2.1. Study Area

This study focuses on Puerto Rico (Figure 1), a mountainous island situated in the eastern Caribbean between approximately 65.6–67.3°W longitude and 17.9–18.5°N latitude. The main island extends about 180 km from east to west and 65 km from north to south, encompassing a land area of roughly 9800 km². It is the third-largest island administered by the United States and is home to over three million people.
The island’s terrain is dominated by steep slopes and rugged landscapes, especially across the Cordillera Central, the main mountain range that runs east to west. The island’s highest elevation point, Cerro de Punta, rises to 1338 m above sea level. In contrast, broad alluvial plains are distributed along the northern and southern coastlines. Puerto Rico exhibits a tropical maritime climate with warm, stable temperatures throughout the year, typically ranging between 20 and 30 °C, and an annual precipitation averaging around 1400 mm [34]. The land cover across the island includes forests, shrublands, grasslands, wetlands, built-up zones, and inland water bodies. The complex forest structure and topographical variability make Puerto Rico an ideal location for evaluating remote sensing methods in tropical environments. Additionally, the availability of high-quality airborne LiDAR data provides an opportunity for the independent validation of satellite-derived canopy height products.
Ecologically, Puerto Rico is classified into six Holdridge Ecological Lifezones, determined by climatic factors such as humidity, annual precipitation, and potential evapotranspiration [35]. These include subtropical dry forest, moist forest, wet forest, lower montane wet forest, lower montane rain forest, and rain forest. Among them, dry forest, moist forest, and wet forest are the dominant types, collectively covering about 99% of the island’s surface. Dry forests, mostly in the south, are characterized by high temperatures and limited rainfall; moist forests, the most extensive zone, span much of the central and northern regions; while wet forests, typically at higher elevations or windward slopes, experience the highest precipitation and cooler temperatures. These ecological gradients contribute to diverse vegetation structures and microclimatic conditions across the island.
Figure 1. Location and Holdridge Ecological Lifezones in Puerto Rico, adapted from [35].

2.2. Data

Canopy height inversion, in this context, refers to the modeling process used to estimate spatially continuous canopy height values from footprint-level LiDAR data and multi-source remote sensing predictors. This study integrated multiple Earth observation datasets to support canopy height inversion in tropical forests (Figure 2). Spaceborne LiDAR measurements from ICESat-2 and GEDI were used to provide footprint-level canopy height observations, while high-resolution airborne LiDAR data from NASA’s G-LiHT (Goddard’s LiDAR, Hyperspectral & Thermal Imager) mission served as independent reference data for model calibration and accuracy assessment. In addition, surface reflectance and radar backscatter from Sentinel-2 and Sentinel-1, along with terrain variables derived from the USGS 3DEP digital elevation model, were used as predictor variables in model development. All datasets were carefully co-registered and temporally matched to ensure consistency in spatial resolution, geographic alignment, and seasonal representation.

2.2.1. Airborne LiDAR Reference Data (G-LiHT)

The G-LiHT airborne system integrates a dual-scanning laser altimeter, hyperspectral imagers, and thermal sensors to map ecosystem structure and function at fine spatial scales (https://glihtdata.gsfc.nasa.gov/; accessed on 24 April 2025). For LiDAR measurements, G-LiHT employs the Riegl VQ-480i scanner, operating at a near-infrared wavelength of 1550 nm with a pulse repetition frequency of 50–300 kHz. The system produces discrete-return data with a footprint diameter of approximately 0.3 m and a sampling density of up to 12 points/m². Each emitted laser pulse may return one to five echoes, allowing for detailed vertical profiling of vegetation structure. The spatial accuracy of G-LiHT data is high, with reported horizontal and vertical errors of less than 5 cm [32]. The canopy height model (CHM) is generated by subtracting the LiDAR-derived ground surface from the first return surface, resulting in a 1 m resolution product that captures fine-scale canopy structure. Compared to spaceborne LiDAR systems, such as GEDI and ICESat-2, G-LiHT offers significantly higher point densities and smaller footprint sizes, resulting in greater accuracy in both terrain and canopy height estimation [36,37].
In this study, we used G-LiHT CHM data acquired over Puerto Rico in 2017. The G-LiHT airborne LiDAR data consisted of 17 flight strips across Puerto Rico, as shown in Figure 2. The campaign covered a wide range of forest types and ecological zones across the island, including montane rain forests, dry forests, and moist forests. These data were used both to correct biases in the ICESat-2 and GEDI datasets and to validate the final model predictions. The G-LiHT CHM was used at its native resolution, and no spatial aggregation was applied to match satellite data resolution. Instead, a spatial overlay was performed to extract CHM values corresponding to each spaceborne LiDAR footprint.

2.2.2. Spaceborne LiDAR Datasets (GEDI and ICESat-2)

To obtain footprint-scale canopy height estimates across the study area, we utilized data from two spaceborne LiDAR missions: GEDI and ICESat-2 (Table 1). These datasets provide complementary measurements of forest structure, enabling detailed assessment and modeling of canopy height. The GEDI mission, launched by NASA in 2018 and mounted on the International Space Station, employs a full-waveform LiDAR system designed to measure the vertical structure of terrestrial vegetation [18]. GEDI collects waveform returns across elliptical footprints approximately 25 m in diameter, with each waveform representing the distribution of canopy and ground surfaces along a vertical profile. Relative height (RH) percentiles, ranging from RH01 to RH100, are derived from the cumulative waveform energy and represent the vertical distribution of canopy.
For this study, we used the GEDI L2A Version 2 product, accessed from the U.S. Geological Survey (USGS; https://lpdaac.usgs.gov/products/gedi02_av002/; accessed on 24 April 2025). Data preprocessing was conducted using official USGS batch tools, which support automated downloading, clipping, and filtering. To represent canopy top height, we selected the RH98 metric, which corresponds to the height at which 98% of the waveform energy is accumulated. This choice is consistent with the convention used in ICESat-2 ATL08 products [29,30,38].
To ensure data reliability, we applied a multi-step quality control process. First, all observations with invalid or missing waveforms were removed based on the quality_flag (Table 1). Then, only footprints collected using power beams and acquired during nighttime passes were retained to minimize solar contamination, as indicated by the beam_type and delta_time. Finally, GEDI data collected between May and September from 2019 to 2021 were included in the analysis. After filtering, approximately 170,000 valid GEDI canopy height footprints were retained. It is worth noting that the G-LiHT reference data were acquired in 2017, whereas the spaceborne LiDAR data used span 2019–2021, which may introduce temporal uncertainties.
ICESat-2 is a photon-counting LiDAR instrument that operates at high repetition frequency (10 kHz) and produces along-track measurements of terrain and canopy structure [17]. Each ATLAS beam has a nominal footprint size of about 11 m, and returns a sparse but highly precise distribution of individual photons, forming a vertical profile similar to a histogram [16,39]. We used the ATL08 Version 6 product, which contains segmented estimates of terrain and canopy height at nominal intervals of 20 m. This product was obtained from the National Snow and Ice Data Center (NSIDC; https://nsidc.org/data/atl08/versions/6/; accessed on 24 April 2025) and accessed via their API to enable customized spatial and temporal queries. As with GEDI, the 98th percentile canopy height (RH98) was extracted to represent the upper canopy surface. Several filtering criteria were applied to retain high-quality observations. Only strong beam data collected during nighttime were included, as indicated by the night_flag and gtx_beam variables. Segments with a signal-to-noise ratio (SNR) below 1 were excluded, and observations affected by clouds (layer_flag) or snow cover (segment_snowcover) were removed. To maintain seasonal consistency, ICESat-2 data acquired between May and September during 2019–2021 were used. The final ICESat-2 dataset contained approximately 60,000 valid footprint-level canopy height measurements.
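The quality-control rules described above for both missions can be summarized as simple record-level predicates. The following is a minimal sketch only: the dictionary layout, the `acquired`, `is_night`, `beam_strength`, `cloudy`, and `snow_covered` fields, and the string values are hypothetical simplifications of what would in practice be derived from `delta_time`, `beam_type`, `gtx_beam`, `layer_flag`, and `segment_snowcover`. It is not the authors' preprocessing code.

```python
from datetime import datetime

SEASON_MONTHS = range(5, 10)   # May through September
SEASON_YEARS = range(2019, 2022)  # 2019 through 2021


def in_season(acquired: datetime) -> bool:
    """True if the acquisition falls in May-September of 2019-2021."""
    return acquired.year in SEASON_YEARS and acquired.month in SEASON_MONTHS


def keep_gedi(shot: dict) -> bool:
    """GEDI L2A rules: valid waveform, power beam, night-time, in season."""
    return (
        shot["quality_flag"] == 1          # 1 marks a usable waveform
        and shot["beam_type"] == "power"   # hypothetical string encoding
        and shot["is_night"]               # hypothetical, derived from acquisition time
        and in_season(shot["acquired"])
    )


def keep_icesat2(segment: dict) -> bool:
    """ATL08 rules: strong beam, night-time, SNR >= 1, no cloud or snow."""
    return (
        segment["beam_strength"] == "strong"  # hypothetical, derived from the beam id
        and segment["night_flag"] == 1        # 1 marks a night-time segment
        and segment["snr"] >= 1.0             # segments with SNR below 1 are dropped
        and not segment["cloudy"]             # hypothetical, derived from layer_flag
        and not segment["snow_covered"]       # hypothetical, derived from segment_snowcover
        and in_season(segment["acquired"])
    )
```

Applying these predicates with `filter` or a list comprehension yields the retained footprints; the order of the checks does not affect the result.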

2.2.3. Ancillary Remote Sensing Predictors (Sentinel-1/2 and 3DEP DEM)

To enhance the accuracy of canopy height estimation, we integrated optical, radar, and topographic datasets as auxiliary predictors. These datasets were sourced from the Google Earth Engine (GEE) platform and were selected for their complementary characteristics in capturing forest structure and terrain variability [40,41,42,43].
Sentinel-1 provides C-band Synthetic Aperture Radar (SAR) data, which are unaffected by cloud cover and can penetrate canopy layers to some extent. This capability makes SAR a valuable complement to optical data for forest structure assessment [44,45]. We used the Ground Range Detected (GRD) product in Interferometric Wide (IW) swath mode (COPERNICUS/S1_GRD), available in dual-polarization (VV and VH) from GEE. The GRD product has undergone preprocessing steps including thermal noise removal, radiometric calibration, geometric terrain correction, and backscatter normalization to decibel (dB) scale. From this dataset, we derived VH and VV backscatter intensities and calculated the VH/VV ratio to capture structural characteristics related to vegetation volume and moisture content.
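Because the GRD backscatter values are expressed in decibels, a ratio of linear powers corresponds to a difference in dB. The short sketch below shows both forms. Whether the ratio feature in this study was computed in dB or in linear units is not stated, so both helper functions are illustrative assumptions.

```python
def db_to_linear(value_db: float) -> float:
    """Convert a backscatter value from decibels to linear power."""
    return 10.0 ** (value_db / 10.0)


def vh_vv_ratio_db(vh_db: float, vv_db: float) -> float:
    """VH/VV ratio expressed in dB (a difference of the two dB values)."""
    return vh_db - vv_db


def vh_vv_ratio_linear(vh_db: float, vv_db: float) -> float:
    """VH/VV ratio expressed in linear power units."""
    return db_to_linear(vh_db) / db_to_linear(vv_db)
```

For example, VH = −17 dB and VV = −10 dB give a ratio of −7 dB, which is roughly 0.2 in linear units.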
Sentinel-2 provides high-resolution optical imagery through its Multi-Spectral Instrument (MSI), which captures reflectance in 13 spectral bands, including four red-edge bands and one water vapor band [46]. The imagery has a swath width of 290 km and includes bands at spatial resolutions of 10, 20, and 60 m. In this study, we used the Level-2A surface reflectance product (COPERNICUS/S2_SR) from GEE, which includes geometric, radiometric, and atmospheric corrections. To minimize cloud contamination, we applied a pixel-level mask using the QA60 cloud band. From the cloud-free composites, we extracted key spectral features and vegetation indices, such as NDVI, TCG, and red-edge-based indices, which are sensitive to canopy density, greenness, and health. Here, we used annual composite mosaics of Sentinel-1 and Sentinel-2 images from 2019. The composites were generated from 29 Sentinel-1 and 105 Sentinel-2 scenes, respectively, covering the entire island (Figure 2).
Topographic information was derived from the 10 m resolution DEM provided by the USGS’s 3D Elevation Program (3DEP) [47]. This dataset integrates airborne LiDAR and other elevation sources to generate high-resolution elevation surfaces for the United States and its territories. From the DEM, we computed terrain metrics such as elevation, slope, and aspect, which influence vegetation growth patterns and forest structural development. These topographic variables are particularly important in mountainous areas like Puerto Rico, where changes in elevation strongly affect canopy height and species distribution [48].
All predictors were spatially aligned and aggregated to 10 m resolution using bilinear resampling, which estimates new pixel values based on the weighted average of the neighboring pixels to preserve spatial continuity. Predictor variables were stacked into a unified feature set and spatially joined with LiDAR response values for model training. By integrating these multi-source datasets, we aimed to capture a comprehensive set of predictors that reflect the spectral, structural, and topographic characteristics influencing forest canopy height. This fusion of data sources enhances the robustness of the canopy height inversion models, particularly in heterogeneous tropical forest environments.
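To make the resampling step concrete, the sketch below interpolates a raster at a fractional pixel position using the distance-weighted average of its four nearest neighbors. It is a plain-Python illustration of bilinear interpolation on a list-of-rows grid, not the resampling routine used in this study.

```python
def bilinear(grid, x, y):
    """Interpolate `grid` (a list of rows) at fractional column x and row y."""
    rows, cols = len(grid), len(grid[0])
    # Clamp the query point to the grid extent.
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    # Blend along x on the two bracketing rows, then blend along y.
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
```

Querying the center of a 2 × 2 grid returns the mean of its four values, which is why bilinear resampling smooths sharp edges while preserving spatial continuity.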

2.3. Method

Figure 3 outlines the methodological framework adopted in this study. We first extracted footprint-level canopy height from the ICESat-2 ATL08 and GEDI L2A datasets and performed quality filtering. Footprint-level canopy height refers to the height measured within the spatial extent of a single laser pulse, approximately 11 m for ICESat-2 and 25 m for GEDI. These height observations were spatially matched with airborne LiDAR canopy height models (CHMs) from the G-LiHT system to quantify and correct systematic biases. The bias-corrected GEDI and ICESat-2 estimates were then combined and used as reference labels for model training. To construct predictor variables, we derived a suite of spectral, textural, polarization, and terrain features from Sentinel-1, Sentinel-2, and 3DEP DEM data. These features were aligned at 10 m resolution and spatially collocated with spaceborne LiDAR footprints. We used the AutoGluon framework to implement a stacking ensemble regression model, which integrates multiple base learners and performs automated hyperparameter optimization. The final model was applied across the study area to generate a 10 m resolution canopy height map. Model accuracy was evaluated using independent airborne LiDAR measurements from the G-LiHT campaign, with residual patterns analyzed across canopy height gradients.

2.3.1. Bias Correction for Spaceborne LiDAR

Although ICESat-2 and GEDI provide valuable canopy height information, their footprint-level estimates are subject to systematic biases [29,30]. To address this issue, we performed bias correction using G-LiHT airborne LiDAR measurements as reference. Spatial overlaps between GEDI or ICESat-2 footprints and the G-LiHT CHM were identified by applying a footprint-specific buffer: a 25 m diameter circular buffer for GEDI footprints and a 20 × 11 m rectangular buffer for ICESat-2 footprints. For each overlapping footprint, we extracted the corresponding G-LiHT canopy height (RH98) values within the buffered area.
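The footprint-level extraction for GEDI can be sketched as follows. This sketch assumes that the reference RH98 is the 98th percentile of the 1 m CHM pixels whose centers fall inside the circular buffer; the text does not spell out the exact aggregation, and the function names and pixel-coordinate convention are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values inside the buffer")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def chm_rh98_in_circle(chm, cx, cy, diameter_m=25.0, pixel_m=1.0):
    """98th-percentile CHM height inside a circular footprint.

    chm is a list of rows of heights; (cx, cy) is the footprint center
    in pixel coordinates (column, row), with pixel centers at +0.5.
    """
    radius_px = diameter_m / 2.0 / pixel_m
    inside = [
        height
        for row_idx, row in enumerate(chm)
        for col_idx, height in enumerate(row)
        if math.hypot(col_idx + 0.5 - cx, row_idx + 0.5 - cy) <= radius_px
    ]
    return percentile(inside, 98)
```

The 20 × 11 m rectangular buffer for ICESat-2 segments would follow the same pattern, with the distance test replaced by a test against the rotated rectangle along the ground track.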
Bias correction models were then constructed separately for GEDI and ICESat-2 data. For each dataset, we used the raw LiDAR canopy height estimates (including RH10 to RH100 percentiles) and terrain features (elevation, slope, aspect, and hillshade) as predictor variables, with the G-LiHT RH98 values as the response variable. The use of RH10–RH100 percentiles captures the vertical structure of the forest canopy, providing profile-level information that helps distinguish between under- and overestimations across different height ranges. Terrain features such as elevation, slope, aspect, and hillshade are also included, as local topographic variation can affect LiDAR signal returns and introduce additional bias in canopy height estimation, particularly in steep landscapes [49,50]. A total of 80% of the overlapping footprints were used to train the bias correction model, and the remaining 20% were reserved for independent validation. The bias correction models were developed using the AutoGluon framework [30], which supports automated ensemble learning. A detailed explanation of AutoGluon and its stacking ensemble method is provided in Section 2.3.3. After correction, the GEDI and ICESat-2 footprints were merged into a unified training dataset for subsequent canopy height mapping.

2.3.2. Predictor Variable Construction

To support canopy height modeling, we constructed a set of predictor variables derived from Sentinel-1, Sentinel-2, and the 3DEP DEM. These variables were selected to reflect spectral, structural, and topographic characteristics known to influence forest canopy height retrieval. Spectral variables were extracted from ten Sentinel-2 bands (B2–B8, B8A, B11, and B12), which capture canopy reflectance characteristics across visible, near-infrared, and shortwave infrared regions. We also calculated NDVI, TCG [51], and four red-edge vegetation indices [52] (Formulas (1)–(6)), which are sensitive to canopy greenness, density, and biochemical properties.
NDVI = (NIR − R)/(NIR + R), (1)
TCG = −0.36 ∙ B − 0.35 ∙ G − 0.47 ∙ R + 0.66 ∙ NIR + 0.0087 ∙ SWIR1 − 0.29 ∙ SWIR2, (2)
NDVIre1 = (NIR − RedEdge1)/(NIR + RedEdge1), (3)
NDVIre2 = (NIR − RedEdge2)/(NIR + RedEdge2), (4)
NDVIre3 = (NIR − RedEdge3)/(NIR + RedEdge3), (5)
NDVIre4 = (NIR − RedEdge4)/(NIR + RedEdge4). (6)
In these formulas, B, G, R, NIR, SWIR1, SWIR2, and RedEdge1–4 represent the blue, green, red, near-infrared, shortwave infrared 1, shortwave infrared 2, and red-edge bands 1 to 4 of the Sentinel-2 imagery, respectively.
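The index formulas translate directly into per-pixel functions. The sketch below takes reflectance values as plain floats; the function and argument names are illustrative, and returning NaN for a zero denominator is a choice made here rather than something specified in the text.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning NaN when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else float("nan")


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index."""
    return normalized_difference(nir, red)


def tcg(blue, green, red, nir, swir1, swir2) -> float:
    """Tasseled cap greenness with the coefficients given in the text."""
    return (
        -0.36 * blue - 0.35 * green - 0.47 * red
        + 0.66 * nir + 0.0087 * swir1 - 0.29 * swir2
    )


def red_edge_ndvi(nir: float, red_edge: float) -> float:
    """Red-edge NDVI; call once per red-edge band (RedEdge1 to RedEdge4)."""
    return normalized_difference(nir, red_edge)
```

For a pixel with NIR = 0.5 and R = 0.1, `ndvi` evaluates to 0.4/0.6, or about 0.67.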
Polarization features included VV and VH backscatter coefficients from Sentinel-1, as well as the VV/VH ratio and their normalized difference. These features are responsive to canopy structure, volume scattering, and surface roughness. Texture metrics including entropy, contrast, and covariance were derived from the NDVIre1 layer using a gray-level co-occurrence matrix (GLCM) with a 3 × 3 kernel, capturing spatial variability in canopy patterns. Topographic variables, such as elevation, slope, aspect, and hillshade, were included to account for terrain-driven variation in vegetation distribution and sensor viewing geometry. All input layers were resampled to a 10 m spatial resolution and co-registered with the corrected LiDAR footprints to ensure consistency in model training.

2.3.3. AutoGluon-Based Model Training

To generate spatially continuous canopy height estimates across the study area, we used the AutoGluon automated machine learning framework to construct a regression model based on the bias-corrected ICESat-2 and GEDI footprint data. AutoGluon is an open-source automated machine learning (AutoML) toolkit developed by Amazon Web Services [33], designed for supervised learning tasks such as regression and classification. It streamlines the entire modeling pipeline by automating data preprocessing, model selection, hyperparameter tuning, and ensemble construction. Machine learning methods are well suited for canopy height inversion tasks, as they are capable of capturing complex and nonlinear relationships between forest structure and multi-source remote sensing predictors. In this study, the fused LiDAR-derived canopy height values served as the target variable, while the spectral, polarization, textural, and topographic features extracted from Sentinel-1, Sentinel-2, and 3DEP DEM formed the input feature set.
AutoGluon integrates a wide range of regression algorithms, including decision trees, ensemble methods, and neural networks. One of its core components is the stacking ensemble model, which combines the predictions of multiple base learners through a meta-learner to improve predictive performance and robustness and to reduce overfitting relative to a single model. In a stacking ensemble, base learners are individual models trained on the input data, while the meta-learner is a higher-level model that takes the predictions of the base learners as inputs to produce the final output. In our workflow (Figure 4), AutoGluon trained a two-layer ensemble, where the first layer consisted of base models such as k-nearest neighbors, neural networks, random forests, extremely randomized trees [53], and gradient boosting algorithms (XGBoost [54], LightGBM [55], and CatBoost [56]), and the second layer used their outputs as inputs for final prediction. This architecture has been shown to improve accuracy, particularly when integrating heterogeneous features from different sensors [30].
All features and LiDAR reference values were spatially aligned at 10 m resolution. During training, 80% of the footprint samples were used for model development, while 20% were held out for validation. Note that this data split was used solely for model training and internal validation to optimize model performance. For final accuracy assessment of canopy height maps, an independent airborne LiDAR dataset (G-LiHT) was used. We enabled AutoGluon’s multi-layer stacking and cross-validation by setting the auto_stack parameter to true and the number of bagging folds (num_bag_folds) to 5. The model was trained under the best_quality preset with a total time limit of 3600 s. Default hyperparameter ranges were used for all base models to allow AutoGluon to search for optimal configurations during training.
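The training settings described above map onto AutoGluon's tabular interface roughly as sketched below. The label column name `canopy_height`, the `train_df` table, the RMSE evaluation metric, and the wrapper function are assumptions for illustration; only the preset, stacking, bagging, and time-limit values come from the text. The import is placed inside the function so that the settings can be inspected without AutoGluon installed.

```python
# Settings reported in the text, expressed as keyword arguments for fit().
FIT_KWARGS = {
    "presets": "best_quality",
    "auto_stack": True,     # enable multi-layer stacking
    "num_bag_folds": 5,     # 5-fold bagging / cross-validation
    "time_limit": 3600,     # total training budget in seconds
}


def train_canopy_model(train_df, label="canopy_height"):
    """Fit a stacked regression ensemble on a table of predictors.

    train_df is assumed to be a pandas DataFrame with one row per
    footprint, predictor columns, and the (hypothetical) label column.
    """
    from autogluon.tabular import TabularPredictor  # lazy import

    predictor = TabularPredictor(
        label=label,
        problem_type="regression",
        eval_metric="root_mean_squared_error",  # assumed metric
    )
    return predictor.fit(train_df, **FIT_KWARGS)
```

The returned predictor can then be applied to a table of gridded predictors with `predict` to produce per-pixel height estimates.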
To improve generalization and minimize redundancy among input features, we conducted iterative feature refinement based on feature importance scores [57,58]. During the iterations, features with consistently low contributions were removed. Additionally, we evaluated multicollinearity using the Variance Inflation Factor (VIF) and excluded predictors with high collinearity [59]. This iterative optimization process ensured that the final model was both stable and interpretable. The trained model was then applied to the full set of gridded predictors across Puerto Rico to generate a 10 m resolution canopy height map.
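The multicollinearity check can be illustrated with a small, dependency-free VIF calculation. It relies on the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictor correlation matrix. This is a sketch for a handful of columns, not the authors' implementation, and the VIF cut-off used in the study is not reported.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Variance inflation factor for each predictor column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]
```

With two predictors whose correlation is 0.8, both VIF values equal 1/(1 − 0.8²), or about 2.78; uncorrelated predictors give values of 1.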

2.3.4. Model Validation and Accuracy Assessment

Model performance was evaluated by comparing the predicted canopy height values with independent reference data derived from high-resolution G-LiHT airborne LiDAR measurements. The assessment focused on evaluating both overall accuracy and residual patterns across different canopy height ranges. Five statistical metrics were used to quantify model performance: the Pearson correlation coefficient (R), root mean square error (RMSE), relative root mean square error (rRMSE), mean absolute error (MAE), and bias. R measures the strength of the linear association between predicted and reference canopy heights. RMSE indicates the average magnitude of prediction errors, while rRMSE normalizes RMSE by the mean reference height to allow for comparison across different height ranges. MAE represents the average absolute difference between predicted and observed values, providing an overall measure of prediction accuracy without considering the direction of errors. Bias measures the average signed difference between predictions and observations, indicating systematic tendencies toward overestimation or underestimation.
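The five metrics can be written out explicitly as below. The sketch assumes bias is defined as prediction minus reference, so that negative values indicate underestimation, and that rRMSE is the RMSE divided by the mean reference height, as described above. The function name is illustrative.

```python
import math


def accuracy_metrics(pred, ref):
    """Return R, RMSE, rRMSE, MAE, and bias for paired height values."""
    n = len(ref)
    mean_pred = sum(pred) / n
    mean_ref = sum(ref) / n
    diffs = [p - r for p, r in zip(pred, ref)]  # prediction minus reference

    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    cov = sum((p - mean_pred) * (r - mean_ref) for p, r in zip(pred, ref))
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    var_ref = sum((r - mean_ref) ** 2 for r in ref)

    return {
        "R": cov / math.sqrt(var_pred * var_ref),
        "RMSE": rmse,
        "rRMSE": rmse / mean_ref,
        "MAE": sum(abs(d) for d in diffs) / n,
        "bias": sum(diffs) / n,
    }
```

For predictions of 10, 20, and 30 m against references of 12, 18, and 33 m, the bias is −1 m and the MAE is 7/3 m, showing how opposite-signed errors partly cancel in the bias but not in the MAE.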
To further evaluate model behavior across the canopy height gradient, validation samples were divided into discrete height classes, and residuals were summarized within each class. This stratified analysis helped to identify potential trends of overestimation or underestimation associated with specific canopy height ranges, which is particularly relevant in structurally complex tropical forests.
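A stratified residual summary of this kind can be sketched as follows. The class boundaries are passed in by the caller because the text does not list the height classes used; the half-open interval convention and the function name are assumptions.

```python
def residuals_by_height_class(pred, ref, edges):
    """Mean residual (prediction minus reference) per reference-height class.

    edges are ascending class boundaries; each class is the half-open
    interval [lower, upper). Empty classes map to None.
    """
    classes = {(lo, hi): [] for lo, hi in zip(edges[:-1], edges[1:])}
    for p, r in zip(pred, ref):
        for (lo, hi), bucket in classes.items():
            if lo <= r < hi:
                bucket.append(p - r)
                break
    return {
        bounds: (sum(vals) / len(vals) if vals else None)
        for bounds, vals in classes.items()
    }
```

A positive mean residual in the lowest class together with a negative one in the highest class would reproduce the overestimation of short canopies and underestimation of tall canopies discussed in the Introduction.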

3. Results

3.1. Improved ICESat-2 and GEDI Canopy Height Estimation After Bias Correction

The bias correction substantially improved the consistency between spaceborne LiDAR estimates and airborne G-LiHT canopy height measurements. As shown in Figure 5, the Pearson correlation coefficient (R) increased from 0.64 to 0.82 for ICESat-2 and from 0.71 to 0.86 for GEDI after the correction. The magnitude of the mean bias decreased from −2.30 to −0.21 m for ICESat-2 and from −1.12 to −0.16 m for GEDI. In addition, the RMSE decreased from 6.14 m to 3.90 m for ICESat-2 and from 5.07 m to 3.51 m for GEDI, while the rRMSE dropped from 0.36 to 0.23 and from 0.30 to 0.21, respectively. These statistics were calculated using a withheld 20% subset of overlapping footprints with G-LiHT data, which served as an independent validation set. Together, they indicate a marked improvement in both accuracy and agreement with the airborne reference data.
The scatterplots in Figure 5 illustrate these improvements across the full range of canopy heights. Prior to the correction, both ICESat-2 and GEDI estimates show clear deviations below the 1:1 line, particularly for taller forests, reflecting a systematic underestimation trend. The uncorrected data also exhibit greater scatter and a compressed distribution, especially at the upper end of the canopy height spectrum. In contrast, the corrected estimates align more closely with the 1:1 line, and their spread better captures the variability observed in the G-LiHT measurements.
These improvements suggest that the applied bias correction models effectively reduced height-related systematic errors and improved the fidelity of canopy height estimates. This step is especially critical in tropical forests, where tall, structurally complex canopies often exacerbate retrieval biases in spaceborne LiDAR observations [30,44].
After the bias correction, ICESat-2 and GEDI canopy height estimates exhibit improved consistency and comparable levels of accuracy. This indicates that the two datasets can be reliably merged for use as training samples in canopy height modeling. Compared to relying on a single LiDAR source, this integration not only enhances the robustness of the model by incorporating more diverse sampling conditions but also significantly expands the available training dataset. This is particularly valuable in regions like Puerto Rico, where the spatial coverage of each individual mission is limited. By combining bias-corrected footprints from both sensors, we are able to construct a more representative and data-rich training set, which contributes to improved model generalization across forest types and canopy height gradients.

3.2. Model Accuracy Assessment and Canopy Height Mapping

We evaluated the performance of seven base learners and the AutoGluon stacking ensemble model trained on bias-corrected ICESat-2–GEDI fusion data using five metrics (R, RMSE, rRMSE, MAE, and bias). As shown in Table 2, the AutoGluon stacking ensemble model achieved the best overall performance, with R = 0.80, RMSE = 3.72 m, and rRMSE = 0.24. This represents a consistent improvement over the individual base models, which yielded R values ranging from 0.71 to 0.79 and RMSE values between 3.80 and 4.27 m. In addition, the stacking model reduced both the systematic bias and the absolute error, suggesting a stronger generalization capability across canopy height gradients.
Compared with previous studies using uncorrected ICESat-2 or GEDI data, our model achieved a significantly higher accuracy. This improvement stems not only from the use of bias-corrected ICESat-2–GEDI fusion data, which provide more reliable and consistent canopy height estimates, but also from the combination of diverse predictor variables and ensemble learning. As shown in Figure 6, the most important predictor was the Sentinel-2 RedEdge1 band, followed by the red-edge vegetation index (NDVIre1), elevation, and the spatial variability of NDVIre1. Previous studies have shown that red-edge bands from Sentinel-2 are particularly effective in capturing vertical vegetation structures, making them valuable for canopy height estimations [52,60]. The VH polarization from Sentinel-1 also played a key role, offering useful information on the canopy structure and moisture due to its greater sensitivity to volume scattering compared to VV. The combined contribution of spectral, polarization, and terrain features highlights the advantage of integrating different remote sensing modalities to capture the multi-dimensional heterogeneity of forest canopies, both horizontally and vertically.
The final canopy height map at 10 m resolution is displayed in Figure 7. Compared with 30 m resolution products, the 10 m map provides much greater spatial detail, enabling clearer delineation of forest patches, gaps, and transitions along ecological and topographic gradients. The zoomed-in views illustrate the ability of the model to detect fine-scale canopy height variations that would otherwise be smoothed or obscured in coarser-resolution products. This level of spatial fidelity is particularly valuable in heterogeneous tropical forests, where forest structure varies substantially across short distances due to natural disturbance, land use, and successional processes. In particular, detailed structural patterns can be observed along ridges, valley bottoms, and transitional forest zones, where elevation and moisture gradients often lead to sharp shifts in canopy height. By capturing these localized variations, the 10 m product provides a more realistic and operationally useful representation of forest structure, supporting applications such as habitat mapping, edge detection, and biomass estimation in complex terrain [23].

3.3. Residual Analysis and Comparison with Previous Studies

To better understand how our model compares with previous efforts, we analyzed prediction residuals and directly contrasted our results with those derived from single-source ICESat-2 and GEDI models reported in earlier research [30]. As shown in Figure 8, the bias-corrected ICESat-2–GEDI fusion model achieved noticeably improved performance, with a correlation coefficient of 0.80 and an RMSE of 3.72 m. In comparison, models based solely on ICESat-2 or GEDI achieved lower correlations (0.69 and 0.72, respectively) and higher RMSE values (4.99 and 4.81 m). The relative RMSE of our model (0.24) was also significantly lower than those reported previously (0.33 and 0.32), indicating a substantial reduction in the overall prediction error.
We also evaluated models trained separately on the bias-corrected ICESat-2 and GEDI datasets. The ICESat-2 model achieved an RMSE of 3.91 m and an rRMSE of 0.25, while the GEDI model performed slightly better, with an RMSE of 3.79 m and an rRMSE of 0.24. Compared to their uncorrected counterparts (RMSE of 4.99 m for ICESat-2 and 4.81 m for GEDI), these results demonstrate that the bias correction alone leads to meaningful accuracy gains. However, the fusion model combining both corrected datasets still performed the best overall (RMSE = 3.72 m and rRMSE = 0.24), suggesting that integrating complementary LiDAR observations further enhances canopy height estimations.
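The size of these gains can be made explicit with a short arithmetic sketch that uses only the RMSE values quoted above. The function and variable names are illustrative assumptions, and the calculation is a simple relative difference, not a significance test.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after` (0.25 means a 25% reduction)."""
    return (before - after) / before


# RMSE values (m) as quoted in the text.
rmse = {
    "ICESat-2": {"uncorrected": 4.99, "corrected": 3.91},
    "GEDI": {"uncorrected": 4.81, "corrected": 3.79},
}
fusion_rmse = 3.72

# Gain from the bias correction alone, per sensor.
correction_gain = {
    sensor: relative_reduction(v["uncorrected"], v["corrected"])
    for sensor, v in rmse.items()
}

# Additional gain from fusing both corrected datasets, relative to each corrected model.
fusion_gain = {
    sensor: relative_reduction(v["corrected"], fusion_rmse)
    for sensor, v in rmse.items()
}

for sensor in rmse:
    print(f"{sensor}: correction {correction_gain[sensor]:.1%}, "
          f"fusion {fusion_gain[sensor]:.1%}")
```

On these figures, the bias correction accounts for an RMSE reduction of roughly one fifth for each sensor, while fusion adds a further reduction of a few percent on top of the corrected single-sensor models.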
Figure 9 illustrates residual patterns stratified by canopy height class and highlights the improvement achieved through the bias correction and data fusion. Earlier models based on uncorrected ICESat-2 or GEDI data exhibited pronounced height-dependent biases, with overestimation in low-stature forests and underestimation in taller canopies. For canopy heights above 20 m, previous models showed median underestimations ranging from 4 to 6 m [30]. In contrast, our model presents a more balanced residual distribution, with values more symmetrically centered around zero, reduced extreme deviations, and a substantially mitigated bias across different height intervals. While minor overestimation remains below 10 m and a slight underestimation persists above 20 m, these patterns are far less pronounced than in previous studies, indicating improved consistency across the full height range. This enhanced residual stability supports more reliable canopy height predictions, particularly in structurally complex forest environments. Notably, taller canopies play a disproportionately important role in applications such as forest biomass modeling and carbon stock estimation, making the improved accuracy in this height range particularly valuable for ecological and climate-related assessments [29].
These improvements stem not only from the correction of systematic biases in the original LiDAR data, which are influenced by differences in sensor configurations, footprint geometry, and signal sensitivity, but also from the complementary characteristics of ICESat-2 and GEDI. ICESat-2 offers dense along-track sampling and a broad spatial coverage, while GEDI provides a finer vertical resolution and higher sensitivity to canopy structures. After applying the footprint-level bias correction, combining the two datasets yields a more accurate and coherent training dataset for model development.
Residual patterns also reflect the challenges of modeling canopy heights in structurally complex tropical environments. In regions with a dense understory, short-stature vegetation, or mixed-age forest stands, it remains difficult to consistently distinguish the true canopy top from lower vegetation layers, contributing to residual noise. Even so, the integration of high-resolution spectral, radar, and terrain predictors from Sentinel-2, Sentinel-1, and 3DEP DEM improves the model’s ability to capture local structural variations. As a result, our fusion-based approach substantially improves the predictive consistency and reduces the systematic errors reported in previous studies.

4. Discussion and Conclusions

This study presents a bias-corrected data fusion framework for high-resolution canopy height mapping in tropical forests, demonstrated using ICESat-2 and GEDI spaceborne LiDAR data fused with Sentinel-1/2 imagery and 3DEP terrain features. By correcting systematic biases at the footprint level using airborne G-LiHT LiDAR measurements, we significantly improved the consistency and accuracy of ICESat-2 and GEDI canopy height estimates. These corrected datasets were subsequently used to train an AutoGluon-based ensemble learning model that generated a 10 m resolution canopy height map for Puerto Rico. Compared with previous 30 m resolution products and single-sensor models [30], our approach captured finer spatial variability and produced more accurate height estimates, particularly for tall forests that are crucial to biomass and carbon assessments.
A key contribution of this study is the demonstration that a footprint-level bias correction enables ICESat-2 and GEDI to be used jointly and effectively. Prior work often used one or the other sensor, but rarely both in tandem due to their differing sampling characteristics and systematic errors. By aligning their estimates with G-LiHT reference data, we reduced these discrepancies and generated a larger, more coherent training dataset. This expanded sample base allowed for a more robust learning process, enhancing the generalization capacity of the model across heterogeneous landscapes [31].
Our findings also reinforce the value of integrating multi-source remote sensing predictors for forest structural modeling. The AutoGluon stacking model highlighted the importance of Sentinel-2 red-edge bands and derived vegetation indices, which are known to correlate with the canopy density and foliage biomass [52]. Terrain features such as elevation and slope further improved the model performance by helping account for orographic and ecological gradients. The inclusion of the Sentinel-1 VH backscatter added a complementary structural sensitivity, particularly in densely vegetated or cloud-prone areas where optical imagery may be limited.
Despite these advances, several limitations warrant discussion. First, the spatial extent of airborne LiDAR reference data remains limited. While the G-LiHT dataset provided critical calibration and validation support, its coverage does not span the full range of forest types in Puerto Rico, potentially limiting the model’s transferability to underrepresented regions. Second, our model relies on cloud-free Sentinel-2 composites, which can be challenging to acquire consistently in tropical regions with high cloud persistence [61]. While temporal compositing and QA filtering can reduce noise, data gaps and misclassifications remain a concern.
Furthermore, although the bias correction step substantially reduced systematic errors, some residual height-dependent bias persists, especially for the tallest canopy classes. This may reflect limitations in spaceborne waveform and photon-counting technologies, particularly their reduced sensitivity to the uppermost canopy layers in dense forests [36]. Future efforts could explore alternative correction techniques, such as uncertainty modeling [25], to further refine predictions.
Looking ahead, the framework developed here is scalable and transferable to other tropical or subtropical forest regions with available spaceborne and airborne datasets. As more airborne campaigns and LiDAR missions become available, they can further expand the calibration base and facilitate improved global-scale mapping. Additionally, integrating ecosystem functional parameters, such as the leaf area index, canopy closure, or photosynthetic capacity, may enrich future canopy height modeling efforts [62]. With the increasing attention on forest-based climate mitigation strategies, high-resolution and accurate canopy structure maps, such as the one produced in this study, can play a key role in carbon accounting, biodiversity monitoring, and forest management planning.

Author Contributions

Conceptualization, A.L. and Y.C.; methodology, A.L.; software, A.L.; validation, A.L. and Y.C.; formal analysis, A.L., Y.C. and X.C.; data curation, A.L.; writing—original draft preparation, A.L.; writing—review and editing, A.L., Y.C. and X.C.; visualization, A.L.; supervision, Y.C.; project administration, Y.C.; funding acquisition, A.L. and Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 42306254), the Young Taishan Scholars Program of Shandong Province (Grant No. tsqn202408142), and the Natural Science Foundation of Shandong Province, China (Grant No. ZR2023QD022 and ZR2024QD110).

Data Availability Statement

The data can be sourced from the following providers: Sentinel-2 Surface Reflectance data, the Sentinel-1 Ground Range Detected backscatter product, and the 3DEP digital elevation data are available through Google Earth Engine (https://developers.google.com/earth-engine/datasets/; accessed on 24 April 2025). The G-LiHT Airborne Lidar Product is accessible at https://glihtdata.gsfc.nasa.gov/ (accessed on 24 April 2025). The GEDI L2A Product can be obtained from https://lpdaac.usgs.gov/products/gedi02_av002/ (accessed on 24 April 2025), and the ICESat-2 ATL08 Product is available at https://nsidc.org/data/atl08/versions/6/ (accessed on 24 April 2025).

Acknowledgments

We would like to express our gratitude to the teams behind the Google Earth Engine platform for providing access to essential remote sensing datasets. We also extend our appreciation to the G-LiHT team for providing the airborne lidar data used in this study. Special thanks to the GEDI and ICESat-2 science teams for making their spaceborne lidar products accessible. Their contributions have been invaluable to this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Doughty, C.E.; Keany, J.M.; Wiebe, B.C.; Rey-Sanchez, C.; Carter, K.R.; Middleby, K.B.; Cheesman, A.W.; Goulden, M.L.; da Rocha, H.R.; Miller, S.D.; et al. Tropical Forests Are Approaching Critical Temperature Thresholds. Nature 2023, 621, 105–111.
  2. Roberts, P.; Hamilton, R.; Piperno, D.R. Tropical Forests as Key Sites of the “Anthropocene”: Past and Present Perspectives. Proc. Natl. Acad. Sci. USA 2021, 118, e2109243118.
  3. Pugh, T.A.M.; Lindeskog, M.; Smith, B.; Poulter, B.; Arneth, A.; Haverd, V.; Calle, L. Role of Forest Regrowth in Global Carbon Sink Dynamics. Proc. Natl. Acad. Sci. USA 2019, 116, 4382–4387.
  4. Harris, N.L.; Gibbs, D.A.; Baccini, A.; Birdsey, R.A.; de Bruin, S.; Farina, M.; Fatoyinbo, L.; Hansen, M.C.; Herold, M.; Houghton, R.A.; et al. Global Maps of Twenty-First Century Forest Carbon Fluxes. Nat. Clim. Change 2021, 11, 234–240.
  5. Wang, M.; Sun, R.; Xiao, Z. Estimation of Forest Canopy Height and Aboveground Biomass from Spaceborne LiDAR and Landsat Imageries in Maryland. Remote Sens. 2018, 10, 344.
  6. Nandy, S.; Srinet, R.; Padalia, H. Mapping Forest Height and Aboveground Biomass by Integrating ICESat-2, Sentinel-1 and Sentinel-2 Data Using Random Forest Algorithm in Northwest Himalayan Foothills of India. Geophys. Res. Lett. 2021, 48, e2021GL093799.
  7. Li, H.; Hiroshima, T.; Li, X.; Hayashi, M.; Kato, T. High-Resolution Mapping of Forest Structure and Carbon Stock Using Multi-Source Remote Sensing Data in Japan. Remote Sens. Environ. 2024, 312, 114322.
  8. Adrah, E.; Wan Mohd Jaafar, W.S.; Omar, H.; Bajaj, S.; Leite, R.V.; Mazlan, S.M.; Silva, C.A.; Chel Gee Ooi, M.; Mohd Said, M.N.; Abdul Maulud, K.N.; et al. Analyzing Canopy Height Patterns and Environmental Landscape Drivers in Tropical Forests Using NASA’s GEDI Spaceborne LiDAR. Remote Sens. 2022, 14, 3172.
  9. Lahssini, K.; Baghdadi, N.; le Maire, G.; Fayad, I. Influence of GEDI Acquisition and Processing Parameters on Canopy Height Estimates over Tropical Forests. Remote Sens. 2022, 14, 6264.
  10. Pourshamsi, M.; Xia, J.; Yokoya, N.; Garcia, M.; Lavalle, M.; Pottier, E.; Balzter, H. Tropical Forest Canopy Height Estimation from Combined Polarimetric SAR and LiDAR Using Machine-Learning. ISPRS J. Photogramm. Remote Sens. 2021, 172, 79–94.
  11. Roy, D.P.; Kashongwe, H.B.; Armston, J. The Impact of Geolocation Uncertainty on GEDI Tropical Forest Canopy Height Estimation and Change Monitoring. Sci. Remote Sens. 2021, 4, 100024.
  12. Healey, S.P.; Yang, Z.; Gorelick, N.; Ilyushchenko, S. Highly Local Model Calibration with a New GEDI LiDAR Asset on Google Earth Engine Reduces Landsat Forest Height Signal Saturation. Remote Sens. 2020, 12, 2840.
  13. Joshi, N.; Mitchard, E.T.A.; Brolly, M.; Schumacher, J.; Fernández-Landa, A.; Johannsen, V.K.; Marchamalo, M.; Fensholt, R. Understanding ‘Saturation’ of Radar Signals over Forests. Sci. Rep. 2017, 7, 3505.
  14. Coops, N.C.; Tompalski, P.; Goodbody, T.R.H.; Queinnec, M.; Luther, J.E.; Bolton, D.K.; White, J.C.; Wulder, M.A.; van Lier, O.R.; Hermosilla, T. Modelling Lidar-Derived Estimates of Forest Attributes over Space and Time: A Review of Approaches and Future Trends. Remote Sens. Environ. 2021, 260, 112477.
  15. Bolton, D.K.; Tompalski, P.; Coops, N.C.; White, J.C.; Wulder, M.A.; Hermosilla, T.; Queinnec, M.; Luther, J.E.; van Lier, O.R.; Fournier, R.A.; et al. Optimizing Landsat Time Series Length for Regional Mapping of Lidar-Derived Forest Structure. Remote Sens. Environ. 2020, 239, 111645.
  16. Neuenschwander, A.L.; Magruder, L.A. Canopy and Terrain Height Retrievals with ICESat-2: A First Look. Remote Sens. 2019, 11, 1721.
  17. Markus, T.; Neumann, T.; Martino, A.; Abdalati, W.; Brunt, K.; Csatho, B.; Farrell, S.; Fricker, H.; Gardner, A.; Harding, D.; et al. The Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2): Science Requirements, Concept, and Implementation. Remote Sens. Environ. 2017, 190, 260–273.
  18. Dubayah, R.; Blair, J.B.; Goetz, S.; Fatoyinbo, L.; Hansen, M.; Healey, S.; Hofton, M.; Hurtt, G.; Kellner, J.; Luthcke, S.; et al. The Global Ecosystem Dynamics Investigation: High-Resolution Laser Ranging of the Earth’s Forests and Topography. Sci. Remote Sens. 2020, 1, 100002.
  19. Schneider, F.D.; Ferraz, A.; Hancock, S.; Duncanson, L.I.; Dubayah, R.O.; Pavlick, R.P.; Schimel, D.S. Towards Mapping the Diversity of Canopy Structure from Space with GEDI. Environ. Res. Lett. 2020, 15, 115006.
  20. Magruder, L.A.; Brunt, K.M. Performance Analysis of Airborne Photon-Counting Lidar Data in Preparation for the ICESat-2 Mission. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2911–2918.
  21. Magruder, L.A.; Brunt, K.M.; Alonzo, M. Early ICESat-2 on-Orbit Geolocation Validation Using Ground-Based Corner Cube Retro-Reflectors. Remote Sens. 2020, 12, 3653.
  22. Torresani, M.; Rocchini, D.; Alberti, A.; Moudrý, V.; Heym, M.; Thouverai, E.; Kacic, P.; Tomelleri, E. LiDAR GEDI Derived Tree Canopy Height Heterogeneity Reveals Patterns of Biodiversity in Forest Ecosystems. Ecol. Inform. 2023, 76, 102082.
  23. Lang, N.; Jetz, W.; Schindler, K.; Wegner, J.D. A High-Resolution Canopy Height Model of the Earth. Nat. Ecol. Evol. 2023, 7, 1778–1789.
  24. Potapov, P.; Li, X.; Hernandez-Serna, A.; Tyukavina, A.; Hansen, M.C.; Kommareddy, A.; Pickens, A.; Turubanova, S.; Tang, H.; Silva, C.E.; et al. Mapping Global Forest Canopy Height through Integration of GEDI and Landsat Data. Remote Sens. Environ. 2021, 253, 112165.
  25. Lang, N.; Kalischek, N.; Armston, J.; Schindler, K.; Dubayah, R.; Wegner, J.D. Global Canopy Height Regression and Uncertainty Estimation from GEDI LIDAR Waveforms with Deep Ensembles. Remote Sens. Environ. 2022, 268, 112760.
  26. Sothe, C.; Gonsamo, A.; Lourenço, R.B.; Kurz, W.A.; Snider, J. Spatially Continuous Mapping of Forest Canopy Height in Canada by Combining GEDI and ICESat-2 with PALSAR and Sentinel. Remote Sens. 2022, 14, 5158.
  27. Liu, X.; Su, Y.; Hu, T.; Yang, Q.; Liu, B.; Deng, Y.; Tang, H.; Tang, Z.; Fang, J.; Guo, Q. Neural Network Guided Interpolation for Mapping Canopy Height of China’s Forests by Integrating GEDI and ICESat-2 Data. Remote Sens. Environ. 2022, 269, 112844.
  28. Ghosh, S.M.; Behera, M.D.; Kumar, S.; Das, P.; Prakash, A.J.; Bhaskaran, P.K.; Roy, P.S.; Barik, S.K.; Jeganathan, C.; Srivastava, P.K.; et al. Predicting the Forest Canopy Height from LiDAR and Multi-Sensor Data Using Machine Learning over India. Remote Sens. 2022, 14, 5968.
  29. Liu, A.; Cheng, X.; Chen, Z. Performance Evaluation of GEDI and ICESat-2 Laser Altimeter Data for Terrain and Canopy Height Retrievals. Remote Sens. Environ. 2021, 264, 112571.
  30. Liu, A.; Chen, Y.; Cheng, X. Evaluating ICESat-2 and GEDI with Integrated Landsat-8 and PALSAR-2 for Mapping Tropical Forest Canopy Height. Remote Sens. 2024, 16, 3798.
  31. Zhu, X.; Nie, S.; Wang, C.; Xi, X.; Lao, J.; Li, D. Consistency Analysis of Forest Height Retrievals between GEDI and ICESat-2. Remote Sens. Environ. 2022, 281, 113244.
  32. Cook, B.D.; Corp, L.A.; Nelson, R.F.; Middleton, E.M.; Morton, D.C.; McCorkel, J.T.; Masek, J.G.; Ranson, K.J.; Ly, V.; Montesano, P.M. NASA Goddard’s LiDAR, Hyperspectral and Thermal (G-LiHT) Airborne Imager. Remote Sens. 2013, 5, 4045–4066.
  33. Erickson, N.; Mueller, J.; Shirkov, A.; Zhang, H.; Larroy, P.; Li, M.; Smola, A. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. Available online: https://arxiv.org/abs/2003.06505v1 (accessed on 7 September 2024).
  34. Daly, C.; Helmer, E.H.; Quiñones, M. Mapping the Climate of Puerto Rico, Vieques and Culebra. Int. J. Climatol. 2003, 23, 1359–1381.
  35. Torres-Valcárcel, Á.; Harbor, J.; González-Avilés, C.; Torres-Valcárcel, A. Impacts of Urban Development on Precipitation in the Tropical Maritime Climate of Puerto Rico. Climate 2014, 2, 47–77.
  36. Neuenschwander, A.L.; Magruder, L.A. The Potential Impact of Vertical Sampling Uncertainty on ICESat-2/ATLAS Terrain and Canopy Height Retrievals for Multiple Ecosystems. Remote Sens. 2016, 8, 1039.
  37. Yu, Q.; Ryan, M.G.; Ji, W.; Prihodko, L.; Anchang, J.Y.; Kahiu, N.; Nazir, A.; Dai, J.; Hanan, N.P. Assessing Canopy Height Measurements from ICESat-2 and GEDI Orbiting LiDAR across Six Different Biomes with G-LiHT LiDAR. Environ. Res. Ecol. 2024, 3, 025001.
  38. Li, W.; Niu, Z.; Shang, R.; Qin, Y.; Wang, L.; Chen, H. High-Resolution Mapping of Forest Canopy Height Using Machine Learning by Coupling ICESat-2 LiDAR with Sentinel-1, Sentinel-2 and Landsat-8 Data. Int. J. Appl. Earth Obs. Geoinf. 2020, 92, 102163.
  39. Neuenschwander, A.; Pitts, K. The ATL08 Land and Vegetation Product for the ICESat-2 Mission. Remote Sens. Environ. 2019, 221, 247–259.
  40. Liu, A.; Wu, Q.; Cheng, X. Using the Google Earth Engine to Estimate a 10 m Resolution Monthly Inventory of Soil Fugitive Dust Emissions in Beijing, China. Sci. Total Environ. 2020, 735, 139174.
  41. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-Scale Geospatial Analysis for Everyone. Remote Sens. Environ. 2017, 202, 18–27.
  42. Torres de Almeida, C.; Gerente, J.; Rodrigo dos Prazeres Campos, J.; Caruso Gomes Junior, F.; Providelo, L.A.; Marchiori, G.; Chen, X. Canopy Height Mapping by Sentinel 1 and 2 Satellite Images, Airborne LiDAR Data, and Machine Learning. Remote Sens. 2022, 14, 4112.
  43. Xi, Z.; Xu, H.; Xing, Y.; Gong, W.; Chen, G.; Yang, S. Forest Canopy Height Mapping by Synergizing ICESat-2, Sentinel-1, Sentinel-2 and Topographic Information Based on Machine Learning Methods. Remote Sens. 2022, 14, 364.
  44. Kacic, P.; Hirner, A.; Da Ponte, E. Fusing Sentinel-1 and -2 to Model GEDI-Derived Vegetation Structure Characteristics in GEE for the Paraguayan Chaco. Remote Sens. 2021, 13, 5105.
  45. Bruggisser, M.; Dorigo, W.; Dostálová, A.; Hollaus, M.; Navacchi, C.; Schlaffer, S.; Pfeifer, N. Potential of Sentinel-1 C-Band Time Series to Derive Structural Parameters of Temperate Deciduous Forests. Remote Sens. 2021, 13, 798.
  46. Radeloff, V.C.; Roy, D.P.; Wulder, M.A.; Anderson, M.; Cook, B.; Crawford, C.J.; Friedl, M.; Gao, F.; Gorelick, N.; Hansen, M.; et al. Need and Vision for Global Medium-Resolution Landsat and Sentinel-2 Data Products. Remote Sens. Environ. 2024, 300, 113918.
  47. Stoker, J.; Miller, B. The Accuracy and Consistency of 3D Elevation Program Data: A Systematic Analysis. Remote Sens. 2022, 14, 940.
  48. Lugo, A.E.; Helmer, E. Emerging Forests on Abandoned Land: Puerto Rico’s New Forests. For. Ecol. Manag. 2004, 190, 145–161.
  49. Tang, H.; Huang, H.; Zheng, Y.; Qin, P.; Xu, Y.; Ding, S. Improved GEDI Canopy Height Extraction Based on a Simulated Ground Echo in Topographically Undulating Areas. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5705915.
  50. Rahman, M.F.; Onoda, Y.; Kitajima, K. Forest Canopy Height Variation in Relation to Topography and Forest Types in Central Japan with LiDAR. For. Ecol. Manag. 2022, 503, 119792.
  51. Shi, T.; Xu, H. Derivation of Tasseled Cap Transformation Coefficients for Sentinel-2 MSI At-Sensor Reflectance Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4038–4048.
  52. Sun, Y.; Qin, Q.; Ren, H.; Zhang, T.; Chen, S. Red-Edge Band Vegetation Indices for Leaf Area Index Estimation from Sentinel-2/MSI Imagery. IEEE Trans. Geosci. Remote Sens. 2020, 58, 826–840.
  53. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely Randomized Trees. Mach. Learn. 2006, 63, 3–42.
  54. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
  55. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems; Curran Associates Inc.: New York, NY, USA, 2017; Volume 30.
  56. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. In Advances in Neural Information Processing Systems; Curran Associates Inc.: New York, NY, USA, 2018; Volume 31.
  57. Chen, Y.; Cheng, X.; Liu, A.; Chen, Q.; Wang, C. Tracking Lake Drainage Events and Drained Lake Basin Vegetation Dynamics across the Arctic. Nat. Commun. 2023, 14, 7359.
  58. Liu, A.; Chen, Y.; Cheng, X. Monitoring Thermokarst Lake Drainage Dynamics in Northeast Siberian Coastal Tundra. Remote Sens. 2023, 15, 4396.
  59. Liu, A.; Chen, Y.; Cheng, X. Effects of Thermokarst Lake Drainage on Localized Vegetation Greening in the Yamal–Gydan Tundra Ecoregion. Remote Sens. 2023, 15, 4561.
  60. Lang, N.; Schindler, K.; Wegner, J.D. Country-Wide High-Resolution Vegetation Height Mapping with Sentinel-2. Remote Sens. Environ. 2019, 233, 111347.
  61. Nazarova, T.; Martin, P.; Giuliani, G. Monitoring Vegetation Change in the Presence of High Cloud Cover with Sentinel-2 in a Lowland Tropical Forest Region in Brazil. Remote Sens. 2020, 12, 1829.
  62. Wang, Y.; Fang, H. Estimation of LAI with the LiDAR Technology: A Review. Remote Sens. 2020, 12, 3457.
Figure 2. Remote sensing data sources used for canopy height mapping over Puerto Rico. (a) The digital elevation model overlaid with the spatial footprints of GEDI, ICESat-2, and G-LiHT; (b) the Sentinel-1 image (VH polarized backscatter composite); and (c) the Sentinel-2 image (true-color composite).
Figure 2. Remote sensing data sources used for canopy height mapping over Puerto Rico. (a) The digital elevation model overlaid with the spatial footprints of GEDI, ICESat-2, and G-LiHT; (b) the Sentinel-1 image (VH polarized backscatter composite); and (c) the Sentinel-2 image (true-color composite).
Remotesensing 17 01968 g002
Figure 3. Multi-source data integration and modeling framework for 10 m canopy height mapping in tropical forests.
Figure 3. Multi-source data integration and modeling framework for 10 m canopy height mapping in tropical forests.
Remotesensing 17 01968 g003
Figure 4. Workflow of ensemble modeling for forest canopy height estimation using AutoGluon.
Figure 5. Validation results showing comparisons of ICESat-2 and GEDI canopy height estimates against airborne G-LiHT measurements, before and after the bias correction. Panels (a,c) show uncorrected ICESat-2 and GEDI estimates; panels (b,d) show the corresponding bias-corrected results. Accuracy metrics (R, Bias, RMSE, and rRMSE) are reported for each panel.
Figure 6. Key predictor variables for canopy height estimation, derived from Sentinel-1/2 and 3DEP DEM data.
Figure 7. Examples of the 10 m resolution canopy height map for Puerto Rico. Multiple zoomed-in regions illustrate improved spatial detail compared to the 30 m resolution across ridges, valleys, and fragmented lowland forests.
Figure 8. Predicted vs. observed canopy height based on independent validation using G-LiHT data. (a) Our model using bias-corrected ICESat-2–GEDI fusion data; (b) uncorrected ICESat-2 model; and (c) uncorrected GEDI model from Liu et al. [30]. Panels (d–f) show the corresponding residual distributions. Here, N indicates the number of validation samples. Differences in N reflect the difference in spatial resolution between our 10 m product and the 30 m products from Liu et al. [30].
Figure 9. Residuals of canopy height predictions across height bins for (a) our bias-corrected ICESat-2–GEDI fusion model and uncorrected (b) ICESat-2 and (c) GEDI models from Liu et al. [30].
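The per-bin residual summary behind a plot like Figure 9 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `residuals_by_height_bin`, the 5 m bin width, and the convention that a residual is predicted minus observed height are all assumptions made for the example.

```python
from collections import defaultdict


def residuals_by_height_bin(observed, predicted, bin_width=5.0):
    """Group residuals (predicted - observed) by observed-height bin.

    bin_width is a hypothetical choice; the bins used in the paper
    are not stated in this section.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")

    bins = defaultdict(list)
    for obs, pred in zip(observed, predicted):
        # Lower edge of the bin that contains the observed height.
        lower_edge = int(obs // bin_width) * bin_width
        bins[lower_edge].append(pred - obs)

    # Summarise each bin by sample count and mean residual.
    return {
        lower_edge: {
            "n": len(values),
            "mean_residual": sum(values) / len(values),
        }
        for lower_edge, values in sorted(bins.items())
    }


if __name__ == "__main__":
    observed_m = [3.0, 4.0, 12.0, 27.0]
    predicted_m = [5.0, 6.0, 11.0, 22.0]
    for edge, stats in residuals_by_height_bin(observed_m, predicted_m).items():
        print(f"{edge:5.1f} m bin: n={stats['n']}, mean residual={stats['mean_residual']:+.2f} m")
```

Under this sign convention, a positive mean residual in the short bins and a negative one in the tall bins would correspond to the overestimation of short forests and underestimation of tall forests described in the abstract.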
Table 1. Key parameters extracted from the GEDI L2A and ICESat-2 ATL08 products.

| Data Source | Variable Name | Description |
|---|---|---|
| GEDI L2A | lon_lowestmode | Longitude of the footprint center |
| GEDI L2A | lat_lowestmode | Latitude of the footprint center |
| GEDI L2A | RH 1-100 | Canopy height percentiles derived from waveform inversion |
| GEDI L2A | quality_flag | Flag used for quality assessment |
| GEDI L2A | delta_time | Used to determine whether the data were acquired during day/night |
| GEDI L2A | beam_flag | Used to identify whether the beam is a coverage or power beam |
| GEDI L2A | Sensitivity | Signal-to-noise ratio measure related to canopy cover |
| ICESat-2 ATL08 | longitude | Longitude of the segment center |
| ICESat-2 ATL08 | latitude | Latitude of the segment center |
| ICESat-2 ATL08 | canopy_h_metrics | Relative (RH##) canopy height metrics from RH10 to RH95 |
| ICESat-2 ATL08 | h_canopy | RH98 canopy height within the segment |
| ICESat-2 ATL08 | h_max_canopy | RH100 percentile canopy height within the segment |
| ICESat-2 ATL08 | n_seg_ph | Number of ground and canopy photons detected in the segment |
| ICESat-2 ATL08 | SNR | Ratio of signal photons to noise photons |
| ICESat-2 ATL08 | ground_track_flag | Identifies beam strength based on orbital information |
| ICESat-2 ATL08 | night_flag | Indicates whether data were acquired during day or night |
| ICESat-2 ATL08 | layer_flag | Indicates whether data were affected by cloud cover |
| ICESat-2 ATL08 | segment_snowcover | Indicates whether data were affected by snow cover |
Table 2. An accuracy comparison of base learners and the AutoGluon stacking ensemble model for canopy height inversion using bias-corrected ICESat-2–GEDI fusion data, validated against independent G-LiHT airborne LiDAR measurements.

| Models | R | Bias (m) | MAE (m) | RMSE (m) | rRMSE |
|---|---|---|---|---|---|
| K-nearest neighbors | 0.71 | −0.36 | 3.16 | 4.27 | 0.28 |
| Neural networks | 0.78 | 0.16 | 2.85 | 3.84 | 0.25 |
| Random forests | 0.78 | 0.23 | 2.85 | 3.84 | 0.25 |
| Extremely randomized trees | 0.78 | 0.26 | 2.88 | 3.82 | 0.25 |
| XGBoost | 0.78 | 0.32 | 2.84 | 3.83 | 0.25 |
| LightGBM | 0.79 | 0.26 | 2.82 | 3.82 | 0.25 |
| CatBoost | 0.79 | 0.40 | 2.84 | 3.80 | 0.25 |
| AutoGluon stacking ensemble | 0.80 | 0.22 | 2.77 | 3.72 | 0.24 |
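For readers who want to reproduce the kind of metrics listed in Table 2, the sketch below computes R, bias, MAE, RMSE, and rRMSE from paired observed and predicted heights. It rests on two assumptions not confirmed in this section: bias is taken as the mean of predicted minus observed, and rRMSE is the RMSE divided by the mean observed height. The function name `accuracy_metrics` is hypothetical.

```python
import math


def accuracy_metrics(observed, predicted):
    """Return R, Bias, MAE, RMSE, and rRMSE for paired height samples.

    Assumptions (not confirmed by the source): bias = mean(predicted - observed)
    and rRMSE = RMSE / mean(observed).
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    residuals = [p - o for o, p in zip(observed, predicted)]

    bias = sum(residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)

    # Pearson correlation coefficient between observed and predicted values.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r = cov / math.sqrt(var_obs * var_pred)

    return {"R": r, "Bias": bias, "MAE": mae, "RMSE": rmse, "rRMSE": rmse / mean_obs}


if __name__ == "__main__":
    # Toy values in metres, for illustration only (not data from the paper).
    metrics = accuracy_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With the predicted-minus-observed convention, a positive bias means the model overestimates canopy height on average.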
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Liu, A.; Chen, Y.; Cheng, X. Improving Tropical Forest Canopy Height Mapping by Fusion of Sentinel-1/2 and Bias-Corrected ICESat-2–GEDI Data. Remote Sens. 2025, 17, 1968. https://doi.org/10.3390/rs17121968