High-Resolution Canopy Height Mapping: Integrating NASA’s Global Ecosystem Dynamics Investigation (GEDI) with Multi-Source Remote Sensing Data

: Accurate structural information about forests, including canopy heights and diameters, is crucial for quantifying tree volume, biomass, and carbon stocks, enabling effective forest ecosystem management, particularly in response to changing environmental conditions. Since late 2018, NASA’s Global Ecosystem Dynamics Investigation (GEDI) mission has monitored global canopy structure using a satellite Light Detection and Ranging (LiDAR) instrument. While GEDI has collected billions of LiDAR shots across a near-global range (between 51.6 ◦ N and >51.6 ◦ S), their spatial distribution remains dispersed, posing challenges for achieving complete forest coverage. This study proposes and evaluates an approach that generates high-resolution canopy height maps by integrating GEDI data with Sentinel-1, Sentinel-2, and topographical ancillary data through three machine learning (ML) algorithms: random forests (RF), gradient tree boost (GB), and classification and regression trees (CART). To achieve this, the secondary aims included the following: (1) to assess the performance of three ML algorithms, RF, GB, and CART, in predicting canopy heights, (2) to evaluate the performance of our canopy height maps using reference canopy height from canopy height models (CHMs), and (3) to compare our canopy height maps with other two existing canopy height maps. RF and GB were the top-performing algorithms, achieving the best 13.32% and 16% root mean squared error for broadleaf and coniferous forests, respectively. Validation of the proposed approach revealed that the 100th and 98th percentile, followed by the average of the 75th, 90th, 95th, and 100th percentiles (AVG), were the most accurate GEDI metrics for predicting real canopy heights. Comparisons between predicted and reference CHMs demonstrated accurate predictions for coniferous stands (R-squared = 0.45, RMSE = 29.16%).


Introduction
Forests play a vital role in regulating the carbon and water cycles, supporting biodiversity and providing economic benefits to society [1].However, the stability of forests, and the ecosystem-services they provide, are increasingly threatened by anthropogenic change.Consequently, regular monitoring across large spatial scales is needed for effective forest conservation and management [2].Forests have been recognised as an important nature-based solution to climate change, as they remove CO 2 from the atmosphere and store it as biomass.This has led to increased efforts to quantify global forest biomass and carbon stocks.In particular, remote sensing (RS) technologies have been used to assess canopy height, a key variable for estimating above-ground biomass and ultimately carbon stocks, as well as for identifying ecosystems services [3][4][5][6].Satellite-based measurements of canopy height are now available on an unprecedented global scale [7][8][9][10]; however, there remain certain limitations that hinder the use of these data for localised forest studies.
LiDAR (Light Detection and Ranging) is one of the most useful remote sensing tools for quantifying forest structure.Different LiDAR instruments have different spatial resolutions depending on the distance from the sensor to the object, from terrestrial (millimetric) to airborne (centimetric) to satellite LiDAR systems (metric) [11,12].Terrestrial LiDAR is by far the most accurate system for capturing forest structure, but the cost and time required to complete a survey means that these systems are rarely used [11].However, airborne laser scanning (ALS) is a more cost-effective option, capable of surveying forest stands in a relatively short amount of time [11,13,14].Consequently, ALS is one of the most frequently used LiDAR systems for field-based forest surveys in most EU countries.Finally, satellite LiDAR monitors changes in forest structure on a near-global scale.The Global Ecosystem Dynamics Investigation (GEDI) NASA mission characterises the structure of Earth's forests using a LiDAR instrument onboard the International Space Station (between 51.6 • N and S) [15].Since its launch, GEDI's satellite products have been widely adopted for forest-related research [16][17][18].For instance, GEDI's products have been used to create forest composition maps [19], quantify carbon stocks [16], investigate the relationship between biodiversity and forest structure [20], and detect human logging activities, forest disturbance, and changes in forest structure [21][22][23].
The GEDI mission has improved our ability to understand and monitor changes in forest structure, but there are still a number of caveats to the data.The Level-2A Geolocated Elevation and Height Metrics GEDI Product (GEDI02_A) consists of a hundred relative height (Rh) metrics with a 25 m pixel spatial resolution (average footprint size) [15] (https://gedi.umd.edu/data/products/;accessed on 11 September 2023).This large pixel size can be problematic for stand-level forest monitoring, since a small stand might only have a few pixels covering the area.Furthermore, relative height averaged within a 25 m pixel loses information on structural heterogeneity.Another key caveat for using GEDI data is that they are discontinuous, resulting in large quantities of missing data.This is especially problematic for estimating above ground biomass and carbon stocks, since it leaves many areas unaccounted.The quality of GEDI's acquisition accuracy also depends on various factors, including footprint geolocation and footprint variability [22,24].Consequently, although GEDI data are appropriate for investigating global patterns in forest structure, the large spatial resolution of individual pixels and discontinuous acquisitions pose challenges at smaller geographical scales.Trees are the fundamental building blocks of forests, and management decisions are ultimately conducted on a tree-by-tree basis.Therefore, downscaling GEDI data to enhance pixel resolution and extend coverage has become an area of active research in recent years (Table 1).
In order to achieve high-resolution, full-coverage canopy height maps, numerous studies have used machine learning (ML) for downscaling and spatializing satellite-LiDAR data.These studies often combine satellite LiDAR with satellite imagery to reduce the original footprint size and extend coverage for entire forest ecosystems [25][26][27][28][29] (Table 1).ML algorithms have proven effective in interpreting complex patterns within RS datasets, adjusting for overfitting biases, and efficiently handling large volumes of data [30].Cur-rently, the most widely used ML methods include random forests (RF), deep learning (DL), and gradient tree boost (GB) (Table 1).Despite the prevalence of RF and DL algorithms, there remains a gap in utilizing classification and regression trees (CART) algorithms for downscaling purposes (Table 1).
Table 1.Canopy height assessment studies using satellite LiDAR (light detection and ranging) data.The accuracy measurements from these studies are mean absolute error (MAE), coefficient of correlation (r) and determination (R-squared), bias, root mean square error (RMSE), overall accuracy (OA), and mean error (ME).Deep learning (DL), random forests (RF), linear model (LM), gradient tree boost (GB), ordinary least squares (OLS), convolutional neural network (CNN), NA for unknown methodology, Landsat (LDT), Sentinel-1 (S1), Sentinel-2 (S2), the ice, cloud, and land elevation satellite (ICESat), Global Ecosystem Dynamics Investigation (GEDI), and the National Terrestrial Ecosystem Monitoring System (NTEMS).Google Earth Engine (GEE) is widely used for remote sensing analyses due to its cloud-based computing architecture and easy access to multi-temporal global satellite data, removing the computational limitations associated with local analysis [23,[33][34][35].This capability has empowered researchers to use a cloud-based platform to analyse petabytes of RS images and generate canopy height raster data, predominantly leveraging Sentinel-1 (S1), Sentinel-2 (S2), and Landsat (LDT) as data sources (Table 1).For instance, Potapov et al. [27] used a bagged regression trees ensemble method to merge GEDI data with multi-temporal Landsat images, to produce a global canopy height map with a spatial resolution of 30 m (hereafter CH-Potapov2019).However, they observed that the low resolution of Landsat prompted an overestimation in measurements of forest canopy height in temperate forests.A comparable map was produced by Lang et al. [28] using multi-temporal S2 images and a deep learning method (hereafter CH-Lang2020) without considering locally calibrated models.

Site
However, none of the previous studies include topographic characteristics as independent variables for predicting canopy heights, despite the demonstrated signifi-cance of topography in enhancing the accuracy of GEDI footprint measurements in forest canopies [25,26].
Although the downscaling method is quite similar across the previously mentioned studies, a significant difference lies in the metric used as a proxy to predict the top-ofcanopies.For instance, the 90th, 95th, and 98th percentiles were most frequently used as Rh metrics (from 90th to Rh90) [27,28,36,37].Interestingly, Potapov et al. [27] revealed that Rh90 tended to underestimate canopy height, whilst Rh100 tended to overestimate canopy height.Furthermore, to the best of our knowledge, most of the developed canopy height maps using locally calibrated models were focused on specific areas [8,25,32].By contrast, global canopy height maps were year-specific (i.e., 2019 and 2020) and utilized GEDI footprints at the national or continental level [27,28].Given these lacks (i.e., absence of topographic predictors, adoption of single ML models, and individual relative height without locally calibrations), there is a pressing need to investigate the effect of different Rh metrics on predicting canopy height, using locally calibrated ML models to investigate the influence of additional RS and topographic data on model outcomes.
In response to these gaps, this study proposes and evaluates an approach to generate high-resolution canopy height maps (10 m) by combining GEDI with S1, S2, and topographical data using different locally calibrated ML algorithms.To reach this aim, this study dealt with the following three specific research objectives: (1) To assess the performance of three ML algorithms, RF, GB, and CART, in predicting canopy heights from the most commonly used GEDI metrics.(2) To evaluate the performance of our canopy height maps using reference ALS-based CHMs.
The proposed approach is tested in two structurally contrasting Mediterranean coniferous and broadleaved forest sites, as they represent two of the most important European forest ecosystems with contrasting forest stand structures [2].

Study Area
Two Mediterranean forest test sites were selected, based on their contrasting structural and demographic profiles.Both sites are located in central Italy, known locally as Pennataro (41  [38,39].Lago di Occhito is characterized by approximately 997 ha of structurally homogenous forests, whilst Pennataro is covered by approximately 270 ha of structurally heterogeneous forests (Figure 1) [40,41].Pennataro is a mixed broadleaved tree species, dominated by Turkey oak (Quercus cerris, 40%) and including European beech (Fagus sylvatica, 21%) and Italian maple (Acer opalus, 9.6%) [42].In Lago di Occhito, forest plantation is characterized by Aleppo pine (Pinus halepensis, 61%), which is the dominant species, and Arizona cypress (Cupressus arizonica, 20%), plus a limited number of other coniferous tree species.

Data
The following four main data sources were used in this study for model development and validation: (1) ALS data collection and processing for canopy height validation (Section 3.1); (2) GEDI relative height metrics to be downscaled (Section 3.2); (3) multispectral Sentinel-1 and Sentinel-2 satellite imagery for canopy height prediction (Section 3.3); and (4) topographical features such as elevation, slope, and aspect, as additional predictors for canopy height (Section 3.4).In addition, we used two existing GEDI-derived canopy height maps for further validation (Section 3.5).

Airborne Laser Scanning Data Collection and Processing
ALS data were acquired for both the broadleaf and coniferous forest sites.ALS data for the broadleaf forest site were collected in June 2016 during the leaf-on season, while data for the coniferous forest site were collected in July 2021.YellowScan Mapper+ sensors were used and mounted on a helicopter for flying over a broadleaf forest site and on a

Data
The following four main data sources were used in this study for model development and validation: (1) ALS data collection and processing for canopy height validation (Section 3.1); (2) GEDI relative height metrics to be downscaled (Section 3.2); (3) multispectral Sentinel-1 and Sentinel-2 satellite imagery for canopy height prediction (Section 3.3); and (4) topographical features such as elevation, slope, and aspect, as additional predictors for canopy height (Section 3.4).In addition, we used two existing GEDI-derived canopy height maps for further validation (Section 3.5).

Airborne Laser Scanning Data Collection and Processing
ALS data were acquired for both the broadleaf and coniferous forest sites.ALS data for the broadleaf forest site were collected in June 2016 during the leaf-on season, while data for the coniferous forest site were collected in July 2021.YellowScan Mapper+ sensors were used and mounted on a helicopter for flying over a broadleaf forest site and on a Matrice 300 RTK unmanned aerial vehicle (UAV) for flying over a coniferous forest site.In the broadleaf site, the sensor was configured with a maximum scan angle of ±50 • and a pulse frequency of 20 kHz, resulting in an average point density of 60/m 2 .In the coniferous site, a most recent version of YellowScan Mapper+ sensor was used, mounted on a UAV flying at an altitude of 70 m above the canopy.This resulted in a higher scanning frequency of 10 Hz, generating a denser point cloud with an average density of 300 points/m 2 (https://www.yellowscan-lidar.com/products/mapper-3/,accessed on 3 March 2024).ALS data collection covered 90% of the broadleaf forest site and 16% of the coniferous forest site.
The following four main steps were used for generating a canopy height map from the ALS data: (1) point classification as ground/non-ground; (2) outlier removal; (3) Z-point normalization; and (4) CHM generation.The 'lidR' package [42] and 'rlas' package [43] (R version 4.3.0)were used to create CHMs [44].The final CHMs (.tiff) with a spatial resolution of 1 m, for broadleaf and coniferous forest sites, were obtained.Hence, we standardized the spatial resolution of the CHMs of the study areas to 1 m, to ensure consistency in the validation approach and outputs.

Global Ecosystem Dynamics Investigation (GEDI) Level-2A Data
GEDI footprint and gridded datasets come in different spatial resolutions for 3D Earth observation [45].Since we were focused on canopy height, we used the GEDI-L2A (Level-2A) top-of-canopy and relative height metrics.This dataset has high geolocation accuracy, equivalent to a horizontal error of 10.3 m [46], and comprises 25 m footprints, acquired at 60 m intervals along the track and 600 m intervals across the track [15].
Three quality control filters, "sensitivity", "quality", and "degrade flags", were used to filter the GEDI-L2A data.These flags indicate waveform reliability for measuring 3D surface structure [46,47].Waveforms unsuitable for measuring potential top canopy height (quality flag = 1) were excluded, and only non-degraded samples (degrade flag = 0) were selected.Subsequently, four metrics, Rh90, Rh95, Rh98, and Rh100, were chosen due to their potential to accurately measure the top-of-canopy height [27,28,35,36].Additionally, a metric based on the average of Rh75, Rh90, Rh95, and Rh100 (hereafter the AVG metric) was incorporated into this study.Rh75 is an effective metric for characterizing forest structure [48], but it is rarely used in deriving top-of-canopy heights.We hypothesize that integrating the Rh75 metric into an AVG metric will allow us to mitigate under-and over-estimations associated with Rh90 and Rh100 metrics [27].

Sentinel Mission Data
GEE was used to access harmonised Sentinel-2 Level-2A data and Sentinel-1 ground range detected (GRD) data between 1 July and 31 August 2023.To reduce leaf canopy occlusion, leaf-off canopy conditions were selected for the study period [49].Sentinel-1 bands, with both ascending and descending orbits, were selected using the IW (interferometric wide) mode, including single ("VV" and "HH") and dual polarization ("VH") signals [50].The Sentinel-1 bands were pre-processed using border noise correction, speckle filtering, and radiometric terrain normalization [51].For the broadleaf forest site, 116 multi-temporal images were available, and 125 multi-temporal images for the broadleaf and coniferous forest sites, respectively, from S1 during the setting period.Each available image contained three channels, including the incident angle range ("Angle"), as well as single and dual polarizations ("VV"-"VH") bands.As preprocessing, a spatial filtering operation that computes the median value within a neighbourhood around each pixel in an image was applied to the previously acquired multi-temporal images.This process performed a composite band [51].
For the study period, 25 Sentinel-2 images were available for the broadleaf site and 13 for the coniferous site, after ensuring all images had less than 70% cloud cover.Images were masked using the QA60 band to limit the presence of opaque and cirrus clouds [23,52].A final median composite image was then generated for each study site, excluding clouds and shadows [52][53][54].

Topographical Data
Three topographical variables were included (elevation, slope, and aspect) since they have been demonstrated to significantly impact forest structure [22,[55][56][57].The Global Multi-Resolution Terrain Elevation Data (GMTE)-2010 dataset, available from GEE, was used to compute slope and aspect in addition to elevation [58].

Existing Global Ecosystem Dynamics Investigation (GEDI)-Derived Canopy Height Maps
Two previously published global canopy height maps are readily accessible through GEE [28] and the Earth Map platform [59].Both maps integrate 3D GEDI data with multi-spectral and multi-temporal satellite imagery for downscaling, making them ideal candidates for benchmarking future canopy height maps.The first canopy height map, referred to as CH-Lang2020 [28], predicts canopy height with a spatial resolution of 10 m.The second map, referred to as CH-Potapov2019 [27], predicts canopy height with a spatial resolution of 30 m.The CH-Potapov2019 map uses Rh95 to represent top-of-canopy, whilst CH-Lang2020 uses Rh98.Landsat satellite imagery and land surface elevation were used in CH-Potapov2019, whereas Sentinel-2 imagery and land surface elevation was in CH-Lang2020, resulting in a higher resolution of 10 m.For both studies, map accuracy was validated using ALS data, with CH-Potapov2019 achieving a RMSE of 9.07 m (Mean Absolute Error "MAE" = 6.36 m) and an RMSE of 6 m (MAE = 4 m) for CH-Lang2020 [28].

Methods
In this study we leveraged the readily available data and computational tools in GEE to develop canopy height maps for two distinct Mediterranean forest types [33].The main steps of the workflow are as follows (Figure 2): (1) feature selection, model construction, and performance evaluation (Section 4.1); (2) map validation with ALSbased CHMs (Section 4.2); and (3) benchmarking with existing canopy height maps from CH-Potapov2019 and CH-Lang2020 (Section 4.3).

Canopy Height Map Prediction
GEE offers a number of built-in machine learning tools for supervised classification, unsupervised classification, and regression.To predict canopy heights from different GEDI Rh metrics, we used 18 predictors in total, including multi-spectral Sentinel-1 (three predictors) and Sentinel-2 (twelve predictors) imagery and ancillary topographical data (three predictors).The dataset was split into training (70% corresponding to 2586 and 2464 samples in broadleaf and coniferous forests, respectively) and testing (30% corresponding to 1108 and 1056 samples in broadleaf and coniferous forests, respectively) data [53,[60][61][62], using a random sampling strategy within the masked forest sites.To downscale and spatialize GEDI data, three ML algorithms from the GEE Classifier were used: RF, GB, and classification and regression (CART).These algorithms were selected based on their demonstrated suitability for predicting canopy height and robustness to overfitting, due to their potential in applying decision tree analysis, enabling regression analysis for large datasets, assessing feature importance across variables (thus reducing overfitting), and replicating human reasoning in data processing [17,60,61].The number of trees was set to 500 for the RF and GB models, with other hyperparameters set to default values (see Table 2) [63][64][65].The CART models were run with default hyperparameters [66].In total, 30 ML models were trained: five GEDI metrics (Rh90, Rh95, Rh98, Rh100, and AVG) were analysed using the RF, GB, and CART algorithms to obtain 15 canopy height maps, with 10 m pixel resolution, for each study area (i.e., the coniferous site and the broadleaf site).The testing data were used to validate the performance of the ML models through the coefficient of determination (R-squared, ranging from zero to one) and root mean squared error (RMSE, %).R-squared was used to gauge the extent to which ML models explain variability, whilst RMSE was used to assess the predictive accuracy of each model and map (lower RMSE values indicate more precise and reliable predicted outcomes).

Canopy Height Map Prediction
GEE offers a number of built-in machine learning tools for supervised classification, unsupervised classification, and regression.To predict canopy heights from different GEDI Rh metrics, we used 18 predictors in total, including multi-spectral Sentinel-1 (three predictors) and Sentinel-2 (twelve predictors) imagery and ancillary topographical data (three predictors).The dataset was split into training (70% corresponding to 2586 and 2464 samples in broadleaf and coniferous forests, respectively) and testing (30% corresponding to 1108 and 1056 samples in broadleaf and coniferous forests, respectively) data [53,[60][61][62], using a random sampling strategy within the masked forest sites.To downscale and spatialize GEDI data, three ML algorithms from the GEE Classifier were used: RF, GB, and classification and regression (CART).These algorithms were selected based on their demonstrated suitability for predicting canopy height and robustness to overfitting, due to their potential in applying decision tree analysis, enabling regression analysis for large datasets, assessing feature importance across variables (thus reducing overfitting), and

Comparison of Predicted Canopy Height Maps with Reference Airborne Laser Scanning (ALS)-Based Canopy Height Models (CHMs)
To evaluate the performance of the resulting canopy height maps, we compared the predicted heights with a reference ALS-based CHM map.The maps were resampled and aligned using a nearest neighbour method, as we relied on the assumption that near point data tend to be more similar than far point data.As it is considered suitable for continuous data populations and forestry studies [67], the nearest neighbour method was applied to complete the following tasks: (1) to resample our canopy height maps (10 m) to the resolution of ALS-based CHM map (1 m); and (2) to check errors in raster alignment and adjust alignment using ALS-based CHM map as snap raster.Subsequently, we applied Cook's distance method to remove influential values (outliers) from pixel values from the predicted and reference maps [68][69][70].Then, the predicted and reference data were compared using the linear regression model.Finally, we assessed map accuracy using RMSE (%) and the variability explained by the regression models using R-squared (which ranged from zero to one) [71].

Comparison of Predicted Canopy Height Maps with Other Existing Global Ecosystem Dynamics Investigation (GEDI)-Derived Canopy Height Maps
To further evaluate the robustness of the predicted canopy height maps, we compared map accuracy statistics with those of CH-Potapov2019 and CH-Lang2020 for the same ALS-based reference CHM.The existing CH-Potapov2019 and CH-Lang2020 maps were processed using the same method outlined in Section 4.2 to allow for fair comparisons.The overall evaluations were made using the R-squared (which ranged from zero to one), RMSE (%), and pixel frequency distribution [27,28].

Canopy Heights Map Prediction
A single canopy height map was constructed for each of the study sites with 10 m pixel resolution.Contrasting results were obtained for the coniferous and broadleaf sites, with R-squared values ranging from 0.61 to 0.93 (RMSE % = 16.35-26.58)in the coniferous site and from 0.64 to 0.78 (RMSE % = 13.32-17.79)in the broadleaf site (Figure 3).Notably, the RF and GB models achieved better predictive performance than the CART models.In fact, the CART models exhibited low accuracy, with the highest RMSE reaching 17.79% in the broadleaf site and 26.58% in the coniferous site.Some variation was observed in the suitability of different GEDI Rh metrics for predicting canopy heights.We found that, in the study sites, the Rh98, Rh100, and AVG performed slightly better than Rh90 and Rh95, but no significant difference was found overall.
In the coniferous site, the RF and GB models produced the most accurate canopy height maps compared to the CART models (Figure 4).The maps derived from CART models exhibited pixelation, blurriness, and inconsistencies in brightness (see Figure 4).
In the broadleaf site, the RF and GB models also outperformed the CART models.However, the difference between the RF (RMSE = 2.61%) and GB (RMSE = 3.32%) models was minimal in terms of RMSE % (see Figures 3 and 5).CART-derived maps exhibited greater pixelation, blurriness, and brightness inconsistencies compared to RF and GB maps (Figure 5).
site and from 0.64 to 0.78 (RMSE % = 13.32-17.79)in the broadleaf site (Figure 3).Notably, the RF and GB models achieved better predictive performance than the CART models.In fact, the CART models exhibited low accuracy, with the highest RMSE reaching 17.79% in the broadleaf site and 26.58% in the coniferous site.Some variation was observed in the suitability of different GEDI Rh metrics for predicting canopy heights.We found that, in the study sites, the Rh98, Rh100, and AVG performed slightly better than Rh90 and Rh95, but no significant difference was found overall.In the coniferous site, the RF and GB models produced the most accurate canopy height maps compared to the CART models (Figure 4).The maps derived from CART models exhibited pixelation, blurriness, and inconsistencies in brightness (see Figure 4).In the broadleaf site, the RF and GB models also outperformed the CART models.However, the difference between the RF (RMSE = 2.61%) and GB (RMSE = 3.32%) models was minimal in terms of RMSE % (see Figures 3 and 5).CART-derived maps exhibited greater pixelation, blurriness, and brightness inconsistencies compared to RF and GB maps (Figure 5).

Comparison of Predicted Canopy Height Map with Reference Airborne Laser Scanning-Based Canopy Height Models
When we compared the predicted canopy heights against the site-specific ALS-based CHMs, we found a better fit with coniferous (best model: a RMSE of 29.16% with an Rsquared of 0.45; Figure 6) than for broadleaf forest sites (best model: a RMSE of 20.94% with an R-squared of 0.14) (Figure 7).We observed that the RF and GB models outperformed the CART models in the coniferous forest (see Figure 6).The Rh90 and AVG GEDI metrics enabled us to derive more accurate canopy heights in coniferous stands, as indicated by the lower RMSE values, ranging between 29.16% and 37.42% for Rh90 and 30.15% and 38.69% for AVG GEDI, respectively.The results revealed contrasting patterns in variable importance across the models and forest types.The most influential predictors in RF models developed for predicting canopy height in the broadleaf site were topographic data (slope, aspect, and DEM-digital elevation model) and Sentinel-2 band B1 (Figure S1).In contrast, RF models trained in the coniferous site relied more heavily on a combination of Sentinel-1, Sentinel-2, and topographic data (VH, band B11, aspect, and slope) (Figure S2).Conversely, the most important variable in training GB models for both forest sites was the B1 Sentinel-2 band.
CART models exhibited the most significant disparity in key predictors between forest types; Sentinel-2 bands (B1, B9) and topographic data (slope, DEM) held the most weight in broadleaf stands (Figure S1), whereas, in the coniferous site, Sentinel-1/2 bands (i.e., B1, B11, VH, and VV) and slope/aspect emerged as the most influential predictors of canopy heights (Figure S2).

Comparison of Predicted Canopy Height Map with Reference Airborne Laser Scanning-Based Canopy Height Models
When we compared the predicted canopy heights against the site-specific ALS-based CHMs, we found a better fit with coniferous (best model: a RMSE of 29.16% with an Rsquared of 0.45; Figure 6) than for broadleaf forest sites (best model: a RMSE of 20.94% with an R-squared of 0.14) (Figure 7).We observed that the RF and GB models outperformed the CART models in the coniferous forest (see Figure 6).The Rh90 and AVG GEDI metrics enabled us to derive more accurate canopy heights in coniferous stands, as indicated by the lower RMSE values, ranging between 29.16% and 37.42% for Rh90 and 30.15% and 38.69% for AVG GEDI, respectively.
Our findings highlighted that the RF and GB models outperformed the CART models when predicting canopy height in the broadleaf site (Figure 7).Leveraging the AVG GEDI metric allowed us to obtain more accurate canopy heights in broadleaf stands, as evidenced by the low RMSE values ranging between 20.94% and 26.54%.Our findings highlighted that the RF and GB models outperformed the CART models when predicting canopy height in the broadleaf site (Figure 7).Leveraging the AVG GEDI metric allowed us to obtain more accurate canopy heights in broadleaf stands, as evidenced by the low RMSE values ranging between 20.94% and 26.54%.

Comparison of Predicted Canopy Height Maps with Other Existing Global Ecosystem Dynamics Investigation (GEDI)-Derived Canopy Height Maps
Our canopy height predictions were a better fit with the ALS-based CHM in both forest sites than those obtained using the CH-Potapov2019 and CH-Lang2020 maps (see Figures 8 and 9).Contrasting findings emerged for the two available maps across forest sites.For instance, when using canopy heights obtained from the coniferous site, the CH-Potapov2019 demonstrated a more accurate estimation (RMSE = 49.68%; Figure 8), compared to the CH-Lang2020 (RMSE = 75.67%; Figure 8).In contrast, somewhat similar results were obtained for broadleaf trees using the CH-Lang2020 map (RMSE = 22.54%; Figure 8) and the CH-Potapov2019 map (RMSE = 22.91%; Figure 8).In the coniferous site,

Comparison of Predicted Canopy Height Maps with Other Existing Global Ecosystem Dynamics Investigation (GEDI)-Derived Canopy Height Maps
Our canopy height predictions were a better fit with the ALS-based CHM in both forest sites than those obtained using the CH-Potapov2019 and CH-Lang2020 maps (see Figures 8 and 9).Contrasting findings emerged for the two available maps across forest sites.For instance, when using canopy heights obtained from the coniferous site, the CH-Potapov2019 demonstrated a more accurate estimation (RMSE = 49.68%; Figure 8), compared to the CH-Lang2020 (RMSE = 75.67%; Figure 8).In contrast, somewhat similar results were obtained for broadleaf trees using the CH-Lang2020 map (RMSE = 22.54%; Figure 8) and the CH-Potapov2019 map (RMSE = 22.91%; Figure 8).In the coniferous site, both existing maps have a low explanatory power for variability, with R-squared values below 0.19.When using canopy heights obtained from the broadleaf site, somewhat similar results were obtained for broadleaf trees using the CH-Lang2020 map (RMSE = 22.54%; Figure 9) and the CH-Potapov2019 map (RMSE = 22.91%; Figure 9).Similarly to the results for the coniferous site, the broadleaf site also showed low explanatory power for the existing maps, with R-squared values falling below 0.11.

Discussion
In this study we introduce a novel approach for generating localised high-resolution canopy height maps (10 m) using widely available satellite data and machine learning in GEE.GEDI metrics, considered to represent the top-of-canopy height (Rh90, Rh95, Rh98, Rh100, and AVG), were downscaled and spatialized for two Mediterranean forest sites, using Sentinel-1, Sentinel-2, and topographical data.Three different ML models were tested (RF, GB, and CART) in structurally and demographically contrasting study sites, and the resulting maps were compared with ALS-based CHMs for validation.We further tested the robustness of the predicted canopy height maps by comparing their performance with that of the global GEDI-derived canopy height maps: CH-Potapov2019 [27] and CH-Lang2020 [28].When using canopy heights obtained from the broadleaf site, somewhat similar results were obtained for broadleaf trees using the CH-Lang2020 map (RMSE = 22.54%; Figure 9) and the CH-Potapov2019 map (RMSE = 22.91%; Figure 9).Similarly to the results for the coniferous site, the broadleaf site also showed low explanatory power for the existing maps, with R-squared values falling below 0.11.

Discussion
In this study we introduce a novel approach for generating localised high-resolution canopy height maps (10 m) using widely available satellite data and machine learning in GEE.GEDI metrics, considered to represent the top-of-canopy height (Rh90, Rh95, Rh98, Rh100, and AVG), were downscaled and spatialized for two Mediterranean forest sites, using Sentinel-1, Sentinel-2, and topographical data.Three different ML models were tested (RF, GB, and CART) in structurally and demographically contrasting study sites, and the resulting maps were compared with ALS-based CHMs for validation.We further tested the robustness of the predicted canopy height maps by comparing their performance with that of the global GEDI-derived canopy height maps: CH-Potapov2019 [27] and CH-Lang2020 [28].

Canopy Heights Map Prediction
Our findings revealed that Rh98, Rh100, and AVG were the most accurate GEDI metrics for predicting real canopy heights derived from ALS-based CHM in both Mediterranean forest sites.However, when comparing the top-performing GEDI metrics (AVG, Rh98, and Rh100) among the ML models, both RF and GB models provided similarly accurate predictions in both Mediterranean forest types (see RMSE % in Figure 3), with only marginal improvements observed in coniferous forest sites.
In line with previous studies, theRh95 metric was a good predictor of realistic canopy heights [32,72,73].The AVG metric (the average of Rh75, Rh90, Rh95, and Rh100) emerged as a promising GEDI metric for predicting real canopy heights.Rh95-Rh100 and Rh90 tended to either overestimate or underestimate GEDI canopy height predictions.Accordingly, the Rh75 metric (representing 75% of returned energy between the top of the canopy and the ground surface) was crucial in regulating the overestimations of GEDI metrics higher than Rh90.Despite these considerations, several other factors can influence the effectiveness of Rh95 and AVG.These include the quality of laser pulses, forest canopy structure, topography, and the time difference between GEDI and Sentinel data acquisition.Our study aimed to mitigate the impact of these factors by standardizing the acquisition time of Copernicus data and using power laser shots [32,35].
The quality of raw GEDI data quality is a significant source of uncertainty when estimating canopy height.It can be influenced by factors such as tree density, slope, the type of laser shots used, and canopy heterogeneity [74].This is reflected in the lower map accuracy of the broadleaf forest site, characterized by its heterogenous structure, compared to the higher map accuracy of the homogenous coniferous forest site.Furthermore, even the 10 m pixel resolution of the final downscaled maps is much larger than individual tree crowns, leading to a loss of canopy height data in heterogeneous forest types.Thus, predicting canopy height from GEDI data might be more suitable for forestry purposes, because forestry stands are typically homogeneous and evenly aged.Therefore, it is essential to carefully consider the forest stand structure of the test area when using GEDI data for canopy height prediction, in addition to other secondary factors.
In all cases, the RF and GB models outperformed the CART models (Figure 3).As expected, the RF algorithm performed well in both regression and classification approaches, while GB emerged as one of the most versatile algorithms for regression analysis [60,61].The superior performance of RF and GB models can be attributed to their capacity to regulate overfitting by employing decision tree techniques [61].In this regard, the decision tree learning approach increased the efficiency of processing large datasets, as it employs a boosting technique to incorporate random sampling with replacement across weighted data [60].On the contrary, the reduced predictive potential of the CART models for predicting GEDI metrics can be attributed to their sensitivity to outliers and the influence of numerous observations on the decision tree analysis [75].
Notably, the RF and GB models were more computationally intensive than the CART models, which are known for their lower computational requirements [75].Moreover, our findings aligned with or exceeded those reported in other studies (Figure 3).For instance, Lang et al. [28] reported an RMSE value equal to 6 m (RMSE = 13%) for global forests and Mediterranean forest class; Potapov et al. [27] reached an accuracy equal to 6.6 m for RMSE and 0.62 for R-squared for global forests.Matasci et al. [36] reported accurate values equal to 2.72 m for RMSE and 0.5 for R-squared for the Canadian boreal zone, and Schwartz et al. [73] indicated a map accuracy equal to 2.98 m for RMSE and 0.73 for R-squared for French coniferous forests.

Comparison of Predicted Canopy Height Maps with Reference Airborne Laser Scanning (ALS)-Based Canopy Height Model Results
The best-performing RF and GB models achieved an R-squared accuracy of 0.46 in the coniferous site and 0.14 in the broadleaf site, when compared to the 'realistic' canopy height derived from the ALS-based CHMs.ALS-based CHMs were selected for this investigation as a proxy for true canopy height because ALS LiDAR is highly correlated with ground truth canopy height [42].
Nevertheless, several hindering factors may have influenced the results obtained for the coniferous and broadleaf study sites.For example, Dubayah et al. [14] note that forests covered by "power beam" GEDI data types are twice as precise as "coverage beams".This is because power beams can penetrate dense canopies more effectively, thus reducing saturation at high tree densities [14,76,77].Consequently, if the "power beam" GEDI data are unavailable, this can lead to greater uncertainty in the GEDI Rh metrics for broadleaf stands [47].Moreover, different species have different canopy architectures: in general, coniferous canopies grow straight and symmetrically in a conical shape, whereas broadleaf canopies frequently display plagiotropic development [78][79][80].Although our models used both full power and coverage beam data, we recommend that future studies consider the beam-type data when investigating broadleaf forests.Topographical features are another secondary factor influencing GEDI Rh measurements [81].However, we integrated topographic ancillary data from 2010 (GMTE-2010) in the predictor's dataset.Nevertheless, recent land changes may have not been accounted for, potentially affecting the map accuracy in disturbed zones.Further research using recently acquired ALS data in broadleaved forests could help to detect occlusion in this forest type and evaluate the robustness of canopy height maps.Studies indicate that seasonal differences in leaf cover can significantly impact the consistency of canopy height measurements, particularly for trees with thick, crooked, and spreading branches [8,10,49].Natural processes such as growing, regeneration, fragmentation, and disturbance can provoke variable canopy heights and structure over time, particularly through forest gaps [82,83], which affects the accuracy of forest maps [41,52].In our studied forests, human-induced disturbance likely affected only coniferous forests, as broadleaf sites were managed for conservation purposes.

Comparison of Predicted Canopy Height Maps with Other Existing Canopy Height Maps
Our findings demonstrate that locally calibrated canopy height maps are a better fit to the reference ALS-based CHMs, when compared with the global canopy height maps from CH-Potapov2019 and CH-Lang2020.This is expected, since global maps predict canopy height for a wide range of forested biomes.Global maps must deal with a higher degree of variation in satellite data availability, terrain, and canopy structure, in addition to greater uncertainty in raw GEDI data [84].Consequently, both CH-Potapov2019 and CH-Lang2020 use machine learning techniques.The existing maps use convolutional neural networks (CNNs) and bagged regression trees methods to overcome the issues associated with predicting canopy height at a global scale, which enable contextual learning comparable with the capability of the RF, GB, and CART algorithms.However, it is important to highlight that enhanced model complexity comes with greater computational and programmatic demands.For example, GEE does not offer a freely available package for deep learning with CNNs and, despite the GEE cloud computing infrastructure, model construction is not possible on a global scale.This requires users to have both the skill and hardware to construct a model outside the GEE ecosystem, when for many forest-based research questions this is neither practical nor feasible.This challenge, however, can be overcome through collaboration with qualified coding experts, potentially facilitated by Internation projects.Notably, we found that estimated canopy heights using the simpler RF and GB models were roughly twice as accurate as CH-Potapov2019 and CH-Lang2020 for the coniferous forest site (see Figures 8 and 9).This highlights the value of using simple but effective machine learning algorithms at smaller spatial scales.The lower computational burdens of RF, GBs and CART also allow for the inclusion of additional meaningful predictors, i.e., Sentinel-1 and topographical predictors, compared to only Sentinel-2 as in CH-Lang2020 or Landsat in CH-Potapov2019.Our proposed approach differs from CH-Lang2020's in several aspects, including the following: (1) input data predictors (S2 vs. S1, S2, and topographical data); (2) input data target (country-level GEDI shoots vs. local-level GEDI shoots); and (3) forest mask (no-forest mask vs. global forest/non-forest mask (2017-2020) [44], (4) modelling (deep learning vs. RF/GB or CART), and (5) validation (country-level vs. local-level).
The results reveal that the CH-lang2020 map better fits the ALS-based CHM than CH-Potapov2019, with the best R-squared achieved in the coniferous forest sites.Accordingly, the comparison of canopy height maps from CH-Potapov 2019 and CH-Lang2020 to ALS-based CHMs can be sensitive to noise, lack, and registration errors [82].In addition to most hindering factors affecting the mapping (Section 4.2), secondary factors, such as the following, may have influenced the comparison pixel-by-pixel approach: (1) ALS point density (60 vs. 300 points/m 2 ); (2) number of GEDI shots (acquired from late 2018 to 2019 vs. 2020); (3) mismatch between GEDI shot dates (2019 vs. 2020) and ALS acquisition date (2016 vs. 2021); (4) forest management plan (anthropic forestry interventions >50 past years vs. 2022); (5) tree species composition (mixed-species vs. unique); (6).forest structural complexity (multi-layered and mono-layered; and (7) canopy conditions.Among the previously listed factors, we believe that the number of GEDI shots considered for constructing those maps was lower than what has been collected up to now, considering that NASA's GEDI mission started collecting data in late 2018.Our forest sites data revealed a time discrepancy between GEDI (acquisitions from 2018 to 2023) and ALS (acquisitions from 2016 to 2021).This gap ranges from 2 to 7 years for broadleaf species and from 2 to 3 years for coniferous species.This difference in acquisition times could significantly impact the agreement between our developed map and CHMs due to potential changes in vegetation structure and species composition [85], particularly for broadleaf trees.

Conclusions
This study presents a novel and robust methodology for producing high-resolution canopy height maps at a 10m resolution by integrating GEDI relative height (Rh) metrics with Copernicus and topographical data using machine learning (ML) algorithms.Seven key conclusions emerge from our research.
Firstly, our approach combines GEDI data with Sentinel-1, Sentinel-2, and topographical variables to generate canopy height maps employing random forest (RF), gradient boosting (GB), and classification and regression trees (CART) algorithms.Secondly, RF and GB models consistently outperformed CART models in predicting canopy heights across various GEDI metrics and forest types.Thirdly, the influential predictors for canopy height prediction varied by forest type, with the B1 band from Sentinel-2 and topographical variables being more significant in broadleaf forests, while a combination of Sentinel-1, Sentinel-2, and topographical predictors played a vital role in coniferous forests.Fourthly, the choice of GEDI metric can enhance prediction accuracy, with Rh90 and AVG metrics yielding slightly better results, particularly for coniferous trees.Fifthly, the generated maps exhibited higher accuracy compared to existing ones.Sixthly, our proposed method offers advantages over existing approaches by locally calibrating GEDI footprints, ensuring better fit for predicting canopy heights, and identifying patterns resembling canopy height models (CHMs) derived from airborne laser scanning (ALS).Seventhly, our methodology holds global applicability, and a web-based application leveraging cloud computing platforms could facilitate its widespread adoption.
Additionally, while our approach was validated in two structurally contrasting Mediterranean forest types using a pixel-by-pixel approach, further comprehensive studies focused on similar forest types are warranted to ascertain its potential, particularly in complex broadleaved forests.Nonetheless, our findings hold promise for applications in forest management and environmental monitoring.

Figure 1 .
Figure 1.Study site locations and diametric class distribution of trees.(a) Locations visualised using a single RBG Sentinel − 2 image.(b) The diametric class distribution of trees in Pennataro (broadleaf stand).(c) The diametric class distribution of trees in Lago di Occhito (coniferous stand).The diametric class distribution of trees in Pennataro is positively skewed, indicating higher structural heterogeneity, compared to Lago di Occhito, which follows a normal distribution, indicating structural homogeneity.

Figure 1 .
Figure 1.Study site locations and diametric class distribution of trees.(a) Locations visualised using a single RBG Sentinel − 2 image.(b) The diametric class distribution of trees in Pennataro (broadleaf stand).(c) The diametric class distribution of trees in Lago di Occhito (coniferous stand).The diametric class distribution of trees in Pennataro is positively skewed, indicating higher structural heterogeneity, compared to Lago di Occhito, which follows a normal distribution, indicating structural homogeneity.

Figure 2 .
Figure 2. Canopy height map workflow.Random forests , gradient tree boost , and classification and regression trees were used to generate canopy height maps.The resulting canopy height map was validated using canopy height models from airborne laser scanning (ALS-based CHM) data through root mean square error (RMSE) and coefficient determination (R-squared).

Figure 2 .
Figure 2. Canopy height map workflow.Random forests, gradient tree boost, and classification and regression trees were used to generate canopy height maps.The resulting canopy height map was validated using canopy height models from airborne laser scanning (ALS-based CHM) data through root mean square error (RMSE) and coefficient determination (R-squared).

Figure 4 .
Figure 4. Predicted canopy height maps for each GEDI Rh metric and ML algorithm for the coniferous forest site.A subsection has been enlarged for interpretability.

Figure 4 .
Figure 4. Predicted canopy height maps for each GEDI Rh metric and ML algorithm for the coniferous forest site.A subsection has been enlarged for interpretability.

Figure 5 .
Figure 5.The predicted canopy height maps for all GEDI relative height metrics and ML algorithms for the broadleaf forest site.

Figure 5 .
Figure 5.The predicted canopy height maps for all GEDI relative height metrics and ML algorithms for the broadleaf forest site.

24 Figure 6 .
Figure 6.Scatter plots comparing predicted and reference canopy heights in the coniferous site.Five GEDI metrics (Rh90, Rh95, Rh98, Rh100, and the average of Rh75, Rh90, Rh95, and Rh100-AVG) were processed through RF (Random Forests), GB (Gradient Tree Boost), and CART (Classification and Regression Trees) algorithms.Root mean square error (RMSE in meter and percentage) and Rsquared values are presented.

Figure 6 .
Figure 6.Scatter plots comparing predicted and reference canopy heights in the coniferous site.Five GEDI metrics (Rh90, Rh95, Rh98, Rh100, and the average of Rh75, Rh90, Rh95, and Rh100-AVG) were processed through RF (Random Forests), GB (Gradient Tree Boost), and CART (Classification and Regression Trees) algorithms.Root mean square error (RMSE in meter and percentage) and R-squared values are presented.

Figure 7 .
Figure 7. Scatter plots comparing predicted and reference canopy heights in the broadleaf site.Five GEDI metrics (Rh90, Rh95, Rh98, Rh100, and the average of Rh75, Rh90, Rh95, and Rh100-AVG) were processed through RF (Random Forests), GB (Gradient Tree Boost), and CART (Classification and Regression Trees) algorithms.Root mean square error (RMSE in meter and percentage) and Rsquared values are presented.

Figure 7 .
Figure 7. Scatter plots comparing predicted and reference canopy heights in the broadleaf site.Five GEDI metrics (Rh90, Rh95, Rh98, Rh100, and the average of Rh75, Rh90, Rh95, and Rh100-AVG) were processed through RF (Random Forests), GB (Gradient Tree Boost), and CART (Classification and Regression Trees) algorithms.Root mean square error (RMSE in meter and percentage) and R-squared values are presented.
Remote Sens. 2024, 16, x FOR PEER REVIEW 15 of 24 both existing maps have a low explanatory power for variability, with R-squared values below 0.19.

Figure 8 .
Figure 8. Scatter plot and horizontal pixel frequency distribution comparing predicted canopy heights with reference data in a coniferous forest site.Canopy heights from ALS, RF_Rh90, CH-Potapov2019, and CH-Lang2020 were utilized.Root mean square error (RMSE in m and %) and Rsquared values are presented.

Figure 8 .
Figure 8. Scatter plot and horizontal pixel frequency distribution comparing predicted canopy heights with reference data in a coniferous forest site.Canopy heights from ALS, RF_Rh90, CH-Potapov2019, and CH-Lang2020 were utilized.Root mean square error (RMSE in m and %) and R-squared values are presented.

Figure 9 .
Figure 9. Scatter plot and horizontal pixel frequency distribution comparing predicted canopy heights with reference data in a broadleaf forest site.Canopy heights from ALS, RF_Rh90, CH-Potapov2019, and CH-Lang2020 were utilized.Root mean square error (RMSE in m and %) and Rsquared values are presented.

Figure 9 .
Figure 9. Scatter plot and horizontal pixel frequency distribution comparing predicted canopy heights with reference data in a broadleaf forest site.Canopy heights from ALS, RF_Rh90, CH-Potapov2019, and CH-Lang2020 were utilized.Root mean square error (RMSE in m and %) and R-squared values are presented.

Table 2 .
Machine learning algorithms and their parameter settings to predict GEDI canopy height Rh metrics, based on Sentinel-1, Sentinel-2, and topographical data.Random Forests (RF), Gradient Tree Boost (GB), and Classification and Regression Trees (CART) are configurated.