1. Introduction
Accurate estimates of above-ground biomass density (AGBD) are essential for understanding carbon dynamics, supporting ecosystem management, and addressing global climate change. Mapping AGBD at global scale enables researchers and policymakers to monitor forest carbon stocks, assess the impacts of land-use change, and track progress toward climate mitigation goals [1,2]. Large-scale biomass mapping also aids biodiversity conservation [3] by identifying habitats in critical condition and at risk of deforestation [4]. Furthermore, these estimates provide valuable data for improving climate models and informing sustainable land management practices, making them a crucial tool in global environmental monitoring efforts [5,6]. In recent years, mapping AGBD or above-ground carbon stocks from remote sensing has received ever-increasing attention [7,8]. Approaches range from combining remote sensing imagery with in-situ tree measurements, translated into AGB via allometric equations [9,10], to the use of Airborne Laser Scanning (ALS) data [11,12,13]. Until recently, attempts to map AGBD globally have relied on globally relevant datasets of in-situ tree measurements and ALS flights to calibrate models that produce maps covering the entire planet. This is the case for a few datasets that have become critical benchmarks [14,15,16]. However, these global models usually lack the spatial resolution to resolve fine-grained patterns at the local level, and their accuracy can be constrained by ecological diversity and insufficient representation of regional variability [17,18].
A critical turning point in the state of the art of AGBD mapping has been the deployment of the Global Ecosystem Dynamics Investigation (GEDI) mission aboard the International Space Station. The GEDI mission provides 25 m footprint-level AGBD data derived by applying well-calibrated allometric equations to the height measurements from its LiDAR system [19,20]. Common approaches combine GEDI data with one or more of the following remote sensing technologies: Synthetic Aperture Radar (SAR) (such as Sentinel-1) [21], which captures vegetation structural information; multi-spectral sensors (such as Sentinel-2) [22], which provide insights into vegetation composition and health; and topographic or environmental data as auxiliary variables [23]. This combination of data from different sources has facilitated the development of machine learning and deep learning models that provide highly detailed biomass maps, often outperforming traditional approaches based on empirical relationships or physical models [21,22,23,24,25,26]. Despite the potential of these approaches, which often achieve superior accuracy by tailoring inputs and parameters to local ecosystem characteristics, their scalability to the global scale has seldom been explored [27,28]. It is therefore critical to explore strategies that combine the accuracy of regional models with the scalability and consistency of global approaches, leveraging publicly available, high-resolution datasets with near-global coverage such as Sentinel-1, Sentinel-2, and GEDI.
This study aims to address this challenge by applying a modified UNet architecture to predict AGBD across forests located in four different biomes (Mediterranean, taiga or boreal forests, tropical rainforests, and semi-arid savannas), using harmonized satellite data for training and prediction and inventory datasets in each biome for validation. The ranges of AGBD values, the species present in each ecosystem, the varying vegetation densities and the availability of data in these biomes were the main drivers for the selection of the study areas. We evaluate the model’s performance and compare it to a state-of-the-art, open-source, widely used global biomass dataset, the European Space Agency’s Climate Change Initiative (ESA CCI) AGB product [14]. We seek to determine whether regionally tailored models can achieve the expected superior accuracy while maintaining global applicability. We then compare the performance of our regional models against a single model trained on data from all four biomes simultaneously, an approach that more closely resembles global modeling while preserving the same methodology and resolution. By integrating an explainable artificial intelligence (XAI) approach, we assess the contribution of individual variables, offering insights into the drivers of biomass prediction in each biome and the potential for improving large-scale AGBD estimation. We expect multi-spectral and C-band SAR data to be useful in lower-AGBD biomes, since saturation is less of an issue there than in higher-AGBD areas like tropical rainforests. Moreover, L-band SAR is expected to bring meaningful contributions in biomes with higher vegetation density, as are the infrared bands of the multi-spectral data, which reflect canopy moisture and can serve as a proxy for vegetation density.
2. Materials and Methods
2.1. Study Areas
This study focuses on four distinct regions to evaluate model performance across diverse ecological contexts. The diversity in species, AGBD ranges and data availability for model validation were the main drivers for the selection of the 4 regions.
Figure 1 shows the distribution of plot locations across biomes and sub-biomes from [29]. Each biome is characterized by distinct climate conditions.
The study encompasses forests distributed across four distinct regions, each characterized by unique climatic conditions and vegetation types. In the boreal forests of Quebec, Canada, mean annual temperatures range from −2.5 °C to 0 °C, with annual precipitation between 700 mm and 1100 mm. These cold environments support coniferous-dominated forests adapted to long, harsh winters. In contrast, the Mediterranean forests and shrublands of Catalonia, northeastern Spain, experience a typical Mediterranean to sub-Mediterranean climate with hot, dry summers and mild, humid winters. Here, average annual temperatures range from 10 °C to 17 °C, and precipitation varies between 350 mm and 1000 mm, fostering sclerophyllous vegetation adapted to seasonal drought. The tropical rainforests of the Brazilian Amazon present a stark contrast, with consistently warm and humid conditions: average temperatures range from 25 °C to 28 °C and annual rainfall is abundant, between 2000 mm and 3000 mm, supporting a dense, highly diverse forest structure. Finally, the tropical shrublands and woodlands of Burkina Faso and Niger are characterized by high temperatures, averaging between 28 °C and 32 °C annually, and low, variable precipitation ranging from 300 mm to 600 mm per year. These arid conditions result in sparse tree cover and scattered drought-adapted shrubs, constituting the typical savanna landscape.
2.2. Satellite Data
The training data for the AGBD models were derived from the GEDI Level 4A (L4A) AGBD dataset, which provides globally distributed, LiDAR-derived biomass estimates. This dataset was downloaded and subjected to extensive preprocessing and filtering to retain only the highest-quality data points and to align them with the temporal coverage of the input data. Data were downloaded for the years 2019 through 2021 to ensure sufficient spatial and temporal coverage. GEDI L4A files containing full orbital paths that intersected each area of interest were downloaded, clipped, and subsetted to extract the relevant data variables. Data points were retained only if the l2_quality_flag, l4_quality_flag, and algorithm_run_flag were all set to 1, indicating successful processing and reliable output. Measurements collected during leaf-off conditions, as identified by a leaf_off_flag value of 1, were excluded due to their potential to impact biomass estimates. To reduce noise from sunlight, only observations with a solar_elevation below 0 were included. Additionally, data points were filtered to ensure a sensitivity value of at least 0.95, guaranteeing high confidence in the LiDAR-derived biomass estimates [20]. Lastly, an upper limit of 800 Mg ha⁻¹ was applied to the AGBD values to exclude outliers and maintain a realistic range of biomass predictions, based on reviews of the global distribution of AGBD values across biomes [30,31]. This upper limit excluded only around 0.01% and 0.1% of the data points in the Mediterranean and tropical biomes, respectively, while no points reached such high values in the other two biomes. Around 90% of the total GEDI data points were discarded because they failed at least one of the preceding conditions. The GEDI L4A dataset serves as the primary reference for model training, providing a high-quality and globally consistent baseline for biomass predictions.
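The footprint filters described above can be sketched as a single predicate. This is a minimal illustration, not the pipeline used in the study: the dictionary-based record layout, the function names and the example values are assumptions, and treating the 800 Mg ha⁻¹ limit as inclusive is also an assumption.

```python
from typing import Dict, List

AGBD_MAX = 800.0        # Mg/ha upper limit stated in the text
MIN_SENSITIVITY = 0.95  # minimum sensitivity stated in the text


def keep_shot(shot: Dict[str, float]) -> bool:
    """Return True if one GEDI L4A footprint passes every filter in the text."""
    return (
        shot["l2_quality_flag"] == 1
        and shot["l4_quality_flag"] == 1
        and shot["algorithm_run_flag"] == 1
        and shot["leaf_off_flag"] != 1            # exclude leaf-off acquisitions
        and shot["solar_elevation"] < 0           # keep night-time shots only
        and shot["sensitivity"] >= MIN_SENSITIVITY
        and shot["agbd"] <= AGBD_MAX              # assumption: limit is inclusive
    )


def filter_shots(shots: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep only the footprints that satisfy all conditions."""
    return [s for s in shots if keep_shot(s)]


# Hypothetical footprints (values invented for illustration).
good = {"l2_quality_flag": 1, "l4_quality_flag": 1, "algorithm_run_flag": 1,
        "leaf_off_flag": 0, "solar_elevation": -12.3, "sensitivity": 0.97,
        "agbd": 142.0}
daytime = dict(good, solar_elevation=35.0)
low_sensitivity = dict(good, sensitivity=0.91)
outlier = dict(good, agbd=950.0)

kept = filter_shots([good, daytime, low_sensitivity, outlier])
print(len(kept))
```

Because the conditions are combined with a logical AND, a footprint is discarded as soon as it fails any single one of them, which is consistent with the large share of discarded points reported above.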
The satellite data used as potential AGBD predictors for this study consists of: (i) Sentinel-1 Synthetic Aperture Radar (SAR) Radiometrically Terrain Corrected (RTC) with dual-band cross-polarization data, (ii) Sentinel-2 multi-spectral Level-2A imagery, and (iii) the Copernicus 30-meter Digital Elevation Model (DEM), all of which were accessed and processed through Microsoft’s Planetary Computer platform. These datasets provide high-resolution, globally consistent inputs for biomass modeling. For Sentinel-1, we first separated the images between ascending and descending orbits, to account for the different information provided by opposing viewing angles from the satellite. Each separate image was acquired and processed individually, filtering out those values lower than −30 dB, and applying a Lee filter, with a window size of 3 pixels, for speckle reduction. Then, a temporal average was calculated from all images taken during the growing season for each site. For Sentinel-2, we selected all images with a cloud cover lower than 30%, and we then applied the Scene Classification Layer (SCL) produced by Sen2Cor L2A processor [
32,33], to filter out pixels classified as clouds, cloud shadows, snow or ice, saturated or defective pixels, and topographic cast shadows, ensuring cleaner data for biomass analysis. Additionally, we corrected for the offset introduced by the processing baseline change of January 2022 and computed the yearly median to reduce residual cloud cover, retaining the most representative image for each year. In addition to these primary datasets, PALSAR (Phased Array type L-band Synthetic Aperture Radar) data were incorporated for the tropical rainforests and the semi-arid savannas of Burkina Faso and Niger, where the data were openly available, unlike for the Northern Hemisphere locations. The inclusion of PALSAR data allowed us to assess the added value of longer-wavelength SAR data and its ability to penetrate deeper into the canopy, which can enhance biomass predictions in specific ecosystems. These PALSAR datasets were obtained and processed using Google Earth Engine, where, as with Sentinel-1, the temporal average was calculated from all images taken during the growing season for each site. Together, these datasets provided a comprehensive and robust foundation for the biomass prediction model, leveraging multi-source remote sensing data to account for various environmental and temporal factors. To ensure uniformity in spatial resolution, all satellite-derived variables were resampled to a common grid at 10-m resolution.
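As an illustration of the Sentinel-2 steps described above (SCL masking, baseline offset correction and yearly median), the following sketch processes the time series of a single pixel and band. Several details are assumptions rather than facts from this study: the exact set of masked SCL codes (in particular including no-data and thin cirrus), the digital-number offset of 1000 and scale of 10,000 taken from the general Sentinel-2 L2A convention, and the function and variable names.

```python
from statistics import median
from typing import List, Optional, Tuple

# Assumption: SCL codes treated as invalid, using Sen2Cor class numbering:
# 0 no data, 1 saturated/defective, 2 topographic cast shadows,
# 3 cloud shadows, 8/9 cloud (medium/high probability),
# 10 thin cirrus, 11 snow or ice.
INVALID_SCL = {0, 1, 2, 3, 8, 9, 10, 11}
BASELINE_OFFSET = 1000  # assumption: DN offset added from baseline 04.00 onward
SCALE = 10000.0         # assumption: DN-to-reflectance scale factor


def yearly_median_reflectance(
    observations: List[Tuple[int, int, bool]]
) -> Optional[float]:
    """Median reflectance of one pixel/band over a year.

    Each observation is (digital_number, scl_class, has_offset), where
    has_offset marks scenes processed after the January 2022 baseline change.
    Returns None when every observation is masked out.
    """
    values = []
    for dn, scl, has_offset in observations:
        if scl in INVALID_SCL:
            continue  # skip clouds, shadows, snow/ice, defective pixels
        if has_offset:
            dn -= BASELINE_OFFSET  # harmonize with pre-2022 scenes
        values.append(dn / SCALE)
    return median(values) if values else None


# Hypothetical time series: three clear observations and one cloudy one.
series = [(1500, 4, False), (2600, 4, True), (9000, 9, True), (1400, 5, False)]
print(yearly_median_reflectance(series))
```

Removing the offset before computing the median matters because, otherwise, scenes from before and after the baseline change would differ by a constant unrelated to the surface.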
2.3. Forest Inventory Validation Data
We used different sources of field data to validate our results. For boreal and Mediterranean forests we used data from the Quebec permanent National forest inventory (NFI) [
34] and the 4th Spanish National Forest Inventory [
35], respectively. Forest inventory datasets from non-governmental surveys were used for the Brazilian Amazonian rainforests [
36] and the semi-arid regions of Burkina Faso and Niger [
37].
To align with our objective of developing region-specific models applicable to digitized, continuous forest inventories and similar applications, we selected study areas accordingly. Catalonia, as a whole, was chosen as a representative political region where AGBD mapping could support carbon stock monitoring. In contrast, Quebec, which spans a much larger area and includes both boreal and semi-boreal forests, required a more targeted selection. To maintain a comparable spatial extent and ensure consistency across regions, we restricted the Quebec dataset to plots within the boreal forest biome, resulting in an area similar in size to Catalonia. Regarding Burkina Faso and Brazil, since the availability of the data was scattered across the biome and the distances were large between the plots, only buffers of each area were taken, resulting in scattered data collection (5 areas in Brazil and 2 in Burkina Faso and Niger).
The characteristics and sizes of forest inventory plots also varied across the study regions, as seen in Figure 1. In Brazil, plot sizes differed by site, with most plots measuring 50 × 50 m, while others extended to 500 × 20 m [36]. In contrast, data collection in Burkina Faso and Niger used circular plots with a radius of 20 m (see [37] for a detailed description of the plot design). The Spanish National Forest Inventory consists of circular permanent plots distributed along a 1 × 1 km grid. Each plot consists of four concentric circular subplots in which trees are measured depending on their size, with a maximum plot radius of 25 m. For Quebec, data were collected within circular plots of 200 m². From all datasets, observations collected from 2015 onward were selected, ensuring temporal consistency between the validation data and the satellite imagery used to generate biomass predictions.
To ensure robust and consistent validation across the four distinct ecosystems, all datasets were subjected to a harmonization process that aligned metrics and data structures across biomes, enabling accurate comparisons and validation outcomes. Measurements from individual trees were converted into estimates of above-ground biomass using the most site-specific allometric equations available, rather than a general allometric equation, which can result in biased and inaccurate estimates for some biomes [38]. For Mediterranean forests, we used biomass estimates provided by the Spanish National Forest Inventory, based on province- and species-specific allometric equations [35], while in Burkina Faso and Niger, site-specific equations were applied as in Perpinyà-Vallès et al. (2024) [37]. For boreal forests, since no specific allometric equations were available from the dataset producers, we used the R package allodb [39], which is specifically designed for extra-tropical tree allometries, to select the most appropriate allometric equation based on species and geographic coordinates. The application of extra-tropical allometric equations to boreal forests is supported by their empirical calibration using datasets from temperate and boreal regions, ensuring their applicability to the specific structural and ecological characteristics of boreal tree species. Moreover, boreal forests share similar growth constraints with other extra-tropical ecosystems, such as temperature limitations and slow biomass accumulation rates, further justifying this approach. Finally, the well-known tropical allometric equations from Chave et al. (2014) [38] were used for the tropical rainforests of Brazil. To harmonize the different sampling designs, individual AGB estimates for all trees within a plot were converted into above-ground biomass density values (Mg ha⁻¹) using the total number of measured trees and the specific plot area.
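The conversion from tree-level AGB to plot-level AGBD can be illustrated with a short sketch. It assumes a single fixed-area plot with tree biomass expressed in kilograms; nested designs such as the concentric Spanish NFI plots would additionally require a per-tree expansion factor, which is not shown. The function name and the example values are hypothetical.

```python
import math
from typing import Iterable


def plot_agbd(tree_agb_kg: Iterable[float], plot_area_m2: float) -> float:
    """Convert per-tree AGB (kg) in a fixed-area plot to AGBD (Mg/ha)."""
    total_mg = sum(tree_agb_kg) / 1000.0   # kg -> Mg
    area_ha = plot_area_m2 / 10000.0       # m2 -> ha
    return total_mg / area_ha


# Hypothetical circular plot with a 20 m radius and three measured trees.
circular_area = math.pi * 20.0 ** 2
agbd_circular = plot_agbd([120.0, 340.5, 89.5], circular_area)

# Hypothetical 50 x 50 m plot holding 25 Mg of tree biomass in total.
agbd_square = plot_agbd([25000.0], 50.0 * 50.0)

print(round(agbd_circular, 2), round(agbd_square, 2))
```

Normalizing by plot area is what makes plots of very different sizes and shapes comparable on a common Mg ha⁻¹ scale.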
The range of AGBD values in the inventories differed substantially among the four biomes, reaching around 350 Mg ha⁻¹ in Catalonia, 90 Mg ha⁻¹ in Burkina Faso and Niger, over 700 Mg ha⁻¹ in Brazil and over 300 Mg ha⁻¹ in Quebec. Based on previous studies, we discarded plots with AGBD field estimates over 500 Mg ha⁻¹ in Brazil [40] and 200 Mg ha⁻¹ in Quebec [41,42] to avoid giving excessive weight to outliers. These represented less than 10% and 5% of the plots for Brazil and Quebec, respectively. The main goal of this filtering was to obtain a dataset representative of the overall range of values without including many outliers that could stem from measurement error or anomalously large trees. The latter was the main driver of the outliers detected, with almost all outlier plots containing at least one tree with a diameter larger than 80 cm. Indeed, allometric equations are typically calibrated for a limited range of tree diameters and, given their non-linear nature, can lead to unrealistic AGB estimates when trees with very large diameters are considered [43].
The total number of data points available for validation varied across biomes: boreal forests (308), Mediterranean forests (2532), tropical rainforests (117), and semi-arid savannas (113). These datasets provide a representative and geographically diverse foundation for evaluating model performance in predicting above-ground biomass density.
2.4. Model Structure & Training
The Deep Learning model used in this study is built on a U-Net architecture [
44], based on the work developed by Schwartz et al. (2023) [
45], designed for image-to-image regression tasks. The model incorporates residual connections, enhancing gradient flow, and dropout regularization and L2 weight decay to mitigate overfitting. It follows a fully convolutional encoder-decoder structure, where the contracting path consists of four convolutional blocks, each followed by a 2 × 2 max pooling operation, progressively increasing the number of filters from 64 to 1024. The bottleneck layer captures high-level feature representations before the expanding path symmetrically reconstructs spatial information using bilinear upsampling and skip connections. The final output layer applies a 1 × 1 convolution with a linear activation to produce a single-channel continuous-valued prediction. The model’s loss function is specifically tailored to the sparse nature of the training data, similar to the approach by Schwartz et al. [
45], where loss is computed only at pixels containing valid GEDI observations. A custom Root Mean Squared Error (RMSE) loss function, applied only to pixels with GEDI-derived AGBD values, ensures unbiased learning while disregarding areas with missing data. Additionally, land cover points from ESA WorldCover 2020 and 2021 [
46,47], where AGBD is necessarily zero (e.g., urban areas, bare rock, ice/snow, water, and grassland), are randomly sampled and incorporated into the training dataset to help the model learn “hard zeroes”. The training process is further optimized using an exponential decay learning rate schedule, starting at 0.0005 and decreasing by a factor of 0.95 every 1000 steps, combined with the Adam optimizer for stable convergence. The model was trained for 50 epochs using a batch size of 16. To prevent overfitting, we employed an early stopping technique, using the validation loss as the monitoring metric, with a patience of six epochs before restoring the weights of the epoch with the best validation loss. Once trained, the model was applied to generate wall-to-wall biomass predictions at 10-m resolution across the study areas for all years where inventory validation data were available, ensuring spatially comprehensive estimates of AGBD.
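Two of the training components described above, the sparse RMSE loss and the learning rate schedule, can be sketched in plain Python. This is a framework-free illustration rather than the training code of the study: the function names are hypothetical, returning 0 for a patch without valid pixels is an assumption, and whether the decay is applied continuously or in discrete steps (the `staircase` option) is not stated in the text.

```python
import math
from typing import Sequence


def masked_rmse(y_true: Sequence[float], y_pred: Sequence[float],
                mask: Sequence[int]) -> float:
    """RMSE computed only over pixels whose mask value is non-zero."""
    squared = [(t - p) ** 2 for t, p, m in zip(y_true, y_pred, mask) if m]
    if not squared:
        return 0.0  # assumption: a patch with no valid pixels contributes no loss
    return math.sqrt(sum(squared) / len(squared))


def exponential_decay_lr(step: int, initial: float = 5e-4, rate: float = 0.95,
                         decay_steps: int = 1000,
                         staircase: bool = False) -> float:
    """Learning rate after `step` optimizer steps (values from the text)."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial * rate ** exponent


# Hypothetical flattened patch: only pixels 0 and 2 hold GEDI observations.
reference = [10.0, 0.0, 30.0, 0.0]
predicted = [12.0, 99.0, 26.0, -5.0]
valid = [1, 0, 1, 0]

loss = masked_rmse(reference, predicted, valid)
lr_after_1000_steps = exponential_decay_lr(1000)
print(loss, lr_after_1000_steps)
```

In the example, the large errors at the two pixels without GEDI observations do not affect the loss, which is the behaviour the masking is meant to provide.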
In each study area, the model was trained using image patches of 256 × 256 pixels, ensuring that each patch contained at least five GEDI data points to avoid incorporating regions with insufficient information. Depending on the size of the study area, two different sampling strategies were applied. For larger areas, a total of 10,000 patches were randomly sampled across the entire extent, with each patch assigned an inverse weight based on the average GEDI-derived AGBD value to balance representation across biomass ranges. From these, a weighted random selection of 3000 patches was used for training. In smaller study areas where fewer patches were available, all patches meeting the GEDI density criterion were included in the training set, ensuring sufficient spatial coverage while maintaining data quality for model learning.
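One possible reading of the weighted patch selection is sketched below. The text does not specify how the inverse weights were computed, so the inverse-frequency weighting over AGBD bins, the 25 Mg ha⁻¹ bin width, the sampling algorithm (Efraimidis-Spirakis keys for weighted sampling without replacement) and all names are assumptions made for illustration only.

```python
import random
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(mean_agbd: Sequence[float],
                              bin_width: float = 25.0) -> List[float]:
    """Weight each patch by 1 / (number of patches in its AGBD bin)."""
    bins = [int(value // bin_width) for value in mean_agbd]
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]


def weighted_sample_without_replacement(weights: Sequence[float], k: int,
                                        seed: int = 0) -> List[int]:
    """Pick k distinct indices, favouring larger weights.

    Uses Efraimidis-Spirakis keys u ** (1 / w) with u uniform in [0, 1):
    the k largest keys form the sample.
    """
    rng = random.Random(seed)
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keys.sort(reverse=True)
    return [index for _, index in keys[:k]]


# Hypothetical candidate pool: 90 low-biomass and 10 high-biomass patches.
patch_mean_agbd = [10.0] * 90 + [300.0] * 10
weights = inverse_frequency_weights(patch_mean_agbd)
selected = weighted_sample_without_replacement(weights, k=20)
print(len(selected), len(set(selected)))
```

Under this weighting, patches from rare biomass ranges are more likely to be drawn than patches from common ranges, which is the balancing effect the text describes.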
Additionally, a single “global” model was trained on patches from all four biomes. The data extracted for each biome to train the respective regional models were subsampled to obtain a final dataset of 8283 patches: 2500 were randomly selected from each of the larger areas (Quebec and Catalonia), and the remainder comprised all available patches from Brazil and Burkina Faso/Niger (approximately 1500 each). The input datasets had to be restricted to those available across all four biomes. This meant keeping only one of the Ascending and Descending Sentinel-1 passes instead of using both when available: the Ascending track was used for all biomes except the tropical rainforest, where only the Descending track was available. PALSAR data were also excluded, as they were not available over all study areas. The remaining datasets were therefore Sentinel-2, Sentinel-1 (Ascending only, Descending in the case of Brazil) and the DEM.
2.5. Variable Importance Analysis
To assess the contribution of each explanatory variable to the predictions of the model, saliency maps were used as an XAI technique [
48]. Saliency maps capture the sensitivity of the model output with respect to changes in the input features, allowing visualization and quantification of the importance of each spectral band in predicting AGBD.
For each input patch, the gradient of the model output with respect to each input band was computed using automatic differentiation. The absolute value of these gradients was then averaged across all spatial locations within the patch to obtain a measure of the importance of each band. This process was repeated in all samples, using all image patches from each biome for a single year. Since the gradients are not normalized, the resulting values reflect the actual magnitude of change in the model output in response to variations in each band. This means that differences in gradient values across biomes can provide insight not only into which bands are most important but also into the extent to which changes in a band influence the predicted AGBD in each biome. Higher gradient values indicate that small changes in a band result in larger changes in the model output, revealing bands with a stronger influence on the predictions.
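The per-band importance measure described above can be illustrated on a toy example. Note the substitutions: central finite differences stand in for automatic differentiation so that the sketch needs no deep learning framework, the "model" is a hypothetical analytic function rather than the trained UNet, and reducing the model output to a single scalar by summing over pixels is an assumption.

```python
from typing import Callable, List

Patch = List[List[List[float]]]  # indexed as [band][row][col]


def band_saliency(model: Callable[[Patch], float], patch: Patch,
                  eps: float = 1e-4) -> List[float]:
    """Mean absolute gradient of the model output per input band.

    Gradients are approximated by central finite differences
    (a stand-in for automatic differentiation).
    """
    scores = []
    for b, band in enumerate(patch):
        total, count = 0.0, 0
        for r, row in enumerate(band):
            for c, original in enumerate(row):
                patch[b][r][c] = original + eps
                up = model(patch)
                patch[b][r][c] = original - eps
                down = model(patch)
                patch[b][r][c] = original  # restore the input value
                total += abs((up - down) / (2.0 * eps))
                count += 1
        scores.append(total / count)  # average over spatial locations
    return scores


def toy_model(patch: Patch) -> float:
    """Hypothetical model: output summed over pixels, linear in band 0
    and quadratic in band 1."""
    band0, band1 = patch
    return sum(3.0 * x0 + x1 * x1
               for row0, row1 in zip(band0, band1)
               for x0, x1 in zip(row0, row1))


example_patch = [[[0.5, 0.5], [0.5, 0.5]],   # band 0
                 [[1.0, 2.0], [3.0, 4.0]]]   # band 1
saliency = band_saliency(toy_model, example_patch)
print(saliency)
```

For this toy function the gradient is 3 everywhere for band 0 and 2x for band 1, so the expected scores are 3 and 5. Because the scores are not normalized, they remain in output units per unit change of the input, which is what allows magnitudes to be compared across biomes.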
The analysis was conducted separately for each biome to account for the heterogeneity in ecological and environmental conditions that influence biomass dynamics. This biome-specific evaluation provides insights into the relevance of different spectral bands across diverse ecosystems, highlighting potential regional variations and limitations when applying a globally trained model.
2.6. Validation, Benchmark and Global Comparison
The performance of the models was determined using three complementary approaches: (1) comparing predictions against the GEDI L4A data used during training, (2) using each field inventory plot to validate the closest prediction to its center, and (3) comparing the AGBD values in each field inventory plot to the mean, minimum and maximum value of all pixel predictions intersecting the area of the inventory plot, in order to account for regional variations in plot definitions and their location uncertainty. The latter validation method has been identified as one of the key issues to solve in future studies of AGBD mapping [
49].
The predictive performance of the model was quantified using four statistical metrics, calculated using the UNet estimates as predicted values and the forest inventory datasets as observed values: Pearson correlation coefficient (r), coefficient of determination (R²), Mean Absolute Error (MAE) and Normalized Root Mean Square Error (nRMSE). The Pearson correlation coefficient (r) assesses the strength and direction of the linear relationship between predicted and observed values, with values closer to 1 indicating a strong positive correlation. The coefficient of determination (R²) represents the proportion of variance in the observed data explained by the model, with higher values indicating better predictive performance and values close to 0 indicating little or no predictive capability. Negative values can occur when the model is evaluated on data it was not fitted to, as is the case with independent forest inventories, and indicate no predictive capability. MAE measures the average magnitude of errors between predicted and observed values, providing an indication of overall prediction accuracy. Finally, nRMSE is a normalized version of RMSE that expresses the prediction error relative to the mean observed value, facilitating comparisons across different study regions. It is calculated as:

$$\mathrm{nRMSE} = \frac{1}{\bar{y}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}$$

where $y_i$ are the observed values, $\hat{y}_i$ are the predicted values, $\bar{y}$ is the mean of the observed values, and N is the total number of observations. This formulation ensures that errors are assessed in proportion to the scale of the observed biomass values, enabling robust performance evaluation across biomes.
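The four metrics can be computed as in the following minimal sketch. It assumes that R² is defined as one minus the ratio of the residual to the total sum of squares (a definition consistent with the negative values mentioned above, but not stated explicitly in the text); the function name and the example values are hypothetical.

```python
import math
from typing import Dict, Sequence


def validation_metrics(observed: Sequence[float],
                       predicted: Sequence[float]) -> Dict[str, float]:
    """Pearson r, R2, MAE and nRMSE for paired observations and predictions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    covariance = sum((o - mean_obs) * (p - mean_pred)
                     for o, p in zip(observed, predicted))

    return {
        "r": covariance / math.sqrt(ss_tot * ss_pred),
        "r2": 1.0 - ss_res / ss_tot,          # assumption: 1 - SS_res / SS_tot
        "mae": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
        "nrmse": math.sqrt(ss_res / n) / mean_obs,  # RMSE over mean observed
    }


# Hypothetical plot-level AGBD values in Mg/ha.
field = [100.0, 200.0, 300.0]
close = validation_metrics(field, [110.0, 190.0, 330.0])
reversed_order = validation_metrics(field, [300.0, 200.0, 100.0])
print(close)
print(reversed_order["r2"])
```

The second call shows how R² becomes negative when predictions fit the observations worse than their mean would: with the predictions in reversed order, the residual sum of squares is four times the total sum of squares.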
By combining multiple validation approaches with these statistical metrics, we ensured a thorough and reliable assessment of the model’s ability to estimate above-ground biomass density across different ecological regions.
Additionally, to evaluate the performance of the adapted UNet model against an established global biomass product, we used the ESA CCI AGB dataset [
14] as a benchmark. This dataset offers global coverage of biomass estimates at a 100-m spatial resolution and provides data for multiple time points, making it the most comparable product to the approach presented in this study, despite its coarser resolution. The ESA CCI data were downloaded and processed to align with the temporal and spatial scales of our study regions. The comparison included both a qualitative and quantitative analysis. Continuous biomass maps from the ESA CCI dataset were compared to those generated by the adapted UNet model in representative areas of the four biomes. Furthermore, errors in the ESA CCI estimates were evaluated using the same inventory plots and validation metrics applied to the adapted UNet model. As the ESA CCI dataset has a coarser resolution than the UNet model estimations, the individual pixel containing the field inventory plot area was used for this comparison. This ensured a direct and consistent comparison of model performance. By assessing both spatial accuracy and quantitative agreement, this benchmark analysis provides insights into the relative strengths and limitations of the adapted UNet model in comparison to a widely used global dataset.
Moreover, we investigated the effects of resolution and methodology by comparing a regional and a global approach. Working under the assumption that regional mapping produces the best results for each biome, as it can optimize the extraction of information from the available sources depending on their characteristics, we compared the results of regional modeling with those of a single “global” model trained on patches from all four biomes. The trained global model was then used to predict AGBD for all biomes in the years in which in-situ data were available, and metrics were obtained and compared to those of the regional models. This comparison ensures a fair evaluation of regional versus global modeling approaches. Unlike the ESA CCI benchmark, which differs in methodology and resolution, this analysis directly compares models trained under consistent conditions, minimizing potential biases. The results are presented together in the benchmark comparison section.
4. Discussion
A key hypothesis in biomass estimation is that regional mapping, even when applied with the same methodology and datasets, performs better only in specific biomes, with global models providing better results in biomes that are well represented in their training data. However, our findings suggest that this assumption does not hold universally. Instead, regional models appear to provide benefits across various biomes, demonstrating their broader applicability and general improvement against, in this case, the benchmark model chosen, ESA CCI. Note that, as stated in
Section 3.3, both our regional and global models outperform ESA CCI in areas with homogeneous or null signals. This is potentially due to ESA CCI’s lower resolution, which limits its ability to capture high AGBD values in highly heterogeneous areas, such as the semi-arid savannas of Burkina Faso and Niger, but also due to its signal saturation around 200–300 Mg ha⁻¹, as observed in tropical rainforests [
51]. This highlights that the capability of the UNet regional model to learn overall vegetation patterns from GEDI, even when using a general approach that combines data from the 4 biomes, brings an added value when mapping AGBD. Our results also suggest that localized mapping approaches may be beneficial regardless of ecosystem type, emphasizing their potential for widespread use. Beyond accuracy, regional models offer flexibility, as they can be tailored to specific conditions by incorporating additional local datasets. It is important to note that in some cases, such as Catalonia, our study does suggest that global approaches can provide results that are as good as a regional model, which shows potential for global models in some areas.
Generally, the proposed modified UNet model is able to capture spatial hierarchies and contextual information, which is essential for accurate pixel-wise predictions but also for spatial coherence in complex forest biomass mapping scenarios, as seen in
Figure 5. The UNet’s encoder-decoder structure facilitates the integration of multi-scale features, enhancing its ability to delineate intricate patterns within the biomass data. This reliance on contextual information can, however, lead to smoothed outputs that are potentially less precise at the single-pixel scale but more precise at larger scales. This may be one of the reasons for the relatively low explanatory power (R² = 0.119–0.396) observed across the four study regions. However, this warrants a more nuanced interpretation. In this study, the R² values were derived from validations using completely independent ground-truth datasets, a methodology that often results in lower R² values compared to studies where models are both trained and tested on the same type of data [52]. This approach provides a more stringent assessment of model performance, as it evaluates the model’s predictive capability on truly unseen data, thereby offering a realistic measure of its generalizability. Conversely, some studies employ airborne laser scanning (ALS) data for validation [45], which involves comparing continuous maps to continuous maps rather than individual data points. This method typically yields higher R² values because it emphasizes overall spatial patterns, potentially overlooking discrepancies at finer scales. In our study, the focus was on the accuracy of individual pixels, a more granular approach that inherently leads to lower R² values but ensures a rigorous evaluation of model precision.
Results from the variable importance analysis show that the weight of each input dataset differs from biome to biome, particularly when there is less variability within them. Until now, the superior quality of regional mapping approaches compared to global models [53] has been largely due to their ability to integrate high-resolution inputs, local field measurements, and tailored parameterizations [54,55,56]. Such tailoring offers high accuracy but lacks reproducibility and comparability, whereas our method brings new insights into the predictability of each biome separately and the replicability of the methodology. In comparison, global models have excelled in capturing broad spatial patterns, but their training data often underrepresent certain biomes, limiting their predictive accuracy in regions with distinct vegetation structures.
There are still key limitations to the regional approach, such as capturing canopy heterogeneity or the saturation of signal at high AGBD levels [
50]. Although the saturation can be attributed to constraints in the satellite sensor, our results indicate that the scarcity of high AGBD values in the GEDI dataset can also leave the model with insufficient training data to learn and generalize to these higher biomass levels (
Figure 3). Additionally, GEDI data itself has large uncertainties which need to be taken into account [
57], such as geolocation errors, the choice of allometric model, and errors driven by steep slopes. As seen in the validation of UNet predictions against GEDI data, the AGBD estimates our model can provide at global scale are constrained by the LiDAR data points used as input. This implies that if only 5–10% of the data points exceed a certain threshold (dependent on the biome), the model is less likely to predict AGBD above that threshold, contributing to signal saturation at larger AGBD values. Another important aspect of the input data used to train models is its variability. One critical improvement for future implementations is shifting from politically defined regional models to biome-based models. This study attempted to bridge the gap between the regional and global applicability of AGBD models mainly by taking biomes into account. However, because political regions are of high interest for mapping AGBD for carbon stock accounting, Catalonia was chosen as one of the regions even though it spans a small overlap of two different biomes. In this case, Mediterranean and temperate forests coexist, and a single model may not optimally capture biomass variability across distinct vegetation zones. A more effective strategy would involve biome-specific model partitions, where models are trained separately for each ecological region within the administrative region and then combined using ensemble techniques or hierarchical modeling. This would allow for more precise AGBD estimates that better reflect ecological rather than administrative boundaries.
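One possible way to counter the scarcity of high-AGBD samples is to reweight training footprints by the inverse frequency of their AGBD bin. The following sketch is an illustrative assumption rather than a step of the pipeline presented here. The function name `inverse_frequency_weights`, the 50-unit bin width, and the example values are all hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(agbd_values, bin_width=50.0):
    """Return one weight per sample, inversely proportional to the number of
    samples sharing its AGBD bin, normalised so that the weights average to 1."""
    bins = [int(v // bin_width) for v in agbd_values]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical footprint AGBD values, dominated by low-biomass samples.
agbd = [12.0, 30.5, 44.0, 48.0, 61.0, 75.0, 90.0, 120.0, 260.0, 310.0]
weights = inverse_frequency_weights(agbd)

for value, weight in zip(agbd, weights):
    print(f"AGBD {value:6.1f} -> weight {weight:.2f}")
```

Such weights could be passed to a weighted loss so that rare high-biomass footprints contribute more to training. Whether this reduces saturation in practice would need to be tested, since it cannot compensate for sensor-level constraints.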
One important aspect of the validation carried out in this study is the representativeness of the four biomes and their harmonization. The applicability of AGBD maps such as the one we produced depends as much on the thoroughness of their validation as on their accuracy. That is why this study uses in-situ measurements as ground truth, even though in-situ AGBD measurements themselves carry large uncertainties arising from potential human error, allometric equations, and geolocation accuracy [
58]. Overall, the validation step, which uses not only the closest pixel but all the pixels falling within the potential area covered by the plot, aligns with well-established forestry practices [
59]. It does imply, however, that the validation resolution of this model is coarser than its actual pixel size and depends on the biome and its plot size. The information itself is still provided in more detail, at 10-m pixels, offering insights into heterogeneity and land cover changes. Another way to improve our models and their effective resolution would be to account for potential geolocation errors of the GEDI footprints, which have been shown to greatly impact the predictions [
58].
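The plot-level comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure of this study: the function name `plot_mean_agbd`, the circular plot footprint, the nearest-pixel fallback, and the example grid values are hypothetical.

```python
import math


def plot_mean_agbd(pixels, plot_x, plot_y, radius_m):
    """Mean predicted AGBD of all pixels whose centre lies within `radius_m`
    of the plot centre. Falls back to the nearest pixel if none lies inside
    (the fallback is an assumption made for this sketch)."""
    inside = [value for (x, y, value) in pixels
              if math.hypot(x - plot_x, y - plot_y) <= radius_m]
    if inside:
        return sum(inside) / len(inside)
    nearest = min(pixels, key=lambda p: math.hypot(p[0] - plot_x, p[1] - plot_y))
    return nearest[2]


# 3 x 3 grid of 10-m pixels: (centre x, centre y, predicted AGBD), coordinates in metres.
pixels = [(x, y, 100.0 + x + y)
          for x in (5.0, 15.0, 25.0)
          for y in (5.0, 15.0, 25.0)]

plot_average = plot_mean_agbd(pixels, 12.0, 12.0, 12.0)  # four pixels fall inside
closest_only = plot_mean_agbd(pixels, 12.0, 12.0, 1.0)   # none inside, nearest pixel used

print(plot_average, closest_only)
```

Averaging over the plot footprint makes the comparison less sensitive to small co-registration offsets, at the cost of validating at the plot scale rather than at the native pixel size.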
Considering the advantages but also the limitations of current regional approaches, our study introduces a standardized methodology for biomass mapping at a global scale while accounting for biome-specific variations. By structuring our approach to consider these nuances, we provide a scalable solution that bridges the gap between regional accuracy and global applicability. A key challenge in global biomass estimation is ecological diversity, with each vegetation type exhibiting distinct spectral signatures. This diversity, combined with different terrain conditions, affects the model’s ability to generalize and produce accurate AGBD estimates. As observed in the global model, this variability can degrade performance in diverse landscapes, and the same holds for a regional approach if the studied area contains a high diversity of vegetation types and terrain conditions. This challenge calls for adaptive strategies to improve the accuracy of the estimates. Additionally, our model framework allows for independent deployment in each region, enabling iterative improvements over time. This flexibility means that new datasets or missions, such as BIOMASS [
60], NISAR [
61], and additional GEDI data, can be seamlessly integrated, allowing continuous model refinement. With research on GEDI data gaining traction and the mission continuing for the next few years, further improvements in GEDI data filtering are expected. This would further improve results, as GEDI constitutes the basis for this model and for most large-scale applications of global AGBD mapping. Furthermore, innovative model architectures, such as multi-output models, can enhance AGBD estimation by simultaneously predicting related variables such as maximum height and canopy cover. Since both of these metrics, derived from GEDI data, serve as proxies for AGBD, their inclusion can improve the model’s overall accuracy. This approach ensures that biomass estimation remains dynamic and capable of adapting to future advancements in remote sensing and ecological monitoring.
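A multi-output model of the kind mentioned above is typically trained with a joint loss over all predicted variables. The sketch below shows one simple form, a weighted sum of per-variable mean squared errors. It is an assumption for illustration only: the variable names, task weights, and example values are hypothetical, and no such model was trained in this study.

```python
def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_output_loss(preds, targets, task_weights):
    """Weighted sum of per-variable MSE terms; dictionary keys name the outputs."""
    return sum(task_weights[name] * mse(preds[name], targets[name])
               for name in task_weights)


# Hypothetical predictions and GEDI-derived targets for two footprints.
preds = {"agbd": [100.0, 150.0], "height": [20.0, 25.0], "cover": [0.6, 0.8]}
targets = {"agbd": [110.0, 140.0], "height": [22.0, 25.0], "cover": [0.5, 0.8]}
task_weights = {"agbd": 1.0, "height": 0.5, "cover": 0.5}  # hypothetical weights

loss = multi_output_loss(preds, targets, task_weights)
print(f"joint loss: {loss:.4f}")
```

Because the three variables have very different numeric ranges, the targets would normally be standardized or the weights tuned so that no single term dominates the joint loss.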
5. Conclusions
This study aims to bridge the gap between regional AGBD mapping, which often relies on localized, non-replicable data such as ALS or proprietary datasets, and global AGBD datasets, which capture biome diversity but struggle to account for biome-specific variations. By using only open-source datasets, this approach offers a cost-effective, scalable solution for mapping AGBD across Earth’s land masses, with potential improvements as new missions and datasets enhance accuracy in forest structure and vegetation density mapping. Building on extensive research in recent years, we developed a model capable of global AGBD mapping while preserving the specificity of regional modeling. Our results highlight the advantages of regional mapping, demonstrating superior accuracy compared to the ESA CCI benchmark dataset and outperforming a globally trained version of our model. This advantage likely stems from models being able to learn biome-specific patterns more effectively and from tailoring training datasets to each biome’s characteristics. Further exploration of open-source datasets could help reduce uncertainties and errors, enhancing future models.
The primary limitations of our study stem from the quality and sufficiency of open-source datasets in accurately mapping AGBD across its full range of values within each biome. One notable challenge is signal saturation, which could be mitigated by upcoming missions specifically targeting vegetation. Since GEDI data forms the foundation for global AGBD training, careful curation is essential to ensure the highest quality dataset. Improvements in filtering and balancing GEDI data are necessary to ensure that underrepresented values are mapped with equal accuracy. Our findings reinforce the advantages of regional over global AGBD mapping, demonstrating that biome-specific modeling allows for better utilization of relevant information and enables models to adjust variable weights to suit each biome’s needs. Additionally, they point to potential adjustments to global modeling approaches that would better represent within-biome variability while still capturing biome diversity. With the insights derived from this study, AGBD mapping approaches can be further refined at both the global and regional levels.