Woody Aboveground Biomass Mapping of the Brazilian Savanna with a Multi-Sensor and Machine Learning Approach

The tropical savanna in Brazil known as the Cerrado covers circa 23% of the Brazilian territory, but only 3% of this area is protected. High rates of deforestation and degradation in the woodland and forest areas have made the Cerrado the second-largest source of carbon emissions in Brazil. However, data on these emissions are highly uncertain because of the spatial and temporal variability of the aboveground biomass (AGB) in this biome. Remote-sensing data combined with local vegetation inventories provide the means to quantify the AGB at large scales. Here, we quantify the spatial distribution of woody AGB in the Rio Vermelho watershed, located in the centre of the Cerrado, at a high spatial resolution of 30 metres, with a random forest (RF) machine-learning Remote Sens. 2020, 12, 2685; doi:10.3390/rs12172685 www.mdpi.com/journal/remotesensing Remote Sens. 2020, 12, 2685 2 of 22 approach. We produced the first high-resolution map of the AGB for a region in the Brazilian Cerrado using a combination of vegetation inventory plots, airborne light detection and ranging (LiDAR) data, and multispectral and radar satellite images (Landsat 8 and ALOS-2/PALSAR-2). A combination of random forest (RF) models and jackknife analyses enabled us to select the best remote-sensing variables to quantify the AGB on a large scale. Overall, the relationship between the ground data from vegetation inventories and remote-sensing variables was strong (R2 = 0.89), with a root-mean-square error (RMSE) of 7.58 Mg ha−1 and a bias of 0.43 Mg ha−1.


Introduction
The tropical savanna in Brazil, known as the Cerrado, is the second-largest biome in South America, covering over 200 million ha or approximately 23% of the Brazilian territory [1]. It has the highest species richness and biodiversity among the world's savannas [2,3]. Gradients of tree density, height, canopy cover, and aboveground biomass (AGB) in the Cerrado vary according to the climate, fire regime, geomorphology, and soil nutrient availability, resulting in 19 distinctive ecoregions [4]. The Cerrado is characterised by a mosaic of grasslands, shrublands, and forestlands in varying proportions, depending on the location [5]. Its physiognomies range from campo (grasslands) to the typical Cerrado stricto sensu (trees and shrubs up to 8-10-m-high and with an understory dominated by grass) and the cerradão (forest formations with trees up to a height of 20-m-high) [6,7].
Although the aboveground carbon stock in the Cerrado is lower than that of the Brazilian Amazon, the conversion of the Cerrado biome to different types of land uses is occurring much faster than in the Brazilian Amazon, mainly because the Brazilian livestock and agricultural frontier has been expanding towards the northern parts of the Cerrado over the last decades [8,9]. This trend has been increasing due to the governmental policies put in place since 2018. The Cerrado land use and land cover (LULC) mapping project, conducted by the Brazilian Ministry of Environment (MMA), showed that, in 2013, approximately 43% (88 million ha) of the biome had already been converted to different land uses, with 55% (111 million ha) still covered by native vegetation [4]. The remaining 2% of the Cerrado were covered by water bodies and by the class "unidentified", which included areas covered by clouds and burned areas [10]. Most of the remaining natural areas have been undergoing degradation due to unsustainable selective logging and burning activities, often overlooked as threats to habitat integrity and connectivity [4,11]. According to MMA-2019 [12] and the MapBiomas alert platform (https://mapbiomas.org/en/project), the Cerrado was the Brazilian biome with the highest levels of deforestation between October 2018 and March 2019, losing 47,704 ha of native vegetation. In addition, about 95% of the deforestation alerts were in areas without any authorisation-that is, without a license for deforestation, which is issued by either the federal or state environmental agency. More than 1400 ha of deforestation took place in legal reserves (Brazil's environmental legislation obligates private property owners to retain a fixed proportion of their total area for native vegetation. These areas are called "legal reserves") [13].
Only 3% of the Cerrado is strictly protected by the law within conservation areas [12,14], and the high rates of vegetation loss and degradation have made the Cerrado the second-largest source of carbon emissions in Brazil [15]. In this context, it is essential to monitor AGB and carbon stocks effectively, and reliable maps are needed for climate change mitigation policies [16][17][18]. Uncertainties in current vegetation carbon stock estimates over the Cerrado are high, and biomass estimates vary by more than 50 Mg ha −1 within the same area [19][20][21]. This demands improvements in the accuracy and spatial resolution to estimate the AGB in this biome. The challenge here is to take the large latitudinal gradient and the high variation of the vegetation structure into consideration [15], as well as the paucity of field studies quantifying AGB over different regions of the biome [15,16].

Vegetation
Most of the Cerrado natural vegetation is comprised of savanna formations (cerrado stricto sensu). However, forest formations such as woodlands, riparian forests, and seasonal forests play an important role in the carbon balance, because they have a higher carbon stock density [63]. In the Rio Vermelho watershed, woodlands or cerradão (in Portuguese) and seasonal forest formations predominate as the forest remnants. The forest inventory sites were located in two fragments of cerradão, and the plots follow a structural and biomass gradient from: (a) a savanna-cerradão transition zone (lower biomass), (b) cerradão, and (c) cerradão-seasonal forest transition zone (higher biomass). The Rio Vermelho watershed is part of a region in the Goiás State called "Mato Grosso Goiano". This denomination is based on the historical extensive and dense native forest cover (typical for the Mato Grosso State) in the region [59]. This part of the Brazilian Cerrado underwent intensive human settlement in the 18th century due to the Brazilian gold rush in Central Brazil [60]. Since forests in the Cerrado are supposedly associated with soils with good fertility, most of the forest-covered areas have already been converted to other uses, mainly croplands and pastures [59], sometimes resulting in land degradation. Today, forest formations are restricted to a small and few fragments. Pastures cover more than 57% of the study area [6], and livestock ranching is the main economic activity in the region. By 2018, only approximately 33% of the native vegetation persisted [12,61], and the remaining native vegetation is fragmented [62] and often located in legal reserves areas.

Vegetation
Most of the Cerrado natural vegetation is comprised of savanna formations (cerrado stricto sensu). However, forest formations such as woodlands, riparian forests, and seasonal forests play an important role in the carbon balance, because they have a higher carbon stock density [63]. In the Rio Vermelho watershed, woodlands or cerradão (in Portuguese) and seasonal forest formations predominate as the forest remnants. The forest inventory sites were located in two fragments of cerradão, and the plots follow a structural and biomass gradient from: (a) a savanna-cerradão transition zone (lower biomass), (b) cerradão, and (c) cerradão-seasonal forest transition zone (higher biomass).
Dystrophic cerradão is typical for deep, highly lixiviated soil (Oxisols and sandy soils) areas with a seasonal tropical climate, and their canopy height varies between 8 and 15 m. Canopy cover varies between 50% and 90%, with an understory of shrubs and grass [62]. They are structurally similar to seasonal forests and differ mainly in species composition, as they are comprised of seasonal forest as well as wooded savanna species [62,64].
Seasonal forests are often associated with more fertile soils (mesotrophic and eutrophic soils) and present different levels of deciduousness during the dry season. The canopy height varies between 15 and 25 m, and the forest cover is between 70% and 95% during the rainy season. In the dry season, the canopy cover can be lower than 50% in seasonal semideciduous forests and lower than 35% in seasonal deciduous forests [62].
In wooded savannas, most trees are shorter and sparser than in forest formations, allowing for a continuous herbaceous grassy layer. Wooded savannas are separated into subgroups according to their structural gradient: cerrado ralo (2-3-m-tall trees with 5-20% canopy cover), cerrado sensu stricto or typical cerrado (3-6-m-tall trees with 20-50% forest cover), and cerrado denso (8-15-m-tall trees with 50-70% forest cover) [65]. Figure 2 shows the flowchart of the methodology used to produce the AGB map of the Rio Vermelho watershed. Our reference datasets consisted of ground measurements and airborne LiDAR data. We used a combination of Earth Observation (EO) datasets from multispectral passive optical and synthetic aperture radar (SAR) sensors as predictor variables to estimate the AGB of the study area. The description of each step of the methodology is provided in the following subsections.

Methodology
Remote Sens. 2020, 11, x FOR PEER REVIEW 5 of 23 Dystrophic cerradão is typical for deep, highly lixiviated soil (Oxisols and sandy soils) areas with a seasonal tropical climate, and their canopy height varies between 8 and 15 m. Canopy cover varies between 50% and 90%, with an understory of shrubs and grass [62]. They are structurally similar to seasonal forests and differ mainly in species composition, as they are comprised of seasonal forest as well as wooded savanna species [62,64].
Seasonal forests are often associated with more fertile soils (mesotrophic and eutrophic soils) and present different levels of deciduousness during the dry season. The canopy height varies between 15 and 25 m, and the forest cover is between 70% and 95% during the rainy season. In the dry season, the canopy cover can be lower than 50% in seasonal semideciduous forests and lower than 35% in seasonal deciduous forests [62].
In wooded savannas, most trees are shorter and sparser than in forest formations, allowing for a continuous herbaceous grassy layer. Wooded savannas are separated into subgroups according to their structural gradient: cerrado ralo (2-3-m-tall trees with 5-20% canopy cover), cerrado sensu stricto or typical cerrado (3-6-m-tall trees with 20-50% forest cover), and cerrado denso (8-15-m-tall trees with 50-70% forest cover) [65]. Figure 2 shows the flowchart of the methodology used to produce the AGB map of the Rio Vermelho watershed. Our reference datasets consisted of ground measurements and airborne LiDAR data. We used a combination of Earth Observation (EO) datasets from multispectral passive optical and synthetic aperture radar (SAR) sensors as predictor variables to estimate the AGB of the study area. The description of each step of the methodology is provided in the following subsections.

Ground Truth Data
We used woody AGB data from 15 (20 m × 50 m) field plots located in cerradão vegetation to estimate the AGB from airborne LiDAR. These plots were established in 2014, and all trees with a diameter at breast height (dbh) ≥ 5 cm at 1.30 m above the ground were considered. The tree heights

Ground Truth Data
We used woody AGB data from 15 (20 m × 50 m) field plots located in cerradão vegetation to estimate the AGB from airborne LiDAR. These plots were established in 2014, and all trees with a diameter at breast height (dbh) ≥ 5 cm at 1.30 m above the ground were considered. The tree heights were measured with a digital clinometer (HAGLOF ECII-D). The basal area in m 2 was calculated from the dbh (basal area = (π/4) * (dbh/100) 2 ) ( Table 1). Ten plots were located in the remnants of native   As mentioned, the two native vegetation fragments sampled are classified as cerradão. However, due to the high structural heterogeneity in the Cerrado biome, there were considerable variations in the structure of the woody vegetation within the sampled plots. The floristic and structural characteristics of the plots described in Table 1 (and from Figure S1 to Figure S20 in the Supplementary Materials) corroborate that the sampled plots adequately represent the gradient of floristic-structural variation inherent to forest formations from all of the Cerrado. Thus, four plots were classified as a savanna-cerradão transition zone, seven as cerradão, and four as a cerradão-seasonal forest transition zone [62]. The sampled plots showed a height gradient directly related to the aboveground biomass stocks of woody vegetation. In the savanna-cerradão transition zone, the mean height was below 6 m, in the cerradão, between 6 and 8 m, and, in the cerradão-seasonal forest transition zone, above 9 m (Table 1). These plots were representative of the structural variation found in the cerradão vegetation of the region, with AGB values ranging from 19 to 104 Mg ha −1 (Table 1).
In tropical ecosystems, species diversity is generally very high. Therefore, generalised (mixed-species) allometric models are more appropriate than species-specific equations [66,67]. We used the following mixed species allometric model [68] where dbh is the diameter at breast height, and h is height. In our study, these in situ woody AGB estimates were then used to establish an empirical model to estimate AGB as a function of the structural metrics derived from the airborne LiDAR.

LiDAR Data
We used 60 tiles of airborne LiDAR data (covering a total area of 1009.01 ha) collected by the Sustainable Landscapes project led by the Brazilian Enterprise for Agricultural Research Corporation (EMBRAPA). The LiDAR data covered the municipalities of Itapirapuã and Goiás, Goiás State, Brazil. The flights took place between 20 June 2015 and 7 July 2015. The LiDAR dataset has an average density of returns of~45 ppm 2 (points per square meter). The average altitude of the flights was~850 m, with a field of view of 12 • . Two LiDAR sensors (Optech Orion M300 and Optech ALTM 09SEN243) were used in these campaigns, but the percentage of flight line overlaps was high (~65%). LiDAR returns of all 15 plots are shown from Figure S6 to Figure S20 in the Supplementary Materials.
The LASTools software [69] was used to process the LiDAR data and to generate the following variables derived from the LiDAR point cloud data: DSM (digital surface model), CHM (canopy height model), DTM (digital terrain model), CC (canopy cover coverage), CD (canopy density), Max-H (maximum height), Percentile_p15, Percentile_p10, Percentile_p20, Percentile_p30, Percentile_p35, Percentile_p55, Percentile_p40, Percentile_p45, Percentile_p50, Percentile_p60, and Percentile_p65. Highly correlated variables were excluded, and the ones containing the most unique information, CHM, CC, and CD, were used to establish statistical models to extrapolate the spatially limited LiDAR AGB estimates to both the optical and radar observations ( Figure 2).

Optical Data
The United States Geological Survey (USGS) Landsat 8 Collection 1 Tier 1 data consists of surface reflectance products generated from the Landsat 8 Operational Land Imager (OLI), with 30-m spatial resolution. The USGS atmospherically corrects the scenes using the Landsat 8 Surface Reflectance Code (LaSRC), which also uses the C function of the mask (CFMASK) algorithm [70] to generate a cloud, shadow, water, and snow mask. We selected all the scenes acquired in the period of 2015-2017 (244 scenes) to generate a cloud and shadow-free median temporal composite ( Figure 2). In addition to the spectral bands, we generated a range of vegetation indices that could potentially be representative for the vegetation canopy structure, seasonality, and a measure of vegetation greenness. These were the normalised difference vegetation index (NDVI), normalised burn ratio (NBR), normalised difference moisture index (NDMI), and the soil adjusted vegetation index (SAVI). These indices have previously been used in similar studies [21,23,[71][72][73].

SAR Data
We used the global 25-m resolution ALOS PALSAR/PALSAR−2 annual mosaics, which are freely available at https://www.eorc.jaxa.jp/ALOS/en/palsar_fnf/fnf_index.htm. This dataset was pre-processed by the Japanese Aerospace Exploration Agency (JAXA) using L-band SAR images of the backscattering coefficient acquired by the Advanced Land Observing Satellite (ALOS) and Advanced Land Observing Satellite-2 (ALOS-2). The mosaics consist of 10 × 10-degree tiles pre-processed to correct for geometric distortions (ortho-rectification) and topographic effects [74]. The mosaics were calibrated to γ0 using the following equation: where γ 0 is the gamma-naught in decibels (dB), DN is the digital number in unsigned 16 bit, and CF is a calibration constant of 83.0 dB. We reduced the noise of the mosaics by applying a temporal multi-channel filter [75] with a 5 × 5-pixel moving average window. Then, we applied a temporal normalisation between PALSAR Remote Sens. 2020, 12, 2685 8 of 22 and PALSAR-2 images aiming to correct for soil moisture and sensor differences. For our final PALSAR-2 composite, we used the median value of the three mosaics (2015, 2016, and 2017) ( Figure 2). We observed a geolocation error of~80 m on average for the PALSAR-2 mosaics in comparison to HR (high-resolution) imagery and Landsat 8 scenes. We generated a Sentinel-1 SAR median γ 0 composite using 283 scenes from between 2016 and 2018, and we used this not as an input for our modelling but as a geolocation reference image. The PALSAR-2 composite was georeferenced against the Sentinel-1 composite using a SAR-to-SAR co-registration. This approach, instead of a SAR-to-optical approach, was chosen to avoid errors derived from the different viewing geometries of SAR and optical images. The variables used for this study were the SAR backscatter coefficients (γ 0 HV and γ 0 HH ) and two SAR indices-namely, the radar forest degradation index (RFDI) and the cross-polarised ratio (CpR) [74].

Modelling Framework
We adopted a two-stage upscaling method from field measurements to airborne LiDAR point clouds and from the LiDAR-based estimates to satellite imagery. We estimated the woody AGB of the field plots and used these as a reference to estimate the woody AGB across the LiDAR footprints using LiDAR-derived structural variables as a predictor. Our field plots covered a representative range of flora and structure of the woody vegetation (Table 1) from 19 Mg ha −1 to 104 Mg ha −1 . As the LiDAR point clouds are sensitive to the forest structure, we assumed that these plots could express the physical and structural variations of the vegetation of the study site, and we allowed the AGB LiDAR model to extrapolate for values < 19 Mg ha −1 . This field-to-LiDAR procedure allowed us to increase our sampling from the 15 plots to thousands of observations derived from the LiDAR footprint covering a wide range of woody AGB and vegetation types. We then used these LiDAR AGB, representative for the vegetation in the region, in combination with EO datasets to estimate the woody AGB over the entire Rio Vermelho watershed.
The first step was to model the relationship between LiDAR woody vegetation structural variables and the AGB estimated from the field plots ( Figure 2). We generated raster-based LiDAR vegetation structural variables with using a 1-m pixel size. We then used the average value of the pixels within the 20 × 50-m plots (0.10 ha) to develop our models. As we had a limited number of field plots, we had to limit the number of variables included in the empirical models as well. Based on a visual exploration of the relationships between the dependent and independent variables, as well as the distribution of each predictor, we identified three potential predictors: canopy height model (CHM), canopy density (CD), and canopy cover (CC). We then used all possible combinations of these variables to find the best generalised linear model to estimate the AGB based on the Akaike's information criterion corrected for a small sample size (AICc). This final model was used to estimate the AGB over the entire LiDAR footprint at a 30-m spatial resolution (0.09 ha). The next step was to model the relationship between the estimated LiDAR AGB and the EO datasets. These EO datasets were resampled, co-registered, and stacked at 30-m spatial resolutions ( Figure 2). We used the LiDAR AGB pixels (N = 2,973) as training and test samples in a regression version of the random forest (RF) algorithm [76] in the Google Earth Engine [77], which was implemented within a k-fold cross-validation framework [73]. The following variables were used as predictors in the model: γ 0 HV ; γ 0 HH ; RFDI; CpR from the PALSAR-2 data; and blue, green, red, near infrared, and shortwave infrared bands, as well as NDVI, NBR, NBR2, NDMI, and SAVI from the Landsat 8 data. RF is a nonparametric machine-learning algorithm that uses a bootstrap technique to construct multiple decision trees. Jackknife tests were also run to compare the importance of the predictor variables based on a R 2 of the model. Due to the randomness (stochastic nature) of the RF algorithm, the importance may change for each run. Thus, the variable importance analysis was run for each k and then averaged to produce the overall importance figures. The test was performed running each single set of variables in isolation to assess the accuracy of each set. The higher the R 2 , the higher the importance of the variable. Then, a set of variables was excluded each time from the whole set to assess the drop in the variance explained by the model. The k-fold cross-validation framework was used to train and validate the algorithm (Figure 2), maximising the reference data available. Figure 2 shows the overview of the method used to produce the AGB map and the uncertainty characterisation in the Rio Vermelho watershed. The whole reference dataset was used for both training and validation purposes. The dataset was stratified by three AGB levels (low, medium, and high), and randomly sampled into the folds to ensure that all folds have similar probability distribution functions of the AGB. Then, k-1 folds were used to train the RF algorithm and to produce the AGB map, while the remaining fold was used for the validation. The process was then run k times, reusing the folds for training purposes but using them only once for validation purposes. The final outputs were k AGB maps over the study area. The mean value of the k AGB maps was used as the final AGB map, and we used the standard deviation (SD) of the k estimates for any given pixel to generate a prediction error (σ prediction ) map. This error also accounts for the representativeness of the sampling sites of the true distribution of AGB in the region [21]. The total SD (σ AGB ) was propagated, as explained in Rodriguez-Veiga et al. [29] and Saatchi et al. [21], as follows: where the measurement error (σ measurement ) of the tree level parameters averaged at the plot scale was assumed as 10% [78], the LiDAR error (σ LiDAR ) was assumed as 1.5% (from a 0.11-m instrument error for an average 7.13 m canopy height), the allometric error (σ allometry ) was assumed to be 49.83% (Scolforo et al. [68]), and the sampling error (σ sampling ) from the variability of AGB within the pixels was estimated as 12.39% based on Réjou-Méchain et al. [79]. Our k-fold cross-validation assessment followed James et al. [80] and involved the calculation of the R 2 , RMSE, rel. RMSE, and the mean bias error (MBE). Aside from the overall assessment, we also analysed the errors by biomass ranges. Finally, we also compared our AGB estimates to four other studies [21][22][23]81] (Table 2). Baccini et al. [23] used multi-sensor satellite data to estimate the aboveground live woody vegetation carbon density for pantropical ecosystems with a 500-m resolution. Saatchi et al. [21] mapped the total carbon stock in live biomasses (above-and belowground) in the tropics using a combination of data from in situ inventory plots and satellite LiDAR samples of the forest structure to estimate the carbon storage, plus optical and microwave imagery (1 km resolution), to extrapolate over the landscape. Santoro et al. [81] estimated AGB globally at a 100-m resolution by combining spaceborne SAR, LiDAR, and optical observations for the year 2010, with auxiliary datasets from forest inventories-namely, additional remote-sensing observations, climatological variables, and ecosystems classifications. Avitabile et al. [22] combined two existing datasets of vegetation aboveground biomasses (AGB) into a pantropical AGB map at a 1-km resolution using an independent reference dataset of field observations and locally calibrated it using high-resolution, harmonised, and upscaled biomass maps. For the comparison of our results to these maps, we aggregated all of them to the same spatial resolution (1 km).  Table 3 shows the models tested in this study. The AGB = f(CHM, CC) model was the one that best explained the field-based AGB variations (Table 3, adj R 2 = 0.93, RMSE = 6.74 Mg ha −1 , or 13% of the mean biomass values) and, therefore, was used to predict the AGB over the whole LiDAR flight footprint. This predicted LiDAR AGB was used to generate training and test samples to run the RF algorithm to extrapolate the AGB to the EO datasets covering the entire Rio Vermelho watershed.

LiDAR-Derived AGB Map
Although the AGB = f(CC, CD, CHM) model resulted in a slightly smaller RMSE and a higher AIC than AGB = f(CHM, CC), both models showed the same R 2 (= 0.93) and RMSE (6.74 Mg ha −1 or rel. RMSE = 13%). We therefore opted for the AGB = f(CHM, CC) model, as it is more parsimonious and only uses two independent variables (Table 3 and Figure S1 in the Supplementary Materials). Table 3. Model comparison according to Akaike's information criterion corrected for a small sample size (AICc). Parameters for each model include: LogLik (log-likelihood), k (number of predictor variables), AICc (Akaike's information criterion corrected for a small sample size), ∆AICc (difference in AICc between the current and the best model), Adj R 2 (adjusted coefficient of determination), and RMSE (root mean square error). The intercept and coefficients, as well as the coefficient of determination (R 2 ) for each model, are also presented. CHM = canopy height model, CD = canopy density, and CC = canopy cover.

AGB and Uncertainty Map
The AGB varied from 0 to 90 Mg ha −1 per pixel (Figure 3), whereas the pixel-scale uncertainty estimated by the error propagation approach ranged from 0 to 49 Mg ha −1 (Figure 4). Figure 5 shows the averaged variable importance analysis across the k-fold procedure for each set of variables derived from Landsat 8 (L8) and ALOS-2/PALSAR-2 (ALOS) included in the random forest (RF) model. These results indicated that the Landsat reflectances composite was the most important set of variables when predicting AGB. However, the ALOS indices contain more unique information not represented by the other variables, as shown by the largest decrease in accuracy when the variable was excluded. Figure 6 shows the overall accuracy assessment and Table 4 the assessment by AGB range.
The accuracy assessment between the AGB map predicted from EO and the AGB reference from LiDAR showed R 2 = 0.89, RMSE = 7.58 Mg ha −1 , rel. RMSE = 43%, and bias = 0.43 Mg ha −1 (Figure 6). Our map shows an underestimation of very high AGB (negative bias) and a slight overestimation of low AGB (positive bias) [28] (Table 4). We also found a RMSE of 6.39 Mg ha −1 (rel. RMSE = 61%) for AGB lower than 50 Mg ha −1 and a RMSE = 13.41 Mg ha −1 (rel. RMSE = 19%) for AGB higher than 50 Mg ha −1 . Details of the accuracy assessment by biomass range are shown in Table 4.
the averaged variable importance analysis across the k-fold procedure for each set of variables derived from Landsat 8 (L8) and ALOS-2/PALSAR-2 (ALOS) included in the random forest (RF) model. These results indicated that the Landsat reflectances composite was the most important set of variables when predicting AGB. However, the ALOS indices contain more unique information not represented by the other variables, as shown by the largest decrease in accuracy when the variable was excluded. Figure 6 shows the overall accuracy assessment and Table 4 the assessment by AGB range.     The accuracy assessment between the AGB map predicted from EO and the AGB reference from LiDAR showed R 2 = 0.89, RMSE = 7.58 Mg ha −1 , rel. RMSE = 43%, and bias = 0.43 Mg ha −1 (Figure 6). Our map shows an underestimation of very high AGB (negative bias) and a slight overestimation of low AGB (positive bias) [28] (Table 4). We also found a RMSE of 6.39 Mg ha −1 (rel. RMSE = 61%) for AGB lower than 50 Mg ha −1 and a RMSE = 13.41 Mg ha −1 (rel. RMSE = 19%) for AGB higher than 50 Mg ha −1 . Details of the accuracy assessment by biomass range are shown in Table 4.  The accuracy assessment between the AGB map predicted from EO and the AGB reference from LiDAR showed R 2 = 0.89, RMSE = 7.58 Mg ha −1 , rel. RMSE = 43%, and bias = 0.43 Mg ha −1 (Figure 6). Our map shows an underestimation of very high AGB (negative bias) and a slight overestimation of low AGB (positive bias) [28] (Table 4). We also found a RMSE of 6.39 Mg ha −1 (rel. RMSE = 61%) for AGB lower than 50 Mg ha −1 and a RMSE = 13.41 Mg ha −1 (rel. RMSE = 19%) for AGB higher than 50 Mg ha −1 . Details of the accuracy assessment by biomass range are shown in Table 4.   [21][22][23]81]. Although these maps were estimated using different methods and at different resolutions over the pantropical area, it is interesting to observe how the different data products compared to each other. While this study and Avitabile et al. [22] presented probability distribution functions skewed towards low biomass ranges, other studies [21,23,81] showed distributions tending towards much higher AGB levels, with averages two to three times larger than those found in this study and by Avitabile et al. [22] (Figure 7). Figure 8 shows that the datasets from [21,23,81] did not adequately represent all the variations that exist in the area due to the lower spatial resolution and, also, generally showed a greater overestimation of the AGB. Although Santoro et al. [81] provided more details in terms of the distribution of AGB in comparison with the other three maps, their map still overestimated the AGB compared to our results. In the map developed in this study, one can observe the AGB variation in detail, which is corroborated by the observed high coefficient of determination (R 2 = 0.89) between the AGB reference datasets and the final AGB map.   [21][22][23]81]. Although these maps were estimated using different methods and at different resolutions over the pantropical area, it is interesting to observe how the different data products compared to each other. While this study and Avitabile et al. [22] presented probability distribution functions skewed towards low biomass ranges, other studies [21,23,81] showed distributions tending towards much higher AGB levels, with averages two to three times larger than those found in this study and by Avitabile et al. [22] (Figure 7). Figure 8 shows that the datasets from [21,23,81] did not adequately represent all the variations that exist in the area due to the lower spatial resolution and, also, generally showed a greater overestimation of the AGB. Although Santoro et al. [81] provided more details in terms of the distribution of AGB in comparison with the other three maps, their map still overestimated the AGB compared to our results. In the map developed in this study, one can observe the AGB variation in detail, which is corroborated by the observed high coefficient of determination (R 2 = 0.89) between the AGB reference datasets and the final AGB map.

Discussion
In 2014, Brazil submitted a greenhouse gas (GHG) emission report (Forest Reference Emission Level (FREL) for Reducing Emissions from Deforestation for REDD+ under the United Nations Framework Convention on Climate Change (UNFCCC)) [82] as part of the country's commitment to reduce GHG emissions from deforestation in the Amazon (FREL Amazonia). Brazil indicated in that report that its national report would be the sum of the FRELs for each of the six Brazilian biomes. The FREL report for the Cerrado biome (FREL Cerrado), together with the FREL Amazonia, showed that emissions induced by land use changes accounted for approximately 73% of the emissions in Brazilian territory [82]. This shows the importance of the Cerrado biome for REDD+ policies and for the efforts of climate change mitigation in the country, such as the national inventories initiatives as part of the Third National Communication of Brazil to the UNFCCC [63].
The reduced number of studies on the quantification and spatial variability of the total Cerrado AGB stocks limits a full understanding of CO2 emission patterns in this biome. These are especially critical for local and national climate governance strategies. Appropriate management and mitigation models will depend on the accuracy of this information. We produced the first high-resolution continuous map of AGB in an area of the Brazilian Cerrado, making use of a combination of vegetation inventory plots, airborne light detection and ranging (LiDAR) data, and satellite images (Landsat 8 and ALOS-2/PALSAR-2). Fieldwork in the Brazilian Cerrado is challenging, timeconsuming, and expensive, and existing field datasets still do not entirely represent the extent of the biome. An alternative to increasing the sample size is using LiDAR by upscaling the information from field plots (which ideally should be representative of the analysed ecosystem). This strategy has been used in recent studies in other forest types [83,84]. We increased our sample size thousands of times, reaching a very high R 2 of 0.93 in the AGB LiDAR estimation. This consistent result was essential to allow us to use LiDAR as a reference for our AGB modelling process using satellite images and machine learning.  [23] (500 m) (C), Avitabile et al. [22] (1 km) (D), and Saatchi et al. [21] (1 km) (E).

Discussion
In 2014, Brazil submitted a greenhouse gas (GHG) emission report (Forest Reference Emission Level (FREL) for Reducing Emissions from Deforestation for REDD+ under the United Nations Framework Convention on Climate Change (UNFCCC)) [82] as part of the country's commitment to reduce GHG emissions from deforestation in the Amazon (FREL Amazonia). Brazil indicated in that report that its national report would be the sum of the FRELs for each of the six Brazilian biomes. The FREL report for the Cerrado biome (FREL Cerrado), together with the FREL Amazonia, showed that emissions induced by land use changes accounted for approximately 73% of the emissions in Brazilian territory [82]. This shows the importance of the Cerrado biome for REDD+ policies and for the efforts of climate change mitigation in the country, such as the national inventories initiatives as part of the Third National Communication of Brazil to the UNFCCC [63].
The reduced number of studies on the quantification and spatial variability of the total Cerrado AGB stocks limits a full understanding of CO 2 emission patterns in this biome. These are especially critical for local and national climate governance strategies. Appropriate management and mitigation models will depend on the accuracy of this information. We produced the first high-resolution continuous map of AGB in an area of the Brazilian Cerrado, making use of a combination of vegetation inventory plots, airborne light detection and ranging (LiDAR) data, and satellite images (Landsat 8 and ALOS-2/PALSAR-2). Fieldwork in the Brazilian Cerrado is challenging, time-consuming, and expensive, and existing field datasets still do not entirely represent the extent of the biome. An alternative to increasing the sample size is using LiDAR by upscaling the information from field plots (which ideally should be representative of the analysed ecosystem). This strategy has been used in recent studies in other forest types [83,84]. We increased our sample size thousands of times, reaching a very high R 2 of 0.93 in the AGB LiDAR estimation. This consistent result was essential to allow us to use LiDAR as a reference for our AGB modelling process using satellite images and machine learning.
In our study, the RF algorithm proved to be a good strategy to estimate the AGB in the study area (Table 4 and Figure 6). This result can be attributed to the ability of these techniques to capture the nonlinearity present in the data, as they can approximate complex functions [85]. This is an important characteristic for the modelling of vegetation, such as ecological patterns governed by nonlinear and interactive processes at any ecological scale [85], since they usually present complex behaviours [28].
Current sources of AGB information for the Cerrado are the global and pantropical maps that have a limited spatial resolution (from 100 m to 1 km) and questionable high estimations for this biome (Figures 7 and 8) [2]. Figure 7 shows the distribution by AGB intervals, average, and total AGB observed from our study and the four global and pantropical maps over the Rio Vermelho watershed [21][22][23]81]. This study and the one conducted by Avitabile et al. [22] presented distributions with an average of 18.66 Mg ha −1 and 16.25 Mg ha −1 , respectively. On the other hand, Santoro et al. [81], Saatchi et al. [21], and Baccini et al. [23] showed distributions tending towards much higher AGB levels, with averages of 33.89 Mg ha −1 , 31.12 Mg ha −1 , and 58.59 Mg ha −1 , respectively. Saatchi et al. [21] and Baccini et al. [23] found the distributions of the AGB tending toward a normal distribution for the study site, which was not the case for the map produced by Avitabile et al. [22], Santoro et al. [81], and our study. Morandi et al. [2] recently calculated AGB values for the Cerrado biome using a large independent dataset of field plots. The study estimated an average of 20.4 ± 6.0 Mg ha −1 for the central areas of the Cerrado biome, where this watershed is located, which agrees with our study results. However, global and pantropical studies showed significantly higher AGB levels in comparison to Morandi et al. [2], our study (Figure 7), and to the AGB values measured in our field plots. We also observed the tendency of these products to overestimate the AGB on the lower AGB ranges, which results from the models attempting to compensate for the underestimation at the higher AGB ranges due to the potential signal saturation [28]. This agrees with previous studies that have indicated the inconsistency of these products when compared in several forest biomes [86,87]. Our regional AGB map (Figure 3) takes the local vegetation structure of the Brazilian Cerrado into account and has a spatial resolution of 30 m. Overall, the relationship between our reference data and the remote-sensing variables is relatively high (R 2 = 0.89), with a root mean square error (RMSE) of 7.58 Mg ha −1 . The accuracy assessment showed that our map underestimates high AGB levels and slightly overestimates low AGB levels (Table 4). This result fully meets the biomass product accuracy requirements set by the BIOMASS mission (RMSE < 10 Mg ha −1 for AGB < 50 Mg ha −1 and rel. RMSE < 20% for AGB > 50 Mg ha −1 ) [56], with a RMSE of 6.39 Mg ha −1 for AGB < 50 Mg ha −1 and a rel. RMSE = 19% for values > 50 Mg ha −1 . It is important to highlight that we were interested in a representative range for cerradão AGB, including transitions to other vegetation types.
When analysing the statistical indicators of the goodness-of-fit (RMSE and R 2 ) ( Table 4) using the RF technique, a higher accuracy was achieved for the AGB range > 40 Mg ha −1 . However, even though we are confident in our AGB estimations, we have to be cautious with the estimations of AGB values < 19 Mg ha −1 , areas that are not covered by woody vegetation (e.g. grasslands, agriculture, and bare soil), as our field-to-LiDAR AGB model was developed using field plots located only in woody vegetation areas. In Figure 6, we can observe an underestimation of the highest AGB ranges (AGB > 80 Mg ha −1 ), which is also reflected in the negative bias for this high AGB level ( Table 4). The range of predictions random forest regression can make is bound by the highest and lowest values in the training data. This can lead to overestimations in the lower value range and underestimations in the higher range [88]. The scatterplot between the satellite AGB estimates from our results and the LiDAR AGB showed this effect with the slight underestimation of high AGB (Table 4 and Figure 6). This effect was also observed in the studies of Nunes and Görgens [88] and Zhang and Lu [89].
It is reported that machine-learning techniques can provide more accurate estimates than classical regression models [88]. Silva et al. [90] evaluated the use of machine-learning techniques and mixed models to estimate the volume and AGB of individual trees in the Brazilian Cerrado. The machine-learning techniques presented, and mixed-effect models, showed similar and highly accurate results (R 2 = from 0.97 to 0.99 and RMSE = from 12% to 13%) [58]. According to Silva et al. [90], because the modelling of forest resources commonly presents complex relationships among the variables, nonlinear mixed-effects modelling (NLME) and machine-learning techniques such as RF, adaptive network-based fuzzy inference systems (ANFIS), and artificial neural networks (ANNs) may be good alternative modelling techniques. In our study, using RF, we produced an AGB map of an area in the Brazilian Cerrado with a R 2 of 0.89 and RMSE of 7.58 Mg ha −1 . Previous studies tried to estimate the AGB in savannas using satellite data and reached performances lower than the results from our work [54,91,92].
Over the last decades, several studies have used SAR, especially ALOS PALSAR L-band radar images, to estimate the AGB of tropical savannas. Mermoz et al. [93] used ALOS PALSAR data to map the AGB in savanna ecosystems in Cameroon. They argued that L-band PALSAR mosaics are suitable for the retrieval of savanna AGB (typically less than 100 Mg ha −1 ) at the national and continental scales. In a similar study, Carreiras et al. [27] tested a combination of field data and ALOS PALSAR backscatter intensity to reduce the uncertainty in the estimation of vegetation AGB in the Miombo savanna woodlands of Mozambique (East Africa). They applied a machine-learning algorithm, resulting in a good fit and accurate model (R 2 = 0.95 and RMSE = 5.03 Mg ha −1 ). However, since they used bootstrap samples in combination with a cross-validation procedure, the reported cross-validation statistics could be overoptimistic [22]. A combination of datasets from different sources, such as in this study, proved to be efficient when the goal was to reduce uncertainties in the AGB estimates. Recent studies have explored the combination of different data types. Braun et al. [91] used passive and active microwaves to estimate the AGB of savannas. They introduced the integration of passive brightness temperature as an additional variable for AGB estimation, based on the hypothesis that it contains information complementary to the microwave backscatter coefficient from active sensors.
As we found in our work, many studies have shown the advantages of combining optical and SAR datasets using machine-learning techniques such as RF for AGB estimations. The optical data are sensitive to the biophysical properties of the vegetation, and radar data are more sensitive to the electrical and geometrical information of the vegetation-more specifically, the moisture content and vegetation structure. Forkuor et al. [46] showed in a recent study that combining field inventory data with Sentinel 1 (SAR) and Sentinel 2 (optical) to estimate the AGB in the West African dryland forest (tropical savanna and woodlands) using a RF algorithm performed much better than either one of them alone, reaching a R 2 of 0.90. This corroborates our results, which have also shown an improvement in accuracy using the combination of SAR and optical datasets rather than using either one individually. Our study also showed the highest performance (R 2 = 0.89) when all variables were included. The variable importance analysis ( Figure 5) showed a slightly superior R 2 for Landsat 8 reflectance data (R 2 = 0.84) in comparison to ALOS−2 backscatter data (R 2 = 0.79) when mapping the AGB using a single set of data. Landsat 8 indices seem to have an almost identical performance to the Landsat 8 reflectance data, and there is no significant drop in performance when the indices are excluded ( Figure 6). Conversely, the exclusion of the ALOS indices from the whole set results in the largest drop of R 2 (i.e., −0.07), despite having the lowest performance when used alone (R 2 = 0.51). This agrees with previous studies that found a higher contribution of the optical reflectance data to estimate AGB, potentially due to the sensitivity of optical data to shadow and moisture [23,29,94], which can be a key factor in sparse vegetation such as Cerrado. Additionally, the relatively low AGB levels in the study area might provide a level playing field with L-band SAR data, which usually presents a sensitivity of the signal to higher AGB levels than optical data [95][96][97][98][99]. Our study highlighted the need to use remote sensing in combination with local vegetation inventories to effectively quantify the spatial variation of AGB in ecosystems of the Brazilian Cerrado. Future high-resolution maps of ABG will likely be more useful to quantify carbon emissions from Cerrado degradations at the local and regional scales. The methodology presented here has the potential to be used to generate the first of its kind AGB map of the entire Brazilian Cerrado, which is often neglected in carbon stock assessments of the South American continent.

Conclusions
We provided the first high-resolution map of the AGB (30-m resolution) of a Brazilian Cerrado area using a combination of field inventory plots, airborne light detection and ranging (LiDAR) data, and satellite images (Landsat 8 and ALOS−2/PALSAR−2). We used random forest (RF) models and jackknife analyses to study the importance of remote-sensing variables to quantify the AGB of cerradão at large scales. Overall, the relationship between the reference data and remote-sensing variables is strong (R 2 = 0.89), with a root mean square error (RMSE) of 7.58 Mg ha −1 . Spatially, our map slightly underestimated and overestimated the values of the AGB in few areas of the savanna (bias = 0.43 Mg ha −1 ). However, this spatial bias is similar to other AGB maps. Our study highlights the need to use remote sensing in combination with local field inventories to effectively quantify the spatial variation of AGB in the ecosystems of the Brazilian Savanna.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/17/2685/s1: Figure S1: One-on-one relationship between the selected airborne LiDAR predictors (CC (%): canopy cover; and CHM (m): canopy height model) and the mean AGB values measured in the field plots (Mg ha-1). Figure S2 to Figure S5: Field photos of natural vegetation found in the Rio Vermelho watershed. Figure S6 to Figure S20: LiDAR returns.