Non-Parametric Retrieval of Aboveground Biomass in Siberian Boreal Forests with ALOS PALSAR Interferometric Coherence and Backscatter Intensity

The main objective of this paper is to investigate the effectiveness of two recently popular non-parametric models for aboveground biomass (AGB) retrieval from Synthetic Aperture Radar (SAR) L-band backscatter intensity and coherence images. An area in Siberian boreal forests was selected for this study. The results demonstrated that relatively high estimation accuracy can be obtained at a spatial resolution of 50 m using the MaxEnt and the Random Forests machine learning algorithms. Overall, the AGB estimation errors were similar for both tested models (approximately 35 t·ha⁻¹). The retrieval accuracy slightly increased, by approximately 1%, when the filtered backscatter intensity was used. Random Forests underestimated the AGB values, whereas MaxEnt overestimated the AGB values.


Introduction
Aboveground biomass (AGB) is an important variable in carbon accounting and climate science. In particular, forest AGB is relevant because forests constitute approximately 70%-90% of the Earth's aboveground biomass [1].
The AGB is defined as the mass of living organic matter growing above ground level per unit area at a particular time. The difference in AGB over time allows for measurement of carbon sequestration (excluding root growth) and carbon emission from deforestation, forest degradation, and forest fires. The estimation of AGB in the boreal forest is of special concern as it constitutes the largest biome in the world and has substantial carbon accumulation capability. Russia, as the country with the largest forested area in the world (809 million ha [2]), provided more than 90% of the carbon sink of the world's boreal forests between the years 2000 and 2007 [3]. Despite this importance, Russia's boreal forest has the highest uncertainty in carbon stock calculations [4,5]. This is mostly due to poor measurements of biomass stocks, forest degradation, deforestation, and forest growth. Additionally, due to the lack of financial support, some forested regions in Siberia have not been inventoried for more than 20 years [6]. Therefore, there is a strong need for Earth observation-based methods to reduce costs and improve biomass estimations.
The most common method of measuring AGB is estimation from field measurements, such as stem diameter and tree height, using allometric models. However, due to the sampling nature of the field measurements and their high acquisition costs, they can only be collected over small areas. Satellite technology together with reliable in situ measurements allows for accurate and relatively cost-efficient wall-to-wall AGB estimates.
There are different remote sensing techniques for AGB retrieval. Several publications provide a comprehensive review of the use of remote sensing techniques for biomass estimation, including sourcebooks of recommended methods and data sources [7-16]. Estimates using optical sensors are feasible at low biomass levels using vegetation indices, the bidirectional reflectance distribution function (BRDF), and texture [17-22]. The latest results of biomass estimation using Landsat data showed that an accuracy of ±36% can be achieved in the boreal zones [23]. The most accurate estimates are obtained from airborne light detection and ranging (LiDAR) systems. The only archive data from a satellite profiling LiDAR for measuring and monitoring vegetation are from the Geoscience Laser Altimeter System (GLAS) on board the Ice, Cloud, and land Elevation Satellite (ICESat). However, the data have some limitations related to the large footprint, sparse coverage, and sensitivity to terrain variability [24-28].
Another technique that has potential for forest AGB estimation is synthetic aperture radar (SAR). Similar to LiDAR sensors, radar systems are sensitive to the geometrical properties of observed objects; in contrast to profiling LiDAR, SAR platforms are imaging sensors. SAR is an active system that transmits microwave energy at wavelengths ranging from 3.1 cm (the X band) to 23.6 cm (the L band). Longer wavelengths, i.e., L-band, are preferable because they penetrate deeper into the forest canopy and the radar signal saturates at higher biomass levels [29-31]. In particular, SAR data with a long wavelength, horizontal and vertical polarizations, interferometric capabilities, and a global acquisition strategy are of great value for biomass retrieval. Data from such sensors are already available as archive data from the L-band Japanese Earth Resources Satellite 1 (JERS-1) and the Advanced Land Observing Satellite (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR), or as new acquisitions from ALOS-2 PALSAR-2. Moreover, new SAR missions are planned that will ensure data continuity, e.g., the European Space Agency's P-band (68.9 cm) BIOMASS mission [32] and the UK NovaSAR S-band mission [33].
Table 1 [50-52,55-60] presents a summary of the SAR retrieval statistics reported in the literature over Russian boreal forests, the same type of forest that was used in this study. The estimation error is represented as the root mean square error (RMSE) and the relative RMSE, i.e., the RMSE divided by the mean GSV or AGB.
Table 1. Summary of previous studies of growing stock volume (GSV) or aboveground biomass (AGB) estimation over Siberian boreal forests using SAR data. The estimation error is given as the root mean square error (RMSE) and the relative RMSE, i.e., the RMSE divided by the mean GSV or AGB.
The study presented in this paper is based on multi-temporal ALOS PALSAR L-band backscatter intensity and coherence data. Both types of data were used as explanatory variables for AGB retrieval at a local scale of 0.25 ha. The AGB was estimated using two non-parametric machine learning algorithms: maximum entropy (MaxEnt) and Random Forests. Both models are popular in applied research. The MaxEnt approach [61] is particularly popular in species distribution modeling [62,63]. However, the method has also been successfully implemented for AGB estimation at regional and global scales [60,64]. Random Forests [65] are widely used for classification in ecology [66-68] as well as for AGB estimation [27,58,69-74]. Random Forests were found to be superior to other methods such as support vector machines (SVM), k-nearest neighbour (KNN), Gaussian processes (GP), and stepwise linear models [75]. So far, Random Forests have not been compared with the MaxEnt approach.
In summary, the aim of this paper is to: 1. use SAR L-band backscatter and coherence data synergistically to improve AGB estimation at a local scale; 2. compare AGB retrieval results using two recently popular machine learning algorithms.

Study Site
The study site is located in Krasnoyarskiy Kray in the southern part of Central Siberia, Russia, approximately 120 km northeast of the city of Krasnoyarsk, and is part of the Bolshe Murtinsky forest enterprise (center coordinates: 57°12′N and 93°49′E, Figure 1). The area is characterized by a continental climate with long, severe winters and short, warm, and wet summers. From mid-October until the beginning of April, the mean temperature is approximately −15 °C; in summer the mean temperature is approximately +15 °C. The annual precipitation is below 450 millimeters.
The research area covers almost 2000 km². The area is mainly characterized by needleleaf, coniferous forests. The dominant trees are pine, spruce, fir, and larch. The main disturbances are logging activities and fire events. The study site is characterized by gentle topography with heights from 90 m to 572 m above sea level (a.s.l.) and an average height of 243 m a.s.l. The slopes range from 0° to 54° (riverside), with a mean value of 6°.

Available Data
The ALOS PALSAR L-band data were used as the explanatory variables. The data were delivered in Single Look Complex (SLC) Level 1.1 format and were provided by the Japan Aerospace Exploration Agency (JAXA) within the third phase of the Kyoto and Carbon Initiative [76,77]. In total, 19 scenes were available for this study area (Table 2). The weather data were downloaded from http://www.sibessc.uni-jena.de/ [78].
Ten scenes were acquired between 2006 and 2011 in fine beam single (FBS) mode and nine scenes were obtained in fine beam dual (FBD) mode. In the FBS mode, the data were collected in horizontally transmitted and horizontally received (HH) polarization; in the FBD mode, the data were collected in HH and in horizontally transmitted and vertically received (HV) polarizations.
The spatial resolution of the single-look image is 9.37 m in range and 3.14 m in azimuth for data acquired in the FBD mode and 4.68 m in range and 3.14 m in azimuth for data acquired in the FBS mode. The acquisition angle is 34.3°.
The forest inventory data were used as the dependent variable. The field data were provided by the Russian State Forest Inventory within the SIBERIA project [79]. The inventory dates back to 1998. Because of the time difference between the inventory data and the SAR data acquisitions (>10 years), the field data were updated using semi-empirical phytomass models [80] and growth (yield) tables [81].
The original data were gathered in the framework of the Russian forest inventory and planning (FIP) and are available in GIS vector format. Attributes such as GSV, age, tree height, diameter at breast height, and species composition were provided. All information is given for a forest stand, i.e., a group of trees occupying a specific area that is uniform in species composition, size, age, and management strategy. In total, information about 1604 stands was available.

Forest Inventory Data
The available forest inventory data are for 1998. Therefore, the first step of AGB retrieval was an update of the reference data to the year 2010. The data update consisted of four stages. First, the forest stands that changed from forest to non-forest were excluded by visual interpretation using very high and high resolution optical data. Cloud-free images were selected from KOMPSAT-2 and RapidEye. The data were acquired from spring to autumn in 2010, 2011, and 2012. Then, from the resulting stands only those were selected in which at least 60% of the trees belong to a single species. The reason for this selection is that the growth (yield) tables were developed for dominant tree species. In the second step, the stands were used in the semi-empirical phytomass models. The basis for updating the old inventory data is the site index (SI). SI is defined as the edaphic and climatic characteristics of a site that have an impact on the growth and yield of a given tree species [82]. Usually SI classes are determined by the relationship between the mean tree height and the mean age of a stand. As the SI was not available in the original forest inventory data, it was calculated from the equation given in [83], in which $H_t$ represents tree height, $A$ denotes forest stand age, and $\Delta$ is the interval between site indexes.
In the case of Russian forests, the SI classes Ib, Ia, I-V, Va, and Vb are assigned numerical values, where the class with the lowest SI indicates the best site conditions for forest trees to grow. The site indexes II and III are dominant in the study area, with fir, birch, and aspen as the dominant species (Figure 2).

After calculating the SI, the growth rate was derived by implementing the Richards-Chapman growth function in polynomial quadratic form [81], in which $A$ denotes age and $c_i$ are parameters that have an ecological interpretation and depend on the site index (SI) [81]. After obtaining the GSV growth rate, the correction coefficient $GSV_{cc}$ was calculated from $GSV_{in\,situ}$, the GSV measured in the field, and $GSV_{model}$, the GSV calculated according to the models [81] for a particular site index and forest stand age. A new GSV was then derived using $A_{diff}$, the age difference between the old inventory and the reference year 2010, which equals 12 years.
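The update chain described above can be illustrated with a minimal sketch. The equations themselves are not reproduced in this text, so the sketch rests on explicit assumptions: the growth curve is taken as the cumulative Chapman-Richards form $c_1 (1 - e^{-c_2 A})^{c_3}$, the correction coefficient as the ratio of the field GSV to the modelled GSV at inventory age, and the updated GSV as that coefficient times the modelled GSV at age $A + A_{diff}$. The parameter values are hypothetical and are not taken from [81].

```python
import math

A_DIFF = 12  # years between the old inventory and the 2010 reference year (from the text)


def chapman_richards(age, c1, c2, c3):
    """Cumulative Chapman-Richards curve (assumed form): c1 * (1 - exp(-c2 * age)) ** c3."""
    return c1 * (1.0 - math.exp(-c2 * age)) ** c3


def update_gsv(gsv_in_situ, age, params, a_diff=A_DIFF):
    """Project a field-measured GSV forward by a_diff years.

    Assumption: the correction coefficient is the ratio of field GSV to modelled GSV
    at inventory age, and it is carried over unchanged to the projected age.
    """
    gsv_model_old = chapman_richards(age, *params)
    gsv_model_new = chapman_richards(age + a_diff, *params)
    gsv_cc = gsv_in_situ / gsv_model_old  # correction coefficient (assumed ratio form)
    return gsv_cc * gsv_model_new


# Hypothetical parameters for one site index class, for illustration only.
params = (300.0, 0.03, 2.0)
updated = update_gsv(gsv_in_situ=150.0, age=60, params=params)
print(f"updated GSV: {updated:.1f} m3/ha")
```

With a ratio-type correction, a stand that lies below the model curve at inventory time stays proportionally below it after the projection, which is one simple way to preserve stand-specific deviations from the yield table.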
Third, the GSV values were converted to AGB. Based on freely available in situ measurements of forest live biomass (phytomass) [83], a regional allometric model relating GSV to AGB was developed (Figure 3).
The final step of the reference data processing was to rasterize the inventory data to 50 m spatial resolution and then to erode the stands. An erosion of two pixels (100 m) was used to avoid border effects in the SAR products. Then, the stands were converted into points using the center of gravity (centroid) of the stand. The final updated AGB values ranged from 0 to 224 t·ha⁻¹, with a mean value of 98 t·ha⁻¹. The stand size after erosion varied from 0.5 to 130 ha, with a mean value of 16 ha, which corresponds to 64 pixels. The stand age varied from 20 to 290 years. In total, 641 stands remained for the investigations.

SAR Data Processing and Data Selection
The first pre-processing steps of the SAR data included calibration and multi-looking. In order to obtain data with square pixels of approximately 50 m spatial resolution, the following multi-looking factors (range × azimuth) were used: 6 × 15 for the FBS mode and 3 × 15 for the FBD mode. Thereafter, the SAR backscattering coefficient was calculated as $\gamma^0$, which includes a correction of the backscatter for the local incidence angle $\theta_i$ [84]. In this correction, $\sigma^0$ is the backscattering coefficient and $\theta$ is the incidence angle measured at mid-swath (34.3°). $A_{flat}$ and $A_{slope}$ represent the local and the true pixel area, respectively. The cosine of the local incidence angle $\theta_i$ corrects the radiometry of the backscatter for local slopes and converts the data from $\sigma^0$ to $\gamma^0$. This correction is known as topographic normalization based on the local incidence angle and pixel area.
In total, eight backscatter images acquired in FBD mode were generated. The data selection was based on the recommendations from previous studies, which reported that backscatter data acquired during summer and autumn were superior to winter acquisitions with regard to AGB/GSV retrieval [31,85,86].
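The multi-looking factors quoted in this section can be checked with a short calculation. The sketch below assumes that the single-look range values (4.68 m and 9.37 m) are slant-range spacings and converts them to ground range with the flat-terrain relation ground = slant / sin(incidence angle); this conversion is a standard approximation and is not stated explicitly in the text.

```python
import math

INCIDENCE_DEG = 34.3                         # mid-swath incidence angle from the text
AZIMUTH_M = 3.14                             # single-look azimuth spacing, both modes
SLANT_RANGE_M = {"FBS": 4.68, "FBD": 9.37}   # single-look range spacing (assumed slant range)
LOOKS = {"FBS": (6, 15), "FBD": (3, 15)}     # (range, azimuth) multi-looking factors


def multilooked_pixel(mode):
    """Return (ground-range size, azimuth size) in metres after multi-looking."""
    range_looks, azimuth_looks = LOOKS[mode]
    slant = SLANT_RANGE_M[mode] * range_looks
    ground = slant / math.sin(math.radians(INCIDENCE_DEG))  # flat-terrain assumption
    azimuth = AZIMUTH_M * azimuth_looks
    return ground, azimuth


for mode in ("FBS", "FBD"):
    ground, azimuth = multilooked_pixel(mode)
    print(f"{mode}: {ground:.1f} m ground range x {azimuth:.1f} m azimuth")
```

Under these assumptions both modes give pixels close to 50 m in ground range and about 47 m in azimuth, which is consistent with the "approximately 50 m" square pixels stated above.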
To reduce the speckle effect in the backscatter data, a filtering approach was implemented. The method was based on multi-temporal speckle filtering calculated according to [87] (Equation (8)), where $\langle I \rangle$ is the local mean value of the pixels in a window centered at $(x, y)$ in image $I$, $k = 1, \ldots, N$, $N$ is the number of multi-temporal images, and $I_i(x, y)$ is the intensity at position $(x, y)$ in image $i$. The output $J_k$ is the $k$-th image with the speckle that is uncorrelated between the images filtered out. In this study, $N$ was equal to 8 for the filtering of the intensity images and the window size was 5 × 5. A larger filtering window was not used because large multi-looking factors had already been applied.
The coherence images were calculated according to [88], where $\varphi$ is the phase and $E\{\}$ represents the expected value.
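The two estimators described above can be sketched in a few lines. Since the equations from [87] and [88] are not reproduced in this text, the forms used here are assumptions: the filter is the widely used multi-temporal form $J_k = \frac{\langle I_k \rangle}{N} \sum_{i=1}^{N} I_i / \langle I_i \rangle$, which matches the symbol definitions, and the coherence is the standard sample estimate $|\sum s_1 s_2^*| / \sqrt{\sum |s_1|^2 \sum |s_2|^2}$ without the phase-flattening term. All function names are hypothetical.

```python
import math


def local_mean(img, half=2):
    """Box mean in a (2*half+1) x (2*half+1) window, truncated at the image edges."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [img[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            out[r][c] = sum(window) / len(window)
    return out


def multitemporal_filter(stack, half=2):
    """Assumed form: J_k = (<I_k> / N) * sum_i (I_i / <I_i>), evaluated per pixel.

    stack: list of N co-registered intensity images (lists of rows, values > 0).
    half=2 gives the 5 x 5 window used in the text.
    """
    n = len(stack)
    means = [local_mean(img, half) for img in stack]
    rows, cols = len(stack[0]), len(stack[0][0])
    filtered = []
    for k in range(n):
        out = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                ratio_sum = sum(stack[i][r][c] / means[i][r][c] for i in range(n))
                out[r][c] = means[k][r][c] * ratio_sum / n
        filtered.append(out)
    return filtered


def coherence(s1, s2):
    """Sample coherence magnitude of two equally long lists of complex SLC samples."""
    cross = sum(a * b.conjugate() for a, b in zip(s1, s2))
    power = sum(abs(a) ** 2 for a in s1) * sum(abs(b) ** 2 for b in s2)
    return abs(cross) / math.sqrt(power) if power > 0 else 0.0
```

In this form each filtered image keeps its own local mean level, while the speckle term is averaged over all N acquisitions; the coherence helper returns values between 0 and 1, as described for the coherence images.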
Coherence measures the degree of correlation between two SAR images and takes values between 0 for total decorrelation and 1 for perfect correlation. Data processing consisted of co-registration at sub-pixel level (less than 0.05 pixel), common range and azimuth band filtering, and interferogram calculation and flattening. Coherence was estimated by spatial averaging within a two-dimensional window. In this study, an adaptive estimation window was used with a window size between 3 × 3 and 5 × 5. A larger window size was used in areas of low coherence in order to reduce the coherence bias. The resulting coherence was computed using the same number of looks as in the case of the backscatter data (range × azimuth: 6 × 15 for the FBS mode and 3 × 15 for the FBD mode). This number of looks significantly reduced the coherence overestimation for low coherence values, as described in [89]. Therefore, the coherence bias was calculated to be close to zero (for coherence equal to 0 the values were ~0.002). The coherence was generated for 19 SLC pairs (Table 3). The image acquired in 2010 was used as the master image; additionally, coherence was calculated for the data acquired in 2011. The perpendicular baselines $B_n$ were between 224 and 3829 m and thus shorter than the critical baselines of 14.7 and 7.3 km for the FBS and FBD modes, respectively. Five coherence images were considered for the AGB retrieval. These were selected according to the stable weather and environmental conditions during the data acquisition [50] and a simple visual interpretation. To avoid coherence variability due to topography, slopes greater than 5° were masked out (mainly riverside areas). The mask was applied to all SAR products used for AGB estimation.
The SAR images were geocoded and normalized using the Shuttle Radar Topography Mission (SRTM) 90 m digital elevation model version 4.1 [90,91]. The SAR pre-processing was performed with the GAMMA Interferometric SAR Processor [92]. The backscatter data were geocoded using the bicubic-log spline resampling method, while the coherence data were processed with the bicubic spline interpolation approach. The final spatial resolution of the SAR products was 50 m.
In addition to the backscatter and coherence data, the normalized ratio of the backscatter (linear scale) to the coherence, $R_n$, was calculated. The rationale behind the ratio calculation is purely statistical. The ratio was introduced to enhance the relation of backscatter to AGB and to reduce the number of potential outliers influencing the AGB retrieval error. In this approach, the coherence values $\gamma$ are considered as weighting factors for the backscatter $\gamma^0$ (linear scale) and strengthen the response over forested areas. The ratio calculation was performed using a reference coherence image with a temporal baseline of 46 days. The SLC data for the coherence calculation were acquired in winter 2010, namely on 5 January 2010 and 20 February 2010. This coherence was selected because it had the highest dynamic range of coherence values, resulting from optimal and similar environmental conditions during the data acquisitions, as well as a short perpendicular baseline $B_n$. The resulting ratio values over forested areas are relatively high compared to the values over non-forested and sparsely forested areas. For example, a backscatter value of 0.16 (linear scale) with a corresponding coherence value of 0.2 (dense forest) gives a ratio of approximately 0.8, whereas a backscatter value of 0.07 with a coherence value of 0.6 (non-forest) gives a ratio of approximately 0.1. In order to adjust the ratio values to a single scale, the values were normalized to range from 0 to 1. A normalized ratio was calculated between the available backscatter data (linear scale) acquired on 23 August 2010 and 8 October 2010 in both polarizations and the coherence between the acquisitions taken in 2010 (5 January and 20 February). The resulting range of calculated values is illustrated by box plots (Figure 4, plot A).
On each box the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the extreme values. Most of the values varied from 0 to approximately 0.2 and show an almost linear relationship with AGB up to approximately 60 t·ha⁻¹ and a non-linear relationship for higher AGB values (Figure 4B,C). The values close to the upper boundary represented small heterogeneous stands in the forest inventory data (<8 ha). In total, 13 SAR products were selected for AGB retrieval. Examples of the generated SAR products are presented in Figure 5.
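The backscatter-to-coherence ratio discussed in this section can be reproduced with a small sketch. The ratio itself follows the worked example in the text (0.16 / 0.2 and 0.07 / 0.6); the min-max scaling to the range 0 to 1 is an assumption, because the text does not say which normalization was applied. The function names and the `eps` guard are hypothetical.

```python
def db_to_linear(value_db):
    """Convert backscatter from decibels to linear (power) scale."""
    return 10.0 ** (value_db / 10.0)


def backscatter_coherence_ratio(gamma0_linear, coh, eps=1e-6):
    """Backscatter (linear scale) weighted by the inverse of coherence."""
    return gamma0_linear / max(coh, eps)  # eps avoids division by zero coherence


def min_max_normalize(values):
    """Scale values to [0, 1]; assumed normalization, not confirmed by the text."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


dense_forest = backscatter_coherence_ratio(0.16, 0.2)  # example pair from the text
non_forest = backscatter_coherence_ratio(0.07, 0.6)    # example pair from the text
print(round(dense_forest, 2), round(non_forest, 2))
```

Low coherence over dense forest amplifies the already higher backscatter, which is why the ratio separates forest from non-forest more strongly than backscatter alone.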

AGB Retrieval Models
In this study, two non-parametric data fusion machine learning algorithms were considered: maximum entropy (MaxEnt) and Random Forests. In both cases the updated forest inventory was used as the response data. Model training was done on 90% of the samples (577 samples), whereas 10% were used for independent validation (64 samples). For the selection of training and validation data, stratified sampling was implemented. Only 10% of the response data was reserved for independent validation because both algorithms already compute an unbiased model error from data randomly withheld from training: 25% in the case of MaxEnt and approximately one-third in the case of Random Forests.
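A stratified 90/10 split of this kind can be sketched as follows. The text does not state the stratification variable, so the sketch assumes strata defined by 20 t·ha⁻¹ AGB bins; the synthetic AGB values, the bin width, and the function name are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(samples, stratum_of, test_fraction=0.1, seed=0):
    """Split samples so that each stratum contributes about test_fraction to the test set."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sample in samples:
        strata[stratum_of(sample)].append(sample)
    train, test = [], []
    for key in sorted(strata):
        members = strata[key][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Synthetic stand list (id, AGB in t/ha) with the same size and range as in the text.
rng = random.Random(42)
stands = [(i, rng.uniform(0.0, 224.0)) for i in range(641)]

# Assumed strata: 20 t/ha AGB bins.
train, test = stratified_split(stands, stratum_of=lambda s: int(s[1] // 20))
print(len(train), len(test))
```

Because the test share is rounded per stratum, the totals only approximate the 577/64 split reported above; an exact split would need the remainder to be distributed across strata.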
The first approach was a model based on the MaxEnt algorithm. The model was run using the MaxEnt program for maximum entropy modeling version 3.3.3k (under the Java Runtime Environment). MaxEnt is an exponential model that can be compared with generalized linear models (GLM) and generalized additive models (GAM). The concept of the method is to estimate the probability distribution of maximum entropy constrained by a set of remote sensing variables. Through numerous iterations, the weights of these variables are adjusted to maximize the average sample likelihood (training gain). The weights are then used to estimate the distribution over the whole space for each of the AGB classes. In this study, the AGB reference data were divided into seven AGB classes, in t·ha⁻¹: 0-40, 40-60, 60-80, 80-100, 100-120, 120-140, and >140. The values were grouped such that each class could be represented equally in terms of occurrences. The class representation is illustrated as a histogram (Figure 6).

Then, the AGB value was calculated for each pixel using the maximum probability weighted average [64] (Equation (11)), where $i$ refers to the class number, $P_i$ is the MaxEnt probability, $AGB_i$ is the estimated average biomass of the class range, and $\widehat{AGB}$ is the predicted value of AGB for each pixel. As input for the MaxEnt algorithm, all available variables were used, and two groups of data were distinguished. The first contained unfiltered data, whereas the second contained backscatter data filtered according to Equation (8).
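The step from per-class probabilities to a single AGB value per pixel can be illustrated with a short sketch. It is an assumption-based illustration rather than the exact form of Equation (11): the class representatives are taken as class midpoints (with 182 t·ha⁻¹ for the open-ended >140 class, halfway to the maximum reference AGB of 224 t·ha⁻¹), and the optional exponent `power` is a hypothetical parameter that, when larger than 1, shifts the estimate toward the most probable class.

```python
# Assumed class representatives in t/ha for the seven AGB classes (midpoints).
CLASS_AGB = [20.0, 50.0, 70.0, 90.0, 110.0, 130.0, 182.0]


def weighted_agb(probabilities, class_agb=CLASS_AGB, power=1.0):
    """Probability-weighted average of class AGB values for one pixel.

    power=1 gives the plain weighted mean; power > 1 emphasizes the class
    with the highest probability (hypothetical option, see lead-in).
    """
    weights = [p ** power for p in probabilities]
    total = sum(weights)
    if total == 0.0:
        return float("nan")  # no class has any support for this pixel
    return sum(w * a for w, a in zip(weights, class_agb)) / total


# One pixel with MaxEnt probabilities for the seven classes (made-up values).
pixel = [0.05, 0.10, 0.20, 0.60, 0.30, 0.10, 0.05]
print(round(weighted_agb(pixel), 1), round(weighted_agb(pixel, power=3.0), 1))
```

A weighted average over a bounded set of class values can never fall below the lowest or rise above the highest class representative, so it tends to pull estimates toward the middle of the AGB range.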
In the MaxEnt program, the resampling of the data for each replication is done by bootstrapping, whereas jackknife tests are used to calculate the variable contribution. The jackknife tests are generated using the regularized gain and the AUC. The jackknife test is run in two configurations: one predictor is withheld and the model is refitted, and all predictors but one are withheld and the model is refitted. To determine the variable percent contribution, in each iteration of the training algorithm the increase in regularized gain was added to the contribution of the corresponding variable, or subtracted from it if the resulting value was negative. At the end, the values were converted to percentages.
The second approach was based on supervised Random Forests® regression, available as the randomForest package in the R software [93]. It is an ensemble learning method that constructs a large number of regression trees, randomly selecting the predictors, and then calculates the mean prediction over all individual trees. Each tree is constructed using a different bootstrap sample drawn from the input dataset. About one-third of the data are left out of the bootstrap sample and are not used in the construction of the tree. This sample is called out-of-bag (OOB) and is used to obtain an unbiased estimate of the retrieval error, the OOB error.
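The "one-third" out-of-bag share follows from sampling with replacement: the probability that a given sample is never drawn in n draws is (1 − 1/n)ⁿ, which approaches e⁻¹ ≈ 0.368 for large n. The short check below uses the training set size from this study (577 samples); the function names are hypothetical.

```python
import random


def oob_fraction_expected(n):
    """Probability that a sample is left out of a bootstrap sample of size n."""
    return (1.0 - 1.0 / n) ** n


def oob_fraction_simulated(n, seed=0):
    """Draw one bootstrap sample of size n and measure the left-out share."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1.0 - len(drawn) / n


n_train = 577  # training samples reported in the text
print(round(oob_fraction_expected(n_train), 4), round(oob_fraction_simulated(n_train), 4))
```

Each tree therefore has roughly 210 of the 577 training stands available as its own OOB check, which is what makes the OOB error an internal validation estimate.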
The randomForest package also provides measures for evaluating the importance of the different predictors in the model development. In this study, we used the measure that is computed by permuting each predictor variable in the OOB data for each tree. The differences between the MSE on the permuted OOB data and the MSE on the original OOB data are then averaged over all trees and normalized by the standard deviation of the differences.
In order to evaluate the accuracy and performance of the implemented models, the following quantitative measures were considered for the independent validation:

a. The root mean square error (RMSE) is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\widehat{AGB}_i - AGB_{ref}(i)\right)^2}$$

where $AGB_{ref}(i)$ represents the AGB reference value for stand $i$, $\widehat{AGB}_i$ the predicted AGB, and $n$ the number of AGB observations.

b. The corrected root mean square error ($\mathrm{RMSE}_{cor}$) is defined as:

$$\mathrm{RMSE}_{cor} = \sqrt{\mathrm{RMSE}_{Sat}^{2} - \mathrm{RMSE}_{Ref}^{2}}$$

where $\mathrm{RMSE}_{Sat}$ represents the root mean square error of the satellite-derived estimation of AGB and $\mathrm{RMSE}_{Ref}$ is the root mean square error of the forest inventory data. According to the manual on forest inventory and planning in Russian forests, the maximum error of GSV is expected to be 15% [94]. This value was also adopted for the AGB estimation because GSV is the main component of AGB. According to [95], GSV constitutes 73% of AGB at our test site. Additionally, relative RMSEs were calculated by dividing the RMSEs by the mean AGB and multiplying by 100%.

c. The bias is defined as the mean estimation error:

$$\mathrm{bias} = \frac{1}{n}\sum_{i=1}^{n}\left(\widehat{AGB}_i - AGB_{ref}(i)\right)$$

where positive values of bias express overestimation and negative values underestimation.

The models' predictive performance was evaluated using the pseudo R-squared (1 − MSE/variance) [93] in the case of Random Forests and the area under the receiver operating characteristic curve (AUC) [96] in the case of MaxEnt. The parameters were generated using the models' testing samples.
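The validation measures listed above can be sketched in a few lines of Python. Two points are assumptions rather than statements from the text: the corrected RMSE is taken here as the quadrature difference between the satellite and reference errors, and the pseudo R-squared uses the population variance of the reference values. The AGB values and the reference error of 6 t·ha⁻¹ are invented toy numbers.

```python
import math


def rmse(pred, ref):
    """Root mean square error between predicted and reference AGB."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def corrected_rmse(rmse_sat, rmse_ref):
    """Assumed form: remove the reference-data error in quadrature."""
    return math.sqrt(max(rmse_sat ** 2 - rmse_ref ** 2, 0.0))


def relative(value, ref):
    """Express an error as a percentage of the mean reference AGB."""
    return 100.0 * value / (sum(ref) / len(ref))


def bias(pred, ref):
    """Mean estimation error; positive values mean overestimation."""
    return sum(p - r for p, r in zip(pred, ref)) / len(ref)


def pseudo_r2(pred, ref):
    """1 - MSE / variance, with the population variance (an assumption)."""
    mean_ref = sum(ref) / len(ref)
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    variance = sum((r - mean_ref) ** 2 for r in ref) / len(ref)
    return 1.0 - mse / variance


# Toy values in t/ha, not data from the study.
ref = [50.0, 100.0, 150.0]
pred = [60.0, 90.0, 160.0]

rmse_sat = rmse(pred, ref)
rmse_cor = corrected_rmse(rmse_sat, rmse_ref=6.0)
print(rmse_sat, rmse_cor, relative(rmse_sat, ref), bias(pred, ref), pseudo_r2(pred, ref))
```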

MaxEnt Performance
As an output of MaxEnt, seven continuous probability distribution maps with pixel values from 0 to 1 were obtained, where 1 indicates a high predicted probability of belonging to the specified AGB class. The probability values were then used to calculate the final AGB map according to Equation (11).
To assess the performance of the MaxEnt algorithm, the AUC was calculated by bootstrapping 25% of the training data. The AUC was computed for each AGB class (Figure 7).
The AUC values are generally higher than 0.7, which shows that the model performed well. Only in the case of filtered data or the 60-80 t·ha⁻¹ AGB class was the AUC lower than 0.6. The highest predictive power of the model was observed for the high and the low AGB ranges. The mean AUC over all AGB classes was 0.76 and 0.77 for the unfiltered and filtered datasets, respectively.
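For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen sample belonging to an AGB class receives a higher predicted probability than a randomly chosen sample outside it. The sketch below computes it by pairwise comparison; the scores are invented, and the MaxEnt program derives its own AUC internally (against background samples), so this is an illustration rather than the procedure used in the study.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities for one AGB class.
in_class = [0.9, 0.8, 0.4]
out_of_class = [0.7, 0.3, 0.2, 0.1]

class_auc = auc(in_class, out_of_class)
print(f"AUC = {class_auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why class-wise values above 0.7 are read as good performance.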
To assess the importance of the predictors, the variable percent contribution was calculated. A high value indicates that the model depends more on that variable (Figure 8A,B). In the case of the dataset with the unfiltered backscatter, the variable whose omission decreased model performance the most was the coherence between data acquired on 17 February 2009 and 20 February 2010. For the filtered dataset, it was the ratio of backscatter in HH polarization acquired on 8 October 2010. Therefore, the coherence and the normalized ratio appear to contain the most information that is not present in the other variables. The backscatter acquired in HH polarization on 23 August 2010 and on 8 October 2010 seems to be of less importance for AGB prediction.

J. Imaging 2016, 2, 0001
The variable importance was also analyzed per AGB class. To better illustrate how the variable contribution changes among the seven AGB classes, the data were grouped into coherence, backscatter, and normalized ratio (Figure 9).
Figure 9 confirms that, in the case of unfiltered data, the most important variables were the coherence data, with a mean percent contribution of 51.2%. The coherence data were the most important in four AGB classes: 0-40, 40-60, 60-80, and >140 t·ha⁻¹. The mean percent contribution was 36.8% for the ratio products and 12% for the backscatter products. Therefore, the backscatter seems to provide the least information for MaxEnt.
When the filtered data were used, the most important group of data was the ratio layers, with a mean percent contribution of 47.7%. The ratio layers were the most important in two classes, i.e., AGB classes 80-100 and 100-120 t·ha⁻¹. The second most important type of data was the coherence products, with a mean percent contribution of 38.3%. They contributed most to the AGB retrieval for three AGB classes: 60-80, 120-140, and >140 t·ha⁻¹. The backscatter data provided the least information for the retrieval, with a mean percent contribution of 13.9%. In both plots, the rise in the importance of the ratio is accompanied by a decrease in the influence of the backscatter and coherence.

Random Forests Performance
The higher percentage of variance explained was obtained for the dataset with filtered backscatter, for which the pseudo R-squared was 38.4%. In the case of the unfiltered dataset, the pseudo R-squared was 36.4%. The values were calculated based on the testing sample (OOB sample). Overall, the Random Forests predictor importance ranking (Figure 10) revealed only small differences between the datasets with unfiltered and filtered backscatter data. The ranking showed that, of the 13 predictors, the normalized ratio between the backscatter in HV polarization acquired on 8 October 2010 and the coherence (5 January and 20 February 2010) was the most important for the retrieval of AGB, with values of 26% and 28.7% for the unfiltered and filtered datasets, respectively. The mean increase in MSE for all ratio products was 22.2% and 24.9%.
The second most important data group for both datasets was the backscatter products, with a mean increase in MSE of 17.4% and 17.9%. Random Forests suggested that the coherence products had the smallest influence on AGB retrieval, with mean values of 16.4% and 15.4% for the unfiltered and filtered datasets, respectively.

AGB Mapping Results
Figures 11 to 14 show SAR-derived AGB maps with a spatial resolution of 50 m. Each map presents AGB retrieval results expressed in t·ha⁻¹. The first two maps show AGB values derived by the MaxEnt algorithm; the other two were derived by Random Forests. A simple visual interpretation shows that the maps differ in the spatial variability of the AGB values. In the AGB maps generated by MaxEnt, the values seem to be more heterogeneous in both the high and the low biomass ranges.
The range and the mean value of the retrieved AGB for each map are given in Table 4. In the case of the MaxEnt algorithm, the AGB ranged from 0 to 140 (150) t·ha⁻¹, whereas the values computed by Random Forests were higher by 10 t·ha⁻¹. The mean values derived by MaxEnt were lower than 90 t·ha⁻¹, and those derived by Random Forests were greater than 95 t·ha⁻¹.
To better observe the differences in spatial distribution, difference maps between the updated forest inventory (in situ data) and the SAR-derived AGB were calculated (Figure 15). Green represents overestimation, whereas red represents underestimation. AGB values estimated correctly are shown in yellow.
In general, there are almost no differences between the maps generated using the unfiltered and filtered datasets. In the case of MaxEnt, the retrieved AGB values are mostly displayed in green, which means overestimation. Random Forests tends to underestimate, and its AGB values are displayed in red and orange. In both cases, overestimation can be seen at the borders of the stands, and underestimation is observed for stands with high AGB values.
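The logic of such a difference map can be sketched as follows. The sign convention (predicted minus reference, so that positive values mean overestimation) matches the bias definition used for validation; the tolerance of 10 t·ha⁻¹ for "estimated correctly", the function name, and the small grids are hypothetical choices made for illustration and are not taken from the study.

```python
def classify_difference(predicted, reference, tolerance=10.0):
    """Label each pixel as 'over', 'under', or 'ok' from predicted minus reference AGB."""
    labels = []
    for pred_row, ref_row in zip(predicted, reference):
        row = []
        for pred, ref in zip(pred_row, ref_row):
            diff = pred - ref
            if diff > tolerance:
                row.append("over")    # would be drawn in green
            elif diff < -tolerance:
                row.append("under")   # would be drawn in red
            else:
                row.append("ok")      # would be drawn in yellow
        labels.append(row)
    return labels


# Toy 2 x 2 grids in t/ha, not data from the study.
predicted = [[95.0, 60.0], [120.0, 30.0]]
reference = [[80.0, 62.0], [150.0, 30.0]]

labels = classify_difference(predicted, reference)
for row in labels:
    print(row)
```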

Validation
Table 5 summarizes the accuracies of the MaxEnt and the Random Forests AGB retrieval when using backscatter data (filtered and unfiltered), coherence images, and normalized ratio products. The corrected RMSE and the relative corrected RMSE were calculated for a training sample and an independent validation sample. It should be noted that, in the case of Random Forests, the difference of more than 10 t·ha⁻¹ between the corrected RMSEs calculated using the training and validation datasets could result from the small size of the validation sample.

Discussion
In general, the MaxEnt machine learning algorithm as well as the Random Forests regression approach provided good AGB retrieval results compared to previous studies [51,52,55,58,60]. Those analyses also used ALOS PALSAR L-band data for AGB/GSV retrieval over Siberian boreal forests. The researchers reported RMSEs between 33 and 87 m³·ha⁻¹ and 55 t·ha⁻¹ at the coarser scales, and 36.4 t·ha⁻¹ at the 0.25 ha scale. Those results are similar to or worse than those derived in this study. The results presented in this paper showed that a relatively high estimation accuracy (error down to 30%) can be obtained at a local scale. The AGB estimation showed only slightly better results when a dataset with filtered backscatter intensity was used.
In terms of model performance, MaxEnt performed better than Random Forests. The area under the receiver operating characteristic curve (AUC) was higher than 0.7, except for the AGB range from 60 to 80 t·ha⁻¹, whereas Random Forests reached a pseudo R² of 38.4%. MaxEnt generated AGB maps with predominantly overestimated AGB values, whereas Random Forests provided slightly underestimated values. The range of derived AGB values was underestimated by both models but was similar, differing by 10 t·ha⁻¹ between them. The mean AGB provided by Random Forests was comparable to the reference AGB mean, whereas the MaxEnt mean was approximately 10 t·ha⁻¹ lower. The estimation bias was lower in the case of Random Forests.
MaxEnt and Random Forests provided measures for evaluating the importance of the predictors used in the model construction. The variable rankings derived by the two algorithms did not agree. In the case of MaxEnt, the coherence together with the ratio products were the most important for model construction, whereas the ratio and backscatter products provided the most information in the case of modeling with Random Forests. In both cases, the ratio products seem to provide important information for AGB retrieval. Within the data groups, the most important variables for MaxEnt with the unfiltered dataset were the coherence between images acquired on 17 February 2009 and 20 February 2010, the normalized ratio between the backscatter from 23 August 2010 in HV polarization and coherence, and the backscatter from 23 August 2010 in HV polarization. For the filtered dataset, the most important variables within the data groups were the normalized ratio between the backscatter from 8 October 2010 in HH polarization and coherence, the coherence between images acquired on 17 February 2009 and 20 February 2010, and the backscatter image acquired on 23 August 2010 in HV polarization. In the case of Random Forests, the most important variables for both the unfiltered and filtered datasets were the normalized ratio between the backscatter acquired on 8 October 2010 in HV polarization and coherence, the backscatter acquired on 23 August 2010 in HH polarization, and the coherence between images from 2 January 2009 and 5 January 2010. The coherence images generated with a temporal baseline (B_t) greater than 46 days were superior to those derived with a shorter B_t.

Conclusions
In this study, we have demonstrated the feasibility of the synergistic use of backscatter and coherence for aboveground biomass (AGB) retrieval in the boreal forests of Siberia at a local scale of 0.25 ha. This research focused on the further exploitation of SAR data. The ALOS PALSAR L-band backscatter was combined with coherence, introducing the backscatter-coherence normalized ratio, which was developed based on statistical data analysis. In total, 13 variables were used for the AGB estimation. For the AGB retrieval, two popular machine learning algorithms were implemented. MaxEnt and Random Forests performed well, showing promising AGB estimations and demonstrating the models' robustness. The corrected RMSEs were between 35.8 and 36.4 t·ha⁻¹ and between 35.8 and 35.0 t·ha⁻¹ for MaxEnt and Random Forests, respectively. The estimation error slightly decreased, by approximately 1%, when the filtered backscatter intensity was used.
In this study, the retrieval of AGB using the SAR products was demonstrated only for unmanaged Siberian forests. It is expected that the estimation error over well-managed forests could be further reduced. An improvement in the estimation is also foreseen at the stand level due to the reduction of the spatial variability in the SAR data. Another issue that could influence the retrieval accuracy is the reference data. The authors updated the old inventory data using optical remote sensing data together with the yield tables and semi-empirical phytomass models recommended for Russian forestry and forest management. Unfortunately, it was not possible to fully validate the obtained values in the field; hence, the error in the reference data was only partially known.
The models provided different variable importance rankings. However, in both cases the normalized ratio products seem to contain important information for the model development. The coherence data were the most important in the low and high AGB ranges, whereas the ratio was most important for the middle to high AGB ranges. Thus, a strategy of using different datasets for the estimation of low, medium, and high AGB values could further increase biomass retrieval accuracy. It was observed that the contribution of the backscatter data to the model construction increased after filtering. In terms of the retrieved AGB values, the Random Forests algorithm provided a mean AGB estimate almost identical to the reference value. MaxEnt provided slightly overestimated AGB values, whereas Random Forests tended to underestimate the AGB values.
The MaxEnt and Random Forests machine learning algorithms demonstrated their potential for forestry applications, especially for estimations in remote areas, where often no information about AGB is available. The models could be used to provide AGB estimations with relatively low estimation errors. Moreover, the results generated for different time spans could easily be applied to AGB change monitoring, which is very important from a carbon accounting perspective.

Figure 1. Extent of the study area and the spatial distribution of the reference points. Background image acquired by the Landsat 5 TM satellite (data available from the U.S. Geological Survey Earth Explorer).

Figure 3. Allometric model relating AGB to GSV using in situ measurements of forest phytomass.

Figure 4. (A) Box plots for four calculated ratios between backscatter images (acquired on 23 August 2010 and 8 October 2010) and coherence acquired in winter 2010 (5 January and 20 February); (B,C) the ratio as a function of aboveground biomass (AGB).

Figure 6. Number of measurements in selected AGB class.

Figure 8. Predictor importance presented as percent contribution for (A) the dataset with unfiltered and (B) the dataset with filtered backscatter data.


Figure 10. Predictor importance presented as increase in MSE for (A) unfiltered dataset and (B) dataset with filtered backscatter data.

Figure 11. AGB map generated using the MaxEnt algorithm and a dataset with unfiltered backscatter data.

Figure 12. AGB map generated using the MaxEnt algorithm and a dataset with filtered backscatter data.

Figure 13. AGB map generated using the Random Forests algorithm and a dataset with unfiltered backscatter data.

Figure 14. AGB map generated using the Random Forests algorithm and a dataset with filtered backscatter data.
The latter were developed by the International Institute for Applied Systems Analysis (IIASA) in collaboration with the V.N. Sukachev Institute of Forest, Siberian Branch, Russian Academy of Sciences, and Moscow State Forest University. Those models and tables are recommended for use in forestry and forest management in Russia (Protocol of the Council of Federal Agency of Forest Management No. 2, dated 8 June 2006).

Table 2. Summary of SAR data available for the test site. The data acquisition time was approximately 16 GMT (10 PM local time in summer or 11 PM local time in winter). The weather parameters are given as a

Table 3. Summary of coherence images generated for the test site.

Table 4. Range of derived AGB.

Table 5. Validation of AGB retrieval results. RMSE values are given for training and validation samples (training/validation).

In the case of MaxEnt, the corrected RMSE was 36.4 t·ha⁻¹ (33.3 t·ha⁻¹ for the training sample) and 35.8 t·ha⁻¹ (28.7 t·ha⁻¹ for the training sample) for the unfiltered and filtered datasets, respectively. Overall, an estimation error of 39.5% (34.3% for the training sample) was calculated for the unfiltered dataset and 38.8% (29.6% for the training sample) for the dataset with filtered backscatter data. A bias of 5.2 t·ha⁻¹ (12 t·ha⁻¹ for the training sample) was calculated in the case of unfiltered data and of 4.3 t·ha⁻¹ (6.9 t·ha⁻¹ for the training sample) in the case of the dataset with filtered backscatter data.

The Random Forests results show a similar estimation error. The corrected RMSE was 35.4 t·ha⁻¹ (21.6 t·ha⁻¹ for the training sample) and 35.0 t·ha⁻¹ (21.3 t·ha⁻¹ for the training sample) for the unfiltered and filtered datasets, respectively. The relative corrected RMSEs were 38.4% (22.3% for the training sample) for the unfiltered dataset and 36.9% (22.0% for the training sample) for the filtered dataset. A bias of −4.4 t·ha⁻¹ (3.2 t·ha⁻¹ for the training sample) and of −4.5 t·ha⁻¹ (0.9 t·ha⁻¹ for the training sample) was calculated in the case of unfiltered and filtered data, respectively.