Machine-Learning Applications for the Retrieval of Forest Biomass from Airborne P-Band SAR Data

Abstract: This study aimed at evaluating the potential of machine learning (ML) for estimating forest biomass from polarimetric Synthetic Aperture Radar (SAR) data. Retrieval algorithms based on two different machine-learning methods, namely Artificial Neural Networks (ANNs) and Support Vector Regressions (SVRs), were implemented and validated using the airborne polarimetric SAR data derived from the AfriSAR, BioSAR, and TropiSAR campaigns. These datasets, composed of polarimetric airborne SAR data at P-band and corresponding biomass values from in situ and LiDAR measurements, were made available by the European Space Agency (ESA) in the framework of the Biomass Retrieval Algorithm Inter-Comparison Exercise (BRIX). The sensitivity of the SAR measurements at all polarizations to the target biomass was evaluated on the entire set of data from all the campaigns, and separately on the dataset of each campaign. Based on the results of the sensitivity analysis, the retrieval was attempted by implementing general algorithms, using the entire dataset, and specific algorithms, using the data of each campaign. Algorithm inputs are the SAR data and the corresponding local incidence angles, and the output is the estimated biomass. To allow the comparison, both ANN and SVR were trained using the same subset of data, composed of 50% of the available dataset, and validated on the remaining part of the dataset. The validation of the algorithms demonstrated that both machine-learning methods were able to estimate the forest biomass with comparable accuracies. In detail, the validation of the general ANN algorithm resulted in a correlation coefficient R = 0.88, RMSE = 60 t/ha, and negligible BIAS, while the specific ANNs obtained R from 0.78 to 0.94 and RMSE between 15 and 50 t/ha, depending on the dataset. Similarly, the general SVR was able to estimate the target parameter with R = 0.84, RMSE = 69 t/ha, and negligible BIAS, while the specific algorithms obtained 0.22 ≤ R ≤ 0.92 and 19 ≤ RMSE ≤ 70 t/ha. The study also pointed out that the computational cost is similar for both methods. In this respect, the training is the only time-demanding part, while applying the trained algorithm to the validation set or to any other dataset occurs in near real time. As a final step of the study, the ANN and SVR algorithms were applied to the available SAR images to obtain biomass maps.


Introduction
Forests act as one of the main terrestrial carbon sinks [1]. Monitoring forest changes and estimating forest biomass are therefore mandatory for several applications, including studies on global changes, natural disaster prevention, and management of forest resources. The possibility to observe forests from data. SVR was successfully applied to the retrieval of surface parameters by the authors in [37], where this method was also compared to ANN. In this study, two algorithms for the retrieval of forest biomass, based on ANN and SVR, respectively, were implemented and validated using P-band polarimetric airborne SAR data acquired by the ESA during the BioSAR, AfriSAR, and TropiSAR campaigns [38][39][40][41][42][43]. Inputs of both algorithms are the polarimetric SAR backscattering, and output is the biomass. Considering the dependence of SAR measurements on the acquisition geometry [44][45][46][47], both algorithms have the local incidence angle (LIA) as additional input.
The paper is organized as follows. Section 2 gives an overview of the test areas and the datasets considered in this study. The ANN- and SVR-based retrieval algorithms are presented in Section 3. The sensitivity of SAR measurements to forest biomass is described in Section 4. The algorithm validation is presented in Section 5, the obtained results are discussed in Section 6, and conclusions and future work are presented in Section 7.

Experimental Sites and SAR Data
The entire dataset made available by the ESA in the framework of the Biomass Retrieval Algorithm Inter-Comparison Exercise (BRIX) [48] was considered for implementing, training, and validating both the ANN and SVR algorithms. BRIX is an initiative promoted by the ESA that is intended to intercompare different approaches for retrieving forest biomass from P-band fully polarimetric SAR sensors. The BRIX dataset was composed of the P-band polarimetric SAR data acquired during the airborne experiments carried out in preparation for BIOMASS. The calibrated and geocoded backscattering data from each campaign were delivered by the ESA through the "Testbed", as described in the BRIX documentation [48]. We referred to the support documentation of each campaign for the description of the SAR data processing and of the in situ biomass measurements [38][39][40][41][42][43].
The main dataset was derived from the joint NASA and ESA AfriSAR experiment, which was conducted in Gabon. The experiment was composed of two campaigns: the first was carried out in 2015 with the ONERA (Office National d'Études et de Recherches Aérospatiales) SETHI SAR system and the second in 2016 with the NASA (National Aeronautics and Space Administration) Land Vegetation and Ice Sensor (LVIS) LiDAR, the NASA L-band UAVSAR, and the DLR (German Aerospace Center) F-SAR [38]. The area covered by the flights was composed of four test sites. The main site was located in Lopé National Park (0.5° S, 11.5° E), a UNESCO World Heritage site protected since 2007, in which high closed-canopy forests are merged with open savannas. SAR campaigns were carried out in three more sites, including the protected area of Mondah, close to Libreville (0.6° N, 9.6° E), the Mabounié mining site (0.7° S, 10.7° E), and the Rabi site (1.9° S, 10° E). In Mondah, intact patches of forest are mixed with others significantly disturbed by human activities. Mabounié is forested for the most part but with traces of ongoing building, while Rabi is an onshore oil-drilling site in which some areas were preserved. Ancillary LiDAR measurements were collected for generating biomass maps at 50 m resolution to be used for validation. Direct biomass measurements were also carried out on a total of 12 plots of 50 × 50 m (33 subplots of 25 × 25 m) for calibration and validation purposes.
The other datasets were derived from the three BioSAR campaigns [39][40][41] and from the TropiSAR campaign [42,43]. The data included biomass values up to 500 t/ha, being representative of forest types that included boreal and equatorial forests. The datasets also included very low biomass values, clearly referring to non-forested areas. These data were kept in the training to make the algorithms capable of identifying the low vegetated areas as well. All the test areas had flat or gently undulated topography. For instance, the La Lopé area included a 600 m tall hill rising over a 200 m ground: thus, the effect of local slopes on the measured σ° was accounted for by including the LIA in the algorithm inputs.
By combining all the available data, a dataset was obtained composed of about 4500 sets of backscattering coefficients (σ°) at four polarizations (HH, HV, VH, and VV), as well as corresponding LIA and biomass values from LiDAR or in situ measurements.
For training and validation, the entire BRIX dataset (or the subset of each campaign, in the case of the specific algorithms) was divided into two parts. The training set was composed of 50% of the data, while the other 50% was used for validating the algorithms. The flowchart in Figure 1 shows how the training and validation sets were obtained. Interleaved sampling was also evaluated for dividing the dataset; however, negligible differences were obtained in the training and validation results. It should be mentioned that the sampling cannot be considered purely random in the case of the general algorithm, since the general training and validation sets are obtained by combining the randomly sampled specific sets; as a result, the general training and validation sets each include 50% of the data from each campaign.
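The per-campaign sampling described above can be illustrated with a minimal sketch. It assumes that each sample is tagged with its campaign name; the function name, the record layout, and the fixed random seed are hypothetical and are not taken from the BRIX documentation.

```python
import random


def split_per_campaign(samples, train_fraction=0.5, seed=0):
    """Randomly split each campaign's samples, then pool the parts.

    `samples` is a list of (campaign_name, record) pairs. The names and the
    record layout are illustrative assumptions, not the BRIX format.
    """
    rng = random.Random(seed)

    # Group the records by campaign.
    by_campaign = {}
    for campaign, record in samples:
        by_campaign.setdefault(campaign, []).append(record)

    specific = {}  # campaign -> (training set, validation set)
    general_train, general_valid = [], []
    for campaign, records in by_campaign.items():
        shuffled = records[:]
        rng.shuffle(shuffled)
        cut = int(round(len(shuffled) * train_fraction))
        train, valid = shuffled[:cut], shuffled[cut:]
        specific[campaign] = (train, valid)
        # The general sets are the union of the specific sets, so every
        # campaign contributes the same fraction to the general training set.
        general_train.extend(train)
        general_valid.extend(valid)
    return specific, (general_train, general_valid)
```

With this construction the general sets are stratified by campaign rather than purely random, which is the point made in the text.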
Remote Sens. 2020, 12, 804

ANN
ANN can be regarded as a statistical method based on the minimum variance, which is able to approximate the existing relationship between the given input(s) and output(s) [49,50].
The core of the algorithm is based on the feedforward Multi-Layer Perceptron Artificial Neural Networks (MLP-ANNs) available in the MATLAB® Neural Networks toolbox. In an MLP-ANN, each neuron is connected to all the neurons of the previous and following layers. The connections are weighted and biased by coefficients that are iteratively adjusted during the training, thus modifying the strength of every connection. In each neuron, the inputs are weighted, summed, and biased to produce a value, called activation, which is then passed through the so-called transfer function and forwarded to the neurons of the following layer. In this study, transfer functions of linear, hyperbolic tangent sigmoid (tansig), and log-sigmoid (logsig) types were considered.
The weights and biases were adjusted by the Back Propagation (BP) learning rule; BP is a gradient descent algorithm aimed at minimizing the Mean Square Error (MSE) between the ANN output and the desired value.
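The neuron computation described above can be sketched in a few lines. This is only an illustration of the forward pass of a generic MLP with the three transfer functions named in the text; it is not the MATLAB implementation used in the study, and the function names and data layout are assumptions.

```python
import math


def tansig(a):
    """Hyperbolic tangent sigmoid transfer function."""
    return math.tanh(a)


def logsig(a):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-a))


def linear(a):
    """Linear (identity) transfer function."""
    return a


def layer_forward(inputs, weights, biases, transfer):
    """Compute the outputs of one layer.

    weights[j][i] is the weight connecting input i to neuron j.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        # Weighted sum of the inputs plus the bias gives the activation.
        activation = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(transfer(activation))
    return outputs


def mlp_forward(inputs, layers):
    """Propagate `inputs` through a list of (weights, biases, transfer)."""
    signal = inputs
    for weights, biases, transfer in layers:
        signal = layer_forward(signal, weights, biases, transfer)
    return signal
```

Training would then adjust `weights` and `biases` by gradient descent on the MSE; that step is omitted here.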
In the implemented algorithm, ANN inputs were the SAR backscattering coefficients (dB) at all polarizations (HH, HV, VH, VV) and the corresponding local incidence angle (LIA), and the output was the forest biomass (t/ha). It is worth mentioning that, according to the reciprocity property of passive targets for monostatic observations, no difference was found in using one or the other cross-polarized channel, and therefore the two cross-polarized channels were provided to the algorithms as their average. Several combinations of inputs were evaluated, by implementing specific algorithms for each dataset (AfriSAR Onera, AfriSAR DLR, BioSAR, and TropiSAR), and a general algorithm which was trained and validated on the entire dataset.
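As an illustration of the input configuration, the following sketch assembles one input vector with the two cross-polarized channels replaced by their average. The text does not state whether the average was taken on the dB values or on linear power, so both options are shown; the function names and the ordering of the inputs are assumptions.

```python
import math


def mean_cross_pol_db(hv_db, vh_db, in_linear=False):
    """Average the HV and VH backscattering coefficients (given in dB).

    With in_linear=False the dB values are averaged directly; with
    in_linear=True they are converted to linear power, averaged, and
    converted back to dB. Which option the authors used is not stated.
    """
    if not in_linear:
        return 0.5 * (hv_db + vh_db)
    hv_lin = 10.0 ** (hv_db / 10.0)
    vh_lin = 10.0 ** (vh_db / 10.0)
    return 10.0 * math.log10(0.5 * (hv_lin + vh_lin))


def build_input_vector(hh_db, hv_db, vh_db, vv_db, lia_deg):
    """Return [HH, mean cross-pol, VV, LIA] for one sample (assumed order)."""
    return [hh_db, mean_cross_pol_db(hv_db, vh_db), vv_db, lia_deg]
```

For reciprocal targets the two channels are nearly identical, so the two averaging options differ only marginally.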
The training set was further divided into three subsets of 60%, 20%, and 20% by using random sampling [32]. The ANN training was carried out on the first subset, using BP for adjusting the ANN parameters. The other two subsets were used for a posteriori tests at each training iteration. The "early stopping" rule was applied for preventing overfitting, that is, training stops as soon as the errors on the three subsets begin to diverge.
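The early stopping rule can be sketched as follows. The divergence criterion used here (no improvement of the error on the held-out subset for a fixed number of consecutive iterations) is an assumption made for illustration; the text does not specify the exact criterion, and the `patience` value is hypothetical.

```python
def early_stopping_epoch(validation_errors, patience=6):
    """Return the index of the epoch whose weights would be kept.

    Training is considered to diverge once the held-out error has failed
    to improve for `patience` consecutive epochs (an assumed criterion).
    """
    best_epoch, best_error, fails = 0, float("inf"), 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            # New minimum of the held-out error: remember it, reset counter.
            best_epoch, best_error, fails = epoch, error, 0
        else:
            fails += 1
            if fails >= patience:
                break  # stop training; keep the weights of best_epoch
    return best_epoch
```

In a real training loop the error list would be produced incrementally, one value per iteration, and the weights of the best epoch would be restored when the loop stops.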
The best dimensioning of ANN, in terms of number of neurons and hidden layers, is obtained by iteratively increasing the number of neurons and hidden layers from one hidden layer, with a number of neurons equal to the number of inputs, and up to two hidden layers, with a number of neurons each equal to three times the number of inputs. The training of each configuration is repeated 100 times for each transfer function, by resetting each time the initial ANN weights. At the end of the iterations, the results are compared and the ANN giving the highest correlation estimated vs. target is chosen as optimal. Such systematic optimization helps in preventing both underfitting and overfitting; underfitting occurs when the ANN architecture is too simplistic for the given problem, and overfitting happens when the ANN architecture is overly complex and it also fits the noise in the training set, causing large errors when applied to other datasets.
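The systematic search described above can be sketched as a loop over candidate architectures, transfer functions, and random restarts. The sketch assumes hidden layers of equal size and a step of one neuron, which the text does not state; `train_and_score` is a hypothetical callback that trains one network and returns its correlation between estimated and target values.

```python
def candidate_architectures(n_inputs, max_layers=2, max_factor=3):
    """Yield tuples of hidden-layer sizes, from the simplest upward.

    Assumes equal-sized hidden layers and a step of one neuron; the exact
    step used by the authors is not stated.
    """
    for n_layers in range(1, max_layers + 1):
        for n_neurons in range(n_inputs, max_factor * n_inputs + 1):
            yield (n_neurons,) * n_layers


def select_best(n_inputs, train_and_score,
                transfer_functions=("linear", "tansig", "logsig"),
                restarts=100):
    """Return (R, hidden sizes, transfer function, restart) of the best run.

    `train_and_score(hidden, transfer, restart)` is a hypothetical callback
    that trains one network from fresh initial weights and returns R.
    """
    best = None
    for hidden in candidate_architectures(n_inputs):
        for transfer in transfer_functions:
            for restart in range(restarts):
                r = train_and_score(hidden, transfer, restart)
                if best is None or r > best[0]:
                    best = (r, hidden, transfer, restart)
    return best
```

For four inputs this enumerates 18 architectures, each trained 100 times per transfer function, which is why the training is the only time-consuming step.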
It should be noted that the training described here is carried out one time only; after training, the ANN is saved and loaded again to be applied to the validation set and, if needed, to any new dataset. Since the training is the only time-consuming step of the implementation, the algorithm can be applied to new data in near real time.
Moreover, the training can be updated every time new training sets are available, for improving the retrieval accuracy. This represents a unique feature of the ANN and, in general, of the data-driven approaches based on machine learning, in comparison to the conventional retrievals that are based on the inversion of electromagnetic forward models.
The validation, to which the results presented in Section 5 refer, was obtained by applying the saved ANN to the validation set, which was not involved in the training, to keep training and validation as independent as possible.

SVR
The second machine-learning approach to the retrieval was based on the Support Vector Regression (SVR) technique. Similarly to ANN, past research has proven the SVR capabilities for remote sensing applications (e.g., [36,51]). SVRs were demonstrated to be capable of handling complex and nonlinear problems and of managing different kinds of inputs. While neural networks can handle nonlinear problems only when a large training dataset is available, SVRs can achieve high accuracies even if few training data are available [51]. SVRs overcome this limitation because they are based on a geometrical concept [52]. The method uses a so-called kernel function for mapping the m-dimensional input space of the original problem into a higher dimensional space, in which the function underlying the data can be linearly approximated.
While neural networks try to populate the feature space with as many data as possible considering all the available combinations of inputs and outputs for mapping the functions, SVRs aim at identifying the boundaries of the tolerance tube around the input data to map the function without the need of larger datasets [53].
The problem of retrieving the biomass from the P-band backscattering at different polarizations was set as follows:

$$y = f(x_1, x_2, \ldots, x_m) + e \qquad (1)$$

where f is the desired function, y is the biomass, x1, x2, ..., xm are the backscattering coefficients at different polarizations, and e is white noise. The y estimation is obtained by determining the mapping function f' as close as possible to the true mapping f for the given problem [37]. Given a set of N reference samples {xi, yi | i = 1, ..., N}, the ε-insensitive SVR attempts to identify a smooth function f' approximating f while keeping at most ε deviation from the targets yi [54]. f' is obtained by mapping the m-dimensional input domain into a higher dimensional feature space, in which the function flatness is increased. In the new space, f' can be linearly approximated, according to the following equation:

$$f'(\mathbf{x}) = \mathbf{w} \cdot \Phi(\mathbf{x}) + b \qquad (2)$$

where w represents the weights of the linear function, Φ is the mapping function that transforms the samples into the higher dimensional space, and b is the bias. The cost function combining the training error and the model complexity is minimized to obtain the optimal linear function in the transformed feature space [36]. The loss function is ε-insensitive: ε being the tolerance to errors, it ensures that losses smaller than ε are neglected [53]. An example of a possible choice of the ε-insensitive loss function is shown in Figure 2. The model-complexity term of the cost function is computed as the Euclidean norm of the weight vector w. The latter is inversely related to the geometrical margins of the solution and therefore to the model complexity [36]. A regularization parameter C is also introduced with the scope of adjusting the trade-off between the complexity (flatness) of the function f' and the tolerance to the empirical errors. We referred to [37] for the detailed mathematical formulation.
The training phase acts on reference samples composed of field data coupled with remote sensing data, to train the SVR algorithm and tune the SVR parameters C and ε. This process is called model selection [53].
During the training phase, the SVR algorithm uses a subset of the training dataset, called the test dataset, to tune its performance step by step. After this phase, the algorithm is applied to the validation set for evaluating its performance on independent data. At this point, the learning phase is over and the regressor can be used (operational estimation phase). The trained SVR is a good approximator of the mapping function between the input data, which in this case are represented by the HH, VV, HV, and VH P-band backscattering, and the target variable represented by the biomass. The separation among training, test, and validation follows the same approach proposed for ANN.
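The role of ε and C in the SVR cost function can be illustrated with a short sketch. It uses the common textbook form of the cost, with half the squared Euclidean norm of w as the complexity term and C weighting the sum of ε-insensitive losses; this form is an assumption, and the exact formulation adopted in the study is the one given in [37].

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss that ignores deviations smaller than epsilon."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_cost(weights, targets, predictions, epsilon, C):
    """Textbook SVR cost: 0.5 * ||w||^2 + C * sum of the losses.

    The first term penalizes model complexity (small ||w|| means a flatter
    function); the second penalizes errors outside the epsilon tube.
    """
    complexity = 0.5 * sum(w * w for w in weights)
    empirical = sum(
        epsilon_insensitive_loss(t, p, epsilon)
        for t, p in zip(targets, predictions)
    )
    return complexity + C * empirical
```

For example, with ε = 10 t/ha a prediction of 105 t/ha against a target of 100 t/ha contributes no loss, while a prediction of 120 t/ha contributes 10. A larger C makes the regressor follow the training data more closely at the expense of flatness.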
Similar to ANN, specific SVR regressors were trained for each of the missions provided in the input dataset, while a general SVR regressor was trained on the entire dataset. The input/output configuration is the same as for ANN: algorithm inputs are the backscattering coefficients at all polarizations (HH, VV, HV, VH), while the output is the forest biomass.

Data Analysis
A sensitivity analysis was carried out to understand the relationship between σ° and forest biomass for all the experimental data available in BRIX.
The analysis of the BRIX dataset confirmed the sensitivity of σ° to the forest biomass; the direct sensitivity of σ° at each polarization to forest biomass for the AfriSAR dataset is shown in Figure 3. The plots show an increase of σ° when biomass increases up to 500 t/ha, although some saturation is evident for biomass values higher than 300-350 t/ha, as already pointed out by previous research (e.g., [15,16]). The backscattering, however, does not saturate completely, and some increase with biomass can also be observed beyond this threshold. All scatterplots exhibit vertical clustering of σ°; this depends on the σ° variability between the different SAR acquisitions in each test area that composed the dataset. Moreover, the same values of in situ biomass correspond to different subareas and, therefore, to different σ° values, depending on the spatial variability of forest structure and on the speckle [23]. This further increased the spread of data.
The results of the sensitivity analysis conducted on the entire BRIX dataset are shown in Figure 4, where the backscattering at the four polarizations is represented as a function of the in situ biomass.
Different colors correspond to different campaigns (AfriSAR, BioSAR, TropiSAR), showing some separation of the data coming from the different campaigns. In particular, at least two different clusters can be identified, one including the AfriSAR and TropiSAR data and one including the BioSAR data. These clusters can be explained by considering the differences in the instrumental setups and calibrations of the different SAR instruments used in the campaigns [38][39][40][41][42][43].
The analysis of correlation coefficients (R) reflects this behavior, with higher correlation when considering the AfriSAR dataset (R from 0.64 to 0.76) and low values when considering the entire dataset (R from 0.16 to 0.34). The R values for each campaign are summarized in Table 1. Table 1 points out that the three BioSAR datasets have the worst correlation to in situ biomass. This can be attributed to the boreal forest type, which is characterized by sparser trees and lower biomass. Therefore, soil and understory vegetation are expected to contribute to the total backscattering. The correlation computed by grouping the three sets slightly increases (0.17 ≤ R ≤ 0.24) but is still very low. Owing to the lower biomass range, σ° saturation was not observed in the BioSAR datasets.
The dependence of the σ°-biomass relationship on the site characteristics suggested that, along with a general algorithm applicable to all datasets, specific algorithms for each dataset could improve the retrieval performance.

ANN
The ANN algorithm validation is summarized in the plots of Figure 5. Each scatterplot shows the predicted vs. in situ biomass values for the validation set of each campaign, obtained by applying the specific ANN to each given validation set, plus the result obtained by applying the general ANN to the whole validation set.
The correlation coefficient between estimated and target biomass ranged from R = 0.69 to R = 0.98, while the RMSE was between 14 t/ha and 58 t/ha (Figure 5a-f). The result obtained by the specific ANNs for the two AfriSAR missions was 0.92 ≤ R ≤ 0.94 with 42 ≤ RMSE ≤ 57 t/ha. Some underestimation of the highest biomass values can be identified in the scatterplots, possibly attributed to the saturation exhibited by the SAR signal for biomasses higher than 350 t/ha (Figure 5a,b). The ANN for the TropiSAR dataset obtained better results (R = 0.98 and RMSE = 17 t/ha, Figure 5f); in that case, the underestimation of higher values was not evident. Among the specific ANNs for the BioSAR missions, the first and the second obtained similar results in terms of both R and RMSE (Figure 5c,d), whereas worse results were obtained for the third one (Figure 5e).
Finally, the validation result for the general ANN is reported in Figure 5g. Considering the correlation of σ° to the forest biomass shown in Figure 4, this result is quite surprising, since the correlation between the biomass estimated by the ANN and the reference values was R = 0.93, while the correlation between σ° at the various polarizations and the target biomass on the entire dataset was R ≤ 0.23. The p-value was < 0.05 for the general and all the specific algorithms.
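The validation statistics quoted in this section can be computed as in the following minimal sketch. R is the Pearson correlation coefficient; the sign convention of the bias (mean estimated minus mean target) and the function name are assumptions, since the text does not define them.

```python
import math


def validation_metrics(target, estimated):
    """Return (R, RMSE, bias) between target and estimated biomass lists."""
    n = len(target)
    mean_t = sum(target) / n
    mean_e = sum(estimated) / n

    # Pearson correlation coefficient.
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(target, estimated))
    var_t = sum((t - mean_t) ** 2 for t in target)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    r = cov / math.sqrt(var_t * var_e)

    # Root mean square error, in the same units as the inputs (t/ha).
    rmse = math.sqrt(
        sum((e - t) ** 2 for t, e in zip(target, estimated)) / n
    )

    # Bias, assumed here as mean(estimated) - mean(target).
    bias = mean_e - mean_t
    return r, rmse, bias
```

Note that a high R does not by itself imply a low RMSE: a constant offset between estimated and target values leaves R unchanged while increasing both RMSE and bias.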

SVR
The SVR learning process was carried out separately for each of the six missions and once more using the complete dataset, so that a trained SVR regressor was obtained for each dataset.
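The organization of the training described above (one specific regressor per mission plus a general one on the pooled data) can be sketched as follows. This is a minimal illustration and not the authors' code: the record layout, the function name `split_by_campaign`, the fixed random seed, and the choice of building the general split by pooling the per-campaign splits are all assumptions. The 50% training fraction follows the split stated in the abstract.

```python
import random
from collections import defaultdict


def split_by_campaign(samples, train_fraction=0.5, seed=0):
    """Split samples into training/validation sets, per campaign and pooled.

    `samples` is a list of (campaign, features, biomass) tuples; this layout
    is hypothetical and only serves the illustration.
    Returns a dict mapping each campaign name, plus the key "general",
    to a (train, validation) pair of sample lists.
    """
    rng = random.Random(seed)
    by_campaign = defaultdict(list)
    for sample in samples:
        by_campaign[sample[0]].append(sample)

    splits = {}
    general_train, general_val = [], []
    for campaign, items in by_campaign.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train, val = shuffled[:cut], shuffled[cut:]
        splits[campaign] = (train, val)
        general_train.extend(train)
        general_val.extend(val)
    # Assumption: the general split pools the per-campaign splits.
    splits["general"] = (general_train, general_val)
    return splits
```

Each (train, validation) pair would then feed one regressor, so that the specific and general algorithms are validated on data they have not seen during training.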
The performance of each SVR regressor was evaluated on the validation dataset, using as metrics the correlation coefficient and the RMSE between true and estimated biomass (Figure 6a-g, where the green line represents the regression line).
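For reference, the validation metrics can be computed as in the sketch below. It is an illustrative implementation under stated assumptions: the function name `validation_metrics` is hypothetical, the bias is taken as mean estimated minus mean target biomass, and the regression line is fitted by ordinary least squares with the estimated biomass as the dependent variable. The paper does not state these conventions explicitly.

```python
import math


def validation_metrics(target, estimated):
    """Return Pearson R, RMSE, bias and the least-squares regression line."""
    n = len(target)
    if n < 2 or n != len(estimated):
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_t = sum(target) / n
    mean_e = sum(estimated) / n
    # Sums of squared deviations and cross-deviations about the means.
    s_tt = sum((t - mean_t) ** 2 for t in target)
    s_ee = sum((e - mean_e) ** 2 for e in estimated)
    s_te = sum((t - mean_t) * (e - mean_e) for t, e in zip(target, estimated))
    r = s_te / math.sqrt(s_tt * s_ee)
    rmse = math.sqrt(sum((e - t) ** 2 for t, e in zip(target, estimated)) / n)
    bias = mean_e - mean_t  # assumed sign convention: estimated minus target
    slope = s_te / s_tt
    intercept = mean_e - slope * mean_t
    return {"R": r, "RMSE": rmse, "bias": bias,
            "slope": slope, "intercept": intercept}
```

The function assumes that neither sequence is constant; otherwise the correlation coefficient is undefined and a division by zero occurs.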
From the scatterplots, we can see that the correlation coefficient between estimated and target biomass ranges from R = 0.65 to R = 0.93, while the RMSE is between 15 t/ha and 65 t/ha.
Despite the lower sensitivity of the backscattering coefficient to high biomass values, the SVR regression appears to predict the higher biomasses quite well. On the other hand, the higher sensitivity of the backscattering coefficient to low biomass values does not translate into a better ability of the SVR regressor to predict low biomass values.
The SVR regressor obtained for the AfriSAR DLR dataset shows R = 0.88 and RMSE = 65 t/ha. For biomass values greater than 200 t/ha, the SVR regressor performs rather well, with RMSE = 47 t/ha, whereas for values lower than 200 t/ha it overestimates the biomass (Figure 6a). The result obtained for AfriSAR Onera shows R = 0.9 and RMSE = 55 t/ha, with some bias that, however, does not hamper the accuracy (Figure 6b).
The results obtained for the BioSAR 1 and 2 datasets are quite good, as demonstrated by R = 0.92 and 0.93 and RMSE = 24 t/ha and 15 t/ha, respectively (Figure 6c,d). As for the ANN algorithm, the accuracy was lower in the case of the BioSAR 3 dataset (RMSE = 53 t/ha and R = 0.65). Moreover, a consistent underestimation of the target biomass can be observed (Figure 6e). For the TropiSAR dataset, SVR obtained R = 0.87 and RMSE = 48 t/ha, with some overestimation of the biomasses in the range 200-300 t/ha (Figure 6f).
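The range-dependent behaviour reported for the specific regressors (for example, the different accuracy above and below 200 for the AfriSAR DLR dataset) can be quantified by computing the errors separately for each biomass range. The sketch below is only illustrative: the function name `errors_by_range` and the default threshold are assumptions, and the mean error is taken as estimated minus target, so that positive values indicate overestimation.

```python
import math


def errors_by_range(target, estimated, threshold=200.0):
    """RMSE and mean error (estimated - target) below and above a threshold."""
    groups = {"below": [], "above": []}
    for t, e in zip(target, estimated):
        groups["above" if t > threshold else "below"].append(e - t)

    report = {}
    for name, errors in groups.items():
        if errors:
            rmse = math.sqrt(sum(d * d for d in errors) / len(errors))
            mean_error = sum(errors) / len(errors)
            report[name] = {"n": len(errors), "RMSE": rmse,
                            "mean_error": mean_error}
        else:
            # No samples in this range: the statistics are undefined.
            report[name] = {"n": 0, "RMSE": None, "mean_error": None}
    return report
```

A positive mean error in the "below" group together with a negative one in the "above" group would correspond to overestimation of low biomass and underestimation of high biomass.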
The general SVR regressor for the complete dataset obtained R = 0.86 and RMSE = 64 t/ha (Figure 6g). Similar to ANN, the condition p-value < 0.05 was verified in all cases.

Biomass Maps
After validation, the ANN and SVR algorithms were applied to the available SAR images for generating biomass maps of the entire area covered by the SAR acquisition. Figure 7 shows, as examples, three biomass maps, namely two generated by ANN and one by SVR using the AfriSAR DLR dataset. The maps are generated pixel by pixel from the input SAR data. Along with each map, the corresponding validation scatterplot using the in situ data available is shown. The areas other than savannah are covered by very dense forest and therefore the majority of points in the scatterplot is concentrated around very low and very high values. The correlation coefficient is 0.88 ≤ R ≤ 0.95 and both machine-learning approaches behave similarly, by slightly underestimating the highest values of biomass. The qualitative inspection of the maps shows that non-forested areas, mainly composed of meadows and grassland, are identified and the local patterns of biomass are correctly reproduced.
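The pixel-by-pixel map generation can be sketched as follows. The code is a minimal illustration under stated assumptions: the rasters are represented as nested lists, the function name `biomass_map` is hypothetical, and `predict` stands in for any trained ANN or SVR that maps one feature vector (backscatter at each polarization followed by the local incidence angle) to a biomass value.

```python
def biomass_map(channels, incidence_angle, predict):
    """Apply a trained regressor to every pixel of co-registered rasters.

    channels: list of 2-D lists, one per polarization (backscatter values).
    incidence_angle: 2-D list of local incidence angles, same shape.
    predict: callable mapping one feature vector to a biomass value (t/ha);
             it is a placeholder for a trained ANN or SVR.
    """
    rows, cols = len(incidence_angle), len(incidence_angle[0])
    for band in channels:
        if len(band) != rows or any(len(line) != cols for line in band):
            raise ValueError("all rasters must share the same shape")
    # Build the feature vector of each pixel and run the regressor on it.
    return [
        [
            predict([band[r][c] for band in channels] + [incidence_angle[r][c]])
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

Because each pixel is processed independently, applying a trained algorithm to a full image only requires one forward evaluation per pixel, which is consistent with the near-real-time application noted in the study.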

Discussion
The comparison of the results obtained by ANN and SVR (Figures 5 and 6) did not indicate significant differences in retrieval performance between the two methods. Both ANN and SVR exhibited similar accuracies, being able to estimate the target biomass with a slight underestimation of the values higher than 350 t/ha. Such underestimation can be attributed to the saturation of SAR data for biomass values higher than this threshold (see Figures 3 and 4), as already pointed out by past research [15]. It should be noted that both algorithms require only SAR data as input, without the need for any ancillary information. The results of the general algorithms could be affected by the different instrumental setups that generated the clusters of data in Figure 4. Better retrievals could therefore be expected if data are collected by the same instrument, thus overcoming any intercalibration issue.
The computational cost of ANN and SVR was also similar. The training was the only time-consuming step: for both algorithms, training each configuration using the BRIX dataset took a few minutes on a recent machine with an Intel i7 six-core processor, while applying the trained algorithm to the validation set occurred in near real time.
The validation results were in line with other studies (e.g., [22,23,25]), although differences in the test areas and datasets make a direct comparison difficult. For instance, the RMSE obtained by both ANN and SVR in the validation using BioSAR 1 data is in the same range as that reported in [22]. Similar conclusions can be drawn from the comparison between the ANN and SVR results obtained on equatorial forests (AfriSAR and TropiSAR datasets) and the results at P-band presented in [23]. It should be remarked that, in this case, both algorithms were also able to retrieve biomass beyond the 300 t/ha threshold indicated in [23], although with a slight underestimation of the higher values. Both algorithms can manage nonlinear relationships and, therefore, they are able to exploit the residual sensitivity of backscattering to biomass higher than 300 t/ha shown in Figure 3.
Concerning the disadvantages of these methods, the exportability of the algorithms to other areas should be verified before claiming general validity. Indeed, because the training is experiment-driven, the obtained results could be site-dependent and could change significantly when the algorithms are applied to other test areas. Previous studies reported that the retrieval errors of ML methods could be large if the test data are not properly represented in the training (e.g., [32]). However, updating the training with new data to enable the algorithms to work in other areas is quite straightforward and can be achieved without modifying the algorithm structure. Another possibility, which will be investigated in the continuation of this study, is to train the algorithms by merging the experimental datasets with data simulated by electromagnetic models, such as the Water Cloud Model [54,55], for a wider range of forest conditions. Such a strategy should allow overcoming the site dependency of experiment-driven training, by obtaining more general algorithms that can also retrieve the forest biomass with satisfactory accuracy in areas other than the ones considered in the training.
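As an illustration of how simulated data could complement the experimental training set, the sketch below generates synthetic samples with the Water Cloud Model in its commonly used form, σ° = A·V1·cos θ·(1 − τ²) + τ²·σ°soil with τ² = exp(−2·B·V2 / cos θ), and appends them to a list of experimental samples. Everything specific here is an assumption made for the example and is not taken from the study: the parameter values, the biomass and incidence-angle ranges, the choice V1 = 1 and V2 = biomass, the single-channel feature vector, and the placeholder experimental records.

```python
import math
import random


def water_cloud_sigma0(v1, v2, theta_deg, a, b, sigma0_soil):
    """Water Cloud Model backscatter (linear power units).

    v1, v2: canopy descriptors; a, b: empirical canopy parameters;
    sigma0_soil: backscatter of the underlying soil (linear units).
    """
    cos_theta = math.cos(math.radians(theta_deg))
    tau2 = math.exp(-2.0 * b * v2 / cos_theta)  # two-way canopy attenuation
    sigma0_canopy = a * v1 * cos_theta * (1.0 - tau2)
    return sigma0_canopy + tau2 * sigma0_soil


def simulate_samples(n, a=0.1, b=0.005, sigma0_soil=0.02, seed=0):
    """Draw n synthetic (features, biomass) pairs; all defaults are illustrative."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        biomass = rng.uniform(0.0, 500.0)  # t/ha, assumed range
        theta = rng.uniform(25.0, 55.0)    # degrees, assumed range
        # Assumption: v1 = 1 and v2 = biomass, so the signal saturates.
        sigma0 = water_cloud_sigma0(1.0, biomass, theta, a, b, sigma0_soil)
        samples.append(([10.0 * math.log10(sigma0), theta], biomass))
    return samples


# Placeholder experimental records (not measurements), same layout as above.
experimental = [([-12.5, 35.0], 180.0), ([-9.8, 42.0], 320.0)]
# Merging the two sources is a plain concatenation of the sample lists.
training_set = experimental + simulate_samples(100)
```

In practice the model parameters would have to be calibrated for each polarization and forest type before the simulated samples could be trusted for training.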

Conclusions and Future Work
In this study, two algorithms based on machine learning, namely ANN and SVR, were implemented, trained, and validated to estimate forest biomass from P-band airborne SAR data.
Both ANN and SVR exhibited similar retrieval performances, displaying a general capability of retrieving the target biomass with a slight underestimation of the values higher than 350 t/ha. Such underestimation can be attributed to some saturation exhibited by the SAR signal for the highest biomass values.
The characteristics of the available dataset suggested implementing general algorithms trained and tested on the entire dataset and specific algorithms for each test area.
The validation of the general algorithms resulted in R > 0.85 for both SVR and ANN, with RMSE of 60-70 t/ha and negligible bias, while the validation of the specific algorithms resulted in R from 0.65 to 0.98 and RMSE between 11 and 65 t/ha, depending on the dataset and on the algorithm. This investigation demonstrated the capability of machine-learning techniques for the remote sensing of forest biomass by using SAR. In this respect, ANN and SVR can be considered substantially equivalent in both retrieval accuracy and computational cost, since the investigation did not point out any aspect in which one of the two methods outperformed the other.
In the continuation of this study, we plan to train the algorithms on the experimental dataset merged with data simulated by electromagnetic forward models. This strategy should overcome the limits of experiment-driven training, by filling the gaps in the training set and enabling the application to other areas.