Pathways of Forest Above-Ground Biomass Estimation Based on SAR Backscatter and Interferometric SAR Observations

Estimation of forest biomass with synthetic aperture radar (SAR) and interferometric SAR (InSAR) observables has been surveyed in 186 peer-reviewed papers to identify major research pathways in terms of data used and retrieval models. Research evaluated primarily (i) L-band observations of SAR backscatter; and, (ii) single-image or multi-polarized retrieval schemes. The use of multi-temporal or multi-frequency data improved the biomass estimates when compared to single-image retrieval. Low frequency SAR backscatter contributed the most to the biomass estimates. Single-pass InSAR height was reported to be a more reliable predictor of biomass, overcoming the loss of sensitivity of SAR backscatter and coherence in high biomass forest. A variety of empirical and semi-empirical regression models relating biomass to the SAR observables were proposed. Semi-empirical models were mostly used for large-scale mapping because of the simple formulation and the robustness of the model parameters estimates to forest structure and environmental conditions. Non-parametric models were appraised for their capability to ingest multiple observations and perform accurate retrievals having a large number of training samples available. Some studies argued that estimating compartment biomass (in stems, branches, foliage) with different types of SAR observations would lead to an improved estimate of total biomass. Although promising, scientific evidence for such an assumption is still weak. The increased availability of free and open SAR observations from currently orbiting and forthcoming spaceborne SAR missions will foster studies on forest biomass retrieval. Approaches attempting to maximize the information content on biomass of individual data streams shall be pursued.


Introduction
Above-ground biomass (AGB) refers to the amount of organic matter that is stored in vegetation above the ground level. Forests store by far the largest amount of biomass when compared to all other vegetation types. Knowledge of the forest above-ground biomass is crucial because of its ecological, climatic, and economic importance [1]. Estimates of AGB are typically obtained from sets of measurements of forest variables (i.e., diameter at breast height, tree height, forest composition, and tree density) that are taken on the ground. A detailed forest inventory takes a substantial amount of time, entails significant costs, and does not permit a synoptic view of the distribution of biomass across a forest landscape. In this sense, survey and biomass estimation techniques based on remote sensing data are more suitable for mapping and monitoring large areas. Above-ground biomass is indeed listed as an Essential Climate Variable (ECV) by the Global Climate Observing System (GCOS), for which accurate and up-to-date knowledge can be achieved with systematic observations. However, remote sensing estimates of biomass are not free from errors given that remote sensing does not provide a direct measurement of the organic matter stored in vegetation. The amount of biomass can only be inferred through, e.g.,
• the vertical distribution of organic matter as seen by LiDAR or interferometric radar; and,
• direct observations of reflectance (optical sensor) or backscattered signal (active microwave sensor) with empirical models and functions; the retrieval can be aided by vegetation height as derived from laser measurements.
The potential of radar observations to estimate forest above-ground biomass has been investigated since the 1980s and reached a first peak during the 1990s with data from spaceborne (ERS-1/2, JERS-1, SIR-C/X-SAR) and airborne platforms (AIRSAR, CARABAS). Initial evidence that only long wavelengths sufficiently penetrate the forest canopy and are sensitive to biomass, paired with the lack of long-wavelength datasets, did not promote the use of radar observations for retrieving biomass in the following years. Towards the end of the last decade, the topic of forest biomass retrieval was revived thanks to concepts that were particularly favorable to biomass estimation (single-pass interferometry) and new concepts concerning the use of multiple radar channels in the retrieval (polarimetric interferometry, tomography, hyper-temporal combinations). The development of spaceborne missions exploiting long-wavelength radar systems that target in particular observations of forests (ALOS PALSAR series, SAOCOM, TanDEM-L, NiSAR, and BIOMASS) provides an additional setting for studies dealing with biomass retrieval approaches.
In this paper, we present a review of investigations published in the peer-reviewed literature until 2017 that were concerned with approaches to retrieve forest above-ground biomass from the backscattered intensity or interferometric synthetic aperture radar (SAR) observations. Ultimately, the objective of this paper is to summarize knowledge on forest biomass retrieval with "standard" radar remote sensing observations so as to identify salient aspects that are worth addressing further in future studies. In particular, we are interested in understanding the prospects of using multi-frequency SAR observations in an epoch of increasing availability of SAR data from multiple platforms orbiting in space. SAR backscatter and interferometric observations are and will be available from all spaceborne SAR missions in orbit and are, therefore, of interest to researchers who are developing forest retrieval algorithms worldwide.
Studies dealing with the retrieval of other forest variables from which above-ground biomass can be inferred with allometric functions (e.g., age, canopy cover, height) are not addressed in this study because they involve functional relationships that are beyond those existing between remote sensing data and biomass. Retrieval schemes that are based upon advanced processing techniques, such as polarimetric interferometry or tomography, are also not addressed in this survey. There is still limited experimental evidence on the pathways followed by these techniques, as research has been performed on a relatively small number of datasets.
Section 2 provides the background to the analysis that is presented. Section 3 reports statistics that were obtained by grouping research papers according to a set of discriminants. The statistics represent the backbone of this summary; for completeness and transparency, the list of articles that were surveyed is included in a separate document being part of the Supplementary Information to this paper. Articles found to support our interpretation of research pathways are cited in this paper. Section 4 reports on strengths and limitations of retrieval approaches identified in literature. Results of our survey are summarized in Section 5, providing indications on possible frameworks for retrieving biomass. In Section 6, we finally review approaches where the total biomass is obtained by summing up estimates of biomass for the different tree compartments (stem, branches, and foliage). The paper ends with a set of conclusions on research pathways and suggested fields of future investigations (Section 7); also, the use of SAR observables is put in a broader context while considering approaches that are mentioned above but are not addressed in this survey.

Background
Forest biomass refers from here onwards indistinctly to either forest above-ground live biomass (AGB, i.e., amount of organic matter) or forest growing stock volume (GSV, i.e., amount of woody volume). AGB is defined as the amount of organic matter above ground per unit area (t/ha, Mg/ha, or kg/m²). GSV is defined as the wood volume above ground per unit area (m³/ha). From AGB, carbon stock densities can be estimated by means of a scaling factor of approximately 0.5 [2]. The focus of this review is on approaches that are developed and applied to retrieve biomass. The pre-processing techniques (e.g., filtering, window size for coherence estimation, etc.) are not discussed. Similarly, we do not go into the details of the definition of AGB or GSV used by the authors. Typically, AGB or GSV are based upon measurements of trees exceeding a certain threshold of diameter at breast height, tree height, etc. Although ultimately important when quantifying the spatial distribution of biomass, the exact definition of biomass is considered to be of minor relevance for the scope of this paper. When considering all of the factors that could eventually impact the retrieval performance, we believe that the uncertainty and errors of biomass retrieved with SAR data are associated more with the sensitivity, or the lack thereof, of the data to biomass and with the type of algorithmic framework used for the retrieval than with the definition of biomass. It also needs to be acknowledged that while inventories report live above-ground biomass, the radar signal can be affected by both live and dead components of a forest. There is, however, limited experimental evidence on the relationship between biomass of coarse woody debris and the radar backscattered signal [3]; therefore, this aspect is not further discussed in this paper.
In total, 186 peer-reviewed papers were surveyed. The majority dealt with the development and/or application of a biomass retrieval algorithm. Some basic studies on signature analysis were also included when the authors acknowledged the potential of the dataset under investigation to retrieve biomass. The majority of studies presenting a retrieval scheme addressed the retrieval of total forest biomass; retrieval of biomass components (branch, stem, needles, etc.) was seldom addressed, and the studies that did so all had in common a multi-frequency dataset and detailed inventory data to support the investigations.

Survey Statistics
For each paper, the survey identified a set of parameters, from which a set of statistics was derived to identify pathways of research. In the survey, we decided to cover all the aspects of biomass retrieval with SAR data, thus including studies that explicitly addressed the retrieval of forest biomass from a set of SAR observations, as well as studies that dealt with the signatures of SAR observables as a function of biomass and studies that developed models relating observations to biomass, but did not invert them. Results reported in the 186 studies that are listed in the Supplement were found to depend on forest type and structure, viewing geometry of the radar, environmental conditions at the time of image acquisition, spatial and temporal resolution of the SAR observations, and repeat-pass interval. The outcome of each study was reported in the form of a set of statistical measures or quality indicators (e.g., retrieval root mean square error, estimation bias, correlation coefficient between SAR observable and biomass, backscatter dynamic range, perpendicular component of the interferometric baseline, etc.). Given the differing premises of each study in terms of data, research focus, and modeling framework, we refrain from comparing and interpreting numbers in this paper. Instead, we tried to identify the major trends and behaviors by grouping studies according to a number of discriminants.
Figure 1 shows a bar chart detailing the year of publication of the 186 research papers. The trend in Figure 1 is, in our opinion, closely related to the amount of data available for research and potentially suitable for biomass retrieval. The first studies (1987–1992) were based upon a small number of airborne observations at one or multiple frequencies and suggested that lower frequencies (L- and P-band) could be more suitable for biomass estimation when compared to high frequencies (X- and C-band). The mid-1990s were dominated by studies based on AIRSAR and
SIR-C/X-SAR observations, which developed the concepts that were proposed earlier with multi-polarized and multi-frequency SAR data. In addition, in the mid-1990s, pioneering studies on the use of C-band ERS-1 backscatter observations for biomass retrieval were published. The first breakthrough of spaceborne SAR data to estimate biomass occurred around the year 2000, with much attention being paid to ERS-1/2 and JERS-1 observations of the backscatter and the coherence. After this high, the paucity of SAR data available and suitable for biomass retrieval at the beginning of the 2000s is revealed by a minimum in peer-reviewed publications around the year 2005. With the start of operations of the L-band ALOS PALSAR sensor towards the end of 2006 and the TanDEM-X constellation in 2009, the topic of biomass retrieval was revived, reaching a first maximum in terms of peer-reviewed publications in 2013 and then again in 2015. The high throughput of research was interrupted in 2016, as a consequence of the end of the ALOS and the Envisat missions in 2011 and 2012, respectively. The start of operations of ALOS-2 PALSAR-2 and Sentinel-1 towards the end of 2014, coupled with increased knowledge on the potential of several existing datasets (primarily, ALOS PALSAR backscatter and TanDEM-X interferometry), is likely to explain the second highest number of publications per year in 2017.
Grouping studies in terms of sensor/platform allowed for understanding the major study objectives (Table 1). Studies using SAR data acquired by more than one sensor were associated with each sensor. ALOS PALSAR was the sensor most used (23% of all datasets) thanks to the suitability of L-band to retrieve biomass and the observation strategy that was tailored to forest mapping applications [4]. Data acquired by ERS-1/2, JERS-1, AIRSAR, and SIR-C/X-SAR during the 1990s accounted for 44% of the datasets that are reviewed here. During the 1990s, biomass retrieval approaches were developed following
signature analyses, which demonstrated the sensitivity of SAR observables to biomass. TerraSAR-X and TanDEM-X data fostered primarily studies on the exploitation of three-dimensional information that was obtained with interferometric and radargrammetric approaches. Data acquired with the ground-based scatterometer HUTSCAT are also considered since they were actively used in the definition of retrieval algorithms. The survey identified a large variety of airborne observations (Airborne SAR-R99B, AeS-1, AIRSAR, CARABAS, CCRS radar, E-SAR, EMISAR, OrbiSAR, PiSAR, PiSAR-2, PLIS, RAMSES, SETHI, and UAVSAR), having the major objective to assess the sensitivity of SAR observations to biomass for specific configurations in terms of frequency band, viewing geometry, and polarization.
When grouping studies according to the set of frequencies at which the radar data were acquired, it was evident that longer wavelengths were preferred in studies that were exploiting the radar backscattered intensities, given the stronger sensitivity of the SAR backscattered intensity to biomass [5][6][7]. Studies involving the use of interferometric SAR (InSAR) observables focused on single-pass datasets or short repeat-pass intervals because of the direct and accurate measurement of vertical structural properties by interferometry, and, thus, the strong sensitivity to biomass as well [8][9][10]. L-band data were used in 71% of the papers, followed by C-band (36%), P-band (21%), X-band (19%), VHF (3%), and S-band (1%). Of the research papers, 67% dealt with a single band (Table 2), a consequence of the uncoordinated acquisition of SAR data by different platforms and missions and the unavailability of multi-band sensors on a single platform in space. Studies that were based on SAR data acquired at two or three frequencies were reported in 16% and 15% of the research papers surveyed, respectively. SAR data from four frequencies were used in four cases, corresponding to 2% of all studies.
Grouping studies
according to the SAR observable used as explanatory variable for the biomass revealed that research focused primarily on the backscattered intensity (Table 3). Of the papers that were surveyed, 71% used observations of the backscattered intensity either as normalized radar cross section or as normalized radar cross section compensated for local topography. Local topography was expressed in terms of terrain slope angle, sensor look angle, local incidence angle, area of pixel, etc. Here, we did not further investigate the impact of the specific processing that was applied to generate the backscatter observations. For simplicity, we use the term "SAR backscatter" when referring to the backscattered intensity. Metrics derived from the SAR backscatter data, such as average backscatter in time, textural parameters, and n-th intensity moment of the histogram, were seldom investigated (4%). InSAR observables (i.e., InSAR height, coherence, or complex coherence) were used in 23% of the studies. Retrieval of biomass was primarily investigated with observations characterized by short repeat-pass intervals that were acquired during the ERS missions and with single-pass data acquired by the TerraSAR-X/TanDEM-X constellation. Table 3 reports slightly fewer studies involving the interferometric height, i.e., the elevation of the effective scattering phase center, than the coherence, i.e., a measure of the cross-correlation between the two images forming the interferogram [11], because the information content of the InSAR height is strongly affected by the temporal decorrelation occurring in repeat-pass scenarios. It needs to be remarked that the vertical profiling of vegetation with InSAR has been undertaken in far more instances, but these studies are not considered here because of our choice to focus on biomass retrieval approaches. The radargrammetric observable in Table 3 refers to the elevation estimated from the parallaxes between two images that were acquired at different look angles
[12]. The usefulness of radargrammetric height in the context of forest biomass estimation could be assessed so far only thanks to TerraSAR-X data acquired with intersection angles of 8° or more [13].
Retrieval of biomass was undertaken in 144 studies; in 42 research papers, instead, the focus was on signature analysis or modeling. In Table 4, the studies were grouped according to whether single or multiple observations were used to retrieve biomass. Biomass was retrieved from a single image in 56% of the cases. Single images were used primarily during the 1990s, an epoch that was characterized by few satellite observations. Multi-polarized data from a single acquisition were the most common dataset when using more than one observation in a retrieval scheme. Interestingly, single-date and multi-polarized datasets have been used in prototyping studies of the 1990s as well as in more recent investigations. Multi-temporal observations were used in 12% of the studies, with a clear focus on improving the biomass estimates when compared to single-image retrieval. In 7% of the studies, data from multiple bands were combined. Such studies were mostly based on multi-frequency airborne acquisitions or sparse C- and L-band acquisitions during the 1990s. Only six studies ingested both multi-frequency and multi-temporal SAR data in a retrieval scheme, highlighting an almost unexplored field of investigation for biomass retrieval.
Table 4. Investigations grouped according to which type of images were selected to undertake biomass retrieval (S = single image, M = multi, T = temporal, F = frequency, P = polarization).
Biomass retrieval studies were undertaken in all of the forest ecosystems (Table 5). Boreal forests were mostly targeted (41% of the studies), primarily because the sensitivity of the spaceborne SAR backscatter observations to biomass was considered to be sufficient for developing retrieval algorithms to cover the range of biomass. InSAR retrieval models were also initially developed in boreal forests. Temperate forests (25%) were targeted as part of several airborne campaigns. The retrieval of biomass in tropical forests was assessed in 34% of the studies, primarily with longer wavelengths because of the supposedly stronger penetration into the forest canopy and the increased sensing of the major structural forest components, which primarily explain the biomass. Given that most of the studies in tropical forests have been published in the last 10 years thanks to the availability of single-pass and repeat-pass InSAR data and more in situ data for model training and retrieval validation, it is expected that the retrieval of biomass in such regions will be the target of a large number of studies in the near future.

Forest ecosystem | Counts
Boreal | 82
Temperate | 49
Tropical and sub-tropical (including savannas, cerrado and miombo woodlands) | 68

Grouping the studies in terms of biomass variable of interest revealed an interesting divide between ecoregions (Table 6). AGB was the variable of interest in 71% of the studies, being retrieved almost exclusively in temperate and tropical forests. The sampling units here were primarily forest field plots with a size <1 ha. GSV was investigated in 26% of the studies. Retrieval of GSV was mostly undertaken in boreal forest, and, primarily, at the level of forest management units, i.e., forest stands, which were typically larger than 1 ha. Three studies looked at the direct retrieval of carbon stocks or carbon stock densities (i.e., tC/ha).
In the case of a retrieval using backscatter or coherence data, it may be more rigorous to estimate the volume of a forest, i.e., a forest structural variable, and then convert the estimate to dry mass by accounting for wood density. If GSV is estimated, then the stem biomass needs to be expanded for the stem-to-total biomass proportion [2]. Nonetheless, the conversion from volume to dry mass may be highly uncertain, especially in regions of high biodiversity and composition where the relationship between the two may be poorly characterized because of the spatial heterogeneity of species and vegetation structure. It is not within the scope of this paper, however, to evaluate the prospects of retrieving volume or dry mass, as it is well understood that the observables considered here are only indirectly related to forest above-ground biomass.
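The conversion chain described above (volume to stem dry mass, stem to total biomass, and biomass to carbon) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the wood density of 0.4 t/m³, and the expansion factor of 1.3 are hypothetical placeholders and are not taken from the surveyed studies; only the carbon fraction of approximately 0.5 is mentioned in the text [2].

```python
def gsv_to_agb(gsv_m3_per_ha, wood_density_t_per_m3, expansion_factor):
    """Convert growing stock volume (m^3/ha) to above-ground biomass (t/ha).

    wood_density_t_per_m3 and expansion_factor are site- and species-specific;
    the values used below are hypothetical placeholders.
    """
    stem_biomass = gsv_m3_per_ha * wood_density_t_per_m3  # stem dry mass, t/ha
    return stem_biomass * expansion_factor                # stem -> total AGB


def agb_to_carbon(agb_t_per_ha, carbon_fraction=0.5):
    """Scale AGB (t/ha) to a carbon stock density (tC/ha)."""
    return agb_t_per_ha * carbon_fraction


# Hypothetical stand: 200 m^3/ha, wood density 0.4 t/m^3, expansion factor 1.3
agb = gsv_to_agb(200.0, 0.4, 1.3)
carbon = agb_to_carbon(agb)
print(f"AGB = {agb:.1f} t/ha, carbon = {carbon:.1f} tC/ha")
```

In heterogeneous forests, both the wood density and the expansion factor would vary in space, which is where the uncertainty discussed above enters the chain.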

Survey of Biomass Retrieval Approaches
The survey showed that the majority of the retrieval approaches presented in the literature used a small variety of observables and, in particular, exploited observations that were taken at a single frequency (Table 4). The paucity of retrieval studies targeting SAR multi-frequency datasets and the increased availability of SAR data acquired at multiple frequencies in recent years (e.g., spaceborne X-, C-, and L-band) suggested looking in more detail at retrieval approaches separately for single-frequency data and multi-frequency data. It is believed that trends in single-frequency retrieval approaches are established and can inform multi-frequency retrieval strategies, which, on the other hand, are still in their infancy.

Retrieval of Biomass Using Backscatter Observations
Backscatter-based approaches can be grouped into three main categories:
• parametric empirical regression models;
• parametric semi-empirical and physically-based models; and,
• non-parametric models.
Empirical regression models use a simple function with a limited number of coefficients to relate biomass and forest backscatter observations at one or multiple polarizations (and possibly across multiple observations in time). There is, however, no consensus on a single empirical model that performs better than the others. Several studies first undertook an analysis of the SAR observations and the corresponding biomass values and then identified the mathematical function that best represented the relationship between the observations and the biomass variable. Keeping the empirical model simple implied that such models could be inverted to allow for a retrieval of biomass from a set of observations of the SAR backscatter. Several authors, instead, directly set up a function from which the biomass could be predicted from the observed SAR backscatter. Regardless of whether a forward model is inverted or a direct retrieval model is presented, four typologies of empirical models to retrieve biomass were identified from the survey: linear models, multiple linear models, rise-to-max-exponential models, and logarithmic models. The latter, however, will not be discussed further as they were used only in very few cases.
Linear models relating the biomass (or its natural logarithm or a power thereof) to SAR backscatter (either in linear scale or in the decibel scale) are the simplest possible type of regression models, requiring the estimation of the slope and intercept coefficients from a set of training stands or plots. Such models have been used mostly for low frequency data (L-band, P-band, VHF), but occasionally also to explain the relationship between biomass and SAR backscatter at X- and C-band. In the case of multi-polarized data, multiple linear regression models have been proposed; using multiple polarizations was reported to improve the retrieval accuracy when compared to single-image retrieval, in particular, when cross-polarized data were used. The advantage of linear models is that they are straightforward to apply since they attribute one biomass value to any input value(s) of the backscatter. Nonetheless, deviations from the presumed linear relationship between backscatter and biomass, or transformations thereof, likely introduce systematic under- and over-estimation in certain biomass ranges.
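A direct linear retrieval model of this kind can be sketched with ordinary least squares. The example below is a minimal sketch under assumptions: it regresses the natural logarithm of biomass on backscatter in decibels, and the training values are synthetic numbers generated for illustration, not data from any surveyed study. The function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic training plots (assumption): backscatter in dB and biomass in t/ha
# generated from ln(biomass) = 0.4 * sigma0_dB + 8.0
sigma0_db = [-16.0, -14.0, -12.0, -10.0]
biomass = [math.exp(0.4 * s + 8.0) for s in sigma0_db]

# Train the direct retrieval model on ln(biomass) versus backscatter
slope, intercept = fit_line(sigma0_db, [math.log(b) for b in biomass])


def predict_biomass(s_db):
    """Predict biomass (t/ha) from one backscatter observation in dB."""
    return math.exp(slope * s_db + intercept)


print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"predicted biomass at -13 dB: {predict_biomass(-13.0):.1f} t/ha")
```

With real plots, the residuals of such a fit would show the systematic under- and over-estimation mentioned above wherever the relationship departs from linearity.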
The rise-to-max-exponential model, Equation (1), was often used when assuming a non-linear functional dependence between the SAR backscatter and the biomass. It expects the SAR backscatter to increase from the lowest value, observed for a bare surface, to the maximum value, observed for the densest possible forest. This trend is typical for forests observed at X-, C-, and L-band. The SAR backscatter increases rapidly for increasing biomass B from the level represented by the coefficient a, corresponding to a virtually unvegetated planar surface, to a backscatter level at which the model loses sensitivity to biomass. The coefficient of the exponential function, c, determines the slope of the modeled backscatter. The coefficient b corresponds to the backscatter value for a vegetation layer having a theoretically infinite biomass.
After model training, the inversion of Equation (1) is straightforward. The major drawback is that the estimation is affected by substantial errors when the backscatter is close to the largest of the modeled values. This necessitates the definition of a maximum retrievable biomass, for instance, the biomass level at which the modeled sensitivity of backscatter to biomass falls below a certain threshold or at which the model inversion entails an error exceeding a desired level of accuracy.
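The inversion and the maximum retrievable biomass can be illustrated with a short sketch. It assumes one functional form that is consistent with the description of the coefficients, namely σ⁰(B) = a·exp(−cB) + b·(1 − exp(−cB)), so that the backscatter equals a at zero biomass and tends to b for infinite biomass; this is an assumption, not necessarily the exact form of Equation (1). The coefficient values, the sensitivity threshold, and the function names are hypothetical.

```python
import math


def forward(biomass, a, b, c):
    """Assumed rise-to-max form: a at zero biomass, b at infinite biomass."""
    return a * math.exp(-c * biomass) + b * (1.0 - math.exp(-c * biomass))


def max_retrievable(a, b, c, min_slope):
    """Biomass at which d(sigma0)/dB = c*(b - a)*exp(-c*B) drops to min_slope."""
    return math.log(c * (b - a) / min_slope) / c


def invert(sigma0, a, b, c, b_max):
    """Invert the model for biomass, clipped to the range [0, b_max]."""
    ratio = (sigma0 - b) / (a - b)      # equals exp(-c * B) for a valid input
    if ratio >= 1.0:
        return 0.0                      # at or below the bare-surface level
    if ratio <= math.exp(-c * b_max):
        return b_max                    # in the saturation range of the model
    return -math.log(ratio) / c


# Hypothetical coefficients: a and b in dB, c in ha/t, threshold in dB per t/ha
A, B_INF, C = -16.0, -8.0, 0.02
b_max = max_retrievable(A, B_INF, C, min_slope=0.01)
print(f"maximum retrievable biomass: {b_max:.1f} t/ha")
print(f"retrieved biomass at -12 dB: {invert(-12.0, A, B_INF, C, b_max):.1f} t/ha")
```

Clipping at `b_max` is one possible design choice; flagging such pixels as not retrievable would be an alternative.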
Semi-empirical and physically based models describe the backscattered intensity from a forest in terms of the main scattering mechanisms occurring in a forest. In their simplest formulation, the models consist of a small number of components, each describing one type of scattering mechanism, with a limited number of parameters related to the structural properties of the forest and the way the microwaves interact with the forest structure (e.g., attenuation, density of trees, or density of scatterers, etc.). With the aid of some simplifying assumptions, the models are set to express the total forest backscatter as a function of a single forest variable and consist of mathematical functions that can be easily inverted. One formulation that has been widely used to model the X-, C-, and L-band backscatter as a function of GSV or AGB is the Water Cloud Model (see e.g., [14][15][16]). The model stems from radiative transfer theory and expresses the total forest backscatter as an incoherent sum of the backscattered intensities from the canopy and the forest floor. In Equation (2), each contribution is expressed in terms of the biomass variable of interest, V. The contribution of each component is expressed by the respective backscattering coefficient (σ⁰_veg and σ⁰_gr), weighted by the relative contribution of the component to the total backscatter, which is expressed by the forest transmissivity. The transmissivity is typically modeled as an exponential, including the biomass variable of interest and an empirical coefficient, β, that is assumed to be related to the forest attenuation.
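A minimal sketch of the Water Cloud Model as described above is given here. It assumes the commonly used two-term form σ⁰_for = σ⁰_gr·exp(−βV) + σ⁰_veg·(1 − exp(−βV)), evaluated in linear power units, which matches the verbal description but may differ in notation from Equation (2). All parameter values and function names are hypothetical.

```python
import math


def db_to_linear(x_db):
    return 10.0 ** (x_db / 10.0)


def linear_to_db(x):
    return 10.0 * math.log10(x)


def wcm_forward_db(v, s_gr_db, s_veg_db, beta):
    """Forest backscatter (dB) as an incoherent sum of ground and canopy terms."""
    t = math.exp(-beta * v)  # forest transmissivity
    total = db_to_linear(s_gr_db) * t + db_to_linear(s_veg_db) * (1.0 - t)
    return linear_to_db(total)


def wcm_invert(s_for_db, s_gr_db, s_veg_db, beta):
    """Retrieve V from one observation; None if outside the modeled range."""
    gr, veg, obs = (db_to_linear(s) for s in (s_gr_db, s_veg_db, s_for_db))
    t = (obs - veg) / (gr - veg)  # transmissivity implied by the observation
    if not 0.0 < t <= 1.0:
        return None               # below the ground level or saturated
    return -math.log(t) / beta


# Hypothetical parameters: ground -15 dB, dense forest -9 dB, beta = 0.006
S_GR, S_VEG, BETA = -15.0, -9.0, 0.006
obs_db = wcm_forward_db(120.0, S_GR, S_VEG, BETA)
print(f"modeled backscatter for V = 120: {obs_db:.2f} dB")
print(f"retrieved V: {wcm_invert(obs_db, S_GR, S_VEG, BETA):.1f}")
```

Returning `None` for observations outside the range spanned by σ⁰_gr and σ⁰_veg is one way of making the limits of the inversion explicit.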
Given that the model parameters have a physical meaning or can be modeled in terms of some additional parameters, such models can potentially be applied wherever they describe the physics of the scattering occurring in a forest. Such models tend to idealize the interaction of microwaves with the forest, which makes them potentially too general to be able to capture the complexity of a forest structure in their small number of parameters. The use of more advanced models could serve to resolve such an issue; however, such models require a significant amount of external information to be correctly calibrated, and their inversion often requires numerical methods. In this respect, a lookup table that links a set of parameters of a given structure to a specific value of the backscatter appears to be a promising approach. By matching an observation of the SAR backscatter with the value in the lookup table that is closest to the observation, it is possible to derive an estimate of the biomass without having to invert the backscatter model [17]. Model training is, however, necessary to set up the lookup table, which raises the question of the extent to which this approach can be generalized.
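The lookup-table idea can be sketched in a few lines: a forward model is evaluated on a grid of biomass values, and an observation is matched to the closest tabulated backscatter. The `toy_model` below is a hypothetical stand-in for a more advanced scattering model, and the grid spacing is an arbitrary choice; neither is taken from [17].

```python
import math


def toy_model(biomass):
    """Hypothetical forward model (dB); stands in for an advanced simulator."""
    return -15.0 + 7.0 * (1.0 - math.exp(-0.01 * biomass))


def build_lookup(model, biomass_values):
    """Tabulate (modeled backscatter, biomass) pairs once, before retrieval."""
    return [(model(b), b) for b in biomass_values]


def retrieve(table, observed_db):
    """Return the biomass whose modeled backscatter is closest to the observation."""
    return min(table, key=lambda row: abs(row[0] - observed_db))[1]


# Table on a 10 t/ha grid from 0 to 300 t/ha (assumption)
table = build_lookup(toy_model, range(0, 301, 10))
observation = toy_model(123.0)  # synthetic observation
print(f"retrieved biomass: {retrieve(table, observation)} t/ha")
```

No analytical inversion is needed, but the retrieval is limited to the grid resolution and to the range of structures represented in the table.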
Non-parametric models include a range of computational algorithms that allow for learning from a set of observations. Learning means that multiple models are built and then refined until convergence is reached. Such models require the intervention of the operator to tune the parameters of the algorithms, while the learning process and the construction of the models are left to the architecture of the system itself. Furthermore, they are quite advanced and have been proposed for the retrieval of biomass to deal with aspects that parametric models either do not consider or fail to represent correctly in their oversimplification of scattering mechanisms. The advantage of non-parametric models over parametric models is greater when multiple input datasets and additional auxiliary datasets are available. Nonetheless, such models require a fair amount of training data to perform optimally, which is often beyond what is available. Especially when aiming at mapping large areas, the feasibility of using non-parametric models is therefore unclear.
When compared to a retrieval based on SAR backscatter data only, the few studies exploring texture-based models reported smaller retrieval errors using texture either as stand-alone or in complement with the SAR backscatter [18][19][20]. Texture is a measure of the spatial homogeneity of the scattering, and in this sense, should contain information about forest structure. It is reasonable to assume that the predictive power of texture is strictly related to the spatial resolution of the SAR data and its radiometric accuracy. However, none of these assumptions were proven and it is unclear whether the better performance of a retrieval based on texture was solely due to the properties of the texture or could be explained with the type of (empirical) modeling that was used for the retrieval.
With multiple observations of the SAR backscatter, strategies that exploit the temporal aspect of the radar signal were developed with the aim of decreasing the error of each single estimate due, e.g., to noise. Weighted combinations of biomass estimates from individual SAR observations were proposed in [15,16,21]. Multiple regression models that were trained with in situ observations were proposed in [22]. The retrieval was found to improve substantially for observations that were weakly correlated in time and with an overall weak sensitivity to biomass, e.g., at C-band [15]. On the contrary, the improvement with respect to the best retrieval from a single observation was marginal in the case of strong temporal correlation and strong sensitivity to biomass, e.g., at L-band [23]. Consequently, in the latter case the requirement on the number of observations needed to reduce the retrieval error is less stringent. It also remains unclear whether criteria exist to define a minimum number of observations required to reach a given retrieval performance.
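A weighted combination of single-image estimates can be sketched as a weighted mean. The weighting scheme in the example is an assumption made for illustration (one weight per image, larger for images assumed to be more sensitive to biomass); the cited studies may define the weights differently. Images for which no estimate could be obtained are skipped. The function name and the numbers are hypothetical.

```python
def combine_estimates(estimates, weights):
    """Weighted mean of per-image biomass estimates; None entries are skipped."""
    pairs = [(e, w) for e, w in zip(estimates, weights) if e is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None  # no usable estimate for this pixel or stand
    return sum(e * w for e, w in pairs) / total_weight


# Hypothetical per-image estimates (t/ha); the second image gave no estimate
estimates = [90.0, None, 130.0, 110.0]
# Hypothetical weights, e.g. a per-image measure of sensitivity to biomass
weights = [1.0, 5.0, 3.0, 2.0]

combined = combine_estimates(estimates, weights)
print(f"multi-temporal estimate: {combined:.1f} t/ha")
```

If the errors of the single estimates are weakly correlated in time, the combination is expected to reduce the noise of the final estimate, in line with the findings reported above for C-band.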

Retrieval of Biomass Using InSAR Observations
Coherence-based retrieval models are mostly parametric, with an equal share of studies investigating empirical regression models [24][25][26][27] and semi-empirical models [8,9,28]. Their formulation is rather simple, and the shape of the model predicted for a given set of observations differs only in the case of rather long baselines, since empirical models do not include a term that is related to volume decorrelation [29]. Similarly to the Water Cloud Model for the SAR backscatter, the Interferometric Water Cloud Model (IWCM) in Equation (3) describes the complex coherence of a forest as a sum of two contributions from the forest floor and the canopy. Each contribution has its own temporal decorrelation term (γ_gr and γ_veg). The volume decorrelation induced by the spatial baseline is accounted for in the canopy term of the IWCM, with α being the two-way tree transmissivity (in dB/m) and ω being expressed in Equation (4) by B_n, the perpendicular component of the spatial baseline, λ, the wavelength, R, the slant range distance, and θ, the local incidence angle. As for the WCM in Equation (2), in the IWCM we use the symbol V to refer to biomass.
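Equations (3) and (4) are not reproduced in this text. As a reading aid, a form of the IWCM that is commonly given in the literature is sketched below. The symbols σ⁰_gr and σ⁰_veg (backscatter of the forest floor and of the vegetation layer), β (empirical coefficient of the two-way forest transmissivity), and h (tree height) are assumptions about the notation and should be checked against the original Equations (2) to (4).

```latex
\gamma_{for} =
  \gamma_{gr}\,\frac{\sigma^0_{gr}\,e^{-\beta V}}{\sigma^0_{for}}
  + \gamma_{veg}\,\frac{\sigma^0_{veg}\,\bigl(1-e^{-\beta V}\bigr)}{\sigma^0_{for}}\,
    \frac{\alpha}{\alpha - j\omega}\,
    \frac{e^{-j\omega h}-e^{-\alpha h}}{1-e^{-\alpha h}}

\sigma^0_{for} = \sigma^0_{gr}\,e^{-\beta V} + \sigma^0_{veg}\,\bigl(1-e^{-\beta V}\bigr)

\omega = \frac{4\pi B_n}{\lambda R \sin\theta}
```

In this form, the first term is the forest floor contribution and the second is the canopy contribution, whose last two factors describe volume decorrelation; for a vanishing baseline (ω approaching zero) those factors tend to one and only temporal decorrelation remains.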
The performance of coherence-based retrieval models depended strongly on whether temporal decorrelation enhanced the sensitivity of the coherence to forest structural parameters (e.g., under windy conditions) or cancelled out such sensitivity (e.g., after rainfall). The C-band ERS-1/2 one-day coherence was found to outperform the backscatter as long as at least one image pair had been acquired under dry and stable environmental conditions [8]. In the case of ERS-1 3- to 12-day repeat-pass intervals, the sensitivity of the coherence to biomass decreased; accordingly, the retrieval error increased [30]. Some potential for the use of L-band repeat-pass coherence from the JERS-1 and ALOS PALSAR sensors, with 44- and 46-day repeat-pass intervals, respectively, was demonstrated in Siberian boreal forest [26,27,31]. Long winter-frozen conditions guaranteed the maximum contrast of coherence between low and high biomass. On the other hand, single-pass coherence, as from the TanDEM-X mission, is only affected by volume decorrelation, so that the sensitivity to biomass depends on the length of the spatial baseline [32].
As in the case of the backscatter, a multi-temporal dataset of coherence observations can be used in a multi-temporal combination to improve the accuracy of the biomass estimates with respect to values obtained from individual coherence observations [8]. Experiments undertaken with ERS-1/2 tandem data revealed that a small number of coherence observations acquired under environmental conditions that preserve the coherence were sufficient to obtain the best possible retrieval accuracy [33,34]. The use of multi-temporal coherence has not been investigated for other interferometric datasets, either because a large number of coherence observations have never been obtained (e.g., at L-band from ALOS PALSAR) or because there has not been sufficient interest in evaluating repeat-pass datasets that are potentially suitable for retrieving biomass (e.g., X-band TerraSAR-X at 11 days or COSMO-SkyMed at 1-16 days). Retrieval of biomass from Sentinel-1 6- and 12-day repeat-pass coherence has not been reported yet; however, it is assumed that reliable estimates will be obtained only in areas with long periods of winter/frozen conditions or dry conditions preserving the coherence.
An InSAR height-based retrieval has enormous potential because of the direct relationship between the interferometric phase, ∆Φ, and elevation, Equation (5).
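Equation (5) is not reproduced in this text. A minimal sketch of the phase-to-height conversion it refers to is given below, assuming that the flattened, unwrapped interferometric phase scales linearly with height through a vertical wavenumber k_z. The function names, the example geometry, and the factor `m` (2 when each image is acquired with its own transmission, 1 for a bistatic single-pass pair) are assumptions and are not taken from the paper.

```python
import math

def vertical_wavenumber(b_perp, wavelength, slant_range, inc_angle_rad, m=2):
    """Vertical wavenumber k_z in rad/m.

    b_perp        : perpendicular baseline B_n (m)
    wavelength    : radar wavelength (m)
    slant_range   : slant range distance R (m)
    inc_angle_rad : local incidence angle (rad)
    m             : 2 for monostatic pairs, 1 for bistatic pairs (assumption)
    """
    return m * 2.0 * math.pi * b_perp / (
        wavelength * slant_range * math.sin(inc_angle_rad)
    )

def height_from_phase(phase_rad, kz):
    """InSAR height (m) of the scattering phase center for a given phase."""
    return phase_rad / kz

# Illustrative X-band geometry (values are hypothetical).
kz = vertical_wavenumber(100.0, 0.031, 600e3, math.radians(40.0))
height_of_ambiguity = 2.0 * math.pi / kz  # height change per 2*pi phase cycle
h_int = height_from_phase(1.0, kz)        # height for a phase of 1 rad
```

A longer baseline increases k_z and therefore the height sensitivity of the phase, at the cost of a smaller height of ambiguity.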
In the case of vegetation, it is worth noting that h_int refers to the elevation of the scattering center, which is a function of radar frequency, canopy closure, and the vertical distribution of the scatterers. These factors, as well as information on the elevation of the ground, need to be considered when estimating biomass from the InSAR height. Several research papers demonstrated that simple linear relationships could predict biomass from estimates of InSAR height of single-pass datasets [35][36][37][38][39][40][41][42]. Nevertheless, it is unclear whether such linear models, which were validated at a number of test sites in boreal and savannah forest, apply in other forest ecosystems as well [42]. An advanced solution with more potential for generalization is given by physically-based models, such as Equation (3), which take into account the sensitivity of the InSAR elevation to baseline, and by a combination of single-image estimates of biomass with a multi-temporal combination [40]. Regardless of the retrieval approach, the quality of the retrieved biomass was found to be more affected by the uncertainty of the height estimate, i.e., the coherence, and by the availability of accurate information on the elevation of the terrain beneath the forest than by the degree of penetration of the microwaves into the forest canopy, i.e., the location of the effective phase scattering center.
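The simple linear relationship between InSAR height and biomass described above can be sketched with an ordinary least-squares fit. The function names are hypothetical and the training data are synthetic; real coefficients depend on the forest type and on how the InSAR height was referenced to the ground.

```python
import numpy as np

def fit_linear_height_model(height_m, agb):
    """Fit AGB = slope * height + intercept by ordinary least squares."""
    slope, intercept = np.polyfit(height_m, agb, deg=1)
    return slope, intercept

def predict_agb(height_m, slope, intercept):
    """Predict biomass from InSAR height with the fitted linear model."""
    return slope * np.asarray(height_m, dtype=float) + intercept

# Synthetic training plots: InSAR height above ground (m) and reference AGB (t/ha).
heights = np.array([5.0, 10.0, 15.0, 20.0])
agb = np.array([40.0, 90.0, 140.0, 190.0])

slope, intercept = fit_linear_height_model(heights, agb)
agb_at_12m = predict_agb(12.0, slope, intercept)
```

Because the model has only two parameters, it can be trained with few plots, but the fit says nothing about whether the same coefficients hold in a different ecosystem.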
Estimates of terrain height can be obtained with laser scanning techniques or with a low frequency interferometric system. The necessity of having (i) a single-pass interferometer, possibly at high frequency, such as TanDEM-X, and (ii) an auxiliary dataset on terrain elevation implies that such an approach is not feasible for mapping large areas in the near future, except in regions with an advanced forest mapping and monitoring system [41]. In [38], the use of the smallest InSAR phase was proposed as a means to avoid the need for an independent dataset of terrain elevation, with interesting results that should be pursued further.
Although an interferogram contains two observables that are both potentially suitable to support the retrieval of biomass, most retrieval studies favored the use of only one observable, neglecting the other. In repeat-pass scenarios, the retrieval could not profit from the InSAR phase because of its large uncertainty [43]. With single-pass interferometry and long baselines, however, both InSAR height and coherence are sufficiently sensitive to forest structural properties and allow for an exploitation of their synergy. The two-level model (TLM) inversion proposed in [44,45] goes in this direction by estimating two parameters (area-fill factor and InSAR height) from the complex coherence, which resemble canopy closure and forest height, respectively, i.e., two variables that are closely related to biomass. In [38], it was argued, however, that coherence from TanDEM-X interferograms would only contribute 7% to a combined estimate of biomass. Further along the line of exploiting primarily InSAR phase information, the use of individual Fourier Transform frequency components of the vertical profile was suggested to estimate biomass more accurately than the mean InSAR height for the reference unit (as used in most studies cited here) [10]. This appears to be a promising approach to be further evaluated. It is worth noting that the predictors and approaches proposed in [38,42] are favored by the high spatial resolution of the SAR data used in the experiments, pointing at the importance of scale in the context of biomass retrieval with interferometric data.

Multi-Frequency Retrieval Approaches
The literature survey identified two types of multi-frequency retrieval approaches based on SAR backscatter data. One type was developed with AIRSAR (C-, L-, and P-band) and/or SIR-C/X-SAR (X-, C-, and L-band) images of the SAR backscatter that were acquired mostly over northern boreal and temperate forest during the 1990s [46][47][48][49]. Both empirical (multivariate regression) and physically based models were systematically assessed. The second type can be considered more "opportunistic", since the models were developed with spaceborne SAR images that were acquired by multiple sensors independently from each other over a certain area, e.g., ERS and JERS [21,25] or ALOS PALSAR and RADARSAT-2 [50,51], with the addition of TerraSAR-X in [52]. Except for [25,50], where coherence was used as one of the predictors, in all other research papers dealing with multi-frequency retrieval the predictors consisted of the SAR backscatter only.
The multi-frequency retrieval approaches could be grouped in terms of their models relating observations to biomass as follows:
• least-squares regression models applied to SAR backscatter and SAR backscatter ratios of several bands and polarizations;
• neural networks inverting a physically-based model;
• non-parametric models; and,
• multi-temporal combinations of biomass estimates obtained from multi-frequency datasets.
The contribution of the different bands to the retrieval accuracy differed from study to study. P-band and L-band were judged to be the most predictive, but there was no consensus on whether one specific polarization would be more effective than others in predicting biomass. C-band data were reported to have less potential than longer wavelengths, but could provide improved estimates when combined with L-band data only (i.e., when P-band data were unavailable). The contribution of X-band backscatter data was considered negligible when data from at least two other frequencies were available. The retrieval improved instead when X-band data were combined with L-band data [53,54]. Non-parametric models performed better than a multi-linear regression in [54], whereas the retrieval with a physically based model performed better than with an empirical multivariate model [55]. Finally, adding InSAR height at C-band to a linear model expressing biomass as a function of multi-polarized AIRSAR data improved the retrieval [56]; furthermore, the performance improved with increasing wavelength.
With the launch of the P-band BIOMASS satellite ahead, one question is how much data acquired by sensors operating at higher frequencies (C-, L-, and X-band, possibly S-band) can aid biomass retrieval from data acquired by BIOMASS. With AIRSAR data, one major benefit of adding data from shorter wavelengths to a retrieval based only on P-band was discussed in [57], which showed that the range of biomass that could be predicted increased from 160 tDM/ha to 240 tDM/ha (tDM = tons of dry matter) in a mangrove forest. This aspect deserves further investigation, as P-band campaigns by the European Space Agency are delivering datasets that can be combined with other airborne and spaceborne datasets.
It is interesting to observe that only one paper coordinated the use of multi-frequency backscatter and interferometric observations [58]. Biomass was estimated from the difference of interferograms obtained at X- and P-band, representing the elevation of the surface of the forest and of the terrain underneath, respectively. The (X-P)-band vertical information was combined with P-band backscatter, which was considered to be a measure of vegetation density, to estimate biomass. It was argued that such an approach mimics the most "natural" way of calculating volume, on the basis of the modeling framework reported in [59].

Pathways of Biomass Estimation Approaches Based on SAR and InSAR Data
The literature survey showed that both parametric and non-parametric models are feasible for retrieving forest biomass with mono- and multi-frequency SAR data. Non-parametric models perform well and could be a natural candidate if the aim is to deliver AGB estimates that are to be used as one of several layers in environmental studies, policy making, predictions, forecasting, etc. If, instead, the focus is on having an algorithm that is robust to environmental conditions and forest structure, attention should be given to parametric models. Strengths and weaknesses of the biomass retrieval approaches surveyed in this study and their prospects are summarized in Table 7. The synoptic view of biomass retrieval approaches based on SAR data should not be taken as conclusive with respect to which is best suited for a given investigation. The large range of retrieval statistics reported in the surveyed papers was not conclusive with respect to which specific model or equation performs best. The performance of a biomass retrieval scheme has to be judged in its entirety, meaning that a very powerful retrieval scheme will not perform well if the input data are sub-optimal for the purpose of retrieving biomass. The same applies if one has collected data that are potentially suitable for retrieving biomass but the algorithmic aspects are poorly characterized, i.e., the algorithms do not extract the information on biomass contained in the input data. It is important to remark here that the choice of a retrieval approach is often constrained by data availability. In [60], biomass for the northern hemisphere was estimated with the parametric model in Equation (2). For training the model, a solution that does not rely on in situ measurements of biomass had to be developed to account for their unavailability in several regions of the area mapped. The paucity of in situ observations implied that several assumptions had to be made in order to train the model and achieve biomass estimates that are at the same time reliable and spatially consistent. Having to relax the modeling framework had the consequence that the retrieval performance was judged to be inferior to what could be expected if the model had been trained locally with in situ data, so as to adapt to local environmental conditions at the time of image acquisition and to local structural features of the vegetation. While the rules underpinning model training and biomass retrieval are being established for large-scale biomass estimation [15,25,60,61] using SAR backscatter data or coherence data, there seems to be a need to further investigate the spatial variability of the intriguing relationship between InSAR height and biomass [37,38,40-42]. The integration of multiple observations from InSAR and SAR backscatter at different frequencies has hardly been attempted and deserves more attention, despite the lack of datasets that truly complement each other as in the study in [58].
Uncertainty of biomass estimation has seldom been addressed [60,62,63]. In such cases, error models for each of the terms involved in the retrieval procedure have been presented and discussed. The uncertainty of the retrieved biomass based on one or a few observations of the SAR backscatter was reported to be too large to consider the estimate meaningful (at L-band) [62]. Averaging the SAR backscatter over adjacent pixels reduced the uncertainty of the biomass estimates following a strong reduction of the uncertainty of the SAR observation [63], which is considered to be the largest contribution to the overall retrieval uncertainty [62]. Alternatively, large stacks of weakly correlated observations, such as those obtained at C-band, can reduce the uncertainty with respect to a retrieval based on a single observation [60].
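Why averaging helps most when observations are weakly correlated can be illustrated with the standard deviation of the mean of N observations. The sketch below assumes equal variance and a single common pairwise correlation coefficient, which is a simplification introduced here and not an error model taken from the cited studies; the function name is hypothetical.

```python
import math

def std_of_mean(sigma, n, rho):
    """Standard deviation of the mean of n observations.

    sigma : standard deviation of each observation
    n     : number of observations averaged (pixels or acquisitions)
    rho   : common pairwise correlation between observations, in [0, 1]
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return sigma * math.sqrt((1.0 + (n - 1) * rho) / n)

uncorrelated = std_of_mean(1.0, 16, 0.0)      # full 1/sqrt(n) reduction
fully_correlated = std_of_mean(1.0, 16, 1.0)  # no reduction at all
partly_correlated = std_of_mean(1.0, 4, 0.5)
```

Under these assumptions, a stack of strongly correlated observations behaves almost like a single observation, which is consistent with the marginal multi-temporal gain reported for strongly correlated data.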
Table 7. Summary of retrieval approaches surveyed in this study with an outlook on their performance when using multi-frequency synthetic aperture radar (SAR) data as input.

Approaches for Estimating the Biomass in Trunks, Branches and Foliage
The backscatter that is received by a radar from forested terrain, σ⁰_for, may be described as the sum of four scattering mechanisms, each contributing with more or less power to the total backscatter that the radar receives:

$$\sigma^0_{for} = \sigma^0_{g} + \sigma^0_{c} + \sigma^0_{tg} + \sigma^0_{cg}$$

with σ⁰_g representing the backscatter from the forest floor, σ⁰_c the backscatter from the canopy, σ⁰_tg trunk-ground interactions, and σ⁰_cg crown-ground interactions. It therefore makes sense to postulate that forest above-ground biomass can be estimated starting from estimates of its components in stems, branches, and foliage. Theoretical forest scattering models have been developed based on such a description, e.g., [64][65][66][67], so as to predict radar backscatter at different wavelengths, polarizations, and incidence angles as a function of the size, orientation, and dielectric properties of the major tree constituents, and hence provide a framework for analyzing the expected effect of tree architectural differences and varying biomass in trunks, branches, leaves, and needles on backscatter. Overall, modeling results agree in that, with increasing radar wavelength, the scattering from larger tree constituents gains importance; nonetheless, the modeled backscatter signal was rarely dominated consistently by a single scatterer type or backscatter contribution across the studies surveyed [64,68-71]. As a result, scattering theory suggested that:
1. specific radar configurations should be best suited for the retrieval of a particular biomass compartment (i.e., biomass in foliage, large and small branches, and trunk), depending on how exclusively scattering from forest at a certain wavelength and polarization is associated with a single scatterer type and scattering mechanism,
2. the performance of the retrieval of total above-ground biomass with any single wavelength and polarization is constrained by the inherent correlations between the biomass compartments the radar senses and the total above-ground biomass, and
3. the performance of the retrieval of above-ground biomass should benefit from the use of multiple wavelengths and polarizations, since each maximizes the sensitivity to the biomass in different compartments.
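The description of forest backscatter as a sum of four scattering mechanisms implies that the contributions add in linear power units, not in decibels. A minimal sketch of this bookkeeping follows; the function names and the component values are hypothetical and serve only to show the unit handling.

```python
import math

def db_to_linear(value_db):
    """Convert a backscatter coefficient from dB to linear power units."""
    return 10.0 ** (value_db / 10.0)

def linear_to_db(value_lin):
    """Convert a backscatter coefficient from linear power units to dB."""
    return 10.0 * math.log10(value_lin)

def total_backscatter_db(components_db):
    """Sum scattering contributions given in dB and return the total in dB."""
    return linear_to_db(sum(db_to_linear(v) for v in components_db))

# Hypothetical contributions (dB): ground, canopy, trunk-ground, crown-ground.
components = {"g": -14.0, "c": -9.0, "tg": -12.0, "cg": -16.0}
sigma0_for_db = total_backscatter_db(components.values())

# Share of the total power carried by each mechanism.
total_lin = db_to_linear(sigma0_for_db)
shares = {k: db_to_linear(v) / total_lin for k, v in components.items()}
```

The `shares` dictionary makes explicit how far a given configuration is from being dominated by a single mechanism, which is the condition under which a compartment-specific retrieval would be expected to work best.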
So far, only a few studies have attempted to verify the hypotheses formulated above with actual SAR data, and even fewer have developed retrieval algorithms that seek to optimize the retrieval of total above-ground biomass through a combined use of multi-frequency and multi-polarization SAR data for estimating the biomass in different compartments of trees [46,47,49,55,72-76]. All studies were conducted at temperate and boreal forest sites across North America, almost exclusively with data acquired by AIRSAR and SIR-C/X-SAR during the 1990s. Given the small number of studies addressing the retrieval of compartment biomass, we provide a brief review of each, focusing on the highlights from an algorithmic point of view.
Under the assumption that L-HV backscatter is more closely related to basal area and height, AGB was estimated by first estimating these two attributes from L-band radar data and then applying allometric equations to convert the height and basal area to biomass [72]. The advantage of estimating biomass indirectly via basal area and height was, however, not demonstrated. Multiple linear regression models relating multi-frequency and multi-polarization backscatter intensities to the logarithm of compartment biomass were developed in [75]. A stepwise regression analysis showed that most of the variability of biomass in different compartments was explained by P-band backscatter in all polarizations and L-band backscatter in HV polarization. This study suggested that, when estimating branch biomass from SAR, the biomass in other compartments as well as the total above-ground biomass could be estimated via allometric relationships. It was found that the approach of estimating biomass compartments indirectly via allometric relationships and SAR-derived branch biomass performed better than a retrieval based on models relating the SAR backscatter directly to the biomass compartment of interest. A similar approach for estimating compartment and total biomass was presented in [73]. Total biomass was estimated by summing the canopy and trunk biomass estimates, the latter being estimated from the SAR-derived estimates of basal area and height with the aid of ancillary information on the tree species' wood density and taper functions describing the shape of the trunks. The rationale of the approach also followed the idea that backscatter at different wavelengths and polarizations is most correlated to specific biomass compartments. A comparison of the performance of different approaches for the retrieval of total above-ground biomass and the biomass in different tree components with multi-frequency radar was presented in [55]. The two methods in which the total above-ground biomass was estimated indirectly via branch biomass [75] or via basal area, height, and crown biomass [73] performed slightly better than the direct retrieval of total above-ground biomass.
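The indirect route from basal area and height to total biomass can be sketched with a generic stand-level volume relation. This is not the formulation of [73], which relies on species-specific taper functions; the constant form factor, the function names, and all numerical values below are assumptions made for illustration.

```python
def trunk_biomass(basal_area_m2_ha, height_m, form_factor, wood_density_t_m3):
    """Trunk biomass (t/ha) from stand basal area and height.

    Stem volume is approximated as basal area x height x form factor,
    where the form factor stands in for the taper of the trunk (assumption).
    """
    stem_volume_m3_ha = basal_area_m2_ha * height_m * form_factor
    return stem_volume_m3_ha * wood_density_t_m3

def total_biomass(trunk_t_ha, crown_t_ha):
    """Total above-ground biomass as the sum of trunk and crown biomass."""
    return trunk_t_ha + crown_t_ha

# Hypothetical stand: 30 m2/ha basal area, 20 m height, form factor 0.5,
# wood density 0.5 t/m3, and a separately estimated crown biomass of 30 t/ha.
trunk = trunk_biomass(30.0, 20.0, 0.5, 0.5)
agb = total_biomass(trunk, 30.0)
```

Errors in the SAR-derived basal area and height propagate multiplicatively into the trunk term, which is one reason why the indirect route is not guaranteed to outperform a direct retrieval.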
Knowledge of the biomass in different tree components is important for understanding the dynamics and impacts of forest fires, in particular in fire-prone ecosystems, such as boreal forests or savannahs [49]. The use of airborne multi-polarization C-, L-, and P-band imagery was therefore investigated in [49] for estimating key forest biophysical parameters with respect to forest fire, such as the biomass in trunks and crowns (including the biomass in foliage and branches, as well as the biomass of non-forest vegetation such as sagebrush), canopy bulk density (i.e., the crown biomass per unit crown volume), and the foliage moisture content.
While all of the studies discussed so far related backscatter to total and compartment biomass using empirical models, a semi-empirical modeling approach was followed in [76] for estimating crown and trunk biomass via the crown and trunk water content. It was argued that radar backscatter is as much a function of the tree architecture and biomass distribution across trunks, branches, and foliage as it is a function of the water content, which primarily drives the dielectric properties of vegetation. An inversion targeting the moisture content of the vegetation may therefore be more adaptive to the time-variant influence of moisture on backscatter. A semi-empirical model that considered direct crown scattering as well as crown-ground and trunk-ground scattering was formulated to express backscatter as a function of crown and trunk moisture content.
Model calibration and the estimation of compartment and total above-ground biomass from multi-frequency SAR data generally rely on the availability of in situ measurements. An alternative approach, first proposed by [77] and tested by [78], is to link forest succession and scattering models to predict the backscatter response to changing tree architecture and biomass across an entire chrono-sequence of forest growth and for a wide range of site conditions, forest management practices, disturbance regimes, etc. In [78], a gap model was deployed, which predicts tree- and population-level forest dynamics by simulating individual tree birth, growth, and mortality given specific site conditions (e.g., in terms of temperature, light availability, soil moisture, soil fertility) and the competitive behavior of species. This type of forest model enabled stand-level predictions of forest structural aspects that were relevant for modeling radar backscatter. The forest model output, together with ancillary information on branch and foliage geometry and dielectric constants of tree components, was then used to model the co-polarized SAR backscatter at C-, L-, and P-band based on a scattering model developed by [65]. Linear models were calibrated based on forest model predictions of above-ground biomass and the associated scattering model predictions of backscatter to relate co-polarized backscatter at P-, L-, and C-band to total above-ground biomass. A direct application of the model for estimating above-ground biomass from airborne radar imagery was not feasible due to systematic offsets between modeled and observed backscatter, in particular in the case of P-band. However, cross-calibrating the model with the aid of forest plots for which forest structure agreed with forest model predictions of structure allowed for estimating the above-ground biomass from the radar imagery with reasonable accuracy, comparable to the performance of a model that was calibrated directly with in situ data.
The majority of the studies discussed above suggested that the use of multi-frequency SAR data for estimating the biomass in different tree compartments as well as the total above-ground biomass allows for improved estimation accuracies. Even though one might expect that the independent modeling of multi-frequency backscatter as a function of compartment biomass should better capture the complex, non-linear relationships between compartment and total above-ground biomass throughout forest succession, the results presented so far were overall inconclusive with respect to the assumption that total above-ground biomass may best be estimated via independent estimates of the biomass in different tree compartments. Only the results presented by [75] for loblolly pine forests demonstrated that estimating total above-ground biomass via radar-based estimates of compartment biomass (i.e., branch biomass) performed better than the direct estimation of total biomass. In [55], instead, rather minor improvements were reported when estimating total biomass with the sum of radar-derived estimates of trunk and crown biomass or via an allometric relationship between branch and total above-ground biomass when compared to the direct approach. In addition, radar configurations that were identified as being ideal for estimating the biomass in different tree compartments did not always comply with expectations from scattering theory.
The inconclusiveness of the results may be associated with the following three factors.

• The high inherent correlation between biomass compartments and the total biomass complicates the identification of causative relationships between the biomass in different compartments and multi-frequency/-polarization backscatter.
• Environmental conditions (soil moisture, canopy moisture, freeze/thaw) at the time of image acquisition could introduce backscatter variations that have a magnitude similar to the backscatter changes associated with changing biomass; in addition, they may alter the relative contribution of different scattering mechanisms and obscure the underlying correlations between backscatter and compartment biomass. Only a few of the existing studies concerned with the retrieval of compartment biomass interpreted their results in light of the prevalent imaging conditions [55].
• While the modeling results suggested that the backscatter is often dominated by a single scattering mechanism, the correlation analyses between backscatter and compartment biomass in the studies discussed above did not present clear evidence for this. Differences between modeled and actually observed backscatter were in many cases significant [71,79].
Each of the factors above deserves further investigation to clarify whether the retrieval of biomass from SAR data is better addressed by estimating biomass compartments rather than predicting directly total biomass. At this stage, the choice of the specific modeling framework to estimate biomass is considered to be of minor importance.

Conclusions
This paper summarized the results of a literature survey on forest biomass retrieval with SAR backscatter and interferometric SAR data with the scope of identifying pathways of research and suggesting future advances. Pathways clearly depend on two factors: data and models.
Single sensor observations are useful in understanding to which extent biomass can be predicted with the configuration of the sensor (frequency, polarization, look direction, spatial resolution, thermal noise, etc.). Nonetheless, the key for biomass retrieval appears to be the combination of data from multiple sources. A multi-frequency SAR perspective is an advance with respect to a mono-spectral retrieval, which is still dominant in the remote sensing community. Assuming that a given SAR frequency and polarization sense a particular component of a forest, the predictors of a multi-frequency retrieval approach bring a more complete description of the forest biomass to the retrieval model when compared to a single-sensor approach. As spaceborne SAR observations span from X- to L-band, and will potentially extend to P-band in the future, there are substantial reasons for considering this pathway as realistically improving the accuracy of biomass retrieval and reducing uncertainties. A multi-frequency retrieval solution, however, shall be seen in a wider perspective where polarization, look direction, and spatial resolution are best combined to maximize the extraction of biomass-related information from the SAR data. Combinations of InSAR observations and SAR backscatter observations are envisaged here; a limited number of observables from current and future spaceborne SAR missions may be able to characterize biomass to a level of error and uncertainty (e.g., 20% and 50%, based on the best results in the surveyed papers) of great appeal to science communities that are requesting spatially explicit estimates of carbon pools in vegetation. In addition, we encourage repeated acquisition of multi-frequency SAR datasets by the different missions. Since the training phase adapts the retrieval model to the SAR observable, the environmental conditions at the time of image acquisition are somewhat transferred to the biomass estimate. Exploiting the temporal features of the observations can reduce such conditioning and allow for maximizing the extraction of the biomass-related information from the set of input observations.
The reasoning on complementing datasets to improve the retrieval can be expanded by bringing in data that are acquired at other, not strictly microwave, frequencies. Optical data can, for example, allow for the stratification of species or complement SAR observations to retrieve biomass; we foresee substantial advances on the rather novel radar-optical synergy with the increasing amount of repeated observations by currently orbiting sensors, primarily the Sentinels. Moreover, complementing backscatter observations, which reflect the horizontal structure of a forest, i.e., the density, with observations of vertical structure, such as those provided by PolInSAR, TomoSAR, and LiDAR, would provide a three-dimensional representation of vegetation, thus potentially leading to improved estimates when compared to using a single observable. Again, the multi-frequency and multi-temporal perspective on the integration of datasets is seen as an asset to ensure the best possible performance of the retrieval. A detailed discussion of the perspectives of such an integration of datasets is, in our opinion, beyond the scope of this paper.
Entering an era that is characterized by a multitude of SAR observations demands that future research on biomass retrieval: (1) identify which datasets carry which information on biomass; (2) best extract such information from each dataset; and, (3) combine such information to provide an estimate of biomass that is as accurate as possible. It is also believed that investigating the retrieval of biomass components from individual frequencies targets the three points listed above and should be considered as a main line of research. Such approaches are seen primarily as a step forward towards a full characterization of biomass in retrieval models; this shall be considered to be achieved once the entire range of observations describing forest structure and forest functioning is encompassed. The increasing amount of SAR data that are publicly available and the forthcoming launch of several spaceborne missions, all having the mapping of forest biomass as an explicit goal, are expected to boost developments on biomass retrieval approaches and substantially increase the number of peer-reviewed publications on forest biomass retrieval in the next 10-15 years.

Figure 1. Temporal distribution of research papers in terms of year of publication.

Table 1. Number of studies per sensor/platform. Sensors are listed alphabetically.

Table 2. Number of studies per frequency band or group of frequency bands.

Table 3. Number of studies grouped in terms of SAR observable.

Table 5. Number of retrieval investigations per forest ecosystem or zone.

Table 6. Number of retrieval investigations per type of biomass being retrieved. The unit of measurement is included for completeness.