TomoSAR Mapping of 3D Forest Structure: Contributions of L-Band Conﬁgurations

: Synthetic Aperture Radar (SAR) measurements are unique for mapping forest 3D structure and its changes in time. Tomographic SAR (TomoSAR) conﬁgurations exploit this potential by reconstructing the 3D radar reﬂectivity. The frequency of the SAR measurements is one of the main parameters determining the information content of the reconstructed reﬂectivity in terms of penetration and sensitivity to the individual vegetation elements. This paper attempts to review and characterize the structural information content of L-band TomoSAR reﬂectivity reconstructions, and their potential to forest structure mapping. First, the challenges in the accurate TomoSAR reﬂectivity reconstruction of volume scatterers (which are expected to dominate at L-band) and to extract physical structure information from the reconstructed reﬂectivity is addressed. Then, the L-band penetration capability is directly evaluated by means of the estimation performance of the sub-canopy ground topography. The information content of the reconstructed reﬂectivity is then evaluated in terms of complementary structure indices. Finally, the dependency of the TomoSAR reconstruction and of its structural information to both the TomoSAR acquisition geometry and the temporal change of the reﬂectivity that may occur in the time between the TomoSAR measurements in repeat-pass or bistatic conﬁgurations is evaluated. The analysis is supported by experimental results obtained by processing airborne acquisitions performed over temperate forest sites close to the city of Traunstein in the south of Germany.


Introduction
Forest (canopy) structure is commonly defined as "the organization in space and time, including the position, extent, quantity, type and connectivity, of the aboveground components of vegetation" [1][2][3][4][5]. Mapping forest structure is critical for understanding the history, function and future of forest ecosystems. Indeed, forest structure expresses forest state, functionality, biodiversity and evolution, and is an indicator of the successional stage and development as well as sustainability and habitability [6][7][8]. Due to this, it is an important parameter for assessing forest productivity, biomass and biodiversity [9][10][11][12]. Finally, the knowledge of structure evolution in time (due to anthropogenic or natural disturbance, regeneration, growth, etc.) supports the modelling of the function and development of forest ecosystems and the development of accurate and robust forest biomass estimators [13].
Remote sensing techniques have the potential to map the 3D distribution of vegetation compartments and of their changes at large scales with spatial and temporal continuity. Today, only lidar and Synthetic Aperture Radar (SAR) can provide 3D information with metric resolution and repeatedly in time and contribute to the observation and quantitative characterization of forest structure at large scales [14,15]. While lidar configurations allow a more or less dense sampling of accurate (vertical) structure measurements, SAR configurations realize imaging with wide swath widths enabling continuous global scale 1. Penetration and information content (Section 5)-First, the ability of L-band to penetrate into and through the canopy down to the ground is assessed directly in terms of both the ground-to-volume power ratio and the estimation performance for the location of the underlying ground. Second, the capability in deriving physical forest structure indices from L-band TomoSAR reflectivity is discussed within the chosen structure framework. In this, the understanding of the role of the availability of multiple polarization channels is of particular importance.

2.
TomoSAR imaging geometry (Section 6)-Conditioned to the (TomoSAR) reconstruction algorithm, different TomoSAR geometries in terms of the distribution of tracks/ orbits in the direction orthogonal to the line of sight, and the number of images lead to different vertical resolutions and quality (e.g., in terms of ambiguities) for the reconstructed reflectivity. The imaging geometry depends primarily on flight/orbital constraints, but also changes across a scene with the incidence angle variation from near-to far-range. The impact of such changes on the spatial gradients the retrieved structure descriptors is evaluated. 3.
Technical implementation (Section 7)-At present, there is no configuration that allow a simultaneous acquisition of all TomoSAR images. As a consequence, any configuration is affected by changes of the 3D reflectivity in time. While these changes can be neglected in typical L-band airborne acquisitions where short repeat-pass times are possible, they become relevant in space borne ones. For this, conventional repeatpass and bistatic TomoSAR implementations are compared in terms of their ability to retrieve physical structure information in presence of temporal reflectivity changes.
Finally, the conclusions are drawn in Section 8. The analysis is supported by experimental results achieved in the framework of recent airborne SAR (DLR's F-SAR platform) campaigns over two forest sites close to the city of Traunstein in south Germany.

The TomoSAR Inversion Problem
In monostatic TomoSAR acquisitions K single-look complex (SLC) images are collected with slightly different incidence angles in N P polarimetric channels along displaced tracks/orbits. For a fixed polarization ω and range-azimuth coordinate, after coregistration, compensation of the range dependent phase component (e.g., flat-Earth compensation), phase calibration and spectral shift filtering [25], the complex amplitudes of the different images are collected in the K-dimensional vector y(ω). The associated TomoSAR covariance matrix is R(ω) := E y(ω)y H (ω) , where E{·} and (·) H indicate the statistical expectation and the Hermitian operators, respectively. The generic (m, n)th element of R(ω) can be written in terms of the underlying vertical reflectivity profile F(z, ω) as [14,16]: where κ Z,k is the vertical wavenumber [25,26] associated to the kth image. The TomoSAR inversion problem concerns the estimation of F(z, ω) by means of the inversion of the Fourier relationship in Equation (1) applied to an estimate of R(ω) in a multilook cell consisting of N independent samples (e.g., looks). The maximum available wavenumber determines the Rayleigh vertical resolution of the reconstruction [14,16,26]:

Challenges of TomoSAR Imaging of Forest Volumes at L-Band
Due to penetration, L-band (back-)scattering occurs at multiple heights within the volume, and it is reasonable to assume continuous and extended vertical reflectivity profiles. For this reason, the TomoSAR inversion problem is typically underdetermined as the number of the available images K is lower than the number of profile amplitudes required to be estimated for an appropriate characterization of F(z, ω).
As a rule of thumb, for Rayleigh vertical resolutions in the order of 10 m and not too irregularly distributed wavenumbers, non-parametric spectral estimators are a reasonable choice in terms of final reconstruction performance [26]. The Capon spectral estimator improves the vertical resolution and the sidelobe attenuation of a direct Fourier spectral estimator [26][27][28][29], and provides a data-adaptive estimate F C (z, ω): a(z) contains the height (z)-dependent phase differences of all images with respect to the reference one. Its kth element is [a(z)] k = e jκ Z,k z . Even though widely employed, the tomographic formulation of (3) does not match to forest volumes by design [26,28,29]. The expected resolution and sidelobe attenuation improvements refer to a (deterministic) Dirac-δ approximation of the scattering contribution at each height. However, in a forest volume every single scattering contribution cannot be considered isolated from the others. As a consequence, in order to deal with the underdetermination of the inversion, the Capon optimization provides the maximum reflectivity value among the possible ones, introducing biases in the radiometric reconstruction of the (forest) volume scattering contributions [30]. This is independent of the track distribution, meaning that an improvement in the Rayleigh vertical resolution is in general not sufficient to improve the radiometric accuracy. Additional non-idealities like an inaccurate knowledge of R(ω) (in terms of a biased estimation) and/or a(z) (due to residual track positioning errors or atmospheric propagation) results into additional radiometric inaccuracies, that can finally end up even in underestimating stronger contributions (self-cancellation) [27,28]. However, they can be mitigated by diagonally loading R(ω) in h(z, ω), thus reducing the adaptivity and tending in the limit to a Fourier estimate [24]. There are alternative approaches to better account for the presence of volumes. The first one relies on a parametric description (e.g., modeling) of the reflectivity function. Such an approach is used for instance for the generalized Capon formulations in [31] and [32], which are also related to more general covariance matching approaches [29]. The second approach does not model the reflectivity profile but constrains the solution (sub-)space of the TomoSAR inversion. This is the case of the Polarization Coherence Tomography [33] and Compressive Sensing [34][35][36] approaches. The physical significance of the model parameters and of the constrained solution space directly affect the physical significance and the accuracy of the final reflectivity estimates. To date there are no established full electromagnetic models nor appropriate (sub-)spaces fully supporting an accurate TomoSAR profile inversion especially at L-band.

Physical Information Extraction from the Reflectivity Profiles: Distribution of Peaks
The potentials and limitations of TomoSAR reconstructions are today understood and well assessed experimentally [26]. However, the interpretation of reflectivity profiles in terms of 3D physical structure attributes is not as established at any of the frequencies and in particular not at L-band. Two aspects make this interpretation challenging: • At a fixed wavelength, the relative scattering contribution of the different vegetation elements with respect to each other depends on the viewing geometry (e.g., the incidence angle), the polarization, but also on their dielectric properties, which are relevant at L-band [37]. There are not appropriate electromagnetic models able to describe these dependencies accurately enough to provide a transfer function between the 3D distribution of vegetation elements and 3D reflectivity. Additionally, the limited resolution of conventional SAR systems does not allow to distinguish single trees, preventing a direct correspondence between TomoSAR reflectivity and conventional (single-tree based) structural descriptors established in forestry and ecology.

•
The TomoSAR acquisition configuration and profile reconstruction algorithm(s) concur to determine both (i) the resolution at which the profile peaks corresponding to scattering contributions (i.e., the scatterer mainlobes) can be distinguished from each other in height, and (ii) the level of the sidelobes of each of them. Weaker scattering contributions are particularly penalized as their mainlobe can be confused with, or even masked by, the sidelobes of stronger contributions. The regularity (e.g., distribution) of the wavenumbers plays a critical role, but also the temporal distribution of the acquisitions compared to the changes of reflectivity [26] is significant.
as well after detecting the meaningful ones. For instance, within a height interval of interest, the meaningful peaks at the lowest and highest heights can be used to extract ground topography [22,42] and top height [43], respectively. Moreover, it has been shown by models and experiments that the 3D distribution of the identified meaningful peaks reflects the variability of the vegetation elements within a stand [18,[44][45][46], supporting the assumption of their physical significance. In this framework, the ability to identify the meaningful peaks corresponding to "true" scattering contributions is critical and depends on the TomoSAR reconstruction algorithm as well as on the acquisition configuration.

Coherence Matrix Model
Without losing generality, the estimation of a normalized F(z, ω) from the inteferometric complex coherence matrix Γ(ω) associated to R(ω) is considered.
where W is a diagonal matrix with [W] k,k = [R(ω)] k,k . As coherence values are normalized by the SLC intensity, the methodologies and analyses reported in the following are independent of radiometric correction factors applied e.g., to reduce modulations of the backscattered power induced by slopes. For Γ(ω) a number of model assumptions are considered. First of all, the received signal is modelled as the sum of the backscattered signal with power P S (ω) and receiver white noise with power P N (ω). As a consequence: where SNR(ω) = P S (ω)/P N (ω), and I denotes the identity matrix. In the absence of temporal decorrelation (i.e., no reflectivity changes during the TomoSAR acquisition time), the signal coherence matrix Γ S (ω) can in turn be written as the sum of a ground and a volume contribution as for example addressed by the so called Random-Volume-over-Ground (RVoG) model [47,48]: where µ(ω) is the ground-to-volume amplitude ratio, and Γ G = a(z G )a H (z G ) with z G the ground height. The RVoG model assumes that the volume-only reflectivity is but for a scaling factor the same across all polarimetric channels, and that the ground reflectivity is described by a Dirac-δ function [47,48]. Equations (5) and (6) can be put together in a compact form as: (7) is formally valid also in the presence of residual phase calibration errors. Let ϕ be a Kdimensional vector containing the phase errors at the different images. Usually, ϕ is assumed to be a zero-mean Gaussian distributed vector with covariance matrix σ 2 ϕ I [49,50]. In presence of phase errors, R(ω) becomes [49,50].

Ground Topography
The hypothesis of a Dirac-δ ground reflectivity implies that the ground scattering contribution induces at z G a peak in the reflectivity profile. As a consequence, z G can be estimated from a TomoSAR profile as the position of the meaningful peak at the lowest height. Meaningful peaks are supposed to exceed an amplitude threshold T. The choice of an appropriate value of T for distinguishing the ground peak from the nearby sidelobes is addressed in the following.
If no diagonal loading is applied, the Capon profile estimate can be written in a compact form as: where F C,V N (z) is the Capon profile associated to the volume-only under the RVoG hypothesis and noise components: Considering only the z range around and below the ground height z G and neglecting the volume contributions means approximating Γ V N ≈ (1 − α)I, leading to: where PSF(z − z G ) = a H (z)a(z G ) 2 /K 2 indicates the TomoSAR point spread function centered around z G , and corresponds to the Fourier profile of a point-like scatterer located at z G . Therefore, the TomoSAR PSF determines the position and the amplitudes of the sidelobes of F C (z, ω) in the selected height interval. For detecting the ground, it is sufficient to set the following threshold: corresponding to the highest sidelobe amplitude expressed by the PSL (peak-sidelobe level). The use of the PSL in Equation (13) is appropriate as long as the highest sidelobe is next to the mainlobe in the PSF, as in the reported experiment. Should this not be the case, the amplitude of the PSF sidelobe closest to the mainlobe should be used instead.
Complications arise if the volume-only reflectivity is not negligible around the ground height z G . This may introduce a systematic bias in the position of the ground peak if the first derivative of volume-only profiles does not equal zero at z G . At the same time, an optimization of the threshold value can be carried out only by making assumptions on Γ V and/or on the level of the corresponding sidelobes. Finally, there is a minimum ground-to-volume ratio µ min required for the detection of the ground. Assuming now both the noise ( α → 1 ) and the ground estimation bias to be negligible, it results: where T is now a generic threshold.

Horizontal and Vertical Structure Indices
Indices based on the 3D distribution of the identified meaningful peaks between ground and canopy top within a given area on ground (referred as structure window in the following) can be used to interpret TomoSAR profiles in terms of structural heterogeneity in the horizontal and vertical dimensions. Let P = {p 1 , p 2 , . . . , p P } be the ensemble of the P peaks in the structure window (excluding the ground peaks), and Z = {z 1 , z 2 , . . . , z M } the ensemble of the corresponding M unique peak heights. A horizontal structure index HS can be defined as [16,17]: where: where n(P top ) is the number of elements of a subset P top of P constituted by all the peaks with height in the top (vegetation) layer, A is the area of the structure window and HS 0, max is a (reference) maximum value. The top layer corresponds to the range of heights above ε·max(Z ), where ε < 1 is a usually empirically defined factor. HS 0 expresses then the density of peaks in the top layer: the larger the number of peaks, the denser (homogeneous) the stands and HS tends to 0. HS tends to 1 in the opposite case, indicating a sparser (heterogenous) stand. At the same time, a vertical structure index VS can be defined as [23,24]: where var{Z } is the variance of the peak heights in Z and VS 0, max is a reference normalization value. Similarly to HS, VS increasing to 1 corresponds to increasing vertical heterogeneity. In [22,24], it has been shown that the indices in Equations (15)-(17) agree in different forest types with equivalent indices estimated from lidar data. Even more, they correlate with equivalent indices used in forestry and ecology derived from field measurements linking space occupation with tree size. In particular, HS correlates with the stand density index (closely related to the basal area) and VS with the standard deviation of the diameter at breast height [24]. The calculation of the structure indices in Equations (15)-(17) presupposes the accurate identification meaningful peaks, also here usually implemented through an amplitude threshold. Different than in the case of ground topography estimation, no reliable a priori assumptions can be made for the volume and a reasonable threshold value can be only empirically set. However, an intrinsic robustness against missed detection and false alarms can actually be expected, as Equations (15)-(17) rely on statistics of P top and Z, and can be further increased by increasing the window area A for a fixed TomoSAR profile resolution. P and M will increase accordingly, and the possible sidelobes included in the selection are expected to play only a secondary role. However, this could limit structure windows to larger values, reducing its significance for heterogenous stands [24].

Test Sites and Data Sets
The experimental analyses in the next Sections have been carried out by processing TomoSAR airborne L-band data sets acquired by the DLR's F-SAR platform over two temperate forest sites located in the proximity of Traunstein in the south of Germany. The location of the two sites is depicted in Figure 1a.
The first site is in a locality called Froschham. The topography is essentially flat around 590 m above sea level, and canopy heights reach 40 m. Management activities aim at transforming even aged monospecific stands into uneven aged mixed species stands. This transition has been captured by the TomoSAR acquisition. Homogenous forest stands dominate the eastern part (appearing in near range and lower azimuth distances in the Pauli RGB composite image in Figure 1b), while the western part (higher azimuth distances and/or far range) is characterized by multi-layered mixed stands and complex stand structure. This clear differentiation extending for several hectares makes this site ideal for investigating the structural information content of L-band TomoSAR profiles, and its sensitivity to the acquisition geometry reported in Sections 5 and 6. In 2015, an extensive ground measurement campaign took place in a 25 ha plot that has been included in the ForestGEO network [51]. In addition to the extensive ground measurements, in 2016 fine-beam airborne Lidar data were collected over this site and are used in this paper for comparison. The TomoSAR acquisition took place in June 2017. In this experiment, K = 15 images have been acquired along uniformly distributed tracks with a horizontal displacement between 5 and 70 m at a reference flight height of about 3000 m above mean terrain height. The data set has been processed by a standard repeat-pass interferometric processor, including co-registration, flat-Earth and terrain phase compensation and residual phase calibration [52]. The processed range and azimuth SLC resolutions are 1.3 m and 0.6 m, respectively. The final Rayleigh vertical resolution varies between 3 m in near and 9 m in far range. The acquisition parameters are summarized in Table 1 extensive ground measurement campaign took place in a 25 ha plot that has been included in the ForestGEO network [51]. In addition to the extensive ground measurements, in 2016 fine-beam airborne Lidar data were collected over this site and are used in this paper for comparison. The TomoSAR acquisition took place in June 2017. In this experiment, = 15 images have been acquired along uniformly distributed tracks with a horizontal displacement between 5 and 70 m at a reference flight height of about 3000 m above mean terrain height. The data set has been processed by a standard repeat-pass interferometric processor, including co-registration, flat-Earth and terrain phase compensation and residual phase calibration [52]. The processed range and azimuth SLC resolutions are 1.3 m and 0.6 m, respectively. The final Rayleigh vertical resolution varies between 3 m in near and 9 m in far range. The acquisition parameters are summarized in Table 1.   The second site is the Traunstein Bürgerwald forest, indicated by the southern polygon in Figure 1a. Through the years, the Traunstein forest is being reconverted from a homogeneous one-age forest to a structurally rich, heterogeneous forest. Most forest stands are complex in terms of tree species richness and heterogeneous stand structures. The topography ranges from 630 to 720 m above sea level and is general flat with localized steep slopes. Forest top heights reach 40 m. The "close-to-nature" silviculture is reflected in five growth stages distributed across the site according to field inventories [53]. Young stands are constituted predominantly by young trees with low density. Density increases in the growing forest stands, which include scattered taller trees above the young shorter ones. Stands in a transition stage are characterized by regrowth of shorter trees below an older and taller vegetation layer. Mature stands are homogenous with a dense top canopy layer. Finally, the so-called "Plenterwald"stands are very heterogeneous in terms of species and ages. The German term "Plenterwald" (selection cutting) indicates forest stands undergoing management procedures aiming at boosting structure heterogeneity in a constant mixture of trees of different species and ages. In 2014, The DLR's F-SAR system Remote Sens. 2021, 13, 2255 9 of 29 flew four times in more than two months on May 20, June 10, June 16 and July 28 [26]. A Pauli RGB composite image is shown in Figure 1c. Each time the same slightly irregular TomoSAR configuration consisting of 5 tracks displaced in the horizontal direction by up to 25 m was acquired (see Table 1 for additional acquisition parameters). However, deviations from the nominal tracks resulted into more or less significant distortions of the PSF. The largest one occurred on June 10, and the related PSF can locally exhibit sidelobes up to just −4 dB with respect to the main lobe [26]. In order to minimize the TomoSAR performance across the four sets, the corresponding data vectors have been interpolated to a common uniform wavenumber distribution using the interpolator in [54]. The interpolation process was implemented exactly as discussed in [55] and was constrained by the individual volume heights fixed here according to a fine-beam lidar acquisition carried out in 2012. Each data set was collected in less than 1 h and given the calm wind conditions temporal decorrelation between the individual tracks is assumed negligible. As the four data sets span a time interval of two months, they are suitable to evaluate different TomoSAR implementations in terms of the provided robustness of structure indices to changes of the vertical reflectivity profiles induced by seasonality in Section 7. The weather stations in Nilling and Schönharting located within a 20 km radius from the Bürgerwald reported very similar weather conditions for all four campaign days [26].

Reference Lidar Structure Indices
The lidar data, e.g., the ground topography (digital terrain model, DTM) and canopy height (canopy height model, CHM), have been projected in the slant range geometry of the TomoSAR acquisition. The lidar top forest height, estimated as the maximum height within a 5 × 5 m range-azimuth cell, is shown in Figure 2a for the Froschham site. Canopy height profiles have been generated as the histograms of heights of all the lidar returns in 5 × 5 m. These profiles have then been used to calculate the structure indices according to Equations (15)-(17) within a (moving) 50 m × 50 m structure window. Past analyses have shown that this window size allows to map physically relevant structure variations on this site and at the same time to aggregate a statistically significant number of independent profiles (in this case about 100) to allow a meaningful calculation of the structure indices [24]. The obtained HS and VS are shown in Figure 2b,c, respectively. According to the previous analyses in [23,24], ε = 0.6 was chosen. The normalization factors HS 0, max and VS 0, max correspond to the maximum HS 0 and VS 0 obtained in the area covered by the lidar acquisition. HS assumes larger values where the height varies faster in space, as occurring in the stands with isolated trees located at the center of the site. The taller mixed stands at either larger azimuth distances or far range show the highest horizontal homogeneity. Being composed by multilayered vegetation, these stands exhibit also the higher VS values. At the same time, the monospecific stands in near range and lower azimuth distances appear homogeneous, consistent with the management practices in this site. Notice that in this area, a high correspondence between the lidar-derived indices and the ones based on field measurements has been found and analyzed in [24].
Remote Sens. 2021, 13, x FOR PEER REVIEW 10 of 30 lidar acquisition. assumes larger values where the height varies faster in space, as occurring in the stands with isolated trees located at the center of the site. The taller mixed stands at either larger azimuth distances or far range show the highest horizontal homogeneity. Being composed by multilayered vegetation, these stands exhibit also the higher values. At the same time, the monospecific stands in near range and lower azimuth distances appear homogeneous, consistent with the management practices in this site. Notice that in this area, a high correspondence between the lidar-derived indices and the ones based on field measurements has been found and analyzed in [24].  Figure 3a,b show the Capon profiles in the HH and HV channels over the transect along azimuth depicted in Figure 1b, respectively, obtained by processing the coherence matrix of the full stack of images ( = 15) using a 20 × 20 m multilook cell (corresponding to around 500 nominal independent looks). For the visualization, the lidar DTM has been compensated, and the height axis is thus referred to the ground height. The ground scattering is, as expected, more visible in HH than in HV. Increasing the azimuth distance this transect crosses monospecific stands (until 1 km) with constant top height of around 20 m, then a bare area and finally multilayered stands reaching up to 30 m. The volume scattering extends in height and has typically one or two peaks at each azimuth coordinate. Nevertheless, a (slightly) increasing profile complexity can be noticed moving from the homogeneous to the heterogeneous stands.   Figure 1b, respectively, obtained by processing the coherence matrix of the full stack of images (K = 15) using a 20 × 20 m multilook cell (corresponding to around 500 nominal independent looks). For the visualization, the lidar DTM has been compensated, and the height axis is thus referred to the ground height. The ground scattering is, as expected, more visible in HH than in HV. Increasing the azimuth distance this transect crosses monospecific stands (until 1 km) with constant top height of around 20 m, then a bare area and finally multilayered stands reaching up to 30 m. The volume scattering extends in height and has typically one or two peaks at each azimuth coordinate. Nevertheless, a (slightly) increasing profile complexity can be noticed moving from the homogeneous to the heterogeneous stands.

Profiles and Penetration
Using the two-layer RVoG model, the N P polarimetric channels can be combined coherently and used to separate ground and volume scattering contributions. For this, model (6) can be extended in the polarimetric case by means of a sum of Kronecker products [56]: where R P is the covariance matrix associated to the KN P dimensional data vector y P (stack of the single-polarization data vectors y(ω)), "⊗" denotes the Kronecker matrix product and C G and C V are the polarimetric covariance matrices of the ground and the volume, respectively. Model (18) is ambiguous by its own nature: an ensemble of Γ G , Γ V , C G and Remote Sens. 2021, 13, 2255 11 of 29 C V can be combined to provide the same R P . A least squares optimization can only provide candidate estimates of Γ G and Γ V [56]: where R 1 and R 2 are two (K × K)dimensional matrices obtained from the singular value decomposition of a permutated version of R P [56]. Analogous expressions can be found for C G and C V . The scalars a and b vary in intervals that make the four matrices positive semi-definite and define the ambiguity of the reconstruction. Therefore, the elements of Γ G and Γ V in the same positions vary on two segments belonging to the same line in the complex plane, consistently with the usual polarimetric interferometric (Pol-InSAR) geometric interpretation of the RVoG model [56]. For each segment, ground and volume solutions closer to the Pol-InSAR coherence region ( Γ G,CR and Γ V, CR ) and closer to the unitary circle ( Γ G,UC and Γ V, UC ) can be found in the complex plane. The coherence region contains all the possible interferometric coherences resulting from a linear combination of the polarimetric channels for a fixed interferometric baseline.
Remote Sens. 2021, 13, x FOR PEER REVIEW 11 of 30 Using the two-layer RVoG model, the polarimetric channels can be combined coherently and used to separate ground and volume scattering contributions. For this, model (6) can be extended in the polarimetric case by means of a sum of Kronecker products [56]: where R is the covariance matrix associated to the −dimensional data vector y (stack of the single-polarization data vectors y(ω)), "⨂" denotes the Kronecker matrix product and C and C are the polarimetric covariance matrices of the ground and the volume, respectively. Model (18) is ambiguous by its own nature: an ensemble of Γ , Γ , C and C can be combined to provide the same R . A least squares optimization can only provide candidate estimates of Γ and Γ [56]: Figure 3c,d show the Capon profiles corresponding to Γ G,CR and Γ V, CR for the selected azimuth line. It is apparent that the ground contribution is stronger in Γ G,CR than in HH, and weaker in Γ V, CR than in HV. However, a residual volume component remains in Γ G,CR and a residual ground in Γ V, CR . Consistent with the underlying RVoG assumption in the optimized solution (19) there are no differences in the appearance of the volume components in the different profiles. On the other hand, given the high Rayleigh vertical resolution and the number of images available, the attenuation or amplification of the ground scattering component does not change significantly the null-placing mechanism of the Capon estimator.
Moving on the segments from Γ G,CR towards Γ G,UC amplifies further the ground scattering contribution but does not cancel the volume (Figure 3e). In contrast, moving from Γ V, CR to Γ V, UC cancels almost totally the ground scatterer, together with all the vegetation contributions close to the ground (Figure 3f). Only the contributions close to the top remain. It is worth noting that being the solutions closest to the unitary circle solutions, Γ G,UC and Γ V, UC are also the ones tending more to ill-conditioning. For this reason, the corresponding Capon profiles tend to be noisier.
The profiles in Figure 3 show clearly the change and sensitivity of the L-band To-moSAR profiles as a function of polarization. Remembering (6), in the frame of the RVoG the polarimetric dependency can be characterized quantitatively by estimating the variation of µ(ω) across ω. Under the Dirac-δ assumption for the ground reflectivity, a sufficient condition to estimate a unique Γ V by regularizing the Sum-of-Kronecker-Product decomposition (19) is to use the ground height. A detailed description and discussion of this is reported in [57]. Here, the lidar DTM has been used. The regularization procedure proposed in [57] provides estimates of Γ V close to Γ V, CR in the majority of cases. Notice that this solution is particularly meaningful also from a physical point of view: the corresponding points within the unitary circle are the most robust ones with respect to the number of images used in the optimization. In contrast, Γ G,UC and Γ V, UC come closer to the coherence region at the change of the number of images and/or image geometry [58].
The estimation of C G and C V follows the estimation of Γ V by applying a least-squares optimization. The diagonal elements of C G and C V are the separated ground and volume powers in the different polarimetric channels and additionally allow the calculation of the ground-to-volume ratios (µ HH , µ VV and µ HV ) in the HH, HV and VV channels. Finally, the classical polarimetric contrast optimization can be applied to find the polarimetric channel with the maximum and minimum ground-to-volume ratios µ max and µ min [48,59]. The maps of µ HH , µ HV and µ max are shown in Figure 4, and they are consistent with the differences seen in the profiles of Figure 2. By comparing these maps with the lidar height and structure maps in Figure 2, a relationship between µ HH and µ max with height becomes apparent: the taller the stands, the more attenuated the ground scattering. This agreement is not visible in µ HV as the ground scattering is weaker. On the other hand, no relevant correlation between ground-to-volume ratios and (geometric) structure stands out. A larger role is certainly attributed to the dielectric properties of the different stands [14,37]. The distributions of µ HH , µ HV , µ min and µ max are shown in Figure 5. It is apparent that the change of polarization can change the ground-to-volume ratio up to 20 dB. µ HH and µ HV are separated by around 5 dB, and the optimization of polarization increases µ HH up to 3 dB.
As a final remark, it is worth to be noted that there is basically no difference between the Capon profiles obtained in the polarimetric channels corresponding to µ max and µ min and the ones corresponding to Γ G,CR and Γ V, CR , respectively, across the whole site. This suggests a possible physical interpretation for these two candidate solutions. From a more practical point of view, it makes also possible to obtain coherence matrices (and profiles) in the extreme µ cases without resorting to a more or less arbitrary regularization similar as proposed in [60]. However, this conclusion may not be valid at different frequencies and/or different forest types. Remote Sens. 2021, 13, x FOR PEER REVIEW 13 of 30  As a final remark, it is worth to be noted that there is basically no difference between the Capon profiles obtained in the polarimetric channels corresponding to and and the ones corresponding to Γ , and Γ , , respectively, across the whole site. This suggests a possible physical interpretation for these two candidate solutions. From a more practical point of view, it makes also possible to obtain coherence matrices (and profiles) in the extreme cases without resorting to a more or less arbitrary regularization similar as proposed in [60]. However, this conclusion may not be valid at different frequencies and/or different forest types.

Estimation of the Ground Height
The possibility to obtain accurate ground height estimates from the L-band Capon profiles is addressed. As described in Section 3.2, the ground height is estimated as the minimum location of the profile peaks above an amplitude threshold. A mean SNR equal to 25 dB and a residual phase calibration error with 10° standard deviation has been considered. The application of the adaptive threshold (13) requires the accurate knowledge of the ground-to-volume ratio for which the ground height is needed (see Section 5.2). For this a constant threshold value = 0.3(corresponding to = −3dB) has been used, thus accepting a minimum detectable (ω) as in (14). The possibility to maximize the groundto-volume ratio by combining different polarizations becomes critical for increasing the   As a final remark, it is worth to be noted that there is basically no difference between the Capon profiles obtained in the polarimetric channels corresponding to and and the ones corresponding to Γ , and Γ , , respectively, across the whole site. This suggests a possible physical interpretation for these two candidate solutions. From a more practical point of view, it makes also possible to obtain coherence matrices (and profiles) in the extreme cases without resorting to a more or less arbitrary regularization similar as proposed in [60]. However, this conclusion may not be valid at different frequencies and/or different forest types.

Estimation of the Ground Height
The possibility to obtain accurate ground height estimates from the L-band Capon profiles is addressed. As described in Section 3.2, the ground height is estimated as the minimum location of the profile peaks above an amplitude threshold. A mean SNR equal to 25 dB and a residual phase calibration error with 10° standard deviation has been considered. The application of the adaptive threshold (13) requires the accurate knowledge of the ground-to-volume ratio for which the ground height is needed (see Section 5.2). For this a constant threshold value = 0.3(corresponding to = −3dB) has been used, thus accepting a minimum detectable (ω) as in (14). The possibility to maximize the groundto-volume ratio by combining different polarizations becomes critical for increasing the

Estimation of the Ground Height
The possibility to obtain accurate ground height estimates from the L-band Capon profiles is addressed. As described in Section 3.2, the ground height is estimated as the minimum location of the profile peaks above an amplitude threshold. A mean SNR equal to 25 dB and a residual phase calibration error with 10 • standard deviation has been considered. The application of the adaptive threshold (13) requires the accurate knowledge of the ground-to-volume ratio for which the ground height is needed (see Section 5.2). For this a constant threshold value β = 0.3 (corresponding to µ = −3 dB) has been used, thus accepting a minimum detectable µ(ω) as in (14). The possibility to maximize the ground-to-volume ratio by combining different polarizations becomes critical for increasing the robustness and the range of µ(ω) and allowing the correct detection of the ground even if using with a suboptimum threshold (like in this case), or an unfavorable array configuration in terms of PSL, or for a small volume reflectivity at the ground level. In this case, the strongly polarized ground response at L-band (as seen in Section 5.2) is favorable. Finally, it is worth noting that the discussion on the determination of amplitude threshold in Section 3.2 assumed a very large, theoretically infinite, number of looks. When the number of looks becomes finite, the threshold values need to be increased to protect from random fluctuations of the profile amplitudes by, e.g., (a fraction of) the standard deviation of the profile amplitudes (see [27] for the Capon estimator). To account for this, a 20% increased threshold margin is considered.
The obtained ground height error maps estimated against the lidar DTM obtained by processing the full stack of images (K = 15) with a 20 × 20 m multilook cell and for HH, HV and Γ G,CR are shown in Figure 6a-c. The corresponding histograms are plotted in Figure 7a, together with the ones corresponding to the VV and Γ G,UC estimates. In HV there are several areas with an estimation error larger than 5 m. A certain trend from near to far range can be observed as a consequence of the worsening of the Rayleigh vertical resolution. Overall an estimation bias of around 1.3 m and a standard deviation around 4.5 m are obtained. The errors reduce significantly in HH as the number of cases in which an "underground" sidelobe is misinterpreted as the ground mainlobe decreases. The bias is almost zero and the standard deviation reduces to 2.7 m. A maximization of the ground-to-volume ratio to Γ G,CR improves the estimates in a few cells. The bias does not improve when Γ G,UC is considered, however the standard deviation increases to 3.5 m as a result of the profile noise introduced by the poorly conditioned coherence matrix. The consistency of these results indicates the validity not only of the threshold formulated in Equation (12), but also of the assumptions behind it, at least in a statistically sense for the whole site.
the strongly polarized ground response at L-band (as seen in Section 5.2) is favorable. Finally, it is worth noting that the discussion on the determination of amplitude threshold in Section 3.2 assumed a very large, theoretically infinite, number of looks. When the number of looks becomes finite, the threshold values need to be increased to protect from random fluctuations of the profile amplitudes by, e.g., (a fraction of) the standard deviation of the profile amplitudes (see [27] for the Capon estimator). To account for this, a 20% increased threshold margin is considered.
The obtained ground height error maps estimated against the lidar DTM obtained by processing the full stack of images ( = 15) with a 20 × 20 m multilook cell and for HH, HV and Γ , are shown in Figure 6a-c. The corresponding histograms are plotted in Figure 7a, together with the ones corresponding to the VV and Γ , estimates. In HV there are several areas with an estimation error larger than 5 m. A certain trend from near to far range can be observed as a consequence of the worsening of the Rayleigh vertical resolution. Overall an estimation bias of around 1.3 m and a standard deviation around 4.5 m are obtained. The errors reduce significantly in HH as the number of cases in which an "underground" sidelobe is misinterpreted as the ground mainlobe decreases. The bias is almost zero and the standard deviation reduces to 2.7 m. A maximization of the groundto-volume ratio to Γ , improves the estimates in a few cells. The bias does not improve when Γ , is considered, however the standard deviation increases to 3.5 m as a result of the profile noise introduced by the poorly conditioned coherence matrix. The consistency of these results indicates the validity not only of the threshold formulated in Equation (12), but also of the assumptions behind it, at least in a statistically sense for the whole site.  Finally, Figure 8a shows the distribution of the (mean) ground estimation error in absolute value obtained by using the ground-optimized Γ , in the plane defined by the corresponding and the volume-only Capon amplitude , ( )estimated at the lidar ground. For this diagram, only the estimates corresponding to the "true" ground peak (i.e., the closest to the reference lidar ground) have been considered. The black curve indicates the limit of (14) and represents the minimum ground-to-volume ratio at which a peak corresponding to the ground scattering can be detected in the Capon profiles for a given fixed threshold. All points considered are above the detection limit, and the height error increases in correspondence to a low and / or , ( ). If , ( ) = 0, the minimum detected ground is for = −13 dB and the height error is about 4 m. Errors Finally, Figure 8a shows the distribution of the (mean) ground estimation error in absolute value obtained by using the ground-optimized Γ G,CR in the plane defined by the corresponding µ max and the volume-only Capon amplitude F C,V (z G ) estimated at the lidar ground. For this diagram, only the estimates corresponding to the "true" ground peak (i.e., the closest to the reference lidar ground) have been considered. The black curve indicates the limit of (14) and represents the minimum ground-to-volume ratio at which a peak corresponding to the ground scattering can be detected in the Capon profiles for a given fixed threshold. All points considered are above the detection limit, and the height error increases in correspondence to a low µ max and/or F C,V (z G ). If F C,V (z G ) = 0, the minimum detected ground is for µ max = −13 dB and the height error is about 4 m. Errors lower than or equal 2 m are obtained already for µ max = −10 dB independently of F C,V (z G ). Finally, Figure 8a shows the distribution of the (mean) ground estimation error in absolute value obtained by using the ground-optimized Γ , in the plane defined by the corresponding and the volume-only Capon amplitude , ( )estimated at the lidar ground. For this diagram, only the estimates corresponding to the "true" ground peak (i.e., the closest to the reference lidar ground) have been considered. The black curve indicates the limit of (14) and represents the minimum ground-to-volume ratio at which a peak corresponding to the ground scattering can be detected in the Capon profiles for a given fixed threshold. All points considered are above the detection limit, and the height error increases in correspondence to a low and / or , ( ). If , ( ) = 0, the minimum detected ground is for = −13 dB and the height error is about 4 m. Errors lower than or equal 2 m are obtained already for = −10 dB independently of , ( ).

Estimation of Horizontal and Vertical Structure
The horizontal and vertical structure concept and formulations described in Section 3.3 have been applied to the L-band Capon profiles obtained with the full TomoSAR stack ( = 15). The identification of the meaningful peaks in the following experiments follows the approach proposed in [61] where it was shown that considering all peaks with an amplitude until 6 dB smaller than the maximum profile peak is sufficient to distinguish among the different structure types in Froschham.

Estimation of Horizontal and Vertical Structure
The horizontal and vertical structure concept and formulations described in Section 3.3 have been applied to the L-band Capon profiles obtained with the full TomoSAR stack (K = 15). The identification of the meaningful peaks in the following experiments follows the approach proposed in [61] where it was shown that considering all peaks with an amplitude until 6 dB smaller than the maximum profile peak is sufficient to distinguish among the different structure types in Froschham.
In order to get some insights on the capability of the Capon profiles at different polarizations to describe structure differences, the transects in Figure 9a-d show the number of profile peaks retrieved in a 50 m × 50 m structure window for each azimuthheight bin in the same azimuth line as considered in Figure 3. Each bin measures 5 m in azimuth and 1 m in height. The underlying profiles have been calculated using 10 × 10 m multilook cells. This provides 25 independent profiles for each structure window, which is good enough for the qualitative comparisons of peak distributions. Additionally, the chosen multilook cell provides a reasonable margin for well-conditioned estimates of the polarimetric covariance matrices R P . In HH (Figure 9a), the detected peaks are more concentrated around the ground and the few detected volume peaks are mainly close to the canopy top. Yet, they show an increasing height variance from left to right reflecting the increasing vertical structure complexity when moving from the homogeneous monospecies stands to the heterogeneous multilayered stands. This gradient is clearer in HV (Figure 9b). Peaks are still concentrated around the ground, however the increased sensitivity to the volume scattering turns into an increased concentration of peaks close to the canopy top in the homogeneous and to a wider height dispersion in the heterogenous stands with respect to HH. Polarimetric optimization appears to have little advantage in terms of structure differentiation. Considering Γ V,CR (Figure 9c) the number of peaks close to the canopy top increases, but with no significant change with respect to their density in the top layer or their vertical distribution with respect to HV. Finally, using Γ V,UC (Figure 9d) minimizes the density of peaks at the ground level, but increases the vertical dispersion of the volume peaks (already noticed in Figure 3). This is expected to reduce the vertical structure gradient, similarly to HH.  The horizontal and vertical structure indices defined in Section 3.3 have been calculated as well using only the peaks 5 m above the ground. Only the HH and HV channels are considered, as the previous qualitative analysis of the distributions of the peak heights has shown that an optimization of the polarization channel does not provide any significant advantage. As the profile estimation deals with smaller coherence matrices, a smaller multilook window can be employed and the number of independent profiles within the structure cell increases. A of 5 m × 5 m multilook, the smallest possible for obtaining still well-conditioned coherence matrices was chosen. The obtained TomoSAR horizontal structures are shown in Figure 10a  The horizontal and vertical structure indices defined in Section 3.3 have been calculated as well using only the peaks 5 m above the ground. Only the HH and HV channels are considered, as the previous qualitative analysis of the distributions of the peak heights has shown that an optimization of the polarization channel does not provide any significant advantage. As the profile estimation deals with smaller coherence matrices, a smaller multilook window can be employed and the number of independent profiles within the structure cell increases. A of 5 m × 5 m multilook, the smallest possible for obtaining still well-conditioned coherence matrices was chosen. The obtained TomoSAR horizontal structures are shown in Figure 10a,b. Both the HH and HV maps show a very similar behavior of HS across the scene, with the highest values showing off for the isolated sparse stands in the center. In near range, the lowest values of HS are obtained for the tall multilayered stands, and some larger values in the lower half. This is in strong agreement with the lidar map in Figure 3b. The time difference between the lidar and TomoSAR acquisition is not expected to bias the comparison as the forest growth would in the order of 1 m. At the same time there are of course also areas with a lower level of agreement as a result of the differences in wavelength, resolution, and looking geometry of lidar and SAR imaging. The differences between the maps amplify when moving from near to far range with increasing incidence angle. The TomoSAR HS becomes larger (by around 20%) than the lidar one, although still denoting homogeneity. This might be due to the reduction of vertical resolution with increasing incidence angle: both, the number and the location of the peaks may change accordingly (as will be discussed in Section 6). The best agreement between the lidar and TomoSAR HS has been found in HV with a correlation coefficient of around 0.84, as shown in the 2-D histogram in Figure 11b. The slight overestimation as a function of range reflects in the positive bias for a lidar HS below 0.2. In HH (Figure 11a), a (small) positive bias is distributed across the whole range of HS values, due to the lowest concentration of peaks close to the canopy top with respect to HV (see Figure 9a,b). a result of the differences in wavelength, resolution, and looking geometry of lidar and SAR imaging. The differences between the maps amplify when moving from near to far range with increasing incidence angle. The TomoSAR becomes larger (by around 20%) than the lidar one, although still denoting homogeneity. This might be due to the reduction of vertical resolution with increasing incidence angle: both, the number and the location of the peaks may change accordingly (as will be discussed in Section 6). The best agreement between the lidar and TomoSAR has been found in HV with a correlation coefficient of around 0.84, as shown in the 2-D histogram in Figure 11b. The slight overestimation as a function of range reflects in the positive bias for a lidar below 0.2. In HH (Figure 11a), a (small) positive bias is distributed across the whole range of values, due to the lowest concentration of peaks close to the canopy top with respect to HV (see Figure 9a The vertical structure index has been evaluated as well. The obtained range-azimuth TomoSAR maps with = 15 in HH and HV are shown in Figure 12a,b and their a result of the differences in wavelength, resolution, and looking geometry of lidar and SAR imaging. The differences between the maps amplify when moving from near to far range with increasing incidence angle. The TomoSAR becomes larger (by around 20%) than the lidar one, although still denoting homogeneity. This might be due to the reduction of vertical resolution with increasing incidence angle: both, the number and the location of the peaks may change accordingly (as will be discussed in Section 6). The best agreement between the lidar and TomoSAR has been found in HV with a correlation coefficient of around 0.84, as shown in the 2-D histogram in Figure 11b. The slight overestimation as a function of range reflects in the positive bias for a lidar below 0.2. In HH (Figure 11a), a (small) positive bias is distributed across the whole range of values, due to the lowest concentration of peaks close to the canopy top with respect to HV (see Figure 9a  The vertical structure index VS has been evaluated as well. The obtained rangeazimuth TomoSAR maps with K = 15 in HH and HV are shown in Figure 12a,b and their comparison with the lidar derived VS of Figure 1c in Figure 13a,b. Overall, a good agreement between the TomoSAR and lidar derived structure indices. In both cases the monospecific homogenous stands can be distinguished very well from the multilayered ones. The decrease of vertical resolution with range induces here a decrease of VS, that can be observed as a (slight) negative bias in the 2-D histogram for intermediate lidar VS values. Structure estimates in HH have a slightly larger dispersion than in HV, with a tendency to underestimation in accordance with the transects of Figure 9.
Remote Sens. 2021, 13, x FOR PEER REVIEW 19 of 30 comparison with the lidar derived of Figure 1c in Figure 13a,b. Overall, a good agreement between the TomoSAR and lidar derived structure indices. In both cases the monospecific homogenous stands can be distinguished very well from the multilayered ones. The decrease of vertical resolution with range induces here a decrease of , that can be observed as a (slight) negative bias in the 2-D histogram for intermediate lidar values. Structure estimates in HH have a slightly larger dispersion than in HV, with a tendency to underestimation in accordance with the transects of Figure 9.

Effect of the Acquisition Geometry at L-Band
The experimental results in Section 5 show that using an ideal L-band acquisition configuration, i.e., (nominally) uniformly distributed tracks/orbits and high vertical Rayleigh resolution, accurate estimates of structure parameters can be achieved allowing the discrimination of stands with different structure characteristics. However, under less ideal conditions this might not be anymore the case. In this Section, the robustness of the characterization of structure from Capon profiles is evaluated in a reduced scenario in which comparison with the lidar derived of Figure 1c in Figure 13a,b. Overall, a good agreement between the TomoSAR and lidar derived structure indices. In both cases the monospecific homogenous stands can be distinguished very well from the multilayered ones. The decrease of vertical resolution with range induces here a decrease of , that can be observed as a (slight) negative bias in the 2-D histogram for intermediate lidar values. Structure estimates in HH have a slightly larger dispersion than in HV, with a tendency to underestimation in accordance with the transects of Figure 9.

Effect of the Acquisition Geometry at L-Band
The experimental results in Section 5 show that using an ideal L-band acquisition configuration, i.e., (nominally) uniformly distributed tracks/orbits and high vertical Rayleigh resolution, accurate estimates of structure parameters can be achieved allowing the discrimination of stands with different structure characteristics. However, under less ideal conditions this might not be anymore the case. In this Section, the robustness of the characterization of structure from Capon profiles is evaluated in a reduced scenario in which

Effect of the Acquisition Geometry at L-Band
The experimental results in Section 5 show that using an ideal L-band acquisition configuration, i.e., (nominally) uniformly distributed tracks/orbits and high vertical Rayleigh resolution, accurate estimates of structure parameters can be achieved allowing the discrimination of stands with different structure characteristics. However, under less ideal conditions this might not be anymore the case. In this Section, the robustness of the characterization of structure from Capon profiles is evaluated in a reduced scenario in which only K = 6 images acquired along uniformly distributed tracks are available, but with a maximum displacement amounting to 1/3 of the full-stack case. As a consequence, the Rayleigh vertical resolution becomes 21 m at mid-range, i.e., three times worse than in the full-stack case. The largest volume occupies now around two height resolution units. The underlying L-band penetration and sensitivity to structure components does not change, but the ability to characterize them does. Figure 14 shows the Capon profiles obtained along the same azimuth line of Figure 3 with the reduced stack. The reduced vertical resolution is apparent, and not always peaks at different heights are separated from each other, additionally reducing the capability to distinguish the homogenous from the multilayered stands at the two sides of the transect.
Remote Sens. 2021, 13, x FOR PEER REVIEW 20 of 30 only = 6 images acquired along uniformly distributed tracks are available, but with a maximum displacement amounting to 1/3 of the full-stack case. As a consequence, the Rayleigh vertical resolution becomes 21 m at mid-range, i.e., three times worse than in the full-stack case. The largest volume occupies now around two height resolution units. The underlying L-band penetration and sensitivity to structure components does not change, but the ability to characterize them does. Figure 14 shows the Capon profiles obtained along the same azimuth line of Figure 3 with the reduced stack. The reduced vertical resolution is apparent, and not always peaks at different heights are separated from each other, additionally reducing the capability to distinguish the homogenous from the multilayered stands at the two sides of the transect. The effect of the estimation of the ground height using the methodology discussed in Section 3 and implemented as discussed in Section 5 is evaluated first. Figure 6d shows the round height estimation error obtained using Γ , , i.e., the maximized ground-tovolume ratio. A residual estimation error increasing from near to far range (e.g., in the direction of decreasing vertical resolution) is now apparent when compared to full-stack case. As the volume height compared to the Rayleigh resolution becomes lower, the overall bias increases to 3 m and the standard deviation to 6 m. The height error histograms in Figure 7b show that the poorer the vertical Rayleigh resolution, the stronger the ground contribution needs to be in order to achieve a certain ground height estimation performance. This makes the polarimetric optimization of the ground-to-volume ratio more important. Indeed, the optimization of the ground-to-volume ratio reduces the volume bias on the ground estimates reaching a minimum of around 2.3 m if the candidate solution closer to the unit circle is chosen. The increase of the standard deviation with respect to the better conditioned Γ , is negligible. Further, the error distribution in Figure 7b shows that a ground estimation error lower than 5 m can in this case be achieved with > −10 dB provided that the volume-only Capon profiles have some contribution at the ground level. The bias associated to the loss of resolution increases the volume-only power measured by the Capon estimator at the ground with respect to the full-stack case. Recalling (14), this helps the detection of a ground peak with < −10 dB, but with a significant loss of performance.
The estimation of the structure indices and now relies on the 3D distributions of peaks extracted from profiles with lower vertical resolution than in the full-stack case. The examples in Figure 9e-h show the changes across the polarization channels similar to the ones in the full-stack case. However, in the taller stands the reduction of resolution causes a larger concentration of peaks closer to the canopy top, since peaks at lower height are more difficult to be separated from the ground one. In turn, all the intermediate values of become lower as shown in Figure 10c. On the other hand, volume peaks in shorter stands might be missed simply because they are not resolved from the ground. As a consequence, can now reach higher values in these stands. Overall, the TomoSAR is slightly positive biased and more dispersed with respect to the lidar , but the two are still in agreement ( Figure 11c). As expected, a distribution of volume peaks in a The effect of the estimation of the ground height using the methodology discussed in Section 3 and implemented as discussed in Section 5 is evaluated first. Figure 6d shows the round height estimation error obtained using Γ G,CR , i.e., the maximized ground-to-volume ratio. A residual estimation error increasing from near to far range (e.g., in the direction of decreasing vertical resolution) is now apparent when compared to full-stack case. As the volume height compared to the Rayleigh resolution becomes lower, the overall bias increases to 3 m and the standard deviation to 6 m. The height error histograms in Figure 7b show that the poorer the vertical Rayleigh resolution, the stronger the ground contribution needs to be in order to achieve a certain ground height estimation performance. This makes the polarimetric optimization of the ground-to-volume ratio more important. Indeed, the optimization of the ground-to-volume ratio reduces the volume bias on the ground estimates reaching a minimum of around 2.3 m if the candidate solution closer to the unit circle is chosen. The increase of the standard deviation with respect to the better conditioned Γ G,CR is negligible. Further, the error distribution in Figure 7b shows that a ground estimation error lower than 5 m can in this case be achieved with µ max > −10 dB provided that the volume-only Capon profiles have some contribution at the ground level. The bias associated to the loss of resolution increases the volume-only power measured by the Capon estimator at the ground with respect to the full-stack case. Recalling (14), this helps the detection of a ground peak with µ max < −10 dB, but with a significant loss of performance.
The estimation of the structure indices HS and VS now relies on the 3D distributions of peaks extracted from profiles with lower vertical resolution than in the full-stack case. The examples in Figure 9e-h show the changes across the polarization channels similar to the ones in the full-stack case. However, in the taller stands the reduction of resolution causes a larger concentration of peaks closer to the canopy top, since peaks at lower height are more difficult to be separated from the ground one. In turn, all the intermediate values of HS become lower as shown in Figure 10c. On the other hand, volume peaks in shorter stands might be missed simply because they are not resolved from the ground. As a consequence, HS can now reach higher values in these stands. Overall, the TomoSAR HS is slightly positive biased and more dispersed with respect to the lidar HS, but the two are still in agreement ( Figure 11c). As expected, a distribution of volume peaks in a narrower height interval leads to lower VS values when compared to the full-stack case (Figure 12c), thus reducing the range of structure types that can be distinguished from each other. Similarly to HS, despite the slight negative bias and an increased standard deviation, the TomoSAR derived VS still agrees with the lidar derived VS (Figure 13c). Finally, consistently with the full-stack case, it appears that any polarization optimization does not really improve the ability to distinguish the different structure types from each other.

Effect of the TomoSAR Acquisition Implementation
Up to now the discussion has been based on TomoSAR acquisitions performed within few hours, guaranteeing the stationarity of the reflectivity. In these cases, even if the scatterers move due to wind, introducing temporal decorrelation, the underlying vertical reflectivity profile does not change. However, with longer observation times, the vertical reflectivity profile can change as a result of changing geometric and dielectric properties of the scattering distribution induced by, e.g., rainfalls, droughts, seasonality, disturbance, etc. Data sets of (realistic) space borne TomoSAR implementations will build up of images acquired within weeks or even months. In this case, reflectivity changes can occur from acquisition to acquisition, and the stationarity of the underlying reflectivity within the tomographic set is not anymore given [37]. Depending on the amount and nature of change, the resulting temporal decorrelation (in the sense of non-stationarity) of the underlying reflectivity can cause the defocusing of the estimated vertical profile independently of the used reconstruction algorithm with a consequential resolution and sidelobe level degradation [62].
The effect of such TomoSAR defocusing on the extraction of structure information from the estimated reflectivity profiles is addressed in this Section with respect to two different temporal decorrelation scenarios associated to two different TomoSAR acquisition implementations. In a conventional repeat-pass implementation each SLC image is acquired at a different time. In a bistatic implementation [20,26,63,64] each acquisition consists of (practically) simultaneously acquired pairs of images where each pair is acquired at a different time and with a different vertical wavenumber. Differently from the repeat-pass case, in the bistatic case temporal decorrelation (including the change of the underlying reflectivity) may occur between image pairs, but not within the same pair. Figure 15 shows the interferometric coherences between the same reference tracks at different days together with the corresponding (residual) vertical wavenumber κ Z resulting from deviations from the nominal track. In addition, here the coherences have been calculated over 20 × 20 m range-azimuth multilook windows, corresponding to 500 (nominally) independent looks. The closer to zero κ Z is, the more the interferometric coherences express only temporal decorrelation. The highest coherences are obtained for the smallest time differences, i.e., between June 10 and June 16. Over baser areas the coherence is around 0.7 in average. In forested areas, the coherence decreases to 0.4 in those tall stands with large(r) κ Z . Since volumetric and temporal effects cannot be separated, this value may actually contain a significant volume decorrelation contribution. This conclusion is supported by the work in [65]. The lowest coherences are obtained for the larger time difference, i.e., between May 20 and July 28. In this case, bare areas exhibit low coherence (around 0.35), while in the forest stands the coherence does not drop below 0.35. However, a larger time interval does not necessarily imply a lower coherence. For instance, the coherence for the pair (J May 20, July 28,) reaches similar decorrelation levels to the ones for the pair (May 20, Jun 10) despite the 1 month longer separation. However, the decorrelation in the (May 20, June 10) pair might include also residual volumetric effects as indicated by the higher κ Z . The largest mean coherence in forested areas has been found for the pair (June 20, July 28) about 0.45. Capon profiles have been generated using the coherences matrices after interp in multilook cells of 5 × 5 m (corresponding to 30 nominal independent looks). Fol the same procedure as in Sections 3 and 5, the structure indices are estimated. Figu d shows the number of peaks in the HV channel in every 5 × 1 m azimuth-height bin an azimuth transect for the four acquisition dates. The transect crosses mature (u azimuth distance of 3 km) and "Plenterwald" stands. On May 20, a higher concen of peaks appears at ground level heights, while vegetation peaks tend to be sparse tributed. It is expected that the two forest types are more distinct in terms of vertica ture rather than of horizontal structure. As spring progresses towards summer (fro 20 to July 28), the profile peaks concentrate more and more towards the canopy top cially in the mature stands. These stands become now more distinguishable also in of horizontal heterogeneity. This agrees with what observed and reported in [3 could be an indication for the seasonal redistribution of water in the tree tissues: w Capon profiles have been generated using the coherences matrices after interpolation in multilook cells of 5 × 5 m (corresponding to 30 nominal independent looks). Following the same procedure as in Sections 3 and 5, the structure indices are estimated. Figure 16a-d shows the number of peaks in the HV channel in every 5 × 1 m azimuth-height bin along an azimuth transect for the four acquisition dates. The transect crosses mature (until an azimuth distance of 3 km) and "Plenterwald" stands. On May 20, a higher concentration of peaks appears at ground level heights, while vegetation peaks tend to be sparsely distributed. It is expected that the two forest types are more distinct in terms of vertical structure rather than of horizontal structure. As spring progresses towards summer (from May 20 to July 28), the profile peaks concentrate more and more towards the canopy top, especially in the mature stands. These stands become now more distinguishable also in terms of horizontal heterogeneity. This agrees with what observed and reported in [37] and could be an indication for the seasonal redistribution of water in the tree tissues: while in spring the water is more present in the tree trunks, in summer transpiration phenomena concentrate water towards the canopy top. There is a stronger reflectivity change between May 20 and June 10 than between the June and July, that agrees with the temporal decorrelation maps of Figure 17. spring the water is more present in the tree trunks, in summer transpiration phenomena concentrate water towards the canopy top. There is a stronger reflectivity change between May 20 and June 10 than between the June and July, that agrees with the temporal decorrelation maps of Figure 17.  A repeat-pass TomoSAR data set is synthesized by combining after interpolation SLC's from different days with wavenumbers decreasing with time (see Table 1). The transect obtained is shown in Figure 15e. Supported also by the analysis in [26], the defocusing caused by the presence of temporal decorrelation over such a large time span degrades the estimated volume scattering components making a distinction between them difficult if not impossible. As a result, the structural characteristics of the two stands cannot be retrieved anymore. The bistatic implementation has been emulated by considering a set of coherences with the same decrease of wavenumber in time similar as in the repeat-pass case. Following the procedure detailed in [20,26,63,64], a coherence matrix can be built up with a coherence set with uniformly distributed wavenumbers following a Toeplitz form. The Capon estimator can be applied to the reconstructed coherence matrix, and the structure indices calculated. The corresponding transect is shown in Figure 16f indicating that complex coherences measured in a bistatic implementation can make L-band structure characterization robust to the underlying temporal reflectivity changes. The change of the vertical distribution of peaks along azimuth now reflects the structural difference between the two stands. A repeat-pass TomoSAR data set is synthesized by combining after interpolation SLC's from different days with wavenumbers decreasing with time (see Table 1). The transect obtained is shown in Figure 15e. Supported also by the analysis in [26], the defocusing caused by the presence of temporal decorrelation over such a large time span degrades the estimated volume scattering components making a distinction between them difficult if not impossible. As a result, the structural characteristics of the two stands cannot be retrieved anymore. The bistatic implementation has been emulated by considering a set of coherences with the same decrease of wavenumber in time similar as in the repeat-pass case. Following the procedure detailed in [20,26,63,64], a coherence matrix can be built up with a coherence set with uniformly distributed wavenumbers following a Toeplitz form. The Capon estimator can be applied to the reconstructed coherence matrix, and the structure indices calculated. The corresponding transect is shown in Figure 16f indicating that complex coherences measured in a bistatic implementation can make L-band structure characterization robust to the underlying temporal reflectivity changes. The change of the vertical distribution of peaks along azimuth now reflects the structural difference between the two stands.
The comparison between the three TomoSAR configurations (single-pass, repeatpass and bistatic) is extended to the whole site in Figures 17 and 18 for the horizontal and vertical structure indices. The data set of July 28 has been reported as a reference "singlepass" one with no reflectivity changes between acquisitions. The spatial trends of both and correlate with the growth stage patterns in Figure 19a [23,53]. To help the interpretation, in Figure 19b the mean values of and for each stage are plotted one as a function of the other for the four acquisition dates. Mature and growing stands are comparable in terms of , but the decreasing density of the growing stands is reflected in a higher . increases further in the young stands due to the lower density where increases as well. These three stages are then well distinguishable from the transition and "Plenterwald" stands thanks to the large difference in . For the two latter stages, remains on intermediate values-larger than in mature stands, but lower than young ones. However, they are structurally equivalent. These characteristics are unaltered across the acquisition dates, but the single mean values of and can present some variability with acquisition time along the seasonal change. Recalling what already commented The comparison between the three TomoSAR configurations (single-pass, repeat-pass and bistatic) is extended to the whole site in Figures 17 and 18 for the horizontal and vertical structure indices. The data set of July 28 has been reported as a reference "singlepass" one with no reflectivity changes between acquisitions. The spatial trends of both HS and VS correlate with the growth stage patterns in Figure 19a [23,53]. To help the interpretation, in Figure 19b the mean values of VS and HS for each stage are plotted one as a function of the other for the four acquisition dates. Mature and growing stands are comparable in terms of VS, but the decreasing density of the growing stands is reflected in a higher HS. HS increases further in the young stands due to the lower density where VS increases as well. These three stages are then well distinguishable from the transition and "Plenterwald" stands thanks to the large difference in VS. For the two latter stages, HS remains on intermediate values-larger than in mature stands, but lower than young ones. However, they are structurally equivalent. These characteristics are unaltered across the acquisition dates, but the single mean values of HS and VS can present some variability with acquisition time along the seasonal change. Recalling what already commented for the transects in Figure 18, the two furthest points along the VS axis on July 28 represented by the mature and the "Plenterwald" stands become closer on May 20 and June 10 than on July 28. However, on July 28 mature and growing stands they become the closest in terms of HS, while they are the most separated ones in terms of HS in the May 20 and in the June 20 acquisitions. The spatial structural gradients (Figures 17b and 18b) and their relationships across the different growth stages (Figure 19c) are lost in the repeat-pass case as the defocusing drives many profile peaks below the amplitude threshold making almost all stands to appear heterogeneous horizontally and homogeneous vertically. In contrast, the robustness of the bistatic implementation to the underlying L-band reflectivity changes allows to retrieve meaningful structural relationships (Figures 17c, 18c and 19c) very well within the variability of the four data sets representing the single-pass case.
in terms of , while they are the most separated ones in terms of in the May 20 and in the June 20 acquisitions. The spatial structural gradients (Figures 17b and 18b) and their relationships across the different growth stages (Figure 19c) are lost in the repeat-pass case as the defocusing drives many profile peaks below the amplitude threshold making almost all stands to appear heterogeneous horizontally and homogeneous vertically. In contrast, the robustness of the bistatic implementation to the underlying L-band reflectivity changes allows to retrieve meaningful structural relationships (Figures 17c, 18c and 19c) very well within the variability of the four data sets representing the single-pass case.  in terms of , while they are the most separated ones in terms of in the May 20 and in the June 20 acquisitions. The spatial structural gradients (Figures 17b and 18b) and their relationships across the different growth stages (Figure 19c) are lost in the repeat-pass case as the defocusing drives many profile peaks below the amplitude threshold making almost all stands to appear heterogeneous horizontally and homogeneous vertically. In contrast, the robustness of the bistatic implementation to the underlying L-band reflectivity changes allows to retrieve meaningful structural relationships (Figures 17c, 18c and 19c) very well within the variability of the four data sets representing the single-pass case.

Discussion and Conclusions
In this paper, the potential of tomographically reconstructed L-band reflectivity for the (qualitative and quantitative) characterization of 3D forest structure has been reviewed and discussed not only with respect to the information content of the reconstructed reflectivity, but also regarding the tomographic acquisition configuration and implementation. The analysis has been carried out by processing airborne L-band TomoSAR data acquired over the temperate forest of Traunstein (south of Germany). The reconstruction was performed by applying the conventional Capon spectral estimator to the different data sets. The experiments in this paper confirm that L-band is able to penetrate until the ground even through the dense and tall forest stands, while remains sensitive enough to the distribution of the smaller canopy elements that determine the 3D structure. The horizontal and vertical forest structure is described by a pair of complementary indices derived from the distribution of the peaks of the reconstructed reflectivity within a given structure window (on the order of a quarter of hectare). The ability to fully characterize both dimensions has been confirmed by comparing the obtained index values with the ones derived by lidar and is supported by other experiments reported in the literature where also the correspondence to equivalent structure indices derived from ground inventory measurements has been established [24]. L-band structure measurements by means of TomoSAR are of course not limited to temperate forests and have already been demonstrated in tropical as well as in boreal forest sites [22]. A high(er) vertical resolution becomes critical in order to identify both the weak(er) ground and those relevant, vegetation scattering contributions in the lowest stand compartments [14,22]. Together with a large(r) number of images, it also facilitates the reconstruction of more complex reflectivity profiles [22,66]. The effect of non-optimal TomoSAR configurations in terms of lower vertical resolution and higher level of sidelobes on the reconstructed reflectivity can be counteracted by employing superresolution estimators like, e.g., compressive sensing. The characterization of structure also benefits from high(er) range-azimuth resolutions. On the one hand, the vertical reflectivity reconstruction is more accurate due to the larger number of independent looks available for a certain multilook cell size. On the other hand, the calculation of structure indices is more robust as each structure window includes a larger number of (independent) reflectivity samples (e.g., profiles) within an area that is still meaningful for heterogeneous stands.
Within the RVoG hypothesis, the change of the polarimetric channel implies a change of the ground-to-volume ratio. The "visibility" of the polarized ground at L-band makes this change significant. For a fixed vertical point-spread function defined by the TomoSAR configuration, the ability to increase the ground-to-volume ratio by changing the polarization channel becomes relevant because it (1) increases the detectability of the ground peak in the TomoSAR profiles regardless of the contribution of close to ground vegetation components to reflectivity and (2) improves the performance of estimating its position by reducing the vegetation-induced bias. In Froschham, the ground-to-volume ratio increases by 5 dB in average from HV to HH. For an ideal regular track configuration with vertical resolution better than 10 m, this increase allows a complete compensation of the vegetation-induced ground height bias and of the reduction of the standard deviation by around 40% (to lower than 3 m). Polarimetric interferometric [48,59] optimization techniques can increase the ground-to-volume ratio by 3 dB in average. This is in many cases enough to ensure a robust and accurate ground detection. For example, the estimation error reduces from 4 m in HH to 2 m. Polarization diversity becomes even more critical for those TomoSAR configurations with a rather low vertical resolution. The experiments in Froschham showed that for a reduced-stack configuration allowing a vertical resolution of about 20 m, the vegetation bias is reduced by almost 50% when changing from HV to HH. The polarimetric maximization of the ground-to-volume ratio reduces the bias by additional 30%, while the (very) limited resolution prevents any further improvement. The standard deviation reaches 6 m in this case.
Still under the RVoG hypothesis, even for an infinite TomoSAR vertical resolution, the choice of the polarization channel would have no effect on the estimation of the horizontal and vertical structure indices as the estimated vegetation reflectivity components would not change. This is indeed confirmed at a large extent by simply comparing the experimental results obtained in HH and HV using the full-stack configuration in the Traunstein forest. The small residual difference is caused by the performance of the TomoSAR imaging in focusing weak(er) scatterers (here volume components weaker in HH than in HV), but also by a sub-optimum amplitude threshold used for the selection of the meaningful peaks. As a consequence, the detected "meaningful" peaks in HH tend to be concentrated closer to the ground than in HV, resulting into a slight (10%) bias in the horizontal structure and a slightly higher dispersion in the vertical structure. A further polarimetric interferometric optimization to minimize the ground scattering contribution does not lead to any significant improvement compared to HV, as it is reasonable to expect. Two factors affect the structure indices values at a larger extent. First, a reduction of the Rayleigh vertical resolution reduces the range of values of the structure indices and/or their distribution, especially in the vertical direction. Employing a spatially larger structure window may increase robustness, as long as it is still appropriate to represent structure variations. If this is the case, then the Capon (or in some cases) even the Fourier-based reconstructions might be used. However, if a smaller structure window is needed, better performing estimators (like, e.g., compressive sensing) need to be employed for reconstructing the reflectivity. Second, it has been seen that the L-band sensitivity to seasonal changes, as for example caused by the re-distribution of the water content within the tree tissues occurring within in a time span of 2 months between spring and summer may change the peak distribution and lead to (slightly) different structure index values. However, this change does not significantly impact the capability to discriminate among different forest stands. This robustness is definitely enhanced by using reflectivity peak positions rather than reflectivity values itself [24,37].
The robustness to the temporal decorrelation effects induced by possible changes of the reflectivity in time also drives the choice of the implementation of the TomoSAR acquisitions in space borne scenarios. The experimental results have shown that the long-term seasonal variations at L-band violate the stationarity assumption, and a simple decay with time may not be sufficient to describe them. In contrast to a conventional repeat-pass implementation, a bistatic-like TomoSAR implementation in which a temporal decorrelation-free complex coherence measurement is obtained at each revisit appears to be robust against the seasonal change of vertical reflectivity in a 2-month time span, and able to restore the capability to discriminate the different structure types. It is worth noting that in the case of space borne implementations the ability to perform all the required acquisitions in the minimum possible time becomes relevant for minimizing the effects of seasonal reflectivity changes. For instance, a revisit time in the order of 1 week or less could allow the estimation of structure parameters at L-band even in a repeat-pass mode [65,66].
It is anticipated that the experiments described in this paper trigger a more general characterization of the potential of L-band SAR tomography for forest structure mapping. The presented structure framework is just a first step towards the interpretation of TomoSAR measurements in terms of physical 3D structure. The development of such a framework is essential not only in light of a number of planned or proposed L-band missions aiming at forest monitoring (e.g., NASA's NISAR [67], ESA's ROSE-L [68] and DLR's Tandem-L [69]), but also to better shape acquisition requirements and finally initiate novel information products able even to accommodate synergies and complementarities with measurements at different SAR and lidar wavelengths.