A Novel Four-Stage Method for Vegetation Height Estimation with Repeat-Pass PolInSAR Data via Temporal Decorrelation Adaptive Estimation and Distance Transformation

Abstract: Vegetation height estimation plays a pivotal role in forest mapping, which significantly promotes the study of environment and climate. This paper develops a general forest structure model for vegetation height estimation using polarimetric interferometric synthetic aperture radar (PolInSAR) data. In simple terms, the temporal decorrelation factor of the random volume over ground model with volumetric temporal decorrelation (RVoG-vtd) is first modeled by random motions of forest scatterers to solve the problem of ambiguity. Then, a novel four-stage algorithm is proposed to improve the accuracy of forest height estimation. In particular, to compensate for the temporal decorrelation mainly caused by changes between multiple observations, a temporal decorrelation adaptive estimation procedure based on the Expectation-Maximization (EM) algorithm is added to the method. In addition, to extract the features of amplitude and phase more effectively, the proposed method converts the Euclidean distance to a generalized distance for the first time. The algorithms are assessed on the repeat-pass PolInSAR data of Gabon Lope Park acquired in the AfriSAR campaign of the German Aerospace Center (DLR). The experimental results show that the proposed method significantly improves vegetation height estimation accuracy, with a root mean square error (RMSE) of 6.23 m and a bias of 1.28 m against LiDAR heights, compared to the three-stage method (RMSE: 8.69 m, bias: 4.81 m) and the previous four-stage method (RMSE: 7.72 m, bias: −2.87 m). Inversion experiments were conducted using a PolInSAR image of Lope National Park, where sparse savannas and dense forests are the dominant vegetation. The inversion results were illustrated by qualitative analysis and quantitative comparison. Through a series of experiments, this paper demonstrated the rationality of the models and the feasibility of the inversion process. The comparison between the new model and the traditional models also showed the superiority of the GRVoG-vtd model.


Introduction
As the importance of forests in the earth system attracts increasing attention, the inversion of forest parameters is becoming a research hotspot. In particular, polarimetric interferometric synthetic aperture radar (PolInSAR) images are widely used in forest height inversion because of their global coverage and low weather sensitivity, as well as their ability to extract scattering information and height information simultaneously [1,2]. In the ideal case, forest height is reflected by the phase difference between the volume-only and ground-dominant polarization states [3]. To extract the phase centers, the optimal polarimetric coherence algorithm [3], the estimating signal parameter via rotational invariance techniques (ESPRIT) algorithm [4], and model-based target scattering decomposition [5,6] have been developed. However, to some extent, the forest vertical structure and wave extinction can affect the estimation of phase centers. Based on the homogeneous hypothesis, Treuhaft et al. proposed a two-layer physical model named the random volume over ground (RVoG) model [7,8]. The model contained [...] improve accuracy and solve the problem of ambiguity. Section 2 introduces the new model, in which the scatterer distribution and the scatterer motion are modeled separately. The novel four-stage algorithm, with the additional temporal decorrelation estimation stage and the modified height estimation stage, is also presented. Section 3 assesses the models and algorithms by estimating forest height from real single-baseline repeat-pass PolInSAR data of Lope National Forest Park in Gabon, Africa, and comparing the results with the corresponding LiDAR data. The experimental results indicate that the new method has great potential to compensate for temporal decorrelation and obtain more accurate inversion results. The discussion and conclusion are given in Sections 4 and 5, respectively.
With the following three hypotheses: the hypothesis of random volume scattering, the hypothesis of numerous scatterers and the hypothesis of homogeneous distribution of scatterers, Treuhaft et al. proposed the random volume (RV) model defined as follows [7]:

Model-Based Inversion
where $k_z$ is the vertical wavenumber, which depends on the wavelength $\lambda$, the incidence angle $\theta$, and the incidence angle difference $\Delta\theta$. $\tau_c$ is the range-facing terrain slope angle, which is useful in complex terrain. $h_v$ and $\kappa_e$ are the unknown parameters, denoting vegetation height and wave extinction, respectively. The vertical wavenumber $k_z$ can be calculated as follows in the bistatic repeat-pass case [27]:

Based on the RV model, Treuhaft et al. modeled the ground scattering and the interaction between ground and trunk in the forest area, and obtained the RVoG model, which is widely used in forest height inversion. Its relationship with the RV model can be expressed as [8]:

$$\gamma(\omega) = e^{j\phi_0}\,\frac{\gamma_v + \mu(\omega)}{1 + \mu(\omega)} \quad (5)$$

where $\phi_0$ denotes the interferometric ground phase, $\gamma_v$ is the volume coherence of the RV model, and $\omega$ denotes the polarization mode. $\mu(\omega)$ is the ground-to-volume backscattering ratio, which lies in the range $0 \le \mu(\omega) \le \infty$ in the ideal case, with the limits representing pure surface scattering ($\mu(\omega) = \infty$) and pure volume scattering ($\mu(\omega) = 0$). The additional unknown parameters in the RVoG model make the nonlinear inversion process more complicated.
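The RV and RVoG relations described above can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: it assumes flat terrain (the slope angle is ignored), a uniform volume, extinction expressed in Np/m, and the commonly used closed form of the exponential-profile integral. The function names and the sample values (35° incidence, 20 m height) are hypothetical.

```python
import numpy as np

def rv_coherence(h_v, kappa_e, k_z, theta):
    """Volume-only coherence of a uniform random volume (flat terrain assumed).

    gamma_v = int_0^h exp(p z) exp(j k_z z) dz / int_0^h exp(p z) dz,
    with p = 2 * kappa_e / cos(theta) and kappa_e in Np/m.
    """
    p = 2.0 * kappa_e / np.cos(theta)
    p1 = p + 1j * k_z
    num = (np.exp(p1 * h_v) - 1.0) / p1
    # The denominator tends to h_v when there is no extinction.
    den = (np.exp(p * h_v) - 1.0) / p if p > 0 else h_v
    return num / den

def rvog_coherence(gamma_v, mu, phi0):
    """RVoG coherence for a ground-to-volume ratio mu and ground phase phi0."""
    return np.exp(1j * phi0) * (gamma_v + mu) / (1.0 + mu)

theta = np.deg2rad(35.0)                      # assumed incidence angle
gamma_v = rv_coherence(20.0, 0.05, 0.12, theta)
for mu in (0.0, 0.5, 5.0):
    # As mu grows, the coherence moves along a line toward exp(j * phi0).
    print(mu, rvog_coherence(gamma_v, mu, 0.3))
```

With mu = 0 the result is the pure volume coherence rotated by the ground phase, and for very large mu it approaches the unit-circle point exp(j·phi0), which matches the two limits named in the text.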

Three-Stage Inversion Process
The three-stage inversion method proposed by Cloude et al. [14] is based on the RVoG model and greatly reduces the complexity of the inversion procedure. Therefore, it is commonly used and performs well in many cases. It exploits the property that the coherences under different polarization states are distributed along a straight line in the complex unit circle (CUC) to obtain the ground phase $\phi_0$ and the volume coherence $\gamma_v$. The three-stage inversion process is conducted as follows:
• Least squares line fit. Since Equation (5) indicates that the coherence values in different polarization states lie along a straight line in the CUC, the first stage finds the best-fit line of the interferometric coherence values in different polarization modes, such as HH, VV, HH-VV, HH+VV, and HV.
• Ground phase removal. In the second stage, the ground phase must be determined and removed from the coherence. The phases of the two intersection points of the straight line and the CUC are the candidates for the ground phase. Generally, the coherence values in different polarization states are ordered along the best-fit line as shown in Figure 1, which provides a criterion for identifying the real ground phase.
• Height and extinction estimation. In the last stage, a pre-calculated look-up table (LUT) of volume-only coherence is employed to estimate the vegetation height and the mean extinction. The parameters are determined by minimizing the distance between the calculated volume coherences and the observed volume coherence.
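The first two stages can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure of [14]: the line is fitted by an orthogonal (total least squares) fit via SVD, and the ground candidate is chosen with a simple ordering rule that assumes one channel is volume-dominated (for example HV) and another is ground-dominated. All function names are hypothetical and the input coherences are synthetic.

```python
import numpy as np

def fit_coherence_line(gammas):
    """Orthogonal line fit to complex coherences; returns a point and a unit direction."""
    z = np.asarray(gammas, dtype=complex)
    p0 = z.mean()
    xy = np.column_stack([(z - p0).real, (z - p0).imag])
    _, _, vt = np.linalg.svd(xy, full_matrices=False)
    return p0, vt[0, 0] + 1j * vt[0, 1]

def unit_circle_intersections(p0, d):
    """Solve |p0 + t d| = 1 for t (d has unit modulus)."""
    b = (p0 * np.conj(d)).real
    c = abs(p0) ** 2 - 1.0
    root = np.sqrt(b * b - c)          # real whenever |p0| < 1
    return p0 + (-b - root) * d, p0 + (-b + root) * d

def pick_ground(cands, g_vol, g_gnd):
    """Assumed ordering rule: the ground point lies nearer the ground-dominated channel."""
    scores = [abs(g_vol - q) - abs(g_gnd - q) for q in cands]
    return cands[int(np.argmax(scores))]

# Synthetic RVoG coherences with ground phase 0.3 rad and several mu values.
phi0 = 0.3
gamma_v = 0.5 * np.exp(1.0j)
mus = np.array([0.0, 0.2, 0.5, 1.0, 3.0])
gammas = np.exp(1j * phi0) * (gamma_v + mus) / (1.0 + mus)

p0, d = fit_coherence_line(gammas)
cands = unit_circle_intersections(p0, d)
ground = pick_ground(cands, gammas[0], gammas[-1])
print(np.angle(ground))                # recovered ground phase
```

On noise-free synthetic data the recovered phase equals the simulated ground phase. With real data, the choice between the two candidates is the fragile step, which is why the text relies on the expected ordering of polarization channels.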

RVoG-vtd Model
Temporal decorrelation is a major source of decorrelation, especially in the repeat-pass case. When the RVoG model is applied to repeat-pass data, dealing with temporal decorrelation becomes an essential step. The RVoG-vtd model compensates for the effect of temporal decorrelation by adding a real-valued factor. The model then becomes:

$$\gamma(\omega) = e^{j\phi_0}\,\frac{\alpha_{vt}\,\gamma_v + \mu(\omega)}{1 + \mu(\omega)} \quad (6)$$

where $\alpha_{vt}$ is the temporal decorrelation factor applied to the volume coherence. The rest of the RVoG model equations are unaffected.

Four-Stage Inversion Algorithm
Because of the additional parameter, the solution is now ambiguous for single-baseline data. In contrast to methods such as supervised training and solving the global nonlinear least squares problem mentioned above, a four-stage inversion algorithm is adopted to resolve the ambiguity while keeping the computational complexity low. The volume scattering phase center moves toward the top of the canopy as the mean extinction increases, which indicates that the relative position of the observed volumetric coherence on the coherence line can be used to limit the range of the mean extinction coefficient [15]. In this framework, an index was suggested to describe the relative location of the observed volume coherence, $\gamma_{HV}$, on the coherence line as [15]:

$$\mathrm{D.I.} = \frac{\mathrm{A.L.}}{\mathrm{V.L.}} \quad (7)$$

where D.I. is the distance ratio index of the ambiguous line length (A.L.) and the visible line length (V.L.), as shown in Figure 2. Expecting the mean extinction value and the D.I. value to be inversely related, Managhebi et al. defined the mean extinction coefficient as the following linear function [28]:

$$\kappa_e = a \cdot \mathrm{D.I.} + b \quad (8)$$

where $\kappa_e$ is the mean extinction coefficient, D.I. is the distance ratio index, and $a$ and $b$ are the model parameters computed by the least squares method using a real L-band PolInSAR data pair. To summarize, the four-stage inversion process consists of the following four steps:
• Least squares line fit.
• Ground phase removal.
• Extinction estimation.
• Volume height and temporal decorrelation estimation.
The first two steps are the same as in the three-stage inversion process. In the third step, D.I. and the mean extinction coefficient are calculated using Equations (7) and (8). Then the coherence locus for the fixed mean extinction is determined, and the volume height and the real temporal decorrelation multiplying factor are estimated from the intersection point between the volume coherence locus and $\phi = \phi_v$.

GRVoG-vtd Model
Since more parameters make it hard to achieve a unique and precise solution, a direct idea is to set the wave extinction to a reasonable value and thereby reduce the number of unknowns. However, the RVoG model assumes that the scatterer density of the forest is constant from the surface to the canopy, which simplifies the derivation and the model function. Therefore, the wave extinction values in the model are not necessarily indicative of the actual extinction. In practice, the extinction parameter in the RVoG model is sensitive to changes in the vertical structure [27], which means that the forest structure should be modeled carefully before the wave extinction is fixed. LiDAR data can now provide a profile of the vertical scatterer distribution of a forest [18], which varies from species to species. Therefore, in this research, the generalized RVoG (GRVoG) model with a general vertical scatterer distribution is first introduced.
Starting from the distribution of volume scatterer density $\rho(z)$, the volume coherence in the GRVoG model can be derived, as in Appendix A, as [7]:

$$\gamma_v^G = \frac{\int_0^{h_v} \rho(z)\exp\left(\frac{2\kappa_e z}{\cos(\theta-\tau_c)}\right)\exp(j k_z z)\,dz}{\int_0^{h_v} \rho(z)\exp\left(\frac{2\kappa_e z}{\cos(\theta-\tau_c)}\right)dz} \quad (9)$$

$\rho(z)$ reflects the vertical structure of the vegetation. The model degenerates to RVoG when $\rho(z)$ is set to a constant value. In practice, $\rho(z)$ can vary with the tree species, crown depth, and other physical conditions. The GRVoG-vtd model compensates for temporal decorrelation with a real-valued factor, similarly to the RVoG-vtd model, as follows:

To resolve the ambiguity caused by adding the parameter $\alpha^G_{vt}$, this paper also explores the relationship between the temporal decorrelation and the forest height. Temporal decorrelation follows from physical changes and, in general, is closely related to the motion of scatterers. Assuming Gaussian-statistic random motion, as in [25], the temporal decorrelation can be written in the following form:

$$\gamma_t(z) = \exp\left[-\frac{1}{2}\left(\frac{4\pi}{\lambda}\right)^2 \sigma_r^2(z)\right]$$

where $\sigma_r(z)$ is the standard deviation of the Gaussian distribution in the layer at height $z$.
To derive an explicit expression for $\alpha^G_{vt}(h_v)$, $\rho(z)$ in this model is set to a constant value $\rho_0$, as in the RVoG model, and a first-order approximation of the motion variance is considered [26]:

where $\sigma_g^2$ represents the motion variance of scatterers on the ground. Since $\sigma_g^2$ is independent of height, it can be moved out of the integration and estimated as a constant $\alpha_g$. The GRVoG-vtd model can be derived as:

where $\beta = -\frac{1}{2}\left(\frac{4\pi}{\lambda}\right)^2 \beta_0$ is a constant corresponding to the assumed form of $\sigma_r^2(z)$, which can be determined empirically. It can be seen that GRVoG-vtd degenerates to RVoG-vtd when $\beta_0 = 0$ and $\rho(z)$ is a constant function.
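The effect of a height-dependent motion variance can be explored numerically. The sketch below is an assumption-laden illustration, not the authors' derived closed form: it assumes the first-order variance takes the form sigma_r^2(z) = sigma_g^2 + beta_0 z, so that the ground term factors out as alpha_g and the remaining term contributes exp(beta z) inside the numerator integral. Flat terrain is assumed, and the function names, grid size, and sample values are hypothetical.

```python
import numpy as np

def _trapezoid(f, z):
    """Plain trapezoidal rule, written out to avoid NumPy version differences."""
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z))

def beta_from_motion(beta0, wavelength):
    """beta = -1/2 (4 pi / lambda)^2 beta0, as defined in the text."""
    return -0.5 * (4.0 * np.pi / wavelength) ** 2 * beta0

def grvog_vtd_volume_coherence(h_v, kappa_e, k_z, theta, alpha_g, beta,
                               rho=None, n=2001):
    """Volume coherence with a general density rho(z) and temporal term exp(beta z)."""
    z = np.linspace(0.0, h_v, n)
    dens = np.ones_like(z) if rho is None else rho(z)
    w = dens * np.exp(2.0 * kappa_e * z / np.cos(theta))
    num = _trapezoid(w * np.exp(beta * z) * np.exp(1j * k_z * z), z)
    den = _trapezoid(w, z)
    return alpha_g * num / den

theta = np.deg2rad(35.0)                      # assumed incidence angle
g_static = grvog_vtd_volume_coherence(20.0, 0.0, 0.12, theta, 0.8, 0.0)
g_moving = grvog_vtd_volume_coherence(20.0, 0.0, 0.12, theta, 0.8, -0.05)
print(abs(g_static), abs(g_moving))
```

Setting beta to zero with a constant density reproduces a uniformly scaled RVoG volume coherence, which is the degenerate case the text describes. A negative beta lowers the coherence amplitude further as the height grows.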

A Novel Four-Stage Inversion Algorithm
Based on the GRVoG-vtd model, this paper proposes a novel four-stage inversion algorithm. The unknown parameters in GRVoG-vtd are the vegetation height $h_v$, the mean extinction coefficient $\kappa_e$, and the constant real-valued decorrelation factor $\alpha_g$. From the form of the GRVoG-vtd model, one can see that the coherences in different polarization states are still distributed along a straight line in the CUC, with an intersection point $e^{j\phi_0}$. Therefore, the first two stages of the method are still the least squares line fit and the removal of the ground phase, as in the three-stage inversion process. The volume coherence obtained after the first two steps includes the temporal decorrelation factor and can be expressed as

Since the sparse savannas and the dense forests in the vegetation area show a clear difference in volume coherence, the third stage uses the EM algorithm to classify these vegetation types and estimate the constant factor $\alpha_g$ simultaneously.
Assuming that the distribution of the volume coherence amplitude ($\rho$) is a Gaussian mixture model (GMM) with $K$ components, the probability of the coherence amplitude is:

$$p(\rho) = \sum_{k=1}^{K} \pi_k\,\mathcal{N}(\rho \mid \mu_k, \sigma_k^2)$$

where $\mu_k$ and $\sigma_k^2$ represent the mean value and the variance of the $k$-th Gaussian distribution, respectively, and $\pi_k$ is the proportion of the $k$-th component. Let $l_{nk}$ denote the probability that the $n$-th sample $\rho_n$ belongs to the $k$-th component. The EM algorithm alternates the Expectation step and the Maximization step until convergence to assign the samples to the corresponding components without any prior information.

E-step:

$$l_{nk} = \frac{\pi_k\,\mathcal{N}(\rho_n \mid \mu_k, \sigma_k^2)}{\sum_{i=1}^{K} \pi_i\,\mathcal{N}(\rho_n \mid \mu_i, \sigma_i^2)}$$

M-step:

$$\mu_k = \frac{1}{N_k}\sum_{n=1}^{N} l_{nk}\,\rho_n, \qquad \sigma_k^2 = \frac{1}{N_k}\sum_{n=1}^{N} l_{nk}\,(\rho_n - \mu_k)^2, \qquad \pi_k = \frac{N_k}{N}$$

where $N_k = \sum_{n=1}^{N} l_{nk}$ and $N$ is the number of samples.

Since the height of sparse savannas is mainly below 3 m, the volume decorrelation is inconspicuous compared to the temporal decorrelation. Accordingly, the amplitude of coherence in the savanna region reflects the real-valued factor $\alpha_g$. Let the $k$-th component represent sparse savannas; then $\hat{\alpha}_g = \mu_k$.

The fourth stage estimates the vegetation height and the extinction coefficient. The LUT of vegetation height, extinction coefficient, and volume coherence can be pre-calculated as in the three-stage inversion algorithm. The main difference is that the generalized distance replaces the Euclidean distance of the original algorithm. The shortest Euclidean distance criterion in the original algorithm does not consider the importance and reliability of the coherence amplitude and phase, which significantly reduces the precision of the inversion because the distribution of volume coherence is inhomogeneous in practice. This paper proposes a generalized distance that measures the similarity of amplitude and phase separately and yields a more reasonable inversion result.
where $\rho_i$ and $\phi_i$ are the amplitude and the phase of the $i$-th coherence. For the sparse savanna region obtained in the third stage, the coherence amplitude is large and its distribution is concentrated, which indicates that the height inversion result depends mainly on the phase of the volume coherence. Therefore, the parameter $\lambda$ in the generalized distance is relatively small for this type. In contrast, the coherence amplitude is small for dense forest, and the uncertainty of the phase can cause more error. In this case, the coherence amplitude should carry a larger weight in the generalized distance. In summary, the following step-by-step outline describes the whole process.
1. Generate the coherences in different polarization states and fit the least squares line in the CUC.
2. Choose the underlying ground phase from the two intersection points between the best-fit line and the CUC. Calculate the volume coherence by removing the ground phase and projecting the coherence farthest from the ground coherence point onto the fitted line.
3. Classify sparse savannas and dense forest by the amplitude of the volume coherence using the EM algorithm. Determine the constant parameter $\alpha_g$ by calculating the mean value of the amplitude in the sparse savanna region.
4. Estimate the vegetation height and the mean extinction from the pre-calculated LUT of the GRVoG-vtd model by minimizing the generalized distance between the calculated volume coherences and the observed volume coherence.
Figure 3 shows the flowchart of the proposed four-stage algorithm. The pre-processing and the first two stages are carried out as in the three-stage algorithm. The main differences are the added third stage, which includes vegetation classification and constant factor estimation, and the change of the distance from the Euclidean distance to the generalized distance. In the previous four-stage algorithm for the RVoG-vtd model, the extinction coefficient is determined by the length of the line segment. The coherence curve is then selected by the fixed extinction, and the vegetation height is estimated from the phase of the coherence. In the novel four-stage algorithm for GRVoG, the order in which the parameters are estimated has changed. The temporal decorrelation factor is estimated first, and the coherence amplitude and phase are both used to obtain a more precise vegetation height. In the generalized distance, the ratio $\lambda$ assigns weights to amplitude and phase to reflect the reliability and importance of the two factors, so that the method can adapt to different situations.
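Stage 4 of the outline above can be sketched as a LUT search. The weighted-sum distance used here is an assumed form chosen only to match the description in the text (a small lambda makes the phase term dominate); it is not taken from the paper's equation. The LUT is built from a uniform-volume closed form scaled by alpha_g, with the height-dependent temporal term omitted for brevity. All names, grids, and sample values are hypothetical.

```python
import numpy as np

def rv_coherence(h_v, kappa_e, k_z, theta):
    """Uniform-volume coherence (closed form); requires kappa_e > 0 and h_v > 0."""
    p = 2.0 * kappa_e / np.cos(theta)
    p1 = p + 1j * k_z
    return ((np.exp(p1 * h_v) - 1.0) / p1) / ((np.exp(p * h_v) - 1.0) / p)

def generalized_distance(g_obs, g_lut, lam):
    """Assumed form: lam weights amplitude, (1 - lam) weights wrapped phase."""
    d_amp = np.abs(np.abs(g_obs) - np.abs(g_lut))
    d_phi = np.abs(np.angle(g_obs * np.conj(g_lut))) / np.pi   # scaled to [0, 1]
    return lam * d_amp + (1.0 - lam) * d_phi

def invert_height(g_obs, heights, kappas, k_z, theta, alpha_g, lam):
    """Return the LUT entry (height, extinction) that minimizes the distance."""
    H, K = np.meshgrid(heights, kappas, indexing="ij")
    lut = alpha_g * rv_coherence(H, K, k_z, theta)
    d = generalized_distance(g_obs, lut, lam)
    i, j = np.unravel_index(np.argmin(d), d.shape)
    return heights[i], kappas[j]

theta = np.deg2rad(35.0)                      # assumed incidence angle
heights = np.arange(1.0, 41.0)                # candidate heights in metres
kappas = np.array([0.02, 0.05, 0.10])         # candidate extinctions in Np/m
g_obs = 0.8 * rv_coherence(20.0, 0.05, 0.12, theta)   # synthetic observation
h_hat, k_hat = invert_height(g_obs, heights, kappas, 0.12, theta,
                             alpha_g=0.8, lam=0.3)
print(h_hat, k_hat)
```

Separating the amplitude and phase terms lets the weight differ by vegetation class, for example a smaller lambda for the savanna class and a larger one for dense forest, as the text suggests.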

Analysis of Models and Corresponding Algorithms
The differences among the models, i.e., RVoG, RVoG-vtd, and GRVoG-vtd, are illustrated in Figure 4. For the mean extinction coefficient $\kappa_e$ varying from 0 to 0.9, the LUTs of RVoG, RVoG-vtd with a fixed $\alpha_{vt}$, and GRVoG-vtd with a fixed $\alpha_g$ are plotted in the CUC. Different colors represent different mean extinction values, and the curves become more concentrated as the mean extinction decreases. In the ideal case, the coherence is located at the point (1, 0) when the vegetation height is 0. Consequently, the RVoG curve begins at the point (1, 0) and gradually shrinks toward the center of the CUC, because the influence of temporal decorrelation is ignored. Correspondingly, the curves of RVoG-vtd and GRVoG-vtd start from the points ($\alpha_{vt}$, 0) and ($\alpha_g$, 0), respectively. The RVoG-vtd model applies a scale factor to the whole table, which does not change the shape of the curve. In contrast, the temporal decorrelation factor in the GRVoG-vtd model makes the curve converge rapidly and significantly changes the correspondence between coherence and vegetation height.
As Figure 4 demonstrates, the LUTs of the three models appear as high-curvature curves in the CUC, which causes great difficulty in the subsequent estimation. However, the curves can potentially be embedded as a manifold. Taking the LUT of GRVoG-vtd as an example, Figure 5 explores whether the amplitude-phase plane represents the LUT better. Compared to the CUC plane, the curve in the amplitude-phase plane is more linear and easier to distinguish, which suggests that the distance in the amplitude-phase plane is a more efficient measure of similarity for estimating vegetation heights. To further explain this statement and find a more reasonable criterion, several basic approaches to estimating vegetation height are analyzed in Figures 6 and 7. Figure 6 shows the relationship between the regions in the CUC plane and the vegetation heights based on the Euclidean distance, the amplitude distance, and the phase distance, respectively defined as follows:

In practice, these three distances are not enough to truly reflect the similarity between the observed coherence and the model-based coherence for effective estimation of vegetation height. Since the height discrimination is clearer in the amplitude-phase plane, as discussed above, we consider solving the estimation problem in this new plane. The corresponding regions in the amplitude-phase plane for these three distances are given in Figure 7a-c, respectively. It can be seen that the least distance criterion in the CUC leads to an unexpected classification region, which has poor interpretability with respect to the ideal curve, and that the amplitude of coherence plays an insignificant role under most conditions. Therefore, the generalized distance is proposed to measure the similarity more equitably for the amplitude and the phase. The typical corresponding regions are shown in Figure 7d, which is more consistent with the calculated curve. Nevertheless, the amplitude and phase are affected by varying degrees of noise, which should be considered carefully to reach higher accuracy. Consequently, the parameter $\lambda$ is set according to the situation.
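A small numeric example illustrates why the Euclidean distance alone can be misleading at low coherence amplitude. The three definitions below are the conventional ones (modulus of the complex difference, difference of moduli, wrapped phase difference) and are assumptions; they are not copied from the paper's equations. The two sample coherences are synthetic.

```python
import numpy as np

def euclidean_distance(g1, g2):
    """Distance between two coherences in the complex plane."""
    return np.abs(g1 - g2)

def amplitude_distance(g1, g2):
    """Difference of the coherence amplitudes."""
    return np.abs(np.abs(g1) - np.abs(g2))

def phase_distance(g1, g2):
    """Absolute phase difference, wrapped to [0, pi]."""
    return np.abs(np.angle(g1 * np.conj(g2)))

# Two low-amplitude coherences whose phases differ by 2 rad.
a = 0.05 * np.exp(1j * 0.2)
b = 0.05 * np.exp(1j * 2.2)
print(euclidean_distance(a, b), amplitude_distance(a, b), phase_distance(a, b))
```

The two points are less than 0.1 apart in the CUC, yet their phases differ by 2 rad, which would correspond to very different phase-center heights. This is the situation in which a measure that treats amplitude and phase separately is more informative.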

Study Area
The study area is the Gabon Lope Park region, located on the west coast of Africa at 0°30′00″ S, 11°30′00″ E, with an area of 4910 km², as Figure 8 shows. It covers diverse habitats owing to its history of over 70 years as a wildlife reserve, and is therefore an ideal test site for estimating forest biomass and height. Here, the focus is on a small area of interest in northeastern Lope, centered at longitude 11°34′00″ E and latitude 0°13′30″ S.

Data Set
The experimental data consists of the airborne single-baseline repeat-pass PolInSAR data and corresponding LiDAR data in Lope National Park region.
The PolInSAR data were obtained from the second and fourth passes of the 11th flight in February 2016 during the AfriSAR campaign by the FSAR system of the German Aerospace Center (DLR). The frequency band is L-band, with a central frequency of 1.3 GHz. The range and azimuth resolutions of the SAR data are 1.92 m and 0.65 m, respectively. The campaign introduction and data can be found on the European Space Agency (ESA) website (https://earth.esa.int/eogateway/campaigns/afrisar-2016).
The LiDAR data used in this study was collected by Land Vegetation and Ice Sensor (LVIS). As in [10,18,22], the LVIS RH100 metrics were chosen to validate the PolInSAR forest height estimations. The LiDAR-based vegetation height has been compared to TanDEM-X DEM in AfriSAR Final Report [32] which found that the top of the canopy computed from the LiDAR data is extremely close to the TanDEM-X data as expected. The LiDAR-based height also has a strong consistency with the ground data in different plots which proved its strong reliability in the previous work [22]. Therefore, in this work, we decided to compare height estimations of PolInSAR via different algorithms with LVIS RH100 metrics to verify their performance.
Although LiDAR measurements have high height precision, the horizontal resolution of LVIS is much lower, about 25 m, so the LVIS RH100 metrics need to be interpolated to fit the PolInSAR geometry. Compared with the interpolated LiDAR heights, the PolInSAR image shows more speckle, and a height estimate from a single pixel can deviate strongly. Therefore, this paper uses blocks of 50 × 50 pixels in the subsequent quantitative analysis, which is more reasonable considering the LiDAR resolution.
In this data set, the overlapping region of PolInSAR and LiDAR covers about 7000 × 1000 pixels in the PolInSAR image. However, some small patches have missing values. We extract an area of interest of 3000 × 1000 pixels from the whole image, which contains few missing values and large height fluctuations. The Pauli image and the LVIS RH100 height image of the area of interest are shown in Figure 9. The area consists of sparse savannas and dense forests of varying heights up to 60 m.
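The block-wise aggregation can be sketched as below. This is a minimal illustration for a real-valued height map, assuming missing values are stored as NaN; the function name is hypothetical and the input is a synthetic array with the stated 3000 × 1000 size.

```python
import numpy as np

def block_mean(img, block=50):
    """Average a 2-D map over non-overlapping block x block tiles, ignoring NaNs."""
    rows, cols = img.shape
    r, c = rows // block, cols // block
    trimmed = img[: r * block, : c * block]      # drop incomplete edge tiles
    tiles = trimmed.reshape(r, block, c, block)
    return np.nanmean(tiles, axis=(1, 3))

# Synthetic stand-in for a 3000 x 1000 pixel height map.
img = np.arange(3000 * 1000, dtype=float).reshape(3000, 1000)
blocks = block_mean(img)
print(blocks.shape, blocks.size)
```

For a 3000 × 1000 pixel area and 50 × 50 pixel blocks, this gives a 60 × 20 grid, i.e., 1200 blocks, which agrees with the number of pieces used in the quantitative analysis.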

Experimental Results
The observed coherences obtained in the first two stages can be clearly divided into two vegetation types, because the coherence amplitude shows a distinct bimodal distribution, as demonstrated in Figure 10. A two-component GMM is therefore a reasonable assumption, and the EM algorithm is adopted to distinguish these categories adaptively without any prior information.
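A two-component fit of this kind can be sketched with a plain NumPy EM loop. This is an illustrative sketch on synthetic amplitudes, not the authors' code: the quantile-based initialization, the stopping tolerance, and the rule that the higher-mean component is the savanna class (whose mean serves as the estimate of the constant factor) are assumptions consistent with the description in the text.

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=200, tol=1e-8):
    """EM for a 1-D Gaussian mixture; returns weights, means, variances, responsibilities."""
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)    # spread initial means
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M-step: update weights, means, and variances.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-12
        pi = nk / x.size
        ll = np.log(total).sum()                     # log-likelihood before the update
        if abs(ll - prev) < tol:
            break
        prev = ll
    return pi, mu, var, resp

# Synthetic bimodal amplitudes: a high-coherence and a low-coherence population.
rng = np.random.default_rng(0)
amp = np.concatenate([rng.normal(0.8, 0.03, 2000), rng.normal(0.4, 0.08, 3000)])
pi, mu, var, resp = em_gmm_1d(amp)
savanna = int(np.argmax(mu))         # assumed: savanna is the high-coherence class
alpha_g_hat = mu[savanna]
labels = resp.argmax(axis=1)         # per-sample class assignment
print(alpha_g_hat, pi[savanna])
```

The responsibilities give both the class map and, through the mean of the savanna component, the constant decorrelation factor, so classification and factor estimation come from the same fit.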
The calculated coherences represented by the curve in Figure 5 illustrate that the LUT in the amplitude-phase plane is closer to straight lines. To verify this statement, the observed coherences with varying heights, sampled at a rate of 0.02%, are plotted in the CUC plane and the amplitude-phase plane, as Figure 11 shows. As expected, the distribution of coherence in the amplitude-phase plane is clearly more linear, which greatly simplifies the subsequent height discrimination. However, the coherence is also strongly affected by noise, which indicates that the block of 50 × 50 pixels is necessary.
The height estimation results of the three models are shown in Figure 12. The inversion results show the characteristics of each algorithm at a macroscopic level. The RVoG model does not compensate for temporal decorrelation, so the calculated coherence amplitude is high. The observed coherence amplitude, however, is low because of temporal decorrelation, which makes the algorithm overestimate the height in most areas. In the RVoG-vtd model, once the extinction coefficient has been estimated, the height depends only on the phase. The lack of coherence amplitude information decreases the accuracy in areas with high vegetation. The GRVoG-vtd model further models the temporal decorrelation and distinguishes the vegetation types. The temporal decorrelation factor is estimated adaptively from regions with low volume decorrelation, and the coherence amplitude and phase are used together. Therefore, the estimated height is more consistent with the corresponding LVIS RH100 height. Still, some areas have large estimation errors, because it is difficult to estimate vegetation with a wide range of heights accurately using only single-baseline data. The accuracy of all three models depends on $k_z$ to some extent. In areas with large $k_z$, the maximum inversion height is limited by the wrapped phase and cannot accurately reflect the tree height. In terms of detail, the interpolated LiDAR image is relatively smooth and uniform, whereas the PolInSAR-based results show more speckle because of their susceptibility to noise and the random fluctuations of the forest. To obtain intuitive results, we selected three vertical lines in the left, middle, and right parts of the image, as Figure 13 shows. Each line contains 60 consecutive blocks with significant fluctuations. It can be seen that RVoG overestimates tree height in most areas, especially in areas with low vegetation. In contrast, the RVoG-vtd model generally underestimates vegetation height because of the lack of amplitude information. The GRVoG-vtd model remains consistent with the LiDAR-based height over a wide range of variation and is more adaptable.
A more detailed quantitative analysis is based on dividing the whole map into small pieces. The whole image is divided into 1200 pieces of 50 × 50 pixels each. The height estimation results of RVoG, RVoG-vtd, and GRVoG-vtd are each assessed against the LiDAR height. As shown in Figure 14, each point represents one piece of the image. The scatter plot also illustrates that the RVoG model overestimates most of the blocks. Both the GRVoG-vtd model and the RVoG-vtd model can effectively compensate for the effect of temporal decorrelation and produce more reasonable outputs. Compared to RVoG-vtd, GRVoG-vtd is more elaborate and takes full advantage of the amplitude and phase of the volume coherence, hence the estimated heights are more reliable. Since $k_z$ has a significant influence on the output for single-baseline data [33], the phase is wrapped when the trees are too tall, hence the inversion errors are larger for vegetation higher than 30 m. The quantitative comparison of the employed methods with respect to the LiDAR measurements is listed in Table 1, which includes the bias, RMSE, and $R^2$. As expected, the RVoG model overestimates by 4.81 m on average, and its RMSE is the highest. Both RVoG-vtd and GRVoG-vtd improve significantly on RVoG. The three indicators also show the superiority of the GRVoG-vtd model in this experiment.
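The three indicators can be computed per block as sketched below. The definitions are assumptions, since the text does not spell them out: bias is taken as the mean of estimate minus reference, and $R^2$ as the coefficient of determination (one minus the ratio of residual to total sum of squares) rather than the squared correlation coefficient. The function name and sample numbers are hypothetical.

```python
import numpy as np

def height_metrics(est, ref):
    """Return (bias, RMSE, R^2) of estimated heights against reference heights."""
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    err = est - ref
    bias = err.mean()                           # positive means overestimation
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((ref - ref.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                  # assumed definition of R^2
    return bias, rmse, r2

# Toy example: every block is overestimated by 2 m.
ref = np.array([10.0, 20.0, 30.0, 40.0])
bias, rmse, r2 = height_metrics(ref + 2.0, ref)
print(bias, rmse, r2)
```

With this sign convention, a positive bias such as the 4.81 m reported for RVoG indicates overestimation, and a negative bias indicates underestimation.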

Discussion
This section further analyzes the inversion errors of different algorithms, and discusses their performances on different vegetation types.

Analysis of Inversion Error
As can be seen from Table 1, the proposed four-stage inversion method for the GRVoG-vtd model is more consistent with the LiDAR-based data. To show the errors of each algorithm more intuitively, the error distribution functions are given in Figure 15. The expectation values of the error distribution functions are the biases in Table 1. Together with the scatter diagram of the inversion results in Figure 14, the error distribution curves indicate that most of the estimation errors of RVoG lie on the positive axis because of the influence of temporal decorrelation. Because the coherence amplitude is not used effectively, more errors of RVoG-vtd lie on the negative axis. The overall error distribution of the GRVoG-vtd model proposed in this paper is more concentrated. The standard deviation of the distribution is $\sigma$ = 2.67 m, which is slightly lower than those of the first two methods, 2.69 m and 2.68 m, respectively. The expectation value $\mu$ = 1.28 m is also closer to 0. The errors on the positive and negative semi-axes are relatively balanced, and most of the errors lie in the $\mu \pm 3\sigma$ range, which is 1.28 m ± 7.9 m. Therefore, a better vegetation estimation result can be obtained.
Table 2 divides the vegetation height into three intervals, which represent sparse savanna (0-15 m), low forest (15-25 m), and high forest (25-60 m), respectively. It can be seen from the table that the RVoG-vtd model performs best in the region with vegetation below 15 m, followed by the GRVoG-vtd model. The RVoG model performs best in the 15-25 m region, and GRVoG-vtd is slightly worse than RVoG there. In the dense forest region, the GRVoG-vtd model performs better because of the adaptive classification of vegetation types and the estimation of temporal decorrelation. The proposed method is stable across all three intervals, and its error distribution is relatively concentrated, which agrees with Figure 15. Table 2 indicates that for low vegetation the coherence phase is more effective, while for dense forest the coherence amplitude should be emphasized to obtain more reasonable results, which supports the design of the four-stage inversion algorithm proposed in this paper.

Discussions of Inversion Results
As Figure 13 demonstrated, in the areas with vegetation less than 25 m, the RVoGvtd and GRVoG-vtd models are more consistent with LiDAR-based vegetation height, while RVoG model has obvious overestimation. However, in the regions with over 25 m vegetation, RVoG and GRVoG-vtd models perform better, and RVoG-vtd underestimates significantly. The RVoG model has a best effect when the vegetation height is about 25 m, but its weakness is the resolution of height. Generally, the inversion results of RVoG model are concentrated around 25 m for high forest. The GRVoG-vtd model has higher inversion accuracy and higher resolution, which can reflect the height fluctuation obviously. The reason is the coherence points are very close in the CUC plane when the amplitude is small, but in the amplitude-phase plane, the distances between the coherence points are more affected by coherence phase which can still be distinguishable, promising a higher resolution. However, the inversion will be more easily affected by noise correspondingly, so the inversion variance will be slightly larger. Although the RMSE of GRVoG-vtd is the smallest on the whole, it will also be slightly worse locally. In practice, when the coherence amplitude is small, the influence of coherence phase can be regarded as floating in a relatively small range, which results in the small difference between RMSEs of GRVoG-vtd and RVoG in dense forest regions as Table 2 presented.
From Figure 14 we arrive at a similar conclusion. The estimations of RVoG-vtd and GRVoG-vtd are relatively accurate when the vegetation height is less than 25 m. For regions with vegetation height greater than 25 m, the height estimation errors of all three methods increase to different degrees. The RVoG model estimations gradually converge to a certain height, while the RVoG-vtd and GRVoG-vtd models show relatively divergent estimations. The average vertical wavenumber k_z is about 0.12 for the PolInSAR data. When the vegetation height is greater than π/k_z, i.e., about 27 m in this case, the determination of the ground phase depends entirely on the distribution of the coherences of the different polarization states along the fitted line, and the errors of ground phase extraction in the first two stages become relatively large. Consequently, the inversion errors of these algorithms based on single-baseline PolInSAR data are larger in the corresponding region. At the same time, k_z also affects the resolution of vegetation height inversion. Figure 5 shows that as the vegetation height increases, the corresponding coherence distribution becomes more concentrated. Therefore, when the vegetation is too high, a small deviation of the complex coherences causes a large error in the estimated vegetation height, i.e., the inversion becomes ill-posed. In this regard, multi-baseline fusion is recommended, because different values of k_z give different height precision for different height intervals.
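The role of k_z can be made concrete with a small numerical sketch. It assumes the zero-extinction (sinc) volume coherence with the ground phase set to zero, which is a simplification of the models used in this paper, and the helper names are hypothetical. With k_z = 0.12 rad/m, π/k_z evaluates to about 26.2 m; the 27 m quoted above would correspond to a k_z slightly below 0.12.

```python
import numpy as np

K_Z = 0.12  # rad/m, average vertical wavenumber quoted in the text

def sinc_coherence(h_v, k_z=K_Z):
    """Zero-extinction volume coherence for height h_v (m), ground phase set to zero."""
    x = 0.5 * k_z * np.asarray(h_v, dtype=float)
    # np.sinc is the normalized sinc, sin(pi t) / (pi t), hence the division by pi.
    return np.exp(1j * x) * np.sinc(x / np.pi)

def coherence_shift_per_metre(h_v, k_z=K_Z, dh=1e-3):
    """Magnitude of d(gamma)/d(h), estimated with a central difference."""
    diff = sinc_coherence(h_v + dh, k_z) - sinc_coherence(h_v - dh, k_z)
    return float(np.abs(diff) / (2.0 * dh))

h_limit = np.pi / K_Z
print(f"pi / k_z = {h_limit:.1f} m")

# The 0.02 coherence perturbation is an assumed value used only for illustration.
for h in (10.0, 25.0, 40.0):
    shift = coherence_shift_per_metre(h)
    print(f"h = {h:4.1f} m: |dgamma/dh| = {shift:.4f} per m, "
          f"a 0.02 coherence error maps to about {0.02 / shift:.2f} m")
```

Under this simplified model the coherence moves less per metre as the height grows, so the same coherence error translates into a larger height error for tall vegetation.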

Conclusions
The RVoG model does not compensate for temporal decorrelation, so the observed coherence amplitude is substantially lower than the theoretical value. As a result, the vegetation height is overestimated in most areas. When the three-stage inversion method is applied directly to the RVoG-vtd model, one coherence point can correspond to multiple heights, which is known as the ambiguity problem. The previous four-stage inversion algorithm was proposed to solve this problem while maintaining a low computational complexity. However, the height is underestimated in some areas because only the coherence phase is used for estimation after the extinction coefficient is fixed. In the conventional three-stage process, the Euclidean distance in the CUC plane is used as the similarity measure. However, in the CUC plane the theoretical distribution curve of the coherence is a high-curvature manifold, so the Euclidean distance is a misleading measure of similarity. Two points may be very close in Euclidean distance yet far from each other on the manifold, with very different corresponding vegetation heights. Therefore, the Euclidean distance does not truly reflect the similarity between the observed coherence and the theoretical coherence.
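The overestimation caused by uncompensated temporal decorrelation can be reproduced with a toy example. The sketch below is not the three-stage method: it assumes a zero-extinction (sinc) volume, inverts the height from the coherence amplitude alone by grid search, and uses an assumed real temporal decorrelation factor of 0.8. The function names are hypothetical.

```python
import numpy as np

def sinc_amplitude(h_v, k_z):
    """Coherence amplitude of a zero-extinction volume of height h_v (m)."""
    x = 0.5 * k_z * np.asarray(h_v, dtype=float)
    return np.abs(np.sinc(x / np.pi))  # np.sinc is sin(pi t) / (pi t)

def invert_height_from_amplitude(amp, k_z, h_max=50.0, step=0.01):
    """Grid search for the height whose model amplitude is closest to amp.

    h_max must stay below 2*pi/k_z so that the amplitude is monotonic in height.
    """
    grid = np.arange(0.0, h_max + step, step)
    return float(grid[np.argmin(np.abs(sinc_amplitude(grid, k_z) - amp))])

k_z = 0.12      # rad/m, average vertical wavenumber quoted in the text
h_true = 20.0   # m, assumed true height
gamma_t = 0.8   # assumed temporal decorrelation factor

amp_observed = gamma_t * sinc_amplitude(h_true, k_z)
h_uncompensated = invert_height_from_amplitude(amp_observed, k_z)
h_compensated = invert_height_from_amplitude(amp_observed / gamma_t, k_z)

print(f"true height:           {h_true:.1f} m")
print(f"without compensation:  {h_uncompensated:.1f} m")
print(f"with compensation:     {h_compensated:.1f} m")
```

Because the lowered amplitude is attributed entirely to volume decorrelation, the uncompensated inversion returns a taller canopy, while dividing out the decorrelation factor recovers the assumed height.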
In view of the above problems, this paper developed the GRVoG-vtd model and the corresponding novel four-stage inversion algorithm. The GRVoG-vtd model focuses on the random motion of scatterers to compensate for temporal decorrelation and takes the random scatterer density distribution into account. The novel four-stage method extends the conventional three-stage method with an additional stage for vegetation species classification and real-valued factor estimation. The vegetation height estimation stage was also modified by converting the Euclidean distance to the generalized distance, which provides a new way to estimate heights in the amplitude-phase plane instead of the CUC plane. The height inversion experiments were conducted using a PolInSAR image of Lope National Park, where sparse savannas and dense forests are the dominant vegetation types. The inversion results were illustrated by qualitative analysis and quantitative comparison. Through a series of experiments, this paper demonstrated the rationality of the models and the feasibility of the inversion process. The comparison between the new model and the traditional models also showed the superiority of the GRVoG-vtd model.
Assuming the distribution of volume scatterer density is ρ(h), the solution of the equation can be obtained by the stationary phase method [35]:
where h_v is the height of the forest in the area, d_j is the depth of the j-th scatterer relative to the height of the tree, and θ_j is the angle between the line connecting the transmitting radar R_1 and the scatterer position R_j and the vertical direction, i.e., the depression angle. According to the same method, we can get:
After obtaining the single-frequency echo from a single scatterer, it is necessary to integrate the response of the single scatterer over the entire bandwidth to obtain the overall echo signal of the scatterer.
Furthermore, we add up the echoes of all the scatterers to get the expression:
where κ_e is the vegetation extinction coefficient, defined as:
Finally, the complex coherence is normalized, i.e.: