Multibaseline Interferometric Phase Denoising Based on Kurtosis in the NSST Domain

Interferometric phase filtering is a crucial step in multibaseline interferometric synthetic aperture radar (InSAR). Current multibaseline interferometric phase filtering methods mostly follow methods of single-baseline InSAR and do not bring its data superiority into full play. The joint filtering of multibaseline InSAR based on statistics is proposed in this paper. We study and analyze the fourth-order statistical quantity of interferometric phase: kurtosis. An empirical assumption that the kurtosis of interferograms with different baselines keeps constant is proposed and is named as the baseline-invariant property of kurtosis in this paper. Some numerical experiments and rational analyses confirm its validity and universality. The noise level estimation of nature images is extended to multibaseline InSAR by dint of the baseline-invariant property of kurtosis. A filtering method based on the non-subsampled shearlet transform (NSST) and Wiener filter with estimated noise variance is proposed then. Firstly, multi-scaled and multi-directional coefficients of interferograms are obtained by NSST. Secondly, the noise variance is represented as the solution of a constrained non-convex optimization problem. A pre-thresholded Wiener filtering with estimated noise variance is employed for shrinking or zeroing NSST coefficients. Finally, the inverse NSST is utilized to obtain the filtered interferograms. Experiments on simulated and real data show that the proposed method has excellent comprehensive performance and is superior to conventional single-baseline filtering methods.


Introduction
Interferometric synthetic aperture radar is an important extension of synthetic aperture radar (SAR), which is extensively adopted to topography surveying [1], surface deformation monitoring [2] and so forth. Multi-baseline interferometry can comprehensively utilize the diversity of interferograms with different baselines in the same scene to effectively extract the height information of difficult topography, particularly under the circumstance in which the interferometric phase does not satisfy phase continuity assumption [3]. The interferometric phase filtering is a critical step in multibaseline interferometric SAR (InSAR). The interferometric phase is contaminated by massively coherent noise brought from thermal noise decoherence, baseline decoherence, time decoherence and many other decoherent factors in practice [4]. The noise directly affects the difficulty of subsequent phase estimation and the accuracy of the final digital elevation model (DEM). The main motivation of the interferometric phase filtering is to eliminate noises as much as possible while preserving most of the detail information. estimator has higher operation efficiency due to skipping the clustering operation in Reference [23]. Considering the noise variance is space-variant, block estimation is applied. With the help of estimated noise variance, the wiener filter eliminates noise more accurately. Last but not least, the result of experiments on simulated data and real data confirms the efficiency and excellent performance of the proposed method.

Signal Model
In the case of single-look, the probability density function of interferometric phase can be represented as (1). The interferometric phase satisfies additive noise model in spatial domain, which is deduced in Reference [13]. It can be expressed as (2).
where x is the ideal phase deduced by the natural topography. y is the observed phase disturbed by the zero mean noise n, which is assumed to be independent of x. The phase jump ranged from −π to π, which is induced by the interferometric phase wrapping, derived a high frequency similarity to noise in frequency domain. Therefore, not surprisingly, it is apt to greatly be confused with noises. It is desirable that we convert the image to the complex domain to get the continuous complex phase and filter the real part and the imaginary part respectively. The signal model in the complex domain can be induced as exp(jy) = cos(y) + jsin(y).

The Nonsubsampled Shearlet Transform
Wavelet is prone to deal with 1-D signals existing pointwise singularities. Nevertheless, it is weak to handle multidimensional data dominated by distributed discontinuities, such as edges and fringes. In an effort to solve this problem, the wavelet basis with much higher directional sensitivity and more flexible shapes is encouraged for effectively capturing the singularity features of multidimensional data, involving composite wavelets [24], contourlets [25], and so forth. The shearlet transform is an important part of composite wavelet theory, which merges classical geometry and multiscale analysis [26][27][28][29]. The shearlet provides nearly optimal nonlinear approximation performance and produces an optimal sparse representation of images with distributed discontinuities. Thanks to its time-frequency local feature and directional sensitivity, the shearlet transform can be applied in image processing, for example, image denoising, image fusion, texture feature extraction, and so forth. In the context of composite wavelet, the discrete shearlet is defined as where ψ ∈ L 2 (R 2 ). S is the anisotropic dilation matrix related to scale transformation. j denotes the scale parameter in particular, which dominates the refinement of frequency and the redundancy of basis elements. G is the shear matrix related to geometrical transformation. l denotes the shear parameter which restricts the orientation of each shearlet element. Moreover, k indicates the shift parameter to locate distributed discontinuities in spatial domain. Calculating the Fourier transform to elements ψ j,l,k (x), we getψ It has frequency support as (9). The frequency division produced by the shearlet transform is illustrated in Figure 1.
The partition of frequency domain; (b) Frequency structure of the shearletψ j,l,k (w 1 , w 2 ), The asymptotic approximation error of the shearlet transform is N −2 (logN) 3 when N → ∞ [29]. So it precisely depicts the interferometric fringe. Besides, the shearlet forms Parseval frames in frequency domain. Its elements are trapezoidal pairs whose area is 2 j × 2 2j and oriention is along the zero-crossing line with slope of −l2 −j [29]. The corresponding orientation in spatial domain is along the line with slope of l2 −j . the shearlet elements can be discriminated by scales, locations and orientations. In addition, it apace decays in spatial domain. The aforementioned content indicates the highly directional sensitivity of shearlet, which makes a huge difference in the interferogram filtering.
In practice, the shearlet is shift-variant. The shearlet transform adopts the shift operation of the window function to realize the directional filtering. It involves a subsampling operation, which causes spectral aliasing in frequency domain. Thereby the Gibbs distortion occurs in the reconstructed image. To solve this problem, Easley and Labate proposed the nonsubsampled shearlet transform (NSST) which is enlightened by the great performance of the nonsubsampled contourlet transform. The NSST replaces the subsampled operation with convolution in the directional filtering. It is shift-invariant and efficiently eliminates the pseudo-Gibbs phenomenon in reconstructed images. Hereby the reconstructed image is more effective and intuitive. The decomposition procedure primarily contains two steps as shown in Figure 2.
Step 1: Multiscale Decomposition The image is decomposed into a high-frequency component and a low-frequency component by means of non-subsampled pyramid (NSP). Then iteratively execute this step till image is decomposed into the j scales.
Step 2: direction localization The core of direction localization is non-subsampled shearing filter banks (NSSFB), which impose the 2-D convolution of the shearing filter and the high-frequency component on the cartesian domain. The convolution averts subsampled operation, thereby the NSST is shift-invariant.

Pre-Thresholded Wiener Filter
On account of the shift-invariant property, the NSST displays great performance in image denoising, particularly for the texture image. It is also desirable that the NSST filter exploits the coefficient shrinkage method which is consistent with the wavelet filter. Shearlet gives a sparse expression to images. That is to say, the intrinsical information of image is concentrated on few coefficients spreading over each scale with a considerable large amplitude. By contrast, shearlet coefficients generated by noise widely distribute in shearlet domain and its amplitude is small. Owing to this feature, a more accurate pre-thresholded Wiener filtering method with known noise variance is employed to remove the noise component. It consists of two steps:the pre-thresholded operation and Wiener filter. The pre-thresholded operation ensures the smaller local expected square error (LESE) of linear approximation, which is represented as Then the Wiener filter obtains the best linear estimation of clean images. It is represented aŝ

Noise Level Estimation Based on Kurtosis
The noise variance is the crux of the Wiener filter. The robust noise level estimator [30] designed by Donoho et al. regards scales median of absolute coefficients as noise variance and is commonly used in many papers. It is straightforward and expedient but tends to over-filter in interferograms with high signal-noise ratio (SNR). A more precise noise variance estimator is extremely urgent to improve the filtering performance of the NSST filter. The most primary innovation of this paper is to introduce the kurtosis-based noise level estimator in Reference [23] to multibaseline InSAR. The kurtosis and the noise variance have a certain relationship in the additive Gaussian white noise model. There exist two unknown variables in the kurtosis model, the number of unknown variables is larger than the number of equations. The result of the minimization method, such as l1-minimization [31], l2-minimization [32] and so forth, exists great errors. To solve the problem, Dong et al combine the kurtosis model with a constraint, in which the kurtosis of images with different structures or statistical behaviors should be unequal, to improve the estimation accuracy of the noise variance. The K-means clustering process is applied to classify the whole image into non-overlapping image patches with different structures.
In this section, the kurtosis of the interferometric phase is introduced. And a special property of the kurtosis is proposed in multibaseline InSAR and is named as the baseline-invariant property. Along with the idea in Reference [23], the baseline-invariant property is regarded as a constraint to ensure the accurate estimation of the noise level. The modified method omits the clustering process and eliminates errors introduced by the fault of the cluster. Efficiency and performance get promoted. Next, the noise level estimator is introduced in two parts. The first one introduces the kurtosis of the interferometric phase and two important properties of the kurtosis. In the other part, the noise level estimation is introduced in detail.

Kurtosis
The image is decomposed into various coefficient components at different scales and directions by NSST. The research on the distribution of NSST coefficient components is conducive to the further analysis of images and is a significant topic in image processing. Among them, the research on its statistics is of great potential. The low-order statistic is weak, even invalid in interferograms which involve a large amount of textures and detail informations. Consequently, the scholar begins with the study of its higher-order statistic, such as kurtosis and skewness. In this section, we study and analyse the kurtosis of interferograms and NSST coefficient components. The kurtosis of a random variable Y is defined as where C k (•) is the kth cumulant function. The kurtosis reveals the concentration level of the probability density function. Intuitively, the kurtosis reflects the sharpness of the probability density distribution, wherein the kurtosis of the Gaussian distribution is 0. Based on the single-look probability density function of the interferometric phase, the kurtosis is calculated numerically as a function of the coherence, as shown in Figure 3. It indicates that the kurtosis is proportional to the coherence. Obviously, the clean interferogram emerges higher kurtosis when compared with interferograms disturbed by coherent noise with variable degrees. When the noise is strong enough to destroy the fringe structure of interferograms, the kurtosis is smaller due to the influence of the noise. In addition, the kurtosis tends to a negative number when the coherence is close to zero owing to the impact of non-Gaussian noises. In contrast, when the coherence is high, the fringe structure of interferograms plays a primary role. So the kurtosis increases with the improvement of coherence. It should be noted that NSST coefficient components of interferograms are sparse and its kurtosis is greater than zero. The proposed method takes advantage of two vital properties of the kurtosis of interferograms. One is the scale-invariant property, which works well on all natural images, that is, the kurtosis of coefficient components in the Linear Transform Domain should be held constant at different scales. It is verified and revised by some work in References [31][32][33]. A modified description for the scale-invariant kurtosis assumption is that the stability is effective in clean images throughout all scales and the variation is the specific impact of noise [32]. The scale-invariant property in the nonsubsampled shearlet transform can be formalized as (16) Another property, which is particular to interferograms and can be yielded from an empirical summary, is proposed in this paper. The kurtosis of images with similar structure or statistical behavior is assumed to be a constant [23]. Along this line, we suggest that the kurtosis of interferograms with different baseline keeps constant and denote it as the baseline-invariant property. Then an interpretation from two perspectives should be introduced. First, It will be explicated further in terms of the probability density function (pdf). The pdf of the interferometric phase is independent of baselines, so does the kurtosis. It can be revealed in (1). In other words, the kurtosis of the interferometric phase is baseline-invariant. However, in virtue of impacts of discretization operation such as sampling, numerical calculation and so forth, the kurtosis of interferograms with different baseline fluctuates around a constant in reality. Fortunately, the fluctuation variance is small enough. So the negative effect of the fluctuation variance can be ignored. On the other hand, as far as the image is concerned, interferograms of the same topography with different baseline intuitively have similar texture trends which show similar statistical behaviors. Correspondingly, the kurtosis maintains invariant in images with similar statistical behaviors, which gives strong support for the baseline-invariant property of kurtosis in multibaseline InSAR.
Some simulated analyses prove the baseline-invariant property. In order to verify the validity of that property for various types of interferograms (generated by various topographies), we select the DEM of five common topography, including Cone, Building, the Northeast plain, China (the elevation below 500 m, the relief is not more than 200 m), the Sichuan Basin, China, Mangkang Mountain, Tibet, China (Plateau, the elevation above 500 m, the relief is more than 200 m). All primordial elevation data are derived from simulated data and Shuttle Radar Topography Mission DEM (SRTM-DEM) elevation data which is provided by Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn). The elevation data and typical interferograms of them are shown in Figure 4. In the light of the DEM of ground scenes and parameters of multibaseline InSAR simulation systems as shown in Table 1, we start with projecting the elevation data into the slant coordinate system. Then we calculate 81 ideal interferograms for each topography when the baseline varies from 50 m to 500 m based on the interferometry principle. The kurtosis of interferograms with different baseline corresponding to each topography is calculated and its boxplot is shown in Figure 5. In order to approach to the actual situation, the fringe density of simulated interferograms should not be too sparse or too dense. Therefore, the system central frequency for different topography is different. However, in this section, the baseline-invariant property of the kurtosis of interferograms with different baselineswe is confirmed. In other words, the fact that the kurtosis of interferograms with different fringe density remains constant is verified. So the change of the central frequency is not taken into account. We observe the kurtosis standard deviation of each topography numerically in Table 2. The result implies that the maximum standard deviation of kurtosis is 0.1056. Considered the influence of numerical calculation and sampling, it is interpreted that the kurtosis of interferograms with different baseline keeps constant.  Figure 5. Boxplot of kurtosis corresponding to various topography.

Noise Level Estimation
In this section, the principle and process of the noise level estimator proposed in this paper is introduced in detail. The real part and the imaginary part of interferograms are handled respectively. Taking the real part as an example, it is decomposed into M components in NSST domain. In addition, the additive noise model applies to all shearlet components.
where y i ,x i ,n i represents the ith NSST coefficient of the observed phase, ideal phase and noise, respectively. The variance of y i is represented as where σ 2 indicates the estimated noise level of the ith NSST coefficient for a white Gaussian noise of standard deviation 1. It is calculated by the Monte Carlo Estimation Method. Then we deduce (21) from (18), (19) and (20): Since the assumption that n i obeys the Gaussian distribution, κ(n i ) = 0. Besides, the coefficient distribution of subband components is generally more centralized than the Gauss distribution, that is, κ(x i ), κ(y i ) ≥ 0, because the interferogram is subdivided into subband components with different scales and directions. The deterministic relationship between noise variance and kurtosis is deduced from (18) and (21), as shown in (22).
Equation (22) is the kurtosis model which describes the deterministic relationship between the kurtosis and the noise variance. The kurtosis and variance of the observed phase y i can be calculated directly but the kurtosis of the ideal phase and the noise variance are unknown. The number of unknown variables is larger than the number of equations, so the noise variance cannot be determined directly by Equation (22). A large number of texture structures appearing in interferograms represent similar characteristics with noise in the frequency domain. The existence of texture structure leads to great errors of the noise variance estimation achieved by the minimization method of (22), that is, l1-minimization [31], l2-minimization [32]. To solve this problem, Equation (22) and the baseline-invariant property which acts as another equation are used to jointly estimate the noise variance. With the help of the new information from the baseline-invariant property, the estimation with higher accuracy is realized. the baseline-invariant property is represented as The form of sqrt is adopted for the convenience of the subsequent solution of optimization model. The following optimization model is proposed from (22) and (23).
where the superscript j denotes the jth baseline and the subscript i denotes the ith coefficient component. The first term of optimization function is deduced by the baseline-invariant property of kurtosis and another one is added for fitting the kurtosis model in (22). Then the constraint is derived from the fact that the kurtosis of coefficients decreases owing to the noise disturbance as shown in Figure 3. The aforementioned optimization function is constrained and non-convex optimization problem with two variables: σ 2 n and κ(x j ) N j=1 , which should be considered and optimized simultaneously. This constrained and non-convex optimization problem can be decomposed into two continuous and convex optimization sub-problems by fixing one variable to optimize another variable. Firstly, fix the noise variance σ 2 n and then update κ(x j ) N j=1 . The optimization model 1 to be solved is Ignoring the second item which is independent with κ(x j ) N j=1 . Let A is a diagonal matrix of N × N and the diagonal element is Then the optimization function can be simplified as Because A + B is a positive definite matrix, it is a standard convex optimization for quadratic programming with constraints and can be solved directly.
Let the partial derivative of function (27) equals to 0 and then we get the noise variance.
Iteratively update κ(x j ) N j=1 and σ 2 n , until convergence. A pivotal assumption of the aforementioned method is that the noise variance remains constant spatially. Namely, the noise is homogeneous throughout the image space. Yet the interferogram suffers the coherent noise with spatially variable characteristic. The consistent noise variance induces unbalanced filtering results. So we further advance the global noise variance to the local noise variance. Specifically, we divide the image into a certain amount of non-overlapping patches with the same size and assume the stability of noise variance in each patch and estimate its local noise variance simultaneously. The noise level estimation procedure can be summarized in Algorithm 1.

Results
In this section, we validate the efficiency and validity of the proposed method via extensive experiments on simulated and real interferograms. Experiments consist of three parts. First of all, we demonstrate the estimation accuracy of noise level on simulated noisy interferograms. It is the crux of the proposed method. Then, the simulated experiments are conducted. As a promotion of NSST filter, the developed method is compared with five state of the art single-baseline filters, including: Goldstein method, local frequency estimate (LFE) algorithm, optimal integration-based adaptive direction filter (OADF), iterative NL-InSAR and InSAR-BM3D. Finally, the proposed method on real interferograms will be tested. For simplicity, the proposed method is termed as NSST in the following sections. The parameters of various filters are set as

Noise Estimation Experiments
The key of the proposed method is noise variance estimation. In this section, we verify the accuracy of the estimated noise variance on simulated data. The original elevation model is a cone, as shown in Figure 4a. As noted before, the interferometric phase accords with the additive noise model in complex domain, the real part and the imaginary part are denoised respectively. We generate three clean interferograms with different baseline. Noisy interferograms are disturbed by the circular complex standard Gaussian noise. The coherence of noisy interferograms is set to 0.1, 0.3, 0.5, 0.7, 0.9, respectively. The true noise variance of the real part, for example, is calculated numerically. Compared with the true value, the estimated noise variance is generated by the proposed method. In order to further test the robustness of the proposed method, 100 Mont-Calo simulations are conducted. Statistics of its results are shown in Figure 6, where the black dotted line is the mean of true value in 100 experiments. The comparison between the estimation and the mean implies the accuracy and stability of the proposed method. The maximum error rate is calculated to evaluate the estimation accuracy and is defined as Whereσ denotes the estimated value in 100 experiments,σ indicates the mean of true value. The result is shown in Table 3. It is obvious that some errors exist in the estimation. The higher the coherence is, the larger the SNR is. In the case of low coherence, the significant noise level engenders the confusion of the high-frequency information and the noise, which results in a slight overestimation. On the contrary, the noise near fringe in interferograms is mistaken for significant pixels owing to its weak effect to fringes in the case of high coherence. So the estimation is lower than the true value. We must emphasize that the maximum error rate is controlled within 8.76%. The underestimation is compensated by the excellent performance of Wiener filter.

Simulated Experiments
In this section, we simulate three interferograms of cone and mountain to assess the performance of the proposed method. The noise environment comprises two situations: constant and variable noise variance in spatial domain. It is necessary for comparative experiments within each section. The experiments in interferograms with unitary noise variance are conducted to inspect the reconstructed performance for phase jump and phase gradient mutation. We select interferograms with 400 × 400 pixels, which are generated by cone and contain both phase jump and phase gradient mutation. Its clean interferograms and noisy interferograms can be shown in Figure 7. Coherence is set to 0.5. The block operation is omitted because of the constant noise variance. The comparable results of the interferogram with the shortest baseline are shown in the first row of Figure 8.  Intuitively, the result of the Goldstein method is incorrect. There are obvious errors in OADF and LFE. NL-InSAR, InSAR-BM3D and the proposed method all obtain appreciable results. The mean square error (MSE) between the clean interferogram and the filtered interferogram confirms above statements. What is more, Table 4 lists the number of residues in the filtered interferogram and the computation time. Note that the bold font indicates the best performance in the table. Table 4 exhibits that the proposed method outperforms to others with a running time that is second only to Goldstein method. The similar MSE are found in InSAR-BM3D but its computation time is about twice as long as our method. The results in NL-InSAR is superior to Goldstein method, OADF, LFE but its operation efficiency is the worst due to iterative operation. By and large, a combination of minimum MSE, minimum number of residues and high efficiency has taken in our method. The second row of Figure 8 displays the mean and standard deviation of 100 Monte Carlo experiments at the central row of results. Thereinto, the black solid line is the true value. The blue dotted line denotes the mean of 100 experiments. The pale blue shadow is the range of three times standard deviation near the mean. The poorest result in Goldstein method is interrelated with fixed α and its boundedness to lower SNR. The result of NL-InSAR, InSAR-BM3D and our method for the stationary phase is close to unbiased estimation, while other methods emerge distinct deviation. The basic idea of OADF and LFE is the estimation to local direction and frequency of interferometric phase. So the invalid estimation has contributed to a heavy fluctuation near phase jump and phase gradient mutation. NL-InSAR produces excellent performance in phase jump but its non-local mean operation induces the outlier which can be observed on both sides. Nevertheless, it produces excellent performance in phase jump. Generally, InSAR-BM3D and our method outperform other methods but our method has higher operation efficiency.
Considering a more complex noise level model in the second experiment, in which the coherence ranges from 0.1 to 0.9 and increases from left to right at regular intervals. Figure 9 shows clean interferograms and noisy interferograms with the size of 240 × 240. The image is divided into 9 non-overlapping patches in noise variance estimation procedure. The size of each patch is 80 × 80.  In this part, a new evaluation index, which is expressed as the pixel-wise Gradient Magnitude Similarity (GMS) [34,35] between the reference and filtered images, is adhibited to evaluate the filtering results of various methods. Gradient magnitude is an apparent indication of the difference between adjacent pixels. The gradient of interferometric phase consists of two parts: the gradient of the local stationary phase reflects the local slope of topography and the similarity of gradient casts light upon the similarity of local topography. In addition, the outlier implies phase discontinuity within a phase period. Similar to the well-known Structure SIMilarity (SSIM) index, the gradient similarity of phase jump can also reflect the edge-preserving ability of methods. Therefore, it is worth to use GMS as a new evaluation index. GMS is defined as where λ is set to 0.0026 to ensure numerical stability. G o and G f indicate the gradient magnitudes of o and f . The gradient magnitudes is derived from (31) and (32).
where o and f indicate the original images and filtered images respectively. h x and h y indicate the Prewitt filter along the horizontal and vertical direction. They are derived from (33).
It should be noted that the larger the GMS value is, the higher the quality of the restored image is. When GMS = 1, the reference image is fully recovered. The mean of GMS map (GMSM) refers to the overall performance of GMS map. Figure 10 shows the filtering results, residual graph and GMS map corresponding to six different filters. Results show that all methods can correctly restore the original phase in the high-coherence region. However, only InSAR-BM3D and the proposed method get considerable results in the low coherence region. Besides, as far as the GMS map is concerned, our method has better ability to maintain the phase gradient, especially in the low-coherence region. The estimated phase of the proposed method tends to be more stationary and closes to the original phase. Table 5 shows the MSE, GMSM and computation time. Note that the bold font indicates the best performance in the table. The performance of various methods can be expressed as (where > denotes better performance): Computation efficiency: Goldstein>NSST>InSAR-BM3D>OADF>LFE>NL-InSAR As a whole, our method is superior to other methods. Goldstein OADF LFE NL-InSAR InSAR-BM3D NSST Figure 10. The filter results of the interferogram generated by a cone with variable coherence (top), the residuals graph (middle) and the Gradient Magnitude Similarity (GMS) map (bottom). We consider a more complex topography to test the performance of various methods. The elevation data of a steep mountain in Shaanxi Province, China is selected to generate three interferograms. Coherence is consistent with last experiment. The size of interferograms is 1600 × 1600. The image is divided into 25 non-overlapping patches in noise variance estimation procedure. The size of each patch is 320 × 320. Interferograms involve dense and sparse fringes. Dense fringes are mostly located in the region with low coherence, which can better verify the effectiveness of the proposed method. Figure 11 shows clean interferograms and noisy interferograms.
(a) Clean interferograms (b) Noisy interferograms Figure 11. Clean interferograms and noisy interferograms generated by a mountain with coherence ranging from 0.1 to 0.9. Figure 12 shows the filtered results. Table 6 shows the results are similar to the results of the previous experiment. Note that the bold font indicates the best performance in the table. The proposed method produces minimum MSE and maximum GMSM , which prove the prominent filtering performance of it. The minimum MSE of the proposed method proves that the result of the proposed method is closer to the true interferometric phase. And the maximum GMSM implies that the result of the proposed method has fewer outliers and better local stability. The number of residues of the proposed method is second only to InSAR-BM3D and is very close to InSAR-BM3D. The reduction of residues is up to 99.97% compared with the residues of noisy image. Moreover, the computation time of the proposed method is half of the time of InSAR-BM3D. In general, the proposed method not only has outstanding filtering performance but also has high operation efficiency. Goldstein OADF LFE NL-InSAR InSAR-BM3D NSST Figure 12. The filter results of the interferogram generated by a complex topography with variable coherence (top), the residuals graph (middle) and the GMS map (bottom).

Experiments on Real Interferograms
The original data set is three repeat-orbit InSAR data at Colorado Grand Canyon, USA, which is obtained by Alos-1 satellite. Figure 13 shows its interferograms. Baselines are 738.182, 1241.066 and 1827.02 m, respectively. The size of interferograms is 6000 × 5910. In noise variance estimation procedure, each interferogram is divided into 225 non-overlapping patches whose size is 400 × 394. Results are shown in Figure 14. Intuitively, the Goldstein method has completely failed. And the apparent noise remains in the result of OADF and LFE. The excellent results of NL-InSAR, InSAR-BM3D and the proposed method are similar.
In order to further compare various methods, the low-coherence region in the upper right corner (row: 1:1000, column: 4910:5910) is cropped to further analysis. Figure 15 presents denoising results of different methods. Table 7 lists the number of residues, the reduction rate of resides and computation time. Note that the bold font indicates the best performance in the table. The excellent performance of the proposed method can be confirmed directly by visual observation. In the proposed method, the reduction rate of residues (up to 99%) is remarkable and the result is more conducive for the subsequent phase unwrapping.  Eight phase profiles along the phase gradient direction, which involve intact phase period and satisfy local stationarity, are extracted for contrast. White lines in Figure 16 represent the phase profile at low-coherence region (line 2 and line 8), high-coherence region (line 1 and line 6), complex topography region (line 3), the region corresponding to steep topography (line 5), and so forth. As shown in Figure 17, results of phase profiles are arranged in the order of its position (increase from left to right, from top to bottom). For simplicity, Figure 17 only exhibits results of NL-InSAR, InSAR-BM3D and the proposed method, which are superior to other methods intuitively. As shown in line 3, none of three methods can recover the real phase correctly at complex topography region.The difficulty is inherent defect of interferogram with too long baseline. The phase profile at flat region with high-coherence, which corresponds to line 1 and line 6, is estimated appropriately by NL-InSAR and the proposed method. However, a few abnormal values arise in the result of InSAR-BM3D. The comparison result of the number of abnormal values at high-coherence region corresponding to steep topography (line 5) can be expressed as: the proposed method≥NL-InSAR>InSAR-BM3D. For the low-coherence region (line 2 and line 8), the proposed method outperforms NL-InSAR and InSAR-BM3D. The proposed method produces a more stationarity and authentic result. It is consistent with the result in Figure 15. On balance, the proposed method has the best comprehensive performance.

Conclusions
An attempt to the joint filtering method in multibaseline InSAR based on the statistical property of interferometric phase is proposed in this paper. This paper analyses the high-order statistical property of interferograms with different baseline and proposes an empirical assumption: the kurtosis of interferograms with different baseline keeps invariant. Simulated experiments give numerical support to it. The filtering process of the proposed method involves four parts: the NSST decomposition, the noise level estimation, pre-thresholded Wiener filter and inverse NSST. NSST gives an optimal sparse representation of distributed discontinuities, such as fringes of interferograms. We obtain a series of NSST coefficients at different scales and directions after NSST decomposition. Based on the kurtosis model in NSST domain and baseline-invariant property of interferograms, the noise variance of interferograms is represented as the solution of a constrained non-convex optimization problem. The clean NSST coefficient is estimated by the Wiener filter with the local noise variance derived by block estimation. The noise estimation experiments prove the validity of the noise level estimator. Experiments on simulated data and real data prove the edge-preservation performance and excellent filtering performance of the proposed method. Many coefficient components with the same kurtosis are obtained by NSST. Sufficient data means that the filtering performance of the proposed method is not affected by the number of interferograms. The great performance can be acquired when the number of interferograms is small. However, a large amount of memory is occupied by a large number of coefficient components. The algorithm has some requirements for memory performance. But this problem can be alleviated by adjusting the scale of NSST decomposition according to the actual computer performance.