Sparse Estimation Based on a New Random Regularized Matching Pursuit Generalized Approximate Message Passing Algorithm

Yongjie Luo 1, Guan Gui 2,*, Xunchao Cong 1 and Qun Wan 1 1 Department of Electronic Engineering, University of Electronic Science and Technology of China, No. 2006, Xiyuan Ave., West Hi-Tech Zone, Chengdu 611731, China; 201311020326@std.uestc.edu.cn (X.C.); wanqun@uestc.edu.cn (Q.W.) 2 College of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, 66 Xinmofan Road, Nanjing 210003, China; guiguan@njupt.edu.cn * Correspondence: guiguan@njupt.edu.cn


Introduction
Compressed Sensing (CS) has been a research focus in recent years.The idea of sparse estimation and compressed sensing has also been applied in adaptive filtering [1][2][3][4] and nonlinear dynamics systems [5,6].Define the support of an unknown K-sparse (K N) vector x ∈ C N as: S = {j : j ∈ {1, . . ., N}, x j = 0}.We consider an under-determined noisy linear system: where y ∈ C M is the measurement vector, ∈ C M is the additive Gaussian noise vector and A ∈ C M×N (M < N) is the projection matrix, also termed the sensing matrix.The aim of CS is to reconstruct x from y and A.
On the one hand, it is well known that the CS problem can be rephrased into a probabilistic inference problem on the bipart factor graph shown in Figure 1a.Donoho et al. proposed the Approximate Message Passing (AMP) algorithm [7,8] to solve it.Rangan et al. generalized AMP to adapt to an arbitrary sparse prior and arbitrary measurement noise; their work is called the Generalized Approximate Message Passing (GAMP) algorithm [9].Since the derivation of AMP/GAMP is based on a zero-mean Gaussian projection matrix hypothesis, they cannot work well if the projection matrix does not satisfy the hypothesis, as shown in Section 4. This phenomenon motivates us to improve the AMP/GAMP algorithm for a more general projection matrix.On the other hand, although 0 methods, such as Orthogonal Matching Pursuit (OMP) [10] and Compressive Sampling Matching Pursuit (CoSaMP) [11], are simple and effective, they use Least Squares (LS) to calculate the amplitude of non-zero elements.Therefore, these methods have to face the matrix inversion problem.When the scale of the projection matrix is small, this problem is not serious.However, when the scale becomes larger, the multipliers grow cubically as a function of sparsity K.This drawback stimulates us to find some approach to replace the LS step in 0 methods and to overcome the disadvantages of OMP and CoSaMP.In this paper, we propose a new sparse estimation algorithm, termed Random regularized Matching pursuit Generalized Approximate Message Passing (RrMpGAMP), to solve the CS problem.Our method can be regarded as a kind of 0 optimization algorithm, termed Random regularized Matching Pursuit (RrMP), but the mean and variance estimation of x are the output of a restricted GAMP algorithm, termed the Fixed support GAMP (FsGAMP).In this way, it allows a more general projection matrix and does not calculate the inverse matrix directly.The computational complexity of RrMpGAMP is comparable to GAMP, and the memory occupation of RrMpGAMP is much less than GAMP and other LS-based algorithms.
The rest of this paper is organized as follows.We describe the RrMpGAMP algorithm in Section 2, then analyze the convergence of FsGAMP by the replica method in Section 3 and then perform experiments in Section 4. We draw a conclusion in the last Section 5.The derivation of the FsGAMP algorithm is provided in the Appendix.

Random Regularized Matching Pursuit Generalized Approximate Message Passing
As mentioned in the Introduction, the RrMpGAMP algorithm includes two parts: the RrMP algorithm and the FsGAMP algorithm, which is embedded in RrMP.The former is a new 0 optimization algorithm; the latter is a tiny revised GAMP algorithm.
At the beginning, we assume the support S has been known, which means the probability of every zero element in x is one.Therefore, we just need to consider the probability of other non-zero elements in x.We suppose that the Probability Distribution Function (PDF) of x and the likelihood function of z = Ax are separable, i.e., Therefore, the probabilistic form of Equation ( 1) can be written as: Actually we do not know S in the first place; a straightforward idea is to find it by some matching pursuit-based method.

Random Regularized Matching Pursuit Algorithm
A traditional matching pursuit algorithm (MP or OMP) finds an index of the non-zero element of x at each iteration.It does not rollback even if an incorrect index has been chosen.Our proposed algorithm sequentially pursues multiple indexes at each iteration and has the ability to undo, like the well-known Regularized OMP (ROMP) algorithm and the CoSaMP algorithm.
We use the subscript l to denote left, r to denote right and |U| to designate the cardinality of temporary support U. Define three functions appearing in the algorithm: FixSuppGAMP(y, A, U) calculates the mean and variance estimation of x; R 2s\s (v) finds the indexes of the 2s < K absolutely largest entries of v, where s is the probe length, and then shuffles these indexes and bisects them; the outputs are the left and right indexes' subset and the amplitudes supported on them, respectively; support updating function H(u, c, Λ, S) generates a new indexes' set by a group of rules; we will explain it soon.
The steps of RrMP are listed in Algorithm 1.
They are separated into two stages: the first stage is Lines 2-5, and the second stage is Lines 6-9.At the first stage, Line 2 obtains two index candidates, Λ l and Λ r , and the correlation coefficients between the residual and columns of A, c l and c r .It is a key step, since the shuffle and bisecting operations are equivalent to a random search in N dimensional {0, 1} space.Line 3 merges the previous support set S n with two current candidates, respectively.Lines 4 and 5 roughly estimate x by the FsGAMP algorithm.At this stage, the candidate indexes are not replaced/dropped.Therefore, the number of possible incorrect indexes is typically greater than that at the second stage.This is the meaning of "roughly".At the second stage, Lines 6 and 7 choose the minimal residual branch as a new candidate.This prune operation can be regarded as a regularization since the algorithm chooses a small residual branch and discards a big residual branch.In another words, the algorithm discards half of the candidate indexes pursued by Line 2. Line 8 updates the support set.Line 9 estimates x again.Note that Lines 3-5 can be calculated by parallel computing.
We define the precision τ as the ratio of the residual norm over the measurement norm, and set the error bound to 0.01, which means 99.99% of the measurement energy has been recovered.When |P| equals sparsity K, if the precision does not reach the error bound, set S n+1 = P, keep running RrMP at most 2s times and then stop the algorithm.
The idea of the support updating function comes from two observations: (i) not all indexes in the candidate set are correct, so the algorithm should drop or replace a part of the indexes, as shown in Equations ( 7) and (8); the dropout/replacement operations can be regarded as a regularization; (ii) the candidate indexes corresponding to correlations c n+1 and to amplitudes u n+1 are not overlapping entirely; some correct indexes belong to the former, and some belong to the latter.It is necessary to merge them, as shown in Equation (9).Now, we define the support updating function H(u n+1 , c n+1 , Λ n+1 , S n ) as follows.Firstly, define two threshold functions corresponding to the sparse representation coefficients u, which are evolved with iterations: b max Secondly, define the indexes' updating procedure just for u: Thirdly, define the the indexes' updating procedure just for the correlation coefficients c: Lastly, we merge the two index sets to obtain the output of: These update rules are inspired by the regularized OMP algorithm [12,13].It seems a bit complex, but the basic rationale is simple.We segment the absolute amplitudes on the pursued support at each iteration to ladders; the height of each ladder is dominated by the minimum of the absolute amplitudes, which is supported on the previous pursued support set S n , and by the maximum of absolute amplitudes, which is supported on the current candidate support set Λ n+1 .Since the square of the amplitude is energy, the absolute amplitudes in the same ladder step can be explained to obey a similar energy level.For Equation (7), • If a 2 > b, this means that the absolute amplitudes of the current candidate indexes are much smaller than the absolute amplitudes of the previous pursued indexes, so the algorithm chooses only one current candidate index.This situation corresponds to a sharp ladder step.
• If a > b ≥ a 2 , this means the absolute amplitudes of the current candidate indexes are just a little smaller than the absolute amplitudes of the previous pursued indexes, so the algorithm chooses those indexes whose absolute amplitudes are greater than a/2.This situation corresponds to a flattened ladder step.
2 , this means the absolute amplitudes of the current candidate indexes are greater than the absolute amplitudes of the previous pursued indexes, but the minimum absolute amplitude of the previous pursued indexes is not a very small value, so the algorithm chooses those indexes whose absolute amplitudes are greater than b/2.This situation also corresponds to a flattened ladder step.

• If b
2 > a, that means the minimum absolute amplitude of the previous pursued indexes is very small, so the algorithm drops a part of the previous pursued indexes and adds a part of the current candidate indexes.This situation corresponds to a sharp ladder step.However, it is not enough to choose indexes just by absolute amplitudes.Some correlations about the residual and the columns of A are significant.The algorithm finds these indexes by Line 2 in Algorithm 1. Maybe, the absolute amplitudes supporting them are not significant, such that they are excluded by Equation (7).We can imagine these indexes as free electrons that are escaping to other energy levels.Therefore, the algorithm should retrieve them.This is the meaning of Equation ( 8).The rules are heuristic, the explanations are analogies to physics.

Fixed Support GAMP Algorithm
The concept of message passing on a factor graph can be found in [14].In contrast to AMP/GAMP, the FsGAMP algorithm does not reform Equation (1) to a bipart graph.It restricts the edges to a part of variable nodes, which correspond to the temporary support U, as shown in Figure 1b.Based on this factor graph and the posterior probability Equation ( 4), firstly, we define the messages from the factor node to the variable node.For the Max-Sum rule, they are: where c means constant ; for the Sum-Product rule, they are: Secondly, we define the messages from the variable node to the factor node.The form of the Max-Sum rule is the same as the form of the Sum-Product rule: The derivation of FsGAMP is very similar with GAMP [9].Due to space limitations, we only emphasize the difference in the main body and left the details to the Appendix.Since |U| non-zero elements of x have been pursued, we set other elements of x to zero directly, such that the ∑ N j =k a mj x j and ∑ Other derivation steps are the same as GAMP.It is worth noting that the derivation of GAMP is primarily based on the Central-Limit Theorem (CLT), but in FsGAMP, this condition is canceled, because we have known the index set of zero elements (though maybe incorrect), such that the mean and variance estimation of these elements are also zeros.This simplification eliminates uncertainty and brings stability to FsGAMP to adapt more types of projection matrices.
Rangan et al. [9,15] investigated the influence of damping in GAMP.They found that damping can induce convergence.Similar steps are used in the FsGAMP algorithm: The steps of FsGAMP are listed in Algorithm 2. Notation t indicates the iteration number; x and ν x denote the mean and variance of x; the element-wise product and division are denoted and .Function g out (•) and g in (•) calculate the Bayesian estimation of z and x, respectively.Please refer to [9] (Table 1) and the Appendix to find their specific forms.The prior p x (x S j ) can take a Gaussian, Laplacian or spike-and-slab distribution.

Input: y;
12: {damping steps} 14: stop ← TRUE 17: end if 18: end while GAMP calculates all elements of x in the loop, while FsGAMP estimates at most |U| non-zero elements of x in the loop.Since |U| increases gradually and |U| ≤ K, the memory footprint rate of FsGAMP is not more than K/N.Because K N, the memory saving is remarkable.

Computational Complexity Discussion
The computational complexity of the GAMP algorithm is dominated by four matrix-vector multipliers per iteration [9].The multipliers of FsGAMP coincide with GAMP: Lines 4, 6, 10 and 11.At each multiplication, the FsGAMP algorithm only computes M × |S| multipliers, while the GAMP algorithm needs M × N multipliers.
RrMpGAMP has two nested loops; the outer loop is RrMP; the inner loops are three FsGAMPs.Because the replacement/dropout operations in the RrMP algorithm are random, we cannot exactly calculate how many new indexes are chosen per iteration.However, if we roughly estimate the number of winners in these candidates to s/2 per iteration (usually, this number is greater than s/2 with the step in observation), the outer loop has J = K/(s/2) + 2s iterations.Furthermore, 2s can be removed, since it is a small quantity compared to K/(s/2) .Because the number of iterations of GAMP (as well as FsGAMP) is also uncertain, we simply use the maximal iteration number B as the worst setting.Firstly, taking no account of M and B, we have: where the first square brackets correspond to Lines 4-5 in Algorithm 1, and the second square brackets correspond to Line 9 in Algorithm 1.After summation and simplification, we have: Secondly, considering M, B, four matrix-vector multiplies per iteration, and expanding J, we have: For GAMP, the computational complexity can be estimated as O(4MNB), If 4K < N, the computational complexity of RrMpGAMP is less than GAMP.
Using the matching pursuit-based methods to solve a M-row n-column linear system, the computing time mainly depends on the least squares step, especially when the number of equations M is large and the number of variables n is not small.For example, the Householder-LS method needs O(2n 2 (M − n/3)) flops ( [16], Algorithm 5.3.2).Because OMP finds one column index at a time and n increases from one to K, the computational complexity of OMP using Householder-LS is: since K M. We compare two computational complexity results.For RrMpGAMP, it is O(16MBK); for OMP using Householder-LS, it is O(MK 3 ).If 16B < K 2 , the computational complexity of RrMpGAMP is less than Householder-LS OMP.

Convergence Discussion
We use the replica method [17,18] to asymptotically describe the evaluation of the logarithm of the partition function of the posterior distribution Equation (4).We concentrate on the Sum-Product FsGAMP algorithm in this discussion.The main advantage is that replica computations give a statistical physical meaning to the linear system.The replica method was used for compressed sensing in recent years [19][20][21][22]; we follow the research of [23].
We use the zero-mean Gaussian matrix assumption in the derivation of FsGAMP and replica analysis, because FsGAMP just works on the RrMP pursued support set, and the cardinality of it is much smaller than the cardinality of a sparse vector, i.e., |S| N. We know that the brief propagation on the factor graph tends to diverge if there are many cycles on it [14].When the number of cycles decreases, the brief propagation tends to converge.The factor graph form of compressed sensing is a bipart graph, and there are too many cycles on it.However, if we restrict the connections to the pursued variable nodes and sequentially construct the connections by some kind of pursuit strategy, such as RrMP pursuit, the number of cycles increases from zero to a relatively small integer, compared to the huge number of bipart connection cycles.These relatively small cycles and sequential pursuit method can be seen as a regularization to the non-zero-mean reality.This is our intuition.

The Replica Method Analysis of FsGAMP
The zero-mean Gaussian measurement noise is assumed an equivalent variance ∆ for every entry of , and generally, ∆ = ∆ 0 , where ∆ 0 is the true noise variance.This assumption means that the noise variance (inside ν z 0 (t) term), which is estimated by the FsGAMP algorithm, usually does not equal the true noise variance.The influence of their difference is shown in Equation (74).We rewrite the definition of the CS problem as follows: where s is the original signal, and we use the notation s to replace notation x in Equation ( 1) to avoid ambiguity in the following derivation and use µ to replace m in accordance with the replica computations' convention.We denote α = M/N with the number of measurements per variable.
In the asymptotic analysis, we are interested in the case of large system limitation N → ∞, while keeping signal density p x 0 (x 0 ) and measurement rate α of order one, where we let x 0 = s.It is also necessary to assume that the components of signal s and measurement y are also order one, such that we can consider the elements of sensing matrix A to be zero-mean.It is worth noting that the variance of the order of A's elements is O(1/K), not O(1/N), because we have pursued the indexes of non-zero elements.This makes it not necessary for the FsGAMP algorithm to summarize N entries in one row of A. In this way, the variance range of A's elements enlarges.Before the start of the computation, we briefly review the link between a Hamiltonian and linear system estimation problems (we know the form of compressed sensing is a linear system).The posterior distribution of a linear system in the Boltzmann form is: where p(x|θ) is the prior, θ are the model parameters, such as: Additionally, p(y|x, θ) is likelihood: Then, we get the Hamiltonian: Now, we derive the potential of Equation ( 21) under the pursued |S| column indexes' condition, as shown in Equation (4).The meaning of "potential" in inference theory equals the meaning of "Bethe free entropy" in statistical physics.For the simplification of notation, we ignore the uppercase letter S and just use the subscript, such as i, to indicate S i in Equation ( 4).The aim is to sample a vector x from the probability measure: Equation ( 26) can be seen as the Boltzmann measure on a disordered system with Hamiltonian: where the partition function Z is the normalization constant of the full posterior distribution Equation ( 26) and x is the configuration of the Hamiltonian.The thermodynamic properties of the disordered system are characterized by the average free entropy E A,s, (log Z), where: Because the expectation of the logarithm function is a hard problem, the average free entropy (also called the Bethe free entropy) can be evaluated via the replica trick as: where n independent replicas are introduced to reform the original disordered system to a new system (i.e., the name of the method) and E A,s, is the average over all of the sources of disorder in Equation (21).Each configuration of the new system is indexed by an n-tuple i = (i 1 , . . ., i n ), where each i a , a = 1, . . ., n has a continuous non-zero state value because i a corresponds to a non-zero element in x, which has been found by the matching pursuit process of the RrMP algorithm.Now, the problem of computing the free energy is converted into computing the n-th moment of the partition function Z.
[Z(A, s, )] n = 1 (2π∆) The average replicated partition function can be rearranged as: where a, b, . . ., n denote the replica indexes.Define: Equation ( 31) can be rewritten to: In order to compute X µ , firstly, we should define a variable: where x 0 i s i , to help the following derivation.By applying the central limit theorem to v a µ , since it is a sum of i.i.d.Gaussian terms of the sensing matrix A at a fixed signal s and configuration x, we conclude that v a µ obeys a joint Gaussian distribution.When the matrix A has i.i.d.elements with zero-mean and variance 1/K, we introduce the order parameters to summarize an amount of microscopic states of the replicas to four macroscopic states as follows: These parameters constitute a matrix, called the overlap matrix: Furthermore, according to the fact that both A and are zero-mean, we conclude the first two moments of the joint Gaussian distribution: where < s > 2 = s 2 p s (s)ds.The dependence on the measurement index µ of v µ and X µ is canceled due to the averaging; therefore, we note v µ = v and X µ = X for simplicity.In order to further calculate the expectation in Equation (33), we apply the replica symmetric ansatz: which is valid for the inference problem on a locally tree-like or highly dense factor graph under the prior matching condition.Fortunately, by the matching pursuit process in RrMP, we found the positions of non-zero elements in the signal, such that the condition has been satisfied.Based on this ansatz, Equation (39) equals: We want to compute: with a probability distribution: where the covariance matrix G of {v a } under the replica symmetric ansatz is given by: where 1 n is a n × n matrix with elements all equivalent to one and I n is an unitary diagonal matrix.Then, Equation (45) equals: The eigenvectors of G can be divided into two categories: one eigenvector of the form (1, 1, . . ., 1) with associated eigenvalue Q − q + n(q − 2m+ < s 2 > +∆ 0 ), and n − 1 eigenvectors of the form (0, . . ., 0, −1, 1, 0, . . ., 0) with eigenvalues Q − q, where the couple (−1, 1) shifts one by one.Now, we have: Now, back to Equation (33), we need to guarantee that the order parameters m, q, Q coincide with their definition Equation (35), and this guarantee can be obtained by Dirac functions: Since Dirac functions in the frequency field are the Fourier transform of const 1/2π in the time field, we have: (57) where j = √ −1.It is worth noting that ma , Qa and qab also satisfy the replica symmetric ansatz.Rewriting const one in the time field as the inverse Fourier transform of Dirac function 2πδ(•) in the frequency field and substituting the right-hand side of Equations ( 56)-( 58), we obtain: (60) and consider n replicas: Plugging Equation (62) into Equation (33), we obtain: We denote Γ as the integration in the {•} K .The exponential part of Γ has many cross terms of n replicas, such that we should decouple them by linearizing the exponent.According to the Hubbard-Stratonovich transform: with the Gaussian measure: and the square completion: we have: Using the replica symmetric ansatz again, we obtain: we have: such that: Now, we combine Equations ( 52) and (70) with Equation (63) and use the replica symmetric ansatz again: where: The integration Equation (71) is intractable, otherwise.We use the saddle point method to estimate this integration by taking the optimum of Φ( Q, Q, m, m, q, q).It is worth noting that the replica trick needs lim n→0 lim K→∞ (•), but the saddle point estimation needs lim K→∞ lim n→0 (•); furthermore, Equation ( 71) is obtained under the condition lim n→0 (•).Therefore, we assume that the limits can be exchanged, and the saddle point estimation of the integration Equation (71) should be done before we really let lim n→0 (•).These assumptions are not rigorous, but verified in the inference problem.Using the saddle point method and making derivatives for m, Q, q, respectively, we get: and: Using the saddle point method and making derivatives for m, Q, q, respectively, and substituting Equation (76) for the derivatives, we get: where the input end probability distribution is defined as: Then, the mean function is given by: and the variance function is given by: In fact, Equations ( 81) and (82) correspond to the results of the g in (•) function in Algorithm 2.

The Prior Matching Conditions and Nishimori Conditions
The replica symmetric ansatz means that all of the replicas belong to the same state configuration, such that every replica has the same statistical properties.This state configuration is given by the macroscopic order parameters m, Q and q.Based on the replica symmetric ansatz, m is a correlation between the original sparse signal s and its estimation x about all of the replicas; then, we have: Similarly, Q is a self-correlation of these replicas; then, we have: Another parameter q is the correlation between these replicas.As soon as the measurement and the original signal satisfy the correct reconstruction condition, such as the Restricted Isometry Property (RIP), the differences between these replicas are reflected in the measurement noise.Therefore, Notice that pinv is the matrix Moore-Penrose pseudoinverse based on an SVD decomposition.It is suitable for the case that the matrix has more rows than columns and is not of full rank; then, the overdetermined least squares problem does not have a unique solution.The implementation of OMP and CoSaMP comes from Needell's work [24]; BP comes from the software package 1 -MAGIC [25]; AMP comes from Kamilov's work [26]; and GAMP comes from gamplab [27].
We do Q = 60 trials in each experiment and use the relative mean square error (RMSE): 2 / x l 2 as the performance metric.Set M = 128, N = 256 in these experiments, except the last experiment.Define the sparsity-measurement ratio ρ K/M and choose K locations at random as the support of x.The amplitude of non-zero entry x i is independently drawn from N(x; 0, 1).The signal-to-noise ratios (SNR) are calculated by SNR = 10 log 10 ( Ax 2 /(Mν)).

More General Projection Matrix Cases
The third experiment investigates the reconstruction performance of various sparse projection matrices.Define the sparsity ratio of matrix A as η; set the range of η to [0.1, 0.2, . . ., 0.9]; and fix K = 20, SNR = 20 dB. Figure 3a shows that RrMpGAMP-L4 performs the same as GAMP when η > 0.2 and evidently better than the other four competitors.
For AMP and GAMP, although good performance can be achieved by zero-mean i.i.d.matrix A, it tends to drastically decline even for a small positive bias.The fourth experiment, termed the γ test, shows this phenomenon.The elements of A are independently drawn from a Gaussian distribution N(a; γ/N, 1/N), where the mean is controlled by a positive parameter γ.Set the range of γ to [1, 1.6, 1.8, 2, 2.2, 2.4, 3.4, 3.6, 3.8, 4, 5, 10, 20, 40, 60, 80, 100], and fix K = 20, SNR = 20 dB, Figure 3b shows that GAMP violently diverges at γ = 2; AMP diverges at γ = 4; but RrMpGAMP-L4 maintains good performance until γ = 40.Although OMP, CoSaMP and BP also work well, RrMpGAMP-L4 performs a little better than OMP and CoSaMP when γ < 40 and obviously better than BP.The fifth experiment, termed the α test, considers an even more troublesome setup for strong correlated A, and the elements of A are neither normal nor i.i.d.distributed.Fixing K = 12, SNR = 30 dB, the projection matrix is constructed by A = (1/N)PQ.The elements of P and Q i.i.d.obey Gaussian distributions p mr ∼ N(p; 0, 1) and q rn ∼ N(q; 0, 1), where P ∈ R M×R , Q ∈ R R×N and R αN .The variation of α changes the size of P and Q and changes the rank of matrix A. This means A is low rank for α < M/N.Set the range of α to [0.2, 0.3, . . ., 1.0].Since GAMP totally diverges (ξ ≈ 10 6 ) at every α, it is not visible in Figure 3c.This figure shows that RrMpGAMP-L4 keeps stable even in low rank scenario α ≈ 0.3, obviously better than AMP.

TomoSAR Imaging Application
The last experiment shows four TomoSAR imaging results.TomoSAR imaging is a spatial scatterer distribution reconstruction problem [28].SAR imaging algorithms can be categorized into two classes: finding a dense solution and finding a sparse solution.The former is based on Nyquist sampling theory and discrete Fourier transform, e.g., the Polar Format Algorithm (PFA) [29,30]; the latter is based on compressed sampling theory and a sparse-induced algorithm, such as OMP.Tomography in the SAR imaging is especially used in inferring forest structure from several acquisitions taken at different view angles.The images are not sparse in that case.However, there are some reasons to motivate a compressed sensing TomoSAR radar, e.g., a target in a wild dry lake bed or the scatters of a target are very sparse.We simply describe the TomoSAR imaging model.The two-dimensional spatial spectrum of the TomoSAR echo is given by: where f = 2π/λ is the spatial frequency, (x, y) is the position of a scatterer in the target coordinate, θ is the rotation angle between the radar coordinate and the target coordinate and g(x, y) is the scattering coefficient.Using summation to replace integration, the discrete variables are: spatial frequency f ∈ C P , where P is frequency sampling number; rotation angle θ ∈ C Q , where Q is the angle sampling number; scattering coefficient vector x ∈ C N , which is composed of g(x, y); where N ≥ P × Q is the pixel number of a TomoSAR image; received echo signal y ∈ C M ; and complex Gaussian noise ∈ C M , where M = P × Q.The linear system form of Equation ( 96) is the same as Equation ( 1).Define F(p, q, n) exp{−2j f p (x n cos θ q + y n sin θ q )}, and then: Because the scatterers are very sparse, it is feasible to randomly draw some rows from A to decrease P and Q, and to transform the TomoSAR imaging into a CS problem.
In this experiment, the TomoSAR echo data comes from a real-world crawler crane.P = 101 frequency samples are spaced drawn from the 1(GHz) bandwidth centered at carrier frequency f c = 9(GHz); Q = 101 angle samples are spaced drawn from 87.5 • ∼ 92.5 • centered at 90 • azimuth.The total pixels number is N = P × Q.We draw M = 0.5N rows from A randomly.Since the measurements number M is usually chosen to be O(K log N), K can be estimated roughly as τM/ log N; we set K = 50.Note that the elements of A and y are complex, so the prior of non-zero elements in x should be complex, as well.Use the PFA imaging result in Figure 4a as the reference, and compare OMP, CoSaMP and GAMP to RrMpGAMP-L6.Figure 4b shows that CoSaMP and GAMP cannot recover the TomoSAR image; OMP and RrMpGAMP-L6 work well, but RrMpGAMP-L6 performs a little better than OMP; see the reconstructed bottom margin.RrMpGAMP-L6 recovers a more continuous segment than OMP.

Conclusions
While the AMP/GAMP algorithm has been shown to be a very good approach for sparse signal recovery, it is also sensitive to problems that deviate its assumption.In this paper, we propose the stable RrMpGAMP algorithm, which matches GAMP's accuracy, performs better than AMP and GAMP, and remains robust to various projection matrices, with a small memory footprint.We use the replica method to analyze the FsGAMP algorithm, which is embedded in the RrMpGAMP algorithm, and find that enlarging the variance range of the elements of the sensing matrix does not break the convergence property of FsGAMP.We also obtain the convergence conditions of FsGAMP.Experiments confirm that our proposed algorithm performs very well in simulation and practical problems for which AMP and GAMP cannot be applied.In all cases, RrMpGAMP provides better performance as compared to BP, OMP and CoSaMP.Exact analysis of the RrMP algorithm remains an open problem for future work.

Figure 1 .
Figure 1.The factor graph description of a linear system.(a) Bipart factor graph; (b) Restricted factor graph.

Figure 2 .
Figure 2. Reconstruction performance comparison using the zero-mean Gaussian projection matrix.(a) Noise power test; (b) Sparsity range test.

Figure 3 .
Figure 3. Reconstruction performance comparison using three general form projection matrices.(a) Sparse matrix test; (b) Non-zero-mean matrix test; (c) Strong correlated matrix test.

Figure 4 .
Figure 4. TomoSAR imaging results comparison using four Compressed Sensing (CS) algorithms and Polar Format Algorithm (PFA) algorithm as a reference.(a) PFA reconstructed image; (b) CS reconstructed images.