Sparse Estimation Based on a New Random Regularized Matching Pursuit Generalized Approximate Message Passing Algorithm

Luo, Yongjie; Gui, Guan; Cong, Xunchao; Wan, Qun

doi:10.3390/e18060207

Open AccessArticle

Sparse Estimation Based on a New Random Regularized Matching Pursuit Generalized Approximate Message Passing Algorithm

by

Yongjie Luo

¹,

Guan Gui

^2,*

,

Xunchao Cong

¹ and

Qun Wan

¹

Department of Electronic Engineering, University of Electronic Science and Technology of China, No. 2006, Xiyuan Ave., West Hi-Tech Zone, Chengdu 611731, China

²

College of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, 66 Xinmofan Road, Nanjing 210003, China

^*

Author to whom correspondence should be addressed.

Entropy 2016, 18(6), 207; https://doi.org/10.3390/e18060207

Submission received: 14 February 2016 / Revised: 13 May 2016 / Accepted: 19 May 2016 / Published: 28 May 2016

(This article belongs to the Special Issue Information Theoretic Learning)

Download

Browse Figures

Versions Notes

Abstract

:

Approximate Message Passing (AMP) and Generalized AMP (GAMP) algorithms usually suffer from serious convergence issues when the elements of the sensing matrix do not exactly match the zero-mean Gaussian assumption. To stabilize AMP/GAMP in these contexts, we have proposed a new sparse reconstruction algorithm, termed the Random regularized Matching pursuit GAMP (RrMpGAMP). It utilizes a random splitting support operation and some dropout/replacement support operations to make the matching pursuit steps regularized and uses a new GAMP-like algorithm to estimate the non-zero elements in a sparse vector. Moreover, our proposed algorithm can save much memory, be equipped with a comparable computational complexity as GAMP and support parallel computing in some steps. We have analyzed the convergence of this GAMP-like algorithm by the replica method and provided the convergence conditions of it. The analysis also gives an explanation about the broader variance range of the elements of the sensing matrix for this GAMP-like algorithm. Experiments using simulation data and real-world synthetic aperture radar tomography (TomoSAR) data show that our method provides the expected performance for scenarios where AMP/GAMP diverges.

Keywords:

compressed sensing; random regularization; matching pursuit; generalized approximate message passing; replica method

1. Introduction

Compressed Sensing (CS) has been a research focus in recent years. The idea of sparse estimation and compressed sensing has also been applied in adaptive filtering [1,2,3,4] and nonlinear dynamics systems [5,6]. Define the support of an unknown K-sparse

(K ≪ N)

vector

x \in C^{N}

as:

S = {j : j \in {1, \dots, N}, x_{j} \neq 0}

. We consider an under-determined noisy linear system:

y = A x + ϵ

(1)

where

y \in C^{M}

is the measurement vector,

ϵ \in C^{M}

is the additive Gaussian noise vector and

A \in C^{M \times N}

(M < N)

is the projection matrix, also termed the sensing matrix. The aim of CS is to reconstruct

x

from

y

and

A

.

On the one hand, it is well known that the CS problem can be rephrased into a probabilistic inference problem on the bipart factor graph shown in Figure 1a. Donoho et al. proposed the Approximate Message Passing (AMP) algorithm [7,8] to solve it. Rangan et al. generalized AMP to adapt to an arbitrary sparse prior and arbitrary measurement noise; their work is called the Generalized Approximate Message Passing (GAMP) algorithm [9]. Since the derivation of AMP/GAMP is based on a zero-mean Gaussian projection matrix hypothesis, they cannot work well if the projection matrix does not satisfy the hypothesis, as shown in Section 4. This phenomenon motivates us to improve the AMP/GAMP algorithm for a more general projection matrix. On the other hand, although

ℓ_{0}

methods, such as Orthogonal Matching Pursuit (OMP) [10] and Compressive Sampling Matching Pursuit (CoSaMP) [11], are simple and effective, they use Least Squares (LS) to calculate the amplitude of non-zero elements. Therefore, these methods have to face the matrix inversion problem. When the scale of the projection matrix is small, this problem is not serious. However, when the scale becomes larger, the multipliers grow cubically as a function of sparsity K. This drawback stimulates us to find some approach to replace the LS step in

ℓ_{0}

methods and to overcome the disadvantages of OMP and CoSaMP.

In this paper, we propose a new sparse estimation algorithm, termed Random regularized Matching pursuit Generalized Approximate Message Passing (RrMpGAMP), to solve the CS problem. Our method can be regarded as a kind of

ℓ_{0}

optimization algorithm, termed Random regularized Matching Pursuit (RrMP), but the mean and variance estimation of x are the output of a restricted GAMP algorithm, termed the Fixed support GAMP (FsGAMP). In this way, it allows a more general projection matrix and does not calculate the inverse matrix directly. The computational complexity of RrMpGAMP is comparable to GAMP, and the memory occupation of RrMpGAMP is much less than GAMP and other LS-based algorithms.

The rest of this paper is organized as follows. We describe the RrMpGAMP algorithm in Section 2, then analyze the convergence of FsGAMP by the replica method in Section 3 and then perform experiments in Section 4. We draw a conclusion in the last Section 5. The derivation of the FsGAMP algorithm is provided in the Appendix.

2. Random Regularized Matching Pursuit Generalized Approximate Message Passing

As mentioned in the Introduction, the RrMpGAMP algorithm includes two parts: the RrMP algorithm and the FsGAMP algorithm, which is embedded in RrMP. The former is a new

ℓ_{0}

optimization algorithm; the latter is a tiny revised GAMP algorithm.

At the beginning, we assume the support S has been known, which means the probability of every zero element in

x

is one. Therefore, we just need to consider the probability of other non-zero elements in

x

. We suppose that the Probability Distribution Function (PDF) of

x

and the likelihood function of

z = Ax

are separable, i.e.,

\begin{matrix} p_{x} (x) & = 1 \times p_{x_{S}} (x_{S}) = \prod_{j = 1}^{K} p_{x_{S_{j}}} (x_{S_{j}}) \end{matrix}

(2)

\begin{matrix} p_{y | z} (y | z) & = \prod_{m = 1}^{M} p_{y_{m} | z_{m}} (y_{m} | z_{m}) \end{matrix}

(3)

Therefore, the probabilistic form of Equation (1) can be written as:

\begin{matrix} p_{x | y} (x | y) & = 1 \times p_{x_{S} | y} (x_{S} | y) \propto p_{y | x_{S}} (y | x_{S}) p_{x_{S}} (x_{S}) \\ = \prod_{m = 1}^{M} p_{y_{m} | z_{m}} (y_{m} | a_{m S_{k}} x_{S_{k}} + \sum_{j \neq k}^{K} a_{m S_{j}} x_{S_{j}}) \prod_{j = 1}^{K} p_{x_{S_{j}}} (x_{S_{j}}) \end{matrix}

(4)

Actually we do not know S in the first place; a straightforward idea is to find it by some matching pursuit-based method.

2.1. Random Regularized Matching Pursuit Algorithm

A traditional matching pursuit algorithm (MP or OMP) finds an index of the non-zero element of

x

at each iteration. It does not rollback even if an incorrect index has been chosen. Our proposed algorithm sequentially pursues multiple indexes at each iteration and has the ability to undo, like the well-known Regularized OMP (ROMP) algorithm and the CoSaMP algorithm.

We use the subscript l to denote left, r to denote right and

| U |

to designate the cardinality of temporary support U. Define three functions appearing in the algorithm:

FixSuppGAMP (y, A, U)

calculates the mean and variance estimation of

x

;

R_{2 s \ s} (v)

finds the indexes of the

2 s < K

absolutely largest entries of

v

, where s is the probe length, and then shuffles these indexes and bisects them; the outputs are the left and right indexes’ subset and the amplitudes supported on them, respectively; support updating function

H (u, c, Λ, S)

generates a new indexes’ set by a group of rules; we will explain it soon.

The steps of RrMP are listed in Algorithm 1.

Algorithm 1 Random Regularized Matching Pursuit.
Input: $y, A, K, s$
Output: $x^{n + 1}$
1:	$S^{0} \leftarrow \emptyset, x^{0} \leftarrow 0$
2:	$(Λ_{l}, c_{l}, Λ_{r}, c_{r}) \leftarrow R_{2 s \ s} (A^{T} (y - A x^{n}))$
3:	$U_{l} \leftarrow S^{n} \cup Λ_{l}$ , $U_{r} \leftarrow S^{n} \cup Λ_{r}$
4:	$u_{l} \leftarrow FixSuppGAMP (y, A, U_{l})$
5:	$u_{r} \leftarrow FixSuppGAMP (y, A, U_{r})$
6:	$θ = {arg min}_{{l, r}} {∥ y - A u_{l} ∥_{2}, ∥ y - A u_{r} ∥_{2}}$
7:	$Λ^{n + 1} \leftarrow Λ_{θ}, c^{n + 1} \leftarrow c_{θ}, u^{n + 1} \leftarrow u_{θ}, U^{n + 1} \leftarrow U_{θ}$
8:	$S^{n + 1} \leftarrow H (u^{n + 1}, c^{n + 1}, Λ^{n + 1}, S^{n})$
9:	$x^{n + 1} \leftarrow FixSuppGAMP (y, A, S^{n + 1})$

They are separated into two stages: the first stage is Lines 2–5, and the second stage is Lines 6–9. At the first stage, Line 2 obtains two index candidates,

Λ_{l}

and

Λ_{r}

, and the correlation coefficients between the residual and columns of

A

,

c_{l}

and

c_{r}

. It is a key step, since the shuffle and bisecting operations are equivalent to a random search in N dimensional

{0, 1}

space. Line 3 merges the previous support set

S^{n}

with two current candidates, respectively. Lines 4 and 5 roughly estimate

x

by the FsGAMP algorithm. At this stage, the candidate indexes are not replaced/dropped. Therefore, the number of possible incorrect indexes is typically greater than that at the second stage. This is the meaning of “roughly”. At the second stage, Lines 6 and 7 choose the minimal residual branch as a new candidate. This prune operation can be regarded as a regularization since the algorithm chooses a small residual branch and discards a big residual branch. In another words, the algorithm discards half of the candidate indexes pursued by Line 2. Line 8 updates the support set. Line 9 estimates

x

again. Note that Lines 3–5 can be calculated by parallel computing.

We define the precision τ as the ratio of the residual norm over the measurement norm, and set the error bound to

0.01

, which means

99.99 %

of the measurement energy has been recovered. When

| P |

equals sparsity K, if the precision does not reach the error bound, set

S^{n + 1} = P

, keep running RrMP at most

2 s

times and then stop the algorithm.

The idea of the support updating function comes from two observations: (i) not all indexes in the candidate set are correct, so the algorithm should drop or replace a part of the indexes, as shown in Equations (7) and (8); the dropout/replacement operations can be regarded as a regularization; (ii) the candidate indexes corresponding to correlations

c^{n + 1}

and to amplitudes

u^{n + 1}

are not overlapping entirely; some correct indexes belong to the former, and some belong to the latter. It is necessary to merge them, as shown in Equation (9).

Now, we define the support updating function

H (u^{n + 1}, c^{n + 1}, Λ^{n + 1}, S^{n})

as follows. Firstly, define two threshold functions corresponding to the sparse representation coefficients

u

, which are evolved with iterations:

\begin{matrix} a & ≜ min_{i \in S^{n}} {{| (u^{n + 1}) |}_{i}} \end{matrix}

(5)

\begin{matrix} b & ≜ max_{j \in Λ^{n + 1}} {{| (u^{n + 1}) |}_{j}} \end{matrix}

(6)

Secondly, define the indexes’ updating procedure just for

u

:

P ≜ \{\begin{matrix} S^{n} \cup {j : {arg max}_{j \in Λ^{n + 1}} {| {(u^{n + 1})}_{j} |}}, & \frac{a}{2} > b \\ S^{n} \cup {j : j \in Λ^{n + 1}, | {(u^{n + 1})}_{j} | \geq \frac{a}{2}}, & a > b \geq \frac{a}{2} \\ S^{n} \cup {j : j \in Λ^{n + 1}, | {(u^{n + 1})}_{j} | \geq \frac{b}{2}}, & b > a \geq \frac{b}{2} \\ \{i : i \in S^{n}, | {(u^{n + 1})}_{i} | \geq \frac{b}{2}\} \cup \{j \in Λ^{n + 1}, | {(u^{n + 1})}_{j} | \geq \frac{b}{2}\}, & \frac{b}{2} \geq a \end{matrix}

(7)

Thirdly, define the the indexes’ updating procedure just for the correlation coefficients

c

:

Q ≜ {j : j \in Λ^{n + 1}, | {(c^{n + 1})}_{j} | \geq 0.5 ∥ c^{n + 1} ∥_{\infty}}

(8)

Lastly, we merge the two index sets to obtain the output of:

S^{n + 1} = P \cup Q

(9)

These update rules are inspired by the regularized OMP algorithm [12,13]. It seems a bit complex, but the basic rationale is simple. We segment the absolute amplitudes on the pursued support at each iteration to ladders; the height of each ladder is dominated by the minimum of the absolute amplitudes, which is supported on the previous pursued support set

S^{n}

, and by the maximum of absolute amplitudes, which is supported on the current candidate support set

Λ^{n + 1}

. Since the square of the amplitude is energy, the absolute amplitudes in the same ladder step can be explained to obey a similar energy level. For Equation (7),

If $\frac{a}{2} > b$ , this means that the absolute amplitudes of the current candidate indexes are much smaller than the absolute amplitudes of the previous pursued indexes, so the algorithm chooses only one current candidate index. This situation corresponds to a sharp ladder step.
If $a > b \geq \frac{a}{2}$ , this means the absolute amplitudes of the current candidate indexes are just a little smaller than the absolute amplitudes of the previous pursued indexes, so the algorithm chooses those indexes whose absolute amplitudes are greater than $a / 2$ . This situation corresponds to a flattened ladder step.
If $b > a \geq \frac{b}{2}$ , this means the absolute amplitudes of the current candidate indexes are greater than the absolute amplitudes of the previous pursued indexes, but the minimum absolute amplitude of the previous pursued indexes is not a very small value, so the algorithm chooses those indexes whose absolute amplitudes are greater than $b / 2$ . This situation also corresponds to a flattened ladder step.
If $\frac{b}{2} > a$ , that means the minimum absolute amplitude of the previous pursued indexes is very small, so the algorithm drops a part of the previous pursued indexes and adds a part of the current candidate indexes. This situation corresponds to a sharp ladder step.

However, it is not enough to choose indexes just by absolute amplitudes. Some correlations about the residual and the columns of

A

are significant. The algorithm finds these indexes by Line 2 in Algorithm 1. Maybe, the absolute amplitudes supporting them are not significant, such that they are excluded by Equation (7). We can imagine these indexes as free electrons that are escaping to other energy levels. Therefore, the algorithm should retrieve them. This is the meaning of Equation (8). The rules are heuristic, the explanations are analogies to physics.

2.2. Fixed Support GAMP Algorithm

The concept of message passing on a factor graph can be found in [14]. In contrast to AMP/GAMP, the FsGAMP algorithm does not reform Equation (1) to a bipart graph. It restricts the edges to a part of variable nodes, which correspond to the temporary support U, as shown in Figure 1b. Based on this factor graph and the posterior probability Equation (4), firstly, we define the messages from the factor node to the variable node. For the Max-Sum rule, they are:

\begin{matrix} Δ_{f_{m} \to x_{U_{k}}} (t, x_{U_{k}}) & ≜ max_{\begin{matrix} x_{U} \end{matrix}} [log p_{y_{m} | z_{m}} (y_{m} | a_{m U_{k}} x_{U_{k}} + \sum_{j \neq k}^{| U |} a_{m U_{j}} x_{U_{j}}) \\ (10) & + \sum_{j \neq k}^{| U |} Δ_{f_{m} \leftarrow x_{U_{j}}} (t, x_{U_{j}})] + c \end{matrix}

where

c

means constant ; for the Sum-Product rule, they are:

\begin{matrix} Δ_{f_{m} \to x_{U_{k}}} (t, x_{U_{k}}) & ≜ log [\int_{{x_{U_{j}}}_{j \neq k}} p_{y_{m} | z_{m}} (y_{m} | a_{m U_{k}} x_{U_{k}} + \sum_{j \neq k}^{| U |} a_{m U_{j}} x_{U_{j}}) \\ (11) & \times \prod_{j \neq k}^{| U |} exp (Δ_{f_{m} \leftarrow x_{U_{j}}} (t, x_{U_{j}}))] + c \end{matrix}

Secondly, we define the messages from the variable node to the factor node. The form of the Max-Sum rule is the same as the form of the Sum-Product rule:

Δ_{f_{m} \leftarrow x_{U_{k}}} (t + 1, x_{U_{k}}) ≜ log p_{x_{U_{k}}} (x_{U_{k}}) + \sum_{i \neq m}^{M} Δ_{f_{i} \to x_{U_{k}}} (t, x_{U_{k}}) + c

(12)

The derivation of FsGAMP is very similar with GAMP [9]. Due to space limitations, we only emphasize the difference in the main body and left the details to the Appendix. Since

| U |

non-zero elements of

x

have been pursued, we set other elements of

x

to zero directly, such that the

\sum_{j \neq k}^{N} a_{m j} x_{j}

and

\sum_{j \neq k}^{N} {| a_{m j} |}^{2} x_{j}

terms in GAMP reduce to

\sum_{j \neq k}^{| U |} a_{m U_{j}} x_{U_{j}}

and

\sum_{j \neq k}^{| U |} {| a_{m U_{j}} |}^{2} x_{U_{j}}

. Other derivation steps are the same as GAMP. It is worth noting that the derivation of GAMP is primarily based on the Central-Limit Theorem (CLT), but in FsGAMP, this condition is canceled, because we have known the index set of zero elements (though maybe incorrect), such that the mean and variance estimation of these elements are also zeros. This simplification eliminates uncertainty and brings stability to FsGAMP to adapt more types of projection matrices.

Rangan et al. [9,15] investigated the influence of damping in GAMP. They found that damping can induce convergence. Similar steps are used in the FsGAMP algorithm:

\begin{matrix} ν^{p} (t) & = δ ν^{p} (t) + (1 - δ) ν^{p} (t - 1) \end{matrix}

(13)

\begin{matrix} \hat{s} (t) & = δ \hat{s} (t) + (1 - δ) \hat{s} (t - 1) \end{matrix}

(14)

\begin{matrix} ν^{s} (t) & = δ ν^{s} (t) + (1 - δ) ν^{s} (t - 1) \end{matrix}

(15)

\begin{matrix} \hat{x} (t) & = δ \hat{x} (t) + (1 - δ) \hat{x} (t - 1) \end{matrix}

(16)

The steps of FsGAMP are listed in Algorithm 2. Notation t indicates the iteration number;

\hat{x}

and

ν^{x}

denote the mean and variance of

x

; the element-wise product and division are denoted ⊙ and ⊘. Function

g_{out} (\cdot)

and

g_{in} (\cdot)

calculate the Bayesian estimation of

z

and

x

, respectively. Please refer to [9] (Table 1) and the Appendix to find their specific forms. The prior

p_{x} (x_{S_{j}})

can take a Gaussian, Laplacian or spike-and-slab distribution.

Algorithm 2 Fix Support GAMP.
Input: $y; A; U$
Output: $\hat{x}; ν^{x}$
1:	$\hat{x} \leftarrow 0^{N \times 1}; ν^{x} \leftarrow 0^{N \times 1}; \hat{s} (t = 0) \leftarrow 0^{M \times 1}$
2:	${\hat{x}}_{U} (t = 1) \leftarrow 0^{\| U \| \times 1}; ν^{x_{U}} (t = 1) \leftarrow 0^{\| U \| \times 1}$
3:	while $s t o p =$ false do
4:	$ν^{p} (t) \leftarrow {\| A_{:, U} \|}^{2} ν^{x_{U}} (t)$
5:	$\hat{z} (t) \leftarrow A_{:, U} {\hat{x}}_{U} (t)$
6:	$\hat{p} (t) \leftarrow \hat{z} (t) - \hat{s} (t - 1) ⊙ ν^{p} (t)$
7:	$[{\hat{z}}_{0} (t), ν^{z_{0}} (t)] \leftarrow g_{out} (\hat{p} (t), ν^{p} (t))$
8:	$\hat{s} (t) \leftarrow (1 ⊘ ν^{p} (t)) ⊙ ({\hat{z}}_{0} (t) - \hat{p} (t))$
9:	$ν^{s} (t) \leftarrow (1 ⊘ ν^{p} (t)) ⊙ (1 - ν^{z_{0} (t)} ⊘ ν^{p} (t))$
10:	$ν^{r_{U}} (t) \leftarrow 1 ⊘ ({({\| A_{:, U} \|}^{2})}^{T} ν^{s} (t))$
11:	${\hat{r}}_{U} (t) \leftarrow {\hat{x}}_{U} (t) + ν^{r_{U}} (t) ⊙ (A_{:, U}^{T} \hat{s} (t))$
12:	$[{\hat{x}}_{U} (t + 1), ν^{x_{U}} (t + 1)] \leftarrow g_{in} ({\hat{r}}_{U} (t), ν^{r_{U}} (t))$
13:	{damping steps}
14:	if $∥ {\hat{x}}_{U} (t + 1) - {\hat{x}}_{U} {(t) ∥}_{2} / {∥ {\hat{x}}_{U} (t) ∥}_{2} \leq δ$ then
15:	$\hat{x} (U) \leftarrow {\hat{x}}_{U} (t + 1), ν^{x} (U) \leftarrow ν^{x_{U}} (t + 1)$
16:	$s t o p \leftarrow$ TRUE
17:	end if
18:	end while

GAMP calculates all elements of

x

in the loop, while FsGAMP estimates at most

| U |

non-zero elements of

x

in the loop. Since

| U |

increases gradually and

| U | \leq K

, the memory footprint rate of FsGAMP is not more than

K / N

. Because

K ≪ N

, the memory saving is remarkable.

2.3. Computational Complexity Discussion

The computational complexity of the GAMP algorithm is dominated by four matrix-vector multipliers per iteration [9]. The multipliers of FsGAMP coincide with GAMP: Lines 4, 6, 10 and 11. At each multiplication, the FsGAMP algorithm only computes

M \times | S |

multipliers, while the GAMP algorithm needs

M \times N

multipliers.

RrMpGAMP has two nested loops; the outer loop is RrMP; the inner loops are three FsGAMPs. Because the replacement/dropout operations in the RrMP algorithm are random, we cannot exactly calculate how many new indexes are chosen per iteration. However, if we roughly estimate the number of winners in these candidates to

s / 2

per iteration (usually, this number is greater than

s / 2

with the step in observation), the outer loop has

J = ⌈ K / (s / 2) ⌉ + 2 s

iterations. Furthermore,

2 s

can be removed, since it is a small quantity compared to

⌈ K / (s / 2) ⌉

. Because the number of iterations of GAMP (as well as FsGAMP) is also uncertain, we simply use the maximal iteration number B as the worst setting. Firstly, taking no account of M and B, we have:

\begin{matrix} (2 s + \frac{s}{2}) + (2 (\frac{s}{2} + s) + \frac{s}{2}) + (2 (\frac{2 s}{2} + s) + \frac{3 s}{2}) + \dots + (2 (\frac{(J - 1) s}{2} + s) + \frac{J s}{2}) \\ = 2 [s + \frac{s}{2} + s + \frac{2 s}{2} + s + \dots + \frac{(J - 1) s}{2} + s] + (\frac{s}{2} + \frac{2 s}{2} + \frac{3 s}{2} + \dots + \frac{J s}{2}) \end{matrix}

(17)

where the first square brackets correspond to Lines 4–5 in Algorithm 1, and the second square brackets correspond to Line 9 in Algorithm 1. After summation and simplification, we have:

E q u a t i o n (17) = \frac{J s (3 J + 7)}{4}

(18)

Secondly, considering M, B, four matrix-vector multiplies per iteration, and expanding J, we have:

E q u a t i o n (18) \Rightarrow O (4 M B J^{2} s) = O (16 M K B)

(19)

For GAMP, the computational complexity can be estimated as

O (4 M N B)

, If

4 K < N

, the computational complexity of RrMpGAMP is less than GAMP.

Using the matching pursuit-based methods to solve a M-row n-column linear system, the computing time mainly depends on the least squares step, especially when the number of equations M is large and the number of variables n is not small. For example, the Householder-LS method needs

O (2 n^{2} (M - n / 3))

flops ([16], Algorithm 5.3.2). Because OMP finds one column index at a time and n increases from one to K, the computational complexity of OMP using Householder-LS is:

\begin{matrix} \sum_{n = 1}^{K} O (2 n^{2} (M - n / 3)) & = O (\sum_{n = 1}^{K} 2 M n^{2} - \sum_{n = 1}^{K} \frac{2}{3} n^{3}) \\ = O (2 M \frac{K (K + 1) (2 K + 1)}{6} - \frac{2}{3} {(\frac{K (K + 1)}{2})}^{2}) \\ (20) & = O (M K^{3}) \end{matrix}

since

K ≪ M

. We compare two computational complexity results. For RrMpGAMP, it is

O (16 M B K)

; for OMP using Householder-LS, it is

O (M K^{3})

. If

16 B < K^{2}

, the computational complexity of RrMpGAMP is less than Householder-LS OMP.

3. Convergence Discussion

We use the replica method [17,18] to asymptotically describe the evaluation of the logarithm of the partition function of the posterior distribution Equation (4). We concentrate on the Sum-Product FsGAMP algorithm in this discussion. The main advantage is that replica computations give a statistical physical meaning to the linear system. The replica method was used for compressed sensing in recent years [19,20,21,22]; we follow the research of [23].

We use the zero-mean Gaussian matrix assumption in the derivation of FsGAMP and replica analysis, because FsGAMP just works on the RrMP pursued support set, and the cardinality of it is much smaller than the cardinality of a sparse vector, i.e.,

| S | ≪ N

. We know that the brief propagation on the factor graph tends to diverge if there are many cycles on it [14]. When the number of cycles decreases, the brief propagation tends to converge. The factor graph form of compressed sensing is a bipart graph, and there are too many cycles on it. However, if we restrict the connections to the pursued variable nodes and sequentially construct the connections by some kind of pursuit strategy, such as RrMP pursuit, the number of cycles increases from zero to a relatively small integer, compared to the huge number of bipart connection cycles. These relatively small cycles and sequential pursuit method can be seen as a regularization to the non-zero-mean reality. This is our intuition.

3.1. The Replica Method Analysis of FsGAMP

The zero-mean Gaussian measurement noise is assumed an equivalent variance Δ for every entry of

ϵ

, and generally,

Δ \neq Δ_{0}

, where

Δ_{0}

is the true noise variance. This assumption means that the noise variance (inside

ν^{z_{0}} (t)

term), which is estimated by the FsGAMP algorithm, usually does not equal the true noise variance. The influence of their difference is shown in Equation (74). We rewrite the definition of the CS problem as follows:

y_{μ} = \sum_{i = 1}^{N} A_{μ i} s_{i} + ϵ_{μ} μ = 1, \dots, M

(21)

where

s

is the original signal, and we use the notation s to replace notation x in Equation (1) to avoid ambiguity in the following derivation and use μ to replace m in accordance with the replica computations’ convention. We denote

α = M / N

with the number of measurements per variable. In the asymptotic analysis, we are interested in the case of large system limitation

N \to \infty

, while keeping signal density

p_{x^{0}} (x^{0})

and measurement rate α of order one, where we let

x^{0} = s

. It is also necessary to assume that the components of signal

s

and measurement

y

are also order one, such that we can consider the elements of sensing matrix

A

to be zero-mean. It is worth noting that the variance of the order of

A

’s elements is

O (1 / K)

, not

O (1 / N)

, because we have pursued the indexes of non-zero elements. This makes it not necessary for the FsGAMP algorithm to summarize N entries in one row of

A

. In this way, the variance range of

A

’s elements enlarges.

Before the start of the computation, we briefly review the link between a Hamiltonian and linear system estimation problems (we know the form of compressed sensing is a linear system). The posterior distribution of a linear system in the Boltzmann form is:

\begin{matrix} p (x | y, θ) & = \frac{1}{Z (θ)} e^{- E (x | y, θ)} \\ (22) & \Rightarrow & E (x | y, θ) = - log (p (x | θ)) - log (p (y | x, θ)) \end{matrix}

where

p (x | θ)

is the prior,

θ

are the model parameters, such as:

p (x | θ) = \prod_{i = 1}^{N} [(1 - ρ) δ (x_{i}) + ρ N (x_{i} | \tilde{θ})]

(23)

Additionally,

p (y | x, θ)

is likelihood:

p (y | x, θ) = \prod_{μ = 1}^{M} N (y_{μ} | {(Ax)}_{μ}, θ)

(24)

Then, we get the Hamiltonian:

E (x | y, θ) = - \sum_{i = 1}^{N} log (p_{i} (x_{i})) + \frac{1}{Δ} \sum_{μ = 1}^{M} {(y_{μ} - {(Ax)}_{μ})}^{2} + \frac{M}{2} log (2 π Δ)

(25)

Now, we derive the potential of Equation (21) under the pursued

| S |

column indexes’ condition, as shown in Equation (4). The meaning of “potential” in inference theory equals the meaning of “Bethe free entropy” in statistical physics. For the simplification of notation, we ignore the uppercase letter S and just use the subscript, such as i, to indicate

S_{i}

in Equation (4). The aim is to sample a vector

x

from the probability measure:

\begin{matrix} p_{x | y} (x | y) \\ = & \frac{1}{Z} \prod_{i = 1}^{K} p_{x_{i}} (x_{i}) \prod_{μ = 1}^{M} \frac{1}{\sqrt{2 π Δ}} e^{- \frac{{[y_{μ} - \sum_{i = 1}^{K} A_{μ i} x_{i}]}^{2}}{2 Δ}} \\ = & \frac{1}{Z} \prod_{i = 1}^{K} p_{x_{i}} (x_{)} \prod_{μ = 1}^{M} \frac{1}{\sqrt{2 π Δ}} e^{- \frac{{[\sum_{i = 1}^{K} A_{μ i} (x_{i} - s_{i}) + ϵ_{μ}]}^{2}}{2 Δ}} \end{matrix}

Equation (26) can be seen as the Boltzmann measure on a disordered system with Hamiltonian:

H (x) = - \sum_{i = 1}^{K} log [p_{x_{i}} (x_{i})] + \frac{{[\sum_{i = 1}^{K} A_{μ i} (x_{i} - s_{i}) + ϵ_{μ}]}^{2}}{2 Δ}

(27)

where the partition function Z is the normalization constant of the full posterior distribution Equation (26) and

x

is the configuration of the Hamiltonian. The thermodynamic properties of the disordered system are characterized by the average free entropy

E_{A, s, ϵ} (log Z)

, where:

Z (A, s, ϵ) = \int [\prod_{i = 1}^{K} d x_{i} \prod_{i = 1}^{K} p_{x_{i}} (x_{i})] \prod_{μ = 1}^{M} \frac{1}{\sqrt{2 π Δ}} e^{- \frac{{[\sum_{i = 1}^{K} A_{μ i} (x_{i} - s_{i}) + ϵ_{μ}]}^{2}}{2 Δ}}

(28)

Because the expectation of the logarithm function is a hard problem, the average free entropy (also called the Bethe free entropy) can be evaluated via the replica trick as:

Φ \equiv lim_{K \to \infty} \frac{1}{K} E_{A, s, ϵ} {log Z} = lim_{K \to \infty} \frac{1}{K} lim_{n \to 0} \frac{E_{A, s, ϵ} {Z^{n}} - 1}{n}

(29)

where n independent replicas are introduced to reform the original disordered system to a new system (i.e., the name of the method) and

E_{A, s, ϵ}

is the average over all of the sources of disorder in Equation (21). Each configuration of the new system is indexed by an n-tuple

i = (i_{1}, \dots, i_{n})

, where each

i_{a}, a = 1, \dots, n

has a continuous non-zero state value because

i_{a}

corresponds to a non-zero element in

x

, which has been found by the matching pursuit process of the RrMP algorithm.

Now, the problem of computing the free energy is converted into computing the n-th moment of the partition function Z.

{[Z (A, s, ϵ)]}^{n} = \frac{1}{{(2 π Δ)}^{\frac{M n}{2}}} \int [\prod_{i, a} d x_{i}^{a} \prod_{i, a} p_{x_{i}^{a}} (x_{i}^{a})] \prod_{μ} e^{- \frac{\sum_{a = 1}^{n} {[\sum_{i = 1}^{K} A_{μ i} s_{i} + ϵ_{μ} - \sum_{i = 1}^{K} A_{μ i} x_{i}^{a}]}^{2}}{2 Δ}}

(30)

The average replicated partition function can be rearranged as:

E_{A, s, ϵ} {Z^{n}} = \frac{1}{{(2 π Δ)}^{\frac{M n}{2}}} \int [\prod_{i, a} d x_{i}^{a} \prod_{i, a} p_{x_{i}^{a}} (x_{i}^{a})] \prod_{μ} E_{A, s, ϵ} \{e^{- \frac{\sum_{a = 1}^{n} {[\sum_{i = 1}^{K} A_{μ i} s_{i} + ϵ_{μ} - \sum_{i = 1}^{K} A_{μ i} x_{i}^{a}]}^{2}}{2 Δ}}\}

(31)

where

a, b, \dots, n

denote the replica indexes. Define:

X_{μ} = E_{A, ϵ} \{e^{- \frac{\sum_{a = 1}^{n} {[\sum_{i = 1}^{K} A_{μ i} s_{i} + ϵ_{μ} - \sum_{i = 1}^{K} A_{μ i} x_{i}^{a}]}^{2}}{2 Δ}}\}

(32)

Equation (31) can be rewritten to:

E_{A, s, ϵ} {Z^{n}} = \frac{1}{{(2 π Δ)}^{\frac{M n}{2}}} E_{s} \{\int [\prod_{i, a} d x_{i}^{a} \prod_{i, a} p_{x_{i}^{a}} (x_{i}^{a})] \prod_{μ = 1}^{M} X_{μ}\}

(33)

In order to compute

X_{μ}

, firstly, we should define a variable:

v_{μ}^{a} = \sum_{i = 1}^{K} A_{μ i} (x_{i}^{0} - x_{i}^{a}) + ϵ_{μ}, a = 1, \dots, n

(34)

where

x_{i}^{0} ≜ s_{i}

, to help the following derivation. By applying the central limit theorem to

v_{μ}^{a}

, since it is a sum of i.i.d. Gaussian terms of the sensing matrix

A

at a fixed signal

s

and configuration

x

, we conclude that

v_{μ}^{a}

obeys a joint Gaussian distribution. When the matrix

A

has i.i.d. elements with zero-mean and variance

1 / K

, we introduce the order parameters to summarize an amount of microscopic states of the replicas to four macroscopic states as follows:

\begin{matrix} u & = \frac{1}{K} \sum_{i = 1}^{K} {(s_{i})}^{2} \end{matrix}

(35)

\begin{matrix} m^{a} & = \frac{1}{K} \sum_{i = 1}^{K} x_{i}^{a} s_{i}, a = 1, \dots, n \end{matrix}

(36)

\begin{matrix} Q^{a} & = \frac{1}{K} \sum_{i = 1}^{K} {(x_{i}^{a})}^{2}, a = 1, \dots, n \end{matrix}

(37)

\begin{matrix} q^{a b} & = \frac{1}{K} \sum_{i = 1}^{K} x_{i}^{a} x_{i}^{b}, a < b, b = 2, \dots, n \end{matrix}

(38)

These parameters constitute a matrix, called the overlap matrix:

Q = (\begin{matrix} u & m^{1} & m^{2} & \dots & m^{n} \\ m^{1} & Q^{1} & q^{12} & \dots & q^{1 n} \\ m^{2} & q^{12} & Q^{2} & ⋱ & q^{2 n} \\ ⋮ & ⋮ & ⋱ & ⋱ & q^{(n - 1) n} \\ m^{n} & q^{1 n} & \dots & q^{(n - 1) n} & Q^{n} \end{matrix})

(39)

Furthermore, according to the fact that both

A

and

ϵ

are zero-mean, we conclude the first two moments of the joint Gaussian distribution:

\begin{matrix} E_{A, ϵ} {v_{μ}^{a}} & = 0 \end{matrix}

(40)

\begin{matrix} E_{A, ϵ} {{(v_{μ}^{a})}^{2}} & = E_{A, ϵ} \{\sum_{i} A_{μ i}^{2} {(x_{i}^{0} - x_{i}^{a})}^{2}\} + Δ_{0} \\ = \frac{1}{K} \sum_{i} {(x_{i}^{0} - x_{i}^{a})}^{2} + Δ_{0} \\ (41) & = Q^{a} - 2 m^{a} + < s^{2} > + Δ_{0} \end{matrix}

\begin{matrix} E_{A, ϵ} {v_{μ}^{a} v_{μ}^{b}} & = E_{A, ϵ} \{\sum_{i} A_{μ i}^{2} (x_{i}^{0} - x_{i}^{a}) (x_{i}^{0} - x_{i}^{b})\} + Δ_{0} \\ (42) & = q^{a b} - (m^{a} + m^{b}) + < s^{2} > + Δ_{0} \end{matrix}

where

< s >^{2} = \int s^{2} p_{s} (s) d s

. The dependence on the measurement index μ of

v_{μ}

and

X_{μ}

is canceled due to the averaging; therefore, we note

v_{μ} = v

and

X_{μ} = X

for simplicity. In order to further calculate the expectation in Equation (33), we apply the replica symmetric ansatz:

m^{a} = m, \forall a, Q^{a} = Q, \forall a, q^{a b} = q, \forall (a, b : a \neq b)

(43)

which is valid for the inference problem on a locally tree-like or highly dense factor graph under the prior matching condition. Fortunately, by the matching pursuit process in RrMP, we found the positions of non-zero elements in the signal, such that the condition has been satisfied. Based on this ansatz, Equation (39) equals:

Q = (\begin{matrix} u & m & m & \dots & m \\ m & Q & q & \dots & q \\ m & q & Q & ⋱ & q \\ ⋮ & ⋮ & ⋱ & ⋱ & q \\ m & q & \dots & q & Q \end{matrix})

(44)

We want to compute:

X = E_{v} \{e^{- (1 / 2 Δ) \sum_{a = 1}^{n} {(v^{a})}^{2}}\}

(45)

with a probability distribution:

p (v) = \frac{1}{\sqrt{{(2 π)}^{n} \det (G)}} e^{- (1 / 2) \sum_{a, b} v^{a} {(G^{- 1})}_{a b} v^{b}}

(46)

where the covariance matrix

G

of

{v^{a}}

under the replica symmetric ansatz is given by:

\begin{matrix} G_{a a} & = E_{v} {v^{a} v^{a}} = Q - 2 m + < s^{2} > + Δ_{0} \end{matrix}

(47)

\begin{matrix} G_{a b} & = E_{v} {v^{a} v^{b}} = q - 2 m + < s^{2} > + Δ_{0} \end{matrix}

(48)

such that:

G = (< s^{2} > - 2 m + q + Δ_{0}) 1_{n} + (Q - q) I_{n}

(49)

where

1_{n}

is a

n \times n

matrix with elements all equivalent to one and

I_{n}

is an unitary diagonal matrix.

Then, Equation (45) equals:

\begin{matrix} X & = \frac{1}{\sqrt{{(2 π)}^{n} \det (G)}} \int (d v) e^{- \frac{1}{2} v^{T} (G^{- 1} + I_{n} / Δ) v} \\ (50) & = \frac{1}{\sqrt{\det (I_{n} + G / Δ)}} \end{matrix}

The eigenvectors of

G

can be divided into two categories: one eigenvector of the form

(1, 1, \dots, 1)

with associated eigenvalue

Q - q + n (q - 2 m + < s^{2} > + Δ_{0})

, and

n - 1

eigenvectors of the form

(0, \dots, 0, - 1, 1, 0, \dots, 0)

with eigenvalues

Q - q

, where the couple

(- 1, 1)

shifts one by one. Now, we have:

\det (I_{n} + \frac{G}{Δ}) = \frac{1 + \frac{1}{Δ} (Q - q + n (< s^{2} > - 2 m + q + Δ_{0}))}{{(1 + \frac{Q - q}{Δ})}^{1 - n}}

(51)

such that:

lim_{n \to 0} X = e^{- \frac{n}{2} [\frac{< s^{2} > - 2 m + q + Δ_{0}}{Q - q + Δ} + log (\frac{Q - q + Δ}{Δ})]}

(52)

Now, back to Equation (33), we need to guarantee that the order parameters

m, q, Q

coincide with their definition Equation (35), and this guarantee can be obtained by Dirac functions:

\begin{matrix} δ (\sum_{i = 1}^{K} x_{i}^{a} s_{i} - K m^{a}), & a = 1, \dots, n \end{matrix}

(53)

\begin{matrix} δ (\sum_{i = 1}^{K} {(x_{i}^{a})}^{2} - K Q^{a}), & a = 1, \dots, n \end{matrix}

(54)

\begin{matrix} δ (\sum_{i = 1}^{K} x_{i}^{a} x_{i}^{b} - K q^{a b}), & a < b, b = 2, \dots, n \end{matrix}

(55)

Since Dirac functions in the frequency field are the Fourier transform of const

1 / 2 π

in the time field, we have:

\begin{matrix} δ (\sum_{i = 1}^{K} x_{i}^{a} s_{i} - K m^{a}) & = \int d {\tilde{m}}^{a} \frac{1}{2 π} e^{j {\tilde{m}}^{a} [K m^{a} - \sum_{i = 1}^{K} x_{i}^{a} s_{i}]} \end{matrix}

(56)

\begin{matrix} δ (\sum_{i = 1}^{K} {(x_{i}^{a})}^{2} - K Q^{a}) & = \int d {\tilde{Q}}^{a} \frac{1}{2 π} e^{j {\tilde{Q}}^{a} [K Q^{a} - \sum_{i = 1}^{K} {(x_{i})}^{2}]} \end{matrix}

(57)

\begin{matrix} δ (\sum_{i = 1}^{K} x_{i}^{a} x_{i}^{b} - K q^{a b}) & = \int d {\tilde{q}}^{a b} \frac{1}{2 π} e^{j {\tilde{q}}^{a b} [K q^{a b} - \sum_{i = 1}^{K} x_{i}^{a} x_{i}^{b}]} \end{matrix}

(58)

where

j = \sqrt{- 1}

. It is worth noting that

{\tilde{m}}^{a}

,

{\tilde{Q}}^{a}

and

{\tilde{q}}^{a b}

also satisfy the replica symmetric ansatz. Rewriting const one in the time field as the inverse Fourier transform of Dirac function

2 π δ (\cdot)

in the frequency field and substituting the right-hand side of Equations (56)–(58), we obtain:

\begin{matrix} 1 & = \int d m^{a} [\int d {\tilde{m}}^{a} e^{- {\tilde{m}}^{a} [K m^{a} - \sum_{i = 1}^{K} x_{i}^{a} s_{i}]}] \end{matrix}

(59)

\begin{matrix} 1 & = \int d Q^{a} [\int d {\tilde{Q}}^{a} e^{- {\tilde{Q}}^{a} [\frac{K}{2} Q^{a} - \frac{K}{2} \sum_{i = 1}^{K} {(x_{i}^{a})}^{2}]}] \end{matrix}

(60)

\begin{matrix} 1 & = \int d q^{a b} [\int d {\tilde{q}}^{a b} e^{- {\tilde{q}}^{a} [K q^{a b} - \sum_{i = 1}^{K} x_{i}^{a} x_{i}^{b}]}] \end{matrix}

(61)

and consider n replicas:

\begin{matrix} 1 & = \int [\prod_{a}^{n} d {\tilde{Q}}^{a} d Q^{a} d {\tilde{m}}^{a} d m^{a}] [\int \prod_{b, a < b}^{n, (n - 1) / 2} d {\tilde{q}}^{a b} d q^{a b}] \\ (62) & \times exp \{\sum_{a}^{n} {\tilde{m}}^{a} (K m^{a} - \sum_{i}^{K} x_{i}^{a} x^{0}) + \sum_{a}^{n} {\tilde{Q}}^{a} (\frac{K}{2} Q^{a} - \frac{1}{2} \sum_{i}^{K} {(x_{i}^{a})}^{2}) - \sum_{b, a \neq b}^{n, (n - 1)} {\tilde{q}}^{a b} (K q^{a b} - \sum_{i}^{K} x_{i}^{a} x_{i}^{b})\} \end{matrix}

Plugging Equation (62) into Equation (33), we obtain:

\begin{matrix} E_{A, s, ϵ} {Z^{n}} \\ = \frac{1}{{(2 π Δ)}^{\frac{M n}{2}}} \int [\prod_{a}^{n} d {\tilde{Q}}^{a} d Q^{a} d {\tilde{m}}^{a} d m^{a}] \\ [\int \prod_{b, a < b}^{n, (n - 1) / 2} d {\tilde{q}}^{a b} d q^{a b} e^{K [\frac{1}{2} \sum_{a} {\tilde{Q}}^{a} Q^{a} - \frac{1}{2} \sum_{b, a \neq b}^{n, (n - 1)} {\tilde{q}}^{a b} q^{a b} - \sum_{a} {\tilde{m}}^{a} m^{a}]} \prod_{μ}^{M} X] \\ (63) & \times {\{\int d x^{0} p_{x^{0}} (x^{0}) \prod_{a}^{n} d x^{a} p_{x^{a}} (x^{a}) e^{- \frac{1}{2} \sum_{a}^{n} {\tilde{Q}}^{a} {(x^{a})}^{2} + \frac{1}{2} \sum_{b, a \neq b}^{n, (n - 1)} {\tilde{q}}^{a b} x^{a} x^{b} + \sum_{a}^{n} {\tilde{m}}^{a} x^{a} x^{0}}\}}^{K} \end{matrix}

We denote Γ as the integration in the

{\cdot}^{K}

. The exponential part of Γ has many cross terms of n replicas, such that we should decouple them by linearizing the exponent. According to the Hubbard–Stratonovich transform:

e^{\frac{y}{2}} = \frac{1}{\sqrt{2 π}} \int_{R} e^{\pm z \sqrt{y} - \frac{z^{2}}{2}} d z ≜ \int_{R} e^{\pm z \sqrt{y}} D z

(64)

with the Gaussian measure:

D z = \frac{1}{\sqrt{2 π}} e^{- \frac{z^{2}}{2}} d z

(65)

and the square completion:

{(\sum_{a = 1}^{n} x^{a})}^{2} = 2 \sum_{b = a + 1}^{n} x^{a} x^{b} + \sum_{a = 1}^{n} {(x^{a})}^{2}

(66)

we have:

\begin{matrix} e^{\frac{\tilde{q} \sum_{b, a \neq b}^{n, (n - 1)} x^{a} x^{b}}{2}} & = \int e^{z \sqrt{\tilde{q}} (\sum_{a}^{n} x^{a}) - \frac{\tilde{q}}{2} \sum_{a}^{n} {(x^{a})}^{2}} D z \\ (67) & = \int e^{z \sqrt{\tilde{q}} (\sum_{a}^{n} x^{a})} e^{- \frac{\tilde{q}}{2} \sum_{a}^{n} {(x^{a})}^{2}} D z \end{matrix}

Using the replica symmetric ansatz again, we obtain:

Γ = \int d x^{0} p_{x^{0}} (x^{0}) \int D z {(\int d x p_{x} (x) e^{- \frac{1}{2} (\tilde{Q} + \tilde{q}) x^{2} + \tilde{m} x (x^{0}) + z \sqrt{\tilde{q}} x})}^{n}

(68)

Define

f (z, x^{0}) ≜ \int d x p_{x} (x) e^{- \frac{1}{2} (\tilde{Q} + \tilde{q}) x^{2} + \tilde{m} x (x^{0}) + z \sqrt{\tilde{q}} x}

; since

{lim}_{n \to 0} {(f (z, x^{0}))}^{n} = 1 + n log f (z, x^{0})

, we have:

\begin{matrix} lim_{n \to 0} \int {(f (z, x^{0}))}^{n} D z & = \int lim_{n \to 0} {(f (z, x^{0}))}^{n} D z \\ = 1 + n \int log f (z, x^{0}) D z \\ (69) & \approx e^{n \int log f (z, x^{0}) D z} \end{matrix}

such that:

\begin{matrix} lim_{n \to 0} Γ & \approx \int d x^{0} p_{x^{0}} (x^{0}) [1 + n \int log f (z, x^{0}) D z] \\ = \int p_{x^{0}} (x^{0}) d x^{0} + \int d x^{0} p_{x^{0}} (x^{0}) [n \int log f (z, x^{0}) D z] \\ = 1 + n \int [d x^{0} D z] p_{x^{0}} (x^{0}) log f (z, x^{0}) \\ (70) & \approx e^{n \int [d x^{0} D z] p_{x^{0}} (x^{0}) log f (z, x^{0})} \end{matrix}

Now, we combine Equations (52) and (70) with Equation (63) and use the replica symmetric ansatz again:

lim_{n \to 0} E_{A, s, ϵ} {Z^{n}} \approx \int d \tilde{Q} d Q d \tilde{m} d m d \tilde{q} d q e^{n K Φ (\tilde{Q}, Q, \tilde{m}, m, \tilde{q}, q)}

(71)

where:

\begin{matrix} Φ (\tilde{Q}, Q, \tilde{m}, m, \tilde{q}, q) \\ = \frac{1}{2} (\tilde{Q} Q - 2 \tilde{m} m + \tilde{q} q) \\ - \frac{1}{2} \frac{M}{K} (\frac{< s^{2} > - 2 m + q + Δ_{0}}{Q - q + Δ} + log (Q - q + Δ) - log Δ) \\ (72) & + \int [d x^{0} D z] p_{x^{0}} (x^{0}) log \int d x p_{x} (x) e^{- \frac{1}{2} (\tilde{Q} + \tilde{q}) x^{2} + \tilde{m} x (x^{0}) + z \sqrt{\tilde{q}} x} \end{matrix}

The integration Equation (71) is intractable, otherwise. We use the saddle point method to estimate this integration by taking the optimum of

Φ (\tilde{Q}, Q, \tilde{m}, m, \tilde{q}, q)

. It is worth noting that the replica trick needs

{lim}_{n \to 0} {lim}_{K \to \infty} (\cdot)

, but the saddle point estimation needs

{lim}_{K \to \infty} {lim}_{n \to 0} (\cdot)

; furthermore, Equation (71) is obtained under the condition

{lim}_{n \to 0} (\cdot)

. Therefore, we assume that the limits can be exchanged, and the saddle point estimation of the integration Equation (71) should be done before we really let

{lim}_{n \to 0} (\cdot)

. These assumptions are not rigorous, but verified in the inference problem. Using the saddle point method and making derivatives for m, Q, q, respectively, we get:

\begin{matrix} \frac{\partial Φ}{\partial m} & = 0 \Rightarrow \tilde{m} = \frac{M}{K} \frac{1}{Q - q + Δ} \end{matrix}

(73)

\begin{matrix} \frac{\partial Φ}{\partial Q} & = 0 \Rightarrow \tilde{Q} = \frac{M}{K} \frac{- < s^{2} > + 2 m - 2 q + Q + Δ - Δ_{0}}{{(Q - q + Δ)}^{2}} \end{matrix}

(74)

\begin{matrix} \frac{\partial Φ}{\partial q} & = 0 \Rightarrow \tilde{q} = \frac{M}{K} \frac{< s^{2} > - 2 m + q + Δ_{0}}{{(Q - q + Δ)}^{2}} \end{matrix}

(75)

and:

\tilde{m} = \tilde{Q} + \tilde{q}

(76)

Using the saddle point method and making derivatives for

\tilde{m}

,

\tilde{Q}

,

\tilde{q}

, respectively, and substituting Equation (76) for the derivatives, we get:

\begin{matrix} \frac{\partial Φ}{\partial \tilde{m}} & = 0 \Rightarrow m = \int d x^{0} [x^{0} p_{x^{0}} (x^{0}) \int g_{mean} (x^{0} + z \frac{\sqrt{\tilde{q}}}{\tilde{m}}, \frac{1}{\tilde{m}}) D z] \end{matrix}

(77)

\begin{matrix} \frac{\partial Φ}{\partial \tilde{q}} & = 0 \Rightarrow q = \int d x^{0} [p_{x^{0}} (x^{0}) \int {(g_{mean} (x^{0} + z \frac{\sqrt{\tilde{q}}}{\tilde{m}}, \frac{1}{\tilde{m}}))}^{2} D z] \end{matrix}

(78)

\begin{matrix} \frac{\partial Φ}{\partial \tilde{Q}} & = 0 \Rightarrow Q = \{\int d x^{0} [p_{x^{0}} (x^{0}) \int g_{var} (x^{0} + z \frac{\sqrt{\tilde{q}}}{\tilde{m}}, \frac{1}{\tilde{m}}) D z]\} + q \end{matrix}

(79)

where the input end probability distribution is defined as:

p_{x | r} (x | \hat{r}, ν^{r}) ≜ \frac{p_{x} (x) N (x; \hat{r}, ν^{r})}{\int p_{x} (x) N (x; \hat{r}, ν^{r}) d x}

(80)

Then, the mean function is given by:

g_{mean} (\hat{r}, ν^{r}) = \int x p_{x | r} (x | \hat{r}, ν^{r}) d x

(81)

and the variance function is given by:

g_{var} (\hat{r}, ν^{r}) = [\int x^{2} p_{x | r} (x | \hat{r}, ν^{r}) d x] - {(g_{mean} (\hat{r}, ν^{r}))}^{2}

(82)

In fact, Equations (81) and (82) correspond to the results of the

g_{in} (\cdot)

function in Algorithm 2.

3.2. The Prior Matching Conditions and Nishimori Conditions

The replica symmetric ansatz means that all of the replicas belong to the same state configuration, such that every replica has the same statistical properties. This state configuration is given by the macroscopic order parameters m, Q and q. Based on the replica symmetric ansatz, m is a correlation between the original sparse signal

s

and its estimation

x

about all of the replicas; then, we have:

m = E_{y} {s_{i} E_{x | y} {x_{i}}}

(83)

Similarly, Q is a self-correlation of these replicas; then, we have:

Q = E_{y} {E_{x | y} {x_{i}^{2}}} = E_{s} {s_{i}^{2}}

(84)

Another parameter q is the correlation between these replicas. As soon as the measurement and the original signal satisfy the correct reconstruction condition, such as the Restricted Isometry Property (RIP), the differences between these replicas are reflected in the measurement noise. Therefore, averaging the correlations with an infinitely large number of the replicas will eliminate fluctuations caused by the noise and retain the energy of the signals; then, we have:

q = E_{y} {E_{x | y} {x_{i}} E_{x | y} {x_{i}}} = E_{y} {s_{i} E_{x | y} {x_{i}}} = m

(85)

Now, we consider the average variance

V (t)

and the mean-squared error

E (t)

:

\begin{matrix} V (t) & ≜ \frac{1}{K} \sum_{i}^{K} g_{var} (\hat{r} (t), ν^{r} (t)) \end{matrix}

(86)

\begin{matrix} E (t) & ≜ \frac{1}{K} \sum_{i}^{K} {(g_{mean} (\hat{r} (t), ν^{r} (t)) - s_{i})}^{2} \end{matrix}

(87)

According to the derivation of belief propagation and Equation (86), the probability distribution of

x_{i}

at the

(t + 1)

-th iteration is:

p (x_{i}, t + 1) \approx \frac{1}{Z} p_{x_{i}} (x_{i}) e^{- \frac{M}{K} \frac{x_{i} - s_{i} - z \sqrt{E (t) + \frac{M}{K} Δ_{0}}}{2 (V (t) + Δ)}}

(88)

where Z is a normalization const. Then, the average variance

V (t + 1)

and the mean-squared error

E (t + 1)

are:

\begin{matrix} V (t + 1) & = \int d s D z p_{s} (s) g_{var} (s + z \sqrt{\frac{E (t) + Δ_{0}}{M / K}}, \frac{V (t) + Δ}{M / K}) \end{matrix}

(89)

\begin{matrix} E (t + 1) & = \int d s D z p_{s} (s) {[g_{mean} (s + z \sqrt{\frac{E (t) + Δ_{0}}{M / K}}, \frac{V (t) + Δ}{M / K}) - s]}^{2} \end{matrix}

(90)

Combining Equations (73), (75), (79), (89) and (90), and noticing that

s = x^{0}

, we get:

\begin{matrix} V (t + 1) & = Q - q \end{matrix}

(91)

\begin{matrix} E (t + 1) & = < s^{2} > - 2 m + q \end{matrix}

(92)

Furthermore, as soon as we assume:

Q = < s^{2} >,

(93)

and remember:

q = m,

(94)

and if the two conditions are satisfied at the fix point, then we have:

V (t + 1) = E (t + 1),

(95)

which means the average variance and the mean-squared error are equivalent. We call Equations (94) and (93) the prior matching conditions and Equations (95), (83), (84) and (85) the Nishimori conditions. When these conditions are satisfied, the convergence of the FsGAMP algorithm can be guaranteed.

4. Experiments

We compare multiple probe length RrMpGAMP, e.g., RrMpGAMP-L4, with five well-known algorithms: OMP, CoSaMP, basis pursuit (BP) with the interior point method to solve linear programming, AMP and GAMP, where the notation s in Section 2 is replaced with L to avoid confusion. We use MATLAB (Version R2014b)’s pinv function to implement the LS calculation in OMP and CoSaMP and modify CoSaMP by adding a new LS step before the prune step to improve the estimation accuracy. Notice that pinv is the matrix Moore–Penrose pseudoinverse based on an SVD decomposition. It is suitable for the case that the matrix has more rows than columns and is not of full rank; then, the overdetermined least squares problem does not have a unique solution. The implementation of OMP and CoSaMP comes from Needell’s work [24]; BP comes from the software package

ℓ_{1}

-MAGIC [25]; AMP comes from Kamilov’s work [26]; and GAMP comes from gamplab [27].

We do

Q = 60

trials in each experiment and use the relative mean square error (RMSE):

ξ = 1 / Q \times \sum_{l = 1}^{Q} ∥ {\hat{x}}^{l} - x^{l} ∥_{2} / {∥ x^{l} ∥}_{2}

as the performance metric. Set

M = 128, N = 256

in these experiments, except the last experiment. Define the sparsity-measurement ratio

ρ ≜ K / M

and choose K locations at random as the support of

x

. The amplitude of non-zero entry

x_{i}

is independently drawn from

N (x; 0, 1)

. The signal-to-noise ratios (SNR) are calculated by

S N R = 10 {log}_{10} {(∥ A x ∥}^{2} / (M ν))

.

4.1. Zero-Mean Gaussian Projection Matrix Cases

In the first two experiments, the elements of

A

are independently drawn from

N (a; 0, 1 / N)

, and the columns are normalized. The first experiment, termed the ν test, investigates the reconstruction performance versus noise power ν. Fixing

K = 20

, the range of ν is

[10^{- 6}, 10^{- 5}, 10^{- 4}, 5 \times 10^{- 4}, 10^{- 3}, 5 \times 10^{- 3}, 10^{- 2}]

. Figure 2a shows that four types

(L = 2 / 4 / 6 / 8)

of RrMpGAMP perform the same as GAMP, better than OMP and CoSaMP and much better than BP and AMP.

The second experiment, termed the sparsity test, investigates the reconstruction performance versus sparsity K. Fixing

S N R = 40

dB, the range of K is

[20, 25, 30, 35, 40, 45, 50]

. Figure 2b shows that four types

(L = 2 / 4 / 6 / 8)

of RrMpGAMP all perform the same as GAMP when

K \leq 45

or equally

ρ \leq 0.35

. Notice that CoSaMP degrades obviously when

K \geq 40

, and RrMpGAMP-L4 is the best one in the

(M = 128, N = 256)

setting, so we choose RrMpGAMP-L4 for the next three experiments.

4.2. More General Projection Matrix Cases

The third experiment investigates the reconstruction performance of various sparse projection matrices. Define the sparsity ratio of matrix

A

as η; set the range of η to

[0.1, 0.2, \dots, 0.9]

; and fix

K = 20

,

S N R = 20

dB. Figure 3a shows that RrMpGAMP-L4 performs the same as GAMP when

η > 0.2

and evidently better than the other four competitors.

For AMP and GAMP, although good performance can be achieved by zero-mean i.i.d. matrix

A

, it tends to drastically decline even for a small positive bias. The fourth experiment, termed the γ test, shows this phenomenon. The elements of

A

are independently drawn from a Gaussian distribution

N (a; γ / N, 1 / N)

, where the mean is controlled by a positive parameter γ. Set the range of γ to

[1, 1.6, 1.8, 2, 2.2, 2.4, 3.4, 3.6, 3.8, 4, 5, 10, 20, 40, 60, 80, 100]

, and fix

K = 20, S N R = 20

dB, Figure 3b shows that GAMP violently diverges at

γ = 2

; AMP diverges at

γ = 4

; but RrMpGAMP-L4 maintains good performance until

γ = 40

. Although OMP, CoSaMP and BP also work well, RrMpGAMP-L4 performs a little better than OMP and CoSaMP when

γ < 40

and obviously better than BP.

The fifth experiment, termed the α test, considers an even more troublesome setup for strong correlated

A

, and the elements of

A

are neither normal nor i.i.d. distributed. Fixing

K = 12

,

S N R = 30

dB, the projection matrix is constructed by

A = (1 / N) P Q

. The elements of

P

and

Q

i.i.d. obey Gaussian distributions

p_{m r} \sim N (p; 0, 1)

and

q_{r n} \sim N (q; 0, 1)

, where

P \in R^{M \times R}, Q \in R^{R \times N}

and

R ≜ ⌊ α N ⌋

. The variation of α changes the size of

P

and

Q

and changes the rank of matrix

A

. This means

A

is low rank for

α < M / N

. Set the range of α to

[0.2, 0.3, \dots, 1.0]

. Since GAMP totally diverges (

ξ \approx 10^{6}

) at every α, it is not visible in Figure 3c. This figure shows that RrMpGAMP-L4 keeps stable even in low rank scenario

α \approx 0.3

, obviously better than AMP.

4.3. TomoSAR Imaging Application

The last experiment shows four TomoSAR imaging results. TomoSAR imaging is a spatial scatterer distribution reconstruction problem [28]. SAR imaging algorithms can be categorized into two classes: finding a dense solution and finding a sparse solution. The former is based on Nyquist sampling theory and discrete Fourier transform, e.g., the Polar Format Algorithm (PFA) [29,30]; the latter is based on compressed sampling theory and a sparse-induced algorithm, such as OMP. Tomography in the SAR imaging is especially used in inferring forest structure from several acquisitions taken at different view angles. The images are not sparse in that case. However, there are some reasons to motivate a compressed sensing TomoSAR radar, e.g., a target in a wild dry lake bed or the scatters of a target are very sparse. We simply describe the TomoSAR imaging model. The two-dimensional spatial spectrum of the TomoSAR echo is given by:

Y (f, θ) = \int_{x} \int_{y} g (x, y) exp {- 2 j f (x cos θ + y sin θ)}

(96)

where

f = 2 π / λ

is the spatial frequency,

(x, y)

is the position of a scatterer in the target coordinate, θ is the rotation angle between the radar coordinate and the target coordinate and

g (x, y)

is the scattering coefficient. Using summation to replace integration, the discrete variables are: spatial frequency

f \in C^{P}

, where P is frequency sampling number; rotation angle

θ \in C^{Q}

, where Q is the angle sampling number; scattering coefficient vector

x \in C^{N}

, which is composed of

g (x, y)

; where

N \geq P \times Q

is the pixel number of a TomoSAR image; received echo signal

y \in C^{M}

; and complex Gaussian noise

ϵ \in C^{M}

, where

M = P \times Q

. The linear system form of Equation (96) is the same as Equation (1). Define

F (p, q, n) ≜ exp {- 2 j f_{p} (x_{n} cos θ_{q} + y_{n} sin θ_{q})}

, and then:

A = (\begin{matrix} F (1, 1, 1) & F (1, 1, 2) & \dots & F (1, 1, N) \\ \dots & \dots & \dots & \dots \\ F (P, 1, 1) & F (P, 1, 2) & \dots & F (P, 1, N) \\ \dots & \dots & \dots & \dots \\ F (1, Q, 1) & F (1, Q, 2) & \dots & F (1, Q, N) \\ \dots & \dots & \dots & \dots \\ F (P, Q, 1) & F (P, Q, 2) & \dots & F (P, Q, N) \end{matrix})

Because the scatterers are very sparse, it is feasible to randomly draw some rows from

A

to decrease P and Q, and to transform the TomoSAR imaging into a CS problem.

In this experiment, the TomoSAR echo data comes from a real-world crawler crane.

P = 101

frequency samples are spaced drawn from the

1 (GHz)

bandwidth centered at carrier frequency

f_{c} = 9 (GHz)

;

Q = 101

angle samples are spaced drawn from

87 . 5^{\circ} \sim 92 . 5^{\circ}

centered at

90^{\circ}

azimuth. The total pixels number is

N = P \times Q

. We draw

M = ⌊ 0.5 N ⌋

rows from

A

randomly. Since the measurements number M is usually chosen to be

O (K log N)

, K can be estimated roughly as

τ M / log N

; we set

K = 50

. Note that the elements of

A

and

y

are complex, so the prior of non-zero elements in

x

should be complex, as well.

Use the PFA imaging result in Figure 4a as the reference, and compare OMP, CoSaMP and GAMP to RrMpGAMP-L6. Figure 4b shows that CoSaMP and GAMP cannot recover the TomoSAR image; OMP and RrMpGAMP-L6 work well, but RrMpGAMP-L6 performs a little better than OMP; see the reconstructed bottom margin. RrMpGAMP-L6 recovers a more continuous segment than OMP.

5. Conclusions

While the AMP/GAMP algorithm has been shown to be a very good approach for sparse signal recovery, it is also sensitive to problems that deviate from its assumption. In this paper, we propose the stable RrMpGAMP algorithm, which matches GAMP’s accuracy, performs better than AMP and GAMP, and remains robust to various projection matrices, with a small memory footprint. We use the replica method to analyze the FsGAMP algorithm, which is embedded in the RrMpGAMP algorithm, and find that enlarging the variance range of the elements of the sensing matrix does not break the convergence property of FsGAMP. We also obtain the convergence conditions of FsGAMP. Experiments confirm that our proposed algorithm performs very well in simulation and practical problems for which AMP and GAMP cannot be applied. In all cases, RrMpGAMP provides better performance as compared to BP, OMP and CoSaMP. Exact analysis of the RrMP algorithm remains an open problem for future work.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 61401501 and 61401069.

Author Contributions

Qun Wan conceived of the TomoSAR imaging experiment and provided the original data. Xunchao Cong and Yongjie Luo performed this experiment. Guan Gui designed the first two experiments and analyzed the data with Yongjie Luo. Qun Wan suggested the idea of combine RrMP and GAMP algorithms. Yongjie Luo raised the idea of RrMP. Yongjie Luo and Guan Gui raised the idea of FsGAMP, and analyzed the computational complexity; Yongjie Luo completed the replica computation, and designed the three non-zero mean sensing matrix experiments. Yongjie Luo and Guan Gui wrote the paper, Xunchao Cong and Qun Wan checked the manuscript and contributed to the rearrangement of the materials. All of the authors read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix

We derive Sum-Product FsGAMP briefly. The message from the factor node to the variable node has been defined as Equation (11), and the message from the variable node to the factor node has been defined as Equation (12). For the simplicity of description, we rewrite the symbols as follows:

\begin{matrix} Δ_{m \to k} (t, x_{k}) & ≜ log [\int_{{x_{j}}_{j \neq k}} p_{y_{m} | z_{m}} (y_{m} | a_{m k} x_{k} + \sum_{j \neq k}^{| U |} a_{m j} x_{j}) \times \prod_{j \neq k}^{| U |} exp (Δ_{m \leftarrow j} (t, x_{j}))] + c \end{matrix}

(A1)

\begin{matrix} Δ_{m \leftarrow k} (t + 1, x_{k}) & ≜ log p_{x_{k}} (x_{k}) + \sum_{i \neq m}^{M} Δ_{i \to k} (t, x_{k}) + c \end{matrix}

(A2)

where

m : = f_{m}

,

k : = x_{U_{k}}

,

x_{k} : = x_{U_{k}}

,

x_{j} : = x_{U_{j}}

,

a_{m k} : = a_{m U_{k}}

,

a_{m j} : = a_{m U_{j}}

,

j : = x_{U_{j}}

,

p_{x_{k}} : = p_{x_{U_{k}}}

,

i : = f_{i}

,

T : = | U |

. Furthermore, we omit the superscripts

| U |

and M in the follow up of the derivation.

Appendix A1. Message from the Factor Node to the Variable Node

Define:

z_{m} ≜ a_{m k} x_{k} + \sum_{j \neq k} a_{m j} x_{j}

(A3)

For the GAMP algorithm, the Central-Limit Theorem (CLT) makes

z_{m}

conditionally Gaussian for large N, i.e.,

z_{m} | x_{k} \sim N (z_{m}; a_{m k} x_{k} + {\hat{p}}_{m k} (t), ν_{m k}^{p} (t))

(A4)

where:

\begin{matrix} {\hat{p}}_{m k} (t) & ≜ \sum_{j \neq k} a_{m j} {\hat{x}}_{m j} (t) \end{matrix}

(A5)

\begin{matrix} ν_{m k}^{p} (t) & ≜ \sum_{j \neq k} {| a_{m j} |}^{2} ν_{m j}^{x} (t) \end{matrix}

(A6)

In contrast to GAMP, FsGAMP not only works for a large N, but also works for a small N, because we embed FsGAMP into the RrMP algorithm, and the RrMP’s pursuit is sequential. Although CLT is not be satisfied, the RrMP’s pursuit means that all of the amplitudes that are supported on the pursued indexes are non-zero. As long as we assume that these amplitudes obey a Gaussian distribution independently, the linear weighted sum of these amplitudes also obeys a Gaussian distribution. Therefore, Equations (A4)–(A6) are still available. Now, we have:

Δ_{m \to k} (t, x_{k}) \approx log [\int_{z_{m}} p_{y_{m} | z_{m}} (y_{m} | z_{m}) N (z_{m}; a_{m k} x_{k} + {\hat{p}}_{m k} (t), ν_{m k}^{p} (t))] + c

(A7)

and we define:

H (a_{m k} x_{k} + {\hat{p}}_{m k} (t), y_{m}, ν_{m k}^{p} (t)) ≜ log [\int_{z_{m}} p_{y_{m} | z_{m}} (y_{m} | z_{m}) N (z_{m}; a_{m k} x_{k} + {\hat{p}}_{m k} (t), ν_{m k}^{p} (t))]

(A8)

In order to eliminate the dependence on the subscript m of the

H (\cdot, \cdot, \cdot)

function, define:

\begin{matrix} {\hat{p}}_{m} (t) & ≜ \sum_{j} a_{m j} {\hat{x}}_{m j} (t) \end{matrix}

(A9)

\begin{matrix} ν_{m}^{p} (t) & ≜ \sum_{j} {| a_{m j} |}^{2} ν_{m j}^{x} (t) \end{matrix}

(A10)

and then plug them into Equation (A7):

\begin{matrix} Δ_{m \to k} (t, x_{k}) & \approx H (a_{m k} (x_{k} - {\hat{x}}_{m k} (t)) + {\hat{p}}_{m} (t), y_{m}, ν_{m k}^{p} (t)) + c \end{matrix}

(A11)

\begin{matrix} = H (a_{m k} (x_{k} - {\hat{x}}_{k} (t)) + {\hat{p}}_{m} (t) + O (1 / T), y_{m}, ν_{m}^{p} (t) + O (1 / T)) + c \end{matrix}

(A12)

where we assume

x_{k}

is

O (1)

,

a_{m k}

is

O (1 / \sqrt{T})

and i.i.d. drawn from a zero-mean distribution, such that

z_{m}

is

O (1)

. Based on these assumptions, other scales are defined for the sequential derivation. We list all the scales in Table A1.

Table A1. FsGAMP variable scales.

**Table A1.** FsGAMP variable scales.
Variable	Order	Variable	Order	Variable	Order
${\hat{x}}_{m k} (t)$	$O (1)$	$ν_{m k}^{x} (t)$	$O (1)$	${\hat{x}}_{m k} (t) - {\hat{x}}_{k} (t)$	$O (1 / \sqrt{T})$
${\hat{x}}_{k} (t)$	$O (1)$	$ν_{k}^{x} (t)$	$O (1)$	$ν_{m k}^{r} (t) - ν_{k}^{r} (t)$	$O (1 / T)$
${\hat{r}}_{m k} (t)$	$O (1)$	$ν_{m k}^{r} (t)$	$O (1)$	${\hat{p}}_{m k} (t) - {\hat{p}}_{m} (t)$	$O (1 / \sqrt{T})$
${\hat{r}}_{k} (t)$	$O (1)$	$ν_{k}^{r} (t)$	$O (1)$	$ν_{m k}^{p} (t) - ν_{m}^{p} (t)$	$O (1 / T)$
${\hat{z}}_{m} (t)$	$O (1)$	$ν_{m}^{z} (t)$	$O (1)$	$a_{m k} ({\hat{x}}_{k} (t) - {\hat{x}}_{m k} (t))$	$O (1 / T)$
${\hat{p}}_{m} (t)$	$O (1)$	$ν_{m}^{p} (t)$	$O (1)$
${\hat{s}}_{m} (t)$	$O (1)$	$ν_{m}^{s} (t)$	$O (1)$

It is worth noting that the assumption of

x_{k}

and

a_{m k}

can be adjusted to other orders, leading to other derivation forms.

Applying the second order Taylor-series expansion to Equation (A12), dropping higher order terms and dropping

O (1 / T)

perturbations of the

H^{'} (\cdot, \cdot, \cdot)

and

H^{''} (\cdot, \cdot, \cdot)

, since

{\hat{p}}_{m} (t)

and

ν_{m}^{p} (t)

are

O (1)

, we get:

\begin{matrix} Δ_{m \to k} (t, x_{k}) & \approx H ({\hat{p}}_{m}, y_{m}, ν_{m}^{p} (t)) \\ + a_{m k} (x_{k} - {\hat{x}}_{k} (t)) H^{'} ({\hat{p}}_{m} (t), y_{m}, ν_{m}^{p} (t)) \\ (A13) & + \frac{1}{2} {| a_{m k} |}^{2} {(x_{k} - {\hat{x}}_{k} (t))}^{2} H^{''} ({\hat{p}}_{m} (t), y_{m}, ν_{m}^{p} (t)) + c \end{matrix}

where

H^{'} (\cdot, \cdot, \cdot)

and

H^{''} (\cdot, \cdot, \cdot)

are the first derivative and second derivative of function

H (\cdot, \cdot, \cdot)

with respect to the first argument.

Define:

\begin{matrix} {\hat{s}}_{m} (t) & ≜ H^{'} ({\hat{p}}_{m} (t), y_{m}, ν_{m}^{p} (t)) \end{matrix}

(A14)

\begin{matrix} ν_{m}^{s} (t) & ≜ H^{''} ({\hat{p}}_{m} (t), y_{m}, ν_{m}^{p} (t)) \end{matrix}

(A15)

and notice that

H ({\hat{p}}_{m} (t), y_{m}, ν_{m}^{p} (t))

is a constant w.r.t. variable

x_{k}

, such that it can be dropped; then, Equation (A13) can be rewritten to:

Δ_{m \to k} (t, x_{k}) \approx [{\hat{s}}_{m} (t) a_{m k} + ν_{m}^{s} (t) {| a_{m k} |}^{2} {\hat{x}}_{k} (t)] x_{k} - \frac{1}{2} ν_{m}^{s} (t) {| a_{m k} |}^{2} x_{k}^{2} + c

(A16)

It can be seen that Equation (A16) is a quadratic form, which means the PDF function

\frac{1}{Z} exp (Δ_{m \to k} (t, x_{k}))

can be approximated as a Gaussian distribution.

Now, simplify Equations (A14) and (A15) further. Denote:

H (\hat{p}, y, ν^{p}) ≜ log [\int_{z} p_{y | z} (y | z) N (z; \hat{p}, ν^{p})]

(A17)

and then:

\begin{matrix} H^{'} (\hat{p}, y, ν^{p}) & = \frac{\partial}{\partial \hat{p}} log \int_{z} p_{y | z} (y | z) \frac{1}{\sqrt{2 π ν^{p}}} e^{- \frac{{(z - \hat{p})}^{2}}{2 ν^{p}}} \\ = \frac{\partial}{\partial \hat{p}} \{- \frac{{\hat{p}}^{2}}{2 ν^{p}} + log \int_{z} e^{(log p_{y | z} (y | z)) - \frac{z^{2}}{2 ν^{p}} + \frac{\hat{p} z}{ν^{p}}}\} \\ = - \frac{\hat{p}}{ν^{p}} + \int_{z} \frac{z}{ν^{p}} \frac{e^{(log p_{y | z} (y | z)) - \frac{z^{2}}{2 ν^{p}} + \frac{\hat{p} z}{ν^{p}}}}{Z (\hat{p})} \frac{1}{ν^{p}} \\ (A18) & = - \frac{\hat{p}}{ν^{p}} + \frac{1}{ν^{p}} \int_{z} z \frac{p_{y | z} (y | z) N (z; \hat{p}, ν^{p})}{\int_{u} p_{y | z} (y | u) N (u; \hat{p}, ν^{p})} \end{matrix}

\begin{matrix} = \frac{1}{ν^{p}} (E {z | y, \hat{p}, ν^{p}} - \hat{p}) \end{matrix}

(A19)

where the

\frac{p_{y | z} (y | z) N (z; \hat{p}, ν^{p})}{\int_{u} p_{y | z} (y | u) N (u; \hat{p}, ν^{p})}

term in Equation (A18) is a probability density function.

In the similar way, and using the property of variance

var {u} = E {u^{2}} - {(E {u})}^{2}

, we obtain:

H^{''} (\hat{p}, y, ν^{p}) = - \frac{1}{ν^{p}} (1 - \frac{var {z | y, \hat{p}, ν^{p}}}{ν^{p}})

(A20)

Recover the time index

(t)

and subscript m; we denote the first two moments of intermediate random variable

z_{m}

as:

\begin{matrix} {\hat{z}}_{m} (t) & ≜ E {z_{m} | y_{m}, {\hat{p}}_{m} (t), ν_{m}^{p} (t)} \end{matrix}

(A21)

\begin{matrix} ν_{m}^{z} (t) & ≜ var {z_{m} | y_{m}, {\hat{p}}_{m} (t), ν_{m}^{p} (t)} \end{matrix}

(A22)

They are the output function

g_{out} (\cdot, \cdot)

in Line 7 of Algorithm 2.

Appendix A2. Message from the Variable Node to the Factor Node

Plug Equation (A16) into Equation (A2); we have:

\begin{matrix} Δ_{m \leftarrow k} (t + 1, x_{k}) & = log p_{x_{k}} (x_{k}) + \sum_{i \neq m} Δ_{i \to k} (t, x_{k}) + c \\ \approx log p_{x_{k}} (x_{k}) + \sum_{i \neq m} [{\hat{s}}_{i} (t) a_{i k} + ν_{i}^{s} (t) {| a_{i k} |}^{2} {\hat{x}}_{k} (t)] x_{k} - \frac{1}{2} ν_{i}^{s} (t) {| a_{i k} |}^{2} x_{k}^{2} + c \\ (A23) & = log p_{x_{k}} (x_{k}) - \frac{1}{2 ν_{m k}^{r} (t)} {(x_{k} - {\hat{r}}_{m k} (t))}^{2} + c \end{matrix}

where:

\begin{matrix} ν_{m k}^{r} (t) & ≜ \frac{1}{\sum_{i \neq m} {| a_{i k} |}^{2} ν_{i}^{s} (t)} \end{matrix}

(A24)

\begin{matrix} {\hat{r}}_{m k} (t) & ≜ {\hat{x}}_{k} (t) + ν_{m k}^{r} (t) \sum_{i \neq m} a_{i k} {\hat{s}}_{i} (t) \end{matrix}

(A25)

they are

O (1)

quantities. Observe the form of Equation (A23); it implies a PDF function

\frac{p_{x} (x) N (x; {\hat{r}}_{m k} (t), ν_{m k}^{r} (t))}{\int_{u} p_{x} (u) N (u; {\hat{r}}_{m k} (t), ν_{m k}^{r} (t))}

. We can write the first moment of this PDF function:

{\hat{x}}_{m k} (t + 1) ≜ E {x_{k} | {\hat{r}}_{m k} (t), ν_{m k}^{r} (t)} \approx \int_{x} x \frac{p_{x} (x) N (x; {\hat{r}}_{m k} (t), ν_{m k}^{r} (t))}{\int_{u} p_{x} (u) N (u; {\hat{r}}_{m k} (t), ν_{m k}^{r} (t))}

(A26)

This form is very similar to the integration in Equation (A18). Therefore, if we define another function:

G (\hat{r}, ν^{r}) ≜ log \int_{x} p_{x} (x) N (x; \hat{r}, ν^{r})

(A27)

it is very similar to Equation (A8), and we can obtain:

\begin{matrix} G^{'} (\hat{r}, ν^{r}) & = \frac{1}{ν^{r}} (E {x | \hat{r}, ν^{r}} - \hat{r}) \end{matrix}

(A28)

\begin{matrix} G^{''} (\hat{r}, ν^{r}) & = \frac{1}{ν^{r}} (\frac{var {x | \hat{r}, ν^{r}}}{ν^{r}} - 1) \end{matrix}

(A29)

from the same derivation as Appendix A1.

Denote the form of Equation (A26) as:

g_{in} (\hat{r}, ν^{r}) ≜ \int_{x} x \frac{p_{x} (x) N (x; \hat{r}, ν^{r})}{\int_{u} p_{x} (u) N (u; \hat{r}, ν^{r})}

(A30)

then, from Equation (A28), we have:

g_{in} (\hat{r}, ν^{r}) = \hat{r} + ν^{r} G^{'} (\hat{r}, ν^{r})

(A31)

For Equation (A31), differentiate w.r.t.

\hat{r}

and plug in Equation (A29); we have:

g_{in}^{'} (\hat{r}, ν^{r}) = 1 + ν^{r} G^{''} (\hat{r}, ν^{r}) = \frac{1}{ν^{r}} var {x | \hat{r}, ν^{r}}

(A32)

Now, we get:

var {x | \hat{r}, ν^{r}} = ν^{r} g_{in}^{'} (\hat{r}, ν^{r})

(A33)

Recover the time index

(t)

and subscript

m k

, and rewrite Equation (A26) and (A33) as:

\begin{matrix} E {x_{k} | {\hat{r}}_{m k} (t), ν_{m k}^{r} (t)} & = {\hat{x}}_{m k} (t + 1) \approx g_{in} ({\hat{r}}_{m k} (t), ν_{m k}^{r} (t)) \end{matrix}

(A34)

\begin{matrix} var {x_{k} | {\hat{r}}_{m k} (t), ν_{m k}^{r} (t)} & = ν_{m k}^{x} (t + 1) \approx ν_{m k}^{r} (t) g_{in}^{'} ({\hat{r}}_{m k} (t), ν_{m k}^{r} (t)) \end{matrix}

(A35)

In order to eliminate the dependence on the subscript m of Equation (A34), we define:

\begin{matrix} ν_{k}^{r} (t) & ≜ \frac{1}{\sum_{i} {| a_{i k} |}^{2} ν_{i}^{s} (t)} \end{matrix}

(A36)

\begin{matrix} {\hat{r}}_{k} (t) & ≜ {\hat{x}}_{k} (t) + ν_{k}^{r} (t) \sum_{i} a_{i k} {\hat{s}}_{i} (t) \end{matrix}

(A37)

and assume

ν_{m k}^{r} (t) \approx ν_{k}^{r} (t)

; using the first order Taylor-series expansion, we have:

\begin{matrix} {\hat{x}}_{m k} (t + 1) & \approx g_{in} ({\hat{r}}_{m k} (t), ν_{m k}^{r} (t)) \\ \approx g_{in} ({\hat{r}}_{k} (t) - a_{m k} {\hat{s}}_{m} (t) ν_{k}^{r} (t), ν_{k}^{r} (t)) \\ (A38) & \approx g_{in} ({\hat{r}}_{k} (t), ν_{k}^{r} (t)) - a_{m k} {\hat{s}}_{m} (t) ν_{k}^{r} (t) g_{in}^{'} ({\hat{r}}_{k} (t), ν_{k}^{r} (t)) \end{matrix}

For simplicity, define:

\begin{matrix} {\hat{x}}_{k} (t + 1) & ≜ g_{in} ({\hat{r}}_{k} (t), ν_{k}^{r} (t)) \end{matrix}

(A39)

\begin{matrix} ν_{k}^{x} (t + 1) & ≜ ν_{k}^{r} (t) g_{in}^{'} ({\hat{r}}_{k} (t), ν_{k}^{r} (t)) \end{matrix}

(A40)

then, Equation (A26) is

{\hat{x}}_{m k} (t + 1) \approx {\hat{x}}_{k} (t + 1) - a_{m k} {\hat{s}}_{m} (t) ν_{k}^{x} (t + 1)

.

Finally,

{\hat{p}}_{m} (t)

in Equation (A9) and

ν_{m}^{p} (t)

in Equation (A10) can be rewritten as:

{\hat{p}}_{m} (t + 1) = \sum_{j} a_{m j} {\hat{x}}_{m j} (t + 1) \approx \sum_{j} a_{m j} {\hat{x}}_{j} (t + 1) - {\hat{s}}_{m} (t) \sum_{j} {| a_{m j} |}^{2} ν_{j}^{x} (t + 1)

(A41)

and define

ν_{j}^{p} (t + 1) ≅ \sum_{j} {| a_{m j} |}^{2} ν_{j}^{x} (t + 1)

. We close the loop.

References

Wu, Z.; Peng, S. Proportionate Minimum Error Entropy Algorithm for Sparse System Identification. Entropy 2015, 17, 5995–6006. [Google Scholar] [CrossRef]
Ma, W.; Qu, H. Maximum correntropy criterion based sparse adaptive filtering algorithms for robust channel estimation under non-Gaussian environments. J. Frankl. Inst. 2015, 352, 2708–2727. [Google Scholar] [CrossRef]
Ma, W.; Chen, B. Sparse least logarithmic absolute difference algorithm with correntropy induced metric penality. Circuit Syst. Signal Process. 2015, 35, 1077–1089. [Google Scholar] [CrossRef]
Chen, B.; Zhao, S. Online efficient learning with quantized KLMS and L1 regularization. In Proceedings of the International Joint Conference on Neural Networks, (IJCNN 2012), Brisbane, Australia, 10–15 June 2012; pp. 1–6.
Wang, W.X.; Yang, R.; Lai, Y.C.; Kovanis, V.; Grebogi, C. Predicting Catastrophes in Nonlinear Dynamical Systems by Compressive Sensing. Phys. Rev. Lett. 2011, 106. [Google Scholar] [CrossRef] [PubMed]
Brunton, S.L.; Tu, J.H.; Bright, I.; Kutz, J.N. Compressive Sensing and Low-Rank Libraries for Classification of Bifurcation Regimes in Nonlinear Dynamical Systems. SIAM J. Appl. Dyn. Syst. 2014, 13, 1716–1732. [Google Scholar] [CrossRef]
Donoho, D.; Maleki, A.; Motanari, A. Message Passing algorithms for compressed sensing. Proc. Natl. Acad. Sci. USA 2009, 106, 18914–18919. [Google Scholar] [CrossRef] [PubMed]
Donoho, D.; Maleki, A.; Montanari, A. Message passing algorithms for compressed sensing: I. motivation and construction. In Proceedings of the 2010 IEEE Information Theory Workshop (ITW), Cairo, Egypt, 6–8 January 2010; pp. 1–5.
Rangan, S. Generalized approximate message passing for estimation with random linear mixing. In Proceedings of the 2011 IEEE International Symposium on Information Theory Proceedings (ISIT), St. Petersburg, Russia, 31 July–5 August 2011; pp. 2168–2172.
Pati, Y.; Rezaiifar, R.; Krishnaprasad, P. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In Proceedings of the 1993 Conference Record of the Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 1–3 November 1993; Volume 1, pp. 40–44.
Needell, D.; Tropp, J.A. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmonic Anal. 2009, 26, 301–321. [Google Scholar] [CrossRef]
Needell, D.; Vershynin, R. Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit. Found. Comput. Math. 2009, 9, 317–334. [Google Scholar] [CrossRef]
Needell, D.; Vershynin, R. Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit. IEEE J. Sel. Top. Signal Process. 2010, 4, 310–316. [Google Scholar] [CrossRef]
Wainwright, M.J.; Jordan, M.I. Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 2008, 1, 1–305. [Google Scholar] [CrossRef]
Rangan, S.; Schniter, P.; Fletcher, A. On the convergence of approximate message passing with arbitrary matrices. In Proceedings of the 2014 IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA, 29 June–4 July 2014; pp. 236–240.
Golub, G.H.; van Loan, C.F. Matrix Computations, 3rd ed.; The Johns Hopkins University Press: Baltimore, MD, USA, 2012. [Google Scholar]
Ḿezard, M.; Parisi, G.; Virasoro, M.A. Spin-Glass Theory and Beyond. In World Scientific Lecture Notes in Physics; World Scientific: Singapore, Singapore, 1987; Volume 9. [Google Scholar]
Ḿezard, M.; Montanari, A. Information, Physics, and Computation; Oxford University Press: Oxford, UK, 2009. [Google Scholar]
Rangan, S.; Goyal, V.; Fletcher, A.K. Asymptotic analysis of map estimation via the replica method and compressed sensing. IEEE Trans. Inform. Theory 2012, 58, 1902–1923. [Google Scholar] [CrossRef] [Green Version]
Kabashima, Y.; Wadayama, T.; Tanaka, T. Statistical mechanical analysis of a typical reconstruction limit of compressed sensing. In Proceedings of the 2010 IEEE International Symposium on Information Theory (ISIT), Austin, TX, USA, 13–18 June 2010; pp. 1533–1537.
Ganguli, S.; Sompolinsky, H. Statistical mechanics of compressed sensing. Phys. Rev. Lett. 2010, 104. [Google Scholar] [CrossRef] [PubMed]
Solomon, A.T.; Bruhtesfa, E.G. Compressed Sensing Performance Analysis via Replica Method Using Bayesian framework. In Proceedings of the 17th UKSIM-AMSS International Conference on Modelling and Simulationm, Cambridge, UK, 25–27 March 2015; pp. 281–289.
Krzakala, F.; Mézard, M.; Sausset, F.; Sun, Y.; Zdeborová, L. Probabilistic reconstruction in compressed sensing: Algorithms, phase diagrams, and threshold achieving matrices. J. Stat. Mech. Theory Exp. 2012, 2012. [Google Scholar] [CrossRef]
Needell, D. Available online: http://www.cmc.edu/pages/faculty/DNeedell (accessed on 23 May 2016).
ℓ₁-MAGIC. Available online: http://users.ece.gatech.edu/ justin/l1magic (accessed on 23 May 2016).
Kamilov, U.S. Available online: http://www.ukamilov.com (accessed on 23 May 2016).
GAMP. Available online: http://gampmatlab.wikia.com/wiki/Generalized_Approximate_Message_Passing (accessed on 23 May 2016).
Zhu, X.X.; Bamler, R. Superresolving SAR Tomography for Multidimensional Imaging of Urban Areas: Compressive Sensing-Based TomoSAR inversion. IEEE Signal Process. Mag. 2014, 31, 51–58. [Google Scholar] [CrossRef]
Carrara, W.G.; Goodman, R.S.; Majewski, R.M. Spotlight Synthetic Aperture Radar: Signal Processing Algorithms; Artech House: Norwood, MA, USA, 1995. [Google Scholar]
Cong, X.; Liu, J.; Long, K.; Liu, Y.; Zhu, R.; Wan, Q. Millimeter-wave spotlight circular synthetic aperture radar (SCSAR) imaging for Foreign Object Debris on airport runway. In Proceedings of the 12th International Conference on Signal Processing (ICSP), HangZhou, China, 26–30 October 2014; pp. 1968–1972.

Figure 1. The factor graph description of a linear system. (a) Bipart factor graph; (b) Restricted factor graph.

Figure 2. Reconstruction performance comparison using the zero-mean Gaussian projection matrix. (a) Noise power test; (b) Sparsity range test.

Figure 3. Reconstruction performance comparison using three general form projection matrices. (a) Sparse matrix test; (b) Non-zero-mean matrix test; (c) Strong correlated matrix test.

Figure 4. TomoSAR imaging results comparison using four Compressed Sensing (CS) algorithms and Polar Format Algorithm (PFA) algorithm as a reference. (a) PFA reconstructed image; (b) CS reconstructed images.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Luo, Y.; Gui, G.; Cong, X.; Wan, Q. Sparse Estimation Based on a New Random Regularized Matching Pursuit Generalized Approximate Message Passing Algorithm. Entropy 2016, 18, 207. https://doi.org/10.3390/e18060207

AMA Style

Luo Y, Gui G, Cong X, Wan Q. Sparse Estimation Based on a New Random Regularized Matching Pursuit Generalized Approximate Message Passing Algorithm. Entropy. 2016; 18(6):207. https://doi.org/10.3390/e18060207

Chicago/Turabian Style

Luo, Yongjie, Guan Gui, Xunchao Cong, and Qun Wan. 2016. "Sparse Estimation Based on a New Random Regularized Matching Pursuit Generalized Approximate Message Passing Algorithm" Entropy 18, no. 6: 207. https://doi.org/10.3390/e18060207

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Sparse Estimation Based on a New Random Regularized Matching Pursuit Generalized Approximate Message Passing Algorithm

Abstract

1. Introduction

2. Random Regularized Matching Pursuit Generalized Approximate Message Passing

2.1. Random Regularized Matching Pursuit Algorithm

2.2. Fixed Support GAMP Algorithm

2.3. Computational Complexity Discussion

3. Convergence Discussion

3.1. The Replica Method Analysis of FsGAMP

3.2. The Prior Matching Conditions and Nishimori Conditions

4. Experiments

4.1. Zero-Mean Gaussian Projection Matrix Cases

4.2. More General Projection Matrix Cases

4.3. TomoSAR Imaging Application

5. Conclusions

Acknowledgments

Author Contributions

Conflicts of Interest

Appendix

Appendix A1. Message from the Factor Node to the Variable Node

Appendix A2. Message from the Variable Node to the Factor Node

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI