Abstract
We introduce a novel iterative algorithm, termed the Heavy-Ball-Based Hard Thresholding Pursuit for the sparse phase retrieval problem (SPR-HBHTP), to reconstruct a sparse signal from a small number of magnitude-only measurements. Our algorithm is obtained via a natural combination of the Hard Thresholding Pursuit for sparse phase retrieval (SPR-HTP) and the classical Heavy-Ball (HB) acceleration method. The robustness and convergence of the proposed algorithm are established with the help of the restricted isometry property. Furthermore, we prove that our algorithm can exactly recover a sparse signal with overwhelming probability in finitely many steps whenever the initialization lies in a neighborhood of the underlying sparse signal, provided that the measurements are accurate. Extensive numerical tests show that SPR-HBHTP has markedly improved recovery performance and runtime compared to existing alternatives, such as SPR-HTP, the SPARse Truncated Amplitude Flow (SPARTA), and Compressive Phase Retrieval with Alternating Minimization (CoPRAM).
Keywords:
sparse phase retrieval; Heavy-Ball method; Hard Thresholding Pursuit; restricted isometry property
MSC:
90C26; 65K05; 49M37
1. Introduction
In many engineering problems, one wishes to reconstruct a signal from the (squared) modulus of its Fourier (or other linear) transform. This is called phase retrieval (PR). In particular, if the target signal is sparse, the problem is referred to as sparse phase retrieval. The corresponding mathematical model recovers a sparse vector from a system of phaseless equations of the form
$$y_i = |\langle a_i, x \rangle|, \quad i = 1, \ldots, m, \qquad \text{subject to } \|x\|_0 \le s, \tag{1}$$
where $a_1, \ldots, a_m \in \mathbb{R}^n$ are a set of n-dimensional sensing vectors, $y_i$ for $i = 1, \ldots, m$ are observed modulus data, and s is the sparsity level (s is much less than n and is assumed to be known a priori for theoretical analysis purposes). In fact, solving the above problem amounts to solving the following optimization problem, whose optimal value is equal to zero:
$$\min_{x \in \mathbb{R}^n} \ \big\| |Ax| - y \big\|_2^2 \quad \text{subject to } \|x\|_0 \le s, \tag{2}$$
where the measurement matrix A and observed data y are processed as follows:
$$A = \frac{1}{\sqrt{m}}\,[a_1, \ldots, a_m]^\top, \qquad y = \frac{1}{\sqrt{m}}\,(y_1, \ldots, y_m)^\top. \tag{3}$$
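To make the data model concrete, the following minimal Python sketch generates an s-sparse signal and its magnitude-only measurements under the Gaussian model with the normalization of (3); the helper names (`make_sparse_signal`, `make_measurements`) and the example sizes are illustrative, not from the paper.

```python
# A minimal sketch of the measurement model (1)-(3), assuming real-valued
# Gaussian sensing vectors.
import numpy as np

def make_sparse_signal(n, s, rng):
    """Generate an s-sparse signal with i.i.d. standard normal nonzeros."""
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x[support] = rng.standard_normal(s)
    return x

def make_measurements(x, m, rng):
    """Phaseless data y = |Ax| with the 1/sqrt(m) normalization of (3)."""
    n = x.size
    A = rng.standard_normal((m, n)) / np.sqrt(m)  # rows are a_i^T / sqrt(m)
    y = np.abs(A @ x)                             # magnitude-only observations
    return A, y

rng = np.random.default_rng(0)
x = make_sparse_signal(n=3000, s=50, rng=rng)     # illustrative sizes
A, y = make_measurements(x, m=1000, rng=rng)
```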
The phase retrieval problem arises naturally in many important applications, such as X-ray crystallography, microscopy, and astronomical imaging. Interested readers are referred to [1] and the references therein for a more detailed discussion of the scientific and engineering background of the model.
Although several heuristic methods [2,3,4,5] are commonly used to solve (1), it is generally accepted that (1) is a very challenging, ill-posed, nonlinear inverse problem in both theory and practice. In particular, (1) is an NP-hard problem [6]. At present, phase retrieval approaches can be mainly categorized as convex and nonconvex. A popular class of nonconvex approaches is based on alternating projections, e.g., the groundbreaking works by Gerchberg and Saxton [2], Fienup [3], Chen et al. [7], and Waldspurger [8]. The convex alternatives either rely on the so-called Shor relaxation to obtain a solver based on semidefinite programming (SDP), known as PhaseLift [9] and PhaseCut [10], or solve a basis pursuit problem in the dual domain, as conducted in PhaseMax [11,12]. Subsequently, a series of breakthrough results [9,13,14,15] provided provably valid algorithms for special cases in which the measurement vectors are drawn at random from certain multivariate probability distributions, such as the Gaussian distribution. The phase retrieval methods of [9,13,14] require the number of observations m to exceed the problem dimension n. However, for the sparse phase retrieval problem, the true signal can be successfully recovered even if the number of measurements m is less than the length n of the signal. In particular, a recent paper [16] utilized a random sampling technique to achieve the best empirical sampling complexity; in other words, it requires fewer measurements than state-of-the-art algorithms for sparse phase retrieval. For more on phase retrieval, interested readers can refer to [17,18,19,20,21,22,23,24,25,26,27].
There is a close relationship between the phase retrieval problem and the compressed sensing problem [28,29,30]. Compressed sensing, also known as compressive sampling, is a technique for finding sparse solutions to underdetermined linear systems. Its mathematical model can be expressed as finding a sparse vector x from the following linear problem with sparsity constraints:
$$y = Ax, \qquad \|x\|_0 \le s. \tag{4}$$
Several mainstream algorithms have been proposed for solving this problem, including Iterative Hard Thresholding (IHT) [31], Orthogonal Matching Pursuit (OMP) [32], Compressive Sampling Matching Pursuit (CoSaMP) [33], Subspace Pursuit (SP) [34], and Hard Thresholding Pursuit (HTP) [29]. The optimization model for problem (4) is
$$\min_{x \in \mathbb{R}^n} \ \|y - Ax\|_2^2 \quad \text{subject to } \|x\|_0 \le s. \tag{5}$$
Comparing the two problems, (1) and (4), we see that problem (1) has an extra absolute value compared with problem (4). This makes the objective function of (2) nonsmooth, in contrast to (5), and accordingly leads to a substantial difference in algorithm design. How can the compressed sensing algorithms be naturally modified to solve the phase retrieval problem? A recent breakthrough in this direction was the algorithm (which we call SPR-HTP) proposed in [15], a modification of HTP from linear measurements to phaseless measurements. Inspired by [15], a natural idea is to accelerate SPR-HTP. The Heavy-Ball method, first introduced by Polyak [35], is a classical and efficient acceleration technique. More recently, in [36], the authors combined the Heavy-Ball acceleration method with the classic HTP to obtain HBHTP for compressed sensing. Their numerical experiments show that HBHTP outperforms the classical HTP in terms of both recovery capability and runtime.
Motivated by the above considerations, we propose, in this paper, a novel nonconvex algorithm to offset the disadvantages of existing algorithms. The new algorithm is called the Heavy-Ball-Based Hard Thresholding Pursuit for the sparse phase retrieval problem (SPR-HBHTP). Like most existing nonconvex algorithms, our proposed algorithm consists of two stages: an initialization stage and an iterative refinement stage. The optimization model (2) could have multiple local minimizers due to its nonconvexity. Hence, the initialization step is crucial to ensure that the initial point falls within a certain neighborhood of the true signal. Common initialization methods include orthogonality-promoting initialization, spectral initialization, and variants of spectral initialization. In our algorithm, we adopt the off-the-shelf spectral initialization as the initial step. For more details, please refer to reference [15]. For the iterative refinement stage, the search direction of SPR-HTP is
$$d^k = \alpha\, A^\top \big( y \odot \operatorname{sgn}(Ax^k) - Ax^k \big),$$
where ⊙ represents the Hadamard product (the definition is given below) of two vectors, and $\alpha$ is a positive parameter. Our algorithm SPR-HBHTP is a combination of SPR-HTP and the Heavy-Ball method, and its search direction is
$$d^k = \alpha\, A^\top \big( y \odot \operatorname{sgn}(Ax^k) - Ax^k \big) + \beta\,(x^k - x^{k-1}),$$
with two parameters $\alpha > 0$ and $\beta \ge 0$. Clearly, SPR-HBHTP reduces to SPR-HTP as $\beta = 0$. The theoretical results and numerical experiments below show that the modified algorithm (SPR-HBHTP) with the momentum term matches the best available state-of-the-art sparse phase retrieval methods. With the help of the restricted isometry property (RIP) [37], the convergence of our algorithm is established, and the algorithm is proved to recover sparse signals robustly under inaccurate measurements. Moreover, our algorithm can exactly recover an s-sparse signal in finitely many steps if the measurements are accurate.
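The two search directions can be sketched in a few lines of Python; this is a reconstruction based on the description above (with `np.sign` matching the componentwise signum of Section 2.1), and β = 0 recovers the SPR-HTP direction.

```python
# A sketch of the two search directions; beta = 0 gives SPR-HTP.
import numpy as np

def search_direction(A, y, x_cur, x_prev, alpha, beta):
    """alpha * A^T( y ⊙ sgn(A x^k) - A x^k ) + beta * (x^k - x^{k-1})."""
    residual = y * np.sign(A @ x_cur) - A @ x_cur   # y ⊙ sgn(Ax^k) - Ax^k
    return alpha * (A.T @ residual) + beta * (x_cur - x_prev)
```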
Our contributions in this paper are as follows: we propose a new class of HTP-type algorithms, called the Heavy-Ball-Based Hard Thresholding Pursuit for the sparse phase retrieval problem (SPR-HBHTP), which is a natural combination of the Hard Thresholding Pursuit for sparse phase retrieval (SPR-HTP) and the classical Heavy-Ball (HB) acceleration method. On the theoretical side, a new analysis framework is introduced to establish the performance of the proposed algorithm by resorting to the restricted isometry property of the measurement matrices. The local convergence of our algorithm is established regardless of whether the measurements are noisy, and an estimate of the iteration number is established under accurate measurements. On the numerical side, phase transition curves, grayscale maps, and algorithm selection maps demonstrate that the new algorithm SPR-HBHTP is numerically much more efficient than existing alternatives, such as SPR-HTP, SPARTA, and CoPRAM, in terms of both the recovery success rate and the recovery time.
The rest of the paper is organized as follows: we describe the SPR-HBHTP in Section 2. A theoretical analysis of the proposed algorithm is conducted in Section 3. Numerical experiments to illustrate the performance of the algorithm are given in Section 4, and conclusions are drawn in the last section.
2. Preliminary and Algorithms
2.1. Notations
We introduce some notations that are used throughout the paper. Let $[n]$ denote the set $\{1, 2, \ldots, n\}$ and $|S|$ be the cardinality of a set $S$. Denote by $\bar{S}$ the complement of a set $S$ in $[n]$. For a vector $x$, the signum function is defined componentwise as
$$[\operatorname{sgn}(x)]_i = \begin{cases} 1, & x_i > 0, \\ 0, & x_i = 0, \\ -1, & x_i < 0. \end{cases}$$
The Hard Thresholding operator $\mathcal{H}_s(\cdot)$ keeps the s largest entries in absolute value and sets the others to zero. For an index set $S \subseteq [n]$, $A_S$ denotes the matrix obtained from $A$ by keeping only the columns indexed by $S$, and $x_S$ stands for the vector obtained by retaining the components of $x$ indexed by $S$ and zeroing out the remaining components of $x$. The set $\operatorname{supp}(x) = \{i \in [n] : x_i \neq 0\}$ is called the support of $x$, and $S^k$ stands for the support of $x^k$. For two sets, $S$ and $T$, $S \Delta T$ is the symmetric difference between $S$ and $T$. For $x, z \in \mathbb{R}^n$, the distance between $x$ and $z$ is defined as
$$\operatorname{dist}(x, z) := \min\big\{ \|x - z\|_2,\ \|x + z\|_2 \big\}. \tag{6}$$
The notation ⊙ represents the Hadamard product of two vectors, i.e.,
$$(u \odot v)_i = u_i v_i, \quad i = 1, \ldots, m.$$
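For concreteness, the hard thresholding operator and the distance (6) can be realized as follows; this is a small illustrative sketch, not code from the paper.

```python
import numpy as np

def hard_threshold(u, s):
    """H_s(u): keep the s largest-magnitude entries of u, zero out the rest."""
    v = np.zeros_like(u)
    keep = np.argsort(np.abs(u))[-s:]
    v[keep] = u[keep]
    return v

def dist(x, z):
    """Distance of (6): the global sign ambiguity makes x and -x equivalent."""
    return min(np.linalg.norm(x - z), np.linalg.norm(x + z))

# The Hadamard product u ⊙ v is elementwise; in NumPy it is simply u * v.
```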
2.2. New Algorithms
Alternative initialization methods include orthogonality-promoting initialization [38], spectral initialization [39], and other variants [40]. In this paper, we use spectral initialization as the initial step of the algorithm.
| Algorithm 1: Heavy-Ball-Based Hard Thresholding Pursuit for sparse phase retrieval (SPR-HBHTP). |
Input: a matrix $A \in \mathbb{R}^{m \times n}$, a vector $y \in \mathbb{R}^m$, the sparsity level $s$, and two parameters $\alpha > 0$ and $\beta \ge 0$.
Initialization: $x^0 = \|y\|_2\, v$, where $v$ is the unit principal eigenvector of $W$. Set $x^1 = x^0$ and $k = 1$.
Repeat:
  $u^k = x^k + \alpha\, A^\top\big( y \odot \operatorname{sgn}(Ax^k) - Ax^k \big) + \beta\,(x^k - x^{k-1})$;
  $S^{k+1} = \operatorname{supp}\big( \mathcal{H}_s(u^k) \big)$;
  $x^{k+1} = \operatorname{argmin}\big\{ \| A z - y \odot \operatorname{sgn}(Ax^k) \|_2 : \operatorname{supp}(z) \subseteq S^{k+1} \big\}$;
  $k = k + 1$.
Output: the s-sparse vector $x^k$. |
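A minimal Python sketch of the main loop of Algorithm 1 is given below, reusing `search_direction` and `dist` from the earlier sketches; the stopping rule (`tol`) is our assumption, and the spectral initialization producing `x0` is sketched at the end of this subsection.

```python
# A sketch of the iteration of Algorithm 1 under the reconstruction above.
import numpy as np

def spr_hbhtp(A, y, s, alpha, beta, x0, max_iter=50, tol=1e-6):
    x_prev, x_cur = x0.copy(), x0.copy()        # two initial points, x^1 = x^0
    for _ in range(max_iter):
        u = x_cur + search_direction(A, y, x_cur, x_prev, alpha, beta)
        S = np.argsort(np.abs(u))[-s:]          # support of H_s(u^k)
        b = y * np.sign(A @ x_cur)              # current phase estimate of y
        z = np.zeros_like(x_cur)
        z[S] = np.linalg.lstsq(A[:, S], b, rcond=None)[0]   # pursuit step
        if dist(z, x_cur) <= tol * np.linalg.norm(z):        # assumed stop rule
            return z
        x_prev, x_cur = x_cur, z
    return x_cur
```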
The following is a brief explanation of the initialization in Algorithm 1. For a more rigorous and in-depth discussion of this aspect, please refer to [15,39].
Estimate the support: since the chosen index set contains a large portion of the correct support, it serves as a good approximation of the support of x.
Compute the signal: Note that
Hence, the principal eigenvector of the matrix W in Algorithm 1 gives a good initial guess of the direction of the true signal x. We select the principal eigenvector scaled to length $\|y\|_2$ as the initial point to ensure that the power of the initial estimate is close to that of the true signal x.
The most important conclusion of the initialization process is that the initial point falls within a certain neighborhood of the true signal with respect to the distance defined in (6).
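A hedged sketch of a CoPRAM-style spectral initialization consistent with the description above follows; the exact support estimator and scaling are our assumptions based on [15,39], not the paper's verbatim procedure.

```python
# A sketch of the spectral initialization (support estimate + principal
# eigenvector of a data-weighted matrix W), following [15,39] in spirit.
import numpy as np

def spectral_init(A, y, s):
    m, n = A.shape
    # Estimate the support from the largest weighted column energies.
    marginals = (y**2) @ (A**2)                  # sum_i y_i^2 * A_{ij}^2
    S0 = np.argsort(marginals)[-s:]
    # Principal eigenvector of W restricted to the estimated support.
    W = (A[:, S0] * (y**2)[:, None]).T @ A[:, S0]
    eigvals, eigvecs = np.linalg.eigh(W)
    v = eigvecs[:, -1]                           # unit principal eigenvector
    x0 = np.zeros(n)
    x0[S0] = np.linalg.norm(y) * v               # scale to match signal power
    return x0
```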
Lemma 1
([39] [Theorem IV.1]). Let $a_1, \ldots, a_m \in \mathbb{R}^n$ be i.i.d. Gaussian random vectors with a mean of 0 and covariance matrix I. Let $x^0$ be generated by the initialization in Algorithm 1 with the input $y_i = |\langle a_i, x \rangle|$ for $i = 1, \ldots, m$, where $x$ is a signal satisfying $\|x\|_0 \le s$. Then, for any $\delta_0 \in (0, 1)$, there exists a positive constant C depending only on $\delta_0$, such that if $m \ge C s^2 \log(mn)$, we have
$$\operatorname{dist}(x^0, x) \le \delta_0 \|x\|_2$$
with a probability of at least $1 - 8/m$.
3. Convergence Analysis
We first list some of the main lemmas and then present our results on the local convergence of the proposed algorithm (SPR-HBHTP).
Lemma 2
([30] [Theorem 9.27]). Let $a_1, \ldots, a_m \in \mathbb{R}^n$ be i.i.d. Gaussian random vectors with a mean of 0 and covariance matrix I. Let A be defined in (3). There exist universal positive constants $C_1, c_1$ such that, for any natural number $r \le n$ and any $\delta \in (0, 1)$, if $m \ge C_1 \delta^{-2} r \log(n/r)$, then A satisfies the following r-RIP with a probability of at least $1 - 2e^{-c_1 \delta^2 m}$:
$$(1 - \delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2 \quad \text{for all } x \text{ with } \|x\|_0 \le r. \tag{7}$$
Lemma 3
([36] [Lemma 3.1]). Suppose that the non-negative sequence $\{a_k\}_{k \ge 0}$ satisfies
$$a_{k+1} \le b\, a_k + c\, a_{k-1} + d, \quad k \ge 1, \tag{8}$$
where $b, c, d \ge 0$ and $b + c < 1$. Then,
$$a_{k+1} \le \tau^k \big( a_1 + (\tau - b)\, a_0 \big) + \frac{d}{1 - \tau}$$
with
$$\tau := \frac{b + \sqrt{b^2 + 4c}}{2} < 1.$$
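To see where the contraction factor in this reconstruction comes from, note that τ is the positive root of $\tau^2 = b\tau + c$, so $c/\tau = \tau - b$, and $b + c < 1$ is equivalent to $\tau < 1$. The recursion then telescopes, as the following short calculation (a sketch under the two-step form of (8) stated above) shows:

```latex
% tau solves tau^2 = b*tau + c, hence c/tau = tau - b; then (8) gives
\begin{aligned}
a_{k+1} + (\tau - b)\,a_k
  &\le \bigl(b\,a_k + c\,a_{k-1} + d\bigr) + (\tau - b)\,a_k \\
  &= \tau\Bigl(a_k + \tfrac{c}{\tau}\,a_{k-1}\Bigr) + d
   = \tau\bigl(a_k + (\tau - b)\,a_{k-1}\bigr) + d .
\end{aligned}
```

Iterating this one-step inequality for $e_k := a_k + (\tau - b)\,a_{k-1}$ yields the geometric decay with ratio τ plus the accumulated term $d/(1 - \tau)$.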
Lemma 4.
Let A satisfy the r-RIP (7) with constant $\delta_r$, let $S \subseteq [n]$, and let $u \in \mathbb{R}^n$.
- (i). If $|S \cup \operatorname{supp}(u)| \le r$, then
$$\big\| \big( u - A^\top A u \big)_S \big\|_2 \le \delta_r \|u\|_2. \tag{9}$$
- (ii). If $S \cap \operatorname{supp}(u) = \emptyset$ and $|S \cup \operatorname{supp}(u)| \le r$, then
$$\big\| (A^\top A u)_S \big\|_2 \le \delta_r \|u\|_2. \tag{10}$$
Lemma 5
([15] [Lemma 2]). Let $a_1, \ldots, a_m \in \mathbb{R}^n$ be i.i.d. Gaussian random vectors with a mean of 0 and covariance matrix I. Let be any constant in . After fixing any given , there exist universal positive constants , such that if
then with a probability of at least , it holds that
whenever satisfies and .
Lemma 6.
Let $a_1, \ldots, a_m \in \mathbb{R}^n$ be i.i.d. Gaussian random vectors with a mean of 0 and covariance matrix I. Let be given as
For any , there exist universal positive constants and such that if
then the following inequality holds with a probability of at least
where and with .
Proof of Lemma 6.
Pick , where comes from Lemmas 2 and 5. The condition (12) ensures the validity of (7) with a probability of at least and (11) with a probability of at least , respectively. Let
It is easy to see that . In other words, the probability of (7) and (11) being true is at least . By replacing , and used in [15] [Equation (16)] by , z and x, we obtain
This, together with (10) in Lemma 4, leads to
□
The local convergence property of Algorithm 1 is established in the following result.
Theorem 1.
(Local convergence). Let $a_1, \ldots, a_m \in \mathbb{R}^n$ be i.i.d. Gaussian random vectors with a mean of 0 and covariance matrix I. Let be any constant in . Suppose that the RIC, , of the matrix A and the parameters α and β obey
where
and is the unique root in the interval of the equation with
For an s-sparse signal , there exist universal positive constants , such that if
then the sequence generated by Algorithm 1 with input measured data satisfies
with a probability of at least , where
and is guaranteed under the conditions (13) and (14).
Proof of Theorem 1.
Note that $x^1 = x^0$ in the design of Algorithm 1 and $x^0$ is generated by utilizing the initialization process. Hence, by Lemma 1. Since , we can assume without loss of generality that (the case of can be proved by following a similar argument). Hence, , . Our proof is based on mathematical induction, and hence, we further assume that , .
The first step of the proof is a consequence of the pursuit step of Algorithm 1. Recall that
As the best -approximation to from the space , the vector is characterized by
i.e., or . We derive, in particular,
Due to , and . It follows from Lemmas 4 and 6 that if , then
with a probability of at least and
with a probability of at least . Combining (18), (19) with (20) leads to
Hence,
Denote . This reads as for the quadratic polynomial defined by
Hence, is bounded by the largest root of g, i.e.,
Based on for , we obtain
The second step of the proof is a consequence of the Hard Thresholding step of Algorithm 1. Denote
Since and , we have
According to and , we obtain
Taking and into consideration, we write
Hence,
Note from (23) that
Then,
where the second inequality is based on the fact that
with a probability of at least as by (9) in Lemma 4 due to , and
with a probability of at least as by Lemma 6 due to . Combining (24) with (25), we have
Putting (22) and (28) together yields
where b is given by (17). The recursive inequality (29) is in the form (8) in Lemma 3 by letting . Note that, to ensure the validity of (19), (20), (26) and (27) with a corresponding probability of at least , the number of measurements must satisfy
Hence, we can pick and , satisfying
and
In other words, the above choices of and guarantee that (19), (20), (26), and (27) hold with a probability of at least as . This, in turn, means that (29) holds accordingly.
In the following, we need to show that the coefficients on the right-hand side of (29) satisfy the conditions required by Lemma 3, i.e., .
First, we clarify that the parameters appearing in (13) and (14) are well-defined. On the interval , define
Note that and the root of is . It is easy to see
Thus,
Note that
Then, is equivalent to saying that . Taking the square on both sides yields . The roots of the above equation in the interval are . By substituting these into (31), we obtain that the root of in (0,1) is . It is easy to see that h is monotonically decreasing on and increasing on (,1). Due to , as by the monotonicity of h. On the other hand, is monotonically increasing and as . Hence, for the given and , there is a unique solution in the interval , denoted by , satisfying . In addition, according to the decreasing monotonicity of h on this interval, we know that for all . Thus, as . This, together with (30), yields , i.e.,
Multiplying on both sides of (33) yields
where the last step comes from
due to . Note that (35) can be rewritten as
Using the fact again, (36) rearranges into
This means that the range for in (14) is also well-defined.
To show , let us consider the following two cases.
- Case 1: if , then , and hence .
- Case 2: if , then , and hence .
In both cases, we obtain , i.e., . Therefore, by Lemma 3.
According to the recursive relation (29) and , as shown above, we obtain
Hence, it follows from (29) and Lemma 3 that the desired estimate holds. □
The above analysis assumes that the measurements are noiseless. The following result demonstrates that Algorithm 1 is robust in the presence of noise.
Theorem 2.
(The noisy case). Let $a_1, \ldots, a_m \in \mathbb{R}^n$ be i.i.d. Gaussian random vectors with a mean of 0 and covariance matrix I. Suppose that the RIC, , of the matrix A and the parameters α and β satisfy the conditions (13) and (14). Take an s-sparse signal and , satisfying
where , b, η, are given in Theorem 1. There exist universal positive constants , such that if
then the sequence generated by Algorithm 1 with the input measured data satisfies
with a probability of at least , where is given by (17).
Proof of Theorem 2.
Let be the iterate sequence generated by Algorithm 1 in the framework of noisy data. Thus, in this case, . Following an argument similar to that in [15] [Equation (27)], we obtain from Lemma 6 that
Now, we need to modify the proof of Theorem 1 for the noisy case; i.e., some formulas appearing in the proof of Theorem 1 should be replaced. Precisely, (21) and (22) take the following forms:
and
where and . In addition, (25) is modified to
Putting (39) and (41) together and recalling the definitions of and above, we then have
where b, , are given exactly as in Theorem 1. Hence, , as , satisfy conditions (13) and (14). This further ensures that by applying Lemma 3 to the recursive formula (42).
It remains to show that the iterative sequence stays in a neighborhood of x. First, , . Now, assume , . We claim that . Indeed, recall that
i.e.,
Since and , we obtain
Taking into account (42) and applying Lemma 3, we obtain
Moreover, this inequality remains true by following symmetric arguments, such as replacing x by −x. Thus, according to the definition of ’dist’ given in (6), the desired result follows. □
Finite Termination
In the framework of accurate measurements, the true signal can be recovered after a finite number of iterations. An estimation of the iteration number is given below.
Theorem 3.
Let $a_1, \ldots, a_m \in \mathbb{R}^n$ be i.i.d. Gaussian random vectors with a mean of 0 and covariance matrix I. Suppose that the RIC, , of the matrix A and the parameters α and β satisfy the conditions (13) and (14) with . For an s-sparse signal and an initial point given in Algorithm 1, there exist universal positive constants , such that if
then the s-sparse signal is recovered by Algorithm 1 with in, at most,
iterations with a probability of at least , where , η and τ, b are given by (15) and (17), respectively, in Theorem 1, and , are the smallest nonzero entries of x and y in modulus.
Proof of Theorem 3.
The case of is trivial. Indeed, in this case, the initial point generated by the initialization step of Algorithm 1 is 0. This means that the true signal x is recovered in, at most, one step. Therefore, we only need to consider the case of .
We first assume that . Now, we need to determine an integer n such that . Recall that is given as shown in (23). According to the definitions of , for all and , we have
Note that
where is the smallest nonzero entry of x in modulus, and
where due to . Combining the above two inequalities yields
It follows from (17) that
According to (29), we have
Taking into account the definition of in (23), we know that is satisfied as soon as (45) holds. Due to (47), this can be ensured, provided that
We next demonstrate that by contradiction. If , i.e., , then, according to Lemma 2, A satisfies with a probability of at least as . Due to and , we can deduce that , which contradicts the hypothesis.
Now, we also need to determine an integer n, such that . Denote
and
where and are given by (15) and (17), respectively, in Theorem 1, and is the minimum nonzero entry of y. According to the argument given in part (b) of the proof of [15] [Theorem 1], we obtain that as . Since it is already known from the previous proof that is the smallest positive integer such that , then, as , we have both and . This ensures that, in Algorithm 1,
Thus, due to . Therefore, Algorithm 1 successfully recovers the s-sparse signal after a finite number of iterations.
Similarly, if , we can obtain that and as . In this case, (48) takes the form
Thus, due to . This completes the proof. □
4. Numerical Experiments
For ease of reading, we summarize in Table 1 the parameters and the abbreviations of the algorithm names used in the numerical experiments.
Table 1.
The descriptions of the parameters and the abbreviations of the algorithm names.
All experiments were performed in MATLAB R2021a on a laptop with an Apple M1 processor and 8 GB of memory. In this section, we compare SPR-HBHTP, in terms of recovery capability and average runtime, with three popular algorithms for the sparse phase retrieval problem: SPR-HTP [15] (an extension of HTP from traditional compressed sensing to sparse phase retrieval), CoPRAM [39] (a combination of the classical alternating minimization approach for phase retrieval with the CoSaMP algorithm for sparse recovery), and SPARTA [38] (a combination of TWF and TAF).
In the following experiments, the s-sparse vector x with a fixed dimension of n = 3000 is randomly generated; its nonzero entries are independent and identically distributed (i.i.d.) and follow the standard normal distribution, and the support of x is drawn uniformly at random. The measurement matrix A is an i.i.d. Gaussian matrix, which satisfies the RIP with high probability when m is large enough ([30], Chapter 9). On the other hand, the observed modulus data y are expressed as follows: in noiseless environments, and in noisy environments, where e is the noise vector with elements following and is the noise level.
The maximum number of iterations for all algorithms was set to 50. All initial points were generated by the initialization step of SPR-HBHTP; two initial points were produced for SPR-HBHTP, while the other algorithms needed only one initial point . The algorithmic parameters of SPARTA were , , and [38], and the stepsize of SPR-HTP was set as in [15]. The choice of algorithmic parameters for SPR-HBHTP is discussed in Section 4.1. For each sparsity level s, 100 independent trials were used to test the success rates of the algorithms. A trial was counted as a ’success’ if the following recovery condition
was satisfied, where x is the target signal, is the corresponding approximation generated by the algorithm, and the distance is given by (6).
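The testing protocol can be summarized by the following Python sketch, which reuses the hypothetical helpers from the earlier sketches; the $10^{-3}$ tolerance in the recovery condition is our assumption, not the paper's stated threshold.

```python
# A sketch of the success-rate protocol: 100 random trials per sparsity level.
import numpy as np

def success_rate(n, m, s, alpha, beta, trials=100, tol=1e-3, seed=1):
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(trials):
        x = make_sparse_signal(n, s, rng)
        A, y = make_measurements(x, m, rng)
        x0 = spectral_init(A, y, s)
        x_hat = spr_hbhtp(A, y, s, alpha, beta, x0, max_iter=50)
        wins += dist(x_hat, x) <= tol * np.linalg.norm(x)  # relative recovery test
    return wins / trials
```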
4.1. Choices of the Parameters α and β
In order to choose suitable algorithmic parameters in terms of recovery capability, we considered the success rates of SPR-HBHTP with a fixed value of in noiseless environments. The numerical results are displayed in Figure 1, in which the sparsity level s ranges from 40 to 260 with a stepsize of 10. Figure 1a corresponds to the case with and , while Figure 1b corresponds to the case with and . Figure 1a shows that the recovery ability of SPR-HBHTP becomes stronger as the momentum coefficient increases. Note that SPR-HBHTP reduces to SPR-HTP as , which indicates that an important role of the momentum term is to enlarge the admissible range of the stepsize of SPR-HBHTP. From Figure 1b, we can see that SPR-HBHTP is sensitive to the stepsize , and its recovery capability with is stronger than that with . However, the recovery effect of SPR-HBHTP with is worse than that with for large sparsity levels s, while the former is slightly better than the latter for small s. Thus, we use and for the rest of the article.
Figure 1.
Comparison of the success rates of SPR-HBHTP with different parameters. (a) Different momentum coefficients . (b) Different stepsizes .
4.2. Phase Transition
In this section, we use the phase transition curve (PTC) [42,43] and grayscale map to compare the recovery capabilities of the algorithms.
4.2.1. Phase Transition Curves
The phase transition curve is a logistic regression curve identifying a 50% success rate for the given algorithms. Indeed, the ratio of 50% can be replaced by other values based on the practical background. Denote $\delta = m/n$ and $\rho = s/m$. The $(\delta, \rho)$-plane is separated by the PTC of an algorithm into success and failure regions (see Figure 2). The former corresponds to the region below the PTC, wherein the algorithm can reconstruct the target sparse signal successfully; the latter corresponds to the region above the PTC. In other words, the higher the PTC, the stronger the recovery capability of the algorithm.
Figure 2.
Comparison of the PTCs for the algorithms with accurate/inaccurate measurements. (a) Accurate measurements. (b) Inaccurate measurements with = 0.01.
To generate the PTC, the values of are taken as follows:
where the interval is equally divided into 20 parts. For each given in (49), the ‘glmfit’ function in MATLAB is used to fit the logistic regression curve based on the success rates at different sparsity levels s. The PTC is obtained directly by identifying the point on the logistic regression curve with a 50% success rate. This technique is almost the same as that in [36], with the logistic regression model fitted by the ‘glmfit’ function. Thus, the generation process of the PTC is omitted here, and the interested reader can consult [36] for detailed information.
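The 50% crossing point of the fitted logistic curve can be extracted as in the following sketch, a Python stand-in for the MATLAB ‘glmfit’ step (Newton's method for binomial logistic regression); the parameterization in ρ is an assumption.

```python
# Fit P(success) = sigmoid(w0 + w1*rho) by Newton's method and return the
# rho at which the fitted curve crosses 50%, i.e. rho = -w0/w1.
import numpy as np

def ptc_point(rho, successes, trials, iters=100):
    X = np.column_stack([np.ones_like(rho), rho])
    w = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (successes - trials * p)              # log-likelihood gradient
        H = -(X * (trials * p * (1 - p))[:, None]).T @ X   # Hessian (negative definite)
        w -= np.linalg.solve(H, grad)                      # Newton update
    return -w[0] / w[1]
```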
A comparison of the PTCs for SPR-HBHTP, SPR-HTP, CoPRAM, and SPARTA is shown in Figure 2, wherein Figure 2a,b correspond to accurate and inaccurate measurements, respectively. From Figure 2a, we see that SPR-HBHTP has the highest PTC as , which indicates that the recovery capability of SPR-HBHTP is stronger than that of the other three algorithms, especially for larger . The PTCs in Figure 2b are similar to those in Figure 2a as ; that is, all algorithms are stable under small disturbances. However, when , all PTCs in Figure 2b decrease rapidly with a decrease in , which differs from the phenomenon shown in Figure 2a. This indicates that, under noise, all algorithms require more measurements to ensure the recovery effect for the sparse phase retrieval problem. Finally, it should be noted that the PTCs of SPR-HTP, CoPRAM, and SPARTA are close to each other in Figure 2a,b, which means that the reconstruction capabilities of these three algorithms are almost the same. Comparatively speaking, the recovery capability of SPR-HTP is slightly better than those of CoPRAM and SPARTA.
4.2.2. Greyscale Maps
A grayscale map is an image that contains only brightness information, without color information. In our grayscale maps, the success rate of an algorithm is expressed by the gray level of the corresponding block, where black indicates a success rate of 0%, white indicates 100%, and intermediate gray levels indicate rates between 0% and 100%.
Grayscale maps for the algorithms SPR-HBHTP, SPR-HTP, CoPRAM, and SPARTA with a signal length of n = 3000 are displayed in Figure 3 (a rendering sketch is given after Figure 3). In our experiments, the sample size m ranged from 250 to 3000 with a stepsize of 250, and the sparsity s ranged from 20 to 100 with a stepsize of 5. By comparing the four graphs in Figure 3, it can be seen that when the sparsity is relatively large, the recovery ability of SPR-HBHTP is stronger than that of the other algorithms. For smaller sparsity levels s, SPR-HTP and CoPRAM have almost the same recovery ability, and similarly, SPARTA and SPR-HBHTP have comparable recovery abilities. This is consistent with our results for the phase transition curves.
Figure 3.
Grayscale map for different algorithms with n = 3000. (a) SPR-HBHTP. (b) SPR-HTP. (c) CoPRAM. (d) SPARTA.
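The rendering of such a map from a table of success rates can be sketched as follows; the `rates` array here is a random placeholder, not experimental data.

```python
# A sketch of plotting a grayscale success-rate map over the (m, s) grid.
import numpy as np
import matplotlib.pyplot as plt

s_grid = np.arange(20, 101, 5)          # sparsity from 20 to 100, step 5
m_grid = np.arange(250, 3001, 250)      # sample size from 250 to 3000, step 250
rates = np.random.default_rng(2).random((len(s_grid), len(m_grid)))  # placeholder

plt.imshow(rates, cmap="gray", vmin=0, vmax=1, origin="lower", aspect="auto",
           extent=[m_grid[0], m_grid[-1], s_grid[0], s_grid[-1]])
plt.xlabel("number of measurements m")
plt.ylabel("sparsity level s")
plt.colorbar(label="success rate (black = 0%, white = 100%)")
plt.show()
```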
4.3. Algorithm Selection Maps
In practice, when more than one algorithm can reconstruct the target signal successfully, the algorithm that consumes the least time should naturally be selected. Hence, it is meaningful to choose the fastest algorithm on the intersection of the success regions of the algorithms, which yields the algorithm selection map (ASM) proposed in [42,43]. Note that an algorithm is selected automatically if it is the only one that can recover the signal successfully.
Denote and . Next, we establish the ASM in the -plane with given as follows:
where the intervals and are equally divided into 20 and 40 parts, respectively. For each , we tested 10 problem instances at for each algorithm, increasing until the success frequency fell below 50%. The ASM and the average runtime of the fastest algorithm with accurate measurements are summarized in Figure 4 (a sketch of the selection rule is given after Figure 4). Figure 4a indicates that SPR-HBHTP or SPR-HTP is the fastest algorithm in most areas, whereas CoPRAM is comparatively slower, since it does not appear in the ASM. Figure 4b shows that the average runtime of the fastest algorithm is less than one second in most regions, increasing up to 3–7 s for larger and . This demonstrates that all algorithms take more time as the sparsity level s becomes larger.
Figure 4.
ASM and the shortest average runtime with accurate measurements. (a) ASM. (b) Average runtime of the fastest algorithm.
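The selection rule behind the ASM can be sketched as follows; the `results` layout and algorithm names are illustrative assumptions.

```python
# At each grid cell, among the algorithms whose success frequency is at
# least 50%, pick the one with the smallest average runtime.
import numpy as np

def select_fastest(results, threshold=0.5):
    """results: {name: (success_rate_grid, runtime_grid)} over the same grid."""
    names = list(results)
    shape = results[names[0]][0].shape
    choice = np.full(shape, "", dtype=object)   # empty = no algorithm succeeds
    best_time = np.full(shape, np.inf)
    for name, (rate, time) in results.items():
        ok = (rate >= threshold) & (time < best_time)
        best_time[ok] = time[ok]
        choice[ok] = name
    return choice, best_time
```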
To compare the algorithms thoroughly, we computed the ratios of the average runtimes of the algorithms against that of the fastest one shown in Figure 4b; the detailed results are shown in Figure 5. Figure 5a,b reveal that the ratios of SPR-HBHTP and SPR-HTP are close to 1 in most areas, which indicates that they are faster than CoPRAM and SPARTA. In particular, SPR-HBHTP has an advantage over SPR-HTP in the region with larger . In Figure 5c, the ratio of CoPRAM is about 5–20 in most regions, with a minimum value of 5, which means that it is slower than the other three algorithms. Finally, by comparing Figure 5d with Figure 5a–c, we find that SPARTA is slower than SPR-HBHTP and SPR-HTP in most cases but much faster than CoPRAM.
Figure 5.
The ratios of average runtimes for the algorithms against the fastest one. (a) SPR-HBHTP. (b) SPR-HTP. (c) CoPRAM. (d) SPARTA.
5. Conclusions
We introduced an accelerated variant of the Hard Thresholding Pursuit based on the Heavy-Ball method, named the Heavy-Ball-Based Hard Thresholding Pursuit (SPR-HBHTP), to reconstruct a sparse signal from phaseless measurements. Under the restricted isometry property, SPR-HBHTP enjoys provably exact recovery in finitely many steps as soon as the number of noiseless Gaussian measurements exceeds a certain bound. This analysis is remarkably different from those of existing phase retrieval algorithms. Moreover, numerical experiments on random problem instances, via phase transition analysis, indicate that our algorithm outperforms state-of-the-art algorithms, such as SPR-HTP, CoPRAM, and SPARTA, in terms of recovery success rate and computational time. Conducting experiments in realistic settings to further verify the practical performance of SPR-HBHTP is worthy of future research. In addition, studying a random alternating minimization method based on Heavy-Ball acceleration is an interesting research topic.
Author Contributions
Conceptualization, J.Z. and Z.S.; Methodology, Z.S. and J.T.; Investigation, Y.L. and J.Z.; writing—original draft, Y.L. and J.T.; Supervision, J.Z. and Z.S.; Funding acquisition, J.Z. and J.T. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China (11771255), the Young Innovation Teams of Shandong Province (2019KJI013), the Shandong Province Natural Science Foundation (ZR2021MA066), the Natural Science Foundation of Henan Province (222300420520), and the Key Scientific Research Projects of Higher Education of Henan Province (22A110020).
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
References
- Shechtman, Y.; Eldar, Y.C.; Cohen, O.; Chapman, H.N.; Miao, J.W.; Segev, M. Phase retrieval with application to optical imaging: A contemporary overview. IEEE Signal Process. Mag. 2015, 32, 87–109.
- Gerchberg, R.; Saxton, W. A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik 1972, 35, 237–246.
- Fienup, J.R. Phase retrieval algorithms: A comparison. Appl. Opt. 1982, 21, 2758–2769.
- Marchesini, S. Phase retrieval and saddle-point optimization. J. Opt. Soc. Am. A 2007, 24, 3289–3296.
- Nugent, K.; Peele, A.; Chapman, H.; Mancuso, A. Unique phase recovery for nonperiodic objects. Phys. Rev. Lett. 2003, 91, 203902.
- Fickus, M.; Mixon, D.G.; Nelson, A.A.; Wan, Y. Phase retrieval from very few measurements. Linear Algebra Appl. 2014, 449, 475–499.
- Chen, P.; Fannjiang, A.; Liu, G.R. Phase retrieval with one or two diffraction patterns by alternating projections with the null initialization. J. Fourier Anal. Appl. 2018, 24, 719–758.
- Waldspurger, I. Phase retrieval with random Gaussian sensing vectors by alternating projections. IEEE Trans. Inf. Theory 2018, 64, 3301–3312.
- Candès, E.J.; Strohmer, T.; Voroninski, V. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 2013, 66, 1241–1274.
- Waldspurger, I.; d’Aspremont, A.; Mallat, S. Phase recovery, MaxCut and complex semidefinite programming. Math. Program. 2015, 149, 47–81.
- Goldstein, T.; Studer, C. PhaseMax: Convex phase retrieval via basis pursuit. IEEE Trans. Inf. Theory 2018, 64, 2675–2689.
- Hand, P.; Voroninski, V. An elementary proof of convex phase retrieval in the natural parameter space via the linear program PhaseMax. Commun. Math. Sci. 2018, 16, 2047–2051.
- Candès, E.J.; Li, X.; Soltanolkotabi, M. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inf. Theory 2015, 61, 1985–2007.
- Netrapalli, P.; Jain, P.; Sanghavi, S. Phase retrieval using alternating minimization. IEEE Trans. Signal Process. 2015, 63, 4814–4826.
- Cai, J.F.; Li, J.; Lu, X.; You, J. Sparse signal recovery from phaseless measurements via hard thresholding pursuit. Appl. Comput. Harmon. Anal. 2022, 56, 367–390.
- Cai, J.F.; Jiao, Y.L.; Lu, X.L.; You, J.T. Sample-efficient sparse phase retrieval via stochastic alternating minimization. IEEE Trans. Signal Process. 2022, 70, 4951–4966.
- Yang, M.H.; Hong, Y.W.P.; Wu, J.Y. Sparse affine sampling: Ambiguity-free and efficient sparse phase retrieval. IEEE Trans. Inf. Theory 2022, 68, 7604–7626.
- Bakhshizadeh, M.; Maleki, A.; Jalali, S. Using black-box compression algorithms for phase retrieval. IEEE Trans. Inf. Theory 2020, 66, 7978–8001.
- Cha, E.; Lee, C.; Jang, M.; Ye, J.C. DeepPhaseCut: Deep relaxation in phase for unsupervised Fourier phase retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 9931–9943.
- Wang, B.; Fang, J.; Duan, H.; Li, H. PhaseEqual: Convex phase retrieval via alternating direction method of multipliers. IEEE Trans. Signal Process. 2020, 68, 1274–1285.
- Liu, T.; Tillmann, A.M.; Yang, Y.; Eldar, Y.C.; Pesavento, M. Extended successive convex approximation for phase retrieval with dictionary learning. IEEE Trans. Signal Process. 2022, 70, 6300–6315.
- Fung, S.W.; Di, Z.W. Multigrid optimization for large-scale ptychographic phase retrieval. SIAM J. Imaging Sci. 2020, 13, 214–233.
- Chen, Y.; Chi, Y.; Fan, J.; Ma, C. Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval. Math. Program. 2019, 176, 5–37.
- Cai, J.F.; Huang, M.; Li, D.; Wang, Y. Solving phase retrieval with random initial guess is nearly as good as by spectral initialization. Appl. Comput. Harmon. Anal. 2022, 58, 60–84.
- Soltanolkotabi, M. Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization. IEEE Trans. Inf. Theory 2019, 65, 2374–2400.
- Jaganathan, K.; Oymak, S.; Hassibi, B. Sparse phase retrieval: Uniqueness guarantees and recovery algorithms. IEEE Trans. Signal Process. 2017, 65, 2402–2410.
- Killedar, V.; Seelamantula, C.S. Compressive phase retrieval based on sparse latent generative priors. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 1596–1600.
- Wen, J.; He, H.; He, Z.; Zhu, F. A pseudo-inverse-based hard thresholding algorithm for sparse signal recovery. IEEE Trans. Intell. Transp. Syst. 2022.
- Foucart, S. Hard thresholding pursuit: An algorithm for compressive sensing. SIAM J. Numer. Anal. 2011, 49, 2543–2563.
- Foucart, S.; Rauhut, H. An Invitation to Compressive Sensing. In A Mathematical Introduction to Compressive Sensing; Springer: Berlin/Heidelberg, Germany, 2013; pp. 1–39.
- Blumensath, T.; Davies, M.E. Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 2009, 27, 265–274.
- Tropp, J.A.; Gilbert, A.C. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 2007, 53, 4655–4666.
- Needell, D.; Tropp, J. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 2009, 26, 301–321.
- Dai, W.; Milenkovic, O. Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inf. Theory 2009, 55, 2230–2249.
- Polyak, B.T. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17.
- Sun, Z.F.; Zhou, J.C.; Zhao, Y.B.; Meng, N. Heavy-ball-based hard thresholding algorithms for sparse signal recovery. J. Comput. Appl. Math. 2023, 430, 115264.
- Candès, E.J. The restricted isometry property and its implications for compressed sensing. C. R. Math. 2008, 346, 589–592.
- Wang, G.; Zhang, L.; Giannakis, G.B.; Akçakaya, M.; Chen, J. Sparse phase retrieval via truncated amplitude flow. IEEE Trans. Signal Process. 2018, 66, 479–491.
- Jagatap, G.; Hegde, C. Sample-efficient algorithms for recovering structured signals from magnitude-only measurements. IEEE Trans. Inf. Theory 2019, 65, 4434–4456.
- Cai, T.T.; Li, X.; Ma, Z. Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. Ann. Stat. 2016, 44, 2221–2251.
- Zhao, Y.B. Optimal k-thresholding algorithms for sparse optimization problems. SIAM J. Optim. 2020, 30, 31–55.
- Blanchard, J.D.; Tanner, J. Performance comparisons of greedy algorithms in compressed sensing. Numer. Linear Algebra Appl. 2015, 22, 254–282.
- Blanchard, J.D.; Tanner, J.; Wei, K. CGIHT: Conjugate gradient iterative hard thresholding for compressed sensing and matrix completion. Inf. Inference 2015, 4, 289–327.