Abstract
Phase retrieval is a classical inverse problem of recovering a signal from a system of phaseless constraints. Many recently proposed methods for phase retrieval, such as PhaseMax and gradient-descent algorithms, enjoy benign theoretical guarantees on the condition that an accurate estimate of the true solution is provided. Current initialization methods do not perform well when the number of measurements is low, which deteriorates the success rate of current phase retrieval methods. We propose a new initialization method that obtains an estimate of the original signal with uniformly higher accuracy by combining the advantages of the null vector method and the maximal correlation method. The constructed spectral matrix for the proposed initialization method has a simple and symmetrical form. A lower error bound is proved theoretically and verified numerically.
1. Introduction
Phase retrieval (PR) has many applications in science and engineering, including X-ray crystallography [1], molecular imaging [2], biological imaging [3], and astronomy [4]. Mathematically, phase retrieval is the problem of finding a signal satisfying the following phaseless constraints:
where the measurements are observed, the measuring vectors are known, and the noise term is unknown. In optics and quantum physics, the measurement model is related to the Fourier transform or the fractional Fourier transform [5]. To remove this theoretical barrier, recent works also focus on generic measurement vectors, namely vectors sampled independently from a Gaussian distribution.
Current methods for solving phase retrieval can generally be divided into two groups: convex and non-convex approaches. Convex methods, e.g., PhaseLift [6] and PhaseCut [7], utilize a semidefinite relaxation to transfer the original problem into a convex one. Nevertheless, the heavy computational cost of these methods hinders their application in practical scenarios. On the other hand, non-convex methods, including the Gerchberg–Saxton (GS) algorithm [8,9], the (truncated) Wirtinger flow (WF/TWF) [10,11], and the (truncated) amplitude flow (AF/TAF), have a significant advantage in lower computational cost and sampling complexity (i.e., the minimal number of measurements required to achieve successful recovery) [12]. More recently, two variants of AF using the reweighting [13] and the smoothing technique [14,15] have been proposed to further lower the sampling complexity. However, for the mentioned non-convex methods, the rate of successful reconstruction relies upon elaborate initialization. Moreover, an initialization point near the true solution is also a necessity for the convergence properties established in theoretical analyses.
1.1. Prior Art
Current initialization methods for the PR problem include the spectral method [16], the orthogonality promoting method [12], the null vector method [17], and the (weighted) maximal correlation method [13]. These methods first construct a spectral matrix in the form of
where the weight is a positive real number. The estimate is then given as the (scaled) eigenvector corresponding to the largest or smallest eigenvalue of the spectral matrix. The spectral method and the (weighted) maximal correlation method fall into one category, as they seek the leading eigenvector as the estimate. We categorize these as correlation-focused methods, since their spectral matrices assign larger weights to the measuring vectors that are more correlated with the original signal. Conversely, the null vector and orthogonality promoting methods can be regarded as orthogonality-focused methods: vectors less correlated with the original signal receive larger weights in the spectral matrix.
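As a concrete sketch of this shared construction (the variable names, the weight choice, and the problem sizes below are illustrative assumptions, not the paper's exact formulation), a spectral matrix can be formed and its extreme eigenvector extracted as follows:

```python
import numpy as np

def spectral_estimate(A, weights, use_smallest=False):
    """Form the weighted spectral matrix (1/m) * sum_i w_i a_i a_i^T and
    return the eigenvector of its largest (or smallest) eigenvalue."""
    m = A.shape[0]
    M = (A * weights[:, None]).T @ A / m
    vals, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    return vecs[:, 0] if use_smallest else vecs[:, -1]

# Toy check: with correlation-focused weights w_i = y_i^2, the leading
# eigenvector aligns with the true signal (up to sign) as m grows.
rng = np.random.default_rng(0)
n, m = 20, 2000
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)
est = spectral_estimate(A, y**2)
```

An orthogonality-focused method would instead pass `use_smallest=True` together with weights that grow for vectors nearly orthogonal to the signal.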
These two types of initialization methods each have their own merits and demerits. The correlation-focused methods perform better when the oversampling rate is low. The situation is reversed when the oversampling rate is large enough: there, the orthogonality-focused methods provide a more accurate estimate.
1.2. This Work
In this paper, we propose a method combining the advantages of the above two types of initialization methods. Our intuition is to construct a composite spectral matrix, namely a weighted sum of the spectral matrices of a correlation-focused method and an orthogonality-focused method. Hence, the new method is termed the composite initialization method.
1.3. Article Organization and Notations
This paper adopts the following notation. A bold-font lower-case letter denotes a column vector, and bold capital letters denote matrices. The transpose and the conjugate transpose are denoted in the standard way. Calligraphic letters denote index sets, and the cardinality of a set denotes its number of elements. Unless otherwise specified, an unsubscripted norm denotes the 2-norm, for the sake of simplicity.
The remainder of this article is organized as follows. The algorithm is given in Section 2. Section 3 provides the theoretical error analysis of the proposed initialization method, with some relevant technical error bounds placed in the Appendix. Section 4 illustrates the numerical performance and compares the proposed method with other initialization methods.
2. The Formulation of the Composite Initialization Method
This section provides the formulation of the proposed method. For the convenience of theoretical analysis, the measuring vectors are assumed to be independently sampled centered Gaussian vectors with identity covariance. Let the true solution of (1) be fixed. For concreteness, we focus on the real-valued Gaussian model. We further assume that the signal is normalized, since this investigation focuses on the estimation of the original signal; in practice, its norm can be estimated from the measurements. To begin with, we describe the null vector and the maximal correlation methods, which form the basis of our algorithm.
The null vector method first picks out the subset of measuring vectors corresponding to the smallest measurements. Then, the null vector method approximates the original signal by the vector that is most orthogonal to this subset. Since the measurements are arranged in ascending order, this subset can be written down directly from the first indices, its size being the cardinality of the chosen index set. The intuition behind the null vector method is simple: since a measurement takes a very small value when the measuring vector is nearly orthogonal to the true signal, one can construct the following minimization problem, whose solution matches this property, to estimate the signal.
Solving (2) is equivalent to finding the smallest eigenvalue and the corresponding eigenvector of the associated spectral matrix. The so-called orthogonality promoting method proposed by Wang is an approximately equivalent method that transforms the minimization problem (2) into a maximization problem.
The maximal correlation method is based on the opposite intuition: it picks out the subset of measuring vectors corresponding to the largest measurements. As the measurements are in ascending order, this subset can also be written down directly. The core idea of the maximal correlation method is searching for the vector that is most correlated with this subset, which is achieved by solving the following maximization problem.
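The two subset-based estimators just described can be sketched as follows; the subset fractions and all names are illustrative assumptions rather than the paper's exact parameter choices:

```python
import numpy as np

def null_vector_estimate(A, y, frac=0.5):
    """Null vector method (sketch): keep the fraction `frac` of measuring
    vectors with the smallest measurements (those nearly orthogonal to the
    signal) and return the eigenvector of the smallest eigenvalue."""
    order = np.argsort(y)                      # ascending measurements
    S = A[order[: int(frac * len(y))]]
    _, vecs = np.linalg.eigh(S.T @ S)
    return vecs[:, 0]                          # most-orthogonal direction

def max_correlation_estimate(A, y, frac=0.25):
    """Maximal correlation method (sketch): keep the vectors with the largest
    measurements and return the leading eigenvector."""
    order = np.argsort(y)
    S = A[order[int((1 - frac) * len(y)):]]
    _, vecs = np.linalg.eigh(S.T @ S)
    return vecs[:, -1]                         # most-correlated direction

rng = np.random.default_rng(1)
n, m = 20, 3000
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)
nv = null_vector_estimate(A, y)
mc = max_correlation_estimate(A, y)
```

Both estimates align with the true signal up to a global sign when enough measurements are available.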
From (2) and (3), one can observe that these two methods only utilize a subset of vectors that are either most correlated with or most orthogonal to the original signal. To exploit as much information as possible, we solve a composite problem combining (2) and (3).
We can observe that (4) has a symmetrical structure and can be interpreted as a modified version of (2), which adds the objective of (3) as a penalization term. The solution of (4) will be a more accurate estimate than that of (3), since it utilizes the information from (2).
We set the parameter in (4) accordingly. The true signal is roughly the solution of (2) and (3); thus, it is also an approximate solution of (4). In the next section, we will analyze the error of our proposed method.
The algorithm is presented in Algorithm 1. A small regularization term is added to ensure the positivity of the spectral matrix. The estimate in Step 4 is obtained by solving the following system of linear equations:
which can be solved by the conjugate gradient method, since M is positive definite and Hermitian, using the current iterate as the initializer.
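A minimal sketch of this inverse power step, with a plain conjugate gradient solver for the positive definite system (the test matrix and iteration counts are illustrative assumptions):

```python
import numpy as np

def conjugate_gradient(M, b, x0, tol=1e-10, max_iter=200):
    """Plain conjugate gradient for the symmetric positive definite system M x = b."""
    x = x0.astype(float).copy()
    r = b - M @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Mp = M @ p
        alpha = rs / (p @ Mp)
        x = x + alpha * p
        r = r - alpha * Mp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def inverse_power_step(M, v):
    """One inverse power iteration: solve M z = v (warm-started at v), normalize."""
    z = conjugate_gradient(M, v, v)
    return z / np.linalg.norm(z)

# Demo on an SPD matrix with a known spectrum: iterating converges to the
# eigenvector of the smallest eigenvalue (here, the first column of Q).
rng = np.random.default_rng(2)
n = 12
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = Q @ np.diag(np.linspace(1.0, 10.0, n)) @ Q.T
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
for _ in range(30):
    v = inverse_power_step(M, v)
```

Warm-starting the solver at the current iterate is what makes each inverse power step cheap in practice.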
Since a signal and its negation are indistinguishable for phase retrieval, we evaluate the estimate using the following metric.
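In the real case, such a sign-invariant metric can be sketched as follows; the normalization by the signal norm is an assumption, and the paper's exact definition in (6) may differ:

```python
import numpy as np

def rmse(z, x):
    """Relative error up to a global sign. The complex case would minimize
    over all unit-modulus phases instead of over {+1, -1}."""
    return min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)

# A sign flip is not counted as error:
x = np.array([3.0, 4.0])
err_flip = rmse(-x, x)
```

For instance, `err_flip` above is 0 even though the two vectors differ entrywise, reflecting the inherent sign ambiguity.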
The other initialization methods, e.g., the spectral method, the truncated spectral method, and the (weighted) maximal correlation method, are all based on the power iteration. The spectral method finds the leading eigenvector of its spectral matrix, which is also realized by the power approach. Chen proposed the truncated spectral method to improve the performance of the spectral method, constructing another matrix via a threshold parameter [17]. The corresponding matrix for the iteration is defined by the following.
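The truncation idea can be sketched as follows; the specific rule of discarding measurements whose intensity exceeds a threshold multiple of the mean intensity, and the parameter `alpha`, are illustrative assumptions:

```python
import numpy as np

def truncated_spectral_estimate(A, y, alpha=3.0):
    """Truncated spectral initialization (sketch): drop measurements whose
    intensity exceeds alpha^2 times the mean intensity, then take the leading
    eigenvector of the remaining weighted spectral matrix."""
    y2 = y**2
    keep = y2 <= alpha**2 * y2.mean()        # truncation by the threshold
    M = (A[keep] * y2[keep, None]).T @ A[keep] / len(y)
    _, vecs = np.linalg.eigh(M)
    return vecs[:, -1]

rng = np.random.default_rng(3)
n, m = 20, 2000
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)
est = truncated_spectral_estimate(A, y)
```

The truncation discards the heavy-tailed measurements that would otherwise dominate the spectral matrix.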
The numerical performance of these initialization methods will be compared later.
Algorithm 1 The composite initialization method.
Input: increasingly arranged measurements and the corresponding measuring vectors, a truncation value, a threshold value, and a random initialization.
Output: the initial estimate.
3. Theoretical Analysis
In this section, we will present the error estimate of the proposed method under Gaussian assumptions in the real case.
Since the measuring vectors are assumed to be i.i.d. Gaussian, we can assume without loss of generality that the true signal is a standard basis vector. Otherwise, there exists an orthogonal matrix mapping the true signal onto that basis vector; applying this matrix to the measuring vectors yields a problem identical to (1), because the transformed vectors are still Gaussian, owing to the invariance of the Gaussian distribution under orthogonal transforms.
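This invariance argument is easy to verify numerically: rotating both the measuring vectors and the signal by the same orthogonal matrix leaves the phaseless measurements unchanged (the dimensions below are arbitrary).

```python
import numpy as np

# If Q is orthogonal, |<a_i, x>| = |<Q a_i, Q x>|, so rotating both the
# measuring vectors and the signal leaves problem (1) unchanged.
rng = np.random.default_rng(6)
n, m = 8, 50
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthogonal matrix
y_original = np.abs(A @ x)
y_rotated = np.abs((A @ Q.T) @ (Q @ x))            # rotated vectors and signal
```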
The spectral matrix for orthogonality promoting method is the following.
By denoting , then can be written as follows:
where the following is the case.
The matrix for maximal correlation method is the following:
where the following is the case.
The constructed matrix for estimation is the following.
To proceed, we notice the following basic facts:
- 1.
- The orthogonality promoting method finds the eigenvector of the minimum eigenvalue of the spectral matrix.
- 2.
- In the ideal case that the perturbation term is a zero matrix, the spectral matrix degenerates to the following. If we can also ensure that it is positive definite, the eigenvector corresponding to the smallest eigenvalue is exactly the true signal.
These two facts inspire us to estimate the error by computing the eigenvector of matrix after adding perturbation . Thus, the outline of theoretical analysis is concluded as follows:
- 1.
- Estimate , where and are the minimum eigenvalue of and , respectively;
- 2.
- With the eigenvalue perturbation in hand, we can then compute the perturbation of the corresponding eigenvector, which is the exact error of our algorithm.
Specifically, we have the following roadmap of theoretical analysis. Section 3.1 presents bound results for each component of the spectral matrix . Using the results in Section 3.1, the bounds of can then be easily obtained in Section 3.2. The relationship between perturbation of eigenvalues and eigenvectors is presented in Section 3.3, which finally induces the error estimation of our algorithm formally in Section 3.4.
3.1. Analysis of Each Component of the Spectral Matrix
In this part, we will provide bounds for the variables involved in the matrix, which are the basic ingredients for estimating the perturbations of the eigenvalue and the eigenvector. In particular, the relevant upper bound, lower bound, and matrix norm will be analysed.
3.1.1. Upper Bound of
Finding the required upper bound actually consists in finding an upper bound for the numerator and a lower bound for the denominator. First, we have the upper bound in the statistical sense in the following lemma.
Lemma 1.
We have the following:
where .
The proof of Lemma 1 is placed in the Appendix A and Appendix B. As for the lower bound of , we borrow a result from Wang [13], Lemma 3.
Lemma 2.
The following holds with probability exceeding :
provided that for some absolute constants , and .
Then, we obtain the required upper bound.
Lemma 3.
Under the set-up of Lemmas 1 and 2, we have the following:
with probability of at least
where , , and denote and , respectively.
3.1.2. Lower Bound of the Smallest Eigenvalue of
Since is a linear combination of and , the lower bound of the smallest eigenvalue of can be estimated from the bounds of eigenvalues of and .
According to (10) and (12), we rewrite and as the following:
where the component matrices are termed Gaussian matrices here, since their entries are all sampled from the Gaussian distribution. We notice that the two quadratic forms are Wishart matrices, which can be written as the product of a Gaussian matrix and its adjoint [18]. Moreover, the extreme eigenvalues of a Wishart matrix obey the following classical result.
Theorem 1
([19], Corollary 5.35). Let A be an N × n matrix whose entries are independent standard normal random variables. Then, for every t ≥ 0, with a probability of at least 1 − 2 exp(−t²/2), the following events hold simultaneously:
√N − √n − t ≤ s_min(A) ≤ s_max(A) ≤ √N + √n + t,
where s_min(A) and s_max(A) stand for the smallest and the largest singular value of A, respectively.
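Theorem 1 is straightforward to check empirically; the dimensions and the choice t = 3 below are arbitrary.

```python
import numpy as np

# Empirical check of Theorem 1: for an N x n standard Gaussian matrix,
# sqrt(N) - sqrt(n) - t <= s_min(A) <= s_max(A) <= sqrt(N) + sqrt(n) + t
# holds with probability at least 1 - 2 exp(-t^2 / 2).
rng = np.random.default_rng(4)
N, n, t = 4000, 50, 3.0
A = rng.standard_normal((N, n))
s = np.linalg.svd(A, compute_uv=False)     # all singular values of A
lower = np.sqrt(N) - np.sqrt(n) - t
upper = np.sqrt(N) + np.sqrt(n) + t
```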
By applying Theorem 1 to with the following replacements:
- (1)
- ,
- (2)
- , ,
- (3)
- ,
one can see that the following is the case.
In the theoretical analysis, we assume that the number of measurements is large enough. However, it should be observed that a considerably smaller setting suffices to output reasonable estimates, as shown in the numerical tests. Based on this assumption, one then has the following.
The above holds with probability exceeding , where . Similarly, we have the following:
where .
Hence, we obtain the lower bound for the smallest eigenvalue of G.
A necessary condition for the validity of our method is that this lower bound is positive, which can be satisfied by choosing the parameters properly.
3.1.3. The Upper Bound of
Now, let us turn to the estimation of the norm of . We denote each item of as .
Since the relevant variable obeys a normal distribution, we can derive the following.
The squared variable then obeys a chi-squared distribution, which is a sub-exponential distribution. With this notation, we can rewrite the expression as the following.
The expectation of is obviously .
The variance is also needed in order to obtain the required upper bound. To this end, we recall the Bernstein-type inequality for sub-exponential random variables.
Theorem 2.
Let be i.i.d. centered sub-exponential random variables with sub-exponential norm denoted as K. Then, for every , we have the following:
where is an absolute constant. K is the so-called sub-exponential norm defined as the following.
Define a centered sub-exponential random variable accordingly. Its sub-exponential norm is computed in Appendix C, which shows that the norm is bounded in our case. By substituting these quantities into Theorem 2, one can then conclude the following result.
Theorem 3.
Since , let , then we have the following:
where is an absolute constant.
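The concentration that Theorems 2 and 3 formalize can be observed empirically: averages of centered chi-squared variables, which are sub-exponential, concentrate around zero at the expected 1/√N rate. The sample sizes below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
N, trials = 2000, 500
# chi-squared with 1 degree of freedom, centered at its mean 1:
# a canonical sub-exponential random variable
samples = rng.chisquare(df=1, size=(trials, N)) - 1.0
means = samples.mean(axis=1)
# Var(chi^2_1) = 2, so the std of each empirical mean is sqrt(2 / N)
predicted_std = np.sqrt(2.0 / N)
```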
Gathering these results, we have the following theorem.
Theorem 4.
The above holds with the probability stated below:
with some absolute constant .
3.2. Estimate of
To estimate the eigenvalue perturbation, we first recall a classical result from matrix perturbation theory [20].
Theorem 5.
Let the following be the case:
Let the above be Hermitian and set over all and , where stands for the set of eigenvalue of . Then, the following is the case:
where and are the j-th smallest one among the eigenvalues of and , respectively.
To ensure the effectiveness of our algorithm, the target eigenvalue should remain the minimum eigenvalue after the perturbation is added in our case. The eigenvalue perturbation is then simply bounded by (25). Hence, it can be estimated by the following.
3.3. Computing
With the upper bound of the eigenvalue perturbation, we can now estimate the perturbation of the eigenvector. The eigenvector of the minimum eigenvalue is computed by solving the corresponding eigenvalue equation:
where denotes the identity matrix here. Without loss of generality, the solution of (36) can be written in the form of . Then, we have the following.
Therefore, can be bounded by the following:
where (38) is derived from (37). The form of (39) is somewhat complicated, as stated below.
3.4. Main Result
Notice that this quantity is exactly the estimation error. It is then straightforward to deduce our main theorem by combining the above results.
Theorem 6 (Main result).
Consider the problem of estimating arbitrary from m phaseless measurements (1) under the Gaussian assumption. Suppose the output of Algorithm 1 is . If , , , and , then we have the following error estimate for the composite initialization method:
with a probability of at least the following:
where the following is the case,
for and some absolute constants .
Probability P in (42) can be negative when m and n are too small, which reveals a limitation of our analysis: m and n need to be large enough to render the probability meaningful. In the extreme case that the relevant terms are close to 0, our error estimate (41) can be approximated by a simpler form:
which is verified by numerical experiments later.
4. Numerical Results
In this section, we test the accuracy of the proposed method and compare it with other methods, including the spectral method, the truncated spectral method, the reweighted maximal correlation method, the null vector method, and the orthogonality promoting method. The sampling vectors and the original signal are randomly and independently generated. To eliminate the influence of the error introduced by norm estimation, the original signal is normalized. We chose fixed parameters for the proposed method in the numerical experiments. All the following simulation results are averaged over 80 independent Monte Carlo realizations.
Figure 1 plots the RMSE calculated by (6) versus the oversampling rate for the mentioned initialization methods. Obviously, all methods exhibit better performance as the oversampling rate increases. In particular, the proposed initialization method outperforms the other methods. When the oversampling rate is large, the composite initialization method performs nearly as well as the null vector method, and the convergence behavior coincides with (45). When the oversampling rate is small, the proposed method does not lose accuracy as dramatically as the null vector method does. The proposed algorithm provides the most accurate estimate when the oversampling rate is below the information limit.
Figure 1.
RMSE vs. oversampling rate of several initialization methods. The parameters for the composite initialization method are as stated in the text, while for the other algorithms, the involved parameters are selected according to the related articles.
Figure 2 illustrates the importance of the initialization method in improving the success rate of the TAF algorithm [12]. In each simulation, each initialization generates an estimate used as the initializer for the TAF algorithm. A trial is considered successful if the TAF algorithm returns a result with RMSE less than the prescribed tolerance. When the number of measurements is large, all the presented initialization methods ensure an almost 100% success rate. However, as it decreases, the differences in the success rates of the various methods gradually appear. Therefore, the composite initialization method can help TAF achieve a better success rate compared with the other methods.
Figure 2.
Empirical success rate of different initialization methods versus the number of measurements, with the oversampling rate varying from 1 to 4 for TAF.
Table 1 reports the CPU time needed for TAF using our method as an initializer, compared with two other typical initialization methods. To distinguish the performance more clearly, the length of the signal is set to a large number. The proposed method and the null vector method need to solve a linear system of equations at each inverse power computation, which makes them more time-consuming than the power computation of the maximal correlation method. However, the proposed method provides a more accurate initializer, which helps TAF converge faster. Hence, the overall efficiency of our algorithm is not far behind the power-type methods, as shown in Table 1.
Table 1.
CPU time (s) for TAF using the proposed initialization method compared with two other typical initializers.
5. Conclusions
This paper proposes a new initialization method that combines the advantages of two methods built on two completely opposite intuitions. Both theoretical analysis and numerical experiments indicate the validity of our method and its higher accuracy compared with other methods. Future work will focus on extending our initialization algorithm to more generalized PR problems, e.g., the quadratic sensing and matrix reconstruction problems [21].
Author Contributions
Q.L. conceptualization, methodology, validation, and writing—original draft preparation; S.L. formal analysis, software, and writing—review and editing; H.W. supervision, project administration, and resources. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (Grant 61977065) and the National Key Research and Development Plan (Grant 2020YFA0713504).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Upper Bound of H1
is the average of the smallest elements of the set . As has been sorted in ascending order, we have the following.
By the Gaussian assumption, , obeys the following chi-squared distribution with probability function:
where is the normalization constant. The cumulative distribution function is the following.
Let the truncation level and its probability p be defined as above. The estimation hinges on the value of the truncation level. However, the density is not explicitly integrable; hence, we cannot derive an explicit expression in terms of p. To this end, we first calculate the following bounds using some basic inequalities, as the following two lemmas show. The detailed proofs are placed in Appendix B.
Lemma A1 (Upper bound of ).
Let , then for , we have the following.
Lemma A2 (Lower bound of ).
Let , then for , we have the following.
Then, we turn back to the estimation of . Notice that can be approximately regarded as random variables sampled from a bounded chi-squared distribution. Therefore, we first provide an upper bound of the largest element of , namely, . We prove the following result.
Theorem A1.
Assume that . We have the following.
Proof.
Let , which satisfies the following:
as the probability function is monotonically decreasing. Since and (A3), we have the following.
Define the following indicator random variables:
where i.i.d. obeys the chi-squared distribution with the probability function (A1) and the characteristic function of the following.
The event means that at least measurements are larger than . Therefore, we have the following.
The expectation of is then
The indicator variables are bounded, and the upper bound of their sum can be provided by the well-known Hoeffding's inequality.
Lemma A3 (Hoeffding’s inequality).
Let be i.i.d. random variables bounded by the interval . Then, the following is the case.
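Hoeffding's bound can be illustrated numerically; the uniform variables and the deviation level t below are arbitrary choices, with the bound 2 exp(−2Nt²/(b−a)²) computed explicitly.

```python
import numpy as np

# Hoeffding: for i.i.d. X_i in [a, b],
# P(|mean - E[X]| > t) <= 2 exp(-2 N t^2 / (b - a)^2).
rng = np.random.default_rng(7)
N, trials, t = 1000, 2000, 0.05
X = rng.uniform(0.0, 1.0, size=(trials, N))   # bounded in [0, 1], mean 1/2
dev = np.abs(X.mean(axis=1) - 0.5)
emp_prob = (dev > t).mean()                   # empirical tail probability
hoeffding_bound = 2 * np.exp(-2 * N * t**2)
```

The empirical tail probability never exceeds the bound; for bounded variables the bound is often quite loose, which is why the sharper Bernstein-type inequality is used elsewhere in the analysis.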
Let us consider the i.i.d. bounded random variables of the following.
Since is larger than , we can provide an upper bound of by finding the upper bound of .
The expectation of is the following:
with the following.
After computing the expectation of , we can bound by Hoeffding’s inequality for bounded random variables similarly.
Since , we can obtain the following result by replacing in Lemma A3.
Hence, we have the following.
Therefore, we obtain the upper bound in the statistical sense.
Appendix B. The Bound Estimation of τ *
Appendix B.1. The Upper Bound for τ *
Construct the following function.
Then, it can be proved that the following is the case.
Therefore, we have the following.
Define . Then, it is obvious that the following is the case.
Since and are both strictly monotone decreasing over , their inverse functions satisfy the following:
where is about .
In the orthogonality promoting approach using the inverse power method, the truncation ratio is always less than 1/2 to achieve tolerable performance. Therefore, in our concerned range, it is feasible to estimate the quantile by the constructed function.
Now, we compute .
Then, we obtain by solving .
Discarding the unreasonable solution with excessive magnitude, we have the following.
Since and , we can determine the following upper bound of .
Appendix B.2. Lower Bound for τ *
Similarly, consider the following function.
Define . Then, it is obvious that the following is the case.
Therefore, we have the following.
Appendix C. The Bound of K
The sub-exponential norm of is defined by the following.
With the notation above, integrating by parts yields the following.
Noticing that , we have the following.
It is also obvious that the mean value is 1. Therefore, the centered variable obeys a centered sub-exponential distribution, since its mean value is 0 and its sub-exponential norm is finite.
References
- Miao, J.; Charalambous, P.; Kirz, J.; Sayre, D. Extending the methodology of X-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens. Nature 1999, 400, 342.
- Shechtman, Y.; Eldar, Y.C.; Cohen, O.; Chapman, H.N.; Miao, J.; Segev, M. Phase retrieval with application to optical imaging: A contemporary overview. IEEE Signal Process. Mag. 2015, 32, 87–109.
- Stefik, M. Inferring DNA structures from segmentation data. Artif. Intell. 1978, 11, 85–114.
- Fienup, C.; Dainty, J. Phase retrieval and image reconstruction for astronomy. Image Recover. Theory Appl. 1987, 231, 275.
- Luo, Q.; Wang, H. The Matrix Completion Method for Phase Retrieval from Fractional Fourier Transform Magnitudes. Math. Probl. Eng. 2016, 2016, 4617327.
- Candes, E.J.; Strohmer, T.; Voroninski, V. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 2013, 66, 1241–1274.
- Waldspurger, I.; d’Aspremont, A.; Mallat, S. Phase recovery, MaxCut and complex semidefinite programming. Math. Program. 2015, 149, 47–81.
- Netrapalli, P.; Jain, P.; Sanghavi, S. Phase retrieval using alternating minimization. IEEE Trans. Signal Process. 2015, 63, 4814–4826.
- Elser, V. Phase retrieval by iterated projections. JOSA A 2003, 20, 40–55.
- Candes, E.J.; Li, X.; Soltanolkotabi, M. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inf. Theory 2015, 61, 1985–2007.
- Zhang, H.; Liang, Y. Reshaped Wirtinger flow for solving quadratic system of equations. Adv. Neural Inf. Process. Syst. 2016, 29, 2622–2630.
- Wang, G.; Giannakis, G.; Saad, Y.; Chen, J. Solving most systems of random quadratic equations. arXiv 2017, arXiv:1705.10407.
- Wang, G.; Giannakis, G.B.; Saad, Y.; Chen, J. Phase retrieval via reweighted amplitude flow. IEEE Trans. Signal Process. 2018, 66, 2818–2833.
- Luo, Q.; Wang, H.; Lin, S. Phase retrieval via smoothed amplitude flow. Signal Process. 2020, 177, 107719.
- Luo, Q.; Lin, S.; Wang, H. Robust phase retrieval via median-truncated smoothed amplitude flow. In Inverse Problems in Science and Engineering; Taylor & Francis: Abingdon-on-Thames, UK, 2021; pp. 1–17.
- Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course; Springer Science: New York, NY, USA, 2013; Volume 87.
- Chen, P.; Fannjiang, A.; Liu, G.R. Phase retrieval by linear algebra. SIAM J. Matrix Anal. Appl. 2017, 38, 854–868.
- Wishart, J. The generalised product moment distribution in samples from a normal multivariate population. Biometrika 1928, 20, 32–52.
- Vershynin, R. Introduction to the non-asymptotic analysis of random matrices. arXiv 2010, arXiv:1011.3027.
- Li, C.K.; Li, R.C. A note on eigenvalues of perturbed Hermitian matrices. Linear Algebra Its Appl. 2005, 395, 183–190.
- Chi, Y.; Lu, Y.M.; Chen, Y. Nonconvex optimization meets low-rank matrix factorization: An overview. IEEE Trans. Signal Process. 2019, 67, 5239–5269.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

