Article

A Composite Initialization Method for Phase Retrieval

Qi Luo, Shijian Lin and Hongxia Wang
1 Department of Mathematics, National University of Defense Technology, Changsha 410000, China
2 College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410000, China
* Author to whom correspondence should be addressed.
Symmetry 2021, 13(11), 2006; https://doi.org/10.3390/sym13112006
Submission received: 16 September 2021 / Revised: 18 October 2021 / Accepted: 19 October 2021 / Published: 23 October 2021
(This article belongs to the Special Issue Symmetry in Numerical Analysis and Numerical Methods)

Abstract: Phase retrieval is a classical inverse problem of recovering a signal from a system of phaseless constraints. Many recently proposed methods for phase retrieval, such as PhaseMax and gradient-descent algorithms, enjoy benign theoretical guarantees provided that an accurate estimate of the true solution is supplied. Current initialization methods do not perform well when the number of measurements is low, which deteriorates the success rate of current phase retrieval methods. We propose a new initialization method that combines the advantages of the null vector method and the maximal correlation method and yields an estimate of the original signal with uniformly higher accuracy. The spectral matrix constructed for the proposed initialization method has a simple and symmetrical form. A lower error bound is proved theoretically and verified numerically.

1. Introduction

Phase retrieval (PR) has many applications in science and engineering, including X-ray crystallography [1], molecular imaging [2], biological imaging [3], and astronomy [4]. Mathematically, phase retrieval is the problem of finding a signal $x \in \mathbb{R}^n$ or $\mathbb{C}^n$ satisfying the following phaseless constraints:
$$b_i = |\langle a_i, x \rangle| + \epsilon_i, \quad i = 1, \ldots, m, \qquad (1)$$
where $b_i \in \mathbb{R}$ is the observed measurement, $a_i \in \mathbb{R}^n$ or $\mathbb{C}^n$ is the measuring vector, and $\epsilon_i$ denotes noise. In optics and quantum physics, $a_i$ is related to the Fourier transform or the fractional Fourier transform [5]. To remove the theoretical barrier, recent works also focus on generic measurement vectors, namely $a_i \in \mathbb{R}^n$ independently sampled from $N(0, I_{n \times n})$.
Current methods for solving phase retrieval can be generally divided into two groups: convex and non-convex approaches. Convex methods, e.g., PhaseLift [6] and PhaseCut [7], utilize a semidefinite relaxation to transform the original problem into a convex one. Nevertheless, the heavy computational cost of these methods hinders their application to practical scenarios. On the other hand, non-convex methods, including the Gerchberg–Saxton (GS) algorithm [8,9], the (truncated) Wirtinger flow (WF/TWF) [10,11], and the (truncated) amplitude flow (AF/TAF), have a significant advantage in lower computational cost and sampling complexity (i.e., the minimal $m/n$ required to achieve successful recovery) [12]. More recently, two variants of AF using a reweighting [13] and a smoothing technique [14,15] have been proposed to further lower the sampling complexity. However, for the mentioned non-convex methods, the rate of successful reconstruction relies on an elaborate initialization. Moreover, an initialization point near the true solution is also a prerequisite for the convergence guarantees in the theoretical analysis.

1.1. Prior Art

Current initialization methods for the PR problem include the spectral method [16], the orthogonality promoting method [12], the null vector method [17], and the (weighted) maximal correlation method [13]. These methods first construct a spectral matrix of the form
$$M = \sum_{i=1}^{m} w_i\, a_i a_i^H,$$
where each weight $w_i$ is a positive real number. The estimate is then given as the (scaled) eigenvector corresponding to the largest or smallest eigenvalue of $M$. The spectral method and the (weighted) maximal correlation method fall into one category, as they seek the leading eigenvector of $M$ as the estimate. We categorize these as correlation-focused methods, since their spectral matrices give more weight to the measuring vectors that are more correlated with the original signal. Conversely, the null vector and orthogonality promoting methods can be regarded as orthogonality-focused methods: for a vector $a_i$ with less correlation to the original signal, the corresponding weight $w_i$ in the spectral matrix is larger.
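To make the two categories just described concrete, the following minimal sketch (ours, not the authors' code; the weight choices are illustrative) builds one spectral matrix of each type for real Gaussian measurements and extracts the corresponding eigenvector estimate:

```python
# Sketch of the two spectral-matrix families; weights here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 600
x0 = rng.standard_normal(n)
x0 /= np.linalg.norm(x0)
A = rng.standard_normal((m, n))    # rows are the measuring vectors a_i
b = np.abs(A @ x0)                 # noiseless phaseless measurements

def spectral_matrix(A, w):
    # M = (1/len(w)) * sum_i w_i a_i a_i^T, written as a weighted Gram matrix
    return (A * w[:, None]).T @ A / len(w)

# Correlation-focused: heavier weight on rows with large b_i; use the LARGEST eigenvector.
v_corr = np.linalg.eigh(spectral_matrix(A, b**2))[1][:, -1]

# Orthogonality-focused: keep only the rows with the smallest b_i; use the SMALLEST eigenvector.
idx = np.argsort(b)[:n]
v_orth = np.linalg.eigh(spectral_matrix(A[idx], np.ones(n)))[1][:, 0]
```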
These two types of initialization methods each have their own merits and demerits. The correlation-focused methods perform better when the oversampling rate $m/n$ is low; conversely, when $m/n$ is large enough, the orthogonality-focused methods provide a more accurate estimate.

1.2. This Work

In this paper, we propose a method combining the advantages of the above two types of initialization methods. Our intuition is to construct a composite spectral matrix, which is a weighted sum of the spectral matrices of the correlation-focused and orthogonality-focused methods. Hence, the new method is termed the composite initialization method.

1.3. Article Organization and Notations

This paper adopts the following notation. Bold lower-case letters denote column vectors, e.g., $\mathbf{b}$, $\mathbf{z}$. Bold capital letters such as $\mathbf{A}$ denote matrices. $(\cdot)^T$ and $(\cdot)^H$ stand for the transpose and the conjugate transpose, respectively. Calligraphic letters such as $\mathcal{I}$ denote an index set, and $|\mathcal{I}|$ denotes the number of elements of $\mathcal{I}$. $\{a_i\}_{i=1}^m$ is shorthand for a set of elements of the form $\{a_1, \ldots, a_m\}$. $\|x\|_2$ denotes the $\ell_2$-norm of $x$; we write $\|x\|$ for $\|x\|_2$ for simplicity, unless otherwise specified.
The remainder of this article is organized as follows. The algorithm is given in Section 2. Section 3 provides the theoretical error analysis of the proposed initialization method, with some technical error bounds placed in the Appendices. Section 4 illustrates the numerical performance and compares the proposed method with other initialization methods.

2. The Formulation of the Composite Initialization Method

This section provides the formulation of the proposed method. Without loss of generality, we assume that $b_1 \le b_2 \le \cdots \le b_m$. The measuring vectors $\{a_i\}_{i=1}^m$ are assumed to be independently sampled centered Gaussian vectors with identity covariance for the convenience of theoretical analysis. Let $x_0$ be the true solution of (1). For concreteness, we focus on the real-valued Gaussian model. We further assume that $\|x_0\| = 1$, since this investigation focuses on the estimation of the original signal. In practice, $\|x_0\|$ can be estimated via $\|x_0\|^2 \approx \frac{1}{m}\sum_{i=1}^{m} b_i^2$. To begin with, we describe the null vector method and the maximal correlation method, which form the basis of our algorithm.
The null vector method first picks out a subset of vectors $\{a_i\}_{i \in \mathcal{I}_1}$, where $\mathcal{I}_1 \subseteq \{1, 2, \ldots, m\}$. It then approximates the original signal $x_0$ by the vector that is most orthogonal to this subset. Since $\{b_i\}_{i=1}^m$ is in ascending order, $\mathcal{I}_1$ can be written directly as $\{1, 2, \ldots, T_1\}$, where $T_1$ is the cardinality of $\mathcal{I}_1$. The intuition behind the null vector method is simple: since $\sum_{i \in \mathcal{I}_1} x^H a_i a_i^H x$ takes the very small value $\sum_{i \in \mathcal{I}_1} b_i^2$ when $x = x_0$, one can estimate $x_0$ by the following minimization problem, which reflects this property:
$$\hat{x} = \arg\min_{\|z\|=1}\; z^H \left( \frac{1}{T_1} \sum_{i=1}^{T_1} a_i a_i^H \right) z. \qquad (2)$$
Solving (2) is equivalent to finding the smallest eigenvalue and the corresponding eigenvector of $M_1 := \frac{1}{T_1} \sum_{i=1}^{T_1} a_i a_i^H$. The so-called orthogonality promoting method proposed by Wang [12] is an approximately equivalent method that transforms the minimization problem (2) into a maximization problem.
The maximal correlation method is based on the opposite intuition: it picks out the subset of vectors $\{a_i\}_{i \in \mathcal{I}_2}$ corresponding to the largest $T_2$ elements of $\{b_i\}_{i=1}^m$. As $\{b_i\}_{i=1}^m$ is in ascending order, $\mathcal{I}_2$ can also be written directly as $\{m - T_2 + 1, \ldots, m\}$. The core idea of the maximal correlation method is to search for the vector most correlated with $\{a_i\}_{i \in \mathcal{I}_2}$, which is achieved by solving the following maximization problem:
$$\hat{x} = \arg\max_{\|z\|=1}\; \frac{1}{T_2} \sum_{i=m-T_2+1}^{m} |\langle a_i, z \rangle|^2 = \arg\max_{\|z\|=1}\; z^H \left( \frac{1}{T_2} \sum_{i=m-T_2+1}^{m} a_i a_i^H \right) z. \qquad (3)$$
From (2) and (3), one can observe that each of these two methods utilizes only a subset of the vectors, either those most correlated or those most orthogonal to the original signal. To use as much information as possible, we solve a composite problem combining (2) and (3):
$$\hat{x} = \arg\min_{\|z\|=1}\; z^H \left( \frac{1}{T_1} \sum_{i=1}^{T_1} a_i a_i^H \right) z - \alpha\, z^H \left( \frac{1}{T_2} \sum_{i=m-T_2+1}^{m} a_i a_i^H \right) z. \qquad (4)$$
We can observe that (4) has a symmetrical structure and can be interpreted as a modified version of (2) that adds the objective of (3) as a penalization term. The solution of (4) is a more accurate estimate than that of (2) alone, since it also utilizes the information in (3).
Let $H_1 = \frac{1}{T_1}\sum_{i=1}^{T_1} b_i^2$ and $H_2 = \frac{1}{T_2}\sum_{i=m-T_2+1}^{m} b_i^2$. We set the parameter in (4) as $\alpha = H_1/H_2$. The true signal $x_0$ is roughly the solution of both (2) and (3); thus, it is also the approximate solution of (4). In the next section, we analyze the error of the proposed method.
The algorithm is presented in Algorithm 1. The term $2\alpha I$ is added to ensure the positive definiteness of $M$. The iterate $x_{k+1}$ in Step 4 is obtained by solving the following system of linear equations:
$$M x = x_k, \qquad (5)$$
which can be solved by the conjugate gradient method, since $M$ is positive definite and Hermitian, using $x_{k-1}$ as the initial guess.
Since $x_0$ and $e^{i\phi} x_0$ are indistinguishable in phase retrieval, we evaluate the estimate using the following metric:
$$\mathrm{dist}(\tilde{x}, x_0) = \min_{\phi \in [0, 2\pi)} \left\| e^{i\phi} \tilde{x} - x_0 \right\|. \qquad (6)$$
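For reference, (6) admits a closed-form evaluation; below is a small helper (our sketch) that computes it for real or complex signals. The optimal phase is $e^{i\phi} = \langle \tilde{x}, x_0 \rangle / |\langle \tilde{x}, x_0 \rangle|$, which reduces to a sign choice in the real case.

```python
# Phase-invariant distance (6); works for real and complex vectors.
import numpy as np

def dist(x_tilde, x0):
    inner = np.vdot(x_tilde, x0)          # conjugate-linear in the first argument
    phase = inner / abs(inner) if abs(inner) > 0 else 1.0
    return np.linalg.norm(phase * x_tilde - x0)
```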
The other initialization methods, e.g., the spectral method, the truncated spectral method, and the (weighted) maximal correlation method, are all based on the power iteration. The spectral method finds the leading eigenvector of the matrix $\sum_{i=1}^{m} b_i^2\, a_i a_i^T$, which is also realized by the power approach. Chen proposed the truncated spectral method to improve the performance of the spectral method, which constructs another matrix via a threshold parameter $\tau$ [17]. The corresponding matrix for iteration is $\sum_{i=1}^{m} 1_\tau(b_i)\, b_i^2\, a_i a_i^H$, where $1_\tau$ is defined by
$$1_\tau(b_i) = \begin{cases} 1, & \text{if } b_i \le \tau \|b\|_2 / \sqrt{m}; \\ 0, & \text{otherwise}. \end{cases} \qquad (7)$$
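A compact sketch of this truncation (ours, assuming the threshold form reconstructed in (7); $\tau$ is user-chosen) looks as follows:

```python
# Truncated spectral matrix: discard rows whose measurement exceeds the threshold.
import numpy as np

def truncated_spectral_matrix(A, b, tau=3.0):
    keep = b <= tau * np.linalg.norm(b) / np.sqrt(len(b))   # indicator 1_tau(b_i)
    Ak, bk = A[keep], b[keep]
    return (Ak * (bk**2)[:, None]).T @ Ak / len(b)          # feed to power iteration
```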
The numerical performance of these initialization methods will be compared later.
Algorithm 1 The composite initialization method.
Input: Increasingly arranged $\{b_i\}_{i=1}^m$ and corresponding $\{a_i\}_{i=1}^m$, truncation values $T_1$, $T_2$, threshold value $\epsilon$, a random initialization $x_1$.
1: $M_1 = \frac{1}{T_1}\sum_{i=1}^{T_1} a_i a_i^H$, $M_2 = \frac{1}{T_2}\sum_{i=m-T_2+1}^{m} a_i a_i^H$, $\alpha = \big(\frac{1}{T_1}\sum_{i=1}^{T_1} b_i^2\big) \big/ \big(\frac{1}{T_2}\sum_{i=m-T_2+1}^{m} b_i^2\big)$
2: $M = M_1 - \alpha M_2 + 2\alpha I$
3: for $k = 1, 2, \ldots$ do
4:   $x_{k+1} \leftarrow M^{-1} x_k$
5:   $x_{k+1} \leftarrow x_{k+1} / \|x_{k+1}\|$
6:   if $\|x_{k+1} - x_k\| \le \epsilon$, break
7: end for
Output: $\tilde{x} = x_{k+1} / \|x_{k+1}\|$.
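For readers who prefer code, here is a direct small-scale transcription of Algorithm 1 (our sketch, not the authors' implementation). Step 4 is written as a dense solve for brevity; for large $n$, it would be replaced by conjugate gradients warm-started at the previous iterate, as discussed above.

```python
# A compact implementation of Algorithm 1 via inverse power iteration.
import numpy as np

def composite_init(A, b, T1, T2, eps=1e-6, max_iter=500, seed=0):
    m, n = A.shape
    order = np.argsort(b)                        # arrange measurements in ascending order
    A, b = A[order], b[order]
    M1 = A[:T1].T @ A[:T1] / T1                  # most-orthogonal rows
    M2 = A[m - T2:].T @ A[m - T2:] / T2          # most-correlated rows
    alpha = np.mean(b[:T1]**2) / np.mean(b[m - T2:]**2)
    M = M1 - alpha * M2 + 2 * alpha * np.eye(n)  # shift keeps M positive definite
    x = np.random.default_rng(seed).standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(max_iter):
        x_new = np.linalg.solve(M, x)            # Step 4: x_{k+1} <- M^{-1} x_k
        x_new /= np.linalg.norm(x_new)           # Step 5: normalize
        done = np.linalg.norm(x_new - x) <= eps  # Step 6: stopping test
        x = x_new
        if done:
            break
    return x    # in practice, rescale by sqrt(mean(b**2)) to estimate ||x0||
```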

3. Theoretical Analysis

In this section, we will present the error estimate of the proposed method under Gaussian assumptions in the real case.
Since the measuring vectors are assumed to be i.i.d. Gaussian, we can assume that $x_0 = e_1$ without loss of generality, where $e_1 = [1, 0, \ldots, 0]^T$. Otherwise, if $x_0 \ne e_1$, there exists an orthogonal matrix $P$ satisfying $P x_0 = e_1$. Denote $A = [a_1, a_2, \ldots, a_m]$; then $A^T x = A^T P^T P x$. Letting $z = P x$, the original problem (1) is identical to $|(PA)^T z| = b$, because $PA$ is composed of Gaussian vectors due to the invariance of the Gaussian distribution under orthogonal transforms.
The spectral matrix for the orthogonality promoting method is
$$M_1 = \frac{1}{T_1} \sum_{i=1}^{T_1} a_i a_i^T. \qquad (8)$$
Denoting $d_i = [a_{i2}, a_{i3}, \ldots, a_{in}]^T$, $M_1$ can be written as
$$M_1 = \frac{1}{T_1} \sum_{i=1}^{T_1} \begin{pmatrix} a_{i1}^2 & a_{i1} d_i^T \\ a_{i1} d_i & d_i d_i^T \end{pmatrix} = \begin{pmatrix} H_1 & E_1^T \\ E_1 & G_1 \end{pmatrix}, \qquad (9)$$
where
$$H_1 = \frac{1}{T_1}\sum_{i=1}^{T_1} a_{i1}^2, \quad E_1 = \frac{1}{T_1}\sum_{i=1}^{T_1} a_{i1} d_i, \quad G_1 = \frac{1}{T_1}\sum_{i=1}^{T_1} d_i d_i^T. \qquad (10)$$
The matrix for the maximal correlation method is
$$M_2 = \frac{1}{T_2} \sum_{i=m-T_2+1}^{m} \begin{pmatrix} a_{i1}^2 & a_{i1} d_i^T \\ a_{i1} d_i & d_i d_i^T \end{pmatrix} = \begin{pmatrix} H_2 & E_2^T \\ E_2 & G_2 \end{pmatrix}, \qquad (11)$$
where
$$H_2 = \frac{1}{T_2}\sum_{i=m-T_2+1}^{m} a_{i1}^2, \quad E_2 = \frac{1}{T_2}\sum_{i=m-T_2+1}^{m} a_{i1} d_i, \quad G_2 = \frac{1}{T_2}\sum_{i=m-T_2+1}^{m} d_i d_i^T. \qquad (12)$$
The matrix constructed for estimation is
$$M = M_1 - \alpha M_2, \quad \alpha = H_1 / H_2. \qquad (13)$$
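As a quick numerical sanity check (ours), the choice $\alpha = H_1/H_2$ makes the top-left entry of $M$ vanish exactly when $x_0 = e_1$, which is the structure exploited below:

```python
# Verify that M[0, 0] = H1 - alpha * H2 = 0 by construction when x0 = e1.
import numpy as np

rng = np.random.default_rng(1)
n, m, T1, T2 = 20, 200, 40, 60
A = rng.standard_normal((m, n))
b = np.abs(A[:, 0])                    # x0 = e1, so b_i = |a_i1|
order = np.argsort(b)
A, b = A[order], b[order]
H1, H2 = np.mean(b[:T1]**2), np.mean(b[m - T2:]**2)
M = A[:T1].T @ A[:T1] / T1 - (H1 / H2) * (A[m - T2:].T @ A[m - T2:] / T2)
print(M[0, 0])                         # ~ 1e-16: zero up to rounding
```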
To proceed, we notice the following basic facts:
1. The proposed method finds the eigenvector of the minimum eigenvalue of $M$. By the choice $\alpha = H_1/H_2$, the top-left entry $H_1 - \alpha H_2$ vanishes, so
$$M = \begin{pmatrix} 0 & E_1^T - \alpha E_2^T \\ E_1 - \alpha E_2 & G_1 - \alpha G_2 \end{pmatrix} := \begin{pmatrix} 0 & E^T \\ E & G \end{pmatrix}. \qquad (14)$$
2. In the ideal case that $E$ is a zero matrix, $M$ degenerates to
$$\tilde{M} = \begin{pmatrix} 0 & 0 \\ 0 & G \end{pmatrix}. \qquad (15)$$
If we can also ensure that $G = G_1 - \alpha G_2$ is positive definite, the eigenvector corresponding to the smallest eigenvalue is exactly the true signal.
These two facts inspire us to estimate the error by computing the eigenvector of the matrix $\tilde{M}$ after adding the perturbation $E = E_1 - \alpha E_2$. The outline of the theoretical analysis is thus as follows:
1. Estimate $\Delta\lambda = \lambda - \tilde{\lambda}$, where $\lambda$ and $\tilde{\lambda}$ are the minimum eigenvalues of $M$ and $\tilde{M}$, respectively;
2. With $\Delta\lambda$, compute the perturbation of the corresponding eigenvector, $v - \tilde{v}$, which is exactly the error of our algorithm.
Specifically, the roadmap of the theoretical analysis is as follows. Section 3.1 presents bounds for each component of the spectral matrix $M$. Using the results of Section 3.1, the bound on $\Delta\lambda$ is obtained in Section 3.2. The relationship between the perturbations of the eigenvalues and eigenvectors is presented in Section 3.3, which finally yields the formal error estimate of our algorithm in Section 3.4.

3.1. Analysis of Each Component of the Spectral Matrix

In this part, we provide bounds for the quantities involved in the matrix $M$, which are the basic ingredients for estimating the perturbations of the eigenvalue and eigenvector, namely $\Delta\lambda$ and $v - \tilde{v}$. In particular, the upper bound of $H_1$, the lower bound of $H_2$, and the norm of $E$ will be analyzed.

3.1.1. Upper Bound of $\alpha = H_1/H_2$

Finding the upper bound of $\alpha$ amounts to finding the upper bound of $H_1$ and the lower bound of $H_2$. First, we have the following probabilistic upper bound of $H_1$.
Lemma 1.
We have
$$\Pr\left( H_1 \le \frac{p_1^2}{4\rho_0^2} \right) \ge 1 - \exp\left( -\frac{m p_1^2}{50} \right), \qquad (16)$$
where $p_1 = T_1/m$.
The proof of Lemma 1 is given in Appendix A and Appendix B. As for the lower bound of $H_2$, we borrow a result from Wang [13] (Lemma 3).
Lemma 2.
The following holds with probability exceeding $1 - e^{-c_0 m}$:
$$H_2 \ge 0.99\left( 1 + \log(m/T_2) \right), \qquad (17)$$
provided that $T_2 \ge c_1 n$ and $m \ge c_2 T_2$ for some absolute constants $c_0$, $c_1$, and $c_2$.
We then obtain the required upper bound of $H_1/H_2$.
Lemma 3.
Under the set-up of Lemmas 1 and 2, we have
$$\alpha = H_1/H_2 \le \frac{p_1^2}{3.96\, \rho_0^2\, (1 - \log p_2)} \qquad (18)$$
with probability at least
$$1 - \exp\left( -\frac{m p_1^2}{50} \right) - \exp(-c_0 m), \qquad (19)$$
where $\rho_0 = 1/\sqrt{2\pi}$, and $p_1$ and $p_2$ denote $T_1/m$ and $T_2/m$, respectively.

3.1.2. Lower Bound of the Smallest Eigenvalue of G

Since $G$ is a linear combination of $G_1$ and $G_2$, the lower bound of the smallest eigenvalue of $G$ can be estimated from bounds on the eigenvalues of $G_1$ and $G_2$.
According to (10) and (12), we rewrite $G_1$ and $G_2$ as
$$G_1 = \frac{1}{T_1} D_1^T D_1, \quad G_2 = \frac{1}{T_2} D_2^T D_2, \qquad (20)$$
where $D_1 = [d_1, \ldots, d_{T_1}]^T$ and $D_2 = [d_{m-T_2+1}, \ldots, d_m]^T$. $D_1$ and $D_2$ are termed Gaussian matrices here, since their entries are all sampled from the Gaussian distribution. We note that $D_1^T D_1$ and $D_2^T D_2$ are Wishart matrices, i.e., products of a Gaussian matrix and its adjoint [18]. Moreover, the singular values of a Gaussian matrix obey the following classical result.
Theorem 1
([19], Corollary 5.35). Let $A$ be an $N_1 \times N_2$ matrix whose entries are independent standard normal random variables. Then, for every $t \ge 0$, with probability at least $1 - 2\exp(-t^2/2)$, the following events hold simultaneously:
$$\sqrt{N_1} - \sqrt{N_2} - t \le s_{\min}(A), \qquad (21)$$
$$s_{\max}(A) \le \sqrt{N_1} + \sqrt{N_2} + t, \qquad (22)$$
where $s_{\min}(A)$ and $s_{\max}(A)$ stand for the smallest and largest singular values of $A$, respectively.
By applying Theorem 1 to $D_1 \in \mathbb{R}^{T_1 \times (n-1)}$ with the replacements
(1) $D_1 \to A$,
(2) $T_1 \to N_1$, $n-1 \to N_2$,
(3) $\sqrt{n-1}/10 \to t$,
one can see that
$$\Pr\left( s_{\min}(D_1) \ge \sqrt{T_1} - 1.1\sqrt{n-1} \right) \ge 1 - 2\exp\left( -\frac{n-1}{200} \right). \qquad (23)$$
In the theoretical analysis, we assume that $T_1$ is large enough that $\sqrt{T_1} - 1.1\sqrt{n-1} \ge 0$. However, it should be noted that setting $T_1 = n$ already suffices to output reasonable estimates, as shown in the numerical tests. Based on this assumption, one then has
$$s_{\min}(G_1) = s_{\min}\left( \frac{1}{T_1} D_1^T D_1 \right) = \frac{1}{T_1}\, s_{\min}^2(D_1) \ge \frac{1}{T_1}\left( \sqrt{T_1} - 1.1\sqrt{n-1} \right)^2 = \left( 1 - 1.1\sqrt{s_1} \right)^2, \qquad (24)$$
which holds with probability exceeding $1 - 2\exp\left(-\frac{n-1}{200}\right)$, where $s_1 = (n-1)/T_1$. Similarly, we have
$$\Pr\left( s_{\max}(G_2) \le \left( 1 + 1.1\sqrt{s_2} \right)^2 \right) \ge 1 - 2\exp\left( -\frac{n-1}{200} \right),$$
where $s_2 = (n-1)/T_2$.
Hence, we obtain the lower bound for the smallest eigenvalue of $G$:
$$s_{\min}(G) = s_{\min}(G_1 - \alpha G_2) \ge s_{\min}(G_1) - \alpha\, s_{\max}(G_2) \ge \left( 1 - 1.1\sqrt{s_1} \right)^2 - \alpha \left( 1 + 1.1\sqrt{s_2} \right)^2. \qquad (25)$$
A necessary condition for the validity of our method is $s_{\min}(G) > 0$, which can be satisfied by choosing proper $s_1$ and $s_2$.
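The positivity condition is easy to probe numerically; the following Monte Carlo sketch (ours) compares the observed $s_{\min}(G)$ with the lower bound (25) for one choice of $T_1$ and $T_2$:

```python
# Compare the observed smallest eigenvalue of G with the bound (25).
import numpy as np

rng = np.random.default_rng(2)
n, m, T1, T2 = 50, 2000, 500, 150
A = rng.standard_normal((m, n))
b = np.abs(A[:, 0])                        # x0 = e1 as in the analysis
order = np.argsort(b)
A, b = A[order], b[order]
D1, D2 = A[:T1, 1:], A[m - T2:, 1:]        # rows d_i (coordinates 2..n)
alpha = np.mean(b[:T1]**2) / np.mean(b[m - T2:]**2)
G = D1.T @ D1 / T1 - alpha * (D2.T @ D2 / T2)
sq_s1 = np.sqrt((n - 1) / T1)              # sqrt(s_1) in the text's notation
sq_s2 = np.sqrt((n - 1) / T2)              # sqrt(s_2)
print(np.linalg.eigvalsh(G).min())                           # observed s_min(G)
print((1 - 1.1 * sq_s1)**2 - alpha * (1 + 1.1 * sq_s2)**2)   # lower bound (25)
```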

3.1.3. The Upper Bound of $\|E\|^2$

Now, let us turn to the estimation of the norm of $E$. We denote the $k$-th entry of $E$ by $\xi_k$:
$$\xi_k = \frac{1}{T_1} \sum_{i=1}^{T_1} a_{i1} a_{ik} - \frac{\alpha}{T_2} \sum_{i=m-T_2+1}^{m} a_{i1} a_{ik}. \qquad (26)$$
Since $a_{ik}$ obeys the normal distribution, $H_1 = \frac{1}{T_1}\sum_{i=1}^{T_1} a_{i1}^2$, and $H_2 = \frac{1}{T_2}\sum_{i=m-T_2+1}^{m} a_{i1}^2$, we can derive that (conditionally on the first coordinates, which determine the selected subsets)
$$\xi_k \sim N\left( 0,\ \frac{H_1}{T_1} + \frac{\alpha^2 H_2}{T_2} \right). \qquad (27)$$
Let $\psi_k = \xi_k \big/ \sqrt{H_1/T_1 + \alpha^2 H_2/T_2}$; then $\psi_k^2$ obeys the chi-squared distribution, which is a sub-exponential distribution. With $\psi_k$, we can rewrite $\|E\|^2$ as
$$\|E\|^2 = \left( \frac{H_1 (n-1)}{T_1} + \frac{\alpha^2 H_2 (n-1)}{T_2} \right) \frac{1}{n-1} \sum_{k=2}^{n} \psi_k^2 = H_1 (s_1 + \alpha s_2)\, \frac{1}{n-1}\sum_{k=2}^{n} \psi_k^2. \qquad (28)$$
The expectation of $\|E\|^2$ is therefore $H_1 (s_1 + \alpha s_2)$.
The concentration of $\|E\|^2$ is also needed in order to obtain its upper bound. To this end, we recall the Bernstein-type inequality for sub-exponential random variables.
Theorem 2.
Let $X_1, \ldots, X_N$ be i.i.d. centered sub-exponential random variables with sub-exponential norm $K$. Then, for every $t \ge 0$, we have
$$\Pr\left( \frac{1}{N} \left| \sum_{i=1}^{N} X_i \right| \ge t \right) \le 2 \exp\left( -c N \min\left( \frac{t^2}{K^2}, \frac{t}{K} \right) \right), \qquad (29)$$
where $c > 0$ is an absolute constant and the sub-exponential norm $K$ is defined as
$$K = \sup_{p \ge 1} \frac{1}{p} \left( \mathbb{E} |X_1|^p \right)^{1/p}. \qquad (30)$$
Define the centered sub-exponential random variables $Z_i = \psi_i^2 - 1$. The sub-exponential norm of $Z_i$ is computed in Appendix C, which shows that $K \le 2$ in our case. Replacing $N$ with $n-1$, $t$ with $0.1$, and bounding $K$ by $2$ in Theorem 2, one can then conclude the following result.
Theorem 3.
Since $K \le 2$, letting $t = 0.1$, we have
$$\Pr\left( \left| \frac{1}{n-1} \sum_{i=2}^{n} \psi_i^2 - 1 \right| \ge 0.1 \right) \le \exp\left( -\frac{c(n-1)}{100} \right), \qquad (31)$$
where $c > 0$ is an absolute constant.
Gathering these results, we have the following theorem.
Theorem 4.
$$\|E\|^2 \le 1.1\, H_1 (s_1 + \alpha s_2) \qquad (32)$$
holds with probability at least
$$1 - \exp\left( -\frac{c(n-1)}{100} \right), \qquad (33)$$
for some absolute constant $c > 0$.

3.2. Estimate of $\Delta\lambda$

To estimate $\Delta\lambda$, we first recall a classical result from matrix perturbation theory [20].
Theorem 5.
Let
$$M = \begin{pmatrix} H & E^T \\ E & G \end{pmatrix}, \quad \tilde{M} = \begin{pmatrix} H & 0 \\ 0 & G \end{pmatrix}$$
be Hermitian, and set $\eta = \min |\mu - \nu|$ over all $\mu \in \sigma(H)$ and $\nu \in \sigma(G)$, where $\sigma(H)$ stands for the set of eigenvalues of $H$. Then,
$$\max_{1 \le j \le n} |\lambda_j - \tilde{\lambda}_j| \le \frac{2 \|E\|^2}{\eta + \sqrt{\eta^2 + 4\|E\|^2}}, \qquad (34)$$
where $\lambda_j$ and $\tilde{\lambda}_j$ are the $j$-th smallest eigenvalues of $M$ and $\tilde{M}$, respectively.
To ensure the effectiveness of our algorithm, the eigenvalue originating from the zero block (recall that $H = H_1 - \alpha H_2 = 0$ in our case) should remain the minimum eigenvalue after the perturbation. Then, $\eta$ is simply $s_{\min}(G)$, which is bounded by (25). Hence, $\Delta\lambda$ can be estimated by
$$|\Delta\lambda| \le \max_{1 \le j \le n} |\lambda_j - \tilde{\lambda}_j| \le \frac{2\|E\|^2}{\eta + \sqrt{\eta^2 + 4\|E\|^2}} \le \frac{2\|E\|^2}{2\eta} \le \frac{1.1\, H_1 (s_1 + \alpha s_2)}{\left(1 - 1.1\sqrt{s_1}\right)^2 - \alpha\left(1 + 1.1\sqrt{s_2}\right)^2}. \qquad (35)$$

3.3. Computing $v - \tilde{v}$

With the upper bound of $\Delta\lambda$, we can now estimate the perturbation of the eigenvector. The minimum eigenvector of $M$ is computed by solving $(M - (\lambda + \Delta\lambda) I_n) v = 0$:
$$\left( M - (\lambda + \Delta\lambda) I_n \right) v = \begin{pmatrix} -\Delta\lambda & E^T \\ E & G - (\lambda + \Delta\lambda) I_{n-1} \end{pmatrix} v = 0, \qquad (36)$$
where $I_n$ denotes the $n \times n$ identity matrix. Without loss of generality, the solution of (36) can be written in the form $v = [1, \delta v^T]^T$. Then, we have
$$\delta v = -\left( G - (\lambda + \Delta\lambda) I_{n-1} \right)^{-1} E. \qquad (37)$$
Therefore, $\|v - \tilde{v}\|$ can be bounded as
$$\|v - \tilde{v}\| = \|\delta v\| \overset{(38)}{\le} \left\| \left( G - (\lambda + \Delta\lambda) I_{n-1} \right)^{-1} \right\| \|E\| \overset{(39)}{\le} \frac{\|E\|}{\eta - \|E\|^2/\eta},$$
where (38) is derived from (37), and (39) uses $\lambda + \Delta\lambda \le \|E\|^2/\eta$, which follows from $\tilde{\lambda} = 0$ and (35). The explicit form of (39) is somewhat complicated:
$$\frac{\|E\|}{\eta - \|E\|^2/\eta} = \frac{\eta \|E\|}{\eta^2 - \|E\|^2} \le \frac{\sqrt{1.1\, H_1 (s_1 + \alpha s_2)}\, \left[ \left(1 - 1.1\sqrt{s_1}\right)^2 - \alpha\left(1 + 1.1\sqrt{s_2}\right)^2 \right]}{\left[ \left(1 - 1.1\sqrt{s_1}\right)^2 - \alpha\left(1 + 1.1\sqrt{s_2}\right)^2 \right]^2 - 1.1\, H_1 (s_1 + \alpha s_2)}. \qquad (40)$$

3.4. Main Result

Notice that $v - \tilde{v}$ is exactly $x_0 - \tilde{x}$. It is then straightforward to deduce our main theorem by combining the above results.
Theorem 6 (Main result).
Consider the problem of estimating an arbitrary $x \in \mathbb{R}^n$ from $m$ phaseless measurements (1) under the Gaussian assumption. Suppose the output of Algorithm 1 is $\tilde{x}$. If $m \ge c_1 T_1$, $m \ge c_2 T_2$, $T_1 > n$, and $m \ge T_1 + T_2$, then we have the following error estimate for the composite initialization method:
$$\|x_0 - \tilde{x}\|_2 \le \frac{\sqrt{RQ}}{R - Q}, \qquad (41)$$
with probability at least
$$P = 1 - \exp\left( -\frac{m p_1^2}{144} \exp\left( -\frac{p_1^2}{2\rho_0^2} \right) \right) - e^{-\frac{m p_1^2}{50}} - 2 e^{-\frac{n-1}{200}} - e^{-\frac{c(n-1)}{100}} - e^{-c_0 m}, \qquad (42)$$
where
$$Q = 1.1\, p_1^2\, (1 - \log p_2) \left[ 4\rho_0^2 (1 - \log p_2)\, s_1 + 0.99\, p_1^2\, s_2 \right], \qquad (43)$$
$$R = \left[ 4\rho_0^2 (1 - \log p_2) \left( 1 - 1.1\sqrt{s_1} \right)^2 - 0.99\, p_1^2 \left( 1 + 1.1\sqrt{s_2} \right)^2 \right]^2, \qquad (44)$$
$p_i = T_i/m$ and $s_i = (n-1)/T_i$ for $i = 1, 2$, $\rho_0 = 1/\sqrt{2\pi}$, and $c_0$, $c_1$, $c_2$, $c > 0$ are absolute constants.
The probability $P$ in (42) can be negative when $m$ and $n$ are too small, which reveals a limitation of our analysis: $m$ and $n$ need to be large enough to render the probability meaningful. In the extreme case that $p_1$, $p_2$, $s_1$, and $s_2$ are close to 0, our error estimate (41) can be approximated by the simpler form
$$\|x_0 - \tilde{x}\|_2 \le \sqrt{\frac{1.1\, p_1^2\, s_1}{10\, \rho_0^2}}, \qquad (45)$$
which is verified by the numerical experiments below.
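As an illustrative check (ours) of (45), one can compare the observed initialization error with the predicted scale for small $p_1$ and $s_1$, reusing composite_init and dist from the sketches above:

```python
# Compare the observed error with the simplified estimate (45).
import numpy as np

rng = np.random.default_rng(3)
n, m = 100, 5000                       # p1 = T1/m and s1 = (n-1)/T1 both smallish
T1, T2 = 4 * n, int(0.3 * n)
x0 = rng.standard_normal(n)
x0 /= np.linalg.norm(x0)
A = rng.standard_normal((m, n))
b = np.abs(A @ x0)
x_est = composite_init(A, b, T1, T2)
p1, s1, rho0_sq = T1 / m, (n - 1) / T1, 1 / (2 * np.pi)
print(dist(x_est, x0))                               # observed error
print(np.sqrt(1.1 * p1**2 * s1 / (10 * rho0_sq)))    # predicted scale from (45)
```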

4. Numerical Results

In this section, we test the accuracy of the proposed method and compare it with other methods, including the spectral method, the truncated spectral method, the reweighted maximal correlation method, the null vector method, and the orthogonality promoting method. The sampling vectors $a_i$ and the original signal $x_0 \in \mathbb{R}^{100}$ are randomly and independently generated. To eliminate the influence of the error brought by the estimation of $\|x_0\|$, the original signal $x_0$ is normalized. We chose $T_1 = n$ and $T_2 = 0.3n$ for the proposed method in the numerical experiments. All the following simulation results are averaged over 80 independent Monte Carlo realizations.
Figure 1 plots the RMSE calculated by (6) versus the oversampling rate $m/n$ for the mentioned initialization methods. As expected, all methods exhibit better performance as $m/n$ increases. In particular, the proposed initialization method outperforms the other methods. When $m/n \gtrsim 10$, the composite initialization method performs comparably to the null vector method, and the convergence behavior coincides with (45). When $m/n \lesssim 10$, the proposed method does not lose accuracy as dramatically as the null vector method does. The proposed algorithm provides the most accurate estimate even when the oversampling rate is below the information limit, i.e., $m/n \le 2$.
Figure 2 illustrates the importance of the initialization method for the success rate of the TAF algorithm [12]. In each simulation, each initialization method generates an estimate that serves as the initializer for the TAF algorithm. A trial is considered successful if the TAF algorithm returns a result with RMSE less than $10^{-5}$. When $m/n$ is large, all the presented initialization methods ensure an almost 100% success rate. However, as $m/n$ decreases, differences in the success rates of the various methods gradually appear. The composite initialization method thus helps TAF achieve a better success rate than the other methods.
Table 1 reports the CPU time needed by TAF using our method as the initializer, compared with two other typical initialization methods. To distinguish the performance more clearly, the signal length is set to a large value, $n = 1000$. The proposed method and the null vector method need to solve a linear system of equations at each inverse power step, which makes them more time-consuming per iteration than the power step of the maximal correlation method. However, the proposed method provides a more accurate initializer, which helps TAF converge faster. Hence, the overall efficiency of our algorithm is not far behind the power-type methods, as shown in Table 1.

5. Conclusions

This paper proposes a new initialization method that combines the advantages of two methods built on opposite intuitions. Both theoretical analysis and numerical experiments indicate the validity of our method and its higher accuracy compared with other methods. Future work will focus on extending our initialization algorithm to more general PR problems, e.g., the quadratic sensing and matrix reconstruction problems [21].

Author Contributions

Q.L. conceptualization, methodology, validation, and writing—original draft preparation; S.L. formal analysis, software, and writing—review and editing; H.W. supervision, project administration, and resources. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Chinese National Natural Science Foundation (61977065) and the National Key Research and Development Plan (2020YFA0713504).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Upper Bound of $H_1$

$H_1$ is the average of the smallest $T_1$ elements of the set $\{b_i^2\}_{i=1}^m$. As the $b_i^2$ have been sorted in ascending order, we have
$$H_1 = \frac{1}{T_1} \sum_{i=1}^{T_1} a_{i1}^2 = \frac{1}{T_1} \sum_{i=1}^{T_1} b_i^2.$$
By the Gaussian assumption, $|a_i^T x_0|^2 = a_{i1}^2$ obeys the chi-squared distribution with probability density function
$$\rho(z) = \rho_0 \frac{e^{-z/2}}{\sqrt{z}}, \quad z \ge 0, \qquad (A1)$$
where $\rho_0 = \frac{1}{\sqrt{2\pi}}$ is the normalization constant. The cumulative distribution function is
$$F(\tau) := \int_0^\tau \rho_0 \frac{e^{-z/2}}{\sqrt{z}}\, dz. \qquad (A2)$$
Let $p = \frac{T_1}{m}$ and $\tau^* = F^{-1}(p)$. The estimation of $H_1$ hinges on the value of $\tau^*$. However, $\rho(z)$ is not explicitly integrable; hence, we cannot derive an explicit expression of $\tau^*$ in terms of $p$. To this end, we first establish bounds on $\tau^*$ using some basic inequalities, as the following two lemmas show. The detailed proofs are placed in Appendix B.
Lemma A1 (Upper bound of $\tau^*$).
Let $\tau^* = F^{-1}(p)$; then, for $p \in (0, 0.75]$, we have
$$\tau^* < \frac{4 p^2}{9 \rho_0^2}. \qquad (A3)$$
Lemma A2 (Lower bound of $\tau^*$).
Let $\tau^* = F^{-1}(p)$; then, for $p \in (0, 1)$, we have
$$\tau^* > \frac{p^2}{4 \rho_0^2}. \qquad (A4)$$
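Both lemmas are easy to confirm numerically, since $\tau^* = F^{-1}(p)$ is the $p$-quantile of the chi-squared distribution with one degree of freedom; a short check (ours, using scipy) follows:

```python
# Numerically verify the bounds of Lemmas A1 and A2 on tau* = F^{-1}(p).
import numpy as np
from scipy.stats import chi2

rho0_sq = 1 / (2 * np.pi)
for p in (0.1, 0.3, 0.5, 0.75):
    tau_star = chi2.ppf(p, df=1)          # p-quantile of chi-squared with 1 dof
    lower = p**2 / (4 * rho0_sq)          # Lemma A2
    upper = 4 * p**2 / (9 * rho0_sq)      # Lemma A1
    print(p, lower < tau_star < upper)    # True for each p
```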
We now turn back to the estimation of $H_1$. Notice that $\{b_i^2\}_{i=1}^{T_1}$ can be approximately regarded as random variables sampled from a truncated (and hence bounded) chi-squared distribution. Therefore, we first provide an upper bound for the largest element of $\{b_i^2\}_{i=1}^{T_1}$, namely $b_{T_1}^2$. We prove the following result.
Theorem A1.
Assume that $F(\tau^*) = p_1 = T_1/m$. We have
$$\Pr\left( b_{T_1}^2 \ge \frac{p_1^2}{2\rho_0^2} \right) \le \Pr\left( b_{T_1}^2 \ge \frac{9}{8}\tau^* \right) \le \exp\left( -\frac{m p_1^2}{144} \exp\left( -\frac{p_1^2}{2\rho_0^2} \right) \right). \qquad (A5)$$
Proof. 
Let $\psi := F(\tau^* + \frac{1}{8}\tau^*) - F(\tau^*)$, which satisfies
$$\psi \ge \frac{\tau^*}{8}\, F'\left( \tau^* + \frac{1}{8}\tau^* \right), \qquad (A6)$$
as the probability density $\rho(z)$ is monotonically decreasing. Since $F'(\tau) = \rho(\tau)$, and using (A3) and (A4), we have
$$\psi^2 \ge \frac{(\tau^*)^2}{64} \cdot \frac{\rho_0^2\, e^{-9\tau^*/8}}{9\tau^*/8} = \frac{\rho_0^2\, \tau^*}{72}\, e^{-\frac{9\tau^*}{8}} \ge \frac{p_1^2}{288}\, e^{-\frac{9}{8}\tau^*} \ge \frac{p_1^2}{288}\, e^{-\frac{p_1^2}{2\rho_0^2}}. \qquad (A7)$$
Define the indicator random variables
$$r_i = 1_{[9\tau^*/8,\, +\infty)}(\tau_i), \quad i = 1, \ldots, m, \qquad (A8)$$
where the $\tau_i$ i.i.d. obey the chi-squared distribution with the probability density (A1), and the characteristic function is
$$1_{[9\tau^*/8,\, +\infty)}(\tau_i) = \begin{cases} 1, & \tau_i \ge 9\tau^*/8; \\ 0, & \text{otherwise}. \end{cases} \qquad (A9)$$
The event $\{b_{T_1}^2 \ge 9\tau^*/8\}$ means that at least $m - T_1 + 1$ measurements are larger than $9\tau^*/8$. Therefore, we have
$$\Pr\left( b_{T_1}^2 \ge 9\tau^*/8 \right) = \Pr\left( \sum_{i=1}^{m} r_i \ge m - T_1 + 1 \right) = \Pr\left( \sum_{i=1}^{m} r_i > m - T_1 \right).$$
The expectation of $r_i$ is then
$$\mathbb{E}\, r_i = 1 - F(9\tau^*/8). \qquad (A10)$$
The $r_i$ are bounded random variables, so the sum of the $r_i$ can be bounded from above by the well-known Hoeffding inequality.
Lemma A3 (Hoeffding's inequality).
Let $X_1, \ldots, X_n$ be i.i.d. random variables bounded by the interval $[a, b]$. Then,
$$\Pr\left( \bar{X} - \mathbb{E}X \ge t \right) \le \exp\left( -\frac{2 n t^2}{(b-a)^2} \right). \qquad (A11)$$
Using (A10), Hoeffding's inequality yields
$$\Pr\left( b_{T_1}^2 \ge \tau^* + \frac{1}{8}\tau^* \right) = \Pr\left( \sum_{i=1}^{m} r_i > m - T_1 \right) = \Pr\left( \frac{1}{m}\sum_{i=1}^{m} (r_i - \mathbb{E}\,r_i) > 1 - T_1/m - \mathbb{E}\,r_i \right) \qquad (A12)$$
$$= \Pr\left( \frac{1}{m}\sum_{i=1}^{m} (r_i - \mathbb{E}\,r_i) > \left( 1 - F(\tau^*) \right) - \left( 1 - F(9\tau^*/8) \right) \right) \qquad (A13)$$
$$\le \exp\left( -2 m \psi^2 \right), \qquad (A14)$$
with the replacements $\psi \to t$ and $[a, b] \to [0, 1]$ in Lemma A3. Combining (A7) and (A14), we can see that (A5) holds naturally, which completes the proof. □
Let us now consider the i.i.d. bounded random variables
$$Z_i := b_i^2\, 1_{[0,\, 9\tau^*/8]}(b_i^2).$$
Since $\frac{1}{T_1}\sum_{i=1}^{m} Z_i$ is larger than $H_1$ (on the event $b_{T_1}^2 \le 9\tau^*/8$), we can provide an upper bound for $H_1$ by bounding $\frac{1}{T_1}\sum_{i=1}^{m} Z_i$ from above.
The expectation of $Z_i$ is
$$\mathbb{E}\left[ b_i^2\, 1_{[0,\, 9\tau^*/8]}(b_i^2) \right] = \int_0^{9\tau^*/8} x \rho(x)\, dx \le \left( \frac{9}{8} \right)^{3/2} \int_0^{\tau^*} x \rho(x)\, dx,$$
with
$$\int_0^{\tau^*} x \rho(x)\, dx = \int_0^{\tau^*} \rho_0 \sqrt{x}\, e^{-x/2}\, dx \le \int_0^{\tau^*} \rho_0 \sqrt{x}\, dx = \frac{2}{3}\rho_0\, (\tau^*)^{3/2} \le \frac{48\, p_1^3}{243\, \rho_0^2} \le \frac{p_1^3}{5\rho_0^2}.$$
After computing the expectation of $Z_i$, we can bound $\sum_{i=1}^{m} Z_i$ by Hoeffding's inequality for bounded random variables in a similar way. Since $0 \le Z_i \le 9\tau^*/8$, replacing $t$ with $p_1^3/(20\rho_0^2)$ in Lemma A3 yields
$$\Pr\left( \frac{1}{m}\sum_{i=1}^{m} Z_i - \mathbb{E}Z_i \ge \frac{p_1^3}{20\rho_0^2} \right) \le \exp\left( -\frac{1}{50} m p_1^2 \right).$$
Hence, we have
$$\Pr\left( \frac{1}{T_1}\sum_{i=1}^{m} Z_i \ge \frac{p_1^2}{4\rho_0^2} \right) \le \exp\left( -\frac{m p_1^2}{50} \right).$$
Therefore, we obtain the probabilistic upper bound of $H_1$:
$$\Pr\left( H_1 \le \frac{p_1^2}{4\rho_0^2} \right) \ge 1 - \exp\left( -\frac{m p_1^2}{50} \right).$$

Appendix B. The Bound Estimation of $\tau^*$

Appendix B.1. The Upper Bound for $\tau^*$

Construct the function
$$\hat{\rho}(x) = \rho_0\, \frac{1 - \sqrt{x}/2}{\sqrt{x}}.$$
It can be proved that
$$e^{-x^2/2} > 1 - x/2, \quad x \in (0, 2].$$
Therefore, substituting $x \to \sqrt{x}$, we have
$$\rho(x) > \hat{\rho}(x), \quad x \in (0, 4].$$
Define $\hat{F}(\tau) = \int_0^\tau \hat{\rho}(x)\, dx$. Then, it is obvious that
$$F(x) > \hat{F}(x), \quad x \in (0, 4].$$
Since $F(\tau)$ and $\hat{F}(\tau)$ are both strictly monotonically increasing over $(0, 4]$, their inverse functions satisfy
$$\tau^* = F^{-1}(p) < \hat{F}^{-1}(p) := \hat{\tau}, \quad p \in (0, \hat{F}(4)],$$
where $\hat{F}(4)$ is about $0.79$.
In the orthogonality promoting approach using the inverse power method, $|\mathcal{I}|/m$ is always less than $1/2$ to achieve a tolerable performance. Therefore, in our range of interest ($0 < p < 1/2$), it is permissible to estimate $\tau^*$ through the function $\hat{F}^{-1}$.
Now, we compute $\hat{F}$:
$$\hat{F}(\tau) = \int_0^\tau \rho_0\, \frac{1 - \sqrt{x}/2}{\sqrt{x}}\, dx = \rho_0 \left( \sqrt{4\tau} - \frac{\tau}{2} \right).$$
Then, we obtain $\hat{\tau}$ by solving $\hat{F}(\tau) = p$:
$$\sqrt{\tau_\pm} = 2 \pm \sqrt{4 - 2p/\rho_0}.$$
Discarding the root that exceeds $(0, 4]$, we have
$$\hat{\tau} = \left( 2 - \sqrt{4 - 2p/\rho_0} \right)^2.$$
Since $2p/\rho_0 < 1$ and $2 - \sqrt{4 - s} < s/3$ for $s \in [0, 1]$, we can determine the following upper bound of $\hat{\tau}$:
$$\hat{\tau} < \left( \frac{2p}{3\rho_0} \right)^2 = \frac{4p^2}{9\rho_0^2}.$$

Appendix B.2. Lower Bound for $\tau^*$

Similarly, consider the function
$$\check{\rho}(x) = \rho_0\, \frac{1}{\sqrt{x}} \ge \rho(x).$$
Define $\check{F}(\tau) = \int_0^\tau \check{\rho}(x)\, dx = 2\rho_0 \sqrt{\tau}$. Then, it is obvious that
$$F(\tau) \le \check{F}(\tau).$$
Therefore, we have
$$\check{\tau} = \check{F}^{-1}(p) = \frac{p^2}{4\rho_0^2} < F^{-1}(p) = \tau^*.$$

Appendix C. The Bound of K

The sub-exponential norm of $Z_k = \psi_k^2 - 1$ is defined by
$$K = \sup_{p \ge 1} \frac{1}{p}\left( \mathbb{E}|Z_k|^p \right)^{1/p}, \quad \text{with } \mathbb{E}|\psi_k^2|^p = \int_0^{+\infty} x^p\, \rho_0\, \frac{e^{-x}}{\sqrt{x}}\, dx.$$
Denote $I_p = \int_0^{+\infty} x^p \rho_0\, e^{-x} x^{-1/2}\, dx$; then, integrating by parts, we have
$$I_p = \int_0^{\infty} \rho_0\, x^{p-1/2}\, d\left( -e^{-x} \right) = \left( -\rho_0\, e^{-x} x^{p-1/2} \right)\Big|_0^{+\infty} + \left( p - \tfrac{1}{2} \right) \int_0^{+\infty} \rho_0\, x^{p-3/2}\, e^{-x}\, dx = \left( p - \tfrac{1}{2} \right) I_{p-1}.$$
Noticing that $I_0 \le 1$, we have
$$I_p = I_0 \prod_{i=1}^{p} \left( i - \tfrac{1}{2} \right) \le p!.$$
It is also obvious that the mean value of $\psi_k^2$ is 1. Therefore, $Z_i = \psi_i^2 - 1$ obeys a centered sub-exponential distribution, since its mean is 0 and its sub-exponential norm is finite:
$$K = \sup_{p \ge 1} \frac{1}{p}\left( \mathbb{E}|Z_k|^p \right)^{1/p} = \sup_{p \ge 1} \frac{1}{p}\left( \mathbb{E}|\psi_k^2 - 1|^p \right)^{1/p} \le 1 + \sup_{p \ge 1} \frac{1}{p}\left( \mathbb{E}|\psi_k^2|^p \right)^{1/p} \le 1 + \sup_{p \ge 1} \frac{1}{p} (p!)^{1/p} \le 2.$$

References

  1. Miao, J.; Charalambous, P.; Kirz, J.; Sayre, D. Extending the methodology of X-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens. Nature 1999, 400, 342. [Google Scholar] [CrossRef]
  2. Shechtman, Y.; Eldar, Y.C.; Cohen, O.; Chapman, H.N.; Miao, J.; Segev, M. Phase retrieval with application to optical imaging: A contemporary overview. IEEE Signal Process. Mag. 2015, 32, 87–109. [Google Scholar] [CrossRef] [Green Version]
  3. Stefik, M. Inferring DNA structures from segmentation data. Artif. Intell. 1978, 11, 85–114. [Google Scholar] [CrossRef]
  4. Fienup, C.; Dainty, J. Phase retrieval and image reconstruction for astronomy. Image Recover. Theory Appl. 1987, 231, 275. [Google Scholar]
  5. Luo, Q.; Wang, H. The Matrix Completion Method for Phase Retrieval from Fractional Fourier Transform Magnitudes. Math. Probl. Eng. 2016, 2016, 4617327. [Google Scholar] [CrossRef] [Green Version]
  6. Candes, E.J.; Strohmer, T.; Voroninski, V. Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 2013, 66, 1241–1274. [Google Scholar] [CrossRef] [Green Version]
  7. Waldspurger, I.; d’Aspremont, A.; Mallat, S. Phase recovery, maxcut and complex semidefinite programming. Math. Program. 2015, 149, 47–81. [Google Scholar] [CrossRef] [Green Version]
  8. Netrapalli, P.; Jain, P.; Sanghavi, S. Phase retrieval using alternating minimization. IEEE Trans. Signal Process. 2015, 63, 4814–4826. [Google Scholar] [CrossRef] [Green Version]
  9. Elser, V. Phase retrieval by iterated projections. JOSA A 2003, 20, 40–55. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  10. Candes, E.J.; Li, X.; Soltanolkotabi, M. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inf. Theory 2015, 61, 1985–2007. [Google Scholar] [CrossRef] [Green Version]
  11. Zhang, H.; Liang, Y. Reshaped wirtinger flow for solving quadratic system of equations. Adv. Neural Inf. Process. Syst. 2016, 29, 2622–2630. [Google Scholar]
  12. Wang, G.; Giannakis, G.; Saad, Y.; Chen, J. Solving most systems of random quadratic equations. arXiv 2017, arXiv:1705.10407. [Google Scholar]
  13. Wang, G.; Giannakis, G.B.; Saad, Y.; Chen, J. Phase retrieval via reweighted amplitude flow. IEEE Trans. Signal Process. 2018, 66, 2818–2833. [Google Scholar] [CrossRef]
  14. Luo, Q.; Wang, H.; Lin, S. Phase retrieval via smoothed amplitude flow. Signal Process. 2020, 177, 107719. [Google Scholar] [CrossRef]
  15. Luo, Q.; Lin, S.; Wang, H. Robust phase retrieval via median-truncated smoothed amplitude flow. In Inverse Problems in Science and Engineering; Taylor & Francis: Abingdon-on-Thames, UK, 2021; pp. 1–17. [Google Scholar]
  16. Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course; Springer Science: New York, NY, USA, 2013; Volume 87. [Google Scholar]
  17. Chen, P.; Fannjiang, A.; Liu, G.R. Phase retrieval by linear algebra. SIAM J. Matrix Anal. Appl. 2017, 38, 854–868. [Google Scholar] [CrossRef] [Green Version]
  18. Wishart, J. The generalised product moment distribution in samples from a normal multivariate population. Biometrika 1928, 20, 32–52. [Google Scholar] [CrossRef] [Green Version]
  19. Vershynin, R. Introduction to the non-asymptotic analysis of random matrices. arXiv 2010, arXiv:1011.3027. [Google Scholar]
  20. Li, C.K.; Li, R.C. A note on eigenvalues of perturbed Hermitian matrices. Linear Algebra Its Appl. 2005, 395, 183–190. [Google Scholar] [CrossRef] [Green Version]
  21. Chi, Y.; Lu, Y.M.; Chen, Y. Nonconvex optimization meets low-rank matrix factorization: An overview. IEEE Trans. Signal Process. 2019, 67, 5239–5269. [Google Scholar] [CrossRef] [Green Version]
Figure 1. RMSE vs. oversampling rate for several initialization methods. $n = 100$, $T_1 = n$, $T_2 = 0.3n$ for the composite initialization method; for the other algorithms, the involved parameters are selected according to the related articles.
Figure 2. Empirical success rate of different initialization methods versus the number of measurements for TAF, with $n = 100$ and $m/n$ varying from 1 to 4.
Table 1. CPU time (s) for TAF using the proposed initialization method compared with two other typical initializers.

m/n   Method                Initialization Stage   TAF Stage   Overall
2     Null vector           1.24                   1.34        2.58
2     Maximal correlation   0.47                   0.76        1.23
2     Proposed              0.72                   0.70        1.42
3     Null vector           1.12                   1.02        2.14
3     Maximal correlation   0.42                   0.49        0.91
3     Proposed              0.73                   0.47        1.20
5     Null vector           0.95                   1.02        1.97
5     Maximal correlation   0.44                   0.45        0.89
5     Proposed              0.67                   0.46        1.13

