Abstract
Phase retrieval is a classical inverse problem of recovering a signal from a system of phaseless constraints. Many recently proposed methods for phase retrieval, such as PhaseMax and gradient-descent algorithms, enjoy benign theoretical guarantees on the condition that an accurate estimate of the true solution is provided. Current initialization methods do not perform well when the number of measurements is low, which deteriorates the success rate of current phase retrieval methods. We propose a new initialization method that obtains an estimate of the original signal with uniformly higher accuracy by combining the advantages of the null vector method and the maximal correlation method. The constructed spectral matrix for the proposed initialization method has a simple and symmetrical form. A lower error bound is proved theoretically and verified numerically.
1. Introduction
Phase retrieval (PR) has many applications in science and engineering, including X-ray crystallography [1], molecular imaging [2], biological imaging [3], and astronomy [4]. Mathematically, phase retrieval is the problem of finding a signal satisfying the following phaseless constraints:
where the measurements are observed, the measuring vectors are known, and the noise term is unknown. In optics and quantum physics, the measurement model is related to the Fourier transform or the fractional Fourier transform [5]. To remove this theoretical barrier, recent works also focus on generic measurement vectors, namely vectors sampled independently from a Gaussian distribution.
Current methods for solving phase retrieval can generally be divided into two groups: convex and non-convex approaches. Convex methods, e.g., PhaseLift [6] and PhaseCut [7], utilize a semidefinite relaxation to transfer the original problem into a convex one. Nevertheless, the heavy computational cost of these methods hinders their application in practical scenarios. On the other hand, non-convex methods, including the Gerchberg–Saxton (GS) algorithm [8,9], the (truncated) Wirtinger flow (WF/TWF) [10,11], and the (truncated) amplitude flow (AF/TAF), have a significant advantage in lower computational cost and sampling complexity (i.e., the minimal number of measurements required to achieve successful recovery) [12]. More recently, two variants of AF using the reweighting [13] and the smoothing technique [14,15] have been proposed to further lower the sampling complexity. However, for the mentioned non-convex methods, the rate of successful reconstruction relies upon elaborate initialization. Moreover, an initialization point near the true solution is also a necessity for the convergence properties established in theoretical analyses.
1.1. Prior Art
Current initialization methods for the PR problem include the spectral method [16], the orthogonality promoting method [12], the null vector method [17], and the (weighted) maximal correlation method [13]. These methods first construct a spectral matrix in the form of
where the weight is a positive real number. The estimate is then given as the (scaled) eigenvector corresponding to the largest or smallest eigenvalue of the spectral matrix. The spectral method and the (weighted) maximal correlation method fall into one category, as they seek the leading eigenvector as the estimate. We categorize these as correlation-focused methods, since their spectral matrices assign larger weights to the measuring vectors that are more correlated with the original signal. Conversely, the null vector and orthogonality promoting methods can be regarded as orthogonality-focused methods: vectors less correlated with the original signal receive larger weights in the spectral matrix.
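As a concrete sketch of this shared construction (the variable names, the weight choice, and the problem sizes below are illustrative assumptions, not the paper's exact formulation), a spectral matrix can be formed and its extreme eigenvector extracted as follows:

```python
import numpy as np

def spectral_estimate(A, weights, use_smallest=False):
    """Form the weighted spectral matrix (1/m) * sum_i w_i a_i a_i^T and
    return the eigenvector of its largest (or smallest) eigenvalue."""
    m = A.shape[0]
    M = (A * weights[:, None]).T @ A / m
    vals, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    return vecs[:, 0] if use_smallest else vecs[:, -1]

# Toy check: with correlation-focused weights w_i = y_i^2, the leading
# eigenvector aligns with the true signal (up to sign) as m grows.
rng = np.random.default_rng(0)
n, m = 20, 2000
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)
est = spectral_estimate(A, y**2)
```

An orthogonality-focused method would instead pass `use_smallest=True` together with weights that grow for vectors nearly orthogonal to the signal.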
These two types of initialization methods each have their own merits and demerits. The correlation-focused methods perform better when the oversampling rate is low. The situation is reversed when the oversampling rate is large enough: there, the orthogonality-focused methods provide a more accurate estimate.
1.2. This Work
In this paper, we propose a method combining the advantages of the above two types of initialization methods. Our intuition is to construct a composite spectral matrix, namely a weighted sum of the spectral matrices of a correlation-focused method and an orthogonality-focused method. Hence, the new method is termed the composite initialization method.
1.3. Article Organization and Notations
This paper adopts the following notation. A bold-font lower-case letter denotes a column vector, and bold capital letters denote matrices. The transpose and the conjugate transpose are denoted in the standard way. Calligraphic letters denote index sets, and the cardinality of a set denotes its number of elements. Unless otherwise specified, an unsubscripted norm denotes the 2-norm, for the sake of simplicity.
The remainder of this article is organized as follows. The algorithm is given in Section 2. Section 3 provides the theoretical error analysis of the proposed initialization method, with some relevant technical error bounds placed in the Appendix. Section 4 illustrates the numerical performance and compares the proposed method with other initialization methods.
2. The Formulation of the Composite Initialization Method
This section provides the formulation of the proposed method. For the convenience of theoretical analysis, the measuring vectors are assumed to be independently sampled centered Gaussian vectors with identity covariance. Let the true solution of (1) be fixed. For concreteness, we focus on the real-valued Gaussian model. We further assume that the signal is normalized, since this investigation focuses on the estimation of the original signal; in practice, its norm can be estimated from the measurements. To begin with, we describe the null vector and the maximal correlation methods, which form the basis of our algorithm.
The null vector method first picks out the subset of measuring vectors corresponding to the smallest measurements. Then, the null vector method approximates the original signal by the vector that is most orthogonal to this subset. Since the measurements are arranged in ascending order, this subset can be written down directly from the first indices, its size being the cardinality of the chosen index set. The intuition behind the null vector method is simple: since a measurement takes a very small value when the measuring vector is nearly orthogonal to the true signal, one can construct the following minimization problem, whose solution matches this property, to estimate the signal.
Solving (2) is equivalent to finding the smallest eigenvalue and the corresponding eigenvector of the associated spectral matrix. The so-called orthogonality promoting method proposed by Wang is an approximately equivalent method that transforms the minimization problem (2) into a maximization problem.
The maximal correlation method is based on the opposite intuition: it picks out the subset of measuring vectors corresponding to the largest measurements. As the measurements are in ascending order, this subset can also be written down directly. The core idea of the maximal correlation method is searching for the vector that is most correlated with this subset, which is achieved by solving the following maximization problem.
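The two subset-based estimators just described can be sketched as follows; the subset fractions and all names are illustrative assumptions rather than the paper's exact parameter choices:

```python
import numpy as np

def null_vector_estimate(A, y, frac=0.5):
    """Null vector method (sketch): keep the fraction `frac` of measuring
    vectors with the smallest measurements (those nearly orthogonal to the
    signal) and return the eigenvector of the smallest eigenvalue."""
    order = np.argsort(y)                      # ascending measurements
    S = A[order[: int(frac * len(y))]]
    _, vecs = np.linalg.eigh(S.T @ S)
    return vecs[:, 0]                          # most-orthogonal direction

def max_correlation_estimate(A, y, frac=0.25):
    """Maximal correlation method (sketch): keep the vectors with the largest
    measurements and return the leading eigenvector."""
    order = np.argsort(y)
    S = A[order[int((1 - frac) * len(y)):]]
    _, vecs = np.linalg.eigh(S.T @ S)
    return vecs[:, -1]                         # most-correlated direction

rng = np.random.default_rng(1)
n, m = 20, 3000
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)
nv = null_vector_estimate(A, y)
mc = max_correlation_estimate(A, y)
```

Both estimates align with the true signal up to a global sign when enough measurements are available.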
From (2) and (3), one can observe that these two methods only utilize a subset of vectors that are either most correlated with or most orthogonal to the original signal. To exploit as much information as possible, we solve a composite problem combining (2) and (3).
We can observe that (4) has a symmetrical structure and can be interpreted as a modified version of (2), which adds the objective of (3) as a penalization term. The solution of (4) will be a more accurate estimate than that of (3), since it utilizes the information from (2).
We set the parameter in (4) accordingly. The true signal is roughly the solution of (2) and (3); thus, it is also an approximate solution of (4). In the next section, we will analyze the error of our proposed method.
The algorithm is presented in Algorithm 1. A small regularization term is added to ensure the positivity of the spectral matrix. The estimate in Step 4 is obtained by solving the following system of linear equations:
which can be solved by the conjugate gradient method, since M is positive definite and Hermitian, using the current iterate as the initializer.
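A minimal sketch of this inverse power step, with a plain conjugate gradient solver for the positive definite system (the test matrix and iteration counts are illustrative assumptions):

```python
import numpy as np

def conjugate_gradient(M, b, x0, tol=1e-10, max_iter=200):
    """Plain conjugate gradient for the symmetric positive definite system M x = b."""
    x = x0.astype(float).copy()
    r = b - M @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Mp = M @ p
        alpha = rs / (p @ Mp)
        x = x + alpha * p
        r = r - alpha * Mp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def inverse_power_step(M, v):
    """One inverse power iteration: solve M z = v (warm-started at v), normalize."""
    z = conjugate_gradient(M, v, v)
    return z / np.linalg.norm(z)

# Demo on an SPD matrix with a known spectrum: iterating converges to the
# eigenvector of the smallest eigenvalue (here, the first column of Q).
rng = np.random.default_rng(2)
n = 12
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = Q @ np.diag(np.linspace(1.0, 10.0, n)) @ Q.T
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
for _ in range(30):
    v = inverse_power_step(M, v)
```

Warm-starting the solver at the current iterate is what makes each inverse power step cheap in practice.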
Since a signal and its negation are indistinguishable for phase retrieval, we evaluate the estimate using the following metric.
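In the real case, such a sign-invariant metric can be sketched as follows; the normalization by the signal norm is an assumption, and the paper's exact definition in (6) may differ:

```python
import numpy as np

def rmse(z, x):
    """Relative error up to a global sign. The complex case would minimize
    over all unit-modulus phases instead of over {+1, -1}."""
    return min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)

# A sign flip is not counted as error:
x = np.array([3.0, 4.0])
err_flip = rmse(-x, x)
```

For instance, `err_flip` above is 0 even though the two vectors differ entrywise, reflecting the inherent sign ambiguity.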
The other initialization methods, e.g., the spectral method, the truncated spectral method, and the (weighted) maximal correlation method, are all based on the power iteration. The spectral method finds the leading eigenvector of its spectral matrix, which is also realized by the power approach. Chen proposed the truncated spectral method to improve the performance of the spectral method, constructing another matrix via a threshold parameter [17]. The corresponding matrix for the iteration is defined by the following.
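The truncation idea can be sketched as follows; the specific rule of discarding measurements whose intensity exceeds a threshold multiple of the mean intensity, and the parameter `alpha`, are illustrative assumptions:

```python
import numpy as np

def truncated_spectral_estimate(A, y, alpha=3.0):
    """Truncated spectral initialization (sketch): drop measurements whose
    intensity exceeds alpha^2 times the mean intensity, then take the leading
    eigenvector of the remaining weighted spectral matrix."""
    y2 = y**2
    keep = y2 <= alpha**2 * y2.mean()        # truncation by the threshold
    M = (A[keep] * y2[keep, None]).T @ A[keep] / len(y)
    _, vecs = np.linalg.eigh(M)
    return vecs[:, -1]

rng = np.random.default_rng(3)
n, m = 20, 2000
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)
est = truncated_spectral_estimate(A, y)
```

The truncation discards the heavy-tailed measurements that would otherwise dominate the spectral matrix.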
The numerical performance of these initialization methods will be compared later.
Algorithm 1 The composite initialization method.
Input: increasingly arranged measurements and the corresponding measuring vectors, a truncation value, a threshold value, and a random initialization.
Output: the initial estimate.
3. Theoretical Analysis
In this section, we will present the error estimate of the proposed method under Gaussian assumptions in the real case.
Since the measuring vectors are assumed to be i.i.d. Gaussian, we can assume without loss of generality that the true signal is a standard basis vector. Otherwise, there exists an orthogonal matrix mapping the true signal onto that basis vector; applying this matrix to the measuring vectors yields a problem identical to (1), because the transformed vectors are still Gaussian, owing to the invariance of the Gaussian distribution under orthogonal transforms.
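This invariance argument is easy to verify numerically: rotating both the measuring vectors and the signal by the same orthogonal matrix leaves the phaseless measurements unchanged (the dimensions below are arbitrary).

```python
import numpy as np

# If Q is orthogonal, |<a_i, x>| = |<Q a_i, Q x>|, so rotating both the
# measuring vectors and the signal leaves problem (1) unchanged.
rng = np.random.default_rng(6)
n, m = 8, 50
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthogonal matrix
y_original = np.abs(A @ x)
y_rotated = np.abs((A @ Q.T) @ (Q @ x))            # rotated vectors and signal
```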
The spectral matrix for orthogonality promoting method is the following.
By denoting , then can be written as follows:
where the following is the case.
The matrix for maximal correlation method is the following:
where the following is the case.
The constructed matrix for estimation is the following.
To proceed, we notice the following basic facts:
- 1.
- The orthogonality promoting method finds the eigenvector of the minimum eigenvalue of the spectral matrix.
- 2.
- In the ideal case that the perturbation term is a zero matrix, the spectral matrix degenerates to the following. If we can also ensure that it is positive definite, the eigenvector corresponding to the smallest eigenvalue is exactly the true signal.
These two facts inspire us to estimate the error by computing the eigenvector of matrix after adding perturbation . Thus, the outline of theoretical analysis is concluded as follows:
- 1.
- Estimate , where and are the minimum eigenvalue of and , respectively;
- 2.
- With the eigenvalue perturbation in hand, we can then compute the perturbation of the corresponding eigenvector, which is the exact error of our algorithm.
Specifically, we have the following roadmap of theoretical analysis. Section 3.1 presents bound results for each component of the spectral matrix . Using the results in Section 3.1, the bounds of can then be easily obtained in Section 3.2. The relationship between perturbation of eigenvalues and eigenvectors is presented in Section 3.3, which finally induces the error estimation of our algorithm formally in Section 3.4.
3.1. Analysis of Each Component of the Spectral Matrix
In this part, we will provide bounds for the variables involved in the matrix, which are the basic ingredients for estimating the perturbations of the eigenvalue and the eigenvector. In particular, the relevant upper bound, lower bound, and matrix norm will be analysed.
3.1.1. Upper Bound of
Finding the required upper bound actually consists in finding an upper bound for the numerator and a lower bound for the denominator. First, we have the upper bound in the statistical sense in the following lemma.
Lemma 1.
We have the following:
where .
The proof of Lemma 1 is placed in the Appendix A and Appendix B. As for the lower bound of , we borrow a result from Wang [13], Lemma 3.
Lemma 2.
The following holds with probability exceeding :
provided that for some absolute constants , and .
Then, we obtain the required upper bound.
Lemma 3.
Under the set-up of Lemmas 1 and 2, we have the following:
with probability of at least
where , , and denote and , respectively.
3.1.2. Lower Bound of the Smallest Eigenvalue of
Since is a linear combination of and , the lower bound of the smallest eigenvalue of can be estimated from the bounds of eigenvalues of and .
According to (10) and (12), we rewrite and as the following:
where the component matrices are termed Gaussian matrices here, since their entries are all sampled from the Gaussian distribution. We notice that the two quadratic forms are Wishart matrices, which can be written as the product of a Gaussian matrix and its adjoint [18]. Moreover, the extreme eigenvalues of a Wishart matrix obey the following classical result.
Theorem 1
([19], Corollary 5.35). Let A be an N × n matrix whose entries are independent standard normal random variables. Then, for every t ≥ 0, with a probability of at least 1 − 2 exp(−t²/2), the following events hold simultaneously:
√N − √n − t ≤ s_min(A) ≤ s_max(A) ≤ √N + √n + t,
where s_min(A) and s_max(A) stand for the smallest and the largest singular value of A, respectively.
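Theorem 1 is straightforward to check empirically; the dimensions and the choice t = 3 below are arbitrary.

```python
import numpy as np

# Empirical check of Theorem 1: for an N x n standard Gaussian matrix,
# sqrt(N) - sqrt(n) - t <= s_min(A) <= s_max(A) <= sqrt(N) + sqrt(n) + t
# holds with probability at least 1 - 2 exp(-t^2 / 2).
rng = np.random.default_rng(4)
N, n, t = 4000, 50, 3.0
A = rng.standard_normal((N, n))
s = np.linalg.svd(A, compute_uv=False)     # all singular values of A
lower = np.sqrt(N) - np.sqrt(n) - t
upper = np.sqrt(N) + np.sqrt(n) + t
```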
By applying Theorem 1 to with the following replacements:
- (1)
- ,
- (2)
- , ,
- (3)
- ,
one can see that the following is the case.
In the theoretical analysis, we assume that the number of measurements is large enough. However, it should be observed that a considerably smaller setting suffices to output reasonable estimates, as shown in the numerical tests. Based on this assumption, one then has the following.
The above holds with probability exceeding , where . Similarly, we have the following:
where .
Hence, we obtain the lower bound for the smallest eigenvalue of G.
A necessary condition for the validity of our method is that this lower bound is positive, which can be satisfied by choosing the parameters properly.
3.1.3. The Upper Bound of
Now, let us turn to the estimation of the norm of . We denote each item of as .
Since the relevant variable obeys a normal distribution, we can derive the following.
The squared variable then obeys a chi-squared distribution, which is a sub-exponential distribution. With this notation, we can rewrite the expression as the following.
The expectation of is obviously .
The variance is also needed in order to obtain the required upper bound. To this end, we recall the Bernstein-type inequality for sub-exponential random variables.
Theorem 2.
Let be i.i.d. centered sub-exponential random variables with sub-exponential norm denoted as K. Then, for every , we have the following:
where is an absolute constant. K is the so-called sub-exponential norm defined as the following.
Define a centered sub-exponential random variable accordingly. Its sub-exponential norm is computed in Appendix C, which shows that the norm is bounded in our case. By substituting these quantities into Theorem 2, one can then conclude the following result.
Theorem 3.
Since , let , then we have the following:
where is an absolute constant.
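The concentration that Theorems 2 and 3 formalize can be observed empirically: averages of centered chi-squared variables, which are sub-exponential, concentrate around zero at the expected 1/√N rate. The sample sizes below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
N, trials = 2000, 500
# chi-squared with 1 degree of freedom, centered at its mean 1:
# a canonical sub-exponential random variable
samples = rng.chisquare(df=1, size=(trials, N)) - 1.0
means = samples.mean(axis=1)
# Var(chi^2_1) = 2, so the std of each empirical mean is sqrt(2 / N)
predicted_std = np.sqrt(2.0 / N)
```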
Gathering these results, we have the following theorem.
Theorem 4.
The above holds with the probability stated below:
with some absolute constant .
3.2. Estimate of
To estimate the eigenvalue perturbation, we first recall a classical result from matrix perturbation theory [20].
Theorem 5.
Let the following be the case:
Let the above be Hermitian and set over all and , where stands for the set of eigenvalue of . Then, the following is the case:
where and are the j-th smallest one among the eigenvalues of and , respectively.
To ensure the effectiveness of our algorithm, the target eigenvalue should remain the minimum eigenvalue after the perturbation is added in our case. The eigenvalue perturbation is then simply bounded by (25). Hence, it can be estimated by the following.
3.3. Computing
With the upper bound of the eigenvalue perturbation, we can now estimate the perturbation of the eigenvector. The eigenvector of the minimum eigenvalue is computed by solving the corresponding eigenvalue equation:
where denotes the identity matrix here. Without loss of generality, the solution of (36) can be written in the form of . Then, we have the following.
Therefore, can be bounded by the following:
where (38) is derived from (37). The form of (39) is somewhat complicated, as stated below.
3.4. Main Result
Notice that this quantity is exactly the estimation error. It is then straightforward to deduce our main theorem by combining the above results.
Theorem 6 (Main result).
Consider the problem of estimating arbitrary from m phaseless measurements (1) under the Gaussian assumption. Suppose the output of Algorithm 1 is . If , , , and , then we have the following error estimate for the composite initialization method:
with a probability of at least the following:
where the following is the case,
for and some absolute constants .
Probability P in (42) can be negative when m and n are too small, which reveals a limitation of our analysis: m and n need to be large enough to render the probability meaningful. In the extreme case that the relevant terms are close to 0, our error estimate (41) can be approximated by a simpler form:
which is verified by numerical experiments later.
4. Numerical Results
In this section, we test the accuracy of the proposed method and compare it with other methods, including the spectral method, the truncated spectral method, the reweighted maximal correlation method, the null vector method, and the orthogonality promoting method. The sampling vectors and the original signal are randomly and independently generated. To eliminate the influence of the error introduced by norm estimation, the original signal is normalized. We chose fixed parameters for the proposed method in the numerical experiments. All the following simulation results are averaged over 80 independent Monte Carlo realizations.
Figure 1 plots the RMSE calculated by (6) versus the oversampling rate for the mentioned initialization methods. Obviously, all methods exhibit better performance as the oversampling rate increases. In particular, the proposed initialization method outperforms the other methods. When the oversampling rate is large, the composite initialization method performs nearly as well as the null vector method, and the convergence behavior coincides with (45). When the oversampling rate is small, the proposed method does not lose accuracy as dramatically as the null vector method does. The proposed algorithm provides the most accurate estimate when the oversampling rate is below the information limit.
Figure 1.
RMSE vs. oversampling rate of several initialization methods. The parameters for the composite initialization method are as stated in the text, while for the other algorithms, the involved parameters are selected according to the related articles.
Figure 2 illustrates the importance of the initialization method in improving the success rate of the TAF algorithm [12]. In each simulation, each initialization generates an estimate used as the initializer for the TAF algorithm. A trial is considered successful if the TAF algorithm returns a result with RMSE less than the prescribed tolerance. When the number of measurements is large, all the presented initialization methods ensure an almost 100% success rate. However, as it decreases, the differences in the success rates of the various methods gradually appear. Therefore, the composite initialization method can help TAF achieve a better success rate compared with the other methods.
Figure 2.
Empirical success rate of different initialization methods versus the number of measurements, with the oversampling rate varying from 1 to 4 for TAF.
Table 1 reports the CPU time needed for TAF using our method as an initializer, compared with two other typical initialization methods. To distinguish the performance more clearly, the length of the signal is set to a large number. The proposed method and the null vector method need to solve a linear system of equations at each inverse power computation, which makes them more time-consuming than the power computation of the maximal correlation method. However, the proposed method provides a more accurate initializer, which helps TAF converge faster. Hence, the overall efficiency of our algorithm is not far behind the power-type methods, as shown in Table 1.
Table 1.
CPU time (s) for TAF using the proposed initialization method compared with two other typical initializers.
5. Conclusions
This paper proposes a new initialization method that combines the advantages of two methods built on two completely opposite intuitions. Both theoretical analysis and numerical experiments indicate the validity of our method and its higher accuracy compared with other methods. Future work will focus on extending our initialization algorithm to more generalized PR problems, e.g., the quadratic sensing and matrix reconstruction problems [21].
Author Contributions
Q.L. conceptualization, methodology, validation, and writing—original draft preparation; S.L. formal analysis, software, and writing—review and editing; H.W. supervision, project administration, and resources. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (Grant 61977065) and the National Key Research and Development Plan (Grant 2020YFA0713504).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Upper Bound of H1
is the average of the smallest elements of the set . As has been sorted in ascending order, we have the following.
By the Gaussian assumption, , obeys the following chi-squared distribution with probability function:
where is the normalization constant. The cumulative distribution function is the following.
Let the truncation level and its probability p be defined as above. The estimation hinges on the value of the truncation level. However, the density is not explicitly integrable; hence, we cannot derive an explicit expression in terms of p. To this end, we first calculate the following bounds using some basic inequalities, as the following two lemmas show. The detailed proofs are placed in Appendix B.
Lemma A1 (Upper bound of ).
Let , then for , we have the following.
Lemma A2 (Lower bound of ).
Let , then for , we have the following.
Then, we turn back to the estimation of . Notice that can be approximately regarded as random variables sampled from a bounded chi-squared distribution. Therefore, we first provide an upper bound of the largest element of , namely, . We prove the following result.
Theorem A1.
Assume that . We have the following.
Proof.
Let , which satisfies the following:
as the probability function is monotonically decreasing. Since and (A3), we have the following.
Define the following indicator random variables:
where i.i.d. obeys the chi-squared distribution with the probability function (A1) and the characteristic function of the following.
The event means that at least measurements are larger than . Therefore, we have the following.
The expectation of is then
The indicator variables are bounded, and the upper bound of their sum can be provided by the well-known Hoeffding's inequality.
Lemma A3 (Hoeffding’s inequality).
Let be i.i.d. random variables bounded by the interval . Then, the following is the case.
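Hoeffding's bound can be illustrated numerically; the uniform variables and the deviation level t below are arbitrary choices, with the bound 2 exp(−2Nt²/(b−a)²) computed explicitly.

```python
import numpy as np

# Hoeffding: for i.i.d. X_i in [a, b],
# P(|mean - E[X]| > t) <= 2 exp(-2 N t^2 / (b - a)^2).
rng = np.random.default_rng(7)
N, trials, t = 1000, 2000, 0.05
X = rng.uniform(0.0, 1.0, size=(trials, N))   # bounded in [0, 1], mean 1/2
dev = np.abs(X.mean(axis=1) - 0.5)
emp_prob = (dev > t).mean()                   # empirical tail probability
hoeffding_bound = 2 * np.exp(-2 * N * t**2)
```

The empirical tail probability never exceeds the bound; for bounded variables the bound is often quite loose, which is why the sharper Bernstein-type inequality is used elsewhere in the analysis.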
Let us consider the i.i.d. bounded random variables of the following.
Since is larger than , we can provide an upper bound of by finding the upper bound of .
The expectation of is the following:
with the following.
After computing the expectation of , we can bound by Hoeffding’s inequality for bounded random variables similarly.
Since , we can obtain the following result by replacing in Lemma A3.
Hence, we have the following.
Therefore, we obtain the upper bound in the statistical sense.
Appendix B. The Bound Estimation of τ *
Appendix B.1. The Upper Bound for τ *
Construct the following function.
Then, it can be proved that the following is the case.
Therefore, we have the following.
Define . Then, it is obvious that the following is the case.
Since and are both strictly monotone decreasing over , their inverse functions satisfy the following:
where is about .
In the orthogonality promoting approach using the inverse power method, the truncation ratio is always less than 1/2 to achieve tolerable performance. Therefore, in our concerned range, it is feasible to estimate the quantile by the constructed function.
Now, we compute .
Then, we obtain by solving .
Discarding the unreasonable solution with excessive magnitude, we have the following.
Since and , we can determine the following upper bound of .
Appendix B.2. Lower Bound for τ *
Similarly, consider the following function.
Define . Then, it is obvious that the following is the case.
Therefore, we have the following.
Appendix C. The Bound of K
The sub-exponential norm of is defined by the following.
With the notation above, integrating by parts yields the following.
Noticing that , we have the following.
It is also obvious that the mean value is 1. Therefore, the centered variable obeys a centered sub-exponential distribution, since its mean value is 0 and its sub-exponential norm is finite.
References
- Miao, J.; Charalambous, P.; Kirz, J.; Sayre, D. Extending the methodology of X-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens. Nature 1999, 400, 342.
- Shechtman, Y.; Eldar, Y.C.; Cohen, O.; Chapman, H.N.; Miao, J.; Segev, M. Phase retrieval with application to optical imaging: A contemporary overview. IEEE Signal Process. Mag. 2015, 32, 87–109.
- Stefik, M. Inferring DNA structures from segmentation data. Artif. Intell. 1978, 11, 85–114.
- Fienup, C.; Dainty, J. Phase retrieval and image reconstruction for astronomy. Image Recover. Theory Appl. 1987, 231, 275.
- Luo, Q.; Wang, H. The Matrix Completion Method for Phase Retrieval from Fractional Fourier Transform Magnitudes. Math. Probl. Eng. 2016, 2016, 4617327.
- Candes, E.J.; Strohmer, T.; Voroninski, V. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 2013, 66, 1241–1274.
- Waldspurger, I.; d’Aspremont, A.; Mallat, S. Phase recovery, MaxCut and complex semidefinite programming. Math. Program. 2015, 149, 47–81.
- Netrapalli, P.; Jain, P.; Sanghavi, S. Phase retrieval using alternating minimization. IEEE Trans. Signal Process. 2015, 63, 4814–4826.
- Elser, V. Phase retrieval by iterated projections. JOSA A 2003, 20, 40–55.
- Candes, E.J.; Li, X.; Soltanolkotabi, M. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inf. Theory 2015, 61, 1985–2007.
- Zhang, H.; Liang, Y. Reshaped Wirtinger flow for solving quadratic system of equations. Adv. Neural Inf. Process. Syst. 2016, 29, 2622–2630.
- Wang, G.; Giannakis, G.; Saad, Y.; Chen, J. Solving most systems of random quadratic equations. arXiv 2017, arXiv:1705.10407.
- Wang, G.; Giannakis, G.B.; Saad, Y.; Chen, J. Phase retrieval via reweighted amplitude flow. IEEE Trans. Signal Process. 2018, 66, 2818–2833.
- Luo, Q.; Wang, H.; Lin, S. Phase retrieval via smoothed amplitude flow. Signal Process. 2020, 177, 107719.
- Luo, Q.; Lin, S.; Wang, H. Robust phase retrieval via median-truncated smoothed amplitude flow. In Inverse Problems in Science and Engineering; Taylor & Francis: Abingdon-on-Thames, UK, 2021; pp. 1–17.
- Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course; Springer Science: New York, NY, USA, 2013; Volume 87.
- Chen, P.; Fannjiang, A.; Liu, G.R. Phase retrieval by linear algebra. SIAM J. Matrix Anal. Appl. 2017, 38, 854–868.
- Wishart, J. The generalised product moment distribution in samples from a normal multivariate population. Biometrika 1928, 20, 32–52.
- Vershynin, R. Introduction to the non-asymptotic analysis of random matrices. arXiv 2010, arXiv:1011.3027.
- Li, C.K.; Li, R.C. A note on eigenvalues of perturbed Hermitian matrices. Linear Algebra Its Appl. 2005, 395, 183–190.
- Chi, Y.; Lu, Y.M.; Chen, Y. Nonconvex optimization meets low-rank matrix factorization: An overview. IEEE Trans. Signal Process. 2019, 67, 5239–5269.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

