1. Introduction
Phase Retrieval (PhR) is the inverse problem of reconstructing a signal when only the magnitude of its Fourier transform is available. A typical example is Ptychography, where a set of interference patterns is generated by illuminating a specimen at a series of positions separated by preset intervals.
The obtained measurements contain only magnitude information, and computational imaging methods can produce an image of the specimen by combining the information from the patterns recorded at the different positions. Examples of phase retrieval applications include microscopy [1], X-ray crystallography [2,3,4], coherent diffraction imaging [5], array imaging [6], blind deconvolution [7], acoustics [8], interferometry [9] and astronomy [10]. The problem can be stated more generally as retrieving $x$ from measurements of the type
$$y_i = |\langle a_i, x \rangle|^2 + n_i, \quad i = 1, \ldots, m,$$
where $a_i$ is the i-th sampling vector and $n_i$ is an additive noise term. Phase Retrieval belongs to the class of non-linear non-convex optimization problems. Theoretical advances in random linear operators enabled the application of the solution uniqueness principles of random underdetermined linear systems (Compressive Sensing) to the quadratic measurements problem of Phase Retrieval.
In detail, the original signal $x$ can be recovered when the sampling vectors $a_i$ are drawn randomly from a probability distribution, such as a Gaussian [11]. In practice, this can be achieved by masking the sample with binary masks or optical gratings [12]. Additionally, in the general case, the uniqueness of the solution cannot be guaranteed unless a sufficient number of samples, exceeding the signal size, is available.
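As a minimal illustration of this measurement model, the following Python sketch simulates quadratic measurements of a real signal with Gaussian sampling vectors. The function name `quadratic_measurements`, the signal values and the number of samples are hypothetical choices made for this example only. The final check shows the global sign ambiguity that is inherent in magnitude-only data.

```python
import numpy as np

def quadratic_measurements(x, num_samples, noise_std=0.0, seed=0):
    """Simulate y_i = (a_i^T x)^2 + n_i for a real signal x (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Rows of A play the role of the random sampling vectors a_i.
    A = rng.standard_normal((num_samples, x.size))
    y = (A @ x) ** 2
    # Optional additive Gaussian noise term n_i.
    y = y + noise_std * rng.standard_normal(num_samples)
    return A, y

x = np.array([1.0, -2.0, 0.5])
A, y = quadratic_measurements(x, num_samples=12)
_, y_flipped = quadratic_measurements(-x, num_samples=12)  # same seed, same A

# x and -x cannot be told apart from magnitude-only measurements.
print(np.allclose(y, y_flipped))
```

Any reconstruction can therefore be expected to match the original signal only up to a global sign (or a global phase, for complex signals).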
1.1. Related Work
The most widely accepted reconstruction methods before the advent of modern techniques were the Gerchberg–Saxton [13] and Fienup [10] algorithms. Both methods are based on iterative nonlinear projections for the refinement of their estimates. These algorithms carried no convergence guarantee, and solutions could typically be obtained only under special conditions on the input signals or with special initializations relying on prior information. Subsequent research produced nonconvex iterative methods such as the extended Ptychographic Iterative Engine [14] and Difference Map [15].
The advent of Compressive Sensing [16] and its theoretical connection to the problem of Matrix Completion [17] led to the development of new theoretical results on Phase Retrieval. Specifically, in [11,12], Candès et al. approached the problem via the “Lifting” technique, in which a convex relaxation makes it possible to search for the solution in the space of positive semidefinite matrices through trace norm minimization, with the problem recast as one of matrix completion. This formulation, in conjunction with a randomness property of the transform operators, made it possible to guarantee convergence to a unique solution, given the availability of sufficient samples [8,18,19,20].
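The idea behind lifting can be checked numerically: a quadratic measurement of a vector is a linear measurement of the rank-1 positive semidefinite matrix formed from it. The sketch below is only an illustration of that identity, with arbitrary random vectors and hypothetical variable names; it does not implement the semidefinite program itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # unknown signal
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # one sampling vector

# Quadratic measurement in the original variable: |a^H x|^2.
quadratic = np.abs(np.vdot(a, x)) ** 2

# Lifted variable X = x x^H (Hermitian, positive semidefinite, rank 1).
X = np.outer(x, x.conj())

# The same measurement is linear in X: trace(a a^H X).
lifted = np.real(np.trace(np.outer(a, a.conj()) @ X))

print(np.isclose(quadratic, lifted))
print(np.linalg.matrix_rank(X))
```

The price of this linearity is dimensionality: the lifted variable has n × n entries, which is what makes the approach costly for images.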
The matrix completion formulation of Phase Retrieval and related semidefinite programming methods [21] are computationally prohibitive for any sizeable signal, for example, a high-resolution image, since they involve the manipulation of very high-dimensional variables. In response, a number of efficient non-convex optimization methods based on Stochastic Gradient Descent (SGD) were developed, namely Wirtinger Flow and Amplitude Flow [19,22,23], where random operator properties and a good initial estimate of the solution are used to guarantee convergence. The success of SGD methods led to numerous modifications aiming to improve their convergence and noise resilience; see [24,25,26,27,28], among others.
Beyond Wirtinger Flow-related studies, research on Phase Retrieval produced algorithms based on non-linear optimization [29], alternating minimization [30,31] and ADMM methods [32]. Furthermore, the problem was addressed from the perspective of Basis Pursuit convex optimization [33,34], low-rank matrix completion [35,36] and Total Variation minimization [37]. Ref. [38] explored the geometry of the Phase Retrieval problem. In [39], the Generalized Approximate Message Passing framework was applied to Phase Retrieval.
The ongoing developments in Deep Learning have resulted in a plethora of Neural Network methods for Phase Retrieval. Deep Learning methods include direct network approaches, in which Neural Networks are trained with specific datasets in order to learn the function mapping input to output [40,41]. Beyond dataset-based approaches, a number of physics-oriented methods were developed. Physics-based methods utilize a model of the dynamics of the sensing system as a prior driving the training or inference process of Neural Networks. In such methods, the Neural Network can act as a regularizer in an iterative estimation process, confining the estimates to certain spaces [42,43,44]. Alternatively, the iterations of an analytical process for Phase Retrieval can be mapped onto the layers of a Neural Network [45,46].
Neural Networks have also been combined with numerical methods in order to refine imperfect estimates into high-quality reconstructions [47,48]. Untrained neural network physics-based methods have been used to alleviate various problems that stem from the lack of adequate or good-quality training data as well as imperfect modeling of the signal propagation [49,50,51]. Deep learning has also been utilized to optimize the design of coded diffraction patterns [52].
Alternating optimization algorithms have also utilized Neural Networks as regularizers [53,54] in order to improve noise stability and achieve better performance. The ADMM has also been used as a physical model for untrained network methods [55,56]. An overview of Deep Learning techniques for Phase Retrieval related to a wide variety of applications and sensing configurations is provided in [57].
1.2. Our Contribution
We introduce a solver for the non-convex optimization problem of Equation (1), concerning real signals, which belongs to the category of alternating minimization algorithms. Our method differs from traditional alternating estimation algorithms since it does not involve the type of updates used in the Gerchberg–Saxton and Fienup methods, where successive nonlinear projections onto desirable sets are used to refine the solution. Our study follows the line of theoretical results on Phase Retrieval with random sensing operators, specifically the interpretation of Phase Retrieval as a Matrix Completion problem, and uses established results on the uniqueness of the solution in the space of rank-1 Hermitian matrices [11,22] in order to derive a nonconvex split variables formulation (see Equation (3)).
Our formulation differs from other alternating minimization methods such as [30,31], which estimate a phase and a solution vector. Instead, it reformulates the optimization problem by using two vectors for the estimated solution, expanding the search space to all rank-1 matrices (Equation (4)). The paper [32] shares the same optimization problem formulation as the one presented here (Equation (7)). However, the main theoretical and algorithmic innovation of our study is that it unveils and exploits an implicit geometric relation between the split variables at optimal points in order to enforce their equality, by calculating a recombination of them at each iteration (Equation (13)), effectively restricting the estimated solution space to rank-1 Hermitian matrices. This is achieved without additional regularization terms in the objective function, such as the ones used in various versions of the Alternating Direction Method of Multipliers.
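To make the split-variable idea concrete, the sketch below shows, for real Gaussian sampling vectors, that replacing the single unknown by two vectors turns each quadratic measurement into a bilinear one, which becomes a linear least-squares problem once one of the two vectors is held fixed. This is a generic illustration under assumed notation (`u`, `v`, `bilinear_model` are hypothetical names). It is not the update scheme of the proposed method, and the recombination step is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 40
A = rng.standard_normal((m, n))   # rows act as real sampling vectors a_i
x = rng.standard_normal(n)        # ground-truth real signal
y = (A @ x) ** 2                  # noise-free quadratic measurements

def bilinear_model(A, u, v):
    """Split-variable model (a_i^T u)(a_i^T v), i.e. a_i^T (u v^T) a_i."""
    return (A @ u) * (A @ v)

# With u = v = x the bilinear model reproduces the quadratic measurements.
print(np.allclose(bilinear_model(A, x, x), y))

# Holding v fixed makes the model linear in u:
# y_i = ((a_i^T v) a_i)^T u, an ordinary least-squares problem.
v = x.copy()
design = (A @ v)[:, None] * A
u_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.allclose(u_hat, x))
```

In this toy setting the fixed vector is the true signal, so a single linear solve recovers it; in practice both vectors are unknown and must be estimated alternately from an initialization.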
As a result, the proposed update equations correspond to a fundamentally different formulation of the Alternating Direction Method compared to the one presented in [32] (see also Section S5 in the Supplementary File). To the best of our knowledge, the update equations of the proposed method are not equivalent to any existing iterative method for Phase Retrieval. Empirical results show that the inclusion of the recombination step is necessary for the algorithm to converge (see Section S6 of the Supplementary File). Furthermore, since we are not aware of any results in the literature that correspond to the recombination step, we provide a theoretical analysis exploring the convergence properties of the proposed non-linear optimization method, in order to justify its general applicability.
An experimental comparison shows that our method demonstrates superior convergence properties compared to state-of-the-art analytical methods, in terms of its ability to converge for various numbers of available samples, processing time and accuracy under the presence of noise.
1.3. Paper Structure
The rest of the paper is organized as follows. In Section 2, the problem formulation and an introduction to the proposed optimization method are provided, along with a theoretical analysis of its convergence. Section 3 contains experimental results, where the proposed method is evaluated and compared against other analytical Phase Retrieval solvers.
3. Results
This section contains experiments showing the performance of the proposed algorithm in terms of numerical error and execution time. The proposed method is compared with other analytical methods such as Wirtinger Flow Phase Retrieval (WF) [63], Truncated Wirtinger Flow (TWF) [23], Truncated Amplitude Flow (TAF) [19], the Momentum median reweighted Truncated Amplitude Flow (MRTAF) [28] and the PhaseSplit [32] method, which shares the same formulation as the method proposed in this study.
Note that the forward models of TAF and MRTAF use the non-squared magnitudes as input, whereas the proposed method, WF, TWF and PhaseSplit use the squared magnitudes. Therefore, noisy measurements generated for one category of algorithms do not correspond to the same SNR for the other and cannot be used directly. Instead, we compare the methods by adding noise at a level that yields the same SNR for the magnitude and the squared-magnitude observations, respectively.
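The SNR matching described here can be sketched as follows. This is a minimal Python illustration under our own assumptions: the helper names `add_awgn` and `measured_snr_db` are hypothetical, the synthetic amplitudes stand in for real diffraction data, and the SNR is defined as the ratio of signal power to noise power in dB. The last line shows why noisy magnitudes cannot simply be squared to obtain noisy intensities at the same SNR.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise scaled so that the result has the requested SNR."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean and a noisy observation."""
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((noisy - clean) ** 2))

rng = np.random.default_rng(0)
amplitudes = np.abs(rng.standard_normal(100_000))  # stand-in for |<a_i, x>|
intensities = amplitudes ** 2                      # stand-in for |<a_i, x>|^2

# Noise is generated separately for each observation type at the same target SNR.
noisy_amplitudes = add_awgn(amplitudes, 24.0, rng)
noisy_intensities = add_awgn(intensities, 24.0, rng)

print(measured_snr_db(amplitudes, noisy_amplitudes))
print(measured_snr_db(intensities, noisy_intensities))
# Squaring the noisy amplitudes does NOT give intensities at the same SNR.
print(measured_snr_db(intensities, noisy_amplitudes ** 2))
```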
Each method has parameters that control its performance. The standard parameters were used, as provided by their authors.
In the experiments, three initialization methods were considered: one based on the Truncated Wirtinger Flow (TWF) spectral initialization introduced in [23], one based on the Truncated Amplitude Flow (TAF) spectral initialization introduced in [19], and a proposed positive random numbers initialization method (see Section S4 of the Supplementary File).
The proposed method was implemented in MATLAB, and MATLAB implementations of WF, TAF and TWF were downloaded from the respective websites (see Table 1). All test images shown were obtained from the USC-SIPI Image Database (https://sipi.usc.edu/database/, accessed on 20 July 2024). Implementations of PhaseSplit and MRTAF were not readily available and were written by us, since they correspond to minor modifications of the proposed and TAF methods, respectively.
In the simulations, three different images were used. “Lena” of sizes and , “Cameraman” of size , and “Man” of sizes and .
In the experiments, the proposed algorithm stops executing when the distance between the last two estimated output values becomes lower than a given threshold, set to .
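To evaluate the performance of the proposed method we use the following metric, which is the square root of the metric proposed in [11]:
$$\mathrm{err}(\hat{x}, x) = \min_{\phi \in [0, 2\pi)} \frac{\| x - e^{j\phi} \hat{x} \|_2}{\| x \|_2}.$$
This metric takes into account the fact that the original signal $x$ and any other signal obtained by a global phase delay of $x$ always produce the same observations.
For the case where $x$ is real, this is
$$\mathrm{err}(\hat{x}, x) = \frac{\min\left( \| x - \hat{x} \|_2, \| x + \hat{x} \|_2 \right)}{\| x \|_2}.$$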
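A normalized error that is insensitive to a global phase can be computed as in the sketch below. The exact normalization is our assumption (Euclidean distance divided by the norm of the true signal), and the function names are hypothetical. For real signals the phase reduces to a sign; for complex signals the optimal phase has a closed form given by the inner product of the two signals.

```python
import numpy as np

def normalized_error_real(x_true, x_est):
    """Relative error up to a global sign, for real signals."""
    x_true, x_est = np.ravel(x_true), np.ravel(x_est)
    best = min(np.linalg.norm(x_true - x_est), np.linalg.norm(x_true + x_est))
    return best / np.linalg.norm(x_true)

def normalized_error_complex(x_true, x_est):
    """Relative error up to a global phase, for complex signals."""
    x_true, x_est = np.ravel(x_true), np.ravel(x_est)
    inner = np.vdot(x_est, x_true)  # x_est^H x_true
    # The unit-modulus factor that best aligns x_est with x_true.
    phase = inner / abs(inner) if abs(inner) > 0 else 1.0
    return np.linalg.norm(x_true - phase * x_est) / np.linalg.norm(x_true)

x = np.array([1.0, -2.0, 0.5, 3.0])
print(normalized_error_real(x, -x))  # sign-flipped copy counts as exact

z = x + 1j * x[::-1]
print(normalized_error_complex(z, np.exp(0.7j) * z))  # phase-shifted copy
```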
We generate simulated observations according to the acquisition model introduced in Equation (2).
We consider a different number of masks in our simulations, that is .
The elements of each mask
,
, are uniformly drawn from the Coded Diffraction Patterns dictionary
(see [
22]). These diffraction patterns correspond to physically realizable acquisition systems, where only a phase delay is introduced using appropriate masking.
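A coded diffraction acquisition of this kind can be simulated with a few lines of NumPy. The sketch below rests on several assumptions of ours: the mask alphabet {1, −1, j, −j} (pure phase delays), an orthonormal 2D FFT as the propagation model, and a random test image in place of the actual data. The function name `cdp_measurements` is hypothetical.

```python
import numpy as np

def cdp_measurements(image, num_masks, rng):
    """Squared-magnitude Fourier measurements of a masked real image."""
    # Assumed phase-only alphabet: every entry has unit modulus.
    alphabet = np.array([1, -1, 1j, -1j])
    masks = rng.choice(alphabet, size=(num_masks,) + image.shape)
    # One 2D FFT per mask; fft2 acts on the last two axes.
    fields = np.fft.fft2(masks * image, norm="ortho")
    return masks, np.abs(fields) ** 2

rng = np.random.default_rng(0)
image = rng.random((32, 32))          # stand-in for a real-valued test image
masks, y = cdp_measurements(image, num_masks=4, rng=rng)

print(y.shape)  # one diffraction pattern per mask
# Phase-only masks and an orthonormal FFT preserve the image energy.
print(np.allclose(y.sum(axis=(1, 2)), np.sum(image ** 2)))
```

Because only FFTs and element-wise products are involved, the forward operator never needs to be stored as an explicit matrix.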
Different levels of AWGN noise were considered in our simulations, with SNRs equal to ∞, 30, 24, 20 and 10 dB.
3.1. Initialization Quality
Table 2 shows the normalized error obtained by the proposed random initialization method, TWF initialization method and TAF initialization method, for the noiseless case with
.
The proposed random initialization method produces estimates of similar quality for all cases. The error of the proposed initialization method increases with the size of the input image but is similar for different numbers of masks with each size.
We observe that TWF and TAF need at least K = 8 and K = 4 masks, respectively, to obtain a normalized error smaller than one, but the proposed initialization method can obtain normalized errors smaller than one, even with K = 2.
Table 3 shows the normalized error obtained by the proposed initialization method, TAF initialization method and TWF initialization method, for SNR = 20 dB and
. The proposed method performs similarly to the noiseless case.
The TWF method needs more than K = 8 masks to obtain a normalized error less than 1 while the TAF initialization begins to do so with K = 4 masks. The quality of the estimates of the proposed initialization method does not change substantially by the presence of noise compared to the noiseless case.
Table 4 shows the time required for the initialization routine to return for the WF, TWF and TAF methods. The proposed method is omitted since its execution time is effectively zero (for example, it is 0.01 s for
and
images, the slowest case in the experiments conducted in this work). The results are for various image sizes and
in the noise-free observations case.
The noisy cases are not shown but would have the same return times, since the presence of noise does not affect the execution time of each iteration and the number of iterations is predefined. We observe that the times required grow with the number of masks and the image size, which is expected due to the higher computational complexity. The WF method is the fastest, followed by TWF, with TAF being the slowest.
This pattern reflects the higher complexity of the truncation calculations in each iteration. Generally, the WF, TWF and TAF initializations can consume substantial computational resources, with execution times equivalent to 40% of the reconstruction time in the cases where noise is present.
3.2. Reconstructions with Noise-Free Observations
Figure 2 shows the reconstructions of 10 test images for
K = 2 masks in the noiseless case, using the proposed method. In all cases the images were perfectly reconstructed.
In all noise-free cases, the proposed method is able to recover the original signal. Regarding the compared methods, TWF, TAF, and MRTAF also recover the exact solution in all cases. However, WF only recovers the exact solution when . For , WF converges to an inexact reconstruction of the image.
Figure 3 shows the evolution of the reconstruction error with time for all methods tested. The proposed method converges to a solution substantially faster than the compared methods. Among the remaining methods, WF, TAF and MRTAF converge the fastest. The PhaseSplit method converges more slowly than the proposed and SGD-based methods, and its convergence behavior is very sensitive to changes in its parameters (see also Section S5 of the Supplementary File).
Table 5 shows the execution time taken for all methods to achieve exact reconstruction for various input sizes and
for both the proposed and TAF initializations, with the latter being considered the best available spectral initializer. In some of the experiments, the WF method failed to reach an exact solution and terminated early with a solution of normalized error typically close to 0.03.
In all experiments, the proposed method is faster than all the compared methods. More specifically, we observe that the proposed method is approximately four times faster than WF, the second fastest method.
In the next experiment, the convergence success rate is measured. We generate 100 random signals of size
and apply the observation model in Equation (
3) to generate 100 noise-free observations. Then, we apply the compared methods and calculate the percentage of signals that have been successfully recovered.
The experiment is repeated for two different ways of generating the signal. More specifically, we use a uniform distribution on the interval [0, 1], and a standard Gaussian distribution with mean 0 and covariance the identity matrix. The results of this experiment are shown in
Table 6.
For signals generated with the Gaussian distribution, the proposed method converges with higher probability when the TAF initialization is used, since the quality of the proposed initialization is worse, as was seen in the previous experiment.
The failures in convergence were associated with poor initialization quality up to . Beyond this threshold there is enough information for the spectral initialization to be of good quality. More than six masks also suffice for the method to converge, regardless of the initialization, which can be deduced by the success rate for a general real signal when the initialization only contains positive numbers. When the TAF initialization was used, all methods had progressively better success rates with higher K.
The WF method follows the same convergence patterns regardless of initialization and signal type, with only the number of masks determining the rate of success. TAF and TWF always failed when the proposed random initialization was used for general real signals, due to the poor quality of the initialization.
The proposed method and PhaseSplit had higher success rates for lower K compared to all other methods.
Generally, the proposed method produces reconstructions for the noise-free case comparable to those obtained by the state-of-the-art methods, with the advantage of being much faster.
The proposed random initialization method also leads to acceptable initializations in practice.
The proposed method works with the theoretical lower bound of K = 2 masks, provided that a good-quality initialization is available. Reconstructions with K = 2 masks have only been reported in the TAF paper [19], for real-valued sampling vectors. PhaseSplit could also produce a similar level of performance to the proposed method when its parameters were finely tuned.
The proposed method also converges with K = 2 masks in the case of complex sampling vectors.
3.3. Reconstructions with Noisy Observations
In this subsection, we evaluate the performance of the compared methods for noisy observations.
Figure 4 and
Figure 5 show the evolution of the normalized error for all compared methods for SNR = 24 dB and
K = 8, when the image size is
and
, respectively.
Table 7 shows the results obtained for all the compared methods, for image sizes
and SNR = 24 dB. In addition to Normalized error, we also show the PSNR and SSIM of the reconstructed images.
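For reference, PSNR for a phase retrieval reconstruction can be computed as in the following sketch, which first resolves the global sign ambiguity of a real reconstruction. The helper names and the assumption that images are scaled to the range [0, 1] (so that the peak value is 1) are ours; SSIM is not reproduced here.

```python
import numpy as np

def align_sign(reference, estimate):
    """Pick the sign of the estimate that is closest to the reference."""
    return estimate if np.sum(reference * estimate) >= 0 else -estimate

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the maximum pixel value."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return np.inf
    return 10 * np.log10(peak ** 2 / mse)

reference = np.linspace(0.0, 1.0, 16).reshape(4, 4)
estimate = -(reference + 0.1)   # sign-flipped reconstruction with a 0.1 offset

aligned = align_sign(reference, estimate)
print(psnr(reference, aligned))  # mean squared error of 0.01 gives about 20 dB
```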
The proposed method obtains better reconstructions in terms of Normalized error, PSNR and SSIM. We also observe that when the number of masks K increases, all methods obtain better reconstructions.
For K = 4, the PSNR gap to WF, the nearest competitor among the SGD-based methods, is approximately 2 dB. However, this gap decreases as K increases, and for K = 8 the difference with the nearest competitor, WF, is about 0.8 dB. Regarding running times, the proposed method needs about 1 s when the number of masks is K = 8. For the same case, WF and TWF need more than 6 s, and TAF and MRTAF more than 3.5 s.
Table 8 and
Table 9 show the results obtained for all the compared methods, for image size
with SNR = 24 dB and SNR = 30 dB, respectively. The proposed method reconstructions have a better Normalized error, PSNR and SSIM with these metrics improving with a higher number of masks.
PhaseSplit and the proposed method result in the same level of reconstruction quality for . The proposed method converges 3.5 to 6 times faster than PhaseSplit. For K = 4 and SNR = 24 dB there is a 3.5 dB difference with the next best SGD method, TAF. For K = 8 there is a 0.85 dB difference with the next best SGD method, WF. In terms of execution time, the proposed method is 2.5 times faster than the next best one.
Figure 6 shows an example of the reconstructed images by the compared methods, for image size
,
K = 4 masks and SNR = 20 dB.
Figure 6a shows the ground truth image that we used to generate the observation.
Figure 6b shows the reconstruction obtained for the proposed method.
The proposed method recovered most of the details in the image. For instance, see the high-frequency information in the straw at the bottom-right of the image, as well as the feathers hanging off the man’s hat.
Figure 6c–f show the reconstructions obtained by WF, TWF, MRTAF and PhaseSplit, respectively. The reconstructions in Figure 6c–e are very similar to the one obtained with the proposed method; however, the man’s face looks noisier in them than in Figure 6b. The reconstruction of PhaseSplit presented in Figure 6f is very similar to the one produced by the proposed method.
3.4. Robustness to Number of Masks and Noise Level
The performance of the proposed method for different levels of noise and numbers of masks is examined next.
Figure 7 shows the effect of noise on the convergence of the proposed method. As expected, the error increases with the noise level. However, we observe that for higher noise levels, the proposed method needs less time to converge.
To visualize the effect of various noise levels on the reconstruction, the results for an image, reconstructed with the proposed method, are presented in
Figure 8.
Figure 8a is the noiseless image.
Figure 8b is the reconstructed image with SNR = 10 dB. In this case, there is an obvious effect from the noise in the image quality that can be seen throughout the whole image.
Figure 8c,d shows the reconstructed images for SNR = 20 dB and SNR = 30 dB, respectively.
In these cases, the noise effect is less apparent but can be seen as changes in texture, especially on large patches of the same color or texture in the image.
Figure 9 shows the performance of the proposed algorithm for different values of K. A lower number of masks leads to faster convergence, owing to the lower computational complexity, but also to a higher normalized error.
The images in
Figure 10 present the corresponding reconstructions for
K = 2, 4, 8, 16 for a
image and 20 dB SNR.
Figure 10a is the case with
K = 2 where the effect of noise is most obvious.
Figure 10b–d are the cases with
K = 4, 8 and 16, respectively. In these cases, the reconstruction is better and the effect of noise becomes progressively less apparent with the growing number of masks.
4. Discussion
We have presented an analytical method, which outperforms state-of-the-art algorithms for the solution of non-linear quadratic optimization problems associated with Phase Retrieval, when real signals are involved.
Following established results in the literature on the connection between Matrix Completion and Phase Retrieval as well as the uniqueness of the solution under random forward operator conditions [
11,
22], we have reformulated the original problem into one of alternating optimization with split variables.
While various alternating optimization approaches for Phase Retrieval exist in the literature [30,31] and the formulation considered in this study has also been used in [32], our method differs from other Phase Retrieval solvers in that it introduces an algorithmic step to recombine the split variables and confine the estimated solution to the desired space. This was made possible by a close examination of the relations between the variables involved, which allowed us to identify implicit regularizations.
The convergence properties of the algorithm were examined theoretically in order to establish its applicability to any real signal, and it was shown that the algorithm converges under some mild initialization conditions.
The presence of noise in the observations is implicitly factored into the statistical uncertainty terms of the theoretical analysis; however, the noise terms were not specifically modeled. An experimental exploration of the effect of AWGN on the method (see Figure 7) showed that the method converges and that the output is corrupted according to the noise level. The proposed method performed better than state-of-the-art analytical methods when tested at the same noise level (see Section 3.3).
Since the algorithm does not use any explicit regularization terms, the only algorithmic parameters that are controllable are the tolerance and maximum iterations of the Conjugate Gradient (CG) solver and the maximum number of iterations of Algorithm 1. A higher noise level can allow for the use of a higher tolerance or maximum iterations for the CG solver. The proposed algorithm can be implemented on any platform that can support the solution of linear systems via Conjugate Gradient. Since the forward model is based on the Fast Fourier Transform, the only storage requirements are for the masks and split variables, allowing for the handling of large images by standard desktop computers with limited memory.
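The role of the two Conjugate Gradient parameters can be seen in a textbook implementation such as the one sketched below. This is a generic matrix-free CG for a symmetric positive definite system, written under our own assumptions; it is not the specific linear system solved inside Algorithm 1, and the names `conjugate_gradient` and `apply_A` are hypothetical. The operator is passed as a function, which mirrors how an FFT-based forward model avoids storing a matrix.

```python
import numpy as np

def conjugate_gradient(apply_A, b, tol=1e-8, max_iter=200):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b - apply_A(x)          # initial residual
    p = r.copy()                # initial search direction
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        # Stop once the relative residual falls below the tolerance.
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
B = rng.standard_normal((30, 20))
M = B.T @ B + np.eye(20)        # symmetric positive definite test matrix
b = rng.standard_normal(20)

solution = conjugate_gradient(lambda v: M @ v, b, tol=1e-10, max_iter=500)
print(np.allclose(M @ solution, b, atol=1e-6))
```

A looser tolerance or a smaller iteration cap trades accuracy of each inner solve for speed, which is the trade-off referred to above.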
Our experiments show that given an adequate number of observations, appropriate types of masks and a good initialization, the proposed algorithm can reconstruct any real signal.
However, it must be highlighted that the analysis and implementation of the presented algorithm concern real signals only. Both the algorithmic implementation and the theoretical analysis would be fundamentally different in the case of complex signals. This fact precludes the use of the method, in the form presented in this paper, in most practical Phase Retrieval applications.
The extension of this method to its complex signal equivalent and the mapping of the iterations of Algorithm 1, or of its complex equivalent, onto a Neural Network architecture based on deep unrolling remain future research directions.