Dictionary Learning Phase Retrieval from Noisy Diffraction Patterns

This paper proposes a novel algorithm for image phase retrieval, i.e., for recovering complex-valued images from the amplitudes of noisy linear combinations (often the Fourier transform) of the sought complex images. The algorithm is developed using the alternating projection framework and is aimed to obtain high performance for heavily noisy (Poissonian or Gaussian) observations. The estimation of the target images is reformulated as a sparse regression, often termed sparse coding, in the complex domain. This is accomplished by learning a complex domain dictionary from the data it represents via matrix factorization with sparsity constraints on the code (i.e., the regression coefficients). Our algorithm, termed dictionary learning phase retrieval (DLPR), jointly learns the referred to dictionary and reconstructs the unknown target image. The effectiveness of DLPR is illustrated through experiments conducted on complex images, simulated and real, where it shows noticeable advantages over the state-of-the-art competitors.


The Phase Retrieval Problem
Phase Retrieval (PR) is an important and challenging problem in many fields of science and technology. PR is a crucial step in most diffraction-or scattering-based physical measurement systems. In such systems, the signals being used, namely a coherent photon flux, laser, X-ray, or any other variants of the electromagnetic radiation, are diffracted through an object under examination. These diffracted signals encode the structural information of the object such as thickness, density, or refractive index, and are detected using a suitable sensor. Since the detectors sense the diffracted signals by converting the photon flux to electrons, the information related to the phase of the signal is not recorded. This is a major limitation as important structural information of the object are often coded mainly in the phase. PR aims at estimating the unknown phase from the intensity measurements. This is a challenging problem as there is no one-to-one relation between the phase and the measured intensity, which is proportional to the magnitude square of the field. This paper discusses the phase retrieval problem in an optical imaging scenario. We would like to remark that, however, despite being developed in a specific scenario, the proposed concept and the algorithm are easily adapted to other imaging fields characterized by similar mathematical observation models.
Hereafter, we consider that the power spectral density, i.e., the squared magnitude of the Fourier transform of the object wavefront, is the measurement provided by the sensor. This assumption is very reasonable and common in the literature [1][2][3]. It is based on the fact that the optical wavefront at the sensor plane in the far field (i.e., at a large enough distance from the imaging plane) is well approximated by the power spectral density of the wavefront at the object plane. This result can easily be derived from the Fraunhofer diffraction equation for coherent imaging systems [4,5]. Figure 1, courtesy of [2], schematizes a lensless optical imaging system. A planar wavefont produced by a laser beam (here, without loss of generality, we assume that the intensity of the laser beam is constant over the object) is transmitted through an object (whose image to be inferred) and propagates until it reaches the sensor. The 2D PR problem in the vectorized form is as follows: find x ∈ C n subject to where x is the complex-valued wavefront with n pixels at the object plane and A ∈ C n×n is an n × n matrix modeling the wavefront propagation from the object to the sensor plane, which, in the far field, is well approximated by Discrete Fourier transform (DFT) matrix F ∈ C n×n , i.e., A = F, | · | is the component-wise magnitude operator, and w accounts for measurement (or model) noise. The presence of the term |Ax| in Equation (1)
Error-reduction algorithms are iterative and guarantee to reduce error (R) in each iteration. The most popular family of PR algorithm, the Gerchberg-Saxton (GS) [31] and its variants [32,33], comes under this category. These are iterative algorithms based on alternating projections between the object plane and the diffraction (Fourier) plane. A basic GS algorithm has four simple steps: (1) forward projection, Fourier transform of the object wavefront; (2) imposing Fourier magnitude constraints at the Fourier plane, replace the modulus of the forward propagated wavefront with the measured intensity to form an estimate of the Fourier transform; (3) backward projection, inverse Fourier transform operation; and (4) imposing spatial-domain constraints at the object plane, modifying the backward projected wavefront by imposing support constraints in accordance with the prior information of the object.
Fienup, in his famous Hybrid Input-Output (HIO) algorithm [34], addressed the slow convergence of the GS algorithms by imposing additional spatial-domain corrections. Although GS, HIO and their variant algorithms are widely used in optical imaging, they suffer from stagnation at local minima due to the non-convex nature of the Fourier magnitude constraints.
An alternative class of methods to address PR is based on gradient searches. In this class, in each iteration, the wavefronts are updated in such a way that the partial derivative of the gradient of an error metric, which one seeks to minimize (e.g., R), is equated to zero. This line of attack is discussed in [34], where, although the convergence rate of a steepest descent-based [34] algorithm is low, its conjugate gradient-based version [35] is much faster.
The recent Wirtinger Flow (WF) algorithm [3] is an iterative complex domain gradient descent technique. Although the gradient operation is not well-defined for complex domain variables, WF adopts a surrogate derivative, termed the "Wirtinger derivative", and is characterized by special features, such as spectral initialization, non-trivial step-size parameter, etc. The truncated version of the WF algorithm, termed truncated Wirtinger Flow (TWF) algorithm [36], improves the initialization and descent procedures in an adaptive fashion by a statistically motivated regularizing technique, which filters out the terms bearing too much influence on the initial estimate or search directions. A surprising claim stated in [36] is that solving the quadratic Equation (1) is "nearly as easy as solving linear equations", which is a significant leap in many fields.
An alternative phase retrieval strategy has been proposed recently by exploiting the semi-definite relaxations. The set of quadratic equations represented by Equation (1) is rewritten as linear equations in a higher dimension space. The phaseLift [1] and PhaseCut [37] are two important algorithms in this category. In PhaseLift, the "lifting" is done by the variable transformation X := xx H , where H represents the Hermitian operator, which provides a convenient linear constraint in terms of the matrix variable X. The underlying rank minimization non-convex problem is relaxed to a convex one yielding a semidefinite program (SDP). Similarly, in PhaseCut, the complex variable x is separated into an amplitude component and a phase component, and only the phase component is "lifted" and then optimized via SDP. However, the matrix lifting in SDP-based approaches results in a higher-dimensional variable, which, in turn, makes the algorithm computationally demanding compared to the alternating projection approaches.
A computationally light PR algorithm based on convex relaxation that operates in the natural domain of the signal is proposed in [38]. In this work, a "complex polytope" of feasible solutions is considered by relaxing the quadratic equations of phaseless measurements to inequalities. The desired solution would be one of the extreme points of this polytope which is found using a convex program.
In recent decades, PR has been highly benefited from the advances in the field of compressed sensing and new imaging technologies. In line with the overview [39], we wish to mention two important approaches in phase retrieval which have been sprouted as a result of these advancements: (1) phase front modulation based approaches; and (2) sparsity-based approaches.

Phase Front Modulation
Phase front modifications [40,41] are imaging techniques in which a known phase modulation is intentionally introduced into the object field. These phase modifications are done through phase masks mounted on the wavefront propagation path and different masks lead to different diffraction patterns. An analogy from microscopy would be focusing one part of the sample to be examined and then repositioning the sample to exploit the spatial diversity. By shifting the phase mask or, equivalently, by using different masks, independent diffraction patterns, termed coded diffraction patterns (CDPs) [42], are generated from the same illumination area of the object. The CDP introduces over-determination in the amplitude attenuation and phase shift of the wave diffracted through the object. In addition, in a CDP, the recorded intensity at a sensor point has contributions from all points in the object. Spreading the information of one object point to all pixels of the sensor minimizes the sensor noise and also mitigates the stagnation in the retrieval process [43].
The setup shown in Figure 2 corresponds to a modification of that in Figure 1, where a phase mask is incorporated. The corresponding PR problem is where {A s } s=S s=1 ∈ C n×n are the wavefront propagation matrices given by A s = FM s . In our model, F ∈ C n×n denotes the DFT matrix and M s ∈ C n×n is a phase mask which is a diagonal matrix of complex exponents, i.e., M s = diag e jφ s 1 , e jφ s

Sparsity Meets Phase Retrieval
The concept of sparsity has received tremendous attention from various signal processing areas. Signals and images of the real world admit sparse representations when represented on suitable frames [44], which can be exploited to build low dimensional models. As in many areas, sparsity has become a hot topic in PR methods and algorithms. Two-stage Sparse Phase Retrieval (TSPR) [45] and Greedy Sparse Phase Retrieval (GESPAR) [46] are two important state-of-the-art algorithms that exploit sparsity. TSPR is a two-stage algorithm in which the support of the signal is estimated in the first stage and the signal is estimated in the second stage using a sparsity constrained PhaseLift framework along with the learned support. GESPAR uses an optimization-based greedy algorithm in which PR is reformulated as a sparsity-constrained least square problem.
Another important research line in the context of sparsity exploits the self-similarity [47][48][49] exhibited by the natural images, which has been extensively exploited via patch-based approaches in various fields of image processing [50][51][52]. Since the phase images are natural images, they often exhibit a high level of self-similarity. A number of phase imaging algorithms exploiting self-similarity has been recently introduced, namely, the Sparse Coding based Interferometric Phase estimation (SpInPhase) [53], the Mixture of Gaussian based Interferometric Phase estimation (MoGInPhase) [54] and the Complex domain Block-matching and 3D filtering (BM3D) based CBM3D [55]. The further development of BM3D for complex domain filtering can be seen in [56,57]. We remark that the BM3D is a state-of-the-art denoising strategy to tackle speckle noises in coherent imaging systems such as digital holography (DH) [58]. For instance, Bianco et al. [59] proposed a quasi noise-free DH reconstructions algorithm by combining the advantages of Multi-Look DH (MLDH) techniques and BM3D filtering.
Sparse Approximation of the Object Phase and Amplitude (SPAR) [2] is a recent PR algorithm, equipped with BM3D, that shows remarkable performance in highly noisy scenarios, compared to the state-of-the-art algorithms. SPAR is built on a classical GS framework with additional features of random phase modulation of the wavefront and self-similarity regularization of the phase and amplitude. The two stages of filtering, i.e., filtering the noisy (Poissonian) observations at the sensor plane and filtering the phase and the amplitude at the object plane, make this algorithm a step forward in noisy PR. SPAR exploits the self-similarity of complex domain patches by separately applying sparsity to the phase and amplitude components using BM3D. Since the phase is restored modulo 2π, the recovering of the absolute phase is carried out by including a phase unwrapping [60] step. This is a sensitive part of SPAR, as the phase unwrapping is known to be an NP-hard problem. Another downside of SPAR is its high computational complexity, compared with that of GS.

Proposed Algorithm and Contribution
Inspired by SPAR [2] and by SpInPhase [53], we propose a new algorithm, termed Dictionary Learning Phase Retrieval (DLPR), for PR from noisy diffraction pattern. DLPR is derived under a variational framework and is developed for both Poissonian and Gaussian observation models. We adopt the classic alternating projection based framework (GS [31]), and incorporate: (1) the wavefront modification detailed in Section 1.2.1; (2) filtering at the sensor plane for Poissonian (or Gaussian) observation; and (3) filtering at the object plane using dictionary-based sparse coding in the complex domain. We remark that, recently, a dictionary-based PR algorithm was proposed in [61], which makes use of a real-valued dictionary and deals with real-valued object, whereas DLPR deals with complex-valued dictionary and object, which is an essential requirement in a practical PR scenario that often involves complex-valued wavefronts. One of the remarkable advantages of sparse modeling in the complex domain is that it gets rid of the ambiguity due to phase wrapping, which makes it very robust to the heavily noisy observations, compared to the SPAR algorithm. The main contributions of the proposed work are summarized below: 1. A variational reformulation of the PR problem that incorporates a dictionary-based sparse regression in the complex domain 2. An algorithm that jointly retrieves phase and learns the dictionary yielding sparse representations (codes) for the complex domain patches of the object wavefront 3. An extension of the algorithm to a class-specific scenario, where the dictionary is learned from clean images of the same class The paper is organized as follows. In Section 2, the sparsity modeling for the complex domain wavefront is discussed. This section also introduces Poissonian and Gaussian observation models. The DLPR algorithm is derived in Section 3, by solving a step-by-step variational formulation for the forward propagation, the sensor plane filters for Poissonian and Gaussian observations, the backward propagation, and the sparse modeling at the object plane. In Section 4, an experimental study and characterization of the algorithm performance are provided including comparisons with relevant state-of-the-art algorithms.

Sparse Regression Based Wavefront Modeling
Let the vectorized complex domain image wavefront x ∈ C n be represented as where a ∈ R n + is the positive amplitude, ψ ∈ R n is the absolute phase of the object wavefront and the operation stands for the element-wise (Hadamard) multiplication. Herein, all functions applied to vectors are to be understood as component-wise. One important fact to be noted here is that the accessible phase of a complex wavefront, which is extracted from the 2π-periodic complex sinusoidal function, undergoes phase wrapping as defined below: where W is the wrapping operator that performs the 2π-modulo phase wrapping operation. In the ensuing text, we use the notation ψ 2π := W (ψ), and term ψ 2π as the interferometric phase. We remark that the interferometric phase may be directly obtained from the image x as ψ 2π := angle(x). The interferometric phase is a non-linear function of the absolute phase ψ and possess a pattern-like discontinuities, called interferometric fringe patterns, as illustrated in Figure 3. Regarding the sparse modeling of a set of images x ∈ C n , one can consider their amplitude and phase independently. However, from x, or an estimate of it, we only have access to the corresponding interferometric phase, which raises a number of issues on how to model that phase, since the sparse models are usually specified in terms of the absolute phase. A solution to address this issue is to perform phase unwrapping using a suitable algorithm (e.g., PUMA [60]), obtain estimates of the absolute phase and then perform sparse modeling on it, as done in [2]. However, the unwrapping algorithms are highly sensitive to noise and image discontinuities larger than π. In addition, in many practical applications, the phase and amplitude are strongly correlated and a decoupled sparsification of them is equivalent to assume that they are statistically independent, thus failing to exploit the correlation that might exist between them.
Here, we adopt the viewpoint introduced in [53] to introduce a new PR strategy in which sparse modeling of the wavefront is done in the complex domain. This new strategy has the following distinct advantages: (1) inherent exploitation of the phase-amplitude correlation, as they are treated together in the complex domain form of wavefront; and (2) increased robustness to the noise, as the phase unwrapping is not used.
Sparse and redundant representations are recent and active research topics in signal and image processing [44] and have wide applications in many areas, such as denoising, restoration, interpolation, compression, sampling, recognition, etc., to list only a few. The patch-based approaches are very popular in this context. Motivated by the recent phase imaging techniques [53][54][55]62], we stress that, if a and ψ are self-similar, then their complex version a e jψ should also be self-similar (see [53] for further details). This means that it is possible to find similar complex domain patches located at different parts of the wavefront, which opens the door to sparse representations, where each complex image patch is represented by a linear combination of a small number of vectors, often termed atoms, from a (possibly learned) dictionary. Herein, we adopt this perspective to the optical imaging scenario of interest. The assumption of self-similarity results naturally from the fact that the involved amplitude and phase images are from real-world objects.
As a natural follow up of the above discussion, we put forward a new idea for an optical wavefront modeling based on sparse regression using complex domain dictionaries. We follow a patch-based approach for the sparse wavefront modeling. Let R i ∈ {0, 1} w 2 ×n and x p i := R i x ∈ C w 2 denote, respectively, a matrix that extracts the i square patch of size w × w from the wavefront image x and the extracted patch of size w 2 . It is to be noted that the superscript p in x p i indicates that x p i is a patch and the same notation is used hereafter. We also remark that the rows of R i are a subset of the rows of the identity matrix of size n. The index i ∈ I p of a given patch refers to the wavefront image pixel corresponding to the top left-hand side of the patch. Using these definitions, the patch aggregation, i.e., patches to image transformation can be realized using the following equation: where ∑ i∈I p R H i x p i is the sum of all patches x p i put back into their original locations. We emphasize that the matrix ∑ i∈I p R H i R i is diagonal and its jth diagonal element indicates the number of times the pixel j appears in the set of extracted patches. The patches are extracted and later aggregated in an overlapping manner and we refer to [53] for more details on the decomposition and composition of the patches.
To better understand the sparse regression, at this point, we assume that a dictionary D ≡ [d 1 , d 2 , ..., d k ] ∈ C w 2 ×k for sparsely representing a complex patches x p i , i ∈ I p , is available. By sparse regression, we mean that any patch x p i , i ∈ I p , may be represented as a linear combination of few columns, often termed atoms, from D. Regarding the patch x p i , our sparse wavefront modeling is formally defined as follows: where α ∈ C k is the optimization code for x p i , δ ≥ 0 is a parameter controlling the reconstruction error, and ||α|| 0 denotes the 0 pseudo-norm of α, i.e., the number of nonzero elements of the vector α. The constrained optimization in Equation (7) has the following interpretation: since we are minimizing the zero norm of the code, the optimization aims to find an α with the fewest number of non-zero entries, keeping the reconstruction error ||x p i − Dα|| 2 2 below a pre-controlled value δ. We incorporate the sparse regression in Equation (7) in the phase retrieval algorithm to model the object wavefront and this, in turn, serves as a filter at the object plane of the optical setup. The algorithms for learning dictionary D and code α are discussed in Section 3.

Noisy Observation Modeling
The observation model adopted in many PR algorithms assumes a noiseless scenario. However, in practice, the measured intensities at the sensors are often corrupted by noise. Two common sources of noise are: (1) the random motion of the charge carriers such as thermal noise generated in the sensor, which is modeled by additive white Gaussian perturbations; and (2) photon-limited measurements, which are modeled by Poissonian perturbations. Thus, we develop the DLPR algorithm by considering either Poissonian or Gaussian noise observation models.

Poissonian Observation Model
The optical imaging systems usually capture the wavefront intensity as the counts of photons hitting a detector. There are many practical limitations that restrict us from using a high energy radiation with a sufficient number of photons. For instance, in medical X-ray imaging, a high energy radiation can cause damage to the specimen, which is often a human organ. In addition, an imaging system that can deploy only low exposure time or can use only a limited amount of light (e.g., night vision and astronomical imaging systems) results in a low number of photons. Phase imaging using such low energy radiation is termed as photon-limited phase imaging, which results in heavily noisy measurements. The widely used additive Gaussian noise model fails to accurately model such discrete observations and, instead, a Poisson distribution is used [63][64][65][66].
In the presence of Poissonian noise, we assume that the components z s [l], for l = 1, . . . , n and where P (θ) stands for Poisson distribution with parameter θ ≥ 0. Now, let us consider a photon-limited imaging scenario and use the scaling factor χ > 0 to scale the mean value of the photon count. We restate Equation (8) as where χ is an important parameter that controls the photon-count, which in turn controls the noise level. In a practical imaging system, χ is a function of various factors such as exposure time, sensor sensitivity, etc. Using the Poisson distribution formula, the model in Equation (9) can be rewritten as where k ∈ N 0 is the photon count at the sensor. To understand the effect of χ on the noise level, we define the signal-to-noise ratio (SNR) at the sensor as the ratio between the square of the mean and the variance (equal to mean for Poisson random variable) of z s [l], yielding Equation (11) shows that SNR grows linearly with χ. Equation (11) motivates us to write a global estimate of SNR, as

Gaussian Observation Model
We also consider the widely used Gaussian noise that models, e.g., the thermal noise at the sensor. In this case, the observation model is where is a set of independent and identically distributed (i.i.d.) random variables, and σ stands for the standard deviation of the noise.

Dictionary Learning Phase Retrieval (DLPR) Algorithm
We propose a novel idea, based on sparse modeling of the object wavefront using complex-valued dictionaries, which we term as DLPR, to address the PR problem discussed in Equation (3). DLPR is an alternating minimization algorithm in which the sparse modeling of the object wavefront improves the conditioning of the inverse problem and its robustness to noise. The algorithm is derived for both Poissonian and Gaussian observation models and is developed based on the optical setup, as discussed in Figure 2.

DLPR for Poissonian Observation Model
Referring to the Poissonian observation model in Equation (10), we state our PR problem as the estimation of x ∈ C n from the noisy observations {z s } s=S s=1 . The corresponding negative log-likelihood (after neglecting the constant terms) is or equivalently s.t.: u s = A s x, s = 1, . . . , S.
A likelihood formulation similar to Equation (15) is used in [3,36] in which the focus is to maximize L({u s }, x) by computing ∂L({u s }, x)/∂x using Wirtinger derivatives [67] in an iterative manner. Here, however, we take a different approach by converting the hard constraint into a quadratic penalty and including a dictionary-based sparse regression term, which promotes image self-similarity. The DLPR variational formulation is as follows: where C is a convex set to be defined later, {u s } := {u s } s=S s=1 , and {α i } := {α i } i∈I p . The parameter τ a ≥ 0 controls the level of sparsity and γ, β > 0 control the weights of the quadratic penalties. Let us introduce the notation and use it to rewrite Equation (17) as The objective function in Equation (20) is non-convex with respect to the variables ({u s }, x, {α i }, D). We address this issue by alternating minimization methods, which is a common approach to tackle this sort of non-convexity [68]. As per this strategy, the likelihood L is optimized by considering one variable at a time, treating the others as constants. Below, we derive the optimization with respect to each variable. For each optimization, we rewrite the objective function by considering the terms depending on the optimization variable and disregarding the other terms.
Given that Equation (21) is decoupled with respect to u s [l], for s = 1, . . . , S and l = 1, . . . , n, we have is the proximity operator [69] of the function g s γ/2 computed in [63] and given by (see [70] for further details) For large values of the scale factor χ, we have In addition, from Equation (10), it follows that a very high value of the scaling factor χ models the noiseless observations where the Poissonian model can be replaced with a deterministic model with probability 1. In this case, we can write Using (24) and (25), we rewrite (22) to obtain the amplitude update formula, for noiseless observation, as follows: 3.1.2. Problem 2: Optimization with Respect to x x = arg min The minimum condition on Equation (27), obtained by equating the Wirtinger derivative for a complex domain function to zero, i.e., ∂(.) ∂x = 0, leads to the following least-square equation: and to the solution: We remark that: (i) the matrix ∑ i∈I p R H i R i is diagonal and the jth diagonal element (e.g., µ j ) represents the number of times the pixel j appears in the set of extracted patches; (ii) for the propagation matrix A s = FM s (see description of Equation (3)), A H s A s = I n , where I n is the identity matrix of order n × n; and (iii) the vector ∑ i∈I p R H i Dα i is the sum of the reconstructed patches Dα i put back into their original locations. From Remarks (i) and (ii), it can be shown that the matrix to invert in Equation (29) is diagonal with its diagonal entries given by S/βγ + µ j , j = 1, 2, ..., n. Thus, the inversion of the large matrix can easily be precomputed.

Problem 3: Optimization with Respect to {α i }
Given that the optimization with respect to {α i } is decoupled with respect to α i , for i ∈ I p , the optimization with respect to {α i } amounts to compute Optimization in Equation (30) is NP-hard due to the presence of the non-convex 0 -norm [71]. There are two common approaches to tackle this issue: (1) Apply a convex relaxation in (30) by replacing the 0 -norm with a convex surrogate, often the 1 -norm, as seen in LASSO [72], BPDN [73], etc. (2) Find an approximate solution to the non-convex problem in (30) using greedy procedures such as orthogonal matching pursuit (OMP) [74], Hard Thresholding Pursuit [75], approximate message passing (MP) [76], iterative hard thresholding (IHT) [77], etc. Among the greedy algorithms, OMP is considered to be computationally light. In addition, empirically, we have observed slight advantages for OMP compared to BPDN (see [53,78]). Hence, in our work, we choose the OMP algorithm [74] tailored to the complex domain.
Solving Equation (30) is equivalent to solving α i = argmin α ||α|| 0 s.t. : ||R i x − Dα|| 2 2 ≤ δ, as discussed in Equation (7). It is to be noted that in this equivalent version, the reconstruction error ||R i x − Dα|| 2 2 is kept below a pre-controlled value using a new parameter δ (τ a and β do not play any role here). To fix the value of δ, the residual R i x − Dα is assumed to be a complex Gaussian vector [79]. Then, assuming the standard deviation σ x of the complex-valued noise in x is known, ||R i x − Dα|| 2 2 /(σ 2 x /2) follows a chi-squared distribution χ 2 (2w 2 ) with 2w 2 degrees of freedom. Following the strategy proposed in [80], δ is found as δ = (σ 2 x /2)F −1 χ 2 (2w 2 ) (µ), where F −1 χ 2 (2w 2 ) (.) is the inverse cumulative distribution function of χ 2 (2w 2 ). All the results shown in Section 4 are obtained by setting µ = 0.96, which we empirically observed to yield optimal results. In addition, σ x is estimated from x using the first order horizontal and vertical differences.
The pseudo code for Complex domain OMP, courtesy of SpInPhase [53], is shown in Algorithm 1.

Algorithm 1: Orthogonal Matching Pursuit (OMP).
Input : D ∈ C w 2 ×k (Dictionary from C-ODL, Algorithm 2), x p ∈ C w 2 , (input patch) δ > 0 (error tolerance) Output : α ∈ C In Algorithm 1, D Q is a matrix holding the atoms of D indexed by Q and D # Q is the pseudo-inverse of D Q . In addition, α (Q) represents the components of α with indexes in Q.

Problem 4: Optimization with Respect to D
If the dictionary D is known beforehand, the optimization with respect to D is disregarded. However, if D is unknown beforehand, we should solve the quadratically constrained quadratic program (QCQP) where the convex set C is introduced to prevent the dictionary atoms d j k j=1 being arbitrarily large, which in turn leads to arbitrarily small values of the codes α i . We remark that the presence of the 0 terms ∑ i∈I p α i 0 in the objective function L promotes sparse codes and, therefore, as desired, dictionaries are able to sparsely represent the restored patches. In a large number of experiments, we have observed, however, that the quality of the dictionary, regarding its ability to produce sparse codes, improves if we include an 1 norm in the optimization of Equation (31) and solve simultaneously with respect to D and α, that is, if we solve, s.t.: D ∈ C.
It is to be noted that the optimization of 1 -norm results in better dictionaries. It may be linked with difficulties in obtaining exact solutions based on the non-convex 0 -regularizes. These findings are in line with those observed in [53,78].
In Equation (33), the joint optimization with respect to D and α is not convex, although it is convex with respect to each variable alone. This naturally invokes the alternating minimization methodology, in which one variable is minimized at a time keeping the other one fixed, in an iterative manner.
The real domain version of the optimization problem in Equation (33) was solved in Online Dictionary Learning (ODL) for Sparse Coding [78] using the alternating minimization framework. More recently, its complex domain version was proposed for interferometric phase image estimation [53] with slight modifications, which we term as C-ODL. Both ODL and C-ODL uses BPDN for sparse coding, i.e., the optimization with respect to α, and projected block-coordinate descent method to update the columns of the dictionary. The BPDN problem in [78] is solved using Least Angle Regression (LARS) [81], whereas in [53] a much faster algorithm called sparse regression by variable splitting and augmented Lagrangian (SpaRSAL) is introduced for that purpose. We adopt the dictionary learning methodology proposed in C-ODL, whose pseudo code, courtesy of SpInPhase [53], is shown in Algorithm 2.

DLPR for the Gaussian Observation Model
We now derive DLPR for the Gaussian noise as detailed in Equation (13). Then, the likelihood function in Equation (17) can be rewritten as Comparing Equations (17) and (34) We remark here that, as the value of γ changes from 0 to ∞, the solution of Equation (35) changes from |v s [l]| to z s [l].
Pseudo code for the proposed DLPR algorithm, which combines the results derived in Sections 3.1 and 3.2 for forward propagation, sensor plane filtering, backward propagation, and object plane sparsification is given in Algorithm 3. A schematic representation of the DLPR algorithm in the form of a block diagram is shown in Figure 4. To better understand the signal flow of DLPR in relation with the optical setup (Section 2), the block diagram is partitioned into three regions. (i) Object plane: All the operations related to the wavefront sparse modeling, namely patch formation, dictionary learning, OMP, sparse regression and patch aggregation are part of this region. These blocks perform the operation given by

Experiments and Results
In this section, we present a series of strong empirical evidence, using semi-real and synthetic data, to illustrate the effectiveness of DLPR. Prior to the results, the details regarding the experimental setup are described below: Optical setup: We restrict our optical setup to the lensless imaging scenario, as illustrated in Figure 2, which uses coded diffraction patterns for optical imaging. To implement the wavefront modulation, in alignment with the works presented in [2,3,36], a random phase value sampled from the set {0, π/2, −π/2, π} is selected with equal probability for each pixel of the mask.
Algorithms for performance comparison: The proposed DLPR algorithm is designed mainly for moderate and highly noisy scenarios. The recent TWF algorithm [36], designed for noiseless or small level noisy data, is chosen for comparison purposes. To the best of our knowledge, SPAR [2] is the state-of-the-art for retrieving phase from the noisy observations and is a good candidate for the comparisons. We also consider a third candidate, which we term as GS-F algorithm, by skipping the sparse modeling of the wavefront at the object plane, i.e., by making the substitutionx t =x t−1/2 , in Algorithm 3. The main rationale behind GS-F is that it helps to visualize how much improvement, in terms of performance, is contributed by the proposed sparse wavefront modeling. The performance of the conventional GS algorithm is expected to lag behind GS-F in noisy observation scenarios, as the GS lacks sensor plane filtering.
Although we consider SPAR, TWF, and GS-F for comparison purposes, the main competitor of DLPR is SPAR as it includes sparsity-based object wavefront filtering aiming at good performance for noisy data. Since the SPAR algorithm applies sparsity on absolute phase, phase unwrapping has to be implemented to obtain the absolute phase in each iteration, which sets a bottleneck to the performance of SPAR, especially for highly noisy data. We make use of the publicly available MATLAB demo-codes (http://www.cs.tut.fi/sgn/imaging/sparse/) for SPAR and TWF.
DLPR parameter settings: Parameter settings are crucial in DLPR's performance. The parameters of Algorithm 3 are heuristically set to the following values: γ = 1/χ, β = χ/1000 for Poissonian observations and γ = σ 2 /10, β = 0.01/σ 2 for Gaussian observations. For the wavefront modulation, SPAR [2] suggests a value of S ≥ 10. Without loss of generality, in alignment with the suggestions provided in [2], all our experiments are conducted by keeping S = 12 , i.e., for 12 observations with different phase modulation masks. The patches used in all experiments are square, having dimension 10 × 10, extracted with unit stride. The dictionary learning (C-ODL, Algorithm 2) parameters are tuned to the following values for the optimal results: T = 30, η = 64, and λ = 0.11 (see [53] for more details ). As justified in Section 3.1.3, the tolerance parameter δ of OMP (Algorithm 1) is set to We also remark that the algorithms chosen for comparison, i.e., SPAR, GS-F and TWF, are tuned to their optimal performances by using the parameter settings given in [2].
Performance evaluation: Since the phase is the main focus of a typical PR problem, the performance of DLPR is evaluated based on the quality of the retrieved phase ψ 2π using Root Mean Square Error (RMSE) defined as where W is the wrapping operator defined in Equation (5) and ψ 2π is the estimate of the true wrapped phase ψ 2π . Synthetic dataset: In an optical wavefront x = a e jψ ∈ C n , the vectorized phase image ψ ∈ R n and amplitude image a ∈ R n + can be quite correlated. To simulate various amplitude-phase relationships, we create different groups of complex signals in which the phases and amplitudes are related in different ways. We introduce the following formula to relate phase and amplitude in different ways: where k 0 and k 1 are parameters used to control the level of amplitude and the function f (.) is used to control the correlation between phase and amplitude. Based on Equation (37), we define the following groups of complex signals: 1. Group 1: Invariant amplitude, i.e., a i = 1. Using the above definitions for the image amplitudes and the simulated phase surfaces as shown in Figure 5, nine different complex-valued synthetic dataset are generated, which are summarised in Table 1.  Less similar Shear Plane

Experiments Using Synthetic Dataset
In this section, the experiments conducted using the dataset given in Table 1 are presented. DLPR is tested for each group in the Table 1 by considering noise power ranging from high to medium level, i.e., χ = [0.00001, 0.0001, 0.001, 0.01]. SNR values corresponding to these noise levels are. respectively. [−7, 3, 13, 23] dB. RMSE values of the retrieved phase, averaged for each group, are given in Figure 6. The performances of SPAR, TWF and GS-F are also included in the same figure. It is evident in the figure that the DLPR beats TWF and GS-F with a good margin. In comparison with SPAR, DLPR performs better for heavy noisy data (χ = 0.00001). This observation is in clear alignment with our early intuition about DLPR. On the other hand, for a medium level of noise, DLPR and SPAR give very close results. It should be noted that the low-noise scenario is not considered here as the objective of our algorithm is to retrieve phase from highly noisy observations. However, we remark that, when the observation mechanism is noise free or with very low level of noise, all four algorithms show similar performances.

Phase Unwrapping
In a practical PR scenario, the phase obtained from retrieved complex wavefront is wrapped. The absolute phase is obtained by phase unwrapping in which the quality of the retrieved wrapped phase is a crucial factor. The following experiment is a qualitative illustration to show that the high quality of the phase images retrieved through DLPR underlies the good results during the unwrapping step. Here, the phase retrieved from heavily noisy observations are unwrapped using PUMA algorithm [60] (state-of-the-art in phase unwrapping). It is evident in Figure 7a,b that TWF and GS-F provide very poor estimates. Compared to these two algorithms, DLPR and SPAR perform better. For shear plane Figure 7b, DLPR estimate is slightly better compared to SPAR. However, for truncated Gaussian Figure 7a, DLPR is much better than SPAR.

Experiments Using Real MRI Interferograms
Experiments using real MRI interferograms are included in this section. The MRI interferograms (the work was carried out on a 1.5 T GE Signa clinical scanner operating within Western General Hospital (WGH), University of Edinburgh [83]) used in this section are obtained by scanning the human head region along side, top and front orientations, as shown in Figure 8. The complex-valued test images are generated by using each of these interferograms (Figure 8a-c) as the clean phase (ψ 2π ) with the amplitude set to unity. These complex-valued images are self-similar, and thus its patches are well approximated by sparse representations over a learned dictionary. Hence, we emphasize that although an optical set-up is being considered in our discussion, these MRI interferograms are relevant test images, which we call as semi-real data. In comparison to the synthetic data used in the previous experiments, these real interferograms do not have a well-defined smooth structure and are highly challenging data for a PR experiment. Despite these challenges, DLPR remains to be a strong candidate proving its ability to retrieve real interferograms. The estimates for a highly noisy observation (χ = 0.00001) are shown in Figure 9. It can be qualitatively examined that DLPR estimates are much better in preserving the sharp details of the interferograms. This is clearly supported by the RMSE values. The results for other noise levels are shown in Figure 10, in the form of a graph, from which we can arrive at the same conclusion. These experiments show the strong ability of DLPR to retrieve real phase data.

Gaussian Observation
In this section, PR is conducted by considering a Gaussian observation model. Very low SNR values, i.e., SNR ∈ {1, 3, 7, 10} dB is considered corresponding to highly to moderate noisy scenarios. In this section, we are not repeating all the experiments conducted for Poissonian case. However, the toughest experiments with real MRI interferograms are repeated. The results are shown in Figure 11, which indicate that DLPR performs exceptionally well and beats its competitors with a good margin for all noise levels.

Prior-Plugged DLPR for Class-Specific Phase Retrieval
DLPR Algorithm 3 learns the dictionary iteratively and retrieves the phase from observed intensities. However, in many practical applications, the phase to be recovered is known to belong to certain classes of images. In such applications, the clean images from the specific classes can be used to learn prior information and to enhance the PR process. Most of the PR methods that we discussed do not have the algorithmic structure to incorporate such priors. However, DLPR, being a dictionary-based method, opens the door to the exploitation of the learned prior. Algorithm 4, which we term as prior-plugged DLPR, is a modified version of the original DLPR Algorithm 3, obtained by skipping the online dictionary learning step and including the learned dictionary D l (prior) as input.
In this section, the class-specific phase retrieval using prior-plugged DLPR is illustrated using two experiments, the first one using simulated data and the second one using semi-real data. In the first experiment, to learn the prior (dictionary), a Gaussian surface and its four quarters, as shown in Figure 12a, is considered. The C-ODL Algorithm 2 is used to learn the dictionary. PR is performed on a truncated Gaussian surface created by cutting off the alternate octants of a Gaussian surface, as shown in Figure 12b. We remark that the retrieval of this surface is quite challenging as it contains less connected areas and has a lot of sharp discontinuities. Although the prior and the testing surface are different, they belong to the same class of truncated Gaussians and share many identical regions. Figure 13a shows the results for prior-plugged DLPR in comparison with GS-F, SPAR, and DLPR using very high Poissonian noise (χ = 0.00001). It is quite evident that the prior-plugged DLPR shows remarkable improvement by exploiting the learned prior. In comparison with the original DLPR, which does not use the prior, the prior-plugged DLPR has an improvement in RMSE by 0.41 − 0.20 = 0.21. The results for similar experiments by considering other noise levels are plotted in Figure 13b in which prior-plugged DLPR beats all other methods, especially for the highly noisy cases.
Input : z s ∈ R n (noisy observation) Init : x 0 ∈ C n (object wavefront), D l ∈ C w 2 ×k (learned dictionary), α 0 ∈ C k (code), γ, β > 0(regularization parameter), S ∈ N (Number of observations), T ∈ N (number of iteration) Output : x T , α T 1 for t ≤ T do  (23) and (35) for Poissonian and Gaussian observations respectively 4 Backward Propagation: Sparse wavefront modeling at the object plane: Patch to image formation:  The remarkable improvement in RMSE in the above experiment is due to the fact that both the prior-learning surface and the phase to be retrieved share many identical regions. However, in practice, this may not be true and we only have access to data from the same class. The phase to be retrieved need not possess regions exactly identical with the training images. To account this factor, we present a second experiment using real MRI interferograms. The details of the dataset (the work was carried out on a 1.5 T GE Signa clinical scanner operating within Western General Hospital (WGH), University of Edinburgh [83] ) is shown in Figure 14. In this experiment, the MRI phase images obtained from four different persons are used. These images are the interferograms of the head region obtained through the scanning along the front, side and top orientations. In the following experiment, a particular scanning orientation, e.g. front view, is taken as a specific class. The dictionary is learned from the front view interferograms of Persons 1-3. This dictionary is used as the prior for retrieving the interferograms of the fourth person along the front view. In the PR experiment, a very high Poissonian noise corresponding to χ = 0.00001 (SNR = −7 dB) is considered. The experiment is repeated for other two scanning orientations, i.e., side view and top view. The result in terms of RMSE is given in Table 2. In the table, it is evident that prior-plugged DLPR is much better compared to SPAR, GS-F, and TWF and yields slightly better results compared to normal DLPR. We emphasize that in prior-plugged DLPR, the fourth person's MRI phase image is retrieved just by using the dictionary learned from the scan images of the other three persons and. still, the obtained results are better than the normal DLPR. This proves that prior-plugged DLPR is a powerful tool for class-specific phase retrieval applications. Table 2. Performance evaluation for class-specific phase retrieval using MRI interferograms. The best values are shown in bold font.

Complexity of DLPR
The complexity of DLPR is mainly defined by the dictionary learning and OMP steps. A theoretical analysis of this is beyond the scope of the work presented. However, we characterize the complexity by the time required to retrieve the phase. All the results presented in this paper are obtained through 20 iterations of DLPR and 50 iterations (as suggested by SPAR) of SPAR, GS-F, and TWF. We use MATLAB R2015b and computer with the processor Intel(R) Core(TM) i7-4790 CPU @3.60 GHz, 16 GB RAM. With the above-mentioned number of iterations using an image of size 100 × 100 under Poissonian noise (χ = 0.00001), the time required for phase retrieval is as follows: DLPR 77.6 s, prior-plugged DLPR 23.6 s, SPAR 15 s, GS-F 0.43 s and TWF 0.89 s. It is to be noted that, in terms of time of convergence, DLPR lags behind SPAR due to its dictionary learning and sparse coding sub-iterations that are computationally demanding.

Conclusions
Phase Retrieval from noisy intensity observations is addressed in this paper. A noisy optical system with wavefront modification is adopted for the discussion. The conventional PR algorithms fail to retrieve good quality phase images from noisy observations. To address this issue, a novel algorithm, termed as DLPR, has been proposed in this paper. DLPR is an iterative algorithm developed under the alternating minimization framework and incorporates specially designed filters both at the sensor and the object plane of the optical imaging system. At the object plane, a sparse wavefront modeling using complex-valued dictionary is done to obtain noise immunity. A second filtering mechanism is applied at the sensor plane aiming at additional noise suppression and is designed for Poissonian and for Gaussian observations. All the filters and optic propagation models are designed in a variational framework that maximizes the likelihood function and is aimed to iterate towards statistically optimal estimates.
The performance of the proposed algorithm was tested using challenging simulated and semi-real datasets. Three PR algorithms, namely SPAR, TWF, and GS-F, were used for performance comparisons; among these, SPAR is a BM3D-based state-of-the-art in Phase Retrieval. DLPR shows good improvements over the conventional GS family and the recent TWF algorithm for highly noisy observations and its performance is quite competitive to SPAR. It is shown that DLPR beats SPAR for highly noisy observations. DLPR also has an added advantage of prior exploitation in class-specific phase retrieval over SPAR, which was empirically demonstrated. As a future work, we suggest further research on dictionary-based phase retrieval to improve the convergence speed of the algorithm by exploiting fast dictionary and sparse coding techniques.