Nearly Exact Discrepancy Principle for Low-Count Poisson Image Restoration

The effectiveness of variational methods for restoring images corrupted by Poisson noise strongly depends on the suitable selection of the regularization parameter balancing the effect of the regularization term(s) and the generalized Kullback–Leibler divergence data term. One of the approaches still commonly used today for choosing the parameter is the discrepancy principle proposed by Zanella et al. in a seminal work. It relies on imposing a value of the data term approximately equal to its expected value and works well for mid- and high-count Poisson noise corruptions. However, the series truncation approximation used in the theoretical derivation of the expected value leads to poor performance for low-count Poisson noise. In this paper, we highlight the theoretical limits of the approach and then propose a nearly exact version of it based on Monte Carlo simulation and weighted least-squares fitting. Several numerical experiments are presented, demonstrating that in the low-count Poisson regime, the proposed modified, nearly exact discrepancy principle performs far better than the original, approximated one by Zanella et al., whereas it works similarly or slightly better in the mid- and high-count regimes.


Introduction
The image restoration problem under Poisson noise corruption is a task that has been extensively addressed in the literature as it arises in many real-world applications, where the acquired image is obtained by counting the particles, e.g., photons, hitting the image domain [1]. The typical image formation model under blur and Poisson noise corruption takes the form y = Poiss(Hx̄ + b), (1) where H ∈ R m×n models the blur operator, which we assume to be known; y ∈ N m and x̄ ∈ R n are the observed m 1 × m 2 and unknown n 1 × n 2 images in column-major vectorized form (with m = m 1 m 2 and n = n 1 n 2), respectively; b ∈ R m is a non-negative background emission; and Poiss(λ) := (Poiss(λ 1), . . . , Poiss(λ m)) T, with Poiss(λ i) indicating the realization of a Poisson-distributed random variable of parameter (mean) λ i. When tackling the recovery of x̄ starting from y, one also has to consider the intrinsic constraint x̄ ∈ Ω := {x ∈ R n : x ≥ 0}, (2) which accounts for the pixel values being non-negative.
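To make the forward model (1) concrete, the following Python sketch simulates the acquisition on a toy image: a small Gaussian PSF stands in for the blur operator H (applied with periodic wrap via the FFT), a constant background b is added, and Poisson counts are drawn. All names and parameter values here are illustrative, not those of the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_psf(n, band=5, sigma=1.0):
    """band x band Gaussian kernel embedded in an n x n array and rolled so its
    centre sits at (0, 0): FFT convolution then implements periodic wrap."""
    ax = np.arange(band) - band // 2
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    k /= k.sum()
    psf = np.zeros((n, n))
    psf[:band, :band] = k
    return np.roll(psf, (-(band // 2), -(band // 2)), axis=(0, 1))

def simulate_acquisition(x_bar, kappa=5.0, b=2e-3, band=5, sigma=1.0):
    """y = Poiss(H x_bar + b): scale by the photon count kappa, blur, add the
    constant background b, then draw Poisson counts pixel by pixel."""
    H_hat = np.fft.fft2(gaussian_psf(x_bar.shape[0], band, sigma))
    lam = np.real(np.fft.ifft2(H_hat * np.fft.fft2(kappa * x_bar))) + b
    return rng.poisson(np.maximum(lam, 0.0)), lam

x_bar = np.zeros((32, 32))
x_bar[8:24, 8:24] = 1.0          # toy square phantom
y, lam = simulate_acquisition(x_bar)
```

Since the kernel is normalized, the total expected count is κ Σ x̄ + b n, which makes the role of the scaling factor κ in the low-/high-count regimes explicit.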
From a probabilistic perspective [2], problems (1) and (2) can be addressed by modeling the unknown x as a random variable. In general, the information on the degradation process is encoded in the so-called likelihood probability density function (pdf) p(y | x), while the prior beliefs on the unknown x are expressed by the prior pdf p(x). In the Bayesian framework, one aims to recover the posterior pdf, which is related to the likelihood and the prior term via the Bayes formula: p(x | y) ∝ P(y | x) p(x),
with P denoting the probability mass function (pmf) that replaces the continuous pdf p to account for the discrete nature of the data y.
According to the Maximum A Posteriori (MAP) estimation approach, the mode of p(x | y) can be selected as a single-point representative of the posterior distribution, so that the original problem (1)-(2) turns into: x̂ ∈ arg max_x p(x | y) = arg min_x { −ln P(y | x) − ln p(x) }. (3) In light of the constraint expressed in (2), the general form of the prior pdf reads p(x) ∝ χ_Ω(x) p 0 (x), (4) with χ_Ω denoting the characteristic function of the set Ω and p 0 (x) encoding other information possibly available on x. A typical choice for p 0 (x) is given by the Total Variation (TV) Gibbs prior-see [3]-which reads p 0 (x) = (1/Z) exp(−α TV(x)), where Z > 0 is a normalization constant, α > 0 is the prior parameter, and D := (D h T, D v T) T ∈ R 2n×n denotes the discrete gradient operator entering the TV term, with D h, D v ∈ R n×n two linear operators representing the finite difference discretizations of the first-order partial derivatives of the image x in the horizontal and vertical direction. The negative logarithm of the prior pdf p(x) thus reads −ln p(x) = α TV(x) + ι Ω (x) + ln Z, with ι Ω (·) denoting the indicator function of set Ω, which is equal to 0 if x ∈ Ω, or +∞ otherwise.
Concerning the likelihood pdf, first, we notice that the forward model (1) can be usefully rewritten in component-wise (pixel-wise) form as follows: y i = Poiss(λ i), λ i = H i x̄ + b i, i = 1, . . . , m, with y i ∈ N, λ i, b i ∈ R +, and where H i ∈ R 1×n denotes the i-th row of matrix H. Upon the assumption of independence of the Poisson noise realizations at different pixels, we have: P(y | x) = ∏_{i=1}^m P(y i | λ i), (5) where P(y i | λ i), which denotes the probability for y i to be the realization of a Poisson-distributed random variable with mean λ i, reads P(y i | λ i) = λ i^{y i} e^{−λ i} / y i !, y i ∈ N, λ i ∈ R +, with R + denoting the set of non-negative real numbers. Hence, the associated negative log-pmf reads −ln P(y i | λ i) = λ i − y i ln λ i + ln(y i !). (6)
By plugging (6) into (5), the negative log-likelihood takes the following form: −ln P(y | x) = ∑_{i=1}^m ( λ i − y i ln λ i + ln(y i !) ). (7) Finally, plugging (4) and (7) into (3), dropping the constant term ln Z in (4), readjusting the constant terms in (7) (by adding y i ln y i − y i − ln(y i !) to each term in the sum), and then dividing the cost function by the positive scalar α, we obtain the so-called TV-KL variational model: x̂(µ) ∈ arg min_{x ∈ Ω} { TV(x) + µ KL(Hx + b; y) }, (8) where µ := 1/α, the TV semi-norm term [4] is defined by TV(x) := ∑_{i=1}^n ‖((D h x) i, (D v x) i)‖ 2, and the term KL(λ; y) indicates the generalized Kullback-Leibler (KL) divergence between λ = Hx + b and the observation y, which reads KL(λ; y) = ∑_{i=1}^m F(λ i ; y i), with F(λ; y) := λ − y ln λ + y ln y − y. (9)
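As a minimal illustration, the generalized KL divergence of (9) can be evaluated as below, with the usual convention y ln y = 0 for y = 0; the function name is ours.

```python
import numpy as np

def gen_kl(lam, y):
    """Generalized Kullback-Leibler divergence KL(lam; y) of (9):
    sum_i  lam_i - y_i*ln(lam_i) + y_i*ln(y_i) - y_i,
    with the convention y*ln(y) = 0 for y = 0 and lam_i > 0 assumed."""
    lam = np.asarray(lam, dtype=float)
    y = np.asarray(y, dtype=float)
    # guard the inner log so pixels with y = 0 contribute 0, not nan
    ylogy = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0)), 0.0)
    return np.sum(lam - y * np.log(lam) + ylogy - y)
```

By construction, KL(y; y) = 0 and KL(λ; y) ≥ 0, which is why KL acts as a data fidelity term in (8).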
Note that the adoption of the MAP strategy within a probabilistic framework yields a minimization problem, which is typically addressed in the context of variational methods for image restoration. The TV term and the KL divergence play the role of the regularization and data fidelity term, respectively. Moreover, the parameter µ, which has been defined starting from the prior parameter α, is the so-called regularization parameter balancing the action of regularization and data fidelity terms.
The selection of a suitable value for the regularization parameter µ is of crucial importance for obtaining high-quality results. This relation is highlighted by the explicit dependence in (8) of the solution x̂ on the parameter µ. Very often, µ is chosen empirically by brute-force optimization with respect to some visual quality metric. However, for Poisson data, a large amount of literature has been devoted to the analysis of the Discrepancy Principle (DP), which can be formulated in general terms as follows [5]: select µ = µ* such that D(µ*; y) = ∆, (10) where the last equality and the scalar ∆ ∈ R + in (10) are commonly referred to as the discrepancy equation and the discrepancy value, respectively, while the discrepancy function D( · ; y) : R + → R + is defined by D(µ; y) := KL(λ(µ); y) = ∑_{i=1}^m F(λ i (µ); y i), (11) with function F defined in (9) and λ(µ) := H x̂(µ) + b. (12) The DP in (10)-(12) formalizes a quite simple idea: choose the value µ* of the regularization parameter µ in the TV-KL model (8) such that the value of the KL data fidelity term associated with the solution x̂(µ*) is equal to a prescribed discrepancy value ∆. However, applying the DP in an effective manner in practice is not straightforward, as several issues arise concerning the computational efficiency and, more importantly, the quality of the output solutions. We examine both of them more closely.
(I1) Computational efficiency. The solution function x̂(µ) of model (8) does not admit a closed-form expression, and iterative solvers must be used to compute the restored image x̂ associated with any µ. Hence, selecting µ* by solving the scalar discrepancy equation defined in (10)-(12) as an efficient preliminary step and then computing the sought restored image x̂(µ*) by iteratively solving model (8) only once is not feasible. (I2) Quality of solution(s). Even if an efficient algorithm is used for the computation, the obtained restored image x̂(µ*) may be of such low quality that it is of no practical use if the discrepancy value ∆ in (10) is not suitably chosen.
Issue (I1) concerning computational efficiency has been successfully addressed in [6], where the authors propose to automatically update µ along the iterations of the minimization algorithm used for solving the TV-KL model so as to satisfy (at convergence) a specific version of the general DP defined in (10)- (12).
Concerning (I2), we highlight that, in the theoretical hypothesis that the target image x̄ is known, so that λ̄ = Hx̄ + b is also known, one would select µ* such that the value of the KL fidelity term associated with the solution x̂(µ*) is equal to the value of the KL fidelity term associated with x̄. This clearly does not guarantee that the obtained solution x̂(µ*) coincides with the target image x̄. However, by constraining x̂(µ) to belong to the unique level set of the (convex) KL fidelity term containing x̄, this abstract strategy, which we refer to as the Theoretical DP (TDP), represents an oracle for the general DP in (10)-(12). The TDP is thus formulated as follows: select µ = µ* such that D(µ*; y) = ∆ (T) := KL(λ̄; y) = ∑_{i=1}^m F(λ̄ i ; y i), (13) with function F defined in (9). Clearly, the value ∆ (T) cannot be computed in practice as the original image x̄ is not available. As in the case of the Morozov discrepancy principle for Gaussian noise, one could replace the scalar ∆ (T) with the expected value of the KL fidelity term in (9) regarded as a function of the m-variate random variable Y. We will refer to this version of the DP as Exact (or Expected value) DP (EDP). In formula: select µ = µ* such that D(µ*; y) = ∆ (E) (µ*), with ∆ (E) (µ) := ∑_{i=1}^m E_{Y i}[F(λ i (µ); Y i)], (14) where E_{Y i}[F(λ i (µ); Y i)] denotes the expected value of F(λ i (µ); Y i) regarded as a function of the Poisson-distributed random variable Y i. Nonetheless, unlike the Gaussian noise case, the discrepancy value is not a constant but a function ∆ (E) (µ) of the regularization parameter µ, and deriving its analytic expression is a very challenging task. A popular and widespread strategy, originally proposed in [5] for denoising purposes and extended in [7] to the image restoration task, replaces the exact expected value function ∆ (E) (µ) with a constant approximation coming from truncating its Taylor series expansion. We will refer to this version of the DP as Approximate DP (ADP).
It reads: select µ = µ* such that D(µ*; y) = ∆ (A) := m δ (A), with δ (A) = 1/2. (15) Despite its extensive use due to the good performance achieved in the mid- and high-count regimes, the ADP (15) is known to return poor-quality results in the low-count Poisson regime [8], i.e., when the number of photons hitting the image domain is small. In fact, in [7], where the ADP was first extended to the image deblurring task, the authors state (in Remark 3) that the choice of the constant value δ (A) = 1/2 in (15) may not be "optimal" and suggest replacing it with 1/2 + ε, where ε is a small positive or negative real number. As a preliminary, qualitative proof of the possible poor performance of (15) in the low-count regime, in the first column of Figure 1, we show the two test images phantom and cameraman, which have been corrupted by blur and heavy Poisson noise (second column). The TV-KL image restoration model in (8), with regularization parameter µ selected according to the ADP (15), has then been solved. The output restorations are displayed in the third column of Figure 1. One can see that the rough approximation δ (A) = 1/2 used in the ADP (15) can either return oversmoothed results, as in the case of phantom, or undersmoothed restorations, as for cameraman.
Since its proposal in [5], the ADP has been (and still is) widely used for variational image restoration (see, e.g., [9,10]), and it can be regarded as the standard extension of the Morozov DP for Gaussian noise to the Poisson noise case. Some literature exists building on the ADP, e.g., by proposing, analyzing, and testing its usage in KL-constrained variational models [11] or by analyzing it theoretically [12]. However, to the best of the authors' knowledge, the only attempt to improve the ADP by giving an explicit form to the adjustment ε of the approximate, constant discrepancy value δ (A) = 1/2 is the one in [8]. The authors in [8] correctly state that ε must not be a constant, but a function ε(λ) of the photon count level. However, they propose to take ε(λ) as the sum of the second to tenth terms of the same Taylor expansion used in [5]. As we will highlight later in the paper, such expansion converges only for λ approaching +∞; hence, the choice in [8] cannot aspire to improve the performance of the ADP in low-count regimes.

Contribution
The goal of this paper is to provide novel insights about the EDP and the ADP in order to design a novel discrepancy principle capable of outperforming the classical ADP proposed in [5]. In more detail, we will provide a qualitative study proving that the recovery of a closed-form expression for function ∆ (E) (µ) in (14) through its Taylor series expansion used in [5,8] is not only difficult to achieve but also theoretically unfeasible for low-count Poisson regimes. Moreover, we will explore in detail the properties of the ADP motivating the dichotomic behavior (i.e., oversmoothing/undersmoothing) arising upon its adoption in the low-count regime. Finally, based on a simple Monte Carlo simulation followed by weighted least-square fitting, we will derive a novel version of the general DP in (10)-(12) based on a nearly exact (NE) approximation of function ∆ (E) (µ) in (14), concisely referred to as NEDP. Our approach will successfully address issues (I1)-(I2). In particular, it will be demonstrated experimentally that NEDP can return high-quality results both for low-count and mid/high-count acquisitions. The good performance of NEDP is anticipated in the last column of Figure 1, where we show the output restorations achieved by using the TV-KL model in (8) and (9) coupled with the novel µ-selection strategy.

Limits of the Approximate DP
The discrepancy principle proposed by Zanella et al. in [5] for Poisson image denoising and then extended to image restoration by Bertero et al. in [7] relies on Lemma 1 in [5], whose proof has been completed in [5] (corrigendum), which we report below for completeness.

Lemma 1.
Let Y λ be a Poisson random variable with expected value λ ∈ R ++ and consider the function of Y λ defined by F(Y λ) := λ − Y λ ln λ + Y λ ln Y λ − Y λ. (16) Then, the following estimate of the expected value of F(Y λ) holds true for large λ: δ (E) (λ) := E[F(Y λ)] = 1/2 + O(1/λ). (17) Based on the estimate above, and implicitly assuming a sufficiently large λ (i.e., a sufficiently high-count Poisson regime) such that the O(1/λ) term can be neglected, the exact DP outlined in (14) is replaced in [5,7] by the approximation given in (15), i.e., by the constant per-pixel discrepancy value δ (A) = 1/2. However, the ADP performs poorly for low-count Poisson images. Our goal here is to highlight that the reason for this lies precisely in the constant approximation δ (E) (λ) ≈ δ (A) used in (15) and then propose a nearly exact DP based on a much less approximate estimate δ (NE) (λ) of the expected value function δ (E) (λ).
For this purpose, first, we carry out a preliminary Monte Carlo simulation aimed at highlighting the error associated with the approximation in (15). In particular, we consider a discrete set of values λ i ∈ [0, 8] and, for each λ i, we generate pseudo-randomly a large number (10 6) of realizations of the Poisson random variable Y λ i. Then, we compute the associated values of the function F(Y λ i) defined in (16) and, finally, for each λ i, we obtain an estimate δ̂ (E) (λ i) of δ (E) (λ i) by calculating the sample mean of these function values. The results of this simulation are shown in Figure 2. In particular, in the left figure, we report the computed estimates δ̂ (E) (λ i), whereas, in the right figure, we report the percentage errors (with respect to the estimates) associated with using the constant value δ (A) = 1/2 as in the ADP (15). The percentage error approaches +∞ for λ tending to zero and is in the order of −10% for λ ∈ [1, 4]; then, as expected, it decreases (quite slowly) to zero for λ tending to +∞. The error is thus quite large for small λ, and this can explain the poor performance of the ADP (15) in the low-count Poisson regime.
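The simulation just described can be sketched in a few lines of Python; the sample size and the λ values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def F(lam, y):
    """F(lam; y) = lam - y*ln(lam) + y*ln(y) - y of (9)/(16), with 0*ln(0) = 0."""
    y = np.asarray(y, dtype=float)
    ylogy = np.where(y > 0, y * np.log(np.maximum(y, 1.0)), 0.0)
    return lam - y * np.log(lam) + ylogy - y

def delta_E_mc(lam, n_samples=10**6):
    """Monte Carlo estimate of delta^(E)(lam) = E[F(Y_lam)], Y_lam ~ Poiss(lam):
    draw many realizations of Y_lam and average the corresponding F values."""
    samples = rng.poisson(lam, size=n_samples)
    return F(lam, samples).mean()
```

For large λ the estimate settles near the ADP constant 1/2 (Lemma 1), while for small λ it departs from it markedly, which is exactly the behavior reported in Figure 2.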
In order to obtain a more accurate approximation or even an exact analytical expression for the expected value function δ (E) (λ) , we now retrace in detail the proof of Lemma 1 given in [5], and completed in [5] (corrigendum), and check if the rough truncation carried out in [5] can be avoided.
After noting that function ln(1 + ϕ) is C ∞ on its domain (−1, +∞) and considering its Taylor expansion around 0, the Taylor theorem with the remainder in integral form allows one to write the expansion in (18). Replacing this expansion with ϕ = (Y λ − λ)/λ in the expression of function F defined in (16), we obtain the expansion in (19). After noting that the only random quantity in (19) is Y λ, the expected value takes the form in (20), with coefficients ω i as in (21) and where η i+2 [Y λ ] denote the central moments of order i + 2 of the Poisson random variable Y λ. It is well known (see [13], p. 162) that these moments can be obtained by the recursive formula in (22). After noting that, in (20), only moments η i+2 [Y λ ] with i ≥ 0 are present and that they are all divided by λ, it is easy to verify that, by applying (22), one obtains the general algebraic polynomial expression in (23), where the ϑ i (j) are all integer coefficients with ϑ i (0) = 1 for any i = 0, 1, . . ., and where the degrees d i of the polynomials P i (λ) are given by (24), with ⌊·⌋ denoting the floor function. By replacing the expressions of the first 8 polynomials P i (λ), i = 0, . . . , 7, given in (23) into (20), one obtains the general formula in (25), with the coefficients ψ i given in (21) and ϑ i (j) defined in (23). After noting that, from (24), it follows that d i ≤ i for any i = 0, 1, . . ., it is a matter of simple algebra to verify that (25) can be equivalently and more compactly rewritten as in (26), with γ i (N) ∈ Q computable coefficients. In particular, inspecting the values reported in (27) for N = 1, . . . , 9, we note how, as the truncation order N increases, the coefficients γ i (N) stabilize; the first 34 of them are plotted in Figure 3 (left). These first 34 coefficients indicate that the coefficient sequence is positive and strictly increasing for i ≥ 2. This implies that, letting the truncation order N tend to +∞, the (infinite) weighted geometric series in (26) is divergent for λ ≤ 1.
Even without analyzing the case λ > 1, we can state that an analytical form for function δ (E) (λ) in the low-count Poisson regime is very unlikely to be obtainable as the sum of the series in (26). In fact, there will very likely be at least one pixel such that λ i ≤ 1. We believe that it is worth concluding this section by pointing out the theoretical reason for the non-convergence of the series in (26). Function ln(1 + ϕ) is analytic at ϕ = 0, but its Maclaurin series converges (pointwise to the function) only for ϕ ∈ (−1, 1]. Hence, as N tends to +∞, the Taylor series expansion in (19) converges to function F only for ϕ = (Y λ − λ)/λ ∈ (−1, 1], i.e., for Y λ ∈ (0, 2λ], where Y λ in (20) represents a Poisson random variable with parameter λ. Hence, for N tending to +∞, the series in (20) converges to the function δ (E) (λ) only if the probability in (28) is equal to 1. From Figure 3 (right), where we plot the probability in (28) as a function of λ, one can notice that condition (28) for convergence of the series in (20) is fulfilled asymptotically for λ approaching +∞ but it is not satisfied at all for small λ values.

A Nearly Exact DP Based on Monte Carlo Simulation
Since it is not possible to derive analytically the expression of function δ (E) (λ) in (17), the goal in this section is to compute a nearly exact estimate δ (NE) (λ) of function δ (E) (λ) based on a simple Monte Carlo simulation approach analogous to that used at the beginning of Section 2. Based on the expected shape of function δ (E) (λ)-see Figure 2-we consider a discrete set of 1385 values λ i. This set comes from the union of three subsets of equally spaced λ values, namely from 0 to 6 with step 0.01, from 6 to 66 with step 0.1, and from 66 to 250 with step 1. For each λ i, we generate pseudo-randomly a very large number S = 5 × 10 7 of samples y i (j), j = 1, . . . , S, of the Poisson random variable Y λ i ; then, we compute the associated values of function F defined in (16) and, finally, we calculate the sample mean δ̂ (E) (λ i) and variance v i of these function values, as formalized in (29). The hat notation for the sample means comes from them representing estimates of the sought theoretical means. The pairs (λ i, δ̂ (E) (λ i)) and (λ i, v i) are shown (blue crosses) in the first and second row of Figure 4, respectively. It is well known that δ̂ (E) (λ i) and v i represent unbiased estimators of the mean and variance of the random variable F(Y λ i) and that, according to the central limit theorem, for a very large number S of samples (which is definitely our case), the sample mean δ̂ (E) (λ i) can be regarded as a realization of a Gaussian random variable whose mean is the theoretical mean δ (E) (λ i) of the random variable F(Y λ i) and whose variance is the sample variance v i divided by the number of samples S; see (30). We now want to fit a parametric model f(λ; c), with c the parameter vector, to the obtained Monte-Carlo-simulated data points (λ i, δ̂ (E) (λ i)), i = 1, . . . , 1385.
First, in accordance with the trend of these data-see the blue crosses in the first row of Figure 4-and recalling the expected asymptotic behavior of function δ (E) (λ) for λ approaching +∞-see the discussion in Section 2, particularly the first two terms of the expansion in (27)-we choose a model of the form f(λ; c) = 1/2 + h(λ; c), (31) with function h(λ; c) vanishing for λ → +∞, so that f recovers the asymptotic value δ (A) = 1/2.
Then, with the aim of achieving a good trade-off between the model's ability to accurately fit the data and the computational efficiency of its evaluation, we choose for function h the rational form reported in (32). Thanks to (30), fitting model f in (31) with h as in (32) can be carried out via Maximum Likelihood (ML) estimation of the parameter vector c = (c 1, c 2, c 3, c 4) ∈ R 4. In fact, according to (30), the likelihood takes the form in (33), and the ML estimate c (ML) of c can be computed by solving (34), where we dropped constants (with respect to the optimization variable c). Problem (34) is a nonlinear (in particular, rational) weighted least-squares problem. The cost function is non-convex and local minimizers exist. We compute an estimate ĉ of c (ML) numerically; its value is reported in (35). We thus define the nearly exact estimate δ (NE) (λ) of the theoretical expected value function δ (E) (λ) = E[F(Y λ)] as the parametric function f defined in (31) and (32) with parameter vector c equal to ĉ given in (35); in formula, δ (NE) (λ) := f(λ; ĉ). (36) In the first row of Figure 4, we plot the constant approximate function δ (A) and the obtained nearly exact function δ (NE) (λ), whereas, in the third and fourth row of Figure 4, we report the percentage errors e (A) (λ i) and e (NE) (λ i) of the two approximations with respect to the Monte Carlo estimates. The error associated with δ (NE) (λ) is markedly smaller for λ ∈ [0, 6] (first column of Figure 4) and around 10 times smaller for λ ∈ [6, 250] (second and third column of Figure 4). In particular, in the low-count Poisson regime (which we can roughly associate with λ ∈ [0, 6]), the proposed nearly exact estimate of the theoretical expected value function δ (E) (λ) yields a percentage error in the order of 0.5%, whereas the constant approximation used in [5,7] leads to a percentage error in the order of 10%. Such a large error is the reason for the poor performance of the ADP (15) in the low-count regime. We thus propose the following nearly exact DP (NEDP): select µ = µ* such that D(µ*; y) = ∆ (NE) (µ*) := ∑_{i=1}^m δ (NE) (λ i (µ*)), (37) with function h and parameter vector ĉ given in (32) and (35), respectively.
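The weighted fit can be prototyped as follows. Two caveats: the specific 4-parameter rational form of h used below is an illustrative choice, not necessarily the paper's form (32), and the fit is linearized by multiplying through by the denominator, which turns (34) into a linear weighted least-squares problem at the price of a slightly different weighting than the true ML problem.

```python
import numpy as np

def f_model(lam, c):
    """Illustrative delta^(NE)(lam) = 1/2 + h(lam; c), with the assumed rational
    correction h = (c1 + c2*lam) / (1 + c3*lam + c4*lam^2), vanishing at infinity."""
    return 0.5 + (c[0] + c[1] * lam) / (1.0 + c[2] * lam + c[3] * lam ** 2)

def fit_rational(lam, delta, var, S):
    """Weighted fit of the rational correction to the Monte Carlo estimates,
    weighted by the standard deviations sqrt(var / S) of the sample means (30).
    Linearization: d*(1 + c3*lam + c4*lam^2) = c1 + c2*lam, with d = delta - 1/2,
    i.e. the linear system  c1 + c2*lam - c3*lam*d - c4*lam^2*d = d."""
    lam = np.asarray(lam, dtype=float)
    d = np.asarray(delta, dtype=float) - 0.5
    w = np.sqrt(np.asarray(S, dtype=float) / np.asarray(var, dtype=float))
    A = np.column_stack([np.ones_like(lam), lam, -lam * d, -(lam ** 2) * d])
    c, *_ = np.linalg.lstsq(w[:, None] * A, w * d, rcond=None)
    return c
```

On noiseless data generated by the same model, the linearized system is exactly consistent, so the parameters are recovered exactly; on real Monte Carlo data the result can be used as the starting point of a full nonlinear solver for (34).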

Numerical Solution via ADMM
In the following, we detail how to tackle the minimization problem in (8) and (9) when the regularization parameter µ is automatically selected according to one of the considered versions of the DP, namely the TDP in (13), the ADP in (15), and the NEDP in (37).
In principle, one could set a fine grid of µ-values and compute the solutionx(µ) corresponding to each µ. Then, among the recorded solutions, one could select the one such that the TDP, the ADP, or the NEDP is satisfied. However, this algorithmic scheme, to which we refer as the a posteriori optimization procedure, turns out to be particularly costly.
In [6,14], the authors propose to update the regularization parameter according to the ADP along the iterations of the popular Alternating Direction Method of Multipliers (ADMM) [15,16]. Here, we detail the steps of the proposed algorithmic procedure, which can be employed for applying not only the ADP but also the TDP and the NEDP. Finally, we remark that the case of the TDP is only addressed for explanatory purposes and it cannot be performed in practice as x̄ is not available.
The TV-KL model (8) is first recast, by means of the variable splitting λ = Hx + b, g = Dx, z = x, into the equivalent linearly constrained problem (38). To solve problem (38), we introduce the augmented Lagrangian function (39), where ρ λ ∈ R m, ρ g ∈ R 2n, ρ z ∈ R n are the vectors of Lagrange multipliers associated with the linear constraints in (38), while β λ, β g, β z ∈ R ++ are the ADMM penalty parameters. By setting u := (x; λ; g; z), ρ := (ρ λ ; ρ g ; ρ z), U := R n × R m × R 2n × R n, and R := R m × R 2n × R n, we observe that solving (38) amounts to seeking solutions of the saddle-point problem (40). Upon suitable initialization, and for any k ≥ 0, the k-th iteration of the ADMM algorithm applied to the solution of (40) with the augmented Lagrangian function L defined in (39) consists of the subproblems (41)-(44). In the following subsections, we discuss the solution of subproblems (41)-(44) for the four primal variables λ, g, z, x. Since the solution of the subproblem for variable λ is the most complicated and requires the application of the DP, it is presented last.

Solving Subproblem for g
The subproblem for g in (42) takes the form in (48). Solving (48) is equivalent to solving the n independent two-dimensional minimization problems in (49), each of which admits a unique closed-form solution given by the well-known vector soft-thresholding (shrinkage) operator.
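Assuming the TV term enters the split with unit weight (as in model (8), where µ multiplies the KL term), the g-update can be sketched as the pixel-wise vector soft-thresholding below; the function name and the shape convention are ours.

```python
import numpy as np

def prox_tv_shrink(q, beta_g):
    """Closed-form solution of the n independent 2-D problems
        min_{g_i}  ||g_i||_2 + (beta_g / 2) * ||g_i - q_i||_2^2,
    i.e. vector soft-thresholding. q has shape (n, 2): one
    (horizontal, vertical) gradient pair per pixel."""
    norms = np.linalg.norm(q, axis=1, keepdims=True)
    # shrink each gradient vector toward zero; kill it entirely when
    # its norm is below the threshold 1 / beta_g
    scale = np.maximum(1.0 - 1.0 / (beta_g * np.maximum(norms, 1e-30)), 0.0)
    return scale * q
```

In the ADMM scheme, q would be Dx^(k+1) shifted by the scaled multiplier ρ_g/β_g; small gradients are set exactly to zero, which is the mechanism by which TV promotes piecewise-constant restorations.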

Solving Subproblem for z
The subproblem for z in (43) takes the form in (50). Hence, the solution z (k+1) is given by the unique Euclidean projection of vector q (k) defined in (50) onto the (convex) nonnegative orthant and admits the simple closed-form expression z (k+1) = max{ q (k), 0 }, (51) where the maximum is intended component-wise.

Solving Subproblem for x
After dropping the constant terms, the x-subproblem in (44) takes the form in (52). By imposing the first-order optimality condition on the quadratic cost function in (52), after simple algebraic manipulations, we obtain the linear system of equations (53), which is solvable since the coefficient matrix is symmetric positive definite and hence nonsingular. When assuming periodic boundary conditions for x, the blur matrix H is square, i.e., m = n, and, more importantly, D T D, H T H and-trivially-I are block circulant matrices with circulant blocks. Hence, the linear system (53) can be solved efficiently by one application of the direct 2D Fast Fourier Transform (FFT) and one application of the inverse 2D FFT, each at a cost of O(n log n).
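A sketch of the FFT-based solve follows, under periodic boundary conditions and assuming the system matrix arrangement β_λ H^T H + β_g D^T D + β_z I (the precise grouping of coefficients in (53) may differ; the right-hand side would collect the multiplier and splitting terms of (44)).

```python
import numpy as np

def solve_x_fft(rhs, psf_hat, beta_lam, beta_g, beta_z):
    """Solve (beta_lam*H^T H + beta_g*D^T D + beta_z*I) x = rhs in the Fourier
    domain: under periodic boundary conditions every operator is block circulant
    with circulant blocks, hence diagonalised by the 2-D FFT. psf_hat holds the
    eigenvalues of H (the FFT of the zero-centred blur PSF)."""
    n1, n2 = rhs.shape
    # eigenvalues of D^T D for periodic forward differences: |1 - e^{-2*pi*i*f}|^2
    fx = np.abs(1.0 - np.exp(-2j * np.pi * np.fft.fftfreq(n2))) ** 2  # horizontal
    fy = np.abs(1.0 - np.exp(-2j * np.pi * np.fft.fftfreq(n1))) ** 2  # vertical
    dtd_hat = fx[None, :] + fy[:, None]
    denom = beta_lam * np.abs(psf_hat) ** 2 + beta_g * dtd_hat + beta_z
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
```

One forward and one inverse FFT per iteration give the stated O(n log n) cost; the denominator is strictly positive because of the β_z I term, mirroring the positive definiteness of the coefficient matrix in (53).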

Solving Subproblem for λ and Applying the DP
The subproblem for λ in (41) takes the form in (54). After manipulating algebraically the last two terms of the cost function in (54), dropping constant terms, and then dividing by the positive scalar β λ, problem (54) can be equivalently rewritten as in (55). In (55), we introduced the explicit dependence of the solution λ (k+1) on the parameter γ, which is the basis of the application of the DP. Notice that the ADMM penalty parameter β λ is fixed; hence, γ plays the role of the regularization parameter µ in the DP applied to this subproblem. Problem (55) can be further simplified after noting that it can be equivalently rewritten in the component-wise (pixel-wise) form (56). It is easy to prove that, for any γ ∈ R + and independently of the constants y i ∈ N and v i (k) ∈ R, all the minimization problems in (56) admit a unique solution, given in (57) by the nonnegative root of the quadratic first-order optimality condition: λ i (k+1) (γ) = (1/2) ( v i (k) − γ + √( (v i (k) − γ) 2 + 4 γ y i ) ). (57) We now want to apply one among the DP versions-namely, the TDP (13), the ADP (15), and the proposed NEDP (37)-outlined in Sections 1 and 3 for selecting a value of the free parameter γ in (57). In particular, we select γ = γ (k+1) such that γ (k+1) satisfies the discrepancy equation, which, in accordance with the general definition given in (10), takes the form in (58), where the discrepancy function D(γ; y) is defined in (59) in terms of function F given in (9), and where the discrepancy value ∆, according to the definitions given in (13), (15), and (37), takes one of the forms in (60): the scalar ∆ (T) for the TDP (13), the scalar ∆ (A) for the ADP (15), or the function ∆ (NE) (γ) for the NEDP (37), with rational function h defined in (32) and parameter vector ĉ given in (35). We notice that ∆ (T) and ∆ (A) are two positive scalars that can be computed once and for all and do not change their values during the ADMM iterations, whereas ∆ (NE) (γ) is a function of γ, which almost certainly changes its shape along the ADMM iterations (due to function λ i (k+1) (γ) in (57) changing its shape when vector v (k) in (55) changes).
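The closed-form λ-update can be implemented directly; the sketch below solves the pixel-wise problem min_{λ ≥ 0} γ(λ − y ln λ) + (1/2)(λ − v)², and its correctness can be checked against the first-order optimality condition γ(1 − y/λ) + λ − v = 0.

```python
import numpy as np

def lam_update(gamma, v, y):
    """Unique minimiser of the pixel-wise problem
        min_{lam >= 0}  gamma*(lam - y*ln(lam)) + (1/2)*(lam - v)^2.
    The stationarity condition gives lam^2 + (gamma - v)*lam - gamma*y = 0,
    whose nonnegative root is taken (for y = 0 this reduces to max(v - gamma, 0)).
    Works element-wise on arrays as well as on scalars."""
    gamma, v, y = map(np.asarray, (gamma, v, y))
    return 0.5 * (v - gamma + np.sqrt((v - gamma) ** 2 + 4.0 * gamma * y))
```

Because the update is vectorized, all m pixels of (56) are handled in a single call, which keeps the cost of each evaluation of the discrepancy function negligible.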
Summing up, the complete procedure for the DP-based update of the parameter γ and, then, of the variable λ reads as follows: first, solve the discrepancy equation (58) for γ (k+1) ; then, set λ (k+1) = λ (k+1) (γ (k+1)) according to (57). Concerning the ADP, in [6], the authors have proven that, along the ADMM iterations, the function D(γ; y) is convex and decreasing, so that the existence and the uniqueness of the solution of the discrepancy equation in (58) with ∆ = ∆ (A) is guaranteed. The same result can be immediately extended to the case of the TDP. When considering the NEDP, the functional form of ∆ (NE) (γ) is such that the above result cannot be straightforwardly applied. However, the following proposition on the existence of a solution for the discrepancy equation (58) with ∆ = ∆ (NE) holds true. Proposition 1. Consider the discrepancy equation in (58)-(60) with ∆ = ∆ (NE) (γ) and with vector v (k) and function λ i (k+1) (γ) defined as in (55) and (57), respectively, and let t (k) := max{ v (k), 0 }. (64) Then, the discrepancy equation admits a solution if the condition in (65) is fulfilled, where the function T : R + × N → R appearing therein is defined in (66) in terms of function F, function h, and parameter vector ĉ given in (9), (32), and (35), respectively.
Then, it is easy to prove that function λ (k+1) (γ) in (57) satisfies λ (k+1) (0) = t (k), with vector t (k) defined in (64). It thus follows from (67) and from the definitions of functions D in (59) and ∆ (NE) in (60) that (68) holds and that (69)-(70) hold, where function T in (68) is defined in (65), cancelling the first summation in (69) comes from F(y; y) = 0 for any y ∈ R + (see the definition of function F in (9), where y ln y = 0 for y = 0), and (70) comes from the definition of function f in (36). From (70) and the continuity of function G(γ; y), we can conclude that, for any y ≠ 0, the discrepancy equation G(γ; y) = 0 admits a solution if G(0; y) ≥ 0. It thus follows from (68) that the sufficient condition in (64)-(66) holds true.
In Algorithm 1, we outline the general ADMM-based scheme used for solving image restoration variational models of the TV-KL form in (8) and (9) with automatic update/selection of the regularization parameter µ according to one of the considered versions of the DP. We refer to the general scheme as DP-ADMM, whereas the specific schemes using one among the DP versions TDP, ADP, and NEDP will be named TDP-ADMM, ADP-ADMM, and NEDP-ADMM, respectively. Notice that the γ-update at step 3 can be performed by means of a derivative-free approach, such as bisection or the secant method.
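The γ-update at step 3 can be prototyped with a plain bisection, as suggested above. The sketch below treats ∆ as a constant (the TDP/ADP cases) and relies on D(γ; y) being decreasing in γ, as proven in [6]; the bracket [0, γ_max] and the tolerances are illustrative choices.

```python
import numpy as np

def F(lam, y):
    """F(lam; y) of (9), with 0*ln(0) = 0 and lam clamped away from 0."""
    y = np.asarray(y, dtype=float)
    ylogy = np.where(y > 0, y * np.log(np.maximum(y, 1.0)), 0.0)
    return lam - y * np.log(np.maximum(lam, 1e-300)) + ylogy - y

def lam_of_gamma(gamma, v, y):
    """Closed-form pixel-wise lambda update (nonnegative quadratic root)."""
    return 0.5 * (v - gamma + np.sqrt((v - gamma) ** 2 + 4.0 * gamma * y))

def gamma_update(v, y, delta, gamma_max=1e6, tol=1e-8, max_iter=200):
    """Solve the discrepancy equation D(gamma; y) = Delta by bisection,
    where D(gamma; y) = sum_i F(lam_i(gamma); y_i) is decreasing in gamma."""
    resid = lambda g: np.sum(F(lam_of_gamma(g, v, y), y)) - delta
    lo, hi = 0.0, gamma_max
    if resid(hi) > 0.0:          # no crossing inside the bracket
        return gamma_max
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if resid(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Handling the NEDP case would only require replacing the constant `delta` by the γ-dependent value ∆^(NE)(γ) inside the residual, subject to the existence condition of Proposition 1.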
Algorithm 1: General DP-ADMM approach for image restoration variational models of the TV-KL form in (8) and (9) and automatic selection of µ via DP.

Numerical Results
In this section, we evaluate the performance of the proposed NEDP in (37) for the automatic selection of the regularization parameter µ in image restoration variational models of the TV-KL form in (8) and (9).
Our approach is compared with the TDP and the ADP in (13) and (15), respectively. For each criterion, we perform the ADMM-based scheme outlined in Algorithm 1. As recalled in Section 4, the µ-selection problem along the ADMM iterations always admits a unique solution under the adoption of the ADP and TDP. Concerning the NEDP-ADMM, at the generic iteration k of Algorithm 1, the regularization parameter µ is updated provided that the condition stated in Proposition 1 is satisfied. If this is not the case, the parameter update is not performed and µ (k) = µ (k−1) .
The numerical tests have been designed with the following twofold aim: (i) to prove that the proposed NEDP criterion is capable of selecting optimal µ values returning high-quality restoration results and, in particular, that it outperforms the classical ADP criterion in the low-count Poisson regime; (ii) to prove that the proposed NEDP-ADMM scheme outlined in Algorithm 1 is capable of automatically selecting such optimal µ values in a robust (and efficient) manner.
More specifically, the latter point will be proven by showing that the iterated and the a posteriori version of our approach behave similarly in terms of µ-selection.
The µ-values selected by the TDP, the ADP, and the NEDP applied a posteriori will be denoted by µ^(T), µ^(A), and µ^(NE), respectively, while the output µ-values of the ADP-ADMM and NEDP-ADMM schemes will be denoted by μ̄^(A) and μ̄^(NE), respectively.
The quality of the restorations x̂ with respect to the original uncorrupted image x̄ will be assessed by means of two scalar measures, namely the Structural Similarity Index (SSIM) [17] and the Improved Signal-to-Noise Ratio (ISNR), defined by

ISNR(x̂, x̄) = 10 log₁₀ ( ‖y − x̄‖₂² / ‖x̂ − x̄‖₂² ).

For all tests, the iterations of the ADMM-based scheme in Algorithm 1 are stopped as soon as the adopted stopping criterion is satisfied, and the ADMM penalty parameters β_λ, β_g, β_z are set manually so as to achieve fast convergence. More precisely, in each test, the same triplet (β_λ, β_g, β_z) of ADMM penalty parameters, with β_λ, β_g, β_z ∈ [0.5, 2], is used for the three compared discrepancy principles TDP, ADP, and NEDP.

We consider the four test images, each with pixel values between 0 and 1, shown in Figure 5. The acquisition process (1) has been simulated as follows. First, the original image is multiplied by a factor κ ∈ N \ {0} representing the maximum emitted photon count, i.e., the maximum expected value of the number of photons emitted by the scene and hitting the image domain. Clearly, the lower κ, the lower the SNR of the observed noisy image and the more difficult the image restoration problem. For each image, several values of κ ranging from 3 to 1000 have been considered. The resulting images have then been corrupted by space-invariant Gaussian blur, with a blur kernel generated by the Matlab routine fspecial, which is characterized by two parameters: band, the side length (in pixels) of the square support of the kernel, and sigma, the standard deviation (in pixels) of the isotropic bivariate Gaussian distribution defining the kernel in the continuous setting. We consider two blur levels, characterized by band = 5, sigma = 1 and band = 13, sigma = 3. The blurred noiseless image λ = Hx̄ + b is then generated by adding to the blurred image a constant emission background b of value 2 × 10⁻³. The observed image y = Poiss(λ) is finally obtained by pseudo-randomly generating an m-variate independent Poisson realization with mean vector λ.
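The acquisition pipeline described above (scaling by κ, Gaussian blurring, background addition, Poisson sampling) can be sketched as follows. The reflective boundary handling and all helper names are our own illustrative choices, not fixed by the paper:

```python
import numpy as np

def gaussian_kernel(band, sigma):
    """2-D Gaussian kernel on a band x band support, normalized to sum 1
    (mimicking Matlab's fspecial('gaussian', band, sigma))."""
    r = (band - 1) / 2.0
    ax = np.arange(-r, r + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def simulate_acquisition(x, kappa, band, sigma, b=2e-3, rng=None):
    """Simulate the degradation model (1): y = Poiss(H(kappa*x) + b).

    x is the clean image with values in [0, 1]; kappa is the maximum
    expected photon count. Blurring uses symmetric (reflective) padding,
    an illustrative choice since the boundary condition is not fixed here.
    """
    rng = np.random.default_rng(rng)
    k = gaussian_kernel(band, sigma)
    pad = band // 2
    xp = np.pad(kappa * x, pad, mode="symmetric")
    # Direct 2-D correlation; for large images one would use FFTs instead.
    blurred = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            blurred[i, j] = np.sum(k * xp[i:i + band, j:j + band])
    lam = blurred + b          # blurred noiseless image lambda = H x + b
    return rng.poisson(lam)    # observed photon-count image y

# Example: a 32x32 synthetic scene in the low-count regime kappa = 5.
x = np.zeros((32, 32)); x[8:24, 8:24] = 1.0
y = simulate_acquisition(x, kappa=5, band=5, sigma=1, rng=0)
```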
The black solid curves plotted in Figure 6a,c represent the function D(µ; y) as defined in (11) and (12) for the first test image cameraman and κ = 5, for the less severe (first row) and more severe (second row) blur levels. They have been computed by solving the TV-KL model in (8) for a fine grid of µ-values and then calculating D(µ; y) for each µ. The horizontal dashed cyan and green lines represent the constant discrepancy values ∆^(T) and ∆^(A) used in (13) and (15), respectively, while the dashed magenta curve represents the discrepancy value function ∆^(NE)(µ) defined in (37). We remark that ∆^(NE)(µ) has been obtained in the same way as D(µ; y), i.e., by computing the expression in (37) for each µ of the selected fine grid. One can clearly observe that the intersection point between the curve ∆^(NE)(µ) and the function D(µ; y) and that between the line ∆^(T) and D(µ; y) are very close, and both lie at a significant distance from the intersection point detected by ∆^(A).
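Once the two curves have been sampled on the same µ-grid, their a posteriori intersection can be located, for instance, by detecting a sign change of the residual and refining it by linear interpolation. This is an illustrative sketch, not a procedure mandated by the paper:

```python
import numpy as np

def crossing(mu_grid, D_vals, Delta_vals):
    """Locate the mu at which D(mu; y) crosses the discrepancy target.

    D_vals and Delta_vals are the two curves sampled on mu_grid (for the
    constant targets Delta^(T) and Delta^(A), Delta_vals is just a constant
    array). The first sign change of the residual is refined by linear
    interpolation between the two bracketing grid points.
    """
    r = np.asarray(D_vals) - np.asarray(Delta_vals)
    s = np.where(np.diff(np.sign(r)) != 0)[0]
    if s.size == 0:
        raise ValueError("no crossing on the given mu grid")
    i = s[0]
    # Linear interpolation of the residual between grid points i and i+1.
    t = r[i] / (r[i] - r[i + 1])
    return mu_grid[i] + t * (mu_grid[i + 1] - mu_grid[i])

# Toy example: D decreasing in mu, constant target Delta = 0.5.
mu = np.linspace(0.1, 10.0, 200)
D = 1.0 / mu
mu_star = crossing(mu, D, np.full_like(mu, 0.5))  # exact crossing at mu = 2
```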
Figure 5. Test images: cameraman (256 × 256), brain (253 × 238), phantom (246 × 245), cells (236 × 236).

In Figure 6b,d, we show the ISNR and SSIM values achieved for different µ-values with κ = 5. The vertical cyan, green, and magenta lines correspond to the µ-values detected by the intersection of D(µ; y) with ∆^(T), ∆^(A), and ∆^(NE)(µ), respectively. Reflecting the behavior of the discrepancy function and of the three curves, the ISNR/SSIM values corresponding to µ^(T) and µ^(NE) are very close to each other and almost attain the maximum of the two curves. We also highlight that, in the more severe blur case, the ADP selects a very large µ-value, which yields very low ISNR and SSIM values; see the thumbnail image in the right-hand corner of Figure 6d.
We are also interested in verifying that the proposed NEDP-ADMM scheme outlined in Algorithm 1 succeeds in automatically selecting such optimal µ in a robust and efficient way: the blue and red markers in Figure 6b,d represent the final ISNR and SSIM values, respectively, of the image restored via NEDP-ADMM. The markers are plotted in correspondence of μ̄^(NE), which, we recall, is the output µ-value of the iterative NEDP-ADMM scheme; one can clearly observe that μ̄^(NE) is very close to the optimal µ^(NE) detected a posteriori by the NEDP.
As a further analysis, at the bottom of Figure 6, we report the output µ-values and the ISNR and SSIM values, for the two blur levels and the 11 considered κ-values, as obtained by the ADP-ADMM (first column) and the NEDP-ADMM (second column). To facilitate the comparison, we also report in blue/red the increments/decrements of the ISNR and SSIM achieved by our method with respect to the approximate criterion. Notice that the NEDP outperforms the ADP both in terms of ISNR and SSIM for the low-count acquisitions. However, as κ increases, the two methods behave very similarly, with the ISNR and SSIM values obtained by the ADP-ADMM being slightly larger than those obtained by the NEDP-ADMM. In accordance with this analysis, the outputs μ̄^(A) and μ̄^(NE) are significantly different in low-count regimes, similar in mid-count regimes, and particularly close in high-count regimes. This behavior can be easily explained in light of the analysis carried out in Section 2, where we showed that the approximation provided by ∆^(A) becomes more and more accurate as the number of pixels with large values increases.
For a visual comparison, in Figures 7 and 8, we show the observed images (left column) and the restorations via ADP-ADMM (middle column) and via NEDP-ADMM (right column) for the less and more severe blur levels, respectively, over photon count regimes ranging from very low to very high. As already observed from the ISNR and SSIM values reported at the bottom of Figure 6, for low-count acquisitions the µ-value selected by the ADP does not provide proper regularization, so that the NEDP-ADMM clearly outperforms its competitor. However, starting from κ = 15 for the first blur level, and from κ = 40 for the second, the two approaches return similar output images.
For the second test image, brain, we report in Figure 9 the behavior of the discrepancy function D(µ; y) and of the ISNR/SSIM curves obtained by applying the TDP, the ADP, and the NEDP, for κ = 5 and the two considered blur levels. Also in this case, the NEDP and the TDP behave similarly, and both almost attain the maximum of the ISNR and SSIM curves. In contrast, µ^(A) appears to be largely underestimated with respect to the optimal µ, understood as the one maximizing either the ISNR or the SSIM. As for the first test image, the blue and red markers, indicating the output ISNR and SSIM, respectively, of the iterated version of our approach, are very close to those achieved by applying the NEDP a posteriori, suggesting that μ̄^(NE) and µ^(NE) are also very close.
From the table reported at the bottom of Figure 9, we observe that, for every κ, the proposed µ-selection criterion returns restored images outperforming those achieved via the ADP both in terms of ISNR and SSIM. The poor behavior of the ADP can be related to the nature of the processed image, which, for both low-count and high-count acquisitions, presents few pixels with large values, so that the approximation in (15) is particularly inaccurate. As a signal of this, note that the output μ̄^(A) is always smaller, and often significantly smaller, than μ̄^(NE).
The restored images in Figures 10 and 11 reflect the values recorded in the table: for the two considered blur levels, the output of the NEDP-ADMM appears remarkably sharper than the final restoration by the ADP-ADMM, especially in low- and mid-count regimes. In Figure 12, for the test image phantom, we report the curve of the discrepancy function D(µ; y) obtained a posteriori, as well as the ISNR and SSIM curves, for κ = 3 and the two considered blur levels. As for the test image brain, the ADP again selects a µ-value that is far from the optimal one, whether measured in terms of ISNR or SSIM. On the other hand, µ^(T) and µ^(NE) are very close and, in particular, one can notice that the output of the NEDP-ADMM represents the optimal compromise in terms of ISNR and SSIM. Notice also that, for the larger blur level, the output value μ̄^(NE) of the NEDP-ADMM, indicated by the red and blue markers, is larger than the one selected by the a posteriori version of the NEDP. However, the difference in terms of ISNR and SSIM is not particularly significant. This behavior is due to the use of different penalty parameters β_λ, β_g, β_z in the ADMM for the a posteriori and the iterated versions of our approach. In fact, for a large blur level, the convergence of the ADMM is particularly slow and can be achieved only upon suitable selection of the penalty parameters, whose values may not coincide in the two scenarios addressed.
The mismatch observed in Figure 12 between the ISNR and SSIM curves is reflected in the values reported in the bottom part of the figure, from which we can conclude that the NEDP-ADMM outperforms the ADP-ADMM in terms of ISNR for each κ-value, while the ADP-ADMM returns slightly better results in terms of SSIM for high-count acquisitions. As for the test image brain, the output μ̄^(A) again appears to be significantly too small. Once again, this behavior can be related to the considered image, which is mostly characterized by pixels with very small values.
From the restorations shown in Figures 13 and 14, one can also notice that the slight improvement in terms of SSIM does not correspond to any significant visual improvement. In fact, along the whole range of considered photon counts κ, the NEDP-ADMM is capable of returning sharper restorations. This reflects the tendency of the ADP-ADMM, applied to the current test image, to select insufficiently large µ-values, so that the regularization term takes over in the TV-KL model. We also remark that, for the current test image, the SSIM value does not seem to be particularly meaningful.
For the fourth and final test image, cells, we show in Figure 15 the behavior of the discrepancy function D(µ; y), as well as the ISNR and SSIM values in the a posteriori framework, for the two blur levels and κ = 3. Note that, in the a posteriori setting, for both blur levels, the NEDP achieves higher ISNR and SSIM values than the ADP. However, we observe that, for the larger blur level, the output μ̄^(NE) of the NEDP-ADMM is smaller than µ^(NE); this behavior can be ascribed, once again, to the ADMM convergence issues and the different values selected for the penalty parameters.
From the values reported in the bottom part of Figure 15, we notice that the NEDP-ADMM outperforms the ADP-ADMM in every photon count regime. Clearly, the closer µ^(A) and μ̄^(NE), the smaller the difference in terms of ISNR and SSIM.
The restorations computed by the ADP-ADMM and the NEDP-ADMM are shown in Figure 16 for the smaller blur level and in Figure 17 for the larger one. The obtained results confirm the values reported at the bottom of Figure 15. Moreover, also from a visual viewpoint, the difference between the two methods' performance increases when going from high- to low-count regimes.

Conclusions and Future Work
We have proposed an automatic selection strategy for the regularization parameter of variational image restoration models under Poisson noise corruption, based on a nearly exact version of the approximate discrepancy principle originally proposed in [5]. Our approach relies on Monte Carlo simulations, designed to provide meaningful insights into the limitations of the original approximate strategy, especially in the low-count Poisson regime. The proposed version of the discrepancy principle has then been derived by means of a weighted least-squares fitting and embedded along the iterations of an efficient ADMM-based optimization scheme. Our approach has been extensively tested on different images and for photon count values ranging from very low to high. When compared to the original approximate selection criterion, the proposed strategy drastically improves the quality of the output restorations in low-count regimes and, for images characterized by few large pixel values, also in mid- and high-count regimes.
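The Monte Carlo ingredient can be illustrated in miniature: for y ~ Poiss(λ), the expected value of the generalized KL data term can be estimated by sampling, and the estimate visibly departs from the classical large-mean approximation E[D_KL] ≈ m/2 (half the number of pixels) when the means are small. This is only a sketch of the idea under our own function names, not the fitting procedure of the paper:

```python
import numpy as np

def kl_data_term(y, lam):
    """Generalized Kullback-Leibler divergence between counts y and means lam,
    with the convention 0*ln(0) = 0."""
    y = np.asarray(y, dtype=float)
    t = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / lam), 0.0)
    return np.sum(t + lam - y)

def mc_expected_kl(lam, n_samples=5000, rng=0):
    """Monte Carlo estimate of E[D_KL(y, lam)] for y ~ Poiss(lam)."""
    rng = np.random.default_rng(rng)
    vals = [kl_data_term(rng.poisson(lam), lam) for _ in range(n_samples)]
    return float(np.mean(vals))

# m = 100 pixels. For large means, the estimate sits near m/2 = 50;
# for small means, it falls far below it, which is the low-count regime
# where the approximate discrepancy principle breaks down.
est_high = mc_expected_kl(np.full(100, 50.0))
est_low = mc_expected_kl(np.full(100, 0.05))
```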
From an analytical point of view, investigating the uniqueness of the regularization parameter value satisfying the proposed discrepancy principle will certainly constitute future work. From a modeling and applicative perspective, the effectiveness of the proposed approach when applied to variational models containing regularizers other than TV or aimed at solving inverse problems other than image restoration will be the subject of future analysis. Finally, from an algorithmic viewpoint, a matter that deserves further investigation is the (possibly automatic) selection of the three ADMM penalty parameters, which can significantly affect the speed of convergence of the numerical solution scheme.