Generalized Structure Preserving Preconditioners for Frame-Based Image Deblurring

We are interested in fast and stable iterative regularization methods for image deblurring problems with space invariant blur. The associated coefficient matrix has a Block Toeplitz with Toeplitz Blocks (BTTB) like structure, plus a small-rank correction depending on the boundary conditions imposed on the imaging model. In the literature, several strategies have been proposed to define proper preconditioners for iterative regularization methods involving such linear systems. Usually, the preconditioner is chosen to be a Block Circulant with Circulant Blocks (BCCB) matrix, because it can efficiently exploit the Fast Fourier Transform (FFT) for any computation, including the (pseudo-)inversion. Nevertheless, for ill-conditioned problems, it is well known that BCCB preconditioners cannot provide a strong clustering of the eigenvalues. Moreover, in order to get an effective preconditioner, it is crucial to preserve the structure of the coefficient matrix. On the other hand, thresholding iterative methods have recently been successfully applied to image deblurring problems, exploiting the sparsity of the image in a proper wavelet domain. Motivated by the results of recent papers, we combine a nonstationary preconditioned iteration with the modified linearized Bregman algorithm (MLBA) and proper regularization operators. Several numerical experiments show the performance of our methods in terms of quality of the restorations.


Introduction
In image deblurring we are concerned with reconstructing an approximation of an image from blurred and noisy measurements. This process can be modeled by an integral equation of the form

g(x, y) = ∫_{R²} κ(x, y, x′, y′) f(x′, y′) dx′ dy′ + η(x, y),    (1)

where f : R² → R is the original image and g : R² → R is the observed image, which is obtained from a combination of a convolution operator, represented by the convolution kernel κ : R⁴ → R, and the addition of some (unavoidable) noise η : R² → R coming, for example, from perturbations on the observed data, measurement errors and approximation errors. By assuming the convolution kernel κ to be compactly supported and considering the ideal case η = 0, equation (1) becomes

g = Kf,    (2)

where K is a compact linear operator. In this context, the convolution kernel κ is generally called the point spread function (PSF) and, if it is spatially invariant, as is the case in many applications, then it takes the form κ(x, y, x′, y′) = κ(x − x′, y − y′), with κ : R² → R.
Considering a uniform grid, images are represented by their color intensities measured on the grid points (pixels). In this paper, for the sake of simplicity, we deal only with square gray-scale images, even though all the techniques presented here carry over to images of different sizes and colors as well.
Collected images are available only in a finite region, the field of view (FOV), and the measured intensities near the boundary are affected by data which lie outside the FOV. Denoting by g and f the stack-ordered vectors corresponding to the observed image and the true image, respectively, the discretization of (1) by a rectangular quadrature rule with uniform grid (for example) leads to the under-determined linear system g = Kf, where the matrix K is of size m^2 × k^2. The matrix K is often called the blurring matrix. When imposing proper Boundary Conditions (BCs), the matrix K becomes square m^2 × m^2 and, in some cases, depending on the BCs and the symmetry of the PSF, it can be diagonalized by discrete trigonometric transforms. Indeed, specific BCs induce specific matrix structures that can be exploited to lessen the computational cost using fast algorithms. Of course, since BCs are artificially introduced, their advantages can come with drawbacks in terms of reconstruction accuracy, depending on the type of problem. The BCs approach forces a functional dependency between the elements of f external to the FOV and those internal to this area. If the BC model is not a good approximation of the real world outside the FOV, the reconstructed image can be severely affected by unwanted artifacts near the boundary, called ringing effects; see, e.g., [23]. The choice among the different BCs can be driven by some additional knowledge of the true image and/or by the availability of fast transforms to diagonalize the matrix K within O(m^2 log(m)) arithmetic operations. Indeed, the matrix-vector product can always be computed by the 2D FFT, after a proper padding of the image to convolve (see, e.g., [29]), while the availability of fast transforms to diagonalize the matrix K depends on the BCs. Among the BCs present in the literature, we consider the following ones, but our approach can be extended to other BCs like, e.g., synthetic BCs [1] or high order BCs [15,17].
• Zero (Dirichlet): the image outside the FOV is assumed to be null, i.e., zero pixel-valued. By using Zero BCs, the operator K turns out to be a Block Toeplitz with Toeplitz Blocks (BTTB) matrix. We will use the symbol T to denote this class of matrices. Zero BCs can be useful for some applications in astronomy, where an empty dark space surrounds a well-localized object. On the other hand, they give rise to strong ringing effects close to the boundary of the restored image in other classical imaging applications, where the background is not uniformly black.
• Periodic: the image inside the FOV is periodically repeated outside the FOV. By using Periodic BCs the operator K turns out to be a Block Circulant with Circulant Blocks (BCCB) matrix. We will use the symbol C to denote this class of matrices. Periodic BCs are computationally favorable, since the matrix K can be easily diagonalized by the Fast Fourier Transform (FFT), but in the restoration process they may generate artifacts along the boundaries as well.
• Reflective: the image inside the FOV is reflected outside the FOV, so that the continuity of the image is preserved at the boundary. The corresponding blurring matrix K is Block Toeplitz plus Hankel with Toeplitz plus Hankel blocks, and it can be diagonalized by the Discrete Cosine Transform (DCT) when the PSF is quadrantally symmetric; see [27]. We will use the symbol R to denote this class of matrices.
• Anti-Reflective: in the Anti-Reflective BCs model, the image inside the FOV is anti-reflected outside the FOV, so that, besides the continuity of the image at the boundary, also the continuity of the normal derivatives is preserved there. The corresponding blurring matrix K is Block Toeplitz plus Hankel with Toeplitz plus Hankel blocks, plus a low-rank correction, and it can be diagonalized by the Anti-Reflective Transform (ART) when the PSF is symmetric; see [31]. We will use the symbol AR to denote this class of matrices.
See Figure 2 for an illustration of the described BCs. For both Reflective and Anti-Reflective BCs a fast transform is available, but only if the PSF is quadrantally symmetric, i.e., symmetric in both the horizontal and vertical direction. In all these cases, the matrix-vector product can be done in O(m^2 log(m)) by FFT, using a proper padding of the vector in agreement with the imposed BCs and then performing a circular convolution of double size. In some cases, Reflective and Anti-Reflective BCs are even cheaper, since they require only real operations instead of complex ones, without needing any padding; see [2].
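To make the role of the BC-dependent padding concrete, here is a minimal NumPy sketch (our own illustration, not code from the paper) of the matrix-vector product with the blurring matrix: the image is padded according to the chosen BCs, circularly convolved with the PSF by FFT, and cropped back to the FOV. The function name and the restriction to odd-sized, centered PSFs are simplifying assumptions of ours.

```python
import numpy as np

def blur_matvec(image, psf, bc="periodic"):
    """Apply the blurring operator K to `image` for a given boundary
    condition without ever forming K: pad according to the BC, do a
    circular convolution by FFT, and crop back to the field of view.
    Assumes a square image and an odd-sized PSF centered in its array."""
    m = image.shape[0]
    k = psf.shape[0]
    r = k // 2  # PSF radius
    modes = {"zero": "constant", "periodic": "wrap", "reflective": "symmetric"}
    padded = np.pad(image, r, mode=modes[bc])
    # embed the PSF in an array of the padded size, centered at index (0, 0)
    big = np.zeros_like(padded, dtype=float)
    big[:k, :k] = psf
    big = np.roll(big, (-r, -r), axis=(0, 1))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(padded) * np.fft.fft2(big)))
    return blurred[r:r + m, r:r + m]
```

For a normalized PSF, a constant image stays constant under Periodic and Reflective BCs, while Zero BCs darken the border; this is exactly the boundary effect discussed above.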
On the other hand, since equation (2) involves the discretization of a compact operator, K is severely ill-conditioned and may be singular. Such linear systems are commonly referred to as linear discrete ill-posed problems; see, e.g., [22] for a discussion. Therefore a good approximation of f cannot be obtained from the algebraic solution (e.g., the least-squares solution) of (2): regularization methods are required. The basic idea of regularization is to replace the original ill-conditioned problem with a nearby well-conditioned problem, whose solution approximates the true solution. One of the most popular regularization techniques is Tikhonov regularization, which amounts to solving

min_f { ∥Kf − g∥₂² + µ ∥f∥₂² },    (3)

where ∥·∥₂ denotes the vector 2-norm and µ > 0 is a regularization parameter to be chosen. The first term in (3) is usually referred to as the fidelity term and the second as the regularization term. This approach is computationally attractive, since it leads to a linear problem, and indeed several efficient methods have been developed for computing its solution and for estimating µ; see [22]. On the other hand, the edges of the restored image are usually over-smoothed. To overcome this unpleasant property, nonlinear strategies have been employed, like total variation (TV) [30] and thresholding iterative methods [13,21]. In any case, several nonlinear regularization methods have an inner step that applies a least-squares regularization, and hence they can benefit from strategies previously developed for this simpler model. In the present paper, both regularization strategies that we propose share two common ingredients: wavelet decomposition and ℓ1-norm minimization of the regularization term. This is motivated by the fact that most real images have sparse approximations under some wavelet basis. In particular, in this paper we consider the tight frame systems previously used in [9,11,12].
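Under Periodic BCs the Tikhonov problem (3) has a closed form computable with two FFTs, since the blurring matrix is BCCB and hence diagonalized by the Fourier basis. The following sketch is our own illustration of this standard fact (odd-sized, centered PSF assumed; names are ours):

```python
import numpy as np

def tikhonov_periodic(g, psf, mu):
    """Tikhonov restoration under Periodic BCs: K is BCCB, so the solution
    f = (K*K + mu I)^{-1} K* g of (3) is computed spectrally with FFTs."""
    k = psf.shape[0]
    r = k // 2
    big = np.zeros_like(g, dtype=float)
    big[:k, :k] = psf
    big = np.roll(big, (-r, -r), axis=(0, 1))
    u = np.fft.fft2(big)                      # eigenvalues of K
    ghat = np.fft.fft2(g)
    fhat = np.conj(u) * ghat / (np.abs(u) ** 2 + mu)
    return np.real(np.fft.ifft2(fhat))
```

With a delta PSF (identity blur) and µ = 0 the data are recovered exactly, while µ > 0 damps the spectrum, which is the over-smoothing effect mentioned above.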
The redundancy of the tight frame system leads to robust signal representations in which partial loss of the data can be tolerated without adverse effects. In order to obtain a sparse approximation, we minimize the weighted ℓ1-norm of the tight frame coefficients. Let W* be a wavelet or tight-frame synthesis operator (W*W = I); the wavelet or tight-frame coefficients of the original image f are x, such that f = W*x, and the blurring operator becomes A = KW*.
Within this framework, the model equation (2) translates into

g = Ax.    (4)

If we need to deal with positive (semi-)definite matrices, instead of the system (4) we can solve the system of normal equations

A*g = A*Ax,    (5)

where A* is the conjugate transpose of A. This choice allows us to use many iterative methods, such as the iterated version of (3), i.e., the iterated Tikhonov scheme, or the Conjugate Gradient (CG) method and its generalizations. Moreover, all iterative methods, when applied to the normal equations (5), become more stable, i.e., less sensitive to data noise. Unfortunately, in solving (5) instead of (4), the rate of convergence slows down. In this respect, the conventional technique to speed up the convergence is to consider the preconditioned system

DA*g = DA*Ax,

where D is the so-called preconditioner, whose role is to suitably approximate the (generalized) inverse of the normal matrix A*A [28]. In [14] a new technique was proposed that uses a single preconditioning operator directly applied to the system (4). The new preconditioner, called the reblurring matrix P according to the terminology introduced in [19], leads to the new preconditioned system Pg = PAx.
As pointed out in [14], the aim of the preconditioner P is to allow iterative methods to become more stable (as is usually obtained through the normal equations involving A*) without slowing down the convergence (so that no subsequent accelerating operator D is needed), especially in the so-called signal space, i.e., the subspace less sensitive to the data noise. Combining this approach with a soft-thresholding technique, such as the modified linearized Bregman splitting algorithm [33], in order to mimic the ℓ1-norm minimization, leads to reformulating iterative methods such as the Landweber method as the following preconditioned iterative scheme

z_{n+1} = z_n + τ P (g − A x_n),
x_{n+1} = S_µ(z_{n+1}),

where τ is a positive relaxation parameter and S_µ(·) is the soft-thresholding function defined in (7). In the following we fix τ = 1, by applying an implicit rescaling of the preconditioned system matrix PA. The paper is organized as follows: in Section 1 we propose a generalization of an approximated iterative Tikhonov scheme that was first introduced in [18] and then developed and adapted to different settings in [5,10]. Here the preconditioner P takes the form

P = B*(BB* + α_n ΛΛ*)^{-1},

where B is an approximation of A, in the sense that B = CW*, with C the discretization of the same problem (1) as the original blurring matrix K, but imposing Periodic BCs. The operator ΛΛ* can be a function of CC* or the discretization of a differential operator. The method is nonstationary and the parameter α_n is computed by solving a nonlinear problem with a computational cost of O(m^2). Related work on this kind of preconditioner can be found in [6,7,24]. In Section 2 we define a class of preconditioners P endowed with the same structure as the system matrix A, as initially proposed in [16] and then further developed in [3]. We call it the structure preserving reblurring preconditioning strategy, and we combine it with the generalized regularization filtering approach of Section 1.
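A toy dense-matrix sketch of the two-sequence iteration above may help fix ideas. We take the simplest choice P = A^T (plain Landweber direction, no structured preconditioning) and τ = 1; all names are ours, and this is only an illustration of the iteration pattern, not the paper's full method:

```python
import numpy as np

def soft_threshold(z, mu):
    # componentwise soft-thresholding S_mu
    return np.sign(z) * np.maximum(np.abs(z) - mu, 0.0)

def preconditioned_ista(A, g, mu, tau=1.0, iters=500):
    """Two-sequence iteration: z_{n+1} = z_n + tau * P (g - A x_n),
    x_{n+1} = S_mu(z_{n+1}), sketched here with P = A.T."""
    z = np.zeros(A.shape[1])
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = z + tau * A.T @ (g - A @ x)
        x = soft_threshold(z, mu)
    return x
```

At a fixed point A^T(g − Ax) = 0, so for invertible A the iteration reproduces the noise-free solution while the thresholding promotes sparsity of the coefficients along the way.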
The idea is to preserve both the information carried by the spectrum of the operator A and the structure of the operator induced by the best-fitting BCs. Section 3 contains a selection of significant numerical examples which confirm the robustness and quality of the proposed regularization schemes. Section 4 summarizes the techniques presented in this work and draws some conclusions. Finally, Appendix A provides proofs of the convergence and regularization properties of the proposed algorithms.

Preliminary definitions
Before proceeding further, let us introduce some definitions and notations that will be used in the forthcoming sections. We consider A to be the discretization of a compact linear operator, and the Euclidean 2-norm ∥·∥ to be the norm induced by the standard Euclidean inner product ⟨·,·⟩. Hereafter, we will specify the vector space on which the inner product ⟨·,·⟩ acts only whenever it is necessary for disambiguation. The analysis in the following sections will be performed, in general, on perturbed data g^δ, namely g^δ = g + η, where η is a noise vector such that ∥η∥ = δ; δ is called the noise level.
Let B be the discretization of a compact linear operator that approximates A, in a sense that will be specified later. Let us introduce the following matrix norm: given a generic linear operator L, and denoting by ∥·∥_∞ the sup norm, we define

|||L||| := sup_{x ≠ 0} ∥Lx∥ / ∥x∥_∞.    (6)

Finally, let µ ≥ 0 and let S_µ : R^s → R^s be the componentwise soft-thresholding function

[S_µ(x)]_i := sign(x_i) max{|x_i| − µ, 0},  i = 1, …, s.    (7)

General regularization operator as h(CC * )
Let h : [0, ∥CC*∥²] → R be a continuous function such that

0 < c₁ ≤ h(σ²) ≤ c₂  for every σ² ∈ [0, ∥CC*∥²].

Define c := c₁/c₂. We can now introduce the following algorithm.
Fix ρ ∈ (0, c/2) and q ∈ (2ρ, c); for n = 0, 1, …, while ∥r_n∥ > τδ, with r_n = g^δ − A x_n^δ and q_n := max{q, 2ρ + (1 + ρ)δ/∥r_n∥}:

Compute α_n such that ∥α_n (CC* + α_n h(CC*))^{-1} r_n∥ = q_n ∥r_n∥;    (8)
Compute z_{n+1}^δ = z_n^δ + B*(CC* + α_n h(CC*))^{-1} r_n;    (9)
Compute x_{n+1}^δ = S_µ(z_{n+1}^δ).

A rigorous and fully detailed analysis of the preceding algorithm will be performed in Appendix A. In order to prove all the desired properties we will need a couple of assumptions, (10a) and (10b), on the operators K, C and on the parameter µ: the first bounds the approximation error between C and K, the second constrains the soft-thresholding parameter µ; both involve a fixed 0 < ρ < c/2, where δ = ∥η∥ is the noise level and |||·||| is the operator norm defined in (6).
Let us shed some light on the preceding conditions. Assumption (10a), or equivalently (11), is a strong one: it may be hard to satisfy for every specific problem, as it implies that K and C are spectrally equivalent. Nevertheless, in image deblurring the boundary conditions have a very local effect, i.e., the approximation error C − K can be decomposed as

C − K = E + R,

where E is a matrix of small norm (the zero matrix if the PSF is compactly supported) and R is a matrix of small rank, compared to the dimension of the problem. This suggests that Assumption (10a) needs to be satisfied only on a relatively small subspace, supposedly a subspace of zero measure; in particular, only for every e_n^δ, with n ≥ N and N fixed, such that Proposition 4 can hold. All the numerical experiments are consistent with this observation, but for a deeper understanding and a full treatment of this aspect we refer the reader to [18, Section 4].
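The decomposition C − K = E + R can already be seen in one dimension: for a 3-tap PSF, the circulant matrix (Periodic BCs) and the banded Toeplitz matrix (Zero BCs) differ only in a few entries near the corners, i.e., by a small-rank correction. A small illustrative script (notation and names our own):

```python
import numpy as np

def toeplitz_and_circulant(w, m):
    """1D analogue of the BC correction: build the banded Toeplitz blur
    matrix T (Zero BCs) and the circulant C (Periodic BCs) for a centered
    3-tap PSF w = [w[-1], w[0], w[1]]. Their difference C - T is nonzero
    only at the corners: a small-rank correction."""
    T = np.zeros((m, m))
    C = np.zeros((m, m))
    for i in range(m):
        for weight, off in zip(w, (-1, 0, 1)):
            j = i + off
            if 0 <= j < m:
                T[i, j] = weight      # entries falling outside are dropped
            C[i, j % m] = weight      # entries wrap around periodically
    return T, C
```

For m = 6 and w = [1/4, 1/2, 1/4], the difference C − T has exactly two nonzero entries, at the upper-right and lower-left corners, and rank 2.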
On the other hand, Assumption (10b) is quite natural. It is indeed equivalent to requiring that ∥B(u − S_µ(u))∥ ≤ ρδ, that is, the soft-thresholding parameter µ = µ(δ) depends continuously on the noise and satisfies µ(δ) → 0 as δ → 0.

General regularization operator as ΛΛ *
In image deblurring, in order to better preserve the edges of the reconstructed solution, a differential operator ΛΛ* is usually introduced, where Λ : X → Y is chosen as a first or second order differential operator whose kernel contains all those functions possessing the key features of the true solution that we wish to preserve. In particular, since we are interested in recovering the edges and curves of discontinuities of the true image, it is a common choice to rely on the Laplace operator with Neumann BCs; see [20]. In the recent papers [4,26], observing the spectral distribution of the Laplacian, it was proposed to substitute ΛΛ* with a suitable function of CC*. Adding some new assumptions, we propose a modified version of the preceding Algorithm 1 that takes the operator Λ into account directly.
For n = 0, 1, …, with r_n = g^δ − A x_n^δ:

Compute α_n such that ∥α_n (CC* + α_n ΛΛ*)^{-1} r_n∥ = q_n ∥r_n∥;
Compute z_{n+1}^δ = z_n^δ + B*(CC* + α_n ΛΛ*)^{-1} r_n;
Compute x_{n+1}^δ = S_µ(z_{n+1}^δ).

We skip the proofs of convergence, since they can be easily recovered by adapting the proofs in Appendix A with [5, Section 4].
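The nonlinear equation defining α_n is solved in the spectral domain by the Newton iteration analyzed in Lemma 3 (Appendix A). Below is a sketch, with names of our own, in the variable γ = 1/α and with the identity regularizer h ≡ 1; s2 holds the squared singular values of C and r2 the squared spectral coefficients of the residual r_n:

```python
import numpy as np

def compute_alpha(s2, r2, q, tol=1e-12, max_iter=50):
    """Solve || alpha (CC* + alpha I)^{-1} r || = q ||r|| for alpha by a
    Newton iteration in gamma = 1/alpha, where the functional
    Psi(gamma) = sum_i (gamma * s2_i + 1)^{-2} r2_i is decreasing and
    convex, so Newton started at gamma = 0 converges monotonically."""
    target = q ** 2 * r2.sum()
    gamma = 0.0
    for _ in range(max_iter):
        w = 1.0 / (gamma * s2 + 1.0)
        psi = (w ** 2 * r2).sum()            # Psi(gamma)
        if abs(psi - target) < tol * target:
            break
        dpsi = (-2.0 * s2 * w ** 3 * r2).sum()   # Psi'(gamma) < 0
        gamma -= (psi - target) / dpsi
    return 1.0 / gamma
```

Every iteration costs O(m^2) once the spectral coefficients are available, which is the cost quoted in the Introduction for the computation of α_n.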

Structured PISTA with general regularizing operator
The structured case is a generalization of the methods developed in [3,16], merging those ideas with the general approach described in Section 1. We skip some details, since they can be easily recovered from the aforementioned papers.
The construction of the blurring matrix K is based on two ingredients: the PSF and the BCs enforced in the discretization. As already sketched in the Introduction, the latter choice gives rise to different types of structured matrices. For notational simplicity we consider a square PSF H_κ ∈ R^{k×k} and we suppose that the position of the PSF center is known.
Given the pixels κ_{i,j} of the PSF, it is possible to associate to it the so-called generating function κ : R² → C defined as

κ(x₁, x₂) = Σ_{i,j=−k}^{k} κ_{i,j} e^{î(i x₁ + j x₂)},    (15)

where î² = −1 and with the convention that κ_{i,j} = 0 if the element κ_{i,j} does not belong to H_κ [17]. Note that the κ_{i,j} are the Fourier coefficients of κ ∈ span{e^{î(i x₁ + j x₂)}, i, j = −k, …, k}, so that the generating function κ contains the same information as H_κ.
Summarizing the notation set in the Introduction about the BCs, we have:

Zero BCs: K = T_m(κ);
Periodic BCs: K = T_m(κ) + B_m^C(κ) = C_m(κ);
Reflective BCs: K = T_m(κ) + B_m^R(κ) = R_m(κ);
Anti-Reflective BCs: K = T_m(κ) + B_m^{AR}(κ) = AR_m(κ).

We notice that in all four cases K has a Toeplitz structure T_m(κ), which depends on κ and is given by the shift-invariant structure of the continuous operator, plus a correction term B_m^X(κ), X = C, R, AR, depending on the chosen BCs.
In conclusion, we employ the unified notation K = M_m(κ), where M(·) can be any of the classes of matrices just introduced (i.e., T, C, R, AR). This notation highlights the two crucial ingredients that form K: the blurring phenomenon associated with the PSF, described by κ, and the imposed BCs, represented by M.
Given the generating function κ in (15) associated to the PSF H_κ, let us compute the eigenvalues u_{i,j}, i, j = 0, …, m − 1, of the corresponding BCCB matrix C_m(κ) := C by means of a 2D FFT. Fix a regularizing (differential) operator ΛΛ* as in Section 1, and suppose that Assumptions 1 and 2 hold. The differential operator can also be of the form ΛΛ* = h(CC*), as in Algorithm 1. Let now

v_{i,j} := u_{i,j} / (|u_{i,j}|² + α_n |σ_{i,j}|²),  i, j = 0, …, m − 1,    (16)

be the new eigenvalues after the application of the Tikhonov filter to the u_{i,j}, where the σ_{i,j} are the eigenvalues (singular values) of Λ and α_n is computed as in Algorithms 1 and 2. Let us now compute the coefficients κ̂_{i,j} of κ̂ by means of a 2D inverse FFT of the {v_{i,j}} and, finally, let us define

P := M_m(κ̂),    (17)

where M(·) corresponds to the most fitting BCs for the model problem (1). We are now ready to formulate the last algorithm.
In the case where ΛΛ* = h(CC*), the algorithm is modified in the following way: ρ ∈ (0, c/2), q ∈ (2ρ, c), and α_n is such that

∥α_n (CC* + α_n ΛΛ*)^{-1} r_n∥ = (q_n/c₁) ∥r_n∥,

where 0 < c₁ ≤ h(σ²) ≤ c₂ and c := c₁/c₂. We will denote this version by Struct-PISTA_h. We will not provide a direct proof of convergence for this last algorithm; let us just observe that the difference between (17) and (14) and (9) is a correction of small rank and small norm.
Compute v_{i,j} = u_{i,j} / (|u_{i,j}|² + α_n |σ_{i,j}|²).
Get the mask Ĥ of the coefficients κ̂_{i,j} of κ̂ by computing an IFFT of the {v_{i,j}}_{i,j=0}^{m−1} in (16).
Generate the matrix P := M_m(κ̂) from the coefficient mask Ĥ and the BCs.
Compute z_{n+1}^δ = z_n^δ + P r_n.
Compute x_{n+1}^δ = S_µ(z_{n+1}^δ).
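The eigenvalue filtering and the inverse transform in the first two steps can be sketched as follows (names are ours; the identity regularizer σ_{i,j} ≡ 1 is assumed, and for brevity the resulting operator is applied with Periodic BCs, whereas the structured variant would build M_m(κ̂) from the mask with the best-fitting BCs):

```python
import numpy as np

def precond_apply_periodic(r, psf, alpha):
    """Filter the BCCB eigenvalues u of the PSF with the Tikhonov filter
    v = conj(u) / (|u|^2 + alpha) and apply the resulting operator to the
    residual r. The inverse FFT of v gives the coefficient mask of
    kappa-hat, computed here to mirror the steps of the algorithm."""
    m = r.shape[0]
    k = psf.shape[0]
    rad = k // 2
    big = np.zeros((m, m))
    big[:k, :k] = psf
    big = np.roll(big, (-rad, -rad), axis=(0, 1))
    u = np.fft.fft2(big)                    # eigenvalues u_ij of C_m(kappa)
    v = np.conj(u) / (np.abs(u) ** 2 + alpha)
    mask = np.real(np.fft.ifft2(v))         # coefficient mask of kappa-hat
    # Periodic application: P r = F^{-1} diag(v) F r
    return np.real(np.fft.ifft2(v * np.fft.fft2(r)))
```

For a delta PSF the eigenvalues are all 1, so the preconditioner reduces to a scalar damping r / (1 + α), which is a quick sanity check on the filter.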

Numerical experiments
We now compare the proposed algorithms with some methods from the literature. In particular, we consider the AIT-GP algorithm described in [5] and the ISTA algorithm described in [13]. The AIT-GP method can be seen as Algorithm 2 with µ = 0, while the ISTA algorithm is equivalent to the iterations of Algorithm 2 without the preconditioner. These comparisons allow us to show how the quality of the reconstructed solution is improved by the presence of both the soft-thresholding and the preconditioner. The ISTA method and our proposals require the selection of a regularization parameter. For all these methods we select the parameter that minimizes the relative restoration error (RRE), defined by

RRE(f) = ∥f − f_true∥₂ / ∥f_true∥₂.

For the comparison of the algorithms we consider the Peak Signal to Noise Ratio (PSNR), defined by

PSNR(f) = 20 log₁₀ ( mM / ∥f − f_true∥₂ ),

where m² is the number of elements of f and M denotes the maximum value achievable by f_true. Moreover, we consider the Structural SIMilarity index (SSIM). The definition of the SSIM is involved; here we only recall that this index measures how accurately the computed approximation reconstructs the overall structure of the image. The higher the value of the SSIM, the better the reconstruction, and the maximum value achievable is 1; see [32] for a precise definition.

We now describe how we construct the operator W. We use the tight frames determined by linear B-splines; see, e.g., [8]. For one-dimensional problems they are composed of a low-pass filter W₀ ∈ R^{m×m} and two high-pass filters W₁ ∈ R^{m×m} and W₂ ∈ R^{m×m}. These filters are determined by the masks

w⁽⁰⁾ = (1/4) [1, 2, 1],  w⁽¹⁾ = (√2/4) [1, 0, −1],  w⁽²⁾ = (1/4) [−1, 2, −1].

Imposing reflective boundary conditions, suitable boundary-corrected matrices are defined so that the resulting analysis operator W satisfies W*W = I. To construct the two-dimensional framelet analysis operator we use the tensor products

W_{ij} = W_i ⊗ W_j,  i, j = 0, 1, 2.

The matrix W₀₀ is a low-pass filter; all the other matrices W_{ij} contain at least one high-pass filter.
The analysis operator W is then obtained by stacking the matrices W_{ij}, i, j = 0, 1, 2. In PISTA_h we set h following the choice proposed in [26]. All the computations are performed in MATLAB R2018b running on a laptop with an Intel i7-8750H @2.20 GHz CPU and 16GB of RAM.
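For a quick check of the tight-frame property W*W = I with the linear B-spline masks above, here is a 1D sketch of ours; it uses periodic extension instead of the reflective extension of the paper, which keeps the code short while preserving the perfect-reconstruction identity:

```python
import numpy as np

MASKS = (np.array([1, 2, 1]) / 4,                 # low-pass w0
         np.sqrt(2) / 4 * np.array([1, 0, -1]),   # high-pass w1
         np.array([-1, 2, -1]) / 4)               # high-pass w2

def framelet_analysis(f):
    """1D linear B-spline framelet analysis (periodic extension):
    circular correlation of f with each centered 3-tap mask."""
    return [w[0] * np.roll(f, 1) + w[1] * f + w[2] * np.roll(f, -1)
            for w in MASKS]

def framelet_synthesis(bands):
    """Adjoint of the analysis above; W*W = I means synthesis(analysis(f))
    reproduces f exactly."""
    f = np.zeros_like(bands[0])
    for w, b in zip(MASKS, bands):
        f += w[0] * np.roll(b, -1) + w[1] * b + w[2] * np.roll(b, 1)
    return f
```

The identity holds because the masks satisfy |ŵ₀(ω)|² + |ŵ₁(ω)|² + |ŵ₂(ω)|² = 1 for every frequency ω, which is the tight-frame condition.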
Cameraman. We first consider the cameraman image in Figure 3(a) and we blur it with the non-symmetric PSF in Figure 3(b). We then add 2% white Gaussian noise, obtaining the blurred and noisy image in Figure 3(c). Note that we crop the boundaries of the image to simulate real data; see [23] for more details. Since the image is generic, we impose Reflective BCs.
In Table 3 we report the results obtained with the different methods. We can observe that Struct-PISTA_h provides the best reconstruction among all considered algorithms. Moreover, we can observe that, in general, the introduction of the structured preconditioner improves the quality of the reconstructed solutions, especially in terms of SSIM. From the visual inspection of the reconstructions in Figure 4, we can observe that the introduction of the structured preconditioner evidently reduces the boundary artifacts and avoids the amplification of the noise.
Grain. We now consider the grain image in Figure 5(a) and blur it with the PSF in Figure 5(b), obtained by the superposition of two motion PSFs. After adding 3% white Gaussian noise and cropping the boundaries, we obtain the blurred and noisy image in Figure 5(c). According to the nature of the image, we use Reflective BCs.
Again in Table 3 we report all the results obtained with the considered methods. In this case ISTA provides the best reconstruction in terms of RRE and PSNR. However, Struct-PISTA_h provides the best reconstruction in terms of SSIM and very similar results in terms of PSNR and RRE. In Figure 6 we report some of the reconstructed solutions. From the visual inspection of these reconstructions, we can see that the introduction of the structured preconditioner reduces the ringing and boundary effects in the computed solutions.

Satellite. Our final example is atmosphericBlur30 from the MATLAB toolbox RestoreTools [29]. The true image, PSF, and blurred and noisy image are reported in Figures 7(a), (b), and (c), respectively. Since we know the true image, we can estimate the noise level, which is approximately 1%. Since this is an astronomical image, we impose Zero BCs.
From the comparison of the computed results in Table 3, we can see that the Struct-PISTA_h method provides the best reconstruction among all considered methods. We can observe that, in this particular example, ISTA provides a very low quality reconstruction both in terms of RRE and SSIM. We report some reconstructions in Figure 8. From the visual inspection of the computed solutions, we can observe that the approximations obtained with PISTA_h and Struct-PISTA_h do not present heavy ringing effects, while the reconstruction obtained by AIT-GP presents very heavy ringing around the "arms" of the satellite. This shows the benefit of introducing the soft-thresholding into the AIT-GP method.

Conclusions
This work develops further and brings together the techniques studied in [3,5,10,16,25]. The idea is to combine thresholding iterative methods, an approximate Tikhonov regularization scheme depending on a general (differential) operator, and a structure preserving approach, with the main goal of reducing the boundary artifacts which appear in the deblurred image when artificial boundary conditions are imposed. The numerical results are promising and show improvements with respect to known state-of-the-art deblurring algorithms. There are still open problems, mainly concerning the theoretical assumptions and convergence proofs, which will be further investigated in future works.

A Proofs
Hereafter we analyze Algorithm 1, aiming to prove its convergence. Most of the following results are a collection of revised results that appeared in [5,10,18]. Since the proofs are very technical, we present a full treatment, omitting no details, in order to make this paper self-contained and easily readable. Following up on Section 1.1, we need to set some more notation. Let us consider the singular value decomposition (SVD) of C as the triple (U, V, Σ) such that

C = U Σ V*,  U, V ∈ O(m², R),

where O(m², R) is the orthonormal group and V* is the adjoint of the operator V, i.e., ⟨V f₁, f₂⟩ = ⟨f₁, V* f₂⟩ for every pair f₁, f₂ ∈ R^{m²}. We will indicate the spectrum of CC* by σ(CC*). Hereafter, without loss of generality, we will assume that ∥C∥ = 1 and ∥h(CC*)∥ = max_{σ² ∈ [0,1]} h(σ²) = 1.
The first issue we have to consider is the existence of the sequence {α n }.

Lemma 3.
Let ∥r_n∥ > τδ. Then for every fixed n there exists α_n satisfying (8). It can be computed by the iteration

γ_n^{k+1} = γ_n^k − (Ψ(γ_n^k) − q_n² ∥r_n∥²) / Ψ′(γ_n^k),  α_n^k = 1/γ_n^k,    (18)

where Ψ(γ) := Φ(1/γ) and Φ(α) := ∥α(CC* + α h(CC*))^{-1} r_n∥². The convergence is locally quadratic. The existence of the regularization parameter α_n and the locally quadratic convergence of the above iteration are independent of, and uniform with respect to, the dimension m².
Proof. The existence of α is an easy consequence of the monotonicity of φ_α(σ²) = α(σ² + α h(σ²))^{-1} with respect to α. Indeed, let us rewrite (8) as

Φ(α) = ∫_{σ(C)} φ_α(σ²)² d r_n(σ) = q_n² ∥r_n∥²,    (19)

where d r_n(·) is the discrete spectral measure associated to r_n with respect to the SVD of C, and the σ ∈ σ(C) are the singular values of C. Since dφ_α/dα > 0 for every α ≥ 0, by monotone convergence it holds that

lim_{α→∞} Φ(α) = ∥h(CC*)^{-1} r_n∥² ≥ ∥r_n∥² > q_n² ∥r_n∥²,

and indeed it is not difficult to prove that q_n/c₁ < 1 whenever ρ ∈ (0, c₁/2) and ∥r_n∥ > τδ, as assumed in the hypothesis. Since for α = 0 the left-hand side of (19) is zero, we conclude that there exists a unique α_n > 0 such that equality holds in (8). Due to the generality of our argument, and to the fact that we could pass the limit under the integral sign, the existence of such an α_n is guaranteed uniformly with respect to the dimension m². Fixing γ = α^{-1}, let us now define the function

ψ_γ(σ²) := (γσ² + h(σ²))^{-1},    (20)

so that φ_α(σ²) = ψ_{1/α}(σ²). Since 0 < c₁ ≤ h(σ²) ≤ c₂ and σ² ∈ [0, 1], there exist two constants d₁, d₂, independent of γ, such that

ψ_γ(σ²)² ≤ d₁,    (21)
|∂ψ_γ(σ²)²/∂γ| ≤ d₂,    (22)

and in particular d₁, d₂ ∈ L¹([0, 1], d r_n) for every n and m. Therefore, if we define Ψ(γ) := ∥ψ_γ(CC*) r_n∥², differentiating under the integral sign yields

Ψ′(γ) = −2 ∫_{σ(C)} σ² (γσ² + h(σ²))^{-3} d r_n(σ) ≤ 0,
Ψ″(γ) = 6 ∫_{σ(C)} σ⁴ (γσ² + h(σ²))^{-4} d r_n(σ) ≥ 0,

so that Ψ(γ) is a decreasing convex function of γ. The Newton iteration applied to Ψ(γ) = q_n² ∥r_n∥² then yields the iteration (18). Since γ_n = lim_{k→∞} γ_n^{k+1} = 1/α_n, we obviously have Ψ(γ_n) = q_n² ∥r_n∥². If Ψ′(γ_n) = 0, then necessarily CC* r_n = 0; hence (γ_n CC* + h(CC*))^{-1} r_n = h(CC*)^{-1} r_n and, consequently,

Ψ(γ_n) = ∥h(CC*)^{-1} r_n∥².    (23)
From (23) we would deduce that q_n ≥ c₁, which is absurd since, as already observed above, q_n < c₁ if ∥r_n∥ > τδ. Therefore Ψ′(γ_n) ≠ 0 and, by standard properties of the Newton iteration, γ_n^k converges to the root γ_n from below and the convergence is locally quadratic. Finally, defining Φ(α) = Ψ(1/α), we get (18): α_n^k converges monotonically from above to α_n and the convergence is locally quadratic. Again, thanks to (21) and (22), the rate of convergence is uniform with respect to the dimension of Y.
From now on, instead of working with the error e_n^δ = x − x_n^δ, in order to simplify the following proofs and notation, it is useful to consider the partial error with respect to z_n^δ, namely

ẽ_n^δ := x − z_n^δ.

This will not affect the generality of our proofs, thanks to the continuity of S_µ(·) with respect to the noise level δ.
Combining the preceding proposition with (8), we can show that the sequence {∥ẽ_n^δ∥} is monotonically decreasing. We have the following result.

Corollary 6.
Under the assumptions (10), there holds

Σ_{n=0}^{n_δ − 1} ∥r_n^δ∥² ≤ c ∥ẽ_0^δ∥²,

for some constant c > 0 depending only on ρ and q in (8).
Proof. The inequality follows by summing the quantities in (26) from n = 0 up to n = n_δ − 1.
Finally, we are ready to prove a convergence and regularity result.
Theorem 7. Assume that z₀ is not a solution of the linear system

g = AW* x,    (29)

and let {δ_k} be a sequence of positive real numbers such that δ_k → 0 as k → ∞. Then, if Assumption 1 is valid, the sequence {x_{n(δ_k)}^{δ_k}}_{k∈N} generated with the discrepancy principle rule (28) converges, as k → ∞, to the solution of (29) which is closest to z₀ in the Euclidean norm.
Proof. We will show convergence of the sequence {z_{n(δ_k)}^{δ_k}}_{k∈N}; the thesis then follows easily from the continuity of S_{µ(δ)}. The proof of convergence for {z_{n(δ_k)}^{δ_k}} can be divided into two steps. In the first step, we show convergence in the noise-free case δ = 0; in particular, the sequence {z_n} converges to the solution of (29) closest to z₀. In the second step, we show that, given a sequence of positive real numbers δ_k → 0 as k → ∞, the corresponding sequence {z_{n(δ_k)}^{δ_k}} converges as k → ∞.
Step 1: Fix δ = 0. It follows that r_n^δ = r_n, and the sequence {z_n} does not stop, i.e., n → ∞, since the discrepancy principle is never satisfied by any n; in particular, n_δ → ∞ as δ → 0. Set n > l > j, with n, l, j ∈ N. It holds that

∥z_n − z_l∥² = ∥ẽ_n − ẽ_l∥²
             = ∥ẽ_n∥² − ∥ẽ_l∥² − 2⟨ẽ_l, ẽ_n − ẽ_l⟩
             = ∥ẽ_n∥² − ∥ẽ_l∥² + 2⟨ẽ_l, z_n − z_l⟩,

where the last term can be bounded by means of (12). At the same time, z_n^{δ_k} → z_n as k → ∞ for every fixed n, due to the continuity of the map g ↦ z_n for every fixed n. Therefore, let us choose k̄ = k̄(ε) large enough such that δ_k < δ̄ and n(δ_k) > n̄ for every k > k̄. Such a k̄ does exist, since δ_k → 0 and n_δ → ∞ as δ → 0. Hence, for every k > k̄, the claim follows from a chain of estimates in which the first inequality comes from Proposition 5 and the last one from (32) and (33).