A note on the convergence of multigrid methods for the Riesz-space equation and an application to image deblurring

In the past decades, a remarkable amount of research has been carried out regarding fast solvers for large linear systems resulting from various discretizations of fractional differential equations (FDEs). In the current work, we focus on multigrid methods for a Riesz-space FDE, for which the theoretical convergence analysis is currently limited to the two-grid method. Here we provide a detailed theoretical convergence study in the case of the V-cycle and W-cycle. Moreover, we discuss the combination with a band approximation and we compare the results with both $\tau$ and circulant preconditionings. The numerical tests include 2D problems as well as the extension to the case of a Riesz-FDE with variable coefficients. Finally, we apply the best-performing method to an image deblurring problem with Tikhonov regularization.


Introduction
In the present work, we are interested in the fast numerical solution of special large linear systems stemming from the approximation of a Riesz fractional diffusion equation (RFDE). It is known that, using standard finite difference schemes on an equispaced grid, in the one-dimensional case we end up with a dense real symmetric Toeplitz linear system, whose dense structure is inherited from the non-locality of the underlying symmetric operator. We study a one-dimensional RFDE discretized with a numerical scheme of precision order one, mainly for simplicity of presentation. Nevertheless, the same analysis can be carried out either when using higher-order discretization schemes [1,2] or when dealing with more complex higher-dimensional fractional models [3,4].
While the matrix-vector product involving dense Toeplitz matrices has a cost of $O(M \log M)$ with a moderate constant in the "big O", $M$ being the matrix size, the solution of the linear system can be of higher complexity in the presence of ill-conditioning. In connection with the latter issue, it is worth recalling that the Toeplitz structure carries a lot of spectral information (see [5,6] and references therein). As shown in [7,2], the Euclidean conditioning of the coefficient matrices grows exactly as $M^{\alpha}$. Therefore, the classical conjugate gradient (CG) method would need a number of iterations exactly proportional to $M^{\alpha/2}$ to converge within a given precision.
In this perspective, a large number of preconditioning strategies for the preconditioned CG (PCG) have been proposed. One possibility is given by band approximations of the coefficient matrices which, as discussed in [7,8], do not ensure convergence within a number of iterations independent of $M$, but still prove to be efficient for some values of the fractional order, with a cost that remains linear in the matrix size. As an alternative, the circulant proposal given in [9] has a cost proportional to that of the matrix-vector product and ensures convergence in the presence of 1D problems that do not include variable coefficients. Another algebraic preconditioner proven to be optimal is the $\tau$ preconditioning explored in [10,11]. In a multi-dimensional setting, the multilevel circulants are not well-suited [12], while the $\tau$ preconditioning still shows optimal behavior. For other effective preconditioners, we refer the reader to [13], where the proposed preconditioner is based on a rational approximation of the Riesz operator.
Other approaches to the efficient solution of the ill-conditioned linear systems we are interested in include multigrid methods. Optimal multigrid proposals for fractional operators are given e.g. in [14,15,16,8]. However, a theoretical convergence analysis of such multigrids is currently limited to the two-grid method. Here we provide a detailed theoretical convergence study in the case of the V-cycle and W-cycle. Moreover, since multigrid methods are often applied as preconditioners, and often to an approximation of the coefficient matrix instead of the coefficient matrix itself, along the same lines as [8] we discuss a band approximation to be treated with a Galerkin approach, which guarantees a linear cost in $M$ instead of $O(M \log M)$ and inherits the V-cycle optimality. We compare the resulting methods with both $\tau$ and circulant preconditioning and apply the best-performing strategy to an image deblurring problem with Tikhonov regularization.
The paper is organized as follows. In Section 2 we present the problem, the chosen numerical approximation, and the resulting linear systems. The main spectral features of the considered coefficient matrices are also recalled. In Section 3 we describe the essentials of multigrid methods and give the description of our proposal and its analysis, while in Section 4 we discuss its use applied to a band approximation of the coefficient matrix. Section 5 contains a wide set of numerical experiments that compare the multigrid with state-of-the-art preconditioners and include 2D problems as well as the extension to the case of an RFDE with variable coefficients. Section 6 is devoted to applying the best preconditioning strategy to an image deblurring problem. Future research directions and conclusions are given in Section 7.

Preliminaries
The current section briefly introduces the continuous problem, its numerical approximation scheme, and the resulting linear systems. The main spectral features of the considered coefficient matrices are recalled as well.
As anticipated in the introduction, we consider the following one-dimensional Riesz fractional diffusion equation (1D-RFDE):

$$-d\,\frac{\partial^{\alpha}u(x)}{\partial|x|^{\alpha}} = m(x), \quad x\in\Omega=(a,b), \qquad (2.1)$$

with boundary conditions

$$u(x) = 0, \quad x\in\partial\Omega, \qquad (2.2)$$

where $d>0$ is the diffusion coefficient, $m(x)$ is the source term, and $\frac{\partial^{\alpha}u(x)}{\partial|x|^{\alpha}}$ is the 1D-Riesz fractional derivative for $\alpha\in(1,2)$, defined as

$$\frac{\partial^{\alpha}u(x)}{\partial|x|^{\alpha}} = c(\alpha)\left({}_{a}D_{x}^{\alpha}u(x) + {}_{x}D_{b}^{\alpha}u(x)\right), \qquad (2.3)$$

with $c(\alpha) = -\frac{1}{2\cos\frac{\alpha\pi}{2}} > 0$, while ${}_{a}D_{x}^{\alpha}u(x)$ and ${}_{x}D_{b}^{\alpha}u(x)$ are the left and right Riemann-Liouville fractional derivatives. For the approximation of the problem, we define the uniform space partition on the interval $[a,b]$, that is

$$x_i = a + ih, \quad h = \frac{b-a}{M+1}, \quad i = 0,\dots,M+1. \qquad (2.4)$$

Here the left and right fractional derivatives in (2.3) are numerically approximated by the shifted Grünwald difference formulas

$${}_{a}D_{x}^{\alpha}u(x_i) \approx \frac{1}{h^{\alpha}}\sum_{k=0}^{i+1} g^{(\alpha)}_{k}u(x_{i-k+1}), \qquad {}_{x}D_{b}^{\alpha}u(x_i) \approx \frac{1}{h^{\alpha}}\sum_{k=0}^{M-i+2} g^{(\alpha)}_{k}u(x_{i+k-1}), \qquad (2.5)$$

with $g^{(\alpha)}_{k}$ being the alternating fractional binomial coefficients, whose expression is

$$g^{(\alpha)}_{k} = (-1)^{k}\binom{\alpha}{k} = \frac{(-1)^{k}}{k!}\,\alpha(\alpha-1)\cdots(\alpha-k+1). \qquad (2.6)$$

By employing the approximation in (2.5) in equation (2.1) and using (2.3) and (2.6), we obtain the required scheme as follows:

$$-\frac{d\,c(\alpha)}{h^{\alpha}}\left(\sum_{k=0}^{i+1} g^{(\alpha)}_{k}u_{i-k+1} + \sum_{k=0}^{M-i+2} g^{(\alpha)}_{k}u_{i+k-1}\right) = m_i, \quad i = 1,\dots,M. \qquad (2.7)$$

By defining $u = [u_1, u_2, \dots, u_M]^T$ and $m = [m_1, m_2, \dots, m_M]^T$, the resulting approximation equations (2.7) can be written in matrix form as

$$\frac{d\,c(\alpha)}{h^{\alpha}}\,A^{\alpha}_{M}u = m, \qquad (2.8)$$

with

$$A^{\alpha}_{M} = -\left(G_{\alpha} + G_{\alpha}^{T}\right), \qquad (2.9)$$

and $G_{\alpha} = \left[g^{(\alpha)}_{i-j+1}\right]_{i,j=1}^{M}$ (with the convention $g^{(\alpha)}_{k} = 0$ for $k<0$) being a lower Hessenberg Toeplitz matrix. The fractional binomial coefficients $g^{(\alpha)}_{k}$ satisfy a few properties summarised in the following Lemma 2.1 (see [3,17,18]).

Lemma 2.1. For $1<\alpha<2$, the coefficients $g^{(\alpha)}_{k}$ satisfy

$$g^{(\alpha)}_{0} = 1, \quad g^{(\alpha)}_{1} = -\alpha < 0, \quad 1 \ge g^{(\alpha)}_{2} \ge g^{(\alpha)}_{3} \ge \cdots \ge 0, \quad \sum_{k=0}^{\infty} g^{(\alpha)}_{k} = 0, \quad \sum_{k=0,\,k\neq 1}^{\infty} g^{(\alpha)}_{k} = \alpha. \qquad (2.10)$$

Using Lemma 2.1, it has been proven in [18] that the coefficient matrix $A^{\alpha}_{M}$ is strictly diagonally dominant and hence invertible. To determine its generating function and study its spectral distribution, tools on Toeplitz matrices have been used (see [7]). A short review of these results follows.

Definition 1 ([5]). Given $f\in L^{1}(-\pi,\pi)$ and its Fourier coefficients

$$a_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,e^{-\iota k x}\,dx, \quad k\in\mathbb{Z}, \qquad (2.11)$$

we define $T_M(f) = \left[a_{i-j}\right]_{i,j=1}^{M} \in\mathbb{C}^{M\times M}$ the Toeplitz matrix generated by $f$. In addition, the Toeplitz sequence $\{T_M(f)\}_{M\in\mathbb{N}}$ is called the family of Toeplitz matrices generated by $f$.
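The construction above can be checked numerically. The following sketch (a minimal illustration, assuming the symmetrized form $A^{\alpha}_{M}=-(G_{\alpha}+G_{\alpha}^{T})$ reported in (2.9) and the coefficient recurrence $g^{(\alpha)}_{k} = g^{(\alpha)}_{k-1}\bigl(1-\frac{\alpha+1}{k}\bigr)$, which follows from (2.6)) builds $A^{\alpha}_{M}$ and verifies the properties listed in Lemma 2.1 together with strict diagonal dominance:

```python
import numpy as np
from scipy.linalg import toeplitz

def g_coeffs(alpha, n):
    """Alternating fractional binomial coefficients g_k = (-1)^k binom(alpha, k)."""
    g = np.empty(n)
    g[0] = 1.0
    for k in range(1, n):
        # ratio g_k / g_{k-1} = (k - 1 - alpha) / k, from the binomial definition
        g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)
    return g

def riesz_matrix(alpha, M):
    """Assumed sign convention: A = -(G + G^T), G lower Hessenberg Toeplitz."""
    g = g_coeffs(alpha, M + 1)
    G = toeplitz(g[1:], np.r_[g[1], g[0], np.zeros(M - 2)])
    return -(G + G.T)

alpha = 1.5
g = g_coeffs(alpha, 50)
A = riesz_matrix(alpha, 64)
```

The resulting matrix is symmetric, strictly diagonally dominant with positive diagonal $2\alpha$, and hence positive definite, in agreement with the discussion after Lemma 2.1.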
For instance, the tridiagonal Toeplitz matrix associated with a standard finite difference discretization of the Laplacian operator,

$$T_M(l) = \mathrm{tridiag}[-1,\ 2,\ -1], \qquad (2.12)$$

is generated by $l(x) = 2-2\cos(x)$. Note that, for Toeplitz matrices $T_M(f)$, $M\in\mathbb{N}$, as in (2.12), to have a generating function associated with the whole Toeplitz sequence, we need that there exists $f\in L^{1}(-\pi,\pi)$ for which the relationship (2.11) holds for every $k\in\mathbb{Z}$.
In the case where the partial Fourier sum $\sum_{|k|\le n} a_k e^{\iota k x}$ converges in infinity norm, $f$ coincides with its sum and is a continuous $2\pi$-periodic function, given the Banach structure of this space equipped with the infinity norm. A sufficient condition is that $\sum_{k\in\mathbb{Z}}|a_k|<\infty$, i.e., that the generating function belongs to the Wiener class, which is a closed sub-algebra of the continuous $2\pi$-periodic functions. Now, according to (2.9), the matrix $A^{\alpha}_{M}$ is symmetric, Toeplitz, and is generated by the following real-valued even function on $[-\pi,\pi]$, defined in the following lemma.

Lemma 2.2 (see [7]). The matrix $A^{\alpha}_{M}$ in (2.9) coincides with $T_M(f_\alpha)$, where

$$f_{\alpha}(x) = -\left(e^{-\iota x}\left(1-e^{\iota x}\right)^{\alpha} + e^{\iota x}\left(1-e^{-\iota x}\right)^{\alpha}\right) \qquad (2.13)$$

$$= -2\left(2\sin\frac{x}{2}\right)^{\alpha}\cos\!\left(\frac{\alpha}{2}(x-\pi) - x\right), \quad x\in[0,\pi], \qquad (2.14)$$

extended by evenness to $[-\pi,\pi]$.
By using the extremal spectral properties of Toeplitz matrix sequences (see e.g. [20,21,22]) and by combining them with Lemma 2.2, for $\alpha\in(1,2)$, the obtained spectral information can be summarized in the following items (see again [7]):
• $f_\alpha$ is nonnegative on $[-\pi,\pi]$, with a unique zero of order $\alpha$ at $x=0$;
• the maximal eigenvalue of $A^{\alpha}_{M}$ is monotonically strictly increasing, converging to $r = \max f_\alpha$ as $M$ tends to infinity;
• the minimal eigenvalue of $A^{\alpha}_{M}$ is monotonically strictly decreasing, converging to $0 = \min f_\alpha$ as $M$ tends to infinity;
• by taking into account that the zero of the generating function $f_\alpha$ has exactly order $\alpha\in(1,2]$, it holds that $\lambda_{\min}(A^{\alpha}_{M}) \sim M^{-\alpha}$, where, for nonnegative sequences $\alpha_n$, $\beta_n$, the writing $\alpha_n \sim \beta_n$ means that there exist real positive constants $0 < c \le C$ such that $c\,\beta_n \le \alpha_n \le C\,\beta_n$ for all $n$.

As a consequence of the third and of the fourth items reported above, the Euclidean conditioning of $A^{\alpha}_{M}$ grows exactly as $M^{\alpha}$. Therefore, classical stationary methods fail to be optimal, in the sense that we cannot expect convergence within a given precision with a number of iterations bounded by a constant independent of $M$: for instance, the Gauss-Seidel iteration would need a number of iterations exactly proportional to $M^{\alpha}$. In addition, even Krylov methods suffer from the latter asymptotic ill-conditioning. In fact, looking at the estimates by Axelsson and Lindskög [23], $O(M^{\alpha/2})$ iterations would be needed by a standard CG for reaching a solution within a fixed accuracy. In the present context of a zero of noninteger order, even preconditioning by band-Toeplitz matrices will not ensure optimality, as discussed in [7] and detailed in Section 4.
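The growth of the conditioning can be observed directly. In the sketch below (same assumed matrix construction as in the rest of this section), each doubling of $M$ multiplies the spectral condition number by a factor close to $2^{\alpha}\approx 2.83$ for $\alpha=1.5$:

```python
import numpy as np
from scipy.linalg import toeplitz

def riesz_matrix(alpha, M):
    # assumed construction A = -(G + G^T) with Grunwald coefficients
    g = np.empty(M + 1)
    g[0] = 1.0
    for k in range(1, M + 1):
        g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)
    G = toeplitz(g[1:], np.r_[g[1], g[0], np.zeros(M - 2)])
    return -(G + G.T)

alpha = 1.5
conds = []
for M in (64, 128, 256):
    ev = np.linalg.eigvalsh(riesz_matrix(alpha, M))
    conds.append(ev[-1] / ev[0])   # spectral condition number lambda_max / lambda_min
ratios = [conds[i + 1] / conds[i] for i in range(2)]
# ratios should be close to 2**alpha, confirming cond(A_M) ~ M^alpha
```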
On the other hand, when using circulant or $\tau$ preconditioning, the spectral equivalence is guaranteed also for a noninteger order of the zero at $x=0$, provided the order is bounded by 2 [9,10,11]. In the latter case, the distance in rank is smaller with respect to other matrix algebras, as emphasized in [24]. In a multi-dimensional setting, only the $\tau$ preconditioning leads to optimality of the PCG under the same constraint on the order of the zero, while multilevel circulants are not well-suited [12]. Finally, since we also consider problems in imaging, we recall that the $\tau$ algebra emerges in this context as well when using high precision boundary conditions (BCs) such as the anti-reflective BCs [25,26].
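A minimal 1D illustration of circulant preconditioning follows. The Strang-type circulant below copies the central diagonals of the Toeplitz matrix and wraps them periodically, so the preconditioner can be applied in $O(M\log M)$ via the FFT (the matrix construction and the use of the first row as stencil are assumptions consistent with the previous section; the absolute value on the eigenvalues is a numerical safeguard, not part of the classical definition):

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import LinearOperator, cg

def riesz_matrix(alpha, M):
    g = np.empty(M + 1)
    g[0] = 1.0
    for k in range(1, M + 1):
        g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)
    G = toeplitz(g[1:], np.r_[g[1], g[0], np.zeros(M - 2)])
    return -(G + G.T)

alpha, M = 1.5, 255
A = riesz_matrix(alpha, M)
b = np.ones(M)

# Strang-type circulant: wrap the central diagonals of the Toeplitz matrix
a = A[0]
c = np.zeros(M)
half = M // 2
c[: half + 1] = a[: half + 1]
c[half + 1 :] = a[1 : half + 1][::-1]          # c_{M-j} = a_j (symmetric wrap)
lam = np.abs(np.fft.fft(c).real)               # circulant eigenvalues; abs guards tiny values

prec = LinearOperator((M, M), matvec=lambda r: np.fft.ifft(np.fft.fft(r) / lam).real)

iters = {"cg": 0, "pcg": 0}
x_cg, _ = cg(A, b, maxiter=5000,
             callback=lambda xk: iters.__setitem__("cg", iters["cg"] + 1))
x_pcg, _ = cg(A, b, M=prec, maxiter=5000,
              callback=lambda xk: iters.__setitem__("pcg", iters["pcg"] + 1))
```

The preconditioned iteration count is drastically smaller than the unpreconditioned one, consistent with the optimality results recalled above for 1D constant-coefficient problems.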

Multigrid methods and level independency
Multigrid methods (MGMs) have already proven to be effective solvers as well as valid preconditioners for Krylov methods when numerically approaching FDEs [15,8,14]. A multigrid method combines two iterative methods, called smoother and coarse grid correction. When carefully chosen, the two basic procedures are spectrally complementary and, in this sense, multigrid procedures are always of multi-iterative type [27]. The coarser matrices can either be obtained by projection (Galerkin approach) or by rediscretization (geometric approach). In the case where only a single coarser level is considered we obtain Two-Grid Methods (TGMs), while in the presence of more levels we have multigrid methods proper: in particular, the V-cycle when only one recursive call is performed and the W-cycle when two recursive calls are applied.
Deepening the study carried out in [7,15], in this section we investigate the convergence of multigrid methods in Galerkin form for the discretized problem (2.8).
To apply the Galerkin multigrid to a given linear system $A_{M_k}x_k = b_k$ at the recursion level $k$, we define the projector

$$P_k = K_k\,T_{M_k}(p_k), \qquad (3.1)$$

where $K_k$ is the down-sampling operator and $p_k$ is a properly chosen polynomial. The coarser matrices are then

$$A_{M_{k+1}} = P_k A_{M_k} P_k^{T}. \qquad (3.2)$$

If at the recursive level $k$ we simply apply one post-smoothing iteration of a stationary method having iteration matrix $S_k$, the TGM iteration matrix at the level $k$ is given by

$$\mathrm{TGM}_k = S_k\left[I_{M_k} - P_k^{T}\left(P_k A_{M_k} P_k^{T}\right)^{-1}P_k A_{M_k}\right].$$

The following theorem states the convergence of the TGM at a generic recursion level $k$.
Theorem 3.1 (Ruge-Stüben [28]). Let $A_{M_k}$ be a symmetric positive definite matrix, $D_k$ the diagonal matrix having as main diagonal that of $A_{M_k}$, and $S_k$ the post-smoothing iteration matrix. Assume that

$$\exists\,\sigma_k > 0\ \text{such that}\ \|S_k x\|_{A_{M_k}}^{2} \le \|x\|_{A_{M_k}}^{2} - \sigma_k\|x\|_{A_{M_k}D_k^{-1}A_{M_k}}^{2}, \quad \forall\,x, \qquad (3.3)$$

and

$$\exists\,\delta_k > 0\ \text{such that}\ \min_{y}\|x - P_k^{T}y\|_{D_k}^{2} \le \delta_k\|x\|_{A_{M_k}}^{2}, \quad \forall\,x. \qquad (3.4)$$

Then $\delta_k \ge \sigma_k$ and $\|\mathrm{TGM}_k\|_{A_{M_k}} \le \sqrt{1-\sigma_k/\delta_k} < 1$.

The inequalities in equations (3.3) and (3.4) are well known as the smoothing property and the approximation property, respectively. To prove the multigrid convergence (recursive application of the TGM) it is enough to prove that the assumptions of Theorem 3.1 are satisfied for each $k = 0, 1, \dots, \ell-1$. On the other hand, to guarantee a W-cycle convergence in a number of iterations independent of the size $M$ [29], we have to prove that the following level independency condition

$$\liminf_{k\to\infty}\ \frac{\sigma_k}{\delta_k} \ge \beta > 0 \qquad (3.5)$$

holds, with $\beta$ independent of $k$.
Concerning (multilevel) Toeplitz matrices, multigrid methods have been extensively investigated in the literature [30,31,32]. While the smoothing property can be easily carried out for a simple smoother like weighted Jacobi, the approximation property usually requires a detailed analysis based on a generalization of the local Fourier analysis, see [33].
If we now apply the multigrid with Galerkin approach to the solution of (2.8) and refer to the matrices at the coarser level $k$ as $A^{\alpha}_{M_k}$, they are still Toeplitz and, based on the theoretical analysis developed in [34,35] for $\tau$ matrices, are associated with symbols $f_{k,\alpha}$ satisfying the recurrence

$$f_{k+1,\alpha}(x) = \frac{1}{2}\left[\left(p_{k,\alpha}^{2}f_{k,\alpha}\right)\!\left(\frac{x}{2}\right) + \left(p_{k,\alpha}^{2}f_{k,\alpha}\right)\!\left(\frac{x}{2}+\pi\right)\right]. \qquad (3.6)$$

Since the function $f_{0,\alpha} = f_{\alpha}$ has only a zero at the origin of order smaller than or equal to 2, according to the convergence analysis of multigrid methods in [34,35], we define the symbols of the projectors $P_k$ in (3.1) as

$$p_{k,\alpha}(x) = C_{k+1,\alpha}\left(1+\cos(x)\right). \qquad (3.7)$$

We observe that the symbols $f_{k,\alpha}$ at each level are nonnegative and they vanish only at the origin with order $\alpha$, thanks to

$$\limsup_{k}\ \sup_{x}\ \frac{p_{k,\alpha}^{2}(x+\pi)}{f_{k,\alpha}(x)} = \gamma < \infty, \qquad (3.9)$$

with $\gamma$ independent of $k$.
For the smoothing property (3.3), in order to ensure $\sigma_k > \sigma > 0$ for all $k$, the choice of $C_{k+1,\alpha}$ as the constant such that $f_{k+1,\alpha}(\pi) = f_{k,\alpha}(\pi)$ will be crucial.
In conclusion, to prove the level independency condition (3.5), we need a detailed analysis of the symbol $f_{k+1,\alpha}$, which is performed in the next subsection.
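The overall Galerkin scheme described above can be sketched in code. The following minimal V-cycle is an illustration under stated assumptions: the matrix construction of Section 2, a projector with row stencil $[1/2,\ 1,\ 1/2]$ (consistent with $p(x)=1+\cos(x)$ up to scaling, which does not affect the Galerkin coarse-grid correction), and one pre- plus one post-smoothing step of weighted Jacobi with a numerically chosen weight, rather than the analytic choice discussed below:

```python
import numpy as np
from scipy.linalg import toeplitz

def riesz_matrix(alpha, M):
    g = np.empty(M + 1)
    g[0] = 1.0
    for k in range(1, M + 1):
        g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)
    G = toeplitz(g[1:], np.r_[g[1], g[0], np.zeros(M - 2)])
    return -(G + G.T)

def projector(M):
    """Rows [1/2, 1, 1/2] centered at the even grid points (down-sampling)."""
    Mc = (M - 1) // 2
    P = np.zeros((Mc, M))
    for i in range(Mc):
        P[i, 2 * i : 2 * i + 3] = (0.5, 1.0, 0.5)
    return P

def hierarchy(A, levels):
    As, Ps, ws = [A], [], []
    for _ in range(levels):
        d = np.diag(As[-1])
        Ds = 1.0 / np.sqrt(d)
        lam_max = np.linalg.eigvalsh(Ds[:, None] * As[-1] * Ds[None, :]).max()
        ws.append(1.0 / lam_max)          # convergent damped-Jacobi weight per level
        P = projector(As[-1].shape[0])
        Ps.append(P)
        As.append(P @ As[-1] @ P.T)       # Galerkin coarse matrix (3.2)
    return As, Ps, ws

def vcycle(As, Ps, ws, lvl, b, x):
    A = As[lvl]
    if lvl == len(Ps):
        return np.linalg.solve(A, b)      # direct solve on the coarsest level
    d = np.diag(A)
    x = x + ws[lvl] * (b - A @ x) / d     # pre-smoothing (weighted Jacobi)
    r = Ps[lvl] @ (b - A @ x)             # restrict the residual
    e = vcycle(As, Ps, ws, lvl + 1, r, np.zeros_like(r))
    x = x + Ps[lvl].T @ e                 # coarse-grid correction
    x = x + ws[lvl] * (b - A @ x) / d     # post-smoothing
    return x

alpha, M = 1.5, 127
A = riesz_matrix(alpha, M)
As, Ps, ws = hierarchy(A, 4)
b = np.ones(M)
x = np.zeros(M)
res = [np.linalg.norm(b)]
for _ in range(60):
    x = vcycle(As, Ps, ws, 0, b, x)
    res.append(np.linalg.norm(b - A @ x))
    if res[-1] < 1e-9 * res[0]:
        break
```

The residual history decreases at a rate that does not visibly degrade with $M$, in line with the V-cycle optimality claimed in this section.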

Behaviour of symbols f k,α
The analysis of $f_{k+1,\alpha}$ in (3.6) requires a detailed study of the constant $C_{k+1,\alpha}$ in $p_{k,\alpha}$, chosen by imposing the condition $f_{k+1,\alpha}(\pi) = f_{k,\alpha}(\pi)$. According to the classical analysis, in the case of the discrete Laplacian, i.e., $l(x) = 2-2\cos(x)$, the value of $C_{k+1,2}$ at each level is $\sqrt{2}$.
Since the symbol $p_{k,2}$ of the projector is then the same at each level, by the recursive relation we obtain the required result $f_{\ell,2}(x) = \dots = f_{1,2}(x) = f_{0,2}(x) = l(x)$. For $\alpha\in(1,2)$, instead, the constant $C_{k,\alpha}$ changes with the level but converges as $k$ diverges, as can be seen in Figure 1. Moreover, the convergence of $C_{k,\alpha}$ is quite fast, especially for $\alpha$ close to 2.
We observe that the sequence of the coarser symbols satisfies $f_{k+1,\alpha}(x) \ge f_{k,\alpha}(x)$ for any $\alpha\in(1,2)$, as can be seen in Figure 2. Combining this fact with the preservation of the value at $\pi$, i.e., $f_{k,\alpha}(\pi) = f_{0,\alpha}(\pi)$ for any $\alpha\in(1,2)$, we obtain that the ill-conditioning of the coarser linear systems decreases moving to coarser grids.
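The decrease of the ill-conditioning along the hierarchy can be verified directly. The sketch below (assuming the matrix construction of Section 2 and a $[1/2,\ 1,\ 1/2]$ projector stencil as a concrete instance of (3.1)) computes the spectral condition number of the Galerkin coarse matrices:

```python
import numpy as np
from scipy.linalg import toeplitz

def riesz_matrix(alpha, M):
    g = np.empty(M + 1)
    g[0] = 1.0
    for k in range(1, M + 1):
        g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)
    G = toeplitz(g[1:], np.r_[g[1], g[0], np.zeros(M - 2)])
    return -(G + G.T)

def projector(M):
    Mc = (M - 1) // 2
    P = np.zeros((Mc, M))
    for i in range(Mc):
        P[i, 2 * i : 2 * i + 3] = (0.5, 1.0, 0.5)
    return P

alpha = 1.7
A = riesz_matrix(alpha, 255)
conds = []
for _ in range(4):                 # sizes 255 -> 127 -> 63 -> 31
    ev = np.linalg.eigvalsh(A)
    conds.append(ev[-1] / ev[0])
    P = projector(A.shape[0])
    A = P @ A @ P.T                # Galerkin coarsening (3.2)
# conds decreases along the hierarchy: coarser systems are better conditioned
```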

Smoothing Property
If weighted Jacobi is used as smoother, then the smoothing property (3.3) is satisfied whenever the smoother converges [31,28]. Since the matrix $A^{\alpha}_{M_k}$ is symmetric positive definite, the weighted Jacobi method with iteration matrix

$$S_k = I_{M_k} - \omega_k D_k^{-1}A^{\alpha}_{M_k}$$

converges for $\omega_k \in \left(0, 2/\lambda_{\max}\!\left(D_k^{-1}A^{\alpha}_{M_k}\right)\right)$. Note that $D_k = a_0^{(k)} I_{M_k}$, where $a_0^{(k)} > 0$ is the Fourier coefficient of order zero of $f_{k,\alpha}$, and thus, since $\lambda_{\max}\!\left(A^{\alpha}_{M_k}\right) < \max f_{k,\alpha}$, for every $\omega_k \in \left(0, 2a_0^{(k)}/\max f_{k,\alpha}\right)$ there exists $\sigma_k$ such that the smoothing property in (3.3) holds true.
Moreover, for $\omega_k$ chosen as in (3.15), there exists $\sigma > 0$ such that $\sigma_k \ge \sigma$ for all $k$, as stated in (3.16); hence the corresponding condition in the proof of Lemma 4.2 in [15] is satisfied.
From the previous lemma, and following the same idea given in [37], we propose a Jacobi parameter which provides a good convergence rate, as confirmed by the numerical results in Section 5.
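The admissible range for the Jacobi weight can be explored numerically. In the sketch below (assumed matrix construction as in Section 2; here $D_k = a_0 I$ since the diagonal is constant), every weight $\omega$ in a range of the form $(0,\ 2a_0/\max f_\alpha)$ yields a spectral radius of the smoother strictly below one, so the smoother converges, as required by the smoothing property:

```python
import numpy as np
from scipy.linalg import toeplitz

def riesz_matrix(alpha, M):
    g = np.empty(M + 1)
    g[0] = 1.0
    for k in range(1, M + 1):
        g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)
    G = toeplitz(g[1:], np.r_[g[1], g[0], np.zeros(M - 2)])
    return -(G + G.T)

alpha, M = 1.5, 255
A = riesz_matrix(alpha, M)
a0 = A[0, 0]                      # zero-order Fourier coefficient of the symbol (= 2*alpha)
lam = np.linalg.eigvalsh(A)       # eigenvalues of A, all in (0, max f_alpha)
lam_max = lam[-1]

rhos = []
for fac in (0.3, 0.6, 0.9, 1.2, 1.5, 1.9):
    w = fac * a0 / lam_max                  # weights spanning almost the admissible range
    S_eigs = 1.0 - (w / a0) * lam           # eigenvalues of S = I - w D^{-1} A, D = a0 I
    rhos.append(np.abs(S_eigs).max())       # spectral radius of the Jacobi iteration
```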

Approximation property and level independency
According to Remark 1, the uniform boundedness of $\delta_k$ is ensured by proving equation (3.9).
First of all, we prove that for $k\in\mathbb{N}_0$ the function $f_{k,\alpha}(x)$ is monotonically increasing in $x$ on $[0,\pi]$. Recalling the expression of $p_{k,\alpha}$ in (3.7) and differentiating, this is equivalent to proving that $g_{k,\alpha}(x) = f_{k,\alpha}'(x)$ is nonnegative for all $k\in\mathbb{N}_0$. This result can be proved by induction on $k$.
For $k=0$, replacing the expression of $f_{0,\alpha}$ (see equation (2.14)) in $g_{k,\alpha}$, by direct computations we obtain $g_{0,\alpha}(x)\ge 0$ for all $x\in[0,\pi]$. Assuming that for $k=n$ it holds $g_{n,\alpha}(x)\ge 0$, $\forall x\in[0,\pi]$, we have to prove that the same is true for $k=n+1$.
Applying the recurrence (3.6) to $f_{n+1,\alpha}$ and differentiating, we can express $g_{n+1,\alpha}$ in terms of $g_{n,\alpha}$ and of $p(x) = 1+\cos(x)$, as defined in (3.7). Thanks to the inductive assumption and direct computation, we obtain the desired result $g_{n+1,\alpha}(x)\ge 0$ for all $x\in[0,\pi]$. In conclusion, by the nonnegativity of $g_{k,\alpha}(x)$, i.e., by the monotonicity of $f_{k,\alpha}$, for all $k\ge 0$ we have $\max_x f_{k,\alpha}(x) = f_{k,\alpha}(\pi)$ and, since the sequence $C_{k,\alpha}$ is bounded thanks to equation (3.12) (see also Figure 1), we have that $\{\gamma_k\}_k$, and hence $\{\delta_k\}_k$, is bounded.
Combining the previous result on $\delta_k$ with equation (3.16) from the smoothing analysis, condition (3.5) holds, i.e., the level independency is satisfied.
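The monotonicity at the finest level can be checked numerically. The sketch below evaluates a long truncated Fourier sum of the symbol, with the coefficients read off the (assumed) stencil of $A^{\alpha}_{M}$, namely $a_0 = 2\alpha$, $a_1 = -(1+g^{(\alpha)}_2)$, and $a_k = -g^{(\alpha)}_{k+1}$ for $k\ge 2$, and verifies that the resulting function is numerically nondecreasing on $[0,\pi]$ and vanishes at the origin:

```python
import numpy as np

alpha, n = 1.3, 4000
g = np.empty(n + 2)
g[0] = 1.0
for k in range(1, n + 2):
    g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)   # Grunwald coefficient recurrence

a = np.empty(n + 1)                                # Fourier coefficients of f_alpha
a[0] = 2.0 * alpha
a[1] = -(1.0 + g[2])
a[2:] = -g[3 : n + 2]

x = np.linspace(0.0, np.pi, 400)
f = a[0] + 2.0 * np.cos(np.outer(x, np.arange(1, n + 1))) @ a[1:]
# f: truncated Fourier sum of the symbol on [0, pi]
```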

Band Approximation
Multigrid methods are often applied as preconditioners for Krylov methods, and often to an approximation of $A^{\alpha}_{M}$ instead of the original coefficient matrix. Along the same lines as [8], in this section we discuss a band approximation of $A^{\alpha}_{M}$ to be combined with the multigrid method discussed in the previous section. The advantage of using a band matrix lies in the possibility of applying the Galerkin approach with a linear cost in $M$, instead of $O(M\log M)$, while inheriting the V-cycle optimality from the results in [35].
Starting from the truncated Fourier sum of order $s$ of the symbol $f_\alpha$,

$$g_s(x) = \sum_{k=-s}^{s} a_k e^{\iota k x} = a_0 + 2\sum_{k=1}^{s} a_k\cos(kx), \qquad (4.1)$$

with $a_k$ the Fourier coefficients of $f_\alpha$, we consider as band approximation of $A^{\alpha}_{M}$ the associated banded Toeplitz matrix

$${}_{s}\tilde{A}^{\alpha}_{M} = T_M(g_s), \qquad (4.2)$$

whose bandwidth is $2s+1$. As already discussed in [15,8], concerning the application of our multigrid method to a linear system with the coefficient matrix ${}_{s}\tilde{A}^{\alpha}_{M}$, we have the following features thanks to the results in [35,32]:
• The computation of the coarser matrices by the Galerkin approach (3.2), using the projector defined in (3.1), preserves the band structure at the coarser levels, see [32, Proposition 2].
• If the function $g_s$ is nonnegative, then the V-cycle is optimal, i.e., it has a constant convergence rate independent of the size $M$ and a computational cost per iteration proportional to $O(sM)$.

Concerning the spectral features of preconditioned Toeplitz matrices with Toeplitz preconditioners, we recall well-known results of both localization and distribution type in the sense of Weyl in the following theorem (see [38,5]).
Theorem 4.1. Let $f$ be real-valued, $2\pi$-periodic, and continuous, and let $g$ be nonnegative, $2\pi$-periodic, and continuous with $\max g > 0$. Then
• $T_M(g)$ is positive definite for every $M$;
• the sequences $\{T_M(f)\}_M$, $\{T_M(g)\}_M$, $\{T_M^{-1}(g)T_M(f)\}_M$ are distributed in the Weyl sense as $f$, $g$, $\frac{f}{g}$, respectively, i.e., for all $F$ in the set of continuous functions with compact support over $\mathbb{C}$ it holds

$$\lim_{M\to\infty}\frac{1}{M}\sum_{j=1}^{M}F\!\left(\lambda_j(X_M)\right) = \frac{1}{2\pi}\int_{-\pi}^{\pi}F(s(x))\,dx,$$

with $(X_M, s)$ equal to $(T_M(f), f)$, $(T_M(g), g)$, or $\left(T_M^{-1}(g)T_M(f), \frac{f}{g}\right)$.

In other words, a good Toeplitz preconditioner for a Toeplitz matrix generated by $f$ should be generated by $g$ such that $\frac{f}{g}$ is close to 1 in a certain metric, for instance in the $L^{\infty}$ or in the $L^{1}$ norm. About the band preconditioner in (4.2), Figure 4 shows the graphs of $g = g_s$ and $f = f_\alpha$ for two different values of $\alpha$, while the errors $\delta_s = f_\alpha - g_s$ are depicted in Figure 5 for some values of $s$. Obviously, the quality of the approximation grows as $s$ increases, but it is good enough already for small $s$, in particular when $\alpha$ approaches the value 2 (see the scale of the $y$-axis in Figure 5). More importantly, in the light of Theorem 4.1, Figure 6 depicts the functions $\kappa_s - 1$ with $\kappa_s = \frac{f_\alpha}{g_s}$. This quantity measures the relative approximation, which is the one determining the quality of the preconditioning, since $(\kappa_s - 1, [0,\pi])$ is the distribution of the shifted preconditioned matrix-sequence in the sense of Weyl, in accordance again with Theorem 4.1. Note that the function $\kappa_s - 1$ is almost zero except around the zero of $f_\alpha$ at the origin.
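The band preconditioner can be sketched as follows (assumptions: matrix construction as in Section 2, band obtained by keeping the $2s+1$ central diagonals of $A^{\alpha}_{M}$, which is exactly $T_M(g_s)$; the banded solve uses symmetric band storage, so each application costs $O(s^2 M)$ in this naive form):

```python
import numpy as np
from scipy.linalg import toeplitz, solveh_banded
from scipy.sparse.linalg import LinearOperator, cg

def riesz_matrix(alpha, M):
    g = np.empty(M + 1)
    g[0] = 1.0
    for k in range(1, M + 1):
        g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)
    G = toeplitz(g[1:], np.r_[g[1], g[0], np.zeros(M - 2)])
    return -(G + G.T)

alpha, M, s = 1.5, 511, 10
A = riesz_matrix(alpha, M)
b = np.ones(M)

# band approximation T_M(g_s): keep the 2s+1 central diagonals of A
a = A[0, : s + 1]
ab = np.zeros((s + 1, M))              # symmetric banded storage, lower form
for i in range(s + 1):
    ab[i, : M - i] = a[i]              # constant i-th lower diagonal
prec = LinearOperator((M, M), matvec=lambda r: solveh_banded(ab, r, lower=True))

iters = {"cg": 0, "pcg": 0}
x1, _ = cg(A, b, maxiter=5000,
           callback=lambda xk: iters.__setitem__("cg", iters["cg"] + 1))
x2, _ = cg(A, b, M=prec, maxiter=5000,
           callback=lambda xk: iters.__setitem__("pcg", iters["pcg"] + 1))
```

As anticipated by the theory, the band preconditioner does not make the iteration count fully independent of $M$, but it reduces it substantially already for small $s$.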

Numerical Examples
In this section, we present some numerical examples to verify the effectiveness of the MGM used both as a solver and as a preconditioner. In what follows:
• $V^{\nu_2}_{\nu_1}$ consists of a Galerkin V-cycle with linear interpolation as grid transfer operator and $\nu_1$ and $\nu_2$ iterations of pre- and post-smoother (weighted Jacobi), respectively;
• $P_C$ and $P_S$ represent the Chan [39] and Strang circulant preconditioners, respectively;
• ${}_{s}PV^{\nu_2}_{\nu_1}$ denotes the banded preconditioner defined as in equation (4.2), inverted using the MGM with Galerkin approach $V^{\nu_2}_{\nu_1}$;
• $PV^{\nu_2}_{\nu_1}$ and $P\tilde{V}^{\nu_2}_{\nu_1}$ are multigrid preconditioners, with Galerkin and geometric approach, respectively.

The aforementioned preconditioners are combined with CG and GMRES, computationally performed using the built-in pcg and gmres Matlab functions. The stopping criterion is chosen as $\|r_k\|/\|r_0\| < \varepsilon$, where $\|\cdot\|$ denotes the Euclidean norm, $r_k$ is the residual vector at the $k$-th iteration, and $\varepsilon$ is a fixed tolerance. The initial guess is fixed as the zero vector. All the numerical experiments are run using MATLAB R2021a on an HP 17-cp0000nl computer with an AMD Ryzen 7 5700U CPU with Radeon Graphics and 16GB RAM.
Example 1. Consider the 1D-RFDE given in equation (2.1), with $\Omega = [0,1]$ and the source term built from the exact solution $u(x) = x^2(1-x)^2$. Table 1 shows the number of iterations of the proposed multigrid methods and preconditioners for three different values of $\alpha$.
Regarding multigrid methods, the observations are as follows: • The Galerkin approach is robust, even as V-cycle, in agreement with the previous theoretical analysis.
• The geometric approach is robust only when α > 1.5.
• The robustness of the geometric multigrid can be improved by using it as a preconditioner, as seen in the column $PV^{1}_{1}$. In particular, it is even more robust than the band multigrid preconditioner ${}_{s}PV^{1}_{1}$.

When comparing the circulant preconditioners, the Strang preconditioner $P_S$ is preferable to the optimal Chan preconditioner $P_C$.
Example 2. Consider now the 2D version of problem (2.1), with fractional orders $\alpha$ and $\beta$ in the $x$ and $y$ directions, respectively. Using the shifted Grünwald difference formula for both fractional derivatives in $x$ and $y$ leads to the matrix-vector form (5.4), with a (diagonal times two-level Toeplitz structured) coefficient matrix $A^{(\alpha,\beta)}_{M}$, and the $\tau$-based preconditioner of the resulting linear system (5.4) is denoted by $P_\tau$.

Table 2 presents the number of iterations and CPU times required by the proposed multigrid methods and preconditioners. The $\tau$ preconditioner $P_\tau$ stands out as the most robust choice, both in terms of iteration count and CPU time. The geometric multigrid used as a preconditioner is also a good option, although the robustness of multigrid methods declines in the case of anisotropic problems, i.e., when $(\alpha,\beta) = (1.7, 1.9)$. In such a case, it is advisable to adopt the strategies proposed in [8].

In the presence of variable diffusion coefficients, the coefficient matrix is nonsymmetric and the CG method has been replaced with GMRES. Nevertheless, the numerical results in Table 3 are comparable to those in Table 2. Indeed, the $\tau$ preconditioner remains the best option, and the geometric multigrid preconditioner provides a good and robust alternative, especially when the problem is isotropic, i.e., when $\alpha\approx\beta$, also when applied to the band approximation.
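In the constant-coefficient case, the two-level structure can be sketched via Kronecker products. The block below (an illustration under the assumption that the 2D operator has the tensor form $I\otimes A^{\alpha}_{M_1} + A^{\beta}_{M_2}\otimes I$, consistent with a two-level Toeplitz structure) also shows that the matrix-vector product can be performed through the 1D blocks without assembling the full matrix:

```python
import numpy as np
from scipy.linalg import toeplitz

def riesz_matrix(alpha, M):
    g = np.empty(M + 1)
    g[0] = 1.0
    for k in range(1, M + 1):
        g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)
    G = toeplitz(g[1:], np.r_[g[1], g[0], np.zeros(M - 2)])
    return -(G + G.T)

al, be = 1.4, 1.8
M1, M2 = 16, 16
Ax = riesz_matrix(al, M1)
Ay = riesz_matrix(be, M2)
A2 = np.kron(np.eye(M2), Ax) + np.kron(Ay, np.eye(M1))   # 2D constant-coefficient operator

rng = np.random.default_rng(0)
U = rng.standard_normal((M1, M2))
v1 = A2 @ U.flatten(order="F")                 # full Kronecker matvec
v2 = (Ax @ U + U @ Ay).flatten(order="F")      # structured matvec (Ay symmetric)
```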

Application to image deblurring
Fractional differential operators have been investigated to enhance diffusion in image denoising and deblurring [42,43,44]. In this section, we investigate the effectiveness of the previous preconditioning strategies on the Tikhonov regularization problem

$$\min_{u}\ \|B_M u - m\|_2^2 + \mu\, u^{T}A^{(\alpha,\beta)}_{M}u, \qquad (6.1)$$

where $m$ is the noisy and blurred observed image, $B_M$ is the discrete convolution operator associated with the blurring phenomenon, and $\mu > 0$ is the regularization parameter.
The solution of the least squares problem (6.1) can be obtained by solving the linear system

$$\left(B_M^{T}B_M + \mu A^{(\alpha,\beta)}_{M}\right)u = B_M^{T}m \qquad (6.2)$$

by the preconditioned CG method, since the coefficient matrix is positive definite. The matrix $B_M$ exhibits a block Toeplitz with Toeplitz blocks structure like the matrix $A^{(\alpha,\beta)}_{M}$, but its spectral behavior is completely different. This specific application will be considered in future research works, especially considering numerical methods for nonlinear models that require solving a linear problem as an inner step, see e.g. [47].
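A minimal 1D toy instance of this setting is sketched below. Everything here is an assumption for illustration only: a Gaussian blur as Toeplitz operator, a fractional quadratic penalty as the Tikhonov term, and hand-picked values of $\mu$ and of the noise level; the paper's actual 2D deblurring setting may differ. The point is that the normal-equations matrix is symmetric positive definite, so CG applies:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import cg

def riesz_matrix(alpha, M):
    g = np.empty(M + 1)
    g[0] = 1.0
    for k in range(1, M + 1):
        g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)
    G = toeplitz(g[1:], np.r_[g[1], g[0], np.zeros(M - 2)])
    return -(G + G.T)

M, alpha, mu, sigma = 127, 1.5, 5e-2, 3.0
x = np.linspace(0.0, 1.0, M)
u_true = np.exp(-60.0 * (x - 0.5) ** 2)          # hypothetical sharp signal

k = np.exp(-np.arange(M) ** 2 / (2.0 * sigma ** 2))
B = toeplitz(k / (k[0] + 2.0 * k[1:].sum()))     # normalized Gaussian blur (Toeplitz)
rng = np.random.default_rng(1)
m = B @ u_true + 1e-3 * rng.standard_normal(M)   # blurred and noisy observation

A = riesz_matrix(alpha, M)
H = B.T @ B + mu * A                             # normal-equations matrix, SPD
rhs = B.T @ m
u, info = cg(H, rhs, maxiter=5000)               # CG on the regularized system
```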


Figure 1: Plot of $C_{k,\alpha}$ vs the level $k$, for different values of $\alpha$.

Figure 3: Plot of $g_{n,\alpha}(x)$ for different values of $\alpha$.

Figure 3 depicts the function $g_{k,\alpha}$ for different values of $k$ and $\alpha$.

Figure 4: Plots of $f$ and $g_s$.

Table 2: Example 2, number of iterations for different values of $\alpha$ and $\beta$ with $M_1 = M_2$.

Table 3: Example 3, number of iterations for different values of $\alpha$ and $\beta$ with $M_1 = M_2$.