Serial and Parallel Iterative Splitting Methods: Algorithms and Applications to Fractional Convection-Diffusion Equations

Abstract: The benefits and properties of iterative splitting methods, which are based on serial versions, have been studied in recent years. In this work, we extend the iterative splitting methods to novel classes of parallel versions in order to solve nonlinear fractional convection-diffusion equations. For such partial differential equations with higher-dimensional, fractional, and nonlinear terms, the parallel iterative splitting methods accelerate the solvers and reduce the computational time, while retaining the higher accuracy of the iterative splitting approach. We present a novel parallel iterative splitting method that is based on multi-splitting methods; the flexibility of the multisplitting approach allows for decomposing large-scale operator equations. In combination with iterative splitting methods, which use characteristics of waveform-relaxation (WR) methods, we can embed the relaxation behavior and deal better with the nonlinearities of the operators. We derive convergence results for the parallel iterative splitting methods by reformulating the underlying methods as a summation of the individual convergence results of the WR methods. We discuss the numerical convergence of the serial and parallel iterative splitting methods with respect to synchronous and asynchronous treatments. Furthermore, we present different numerical applications to fluid and phase-field problems in order to validate the benefit of the parallel versions.


Introduction
Nowadays, iterative splitting methods are important solvers for large systems of ordinary, partial, or stochastic differential equations, see [1-6]. Iterative splitting methods are based on two solver ideas: in the first part, we separate the full operator into different sub-operators and reduce the computational time for each sub-computation. An additional benefit is the iterative technique, which allows for solving a relaxation problem, as in the waveform-relaxation method or Picard's iterative method, see [7-10]. Both parts reduce the computational time and the complexity compared with solving all parts (full operator and direct method) together, see [11]. Such iterative splitting methods can be used to compute, with less computational burden, an approximate solution of the ordinary differential equations (ODEs) or semi-discretized partial differential equations (PDEs), see [10,11]. Moreover, we consider parallel splitting methods, which are nowadays important for solving large problems, see [3]. Such parallel splitting methods are applied to the split subproblems, which are computed independently by the different processors. Therefore, in a second part, we consider the parallelization techniques given by the multi-splitting approach, see [5]. Such approaches allow for embedding the splitting techniques of the first part into a parallelized version, see [12]. Based on the parallel versions of our methods, we can consider large computational problems.
For the computational problems, we consider higher-dimensional and nonlinear PDEs such as nonlinear and fractional convection-diffusion equations, see [13,14]. The nonlinear PDEs arise in nonlinear flow problems, for example, in the simulation of traffic-flow problems, see [15], and in flow problems related to the Navier-Stokes equations, see [16], which can be modeled with the Burgers' equation.
Furthermore, we take into account fractional PDEs, e.g., fractional convection-diffusion equations. Nowadays, fractional calculus is applied in several fields of science and engineering, for example, in visco-elastic and thermal diffusion in fractal domains, see [17,18], in phase-field models with mixtures of fluids [19,20], or in biological models to deal with fractional multi-models [21]. It is also interesting from a theoretical point of view, e.g., for its physical and geometrical interpretation [22] and for fractional Dirac operators [23].
The treatment of such delicate fractional and nonlinear partial differential equations (PDEs) requires additional spatial and nonlinear approximations of higher order, see [24]. We apply flexible iterative splitting methods, see [13,25], which can be extended to parallel algorithms.
The underlying modeling equations are given as nonlinear fractional convection-diffusion equations with the initial and boundary conditions u(x, y, 0) = u_0(x, y), (x, y) ∈ Ω, and u(x, y, t) = 0, (x, y, t) ∈ ∂Ω × [0, T], where Ω ⊂ ℝ^d is the spatial domain with dimension d and T ∈ ℝ_+. We denote by ν ∈ ℝ_+ a scalar parameter of the convection part and by µ ∈ ℝ_+ the viscosity of the diffusion part, while f is a right-hand side function.
Further interesting examples that are related to fractional and nonlinear PDEs can be found in [24,28].
Equations (1)-(3) can be derived in a more general setting with the idea of the least action principle, see [29,30]. Here, we are not restricted to a specific Lagrangian or Hamiltonian principle, such that we can account for a much larger number of fractional differential equations.
Here, the fractional Laplace operator replaces the standard Laplacian operator, see [14], and it is denoted as a sum of one-dimensional spatial operators, L^{α,β} = L^α_x + L^β_y, where L^α_x is defined via the Riemann fractional derivative of order α. In the next step, we apply a semi-discretisation with higher order finite-difference methods for α = β = 2, see [31], or with the higher order Grünwald formula for the fractional operators with α, β ∈ (1, 2), see [27,32]. Thus, we can reduce the computational cost of the time-splitting approaches, which are given as an iterative splitting approach, see [33]. With such an approximation and the semi-discretization with higher order schemes, we obtain a nonlinear differential equation system in Cauchy form with c(0) = c_0, (7) where c ∈ X ⊂ ℝ^M, M = m^d, d is the dimension, and m is the number of grid points in each spatial direction. Furthermore, Ã(c): X → X × X is a nonlinear matrix, which includes the spatial discretization of the nonlinear convection in Equation (1) with the boundary conditions and with higher order spatial finite difference schemes, see [31]. The operator B ∈ X × X is a linear matrix, which includes the spatial discretization of the fractional diffusion term with the boundary conditions and with the underlying fractional discretization methods, see [24,32]. The function F(t): ℝ_+ → X is the discretized right-hand side. We assume X to be an appropriate Banach space with a vector norm and induced matrix norm || · ||, see [10,34,35].
For the nonlinear term, we apply a linearisation based on Picard's iteration, which is itself a special iterative splitting approach, see [7,36]. We iterate for j = 1, 2, . . ., J with some J ∈ ℕ (the number of iterations for the nonlinear part) and starting condition v_0(t) = c_0. Furthermore, we assume J to be sufficiently large, such that ||v_J − v_{J−1}|| ≤ err, where err is an error bound. For simplification, we assume F(t) = 0; a nonzero right-hand side F(t) can be handled via an integral equation, see [35]. Therefore, we obtain a linearised differential equation system, which is solved by the outer iterations with j = 1, . . ., J. In the following, we concentrate on solving such linearised evolution equations by decomposing the underlying matrix operator into sub-matrix operators, so that we obtain a differential equation with A = Ã(v_J) + B ∈ ℝ^M × ℝ^M as the full operator and A_l as the sub-operators. Further, c ∈ C^1([0, T]; ℝ^M) is the solution and c_0 ∈ ℝ^M is the initial condition. The bottleneck of the iterative methods is due to the large sizes of the iteration matrices, see [4,5]; therefore, we consider parallel versions of the iterative splitting method, see [3,12]. Based on the decomposition of the large-scale differential equation with operator A into smaller sub-differential equations with operators A_l, where l = 1, . . ., L, and L is the number of processors, we distribute the computation over many processors and reduce the overall computational time, see [5]. Furthermore, we modify the synchronous parallel splitting method with chaotic (asynchronous) ideas, such that the computation and communication of the various processors can be done independently, see [37].
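The outer Picard linearisation can be sketched as follows. This is a minimal illustration, assuming a generic nonlinear matrix function A_tilde(c), a uniform time grid, and a classical RK4 sub-integrator with the coefficient matrix frozen at the lagged midpoint; none of these choices are prescribed by the paper.

```python
import numpy as np

def picard_linearisation(A_tilde, B, c0, T, J=20, err=1e-10, steps=200):
    """Outer Picard iteration for c'(t) = (A_tilde(c) + B) c(t), c(0) = c0.

    In step j the nonlinear matrix A_tilde is frozen at the previous
    iterate v_{j-1}(t), so only a linear ODE has to be solved; here this
    linear sub-problem is integrated with RK4 on a uniform grid.
    """
    ts = np.linspace(0.0, T, steps + 1)
    h = ts[1] - ts[0]
    v_prev = np.tile(c0, (steps + 1, 1))          # v_0(t) = c_0
    for _ in range(J):
        v_new = np.empty_like(v_prev)
        v_new[0] = c0
        for n in range(steps):
            # linearised operator, frozen at the lagged midpoint value
            M = A_tilde(0.5 * (v_prev[n] + v_prev[n + 1])) + B
            f = lambda c: M @ c
            c = v_new[n]
            k1 = f(c); k2 = f(c + 0.5 * h * k1)
            k3 = f(c + 0.5 * h * k2); k4 = f(c + h * k3)
            v_new[n + 1] = c + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        if np.max(np.abs(v_new - v_prev)) <= err:  # ||v_J - v_{J-1}|| <= err
            return v_new, ts
        v_prev = v_new
    return v_prev, ts
```

As a sanity check, for the scalar Riccati-type problem c' = −c², i.e., A_tilde(c) = diag(−c) and B = 0, the iterates converge to the exact solution c(t) = c_0/(1 + c_0 t).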
In addition, we have considered different examples from the literature that also discuss optimized computational cost and error bounds, see [28]. Here, we deal with models of the viscous Burgers' equation, which is applied in flow problems, and fractional convection-diffusion equations, which are applied in diffuse interface phase-field problems, see [20].
The outline of this paper is as follows. Section 2 explains the serial iterative splitting method. Section 3 introduces the parallel iterative splitting method. We discuss the theoretical results in Section 4. The numerical examples are presented in Section 5. In Section 6, we discuss the theoretical and practical results.

Serial Iterative Splitting Method
We consider a two-level iterative splitting method, which is discussed for two operators in [2] and for L operators in [38].
In this work, we also apply the recent theoretical results of our work in [26].While in that work we took the adaptivity of the splitting results into account, here we consider the application of multi-operator splitting approaches and multi-splitting methods.
Based on the differential Equation (10), we have the following decomposition of the operator A: where we have l = 1, . . ., L.
The serial iterative splitting method is given in the following algorithm. Here, we consider the discretization step-size τ = t_{n+1} − t_n, which can also be set adaptively, see [26]. We assume to have the time intervals [t_n, t_{n+1}] with n = 0, 1, . . ., N, where t_0 is the initial time and t_N = T is the end time. We consecutively solve the following sub-problems for i = 0, L, . . ., (m − 1)L: . . . (13) where we can assume for the first initialisation c_0(t) = 0, or a different initial function, see [2]. Here, c_n = c(t_n) is the known split approximation, which is computed in the previous iterative procedure. The current split approximation at time t = t_{n+1} is defined as c_{n+1} = c_{(m−1)L}(t_{n+1}), where (m − 1)L is the maximal number of iterative steps. The stopping criterion is ||c_{(m−1)L} − c_{(m−2)L}|| ≤ err, and then we have the solution c(t_{n+1}) = c_{(m−1)L}(t_{n+1}).
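One time step of the serial iterative splitting idea can be sketched for the special case of two operators (the text treats L operators); the alternation of the implicitly iterated operator, the lagged-term treatment, and the RK4 sub-integrator are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def iterative_splitting_step(A1, A2, cn, tau, iters=4, substeps=50):
    """One step t_n -> t_n + tau of two-operator iterative splitting.

    The i-th sweep solves c_i' = A_imp c_i + A_exp c_{i-1}, c_i(t_n) = c_n,
    where the roles of the two operators alternate between sweeps and the
    previous sweep enters through the lagged term c_{i-1}(t).
    """
    h = tau / substeps
    c_prev = np.tile(cn, (substeps + 1, 1))       # initial guess c_0(t) = c_n
    for i in range(iters):
        Aimp, Aexp = (A1, A2) if i % 2 == 0 else (A2, A1)
        c_new = np.empty_like(c_prev)
        c_new[0] = cn
        for n in range(substeps):
            g = 0.5 * (c_prev[n] + c_prev[n + 1])  # lagged term on the substep
            f = lambda c: Aimp @ c + Aexp @ g
            c = c_new[n]
            k1 = f(c); k2 = f(c + 0.5 * h * k1)
            k3 = f(c + 0.5 * h * k2); k4 = f(c + h * k3)
            c_new[n + 1] = c + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        c_prev = c_new
    return c_prev[-1]
```

For commuting diagonal operators, a few sweeps already reproduce the exact flow exp(τ(A_1 + A_2)) c_n up to the sub-integration error.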
We define the error function as e_i(t) = c(t) − c_i(t) with e_0(t) = c(t) − c_0, measured in the maximum norm.

Theorem 1. Consider the bounded operators A_l ∈ ℝ^m × ℝ^m for l = 1, . . ., L, where L is the number of operators. The iterations (11)-(14) for the Cauchy problem (10), which are applied with i = 0, L, . . ., (m − 1)L, are of order O(τ^{mL}).
Proof. The result for the 2-level method is given in [38]. We apply a recursive argument to the iterative scheme with L operators and obtain: where C_l is given with C_l = O(τ), see also [38].
Remark 2. The operators are based on the spatial discretization with higher order methods, e.g., [27,31]. Further, we assume that all of the operators, including the fractional discretized operators, are bounded, see [24]. Based on this assumption, we can generalize the results with respect to fractional operators.

Parallel Iterative Splitting Method
In the following, we parallelise the serial iterative splitting approach. We present the following approaches:
• the multi-splitting iterative approach, and
• the two-operator iterative splitting approach.

Multi-Splitting Iterative Approach
The problem is given as

The idea is a multiple decomposition of
where A_l is non-singular and B_l is the remainder matrix. Further, we have the decomposition into the parallel computable vectors: where c_i is the i-th iterative solution and c_{i,l} are the parallel computable solutions in the i-th iterative step. E is the identity matrix and the E_l are diagonal matrices with positive entries. The multisplitting iterative approach is given as: where the initialisation is c_0(t) = c(t_n), and i = 1, . . ., I are the iterative steps. The stopping criterion is ||c_i − c_{i−1}|| ≤ err, and then we have the solution c(t_{n+1}) = c_i(t_{n+1}).
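The weighted recombination c_i = Σ_l E_l c_{i,l} can be illustrated on the algebraic (linear-system) analogue of the multisplitting iteration in the spirit of O'Leary and White; the paper's version acts on the ODE level, and the concrete splittings and weights below are assumptions for illustration.

```python
import numpy as np

def multisplitting_solve(A, b, splittings, weights, x0=None, tol=1e-12, maxit=500):
    """Multisplitting iteration for A x = b.

    `splittings` is a list of pairs (M_l, N_l) with A = M_l - N_l and M_l
    non-singular; `weights` is a list of non-negative diagonal matrices E_l
    with sum_l E_l = I.  The L inner solves are mutually independent and
    could run on separate processors; the weighted sum is the
    synchronisation point of the classical (synchronous) version.
    """
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(maxit):
        # independent inner solves x_{i,l} = M_l^{-1} (N_l x_{i-1} + b)
        locals_ = [np.linalg.solve(M, N @ x + b) for (M, N) in splittings]
        # weighted recombination x_i = sum_l E_l x_{i,l}
        x = sum(E @ xl for E, xl in zip(weights, locals_))
        if np.linalg.norm(A @ x - b) <= tol:       # residual-based stopping
            break
    return x
```

With a Jacobi-type and a lower-triangular splitting and complementary diagonal weights, the iterates converge to the exact solution for diagonally dominant matrices.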
The splitting error of the iterative splitting is of order k + 1, i.e., O(τ^{k+1}).

Here, we consider the classical version of the parallel splitting algorithm, which is applied with synchronisation, see also [45]. We assume to deal with L processors, and i = 1, . . ., I are the iterative steps, while I is the maximum number of iterative steps.
We deal with the parallel splitting in the synchronous version as: . . . (27) where we assume for the initialisation of the first step c_0(t) = 0. E is the identity matrix and the E_l are diagonal matrices with positive entries. A_l and B_l are given in Equations (20) and (21). Furthermore, c_n = c(t_n) is the known split approximation at the time-level t = t_n, which is computed in the previous iterative process. The split approximation at the time-level t = t_{n+1} is defined as c_{n+1} = c_i(t_{n+1}), which is computed in the current iterative process. The solutions are given as: The stopping criterion is given as: The integrals can be solved by higher order integration rules, see Remark 1.
In the classical algorithm, we have a synchronisation point, so that the next iterative step can only start after all processors have submitted their results, see Equation (34). Such a hard synchronisation point allows for applying the simpler stopping criterion given in Equation (35).
The bottleneck of the synchronous algorithm is that the processors that finish early cannot proceed with the next iterative step, and computational time is wasted. Therefore, we explain the ideas of asynchronous algorithms, which are used to apply the parallel splitting method in an asynchronous version.

Asynchronous Algorithm
The idea of an asynchronous algorithm is that each processor works independently and has access to a common memory. If one processor needs an updated solution of another processor, it can read that solution from the common memory. The processors are independent, and the convergence is established with respect to a weighted norm, see [3].
An asynchronous algorithm is defined as follows:

Definition 1. For i ∈ ℕ, let {I(i)}_{i∈ℕ} be the subsets indicating which components are computed at the i-th iteration. Additionally, we have an iteration count s_l(i) ∈ ℕ_0 (prior to i), which indicates the iteration at which the l-th component was computed by processor l.
For i ∈ ℕ, we have I(i) ⊆ {1, . . ., L}, where L is the number of processors, and the retarded indices satisfy s_l(i) ≤ i. Subsequently, we define the asynchronous algorithm: where x^0 = (x^0_1, . . ., x^0_L)^t are the initialisations and H_l is the solver function of the l-th component, see [3].
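The retarded reads x^{s_l(i)} from the common memory can be mimicked sequentially, as sketched below; the contraction map H, the random update sets I(i), and the staleness bound are illustrative assumptions chosen so that the hypotheses of the convergence theory (a contraction in a weighted maximum norm) hold.

```python
import numpy as np

rng = np.random.default_rng(0)

def async_fixed_point(H, x0, sweeps=200, stale=3):
    """Sequential simulation of the asynchronous (chaotic) iteration.

    At step i a random subset I(i) of components is updated; each update of
    component l reads the other components from a randomly retarded iterate
    x^{s(i)} with i - stale <= s(i) <= i, mimicking independent processors
    that read a shared memory at different times.
    """
    history = [np.asarray(x0, dtype=float).copy()]
    x = history[0].copy()
    L = x.size
    for _ in range(sweeps):
        I_i = rng.choice(L, size=rng.integers(1, L + 1), replace=False)
        for l in I_i:
            s = max(0, len(history) - 1 - rng.integers(0, stale + 1))
            x_read = history[s].copy()     # possibly stale shared data
            x_read[l] = x[l]               # own component is always current
            x[l] = H(x_read)[l]            # update the l-th component only
        history.append(x.copy())
    return x
```

For an affine contraction H(x) = M x + b with ||M|| < 1 in the maximum norm, the chaotic iterates still converge to the unique fixed point x* = (I − M)^{-1} b, in agreement with the asynchronous convergence theory of [3].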
In the following, the convergence of the asynchronous algorithm is presented with respect to the weighted norm, see [3].

Definition 2. We assume that each component space E_i is a normed linear space (E_i, || · ||_i). We can define an appropriate norm for the parallel methods, which is given as the weighted maximum norm: where the vector w = (w_1, . . ., w_m)^t is positive in each component, i.e., w_i > 0 for all i = 1, . . ., m.
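A direct transcription of the weighted maximum norm of Definition 2, using the common convention ||x||_w = max_i |x_i| / w_i (the source does not reproduce the formula, so this convention is an assumption):

```python
import numpy as np

def weighted_max_norm(x, w):
    """||x||_w = max_i |x_i| / w_i with positive weights w_i (Definition 2)."""
    w = np.asarray(w, dtype=float)
    assert np.all(w > 0), "all weights must be positive"
    return np.max(np.abs(np.asarray(x, dtype=float)) / w)
```

For w_i = 1 this reduces to the ordinary maximum norm used in the stopping criterion of the modern parallel method.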
Theorem 2. We assume that we have a fixed point x*. Further, we assume that there exists γ ∈ [0, 1) and a positive vector w ∈ ℝ^m, such that the iteration is contractive with factor γ in the weighted maximum norm. Based on these assumptions, we conclude that the asynchronous iterates x^k converge to x*, which is the unique fixed point.

Proof. The proof is given in [3].

Parallel Splitting: Modern Version (Asynchronous Version)
We now apply the asynchronous algorithm given in Definitions 1 and 2, together with the convergence results of Theorem 2. We assume to deal with L processors, and i = 1, . . ., I are the iterative steps, while I is the maximum number of iterative steps.
We deal with the parallel splitting in the asynchronous version as: . . . (44) where we assume for the initialisation of the first step c_0(t) = 0. E is the identity matrix and the E_l are diagonal matrices with positive entries. A_l and B_l are given in Equations (20) and (21). Further, c_n is the known split approximation at time t = t_n, which is computed in the previous iterative process.
The stopping criterion is given as: where w = (w_1, . . ., w_L)^t, and we obtain the maximum norm with w_l = 1 for all l = 1, . . ., L.
The integrals can be solved by higher order integration-rules, see Remark 1.
In the modern algorithm, we do not have a synchronisation point, so each processor can work independently. For the stopping criterion, we apply a weighted norm, which means that all individual results of the processors have to be below the given error bound, see the stopping criterion in Equation (54).

Theoretical Results
In the following, we deal with the m-dimensional initial value problem in the non-homogeneous form (see also the homogeneous form in Equation (10)): where A is conveniently decomposed into two operators, A = M + N, and f is the right-hand side. Further, we deal in the following with proof ideas that are related to waveform-relaxation methods, see [37,45].
The initial value problem (55) is solved with the multisplitting waveform-relaxation method, which is given as: where A is given in Equation (10). Further, c_0(t) = c_0 is the starting condition.
For the multisplitting approach, we have the following definition:

Definition 3. Let L ≥ 1 be the number of splittings, and A, A_l, B_l, E_l real-valued m × m matrices. We say that (A_l, B_l, E_l) for l = 1, . . ., L is a multisplitting triple if:
• the matrices E_l are non-negative diagonal matrices and satisfy ∑_{l=1}^L E_l = I, where I is the identity matrix, and
• s_l(i + 1) ≤ i + 1 indicates the iteration at which the l-th component is computed prior to i + 1.

• The multisplitting approach based on the waveform-relaxation in the classical version is given by:
• The multisplitting approach based on the waveform-relaxation in the modern version is given by:

Stability Analysis
We deal with the following system: where we have k_l(t) = exp(tA_l) B_l for l = 1, . . ., L, and we apply the multisplitting notation (58), with: where k(t) = ∑_{l=1}^L E_l k_l(t), and we obtain the standard waveform-relaxation method as: We can rewrite this in a recursive notation; without loss of generality, we assume f(t) = 0 and obtain: Given a well-conditioned system of eigenvectors, we can consider the eigenvalues λ_{1,l} of A_l and λ_{2,l} of B_l instead of the operators A_l, B_l themselves, for l = 1, . . ., L. For the matrices E_l, we have the eigenvalues λ_{E_l} with 0 ≤ λ_{E_l} ≤ 1 and ∑_{l=1}^L λ_{E_l} = 1. We can rewrite this in the eigenvalue notation and obtain: We assume that all of the initial values c_i(t_n) = c_approx(t_n) with i = 0, 1, 2, . . ., satisfy ||c_approx(t_n) − c_n|| ≤ O(τ^I), where I is the order, following the ideas of the iterative splitting approach [46]. Further, we also assume that λ_{1,l} ≠ λ_{2,l} for l = 1, . . ., L; otherwise, the time-scales are equal and we would not consider the iterative splitting approach, see [34].
In the following, we apply the A(α)-stability.
We have the following proposition:

Proposition 1. Starting with c(t_n) = c_n and a time-step τ = t_{n+1} − t_n, we obtain: where S_i is the stability function of the scheme with i iterations. The S_i are given as:
Proof. We apply complete induction. We start with i = 0 and obtain: For the induction step i → i + 1, we apply Equation (68): then we apply Equation (71) and obtain: This yields the result.
Let us consider the A(α)-stability that is given by the following eigenvalues in a wedge: For the A-stability, we have α = π/2. We assume that z_1 corresponds to the stiff operator, while z_2 is the non-stiff operator.
The stability of the two iterations is given in the following theorem with respect to A and A(α)-stability.
Theorem 3. We have the following stability for the iterative operator splitting scheme (71): For the stability function S_i, where i is the iterative step, we have A-stability with ω ∈ [0, 1]; the initialization is given as c_{−1} = 0 and the initial conditions are c_i(t_n) = c_n.
Proof. We consider z_1 → −∞, while z_2 ≤ 0 is bounded as a non-stiff operator. We apply complete induction. We start with i = 0: with the assumption of the non-stiff operators λ_{2,l} ≤ 0 with l = 1, . . ., L. Further, we also have i = 1: Subsequently, we apply the induction step for i → i + 1: where we applied ||∑_{j=0}^{i} S_j(z_1, z_2, t)|| = 0. Afterwards, we obtain our results; see also the ideas of the stability proofs in [47].

Convergence Analysis
In order to apply the multisplitting waveform-relaxation method (57) and (58), we write the solutions of the individual Equation (57) as: where we have k_l(t) = exp(tA_l)B_l for l = 1, . . ., L. Further, we apply the multisplitting notation (58) and obtain the summations: where k(t) = ∑_{l=1}^L E_l k_l(t), and we obtain the standard waveform-relaxation method as: We assume that Lemma 1 is fulfilled, see also [45].
Lemma 1. The following items are equivalent: We define ||c||_T = max_{t∈[0,T]} |c(t)| as the maximum norm, and we denote by || · || the matrix norm induced by the vector norm | · |.
Based on these assumptions, in the following we derive the errors and convergence results.
In Theorem 4, we derive the error of the i-th approximation, see also [45].
Theorem 4. There exists a constant C := ∑_{l=1}^L C_l, which estimates the kernel k of the multisplitting waveform-relaxation operator, such that we obtain ||k||_T = C. Subsequently, the error of the i-th approximation of the classical multisplitting WR method (57) and (58) is given by

Proof. We have given: We apply Lemma 1 and follow with the iterative approach: where K^i u(t) is the i-fold convolution. There exists: and we have: We apply the estimation of the waveform-relaxation, see [8], and obtain: The error estimate is then given as: Subsequently, we obtain the estimation.

We have the following new convergence Theorem 5, based on the extension of the classical convergence Theorem 4.
Theorem 5. There exists a constant C := ∑_{l=1}^L C_l, which estimates the kernel k of the multisplitting waveform-relaxation operator, such that we obtain ||k||_T = C. Subsequently, the convergence of the modern multisplitting WR method (59) and (60) is given by where i_min = min_{l=1,...,L} s_l(i), and s_l(i) ≤ i are the retarded iterations of the l-th processor.
Proof. We start with the estimation of the i-th iteration: Afterwards, we have the recursion: where we apply ||K^{i_min}||_T ≤ (CT)^{i_min}/i_min!, based on the idea in [8], and we obtain: where i_min = min_{l=1,...,L} s_l(i).
Remark 3. For the parallel error, we obtain order O(τ^m) if we assume that all processors have completed at least m iterative cycles, while, for the serial error, we have order O(τ^{mL}). Thus, in the serial version, we have to apply mL iterative steps in total to obtain the result, while, in the parallel version, we only apply m iterative steps, since L processors share the computation to solve the L sub-equations. Furthermore, we can assume that the sub-equations are faster to solve, because the sub-operators are much smaller and simpler to handle. Subsequently, we have t_sub ≤ t_full/L, where t_sub is the time to solve a sub-problem and t_full the time to solve the full problem. Therefore, we benefit from the parallel distribution and reach the higher order O(τ^{mL}) faster than with the serial version.

Remark 4.
While the classical version of the parallel splitting method has order O(τ^m), see Theorem 4, the modern version of the parallel splitting method can partially improve this to higher order. We obtain O(τ^{m_partial}) with m ≤ m_partial ≤ m L_fast when we split into slow and fast convergent processors and assume that the results of the fast convergent processors are sufficient for the update. We can define a new c_i = ∑_{l∈Fast} Ẽ_l c_{i,l} with E = ∑_{l∈Fast} Ẽ_l, where the set Fast contains the fast convergent processors, see also the ideas of [37,48]. Therefore, we circumvent the slow processors and, later, redistribute the decomposition of the matrices to obtain a more balanced load of the processors.

Numerical Examples
In what follows, we deal with different numerical examples, which are motivated by real-life applications in fluid-flow and phase-field problems. We verify and test the theoretical results of the novel parallel iterative splitting approaches.
We deal with:

First Example: Matrix Problem
In the first test example, we consider the following matrix equation, which we split as follows:
• two-operator approach,
• multiple-operator approach,
where E_1 and E_2 are given as: We include Tables 1 and 2, which correspond to the classical and modern versions of the multi-splitting iterative approach with the above partitions, using different discretizations of [0, 1] with step h, allowing for a maximum of 10 iterations and a tolerance of 10^{−3}. The results show the relative and absolute errors for each component of the solution and the average number of iterations performed in order to reach the tolerance.
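Since the concrete test matrix of the example is not reproduced here, the following sketch uses hypothetical 2 × 2 operators A_1, A_2 to illustrate how the splitting error decreases with the number of iterative sweeps (cf. Theorem 1); the matrices, step size, and sub-integrator are assumptions for illustration only.

```python
import numpy as np

def splitting_error_per_sweep(A1, A2, c0, tau, max_sweeps=4, substeps=400):
    """Error of two-operator iterative splitting after 1..max_sweeps sweeps,
    measured against the exact solution exp(tau*(A1+A2)) c0 (computed via
    eigendecomposition, so A1 + A2 is assumed diagonalisable)."""
    A = A1 + A2
    lam, V = np.linalg.eig(A)
    c_exact = (V @ np.diag(np.exp(lam * tau)) @ np.linalg.inv(V)).real @ c0

    h = tau / substeps
    c_prev = np.tile(c0, (substeps + 1, 1))       # initial guess c_0(t) = c_n
    errors = []
    for _ in range(max_sweeps):
        c_new = np.empty_like(c_prev)
        c_new[0] = c0
        for n in range(substeps):
            g = 0.5 * (c_prev[n] + c_prev[n + 1])  # lagged previous sweep
            f = lambda c: A1 @ c + A2 @ g
            c = c_new[n]
            k1 = f(c); k2 = f(c + 0.5 * h * k1)
            k3 = f(c + 0.5 * h * k2); k4 = f(c + h * k3)
            c_new[n + 1] = c + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        c_prev = c_new
        errors.append(np.linalg.norm(c_prev[-1] - c_exact))
    return errors
```

Each additional sweep gains roughly one order in τ, so the error list decreases quickly for moderate step sizes.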
Figure 1 shows the influence of the tolerance value on the error of the modern version of the algorithm. Each group of bars represents the error for the different step sizes indicated in the legend. It can be seen that the error decreases with the step h if the tolerance is small enough. This is the case except for tol = 10^{−2}, where higher errors appear for smaller steps. Looking at the bars that correspond to the same step and different tolerances, it can be observed that the error for a given temporal step h reaches a minimum as the tolerance decreases; beyond this point, the errors stabilize or even increase slightly for smaller tolerances.

Remark 5. We applied the multisplitting method with the classical (synchronous) and the modern (chaotic) approach. We obtain the same accuracy of the numerical results, which means that the methods are equally accurate. The modern method yields some additional benefit if we apply large time-steps, such that the solution of one sub-problem can be achieved faster and benefit the solution of the second sub-problem. For such small computational imbalances, the modern approach is more efficient.

Second Example: Diffusion Problem
We deal with the following diffusion problem: where we have the analytical solution u_ana(x, t) = exp(−3t) sin x sin y sin z, with x = (x, y, z)^t. In operator notation, we write: where the sub-operators contain the directional second derivatives, and we assume that the zero-boundary conditions (Dirichlet boundary conditions) are fulfilled.
The problem is discretized by using a four-dimensional (4D) mesh in Ω × [0, T]. Denote by u_{i,j,k,t} the approximated value of the solution at node (x_i, y_j, z_k, t) for a given t. For the time-integration, we apply the integral formulation, see Equations (15)-(18).
For the spatial discretization, we test a second order scheme: and a fourth order scheme: where we have the analogous operators for the y and z derivatives. In order to establish the convergence of the algorithms, we compute the solution u(∆x, h) obtained using spatial and temporal steps ∆x = ∆y = ∆z and h, respectively. We use different measures to estimate the convergence. On the one hand, we can compare the outcome of the method u(∆x, h) with the exact solution u_ana at every point of the mesh, which shows the convergence of the method. On the other hand, we can compare u(∆x, h) with the result obtained by halving the time step, h/2, at the points shared by the corresponding meshes. This allows for analyzing how the results depend on these steps.
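The two stencils can be sketched in one dimension as follows, using the standard central-difference formulas; the zero ghost-value closure at the boundary is a simplification consistent with homogeneous Dirichlet data and is not necessarily the paper's boundary treatment.

```python
import numpy as np

def laplacian_1d(u, dx, order=2):
    """Approximate u'' on a uniform interior grid with zero Dirichlet data.

    order=2: (u_{i-1} - 2 u_i + u_{i+1}) / dx^2
    order=4: (-u_{i-2} + 16 u_{i-1} - 30 u_i + 16 u_{i+1} - u_{i+2}) / (12 dx^2)

    Ghost values outside the domain are taken as zero (a simplification;
    near the boundary the 4th-order stencil then loses formal accuracy).
    """
    up = np.pad(u, 2)                    # zero padding as Dirichlet closure
    if order == 2:
        return (up[1:-3] - 2 * up[2:-2] + up[3:-1]) / dx**2
    if order == 4:
        return (-up[:-4] + 16 * up[1:-3] - 30 * up[2:-2]
                + 16 * up[3:-1] - up[4:]) / (12 * dx**2)
    raise ValueError("order must be 2 or 4")
```

For u = sin(πx) on (0, 1), both stencils approximate u'' = −π² sin(πx): the second order scheme with error O(∆x²) everywhere, and the fourth order scheme with error O(∆x⁴) away from the boundary.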
Denote by e_{i,j,k}(∆x, h) the difference between the results at a mesh point at the final time T, (x_i, y_j, z_k, T), obtained using time steps h and h/2, and by δ_{i,j,k}(∆x, h) the difference with the analytical solution at the same point. In the tables, we denote the corresponding maximum errors by e_max and δ_max, and the mean errors by e_mean and δ_mean, where N is the number of spatial nodes at time T.
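The four error measures can be transcribed directly; the array-based form below is an assumption about how the mesh values are stored.

```python
import numpy as np

def error_measures(u_h, u_h2, u_exact):
    """Maximum and mean error measures used in the tables:
    e compares the solutions for time steps h and h/2 (self-convergence),
    delta compares the solution for step h with the analytical one."""
    e = np.abs(np.asarray(u_h) - np.asarray(u_h2))
    d = np.abs(np.asarray(u_h) - np.asarray(u_exact))
    return e.max(), e.mean(), d.max(), d.mean()
```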
In the following, we discuss different decompositions of the multi-operator splitting approach:
• Directional decomposition: We decompose into the different directions: Here, we have the benefit of decomposing along the different directions.
The drawback is related to the unbalanced decomposition, where the matrices have different sparsity patterns. Therefore, the exponentials of the operator matrices differ in their sparsity behaviour, and the error cannot be optimally reduced.
We can reduce the unbalanced problem by using ∆t ≈ ∆x, see [38]. Subsequently, we obtain at least a second order scheme (related to the spatial discretization).
We compare our sequential and parallel iterative splitting methods with standard ones, such as the sequential operator splitting [49] and the Strang-Marchuk splitting [50]. We apply the splitting algorithms with directional decomposition in [0, 1]. The splitting is iterated until a tolerance of 10^{−8} or a maximum of 10 iterations is reached. The values shown in the tables correspond to maximum or mean values at T = 1. Tables 3 and 4 present the results obtained using different numbers of temporal steps and the second and fourth order schemes for the spatial discretization, respectively.
The standard methods, sequential operator and Strang-Marchuk splitting, give almost the same results independently of the number of time steps, because of the linearity of the equation. The e error estimates are negligible, whereas the δ error estimates are independent of the time step, reflecting the spatial discretization error. Tables 3 and 4 show that the estimated errors e_max and e_mean of the iterative splitting methods are proportional to h for the second order scheme of discretization of the directional derivatives, whereas they are proportional to h² for the fourth order scheme. The mean differences with the analytical solution, δ_mean, of the iterative splitting methods tend to those of the non-iterative ones as h decreases. The δ error estimates are more than one order of magnitude better for the fourth order scheme than for the second order scheme. Figures 2 and 3 depict these results.

Now, we analyze the influence of the number of spatial intervals on the convergence properties of the iterative splitting algorithms. The spatial step is reduced simultaneously in the three dimensions, in order to keep the increments equal, because the performance of the method is better in this case. The time step is small enough to ensure convergence in the case of the smallest spatial subinterval and is the same in all cases, so that the comparison depends only on the spatial step.
For the second order scheme (see Table 5), the e error estimates are proportional to the number of spatial subintervals in each dimension, whereas the δ error estimates decrease, indicating a better approximation to the analytical solution as the spatial step decreases. The parallel methods approximate the analytical result slightly less accurately, and they require more iterations than the serial iterative method. The modern parallel method needs more iterations per step to converge than the classical parallel method.
For the fourth order scheme (see Table 6), the e error estimates are unaffected by the number of spatial subintervals, so that the convergence is maintained. The δ error estimates are proportional to a power of the spatial step of degree between 3 and 5, greatly improving the approximation of the second order scheme.
• Balanced decomposition: We decompose into: Here, we have the benefit of equal load balancing of the matrices, such that the exponential matrices have the same sparse structure. The results of the splitting algorithms with balanced decomposition are shown in Tables 7 and 8 for the second and fourth order schemes for the spatial discretization, and in Figures 6 and 7, respectively. The numerical behaviour is similar to that of the directional decomposition.
• Mixed decomposition: We decompose into: where the weighting parameter ranges in [0, 2/3]. For the value 0, we have the directional decomposition, while, for the value 2/3, we have the balanced decomposition.

Remark 6.
Based on the balanced decomposition with the parameter value 2/3, we do not have problems with the splitting approaches and obtain optimal results. For the pure unbalanced decomposition, i.e., the parameter value 0, we decompose into the different directions. Here, we have to restrict ourselves to the exact second order approach, which requires ∆t ≈ ∆x.
Remark 7. The classical and modern parallel iterative splitting methods benefit from larger time steps combined with more iterative steps. In such an optimal configuration, they are much faster than the serial version, and the result is also more accurate. For very fine time steps, we do not see an improvement in the accuracy, but we still see a benefit in the computational time; the parallel versions are faster.

Third Example: Mixed Convection-Diffusion and Burgers' Equation
We consider a partial differential equation, which is a two-dimensional (2D) example of a mixed convection-diffusion and Burgers' equation. Such mixed PDEs are used to model fluid flow problems in traffic or population dynamics, see [15,16,51]. For testing the numerical methods, we consider a Burgers' equation for which we can find an analytical solution. The model problem is: u(x, y, 0) = u_ana(x, y, 0), (x, y) ∈ Ω, (127) where the remaining coefficients are defined through the analytical solution and µ is the viscosity. The analytical solution u_ana is given accordingly, and we compute f(x, y, t) from it. As in the previous example, denote by u(∆, h) the numerical solution obtained using spatial subintervals of amplitude ∆ = ∆x = ∆y, time steps h, and a tolerance of tol = 10^−6, allowing a maximum of 40 iterations. On the one hand, we will compare the numerical solution with the exact one u_ana for every point of the mesh, which shows the convergence of the method. On the other hand, we will compare u(∆, h) with the result obtained halving the time steps, h/2, at the points that are shared by the corresponding meshes. Denote by e_{i,j,k}(∆, h) the difference between the results obtained using two different time steps, h and h/2, at a common mesh point (x_i, y_j, t_k), and by δ_{i,j,k}(∆, h) the difference with the analytical solution at the same point. In the tables, we will use the maximum error estimates e_max and δ_max and the mean errors e_mean and δ_mean, where N is the total number of nodes (x_i, y_j, t_k).
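The step-halving and analytical error measures can be sketched in a few lines. The function name and array layout below are illustrative assumptions, and, since the explicit formulas were lost in this copy, we take the usual definitions: maximum and arithmetic mean of the absolute differences over the shared mesh points.

```python
import numpy as np

def error_estimates(u_h, u_h2, u_ana):
    """Estimate convergence errors on the shared mesh points.

    u_h   : solution with time step h, sampled at the shared nodes
    u_h2  : solution with time step h/2, sampled at the same nodes
    u_ana : analytical solution at the same nodes
    """
    e = np.abs(u_h - u_h2)    # step-halving differences e_{i,j,k}
    d = np.abs(u_h - u_ana)   # differences to the analytical solution, delta_{i,j,k}
    return {"e_max": e.max(), "e_mean": e.mean(),
            "d_max": d.max(), "d_mean": d.mean()}
```

The e estimates measure self-convergence (no exact solution needed), while the δ estimates measure the true error; comparing both separates the temporal from the spatial discretization error.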
We measure the time tp_{k,l,m} consumed by processor l in iteration m on [t_k, t_{k+1}] in order to obtain the temporal cost of the parallel schemes. In the classical parallel scheme, the processors synchronize at each iteration, so the cost for this time interval is tp_k = ∑_m max_{l=1,2} tp_{k,l,m}, whereas, in the modern parallel scheme, the processors iterate independently in [t_k, t_{k+1}], performing m_l, l = 1, 2, iterations until convergence; thus, the cost is tp_k = max_{l=1,2} ∑_{m=1}^{m_l} tp_{k,l,m}. The final cost is obtained by adding the results of all time subintervals.
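The two cost formulas can be illustrated with a small numerical sketch; the per-iteration times below are hypothetical values for a single interval [t_k, t_{k+1}] with two processors.

```python
import numpy as np

# Hypothetical per-iteration wall times tp[l, m] for one time interval:
# rows = processors l, columns = iterations m (padded with 0 after convergence).
tp = np.array([[0.40, 0.10, 0.10, 0.00],   # processor 1 converges after 3 iterations
               [0.10, 0.40, 0.10, 0.10]])  # processor 2 needs 4 iterations

# Classical parallel scheme: processors synchronize after every iteration,
# so each iteration costs as much as the slowest processor in that iteration.
cost_classical = np.max(tp, axis=0).sum()   # sum_m max_l tp[l, m]

# Modern parallel scheme: processors iterate independently until convergence,
# so the interval costs as much as the busiest processor in total.
cost_modern = tp.sum(axis=1).max()          # max_l sum_m tp[l, m]
```

Since the maximum of sums never exceeds the sum of maxima, the modern (asynchronous) cost is always at most the classical (synchronous) cost for the same iteration times; here 0.70 versus 1.00.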
In the following, we discuss different decompositions of the multi-operator splitting approach:

• Directional decomposition
We decompose into the different directions (x and y directions). Let us first analyze the influence of the parameter µ on the convergence of the algorithms. Tables 9 and 10 show the error estimations of the considered algorithms for different values of µ, using the second and the fourth order discretization schemes, respectively, taking 10 subintervals in each spatial dimension and 640 time steps. The results for the different algorithms are similar. As could be expected, higher viscosity values give lower error estimates. Figure 8 shows the dependence on µ of the values of e_mean for the different algorithms, using second and fourth order approximations for the spatial derivatives. In contrast with the former example, the use of higher order approximations for the spatial derivatives produces only a slight improvement in the error estimations, which in both cases are of the same order with respect to the time step. As we can observe, the mean error is approximately proportional to the inverse of the viscosity µ, and it is not much affected by the order of approximation of the spatial derivatives. The number of iterations and the execution time are not affected by the viscosity changes, except for the case of µ = 2 in the modern parallel algorithm, where the maximum number of iterations is reached in each step, also consuming more execution time. In what follows, we will take the viscosity parameter µ = 0.5 and use second order approximations for the spatial derivatives.
We will now analyze the influence of the number of time steps on the convergence of the algorithms.
Table 11 shows that the estimated error is roughly proportional to the time step, although, in the end, the differences δ with the analytical solution decrease more slowly, due to the discretization error. All of the considered methods have errors of the same order. The sequential operator splitting method presents higher estimates than the iterative schemes for the maximum error, but lower estimates of the mean error, as seen in Figure 9.

Now, we analyze the dependence on the number of spatial subintervals. Table 12 displays the errors obtained varying the number of space subintervals, considering 1280 time steps, µ = 0.5, tolerance 10^−6, and a maximum number of 40 iterations. Figure 10 shows, on the left, that the convergence properties degrade with the number of spatial steps for a fixed time step and, on the right, that the best approximations are obtained for 10 subintervals in each spatial dimension. The error increment for more space intervals can be due to the fact that the condition number of the exponential matrix utilized to solve the differential system increases rapidly with the number of spatial intervals, making the solution less reliable.
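The claimed growth of the condition number can be checked on a minimal 1D model. The sketch below assumes a symmetric second-order discrete Laplacian with Dirichlet boundaries, whereas the example above uses 2D operators, so the numbers are only indicative of the trend.

```python
import numpy as np

def cond_expm_diffusion(n, mu=0.5, dt=1e-3):
    """Condition number of exp(dt * mu * L) for the 1D second-order
    Laplacian L on n subintervals of [0, 1] with Dirichlet boundaries.

    L is symmetric, so the condition number of its exponential is
    exp(dt * mu * (lam_max - lam_min)), with lam the eigenvalues of L.
    """
    h = 1.0 / n
    k = np.arange(1, n)  # interior mode indices
    lam = -(4.0 / h**2) * np.sin(k * np.pi / (2 * n))**2  # eigenvalues of L
    return np.exp(dt * mu * (lam.max() - lam.min()))
```

Since the extreme eigenvalue grows like n², the condition number grows roughly like exp(c·n²) with the number of spatial subintervals at a fixed time step, which is consistent with the loss of reliability observed for fine meshes.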

• Convection and diffusion decomposition
Here, we decompose to an explicit part, which is the convection, and into an implicit part, which is the diffusion.
Table 13 analyzes the convergence of the different splitting methods for the convection and diffusion decomposition, varying the time step. The errors are proportional to the time step, as shown in the log-log diagrams in Figure 11. The iterative methods have slightly better accuracy than the reference methods, particularly better than the sequential operator splitting method. The modern parallel version does not converge for 160 time steps. Table 14 analyzes the dependence on the spatial step. The behavior is similar to the case of the directional decomposition, with the estimated error increasing with the number of spatial subintervals for a fixed time step. In the second order scheme, the δ-errors decrease with the space step. The temporal cost is relatively high in the case of 20 subintervals, due to the computational overhead of dealing with large matrices. Figure 12 compares the estimated mean errors e_mean and δ_mean of the different methods with the convection-diffusion decomposition and second order approximation of the spatial derivatives for different numbers of spatial steps. The sequential operator splitting has poorer convergence properties than the other methods, and its result differs more from the analytical solution as the number of spatial nodes increases.
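The explicit convection/implicit diffusion treatment can be sketched, for illustration, as a 1D IMEX Euler step for Burgers' equation on a periodic grid; this is a simplified stand-in for the 2D splitting iterations considered here, with illustrative function and variable names.

```python
import numpy as np

def imex_step(u, dt, dx, mu):
    """One IMEX Euler step for u_t + u u_x = mu u_xx on a periodic 1D grid:
    the convection term is treated explicitly, the diffusion implicitly."""
    n = u.size
    # Explicit convection: central difference of the flux u^2 / 2.
    conv = (np.roll(u, -1)**2 - np.roll(u, 1)**2) / (4 * dx)
    rhs = u - dt * conv
    # Implicit diffusion: solve (I - dt*mu*L) u_new = rhs,
    # with L the periodic second-difference (Laplacian) matrix.
    L = (np.roll(np.eye(n), 1, axis=1) - 2 * np.eye(n)
         + np.roll(np.eye(n), -1, axis=1)) / dx**2
    return np.linalg.solve(np.eye(n) - dt * mu * L, rhs)
```

The implicit treatment of the stiff diffusion operator removes the severe ∆t ~ ∆x² stability restriction, while the explicit convection step keeps the nonlinear part cheap; this is the motivation for the convection-diffusion decomposition.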

• Balanced decomposition
We decompose with a balancing parameter that can be tuned in order to achieve the maximum efficiency. We first examine the influence of this parameter on the convergence of the different splitting schemes, see Table 15. The method shows the same behavior for parameter values symmetric with respect to 0.5. The results are quite uniform for parameter values in the range [−0.1, 1.1], whereas, for other parameter values, the method may diverge. The classical parallel algorithm still yields results for the parameter value 2, whereas the serial and the modern parallel iterative methods fail for this value. The dependence on the time and space steps of the algorithms with the convection-diffusion decomposition is similar to that of the previously analyzed decompositions. Table 16 compares the behavior of the considered methods with different decompositions. Figure 13 shows that the non-iterative splitting methods give better results with the balanced decomposition, whereas the iterative methods give similar results with all of the decompositions, being slightly better for the directional decomposition.
Figure 14 compares the temporal costs that are shown in Table 16. The results indicate that the directional decomposition is better than the convection-diffusion decomposition and the balanced decomposition for all of the considered splitting algorithms.
In operator notation, we write the problem with A_1 = d(x) ∂^α/∂x^α and A_2 = e(x) ∂^β/∂y^β + q(x, t), where we assume that the Dirichlet boundary conditions are embedded into the operators.
We apply the normalized Grünwald weights of the right-shifted Grünwald formula, see [32], and discretize e(x_i, y_j) ∂^β u(x_i, y_j, t)/∂y^β analogously. In order to establish the convergence of the algorithms, we compute the solution u(∆, h) obtained while using spatial and temporal steps ∆ = ∆x = ∆y and h, respectively. We use different measures to estimate the convergence. On the one hand, we compare the outcome of the method u(∆, h) with the exact solution u_ana for every point of the mesh, which shows the convergence of the method. On the other hand, we can compare u(∆, h) with the result obtained halving the time or space steps, h/2, at the final time T = 1. Denote by e_{i,j}(∆, h) the difference between the results at a mesh point (x_i, y_j, 1), obtained using two different time steps, h and h/2, and by δ_{i,j}(∆, h) the difference with the analytical solution at the same point. In the tables, we will denote the maximum errors by e_max and δ_max and the mean errors by e_mean and δ_mean, where N is the number of spatial nodes at time T.
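The normalized Grünwald weights and the right-shifted Grünwald formula can be sketched as follows. The recurrence g_0 = 1, g_k = g_{k−1}(k − 1 − α)/k for g_k = (−1)^k (α choose k) is the standard one, see [32]; the function names are illustrative.

```python
import numpy as np

def grunwald_weights(alpha, n):
    """Normalized Gruenwald weights g_k = (-1)^k * binom(alpha, k),
    computed by the recurrence g_0 = 1, g_k = g_{k-1} * (k - 1 - alpha) / k."""
    g = np.empty(n + 1)
    g[0] = 1.0
    for k in range(1, n + 1):
        g[k] = g[k - 1] * (k - 1 - alpha) / k
    return g

def shifted_grunwald(u, alpha, h):
    """Right-shifted Gruenwald approximation of the left-sided fractional
    derivative of order alpha at the grid nodes:
    D^alpha u(x_i) ~ h^(-alpha) * sum_{k=0}^{i+1} g_k * u_{i-k+1}."""
    n = u.size
    g = grunwald_weights(alpha, n)
    d = np.zeros(n)
    for i in range(n - 1):  # nodes for which the right-shifted value exists
        d[i] = sum(g[k] * u[i - k + 1] for k in range(i + 2)) / h**alpha
    return d
```

For α = 1, the weights reduce to (1, −1, 0, …) and the shifted formula becomes a standard first-order difference, which is a quick sanity check; the shift is what makes the resulting discretization matrices diagonally dominant and hence stable for 1 < α ≤ 2.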
In the following, we discuss different decompositions of the multi-operator splitting approach:

• Directional decomposition: We decompose into the different directions. The directional decomposition allows for obtaining the solution by solving linear systems of size n_x − 1, where n_x is the number of spatial subintervals. Table 17 shows the results of the considered splitting algorithms for 20 spatial subintervals and different time steps, allowing for a maximum of three iterations and using tolerance 10^−4. Figure 15 shows, on the left, that the parallel iterative methods perform slightly better than the serial iterative method and, on the right, the approximation to the analytical solution for different numbers of time steps. Table 18 shows the results of the considered splitting algorithms for 320 time steps and different numbers of spatial subintervals, allowing for a maximum of three iterations and using tolerance 10^−4. Figure 16 presents the results graphically.

• Balanced decomposition: We decompose such that the matrices have equal load balances and the exp-matrices have the same sparse structure.
Table 19 shows the results of the considered splitting algorithms for 20 spatial subintervals and different time steps, allowing a maximum of three iterations and using tolerance 10^−4. Figure 17 shows, on the left, that the parallel iterative methods perform slightly better than the serial iterative method and, on the right, the approximation to the analytical solution for different numbers of time steps. Table 20 shows the results of the considered splitting algorithms for 320 time steps and different numbers of spatial subintervals, allowing a maximum of three iterations and using tolerance 10^−4. Figure 18 graphically presents the results.

• Mixed decomposition
We decompose into the different directions with a mixing parameter in the interval [0, 1/2]. For the parameter value 0, we have the directional decomposition, while, for 1/2, we have the balanced decomposition. Table 21 shows the influence of the mixing parameter on the convergence of the different splitting methods with mixed decomposition for the fractional diffusion problem, using 20 subintervals in each spatial direction and 320 time steps. The same information can be seen in Figure 19.

Compared with the serial method, the parallel distribution requires additional iterative steps for each processor, and the memory is distributed to all processors. For large-scale numerical experiments, we could demonstrate the benefit of the parallel resources.
In our proposed iterative methods, we gain accuracy if we apply more iterative cycles, at the cost of additional computational time. Due to the application of parallel ideas, we could reduce the computational cost more than with serial iterative methods. On the other hand, it is important to optimize the parallel amount of work with additional adaptive and distributed ideas that improve the efficiency of the proposed parallel methods.
In summary, we optimize both the reduction of the computational time and the accuracy of the results with the help of parallel iterative splitting methods. In the future, we will consider more real-life problems and stochastic processes. Furthermore, we will extend the parallel iterative methods to more efficient adaptive schemes.
where A_l are the sub-operators for the iterative part (solver part) and B_l = A − A_l are the sub-operators for the relaxation part (right-hand side part).

Figure 1 .
Figure 1. Matrix problem. Precision of the modern version of the multi-splitting iterative approach in terms of the time step h and the tolerance.

Figure 8 .
Figure 8. Mixed convection-diffusion and Burgers' equation, directional decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, for different viscosities µ. Left: order 2 approximation; right: order 4 approximation for the discretization of the spatial derivatives.

Figure 9 .
Figure 9. Mixed convection-diffusion and Burgers' equation, directional decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, for different number of time steps.

Figure 10 .
Figure 10. Mixed convection-diffusion and Burgers' equation, directional decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, for different number of spatial steps in the directional decomposition.

Figure 11 .
Figure 11. Mixed convection-diffusion and Burgers' equation, convection-diffusion decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, for different number of time steps.

Figure 12 .
Figure 12. Mixed convection-diffusion and Burgers' equation, convection-diffusion decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, for different number of spatial steps.

Figure 13.
Figure 14.
Figure 13. Mixed convection-diffusion and Burgers' equation with 10 spatial intervals and 1280 temporal steps. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, for different decomposition methods.

Figure 16 .
Figure 16. Fractional diffusion equation, directional decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, for 320 temporal steps and different number of spatial subintervals.

Figure 17 .
Figure 17. Fractional diffusion equation, balanced decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, for different number of time steps.

Figure 18 .
Figure 18. Fractional diffusion equation, balanced decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, for 320 temporal steps and different number of spatial subintervals.

Table 4 .
Diffusion problem. Results of the directional decomposition using the fourth order scheme for the spatial discretization, with 10 subintervals in each dimension and different number of temporal steps.
Figure 2. Diffusion problem, directional decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, as compared with the classical ones: sequential operator SO and Strang-Marchuk SM, with second order approximations for the spatial derivatives for different number of time steps.
Figure 3. Diffusion problem, directional decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, with fourth order approximations for the spatial derivatives for different number of time steps.

Table 6 .
Diffusion problem, directional decomposition. Results using the fourth order scheme for the spatial discretization, with 320 temporal steps and different number of subintervals in each dimension.
Diffusion problem, directional decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, with second order approximations for the spatial derivatives with 320 time steps and different number of spatial steps.

Table 7 .
Diffusion problem. Results of the balanced decomposition using the second order scheme for the spatial discretization, with 10 spatial subintervals and different number of temporal steps.

Table 8 .
Diffusion problem. Results of the balanced decomposition using the fourth order scheme for the spatial discretization, with 10 spatial subintervals and different number of temporal steps.
Diffusion problem, balanced decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, with second order approximations for the spatial derivatives for different number of time steps.

Table 9 .
Mixed convection-diffusion and Burgers' equation. Results of the directional decomposition using the second order scheme for the spatial discretization, with 10 spatial subintervals, 640 temporal intervals, and different values of µ.

Table 10 .
Mixed convection-diffusion and Burgers' equation. Results of the directional decomposition using the fourth order scheme for the spatial discretization, with 10 spatial subintervals, 640 temporal intervals, and different values of µ.

Table 11 .
Mixed convection-diffusion and Burgers' equation. Results of the directional decomposition with 10 spatial subintervals and different number of temporal steps.

Table 12 .
Mixed convection-diffusion and Burgers' equation. Results of the directional decomposition using the second order scheme for the spatial discretization, with 1280 time subintervals and different number of spatial steps.

Table 13 .
Mixed convection-diffusion and Burgers' equation. Results for the convection-diffusion decomposition using the second order scheme for the spatial discretization, with 10 spatial subintervals and different number of temporal steps.

Table 15 .
Mixed convection-diffusion and Burgers' equation, balanced decomposition. Results for the balanced decomposition using the second order scheme for the spatial discretization, with 640 temporal subintervals and different values of the balancing parameter.

Table 17 .
Fractional diffusion equation. Results for the directional decomposition with 20 spatial subintervals and different number of temporal steps.
Figure 15. Fractional diffusion equation, directional decomposition. Precision of the proposed methods: sequential iterative SI, classical parallel CP, and modern parallel MP, compared with the classical ones: sequential operator SO and Strang-Marchuk SM, for different number of time steps.