Why Improving the Accuracy of Exponential Integrators Can Decrease Their Computational Cost?

Abstract: In previous papers, a technique was suggested to avoid order reduction when integrating initial boundary value problems with several kinds of exponential methods. In principle, the technique requires calculating, at each step, additional terms beyond those which are already necessary when order reduction is not avoided. The aim of the present paper is to explain the surprising result that, in many cases, in spite of having to calculate more terms at each step, the computational cost of doing it through Krylov methods decreases instead of increasing. This is very interesting since, in that way, the methods improve not only in terms of accuracy, but also in terms of computational cost.


Introduction
Exponential methods have become a valuable tool to integrate initial boundary value problems due to the recent development of techniques which allow us to calculate exponential-type functions more efficiently. From the paper in [1] to that in [2] twenty-five years later, several advances were made. Apart from that, Krylov methods have become especially interesting in the context of the space discretization of boundary value problems because the matrices which turn up there are sparse. These iterative methods reduce the computation of an exponential-type function of those large matrices applied over a certain vector to the computation of a function of a much smaller matrix. In that sense, not only polynomial Krylov methods [3][4][5][6][7][8][9][10] have been considered, but also rational Krylov methods [11][12][13][14][15][16][17][18], which lead to a much better convergence rate but require solving linear systems during the iteration process.
On the other hand, exponential methods were first introduced and analysed for the case of homogeneous boundary conditions [19]. Even in that case, order reduction is observed with respect to their classical counterparts unless more stringent conditions of annihilation on the boundary or periodic boundary conditions are considered (see [20,21] for the analysis with Lawson methods and [22] for splitting ones). With exponential Runge-Kutta methods, stiff order conditions have been derived [23] which may increase the number of stages which are necessary to get a given order.
In order to avoid order reduction with all these methods (without increasing the number of stages and even for time-dependent boundary conditions), some techniques have been suggested in [24][25][26][27][28][29][30]. What we have observed in some numerical experiments with the techniques in those papers is that, when avoiding order reduction, not only does the error diminish more quickly with the stepsize, but also, for the same moderate stepsize, the error is smaller and, what is more impressive, the computational cost for the same stepsize is smaller. This is striking because increasing the order of the method implies calculating more terms at each step. However, by using the Krylov subroutine phipm.m in [8] in MATLAB to calculate those terms, we have checked in some cases that the computational cost is reduced. More precisely, Figure 6.1 in [30] shows this phenomenon when avoiding order reduction with a second-order Lawson method and Figure 1 in [28] does the same with a second-order exponential Runge-Kutta method.
The aim of the paper is to understand why that reduction in computational cost for the same stepsize happens when avoiding order reduction. For that, we centre on symmetric discretizations of the differential operator of the problem at hand. Moreover, we will consider both the standard Krylov method and the purely rational Krylov one.
The paper is structured as follows. Section 2 gives some preliminaries on the problem to integrate and the technique to avoid order reduction. In Section 3, the standard Krylov technique and the commonly used adaptive subroutine phipm.m are described. It is then analysed how the subroutine works when the technique to avoid order reduction is applied in the naive way. Then, a cleverer way to apply the same subroutine is suggested, as well as an improvement of the subroutine for the case at hand. Finally, in Section 4, the behaviour of the purely rational Krylov method is also studied when applied inside the technique to avoid order reduction, and it is again justified that increasing the order (and therefore the accuracy) may be cheaper than not doing it. The main conclusions are stated in Section 5.

Preliminaries
The technique in [21,[24][25][26][27]29,30] to avoid order reduction when integrating linear and non-linear initial boundary value problems with exponential splitting, Lawson and Runge-Kutta methods is based on applying the method of lines in the reverse order to the usual one: integrating first in time and then in space.
More precisely, we assume that the problem to integrate is

u'(t) = Au(t) + f(t, u(t)),  0 ≤ t ≤ T,
∂u(t) = g(t),
u(0) = u_0,          (1)

where ' corresponds to differentiation with respect to time, X and Y are function spaces, ∂ : D(A) ⊂ X → Y corresponds to a boundary operator, A : D(A) → X to a differential space operator, and f : [0, T] × D(A) → X is a smooth source term. For a more detailed description of the assumptions, look at [21,[24][25][26][27]29,30]. When considering the time integration of this problem with exponential methods, exponential-type functions turn up. We remind that ϕ_0(z) = e^z and that, for j ≥ 1,

ϕ_j(z) = ∫_0^1 e^{(1−θ)z} θ^{j−1}/(j−1)! dθ.

For example, ϕ_1(z) = (e^z − 1)/z and ϕ_2(z) = (e^z − 1 − z)/z². Then, as the exponentials and ϕ_j-functions should be applied over a differential operator instead of a matrix, those terms were suggested to be substituted by the solution of initial boundary value problems with appropriate boundaries. More precisely, e^{τA}α has been understood as the solution of problem (3) when α ∈ D(A^{r+1}), while ϕ_j(τA)α (j ≥ 1) has been understood as the solution of problem (4). Suitably increasing the value of r in these expressions implies increasing the local order of the method until the classical order is achieved (i.e., the order obtained when a non-stiff ordinary differential system is integrated).
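As a quick illustration of these ϕ-functions (a sketch of ours in Python, not part of the paper's MATLAB code; the function name is hypothetical), the standard recurrence ϕ_{j+1}(z) = (ϕ_j(z) − 1/j!)/z evaluates any ϕ_j from the exponential and reproduces the closed forms of ϕ_1 and ϕ_2 given above:

```python
import math

def phi(j, z):
    """Evaluate phi_j(z) for a scalar z != 0 via the recurrence
    phi_{j+1}(z) = (phi_j(z) - 1/j!)/z, starting from phi_0(z) = exp(z).
    (For |z| very small this recurrence suffers cancellation and a
    Taylor series should be used instead.)"""
    val = math.exp(z)                      # phi_0(z)
    for k in range(j):
        val = (val - 1.0 / math.factorial(k)) / z
    return val

# closed forms from the text: phi_1(z) = (e^z - 1)/z, phi_2(z) = (e^z - 1 - z)/z^2
z = -2.0
print(abs(phi(1, z) - (math.exp(z) - 1) / z))          # round-off-sized difference
print(abs(phi(2, z) - (math.exp(z) - 1 - z) / z**2))   # round-off-sized difference
```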
Then, a space discretization for A is considered. For each elliptic problem of the form

Av = F,  ∂v = g,

with source term F and boundary condition g, the approximation is given through a vector V_h ∈ R^N which satisfies a system of the form

A_{h,0} V_h + C_h g = P_h F,

for some matrix A_{h,0} ∈ R^{N×N}, some operator C_h : Y → R^N, and some approximation vector P_h F ∈ R^N of F. We will assume throughout the paper that A_{h,0} is symmetric and negative definite and that, for small enough h, there exists a constant λ < 0 such that all the eigenvalues of A_{h,0} are smaller than λ (see (H1) in [26]).
In such a way, if ∂A^l α (l = 0, . . . , r) can be calculated in terms of data, the space discretization of (3) becomes (5), and that of (4) an analogous system. By using the variation-of-constants formula and the definition of the ϕ_j-functions, the solutions of these systems can then be written, respectively, as (6) and (7). In order to achieve a certain local order p + 1, the value of r to be considered for the approximation of each of the terms which define the time integrator should be p or p − 1, depending on whether these terms are multiplied or not by the factor k, which corresponds to the time stepsize in the definition of the time integrator. In any case, increasing the value of r implies getting a higher local order of the method (until the classical local order is achieved), but that means calculating more terms in each of the expressions of the form (6) and (7).

An Example as a Motivation
In order to better understand how Krylov subroutines behave when calculating those terms, we have considered a specific example.
In (1), we take X = C[0, 1], Y = R², A the second-order space derivative and ∂ the Dirichlet boundary operator. For the space discretization, we take A_{h,0} and C_h from the standard symmetric second-order difference scheme associated with the boundary values g_0 and g_1 at x = 0 and x = 1, respectively, and P_h is the nodal projection of a function on the nodes (h, 2h, . . . , Nh), with (N + 1)h = 1.
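For concreteness, a minimal Python sketch of this standard discretization follows (the function and variable names are ours, not from the papers): A_{h,0} is the usual tridiagonal matrix of second-order differences on the interior nodes, and the boundary data only enter the rows next to x = 0 and x = 1.

```python
import numpy as np

def dirichlet_laplacian(N):
    """Standard symmetric second-order differences for the 1D second
    derivative on (0,1) with Dirichlet data; interior nodes x_i = i*h,
    i = 1..N, with (N+1)*h = 1."""
    h = 1.0 / (N + 1)
    A = (np.diag(-2.0 * np.ones(N))
         + np.diag(np.ones(N - 1), 1)
         + np.diag(np.ones(N - 1), -1)) / h**2
    def Ch(g0, g1):
        # boundary contribution: only the first and last rows see g0, g1
        c = np.zeros(N)
        c[0], c[-1] = g0 / h**2, g1 / h**2
        return c
    return A, Ch

A, Ch = dirichlet_laplacian(99)
x = np.arange(1, 100) / 100.0
u = np.sin(np.pi * x)                              # vanishes at both boundaries
residual = np.max(np.abs(A @ u + np.pi**2 * u))    # O(h^2) discretization error
```

The matrix is symmetric and negative definite, in agreement with the assumption (H1) recalled above.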
Then, we concentrate on calculating an expression like (6). More precisely, we take α(x) = cos(x)/√(0.5 + 0.25 sin(2)), which has unit L²-norm in the interval [0, 1]. We consider τ = 0.1, h = 10⁻³, and the values r = −1, 0, 1, 2. For that, we have used subroutine phipm.m in [8], which is explicitly constructed to calculate expressions of the form (11),

Σ_{l=0}^{r+1} τ^l ϕ_l(τB) b_l,

for some vectors b_0, . . . , b_{r+1}, and therefore it seems very suitable for our purposes. As the subroutine is adaptive, it calculates this expression except for an estimated error, which is assured to be less than a parameter tol, which is given as an input of the subroutine. We have considered two values for the tolerance: tol = 10⁻⁷ and tol = 10⁻¹³. The results in terms of CPU time on a laptop computer can be seen in Table 1, which clearly shows that, for the first tolerance tol = 10⁻⁷, in spite of having to calculate more terms, the subroutine phipm.m takes less CPU time when r increases. However, when tol = 10⁻¹³, the CPU time increases with r. We give an explanation for this in the following section.

The Standard Krylov Method and Subroutine phipm.m

The subroutine phipm.m is based on the standard Krylov method, which consists of the following: for a symmetric matrix B, a scalar τ > 0 and a vector v,

ϕ_l(τB)v ≈ β V_m ϕ_l(τT_m) e_1,          (8)

where β = ‖v‖, V_m and T_m are obtained from a Lanczos iteration process [31] and e_1 = [1, 0, . . . , 0]^T. More precisely, the columns of V_m are the elements of an orthonormal basis of the space ⟨v, Bv, . . . , B^{m−1}v⟩, whose first element is v/β, and T_m is a tridiagonal (m × m)-matrix which satisfies

B V_m = V_m T_m + t_{m+1,m} v_{m+1} e_m^T.

In order to calculate ϕ_l(τT_m)e_1, it is usual to consider the idea of Saad [9] (generalized by Sidje [10]), in which an augmented matrix T̃_m is constructed, which has the form

T̃_m = [ T_m  e_1  0 ;  0  0  I ;  0  0  0 ],          (9)

where I is the (l − 1) × (l − 1) identity matrix. Then, the top m entries of the last column of exp(τT̃_m) yield the vector τ^l ϕ_l(τT_m)e_1.
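The following Python sketch (our own illustration, with SciPy's expm standing in for MATLAB's; all names are ours) mimics the procedure just described on a small symmetric negative definite matrix: a Lanczos iteration builds V_m and T_m, and τ^l ϕ_l(τT_m)e_1 is read off from the exponential of the augmented matrix.

```python
import numpy as np
from scipy.linalg import expm

def lanczos(B, v, m):
    """m steps of Lanczos for symmetric B; returns V_m and tridiagonal T_m."""
    n = len(v)
    V = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = B @ V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # full reorthogonalization
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            V[:, j + 1] = w / beta[j]
    return V, np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

def tau_l_phi_l_e1(T, tau, l):
    """tau^l * phi_l(tau*T) e_1 via Saad/Sidje's augmented matrix:
    exponentiate [[T, e_1, 0], [0, 0, I], [0, 0, 0]] and read the
    top m entries of the last column."""
    m = T.shape[0]
    M = np.zeros((m + l, m + l))
    M[:m, :m] = T
    M[0, m] = 1.0                                    # the e_1 block
    if l > 1:
        M[m:m + l - 1, m + 1:m + l] = np.eye(l - 1)  # the (l-1)x(l-1) identity
    return expm(tau * M)[:m, -1]

# small symmetric negative definite test problem
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((60, 60)))
d = -np.linspace(1.0, 50.0, 60)
B = Q @ np.diag(d) @ Q.T
v = rng.standard_normal(60)
beta0 = np.linalg.norm(v)

tau, l, m = 0.1, 2, 25
V, T = lanczos(B, v, m)
approx = beta0 * V @ tau_l_phi_l_e1(T, tau, l) / tau**l   # ~ phi_2(tau*B) v
```

Here m = 25 exceeds τρ(B) = 5, so, in agreement with the superlinear-convergence regime discussed below, the approximation is accurate to near machine precision.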
In phipm.m, this exponential of the small matrix τT̃_m is calculated through the function expm from MATLAB, which computes it considering a 13-degree diagonal Padé approximant combined with scaling and squaring [32]. According to [9], the error when approximating ϕ_l(τB)v through (8) is bounded, for some constant C and large enough m, by a quantity which decreases with m but grows with ρ(τB). Therefore, when ρ(τB) is large, m must be very big in order to approximate the exact value. More precisely, in [3,5,6] it is stated that, for symmetric matrices, superlinear convergence is only guaranteed when m ≥ τ‖B‖. As, for our purposes, B = A_{h,0} comes from the space discretization of an unbounded differential operator, ‖A_{h,0}‖ is very large, and the way to manage convergence without increasing m too much is by diminishing τ through time-stepping. The main idea is that (11) is the solution at time τ of the differential problem (12), which corresponds to (5) and (6) for specific B and b_l. In such a way, if we are interested in calculating u(τ_end), a grid 0 = τ_0 < τ_1 < · · · < τ_n = τ_end can be considered and (12) must be solved from τ_k to τ_{k+1} taking u(τ_k) as initial condition. According to [33], u(τ_{k+1}) can be written in terms of u(τ_k) and of the ϕ_l-functions evaluated at s_k B, where s_k = τ_{k+1} − τ_k. Then, phipm.m is based on considering the recurrence relation (14),

ϕ_j(z) = z ϕ_{j+1}(z) + 1/j!,  ϕ_j(0) = 1/j!,

so that this expression can be written just in terms of ϕ_{r+1} and not of the rest of the functions ϕ_l. More precisely, one obtains (15), where the vectors w_j can be recursively calculated through (16). Then, subroutine phipm.m is adaptive in the sense that, taking an estimate for the error in the Krylov iteration, it changes at each step the value s_k or the number m of Krylov iterations to calculate ϕ_{r+1}(s_k B) in (15) so that the estimate of the error at the final time τ_end is below the given tolerance tol. Moreover, it chooses one or the other possibility in order to minimize the computational cost. The estimate in [8,10] for the error when approximating ϕ_l(τB)v is taken as the norm of (17), a vector proportional to β t_{m+1,m} [ϕ_{l+1}(τT_m)]_{m,1} v_{m+1}, where v_{m+1} is the unit vector originated in the Lanczos process which is orthogonal to all the columns of V_m.
In phipm.m, this estimation of the error is even added to the approximation in order to be more accurate. More precisely, the corrected approximation (18) is taken, where ϕ_{l+1}(τT_m) is computed through the exponentiation of τ times the augmented matrix (9) with l substituted by l + 1, taking into account that ϕ_l(τT_m)e_1 also appears in the result. In such a way, just a matrix of size m + l + 1 needs to be exponentiated.

Application to Our Problem
The first thing we observe when trying to apply standard Krylov subroutines to the formulas (6) and (7), which turn up when avoiding order reduction, is that there are two types of terms: those which contain P_h α and those which contain C_h ∂A^l α. In the first case, as in the example of Section 2.1, α is taken as a continuous function of unit L²-norm, and therefore its nodal projection P_h α is also uniformly bounded on h in the discrete L²-norm by an amount near 1. However, in the second case, C_h ∂A^l α is very big. In fact, the smaller h is (so as to integrate more accurately in space), the bigger those terms are. (These remarks are general for other regular functions, although then the bound is not necessarily near 1.)
In order to see the convergence with the number of iterations for each type of term with the standard Krylov method, in Figure 1 we represent the discrete L²-error against the number of iterations when h = 10⁻³ and τ = 10⁻², 10⁻⁴, 10⁻⁶. The top figure corresponds to approximating ϕ_j(τA_{h,0})P_h α (j = 0, 1, 2, 3) and the bottom one to ϕ_j(τA_{h,0})C_h ∂α (j = 1, 2, 3). (We notice that, in our precise example, ∂A^l α = ±∂α for any natural l.) Apart from seeing that the size of the errors is much bigger in the bottom plot, we can also check in both figures that the smaller τ is, the quicker the convergence with m. This corroborates the result in [3,5,6], which guarantees superlinear convergence for m ≥ τ‖A_{h,0}‖.

We would also like to remark here that Figure 1 also shows that, in spite of the fact that the convergence with m is similar, the size of the errors is smaller when the subindex of the function ϕ grows. Regarding the estimate of the error (17), that may be due to the fact that ϕ_{l+1} evaluated at the eigenvalues of τT_{m,h} is smaller when l grows. By looking at (14), we notice that the exponential-type functions are smaller near zero when l grows, and the convergence to zero at −∞ is also a bit quicker (see Figure 2). Moreover, we do have the following result for the first iteration, which implies that the square of the error of the first Krylov iteration is an average of the squares of the differences between ϕ evaluated at each eigenvalue and its evaluation on an average of those eigenvalues:

Theorem 1. If {u_1, . . . , u_N} is an orthonormal basis of eigenvectors of B, with corresponding eigenvalues {λ_1, . . . , λ_N}, and v = µ_1 u_1 + · · · + µ_N u_N is a unit-norm vector, it happens that, for every analytic function ϕ defined over an interval which contains the eigenvalues,

‖ϕ(B)v − ϕ(v^T B v)v‖² = Σ_{i=1}^N µ_i² (ϕ(λ_i) − ϕ(Σ_{j=1}^N µ_j² λ_j))².

Proof. On the one hand, it is clear that

ϕ(B)v = Σ_{i=1}^N µ_i ϕ(λ_i) u_i.

On the other hand, by the Lanczos iteration,

t_{11} = v^T B v = Σ_{j=1}^N µ_j² λ_j,

where the fact that {u_1, . . . , u_N} is an orthonormal basis of eigenvectors of B has been used in the last equality. Then,

ϕ(B)v − ϕ(t_{11})v = Σ_{i=1}^N µ_i (ϕ(λ_i) − ϕ(Σ_{j=1}^N µ_j² λ_j)) u_i,

from which the result follows.

Because of this, the less ϕ changes in the interval where the eigenvalues of B stay, the smaller the error will be. Obviously, ϕ_l changes less in (−∞, 0) the bigger l is, which explains the relative behaviour of these functions in Figure 1.
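Theorem 1 is easy to check numerically. The following Python snippet (our own verification, not from the paper) builds a random symmetric matrix with known eigendecomposition and compares the squared error of the first Krylov iterate, which approximates ϕ(B)v by ϕ(v^T B v)v, with the weighted average announced by the theorem, here for ϕ = exp:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
lam = -rng.uniform(0.5, 30.0, n)                  # negative eigenvalues
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthonormal eigenvectors u_i
B = U @ np.diag(lam) @ U.T
v = rng.standard_normal(n)
v /= np.linalg.norm(v)                            # unit-norm vector
mu = U.T @ v                                      # coordinates mu_i of v

phi = np.exp                                      # any analytic function works
t11 = v @ B @ v                                   # first Lanczos entry, = sum_j mu_j^2 lam_j
# squared error of the first Krylov iterate phi(t11) v:
lhs = np.linalg.norm(U @ (phi(lam) * mu) - phi(t11) * v) ** 2
# weighted average of (phi at each eigenvalue minus phi at the averaged eigenvalue)^2:
rhs = np.sum(mu**2 * (phi(lam) - phi(t11)) ** 2)
```

Both quantities agree to round-off, as the theorem states.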
On the other hand, in order to better understand why the errors when v = C_h ∂α are much bigger and, furthermore, why the difference with respect to the case v = P_h α grows when τ diminishes, we can consider the estimation of the error (17). We notice that the fact that ‖v‖ is much bigger for v = C_h ∂α than for v = P_h α cannot be the only reason to explain the behaviour in Figure 1, since that factor does not depend on τ. Because of that, we have decided to look into the factor [ϕ_{l+1}(τT_{m,h})]_{m,1}, where T_{m,h} is the tridiagonal matrix which turns up after applying the Lanczos iteration to B = A_{h,0}. As T_{m,h} is a tridiagonal symmetric matrix whose numerical range is contained in that of B = A_{h,0} [31], the eigenvalues of T_{m,h} are also contained in the numerical range of A_{h,0}, and therefore they are negative and smaller than a certain value λ < 0. Denoting by D_{m,h} the diagonal matrix containing the eigenvalues of T_{m,h}, it happens that T_{m,h} = U_{m,h} D_{m,h} U_{m,h}^T for some orthogonal (m × m)-matrix U_{m,h}, and the factor above seems to be dominated by ϕ_{l+1} evaluated at τ times a certain eigenvalue λ_{m,h}(v) of T_{m,h}. The behaviour of the functions ϕ_{l+1}, as plotted in Figure 2, explains that this factor increases when τ decreases, i.e., from the top to the bottom of Figure 1. In order to understand what happens when m increases, we have observed that t_{m+1,m}(C_h ∂α)/t_{m+1,m}(P_h α) is constant for m ≥ 2. Moreover, we represent λ_{m,h}(P_h α) and λ_{m,h}(C_h ∂α) in Figure 3. They both seem to converge to the same quantity but, in any case, for the values m = 1, . . . , 50 (which are considered in Figure 1), λ_{m,h}(C_h ∂α)/λ_{m,h}(P_h α) is still very big, which explains the different relative behaviour of both plots in Figure 1 also for m = 2, . . . , 50.
As for the behaviour when h gets smaller, although not shown here for the sake of brevity, it is already well-known that the standard Krylov iteration does not give uniform convergence and that, the smaller h is, the bigger m must be to get a given error for the approximation [6,14]. What subroutine phipm.m does in order to calculate (6) is not to apply the Krylov iteration to each of the terms in that expression. The most natural choice seems to be b_0 = P_h α, b_{l+1} = C_h ∂A^l α, l = 0, . . . , r, in (11), after which the corresponding time-stepping (15) and (16) is performed, where the Krylov iteration is just applied to approximate ϕ_{r+1}(s_k B)w_{r+1}. Doing things that way, Table 1 is obtained. On the one hand, the fact that just the highest-index function ϕ_{r+1} is considered in (6) for the Krylov iteration is positive, since the errors are smaller when r grows. On the other hand, from what can be observed at least for tol = 10⁻⁷, the bigger errors coming from the consideration of the terms on C_h ∂α seem to be ameliorated by the factors s_k^l in (16), since the more terms of that type are considered, the less CPU time the subroutine requires to get a given accuracy.

Suggestion to Improve the Result with phipm.m
In order to simplify the time-stepping, what we suggest in this section is to apply the recurrence (14) to (6) and (7) before attempting to use subroutine phipm.m. More precisely, we take advantage of the following result.

Proposition 1. For r ≥ 0, formulas (6) and (7) can be written, respectively, as (19) and (20).

Proof. The proof follows by induction on r. Notice that, for r = 0, formula (6) can be rewritten as (19) using the first formula of (14) with j = 0. Then, assuming that (19) is the same as (6) for a given r, the same equivalence is true for r + 1 since ϕ_{r+1} can be substituted by its expression when solving from the first formula in (14) with j = r + 1. It is then easy to see that the resulting expression is equivalent to (19) with r replaced by r + 1. The proof follows in a similar way for (20).
Let us now concentrate on (19). Calling subroutine phipm.m just for the last term of this expression, i.e., taking the choice (21), many calculations in the time-stepping procedure (15) and (16) are avoided and, in some way, performed just once and for all at the beginning in expression (19). We also notice that each bracket in this expression approximates A^j α (j = 1, . . . , r + 1) when h → 0, with the only remark that, the bigger j is, the more relevant the errors coming from the ill-posedness of numerical differentiation in space may be. Nevertheless, for smaller values of j, those brackets are controlled and, under enough regularity of α, do not grow in norm for the values of h which are usually required for enough accuracy in space. Another important remark is that each of those brackets can be recursively calculated from the previous one just by multiplying by A_{h,0} and summing a term of the form C_h ∂A^l α. Therefore, many fewer calculations than the apparent ones are really necessary.
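Although formulas (6), (19) and (21) are not reproduced here, the structure just described can be illustrated numerically. Assuming, as the description of phipm.m above suggests, that (6) is a linear combination Σ_{l=0}^{r+1} τ^l ϕ_l(τA_{h,0}) b_l and that the brackets w_j in (19) obey w_0 = b_0 and w_{j+1} = A_{h,0} w_j + b_{j+1}, the following Python sketch (our own, with hypothetical names) checks that both expressions coincide, leaving a single ϕ_{r+1} evaluation for the Krylov subroutine:

```python
import math
import numpy as np

def phi_apply(Q, lam, l, tau, b):
    """phi_l(tau*B) b for B = Q diag(lam) Q^T, via the scalar recurrence
    phi_{k+1}(z) = (phi_k(z) - 1/k!)/z applied to the eigenvalues."""
    z = tau * lam
    f = np.exp(z)
    for k in range(l):
        f = (f - 1.0 / math.factorial(k)) / z
    return Q @ (f * (Q.T @ b))

rng = np.random.default_rng(2)
n, r, tau = 30, 2, 0.1
lam = -rng.uniform(1.0, 20.0, n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
B = Q @ np.diag(lam) @ Q.T
b = [rng.standard_normal(n) for _ in range(r + 2)]

# sum_{l=0}^{r+1} tau^l phi_l(tau*B) b_l : one phi-evaluation per term
lhs = sum(tau**l * phi_apply(Q, lam, l, tau, b[l]) for l in range(r + 2))

# rewritten form: polynomial part plus a single phi_{r+1} evaluation,
# with the "brackets" w_j built recursively, as described in the text
w = [b[0]]
for j in range(r + 1):
    w.append(B @ w[j] + b[j + 1])
rhs = sum(tau**j / math.factorial(j) * w[j] for j in range(r + 1)) \
      + tau**(r + 1) * phi_apply(Q, lam, r + 1, tau, w[r + 1])
```

The two expressions agree to round-off, which is the content of Proposition 1 under the assumed form of (6); only w_{r+1} needs to go through the Krylov iteration.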
The results of doing things that way for the example in Section 2.1 are shown in Table 2, where the improvement with respect to the results on CPU time in Table 1 is evident. Moreover, for the smallest tolerance, not only is the time necessary to achieve a similar error (related to tol = 10⁻¹³) smaller, but we also notice that, when increasing the value of r, that time diminishes (as it also happened with tol = 10⁻⁷ in Table 1). This seems to suggest that the rounding errors due to the time-stepping in Table 1 were too important.

Table 2. CPU time to calculate (6) for the data in Section 2.1 using formula (19) and subroutine phipm.m just for the last term.

In order to better understand why taking bigger r implies that the CPU time is smaller, we have applied the Krylov iteration to ϕ_{r+1}(τA_{h,0})v_{r+1}, with v_{r+1} defined in (22). Notice that ϕ_{r+1}(τA_{h,0})v_{r+1} is the last term in (19) except for the factor τ^{r+1} and that, in exact arithmetic, when h → 0, v_{r+1} ≈ A^{r+1}α, r = 0, 1, 2.
In Figure 4, we again represent the discrete L²-error against the number of Krylov iterations for the different values τ = 10⁻², 10⁻⁴, 10⁻⁶. We can again see that the smaller τ is, the quicker the convergence with m. As for the size of the errors, for r = 0, 1, they are quite similar to those in the first plot of Figure 1, and therefore not as big as in the second plot of the same figure. However, for r = 2, the errors are considerably bigger than the corresponding ones in that first plot and, in any case, bigger than those of r = 0, 1. This is due to the fact that v_3 can be numerically seen to be of order 10⁵ in spite of presumably approximating A³α = −α. This is caused by roundoff errors, since numerical differentiation is an ill-posed problem and a sixth-order space derivative is being considered.
On the other hand, the other terms in (19) are exactly calculated except for roundoff. Therefore, the error when calculating the whole expression should be the error which was represented in Figure 4 multiplied by τ^{r+1}. This explains why, in Figure 5, the errors are so small even for τ = 10⁻¹, 10⁻², 10⁻⁴. Moreover, the error is smaller when r increases, not only from r = 0 to r = 1, but also from r = 1 to r = 2 (although in a smaller proportion). This is the reason why taking a bigger value of r can decrease the computational cost when using phipm.m to approximate (6). To better illustrate that when trying to approximate (6) with τ = 0.1, we represent in Figure 6 (0.1/τ) times the error when approximating (6) through (19) against (0.1/τ) times the CPU time required to calculate (6) for τ = 0.1, 10⁻², 10⁻⁴ and different values of m. In such a way, we are implicitly assuming that, in order to approximate (6) with τ = 0.1 through time-stepping with the smaller values of τ, both the error and the CPU time accumulate proportionally to the number 0.1/τ of substeps. These assumptions are not completely satisfied but, in such a way, we can have an overview of the total cost to calculate (19) for the different values of r. As Figure 6 shows, r = 2 seems cheaper than r = 1, and this one cheaper than r = 0. Although not shown here for the sake of brevity, similar arguments apply for the case in which (7) is to be approximated with the same subroutine.

Modification of Subroutine phipm.m
Estimate (17) was originally proposed in [9] in a context where it was reasonable just when ‖τB‖ < 1. For our purposes, and as was also stated in [7], that is not useful. In fact, in Figure 7, we represent the discrete L²-error against the number of Krylov iterations when approximating ϕ_{r+1}(τA_{h,0})P_h α using the extrapolation (18) and when not using it, i.e., using (8). We can see that the extrapolation formula only improves the standard one when τ is so small that τ‖A_{h,0}‖ is near 1. However, in [7], by relating it to the error which is committed when integrating linear systems with the standard Krylov method, the similar estimate (23) was obtained, where the difference comes from the subindex of the ϕ-function evaluated at T_m, which is now l instead of l + 1. Notice that, in such a case, the augmented matrix of size m + l + 1 does not need to be calculated, and it is enough to consider that in (9), which just has dimension m + l. As the authors mention in [7], (23) is not the first term of a series for the whole error, but just a rough estimate. Therefore, extrapolation makes no sense in this case. If we change subroutine phipm.m accordingly, without considering the extrapolation (18) but just (8), and leaving the same adaptive strategy but with the new error estimation (23), the results in Table 3 are obtained when approximating (6), where again expression (19) and the choice (21) are used.

Table 3. CPU time to calculate (6) for the data in Section 2.1 using formula (19) and a modification of subroutine phipm.m (with error estimate (23) and without extrapolation) just for the last term.

              r = 0     r = 1     r = 2
tol = 10⁻⁷    3.9e-01   3.4e-01   3.5e-01
tol = 10⁻¹³   1.0e+00   5.9e-01   5.3e-01

In this table, we can see that the CPU time which is required to achieve the tolerance tol = 10⁻⁷ is smaller than in Table 2. This agrees with the fact that, when the stepsize is not very small, not doing extrapolation is more accurate and cheaper than doing it.
For the smaller tolerance tol = 10⁻¹³, the CPU time which is required when r = 0 is a bit bigger than in Table 2, which may be due to the fact that, in this particular case, the stepsizes which the subroutine has to take are so small that the extrapolation is worth doing. In any case, the CPU time continues to diminish from r = 0 to r = 1 for both tolerances and, in the same way as in Table 2, it also diminishes for tol = 10⁻¹³ from r = 1 to r = 2.

Purely Rational Krylov Method
Exponential methods are in principle constructed so that linear systems do not have to be solved, as happens when integrating stiff linear problems with standard implicit methods. For nonlinear stiff problems, standard Rosenbrock methods are usually used, which avoid solving a non-linear problem at each stage by the use of the Jacobian of the problem. In any case, the calculation of this Jacobian at each step can be difficult or expensive and, therefore, this part may be more costly than having to solve a linear system at each stage. Because of this, extended Krylov methods, which require the calculation of powers of the inverse of a certain sparse matrix applied over a vector, are justified when applied inside exponential methods which aim to solve non-linear problems. More precisely, although the calculation of those terms implies solving linear systems with sparse matrices, its cost may be justified in non-linear problems because the Jacobian of the non-linear function does not need to be calculated. We notice that, if Gaussian elimination for sparse matrices is used, as the matrix is always the same, the LU factorization is done once and for all at the very beginning. Moreover, if multigrid methods are used, the computational cost is just of the order of the number of grid points in the space discretization (the same order as the number of computations which are required when multiplying a sparse matrix times a vector in standard Krylov methods).
Extended Krylov subspace methods for symmetric matrices were described in [11], and estimates of convergence were given in terms of the smallest and biggest eigenvalues. In such a way, similarly to standard Krylov methods, those are not uniform convergence estimates on the space grid when the matrices come from the discretization of an unbounded operator but, in contrast, those estimates were smaller than those of standard Krylov methods and, in practice, the true error was even much smaller than those estimates. Much better estimates of the error observed in practice were obtained in [16] for both symmetric and nonsymmetric B, although just for the case in which the corresponding Krylov space contains the same number of positive and negative powers of B and for matrix functions which do not include the exponential-type functions in which we are interested.
In this paper, we will centre on the purely rational Krylov method, which is based on considering the Krylov subspace

⟨v, (γI − B)⁻¹v, . . . , (γI − B)^{−(m−1)}v⟩          (24)

for a certain real parameter γ for which the inverse of γI − B exists. An analysis of this method is performed in [14] when the matrix B is substituted by an unbounded operator with a field of values in the left half-plane. In such a way, estimates of the error are obtained which are valid for discretizations of such an operator and which are independent of the grid size. Furthermore, the bounds of the error which are proved in [14] are smaller when the index of the function ϕ_l grows. More precisely, they decrease with m at the algebraic rate given in (25), although, in practice, an exponential decay can be observed. Using both powers of B and of (γI − B)⁻¹ has been proven to be advantageous when some powers B^l v are bounded. More precisely, in the case that v discretizes a function which vanishes at the boundary and which satisfies that several of the powers of A applied to it exist and also vanish at the same boundary, a certain rate of convergence is shown in [13] when, in the Krylov subspace, those powers of B are considered as well as those of (γI − B)⁻¹. Numerically, an even higher rate of convergence can be observed, which approximates that of the purely rational Krylov method.
In any case, as we are just interested in problems with non-vanishing boundary conditions, we will directly consider the purely rational Krylov method, which is based on constructing iteratively an orthonormal basis of the Krylov subspace (24) through Lanczos iteration.
If we also denote by V_m the matrix whose columns are that orthonormal basis, the projection onto the space (24) is given by Q_m = V_m V_m^H. The approximation to ϕ_l(τB)v is then given by ϕ_l(τB_m)v, where B_m = Q_m B Q_m. Therefore,

ϕ_l(τB_m)v = β V_m ϕ_l(τS_m) e_1,  with S_m = V_m^H B V_m and β = ‖v‖,

and ϕ_l(τS_m)e_1 is calculated through expm(τŜ_m), where Ŝ_m is like T̃_m in (9) with T_m substituted by S_m.
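A minimal Python sketch of this purely rational approach follows (our own illustration, with simplifications: a single repeated pole γ, a QR factorization instead of an explicit Lanczos recurrence, and only ϕ_1; all names are hypothetical). The key computational point of the discussion above is visible in the code: γI − B is factorized once, and each new basis vector costs only one pair of triangular solves.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def rational_krylov_phi1(B, v, tau, gamma, m):
    """Approximate phi_1(tau*B) v from the purely rational Krylov space
    < v, (gamma*I - B)^{-1} v, ..., (gamma*I - B)^{-(m-1)} v >."""
    n = len(v)
    lu = lu_factor(gamma * np.eye(n) - B)     # factorized once and for all
    K = np.empty((n, m))
    K[:, 0] = v
    for j in range(1, m):
        K[:, j] = lu_solve(lu, K[:, j - 1])   # cheap solves with the same factors
    V, _ = np.linalg.qr(K)                    # orthonormal basis of the space
    S = V.T @ B @ V                           # projected (symmetric) matrix
    w, U = np.linalg.eigh(S)
    z = tau * w
    zsafe = np.where(z == 0, 1.0, z)
    phi1 = np.where(z == 0, 1.0, np.expm1(zsafe) / zsafe)   # phi_1(z) = (e^z - 1)/z
    return V @ (U @ (phi1 * (U.T @ (V.T @ v))))

# stiff 1D Laplacian example: a very large ||B|| poses no difficulty here
n = 200
h = 1.0 / (n + 1)
B = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2
v = np.ones(n)
approx = rational_krylov_phi1(B, v, 0.1, 1.0, 20)
```

For a large problem, the dense `lu_factor` above would be replaced by a sparse factorization (or a multigrid solver), so that the cost per iteration stays proportional to the number of grid points, as discussed at the beginning of this section.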

Application to Our Problem
The aim of this section is to apply the purely rational Krylov method just described to our problem in Section 2.1.
In a similar way to Figures 1 and 4 in Section 3.2 for the standard Krylov method, in Figure 8 we represent, for r = 0, 1, 2, the error when approximating ϕ_{r+1}(τA_{h,0})P_h α, ϕ_{r+1}(τA_{h,0})C_h ∂A^r α and ϕ_{r+1}(τA_{h,0})v_{r+1} (with v_{r+1} as in (22)) against the number of iterations. We have taken now τ = 10⁻¹, 10⁻², 10⁻³ and, in contrast to what happened with the standard Krylov method (and, as far as we know, not explained in the literature), the convergence with m is now quicker when τ is bigger and, in any case, exponential from the very beginning (m = 1), although with a smaller rate when τ diminishes. Because of this, time-stepping makes no sense with this method, and the best approach is to apply the Krylov iteration directly with the value of τ in which we are interested in (6).
As for the behaviour for the different values of the index l in ϕ_l, in the same way as with the standard Krylov iteration, the errors are smaller when l grows. We notice that, for m = 1, this method coincides with the standard Krylov iteration and, therefore, the explanation for that is given in Theorem 1. On the other hand, the rate of convergence also seems to be more advantageous when l grows, in accordance with bound (25). Although not shown here for the sake of brevity, we have also run our experiments with smaller values of the grid size h and we have seen that the errors are very similar. This agrees with the analysis performed for rational Krylov methods in [14], and this behaviour makes this method more interesting than the standard one, which does not show uniform convergence on the space grid size. (Only when calculating ϕ_3(τA_{h,0})v_3 are the errors a bit bigger when h diminishes for the higher values of m, and that is due to the fact that the calculation of v_3 is already quite difficult when h is small.)
In any case, what is important is the behaviour of the error when approximating (19) against the number of iterations m for the different values of r = 0, 1, 2.
That can be observed in Figure 9. We notice that the error now is that of the bottom graph of Figure 8 multiplied by τ^{r+1}. We restrict ourselves here to the case τ = 0.1, since the aim is to approximate (6) with that value of τ. What we can observe in Figure 9 is that the error is smaller with r = 1 than with r = 0, and also slightly smaller with r = 2 than with r = 1.

General Conclusions
• At least for the range of values in which we move in the example of Section 2.1, calculating (6) or (7) with a given required accuracy is likely to be cheaper when r increases, in spite of the fact that, apparently, more terms need to be calculated.

• In the case that having to solve several linear systems at each step with the same matrix is not a drawback of having chosen an exponential method to integrate a problem like (1), we recommend approximating (6) or (7) by using (19) or (20) and a purely rational Krylov method to calculate the last term there.

• Whenever having to solve several linear systems at each step is a drawback of using exponential methods to integrate a problem like (1), we recommend using the modification of the adaptive subroutine phipm.m which is described in Section 3.4, although the subroutine phipm.m applied as described in Section 3.3 also gives quite acceptable results, at least in our precise example.

Funding: This research was funded by Ministerio de Ciencia e Innovación and Regional Development European Funds through project PGC2018-101443-B-I00 and by Junta de Castilla y León and Feder through project VA169P20.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.