Good (and Not So Good) Practices in Computational Methods for Fractional Calculus

The solution of fractional-order differential problems requires in the majority of cases the use of some computational approach. In general, the numerical treatment of fractional differential equations is much more difficult than in the integer-order case, and very often non-specialist researchers are unaware of the specific difficulties. As a consequence, numerical methods are often applied in an incorrect way or unreliable methods are devised and proposed in the literature. In this paper we try to identify some common pitfalls in the use of numerical methods in fractional calculus, to explain their nature and to list some good practices that should be followed in order to obtain correct results.


Introduction
The increasing interest in applications of fractional calculus, together with the difficulty of finding analytical solutions of fractional differential equations (FDEs), naturally forces researchers to study, devise and apply numerical methods to solve a large range of ordinary and partial differential equations with fractional derivatives.
The investigation of computational methods for fractional-order problems is therefore a very active research area in which, each year, a large number of research papers are published.
The task of finding efficient and reliable numerical methods for handling integrals and/or derivatives of fractional order is a challenge in its own right, with difficulties that differ in character but are no less severe than those associated with finding analytical solutions. The specific nature of these operators involves computational challenges which, if not properly addressed, may lead to unreliable or even wrong results.
Unfortunately, the scientific literature is rich with examples of methods that are inappropriate for fractional-order problems. In most cases these are just methods that were devised originally for standard integer-order operators and then applied in a naive way to their fractional-order counterparts; without proper knowledge of the specific features of fractional-order problems, researchers are often unable to understand why unexpected results are obtained.
The main aims of this paper are to identify a few major guidelines that should be followed when devising reliable computational methods for fractional-order problems, and to highlight the main peculiarities that make the solution of differential equations of fractional order a different task from the integer-order case, and surely a more difficult and stimulating one. We do not intend merely to criticize weak or wrong methods, but rather to explain why certain approaches are unreliable in fractional calculus and, where possible, to point the reader towards more suitable approaches.
This paper is mainly addressed at young researchers or scientists without a particular background in the numerical analysis of fractional-order problems but who need to apply computational methods to solve problems of fractional order. We aim to offer in this way a kind of guide to avoid some of the most common mistakes which, unfortunately, are sometimes made in this field.
The paper is organized in the following way. After recalling in Section 2 some basic definitions and properties, we illustrate in Section 3 the most common ideas underlying the majority of the methods proposed in the literature: very often the basic ideas are not properly recognized and common methods are claimed to be new. In Section 4 we discuss why polynomial approximations can be only partially satisfactory for fractional-order problems and why they are unsuitable for devising high-order methods (as has often been proposed). The major problems related to the nonlocality of fractional operators are addressed in Section 5, and Section 6 discusses some of the most powerful approaches for the efficient treatment of the memory term. Some remarks related to the numerical treatment of fractional partial differential equations are presented in Section 7 and some final comments are given in Section 8.

Basic Material and Notations
With the aim of fixing the notation and making available the most common definitions and properties for further reference, we recall here some basic notions concerning fractional calculus.
For α > 0 and any t0 ∈ R, in the paper we will adopt the usual definitions for the fractional integral of Riemann-Liouville type

  J^α_{t0} f(t) = (1/Γ(α)) ∫_{t0}^{t} (t − τ)^{α−1} f(τ) dτ,  (1)

for the fractional derivative of Riemann-Liouville type

  RL D^α_{t0} f(t) = (d^m/dt^m) J^{m−α}_{t0} f(t),  (2)

and for the fractional derivative of Caputo type

  C D^α_{t0} f(t) = J^{m−α}_{t0} (d^m/dt^m) f(t),  (3)

with m = ⌈α⌉ the smallest integer greater than or equal to α. We refer to any of the many existing textbooks on this subject (e.g., [1][2][3][4][5][6]) for an exhaustive treatment of the conditions under which the above operators exist and for their main properties. We just recall here the relationship between RL D^α_{t0} and C D^α_{t0}, expressed as

  C D^α_{t0} f(t) = RL D^α_{t0} [ f − T_{m−1}[f; t0] ](t),  (4)

where T_{m−1}[f; t0] is the Taylor polynomial of degree m − 1 for the function f about the point t0,

  T_{m−1}[f; t0](t) = Σ_{k=0}^{m−1} ((t − t0)^k / k!) f^(k)(t0).  (5)

Moreover, we will almost exclusively consider initial value problems of Cauchy type for FDEs with the Caputo derivative, i.e.,

  C D^α_{t0} y(t) = f(t, y(t)),  y^(k)(t0) = y^(k)_0,  k = 0, 1, . . . , m − 1,  (6)

for some assigned initial values y^(0)_0, . . . , y^(m−1)_0. A few general comments will also be made regarding problems associated with partial differential equations.

Novel or Well-Established Methods?
Quite frequently, one sees papers whose promising title claims the presentation of "new methods" or "a family of new methods" for some particular fractional-order operator. Papers of this type immediately capture the attention of readers eager for new and good ideas for numerically solving problems of this type.
But reading the first few pages of such papers can be a source of frustration, since what is claimed to be new is merely an old method applied to a particular (maybe new) problem. Now it is understandable that sometimes an old method is reinvented by a different author, maybe because it can be derived by some different approach or because the author is unaware of the previously published result (perhaps because it was published under an imprecise or misleading title). In fractional calculus, however, a different and quite strange phenomenon has taken hold: well-known and widely used methods are often claimed as "new" just because they are being applied to some specific problem. It seems that some authors are unaware that it is the development of new ideas and new approaches that leads to methods that can be described as new-not the application of known ideas to a particular problem. Even the application of well-established techniques to any of the new operators, obtained by simply replacing the kernel in the integral (1) with some other function, cannot be considered a truly novel method, especially when the extension to the new operator is straightforward.
Most of the papers announcing "new" methods are instead based on ideas and techniques that were proposed and studied decades ago, and sometimes proper references to the original sources are not even given.
In fact, there are a few basic and powerful methods that are suitable and extremely popular for fractional-order problems, and many proposed "new methods" are simply the application of the ideas behind them. It may therefore be useful to illustrate the main and more popular ideas that are most frequently (re)-proposed in fractional calculus, and to outline a short history of their origin and development.

Polynomial Interpolation and Product-Integration Rules
Solving differential equations by approximating their solution or their vector field by a polynomial interpolant is a very old and common idea. Some of the classical linear multistep methods for ordinary differential equations (ODEs), specifically those of Adams-Bashforth or Adams-Moulton type, are based on this approach.
In 1954 the British mathematician Andrew Young proposed [7,8] the application of polynomial interpolation to solve Volterra integral equations numerically. This approach turns out to be suitable for FDEs since (6) can be reformulated as the Volterra integral equation

  y(t) = T_{m−1}[y; t0](t) + (1/Γ(α)) ∫_{t0}^{t} (t − u)^{α−1} f(u, y(u)) du.  (7)

The approach proposed by Young is to define a grid t_n on the solution interval [t0, T] (very often, but not necessarily, equispaced, namely t_n = t0 + hn, h = (T − t0)/N), to rewrite (7) in a piecewise way as

  y(t_n) = T_{m−1}[y; t0](t_n) + (1/Γ(α)) Σ_{j=0}^{n−1} ∫_{t_j}^{t_{j+1}} (t_n − u)^{α−1} f(u, y(u)) du,  (8)

and then to replace, in each interval [t_j, t_{j+1}], the vector field f(u, y(u)) by a polynomial that interpolates f on the grid. This approach is particularly simple if one uses polynomials of degree 0 or 1, because then one can determine the approximation solely on the basis of the data at one of the subinterval's end points (degree 0: the product rectangle method) or at both end points (degree 1: the product trapezoidal method); thus, in these cases one need not introduce auxiliary points inside the interval or points outside the interval. Neither of these methods can yield a particularly high order of convergence but, as we shall demonstrate in Section 4, the analytic properties of typical solutions to fractional differential equations make it very difficult and cumbersome to achieve high-order accuracy irrespective of the technique used. Consequently, and because these techniques have been thoroughly investigated with respect to their convergence properties [9] and their stability [10] and are hence very well understood, the product rectangle and product trapezoidal methods are highly popular among users of fractional-order models.
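To make the construction concrete, here is a minimal sketch (in Python, with our own naming conventions) of the explicit product rectangle method for a scalar version of (7) with 0 < α < 1. The series evaluation of the Mittag-Leffler function used for comparison is a simple illustrative implementation, adequate only for moderate arguments.

```python
import math
import numpy as np

def mittag_leffler(alpha, z, terms=100):
    """Truncated series of the one-parameter Mittag-Leffler function E_alpha(z);
    illustrative only, suitable for moderate |z|."""
    return sum(z**k / math.gamma(alpha * k + 1) for k in range(terms))

def pi_rectangle(alpha, f, y0, T, N):
    """Explicit product rectangle (product Euler) method for the scalar FDE
    C D^alpha y = f(t, y), y(0) = y0, with 0 < alpha < 1, on [0, T]."""
    h = T / N
    t = np.linspace(0.0, T, N + 1)
    y = np.empty(N + 1)
    y[0] = y0
    fvals = np.empty(N + 1)
    fvals[0] = f(t[0], y0)
    for n in range(1, N + 1):
        j = np.arange(n)  # subintervals [t_j, t_{j+1}], j = 0, ..., n-1
        # exact integral of the kernel over each subinterval:
        # int_{t_j}^{t_{j+1}} (t_n - s)^(alpha - 1) ds
        b = ((n - j)**alpha - (n - j - 1)**alpha) * h**alpha / alpha
        y[n] = y0 + (b @ fvals[:n]) / math.gamma(alpha)
        fvals[n] = f(t[n], y[n])
    return t, y
```

For the test problem C D^{1/2} y = −y, y(0) = 1, whose exact solution is E_{1/2}(−t^{1/2}), refining the mesh reduces the error at t = 1, in line with the low order of the product rectangle rule.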
Higher-order methods have occasionally been proposed [11,12] but, as indicated above and discussed in more detail in Section 4, they tend to require rather uncommon properties of the exact solutions to the given problems and are therefore used only infrequently. We also note that the effects of a lack of regularity on the convergence properties of product-integration rules have been studied since 1985 for Volterra integral equations [13] and since 2004 for the specific case of FDEs [14].

Approximation of Derivatives: L1 and L2 Schemes
A classical numerical technique for approximating the Caputo differential operator from (3) is the so-called L1 scheme. For 0 < α < 1, the definition of the Caputo operator becomes

  C D^α_{t0} f(t) = (1/Γ(1 − α)) ∫_{t0}^{t} (t − τ)^{−α} f′(τ) dτ.

The idea ([15], Equation (8.2.6)) is to introduce a completely arbitrary (i.e., not necessarily uniformly spaced) mesh t0 < t1 < t2 < . . . < tN and to replace the factor f′(τ) in the integrand, on each subinterval [t_k, t_{k+1}], by the difference quotient

  f′(τ) ≈ (f(t_{k+1}) − f(t_k)) / (t_{k+1} − t_k).

This produces the approximation formula

  C D^α_{t0} f(t_n) ≈ (1/Γ(2 − α)) Σ_{k=0}^{n−1} ((f(t_{k+1}) − f(t_k)) / (t_{k+1} − t_k)) [ (t_n − t_k)^{1−α} − (t_n − t_{k+1})^{1−α} ].

For smooth functions f (but only under this assumption!) and an equispaced mesh t_j = t0 + jh, the convergence order of the L1 method is O(h^{2−α}).
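As a minimal illustration (our own code and naming), the L1 formula above can be implemented directly for an arbitrary mesh:

```python
import math

def l1_caputo(f_vals, t, alpha):
    """L1 approximation of the Caputo derivative of order 0 < alpha < 1 at the
    last node of a (not necessarily uniform) mesh t[0..n], given the values
    f_vals[k] = f(t[k])."""
    n = len(t) - 1
    tn = t[-1]
    acc = 0.0
    for k in range(n):
        # difference quotient approximating f'(tau) on [t_k, t_{k+1}]
        slope = (f_vals[k + 1] - f_vals[k]) / (t[k + 1] - t[k])
        # exact integral of (tn - tau)^(-alpha) over [t_k, t_{k+1}], up to 1/(1-alpha)
        acc += slope * ((tn - t[k])**(1 - alpha) - (tn - t[k + 1])**(1 - alpha))
    return acc / math.gamma(2 - alpha)
```

For the smooth function f(t) = t², α = 1/2 and a uniform mesh on [0, 1], the Caputo derivative at t = 1 is 2 t^{2−α}/Γ(3 − α), and the error decays like h^{2−α} = h^{3/2}, in agreement with the statement above.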
By construction, the L1 method is restricted to the case 0 < α < 1. For α ∈ (1, 2), the L2 method ([15], §8.2) provides a useful modification. In its construction, one starts from the representation

  C D^α_0 f(t) = (1/Γ(2 − α)) ∫_{0}^{t} (t − τ)^{1−α} f″(τ) dτ,

which is valid for these values of α. Using now a uniform grid t_j = t0 + jh, one replaces the second derivative of f in the integrand, on each subinterval [t_k, t_{k+1}], by the central difference approximation centered at the right endpoint t_{k+1},

  f″(τ) ≈ (f(t_{k+2}) − 2 f(t_{k+1}) + f(t_k)) / h².

A disadvantage of this method is that it requires the evaluation of f at the point t_{n+1} = (n + 1)h, which is located outside the interval [0, t_n].
The central difference used in the definition of the L2 method is symmetric with respect to one of the endpoints of the associated subinterval [t_k, t_{k+1}], not with respect to its mid point. If this is not desired, one may instead use on this subinterval the alternative

  f″(τ) ≈ (f(t_{k+2}) − f(t_{k+1}) − f(t_k) + f(t_{k−1})) / (2h²),

i.e., the average of the central differences at the two endpoints. This leads to the L2C method. Like the L2 method, the L2C method also requires the evaluation of f outside the interval [0, t_n]: one has to compute f((n + 1)h) and f(−h). Both the L2 and the L2C method exhibit O(h^{3−α}) convergence behavior for 1 < α < 2 if f is sufficiently well behaved; the constants implicitly contained in the O-terms seem to be smaller for the L2 method in the case 1 < α < 1.5 and for the L2C method if 1.5 < α < 2.
In the limit case α → 1, the L2 method reduces to first-order backward differencing, and the L2C method becomes the centered difference of first order; for α → 2 the L2 method corresponds to the classical second-order central difference.

Fractional Linear Multistep Methods
Fractional linear multistep methods (FLMMs) are less frequently used since their coefficients are, in general, not known explicitly, and some algorithm must be devised for their (technically often difficult) computation. Nevertheless, since these methods allow us to overcome some of the issues associated with other approaches, it is worth giving a short presentation of their properties.
FLMMs were proposed by Lubich in 1986 [17] and studied in subsequent works [18][19][20]. They extend to fractional-order integrals the quadrature rules obtained from standard linear multistep methods (LMMs) for ODEs.
Let us consider a classical k-step LMM of order p > 0,

  Σ_{j=0}^{k} a_j y_{n+j} = h Σ_{j=0}^{k} b_j f(t_{n+j}, y_{n+j}),  (9)

with first and second characteristic polynomials ρ(ξ) = Σ_{j=0}^{k} a_j ξ^j and σ(ξ) = Σ_{j=0}^{k} b_j ξ^j, and generating function δ(ξ) = ρ(1/ξ)/σ(1/ξ). FLMMs generalizing the LMMs (9) for solving the FDE (7) are expressed as

  y_n = T_{m−1}[y; t0](t_n) + h^α Σ_{j=0}^{s} w_{n,j} f(t_j, y_j) + h^α Σ_{j=0}^{n} ω^{(α)}_{n−j} f(t_j, y_j),  (10)

where the convolution weights ω^{(α)}_n are obtained from the power series expansion of δ(ξ)^{−α}, namely

  Σ_{n=0}^{∞} ω^{(α)}_n ξ^n = δ(ξ)^{−α},

and the w_{n,j} are some starting weights that are introduced to deal with the lack of regularity of the solution at the origin; they are obtained by solving, at each step n, the algebraic linear systems

  Σ_{j=0}^{s} w_{n,j} j^γ = (Γ(γ + 1)/Γ(γ + 1 + α)) n^{γ+α} − Σ_{j=0}^{n} ω^{(α)}_{n−j} j^γ,  γ ∈ A_p,  (11)

with A_p the set of exponents γ = i + jα < p − 1 (i, j nonnegative integers) and s + 1 the cardinality of A_p. The intriguing property of FLMMs is that, unlike product-integration rules, they are able to preserve the same convergence order p of the underlying LMM if the LMM satisfies certain properties: it is required that δ(ξ) has no zeros in the closed unit disc |ξ| ≤ 1 except for ξ = 1, and that | arg δ(ξ)| < π for |ξ| < 1. Thus, high-order FLMMs are possible without requiring the imposition of artificial smoothness assumptions of the kind needed by methods based on polynomial interpolation.
But the price to be paid for this advantage may not be negligible: the convolution weights ω^(α)_n are not known explicitly and must be computed by some (possibly sophisticated) method (a discussion for the general case is available in [17][18][19][20], while algorithms for FLMMs of trapezoidal type are presented in [21]). Moreover, high-order methods may require the solution of large or very large systems (11), depending on the equation order α and the convergence order p of the method; in some cases these systems are so ill-conditioned as to affect the accuracy of the method, a problem addressed in depth in [22].
One of the simplest methods in this family is obtained from the backward Euler method, whose generating function is δ(ξ) = 1 − ξ. Its convolution weights are hence the coefficients in the power series expansion of (1 − ξ)^{−α}, i.e., the coefficients in the binomial series

  (1 − ξ)^{−α} = Σ_{n=0}^{∞} (−1)^n (−α choose n) ξ^n,

and no starting weights are necessary since the convergence order is p = 1 and hence A_p is the empty set. One easily recognizes that the so-called Grünwald-Letnikov scheme is obtained in this case. Although this scheme was discovered in the nineteenth century in the independent works of Grünwald and Letnikov, its interpretation as an FLMM may facilitate its analysis.
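The binomial recursion behind the Grünwald-Letnikov weights is easily coded. The sketch below (our own naming) generates the coefficients of (1 − ξ)^α and uses them to approximate a fractional derivative; for the FLMM form used in solving FDEs one would instead need the coefficients of (1 − ξ)^{−α}, i.e., the same recursion with −α in place of α.

```python
import math

def binomial_weights(alpha, n):
    """Coefficients of the binomial series of (1 - xi)^alpha, generated by the
    stable recursion w_j = w_{j-1} * (j - 1 - alpha) / j, w_0 = 1."""
    w = [1.0]
    for j in range(1, n + 1):
        w.append(w[-1] * (j - 1 - alpha) / j)
    return w

def gl_derivative(f, alpha, t, h):
    """Gruenwald-Letnikov approximation of the order-alpha derivative of f
    at t, with starting point 0; t/h must be (close to) an integer."""
    n = round(t / h)
    w = binomial_weights(alpha, n)
    return sum(w[j] * f(t - j * h) for j in range(n + 1)) / h**alpha
```

For f(t) = t (which satisfies f(0) = 0, so the Grünwald-Letnikov and Riemann-Liouville derivatives coincide), the approximation converges as h → 0 to the exact value t^{1−α}/Γ(2 − α).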

Classical Approximations Will Not Give High-Order Methods
Solutions of fractional-derivative problems typically exhibit weak singularities. This topic is discussed at length in the survey chapter [23] and has been known since early works on Volterra integral equations [24,25]. This singularity is a consequence of the weakly singular behavior of the kernels of fractional integrals and derivatives and its importance, from a physical perspective, is related to the natural emergence of completely monotone (CM) relaxation functions in models whose dynamics is governed by these operators [26,27]; CM relaxation behaviors are indeed typical of viscoelastic systems with strongly dissipative energies [28].
In the present section we shall examine the effects of the singular behavior on numerical methods, in the context of initial value problems such as (6).
To grasp quickly the main ideas, we focus on a very simple particular case of (6): the problem

  C D^α_0 y(t) = 1 on (0, T],  (12)

where 0 < α < 1 and, for the moment, we do not prescribe the initial condition at t = 0. The general solution of (12) is

  y(t) = t^α / Γ(α + 1) + b,  b ∈ R.  (13)

This solution lies in C[0, T] but not in C^1[0, T], since y′(t) = t^{α−1}/Γ(α) blows up as t → 0+. This implies that standard techniques for integer-derivative problems, which require that y ∈ C^1[0, T] (or a higher degree of regularity), cannot be used here without some modification. In particular, one cannot perform a Taylor series expansion of the solution around t = 0 because y′(0) does not exist. What about the initial condition? If we prescribe a condition of the form y(0) = y_0 we get b = y_0 in (13), but the solution is still not in C^1[0, T]. One might hope that a Neumann-type condition of the form y′(0) = 0 would control or eliminate the singularity in the solution, but a consideration of (13) shows that it is impossible to enforce such a condition; that is, the problem C D^α_0 y(t) = 1 on (0, T] with y′(0) = 0 has no solution. This seems surprising until we recall a basic property of the Caputo derivative from ([1], Lemma 3.11): if m − 1 < β < m for some positive integer m and z ∈ C^m[0, T], then lim_{t→0} C D^β_0 z(t) = 0. Hence, if a solution y of (12) were to lie in C^1[0, T], then taking the limit as t → 0 in (12) we would get 0 = 1, which is impossible. That is, any solution y of (12) cannot lie in C^1[0, T].
One can present this finding in another way: for the problem C D^α_0 y(t) = f(t) on (0, T], if the solution y lies in C^1[0, T], then one must have f(0) = 0. This result is a special case of ([1], Theorem 6.26).
Conditions such as f (0) = 0 (and the even stronger conditions listed in Remark 1) impose an artificial restriction on the data f that should be avoided. Thus we continue by looking carefully at the consequence of dealing with a solution of limited smoothness.
Returning to (12) and imposing the initial condition y(0) = b, the unique solution of the problem is given by (13), where b is now fixed. Most numerical methods for integer-derivative initial value problems are based on the premise that on any small mesh interval [t i , t i+1 ], the unknown solution can be approximated to a high degree of accuracy by a polynomial of suitable degree. But is this true of the function (13)? We now investigate this question.
Consider the interval [0, h], where h = t_1. This is the mesh interval where the solution (13) is worst behaved.

Lemma 1. Let 0 < α < 1 and suppose that there are constants c_0, c_1 and C, with C independent of h and t, such that |t^α − (c_0 + c_1 t)| ≤ C h^β for all t ∈ [0, h]. Then β ≤ α.

Proof. Our hypothesis is that |t^α − (c_0 + c_1 t)| ≤ C h^β for all t ∈ [0, h] and some constant C that is independent of h and t. Consider the values t = 0, t = h/2 and t = h in this inequality: we get |c_0| ≤ C h^β, |(h/2)^α − c_0 − c_1 h/2| ≤ C h^β and |h^α − c_0 − c_1 h| ≤ C h^β. Taking the third bound, minus twice the second, plus the first eliminates c_0 and c_1 and yields |h^α − 2(h/2)^α| = |1 − 2^{1−α}| h^α ≤ 4 C h^β. But this cannot be true for all small h unless β ≤ α, since the left-hand side is simply a (nonzero) multiple of h^α because α ≠ 1.
Lemma 1 says that the approximation of t α on [0, h] by any linear polynomial is at best O(h α ). But the order of approximation O(h α ) of t α on [0, h] is also achieved by the constant polynomial 0. That is: using a linear polynomial to approximate t α on [0, h] does not give an essentially better result than using a constant polynomial. In a similar way one can show that using polynomials of higher degree does not improve the situation: the order of approximation of t α on [0, h] is still only O(h α ). This is a warning that when solving typical fractional-derivative problems, high-degree polynomials may be no better than low-degree polynomials, unlike the classical integer-derivative situation.
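The h^α barrier of Lemma 1 is easy to observe numerically. The following sketch (our own illustrative script) measures the maximum error of the linear interpolant of t^α at the endpoints of [0, h]; the substitution t = hs shows that this error is exactly proportional to h^α, so the observed order equals α, not 2.

```python
import numpy as np

alpha = 0.5
s = np.linspace(0.0, 1.0, 100001)  # scaled variable: t = h * s

def interp_error(h):
    """Max error on [0, h] of the line through (0, 0) and (h, h^alpha),
    i.e., the linear interpolant of t^alpha at the endpoints."""
    t = h * s
    line = h**(alpha - 1.0) * t
    return np.max(np.abs(t**alpha - line))

# halving h scales the error by exactly 2^(-alpha): the observed order is alpha
order = np.log2(interp_error(0.1) / interp_error(0.05))
```

Any other choice of linear polynomial changes only the constant, not the h^α rate, in accordance with Lemma 1.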
One can generalize Lemma 1 to any α > 0 with α not an integer, obtaining the same result via the same argument. Furthermore, our investigation of the simple problem (12) can be readily generalised to the much more general problem (6); see ( [1], Section 6.4).

Implications for the Construction of Difference Schemes
The discussion earlier in Section 4 implies that, to construct higher-order difference schemes for typical solutions of problems such as (12) and (6), one must use non-classical schemes, since the classical schemes are constructed under the assumption that approximations by higher-order polynomials give greater accuracy. The same idea is developed at length in [29], one of whose results we now present.
Note: although [29] discusses only boundary value problems, an inspection reveals that its arguments and results are also valid (mutatis mutandis) for initial value problems such as (6) when f = f (t), i.e., when the problem (6) is linear.
Let α > 0 be fixed, with α not an integer. Consider the problem D^α y = f on [0, T] with y(0) = 0. Assume that the mesh on [0, T] is equispaced with diameter h, i.e., x_i = ih for i = 0, 1, . . . , N. Suppose that the difference scheme used to solve D^α y = f computes approximations y_i ≈ y(x_i) at each mesh point through a relation of the form

  Σ_{j=0}^{i} a_{ij} y_j = f(x_i),  i = 1, 2, . . . , N.

It is reasonable to assume that |a_{ij}| = O(h^{−α}) for all i and j, since we are approximating a derivative of order α (one can check that almost all schemes proposed for this problem have this property).

Theorem 1.
Assume that our scheme achieves order of convergence p, for some p > α, when f(t) = C t^k for all k ∈ {0, 1, . . . , ⌈p − α⌉ − 1}. Then for each fixed positive integer i, the coefficients of the scheme must satisfy the following relationship:

  Σ_{j=0}^{i} a_{ij} (jh)^{k+α} = (Γ(k + α + 1)/Γ(k + 1)) (ih)^k + O(h^{p−α})  for each such k.  (14)

Proof. Fix k ∈ {0, 1, . . . , ⌈p − α⌉ − 1}. This implies that k < p − α. Choose for simplicity f(t) = (Γ(k + α + 1)/Γ(k + 1)) t^k. Then the true solution of our initial value problem is y(t) = t^{k+α}. Fix a positive integer i. Then y_j = (jh)^{k+α} + O(h^p) for j = 0, 1, . . . , i. Hence, using the hypothesis that our scheme achieves order of convergence p and |a_{ij}| = O(h^{−α}), we obtain

  f(x_i) = Σ_{j=0}^{i} a_{ij} y_j = Σ_{j=0}^{i} a_{ij} (jh)^{k+α} + O(h^{p−α}),

which is (14). Theorem 1 implies that schemes that fail to satisfy (14) cannot achieve an order of convergence greater than O(h^α) at each mesh point. (This is consistent with the approximation theory result of Lemma 1.) For example, in the case 0 < α < 1, it follows from Theorem 1 that the well-known L1 scheme is at best O(h^α) accurate.

Remark 2.
To avoid the consequences of results such as Theorem 1, one can impose data restrictions such as f(0) = 0. This is discussed in ([29], Section 5), where theoretical and experimental results show an improvement in the accuracy of standard difference schemes, but only for a restricted class of problems.
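The order reduction discussed above is easy to reproduce experimentally. The sketch below (our own code) applies the uniform-mesh L1 scheme to the model problem C D^α y = 1, y(0) = 0, whose exact solution t^α/Γ(1 + α) is not smooth at the origin; the maximum nodal error then decays only at a rate close to h^α, far from the O(h^{2−α}) rate valid for smooth solutions.

```python
import math
import numpy as np

def solve_l1(alpha, N, T=1.0):
    """Solve C D^alpha y = 1, y(0) = 0, by the uniform-mesh L1 scheme."""
    h = T / N
    # b[j] = (j+1)^(1-alpha) - j^(1-alpha), the usual L1 weights
    b = np.arange(1, N + 1)**(1 - alpha) - np.arange(N)**(1 - alpha)
    g = math.gamma(2 - alpha)
    y = np.zeros(N + 1)
    for n in range(1, N + 1):
        # L1 equation: (1/(g h^alpha)) sum_{k=0}^{n-1} b[n-1-k] (y_{k+1}-y_k) = 1
        dy = y[1:n] - y[0:n - 1]        # increments y_{k+1}-y_k, k = 0..n-2
        hist = np.dot(b[n - 1:0:-1], dy)  # history part of the sum
        y[n] = y[n - 1] + g * h**alpha - hist   # solve for y_n (b[0] = 1)
    return y

def max_error(alpha, N):
    t = np.linspace(0.0, 1.0, N + 1)
    exact = t**alpha / math.gamma(1 + alpha)
    return np.max(np.abs(solve_l1(alpha, N) - exact))
```

Comparing two mesh sizes gives an observed order well below 2 − α, with the largest errors concentrated near t = 0 where the solution is singular.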

Failed Approaches to Treat Non-Locality
Non-locality is one of the major features of fractional-order operators. Indeed, fractional integrals and derivatives are often introduced as a mathematical formalism with the primary purpose of encompassing hereditary effects in the modeling of real-life phenomena when theoretical or experimental observations suggest that the effects of external actions do not propagate instantaneously but depend on the history of the system.
On the one hand, non-locality is a very attractive feature that has driven most of the interest and success of the fractional calculus; on the other hand, non-locality introduces severe computational difficulties that researchers try to overcome in different ways.
Unfortunately, some attempts to treat non-locality are unreliable and lead to wrong results. This is the case for naive implementations of the "finite memory principle", which consist of simply neglecting a large part of the solution history; since this technique nevertheless forms the basis of more sophisticated and accurate approaches, we postpone its discussion to Section 6.
We also have to mention methods based on some kind of fractional Taylor expansion of the solution, such as

  y(t) ≈ Σ_{k=0}^{K} Y_k (t − t0)^{kα},

where the coefficients Y_k are determined by some suitable numerical technique.
When solving integer-order differential equations, it is possible to use Taylor expansions to approximate the solution at a given point t 1 and hence reformulate the same expansion by moving the origin to the new point t 1 , thus generating a step-by-step method in which the approximation at t n+1 is evaluated on the basis of the approximation at t n (or at additional previous points).
With fractional-order equations, instead, the above expansion holds only with respect to the point t 0 (the initial or starting point of the fractional differential operator) and it is not possible to generate a step-by-step method. Expansions of this type are therefore able to provide an accurate approximation only locally, i.e., very close to the starting point t 0 ; consequently, as discussed in [30], methods based on these expansions are usually unsuitable for FDEs.
Another failed approach is based on an attempt to exploit the difference between y(t_{n+1}) and y(t_n) in the integral formulation (7): rewrite the solution at t_{n+1} as some increment of the solution at t_n, i.e.,

  y(t_{n+1}) = y(t_n) + G_n(t, y(t)),  (15a)

  G_n(t, y(t)) = (1/Γ(α)) [ ∫_{t0}^{t_{n+1}} (t_{n+1} − u)^{α−1} f(u, y(u)) du − ∫_{t0}^{t_n} (t_n − u)^{α−1} f(u, y(u)) du ],  (15b)

and then approximate the increment by replacing the vector field f(u, y(u)) in both integrals of (15b) by its (first-order) interpolating polynomial at the grid points t_{n−1} and t_n. Methods of this kind read as

  y_{n+1} = y_n + P_n(y_{n−1}, y_n),  (16)

with P_n a known function obtained by standard interpolation techniques. Approaches of this kind are called two-step Adams-Bashforth methods, and they attract researchers since they apparently transform the non-local problem into a local one (and thus a difficult problem into a much easier one). However, G_n(t, y(t)) in (15b) is still a non-local term, and although these methods are becoming quite popular, they are, as discussed in [31], usually unreliable, because in most cases they attempt to approximate the (implicitly) non-local contribution G_n(t, y(t)) by some purely local term. Using interpolation at the points t_{n−1} and t_n to approximate f(u, y(u)) over the much larger intervals [t0, t_n] and [t0, t_{n+1}] is completely inappropriate. It is well known that polynomial interpolation may offer accurate approximations within the interval of the data points, in this case [t_{n−1}, t_n]; but outside this interval (where an extrapolation is made instead of an interpolation), the approximation becomes more and more inaccurate as the integration intervals [t0, t_n] and [t0, t_{n+1}] in (15b) become larger and larger, i.e., as the integration proceeds and n increases.
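The effect is easy to quantify with a small experiment (our own construction, intended only to isolate the extrapolation error, not to reproduce any specific method criticized in [31]): replace a perfectly smooth f over the whole memory interval [0, t_n] by its linear interpolant at the two most recent grid points t_{n−1} and t_n, and measure the resulting error in the weighted memory integral. The error grows dramatically as n increases.

```python
import numpy as np

alpha, h, M = 0.5, 0.05, 40000
f = np.cos  # a perfectly smooth, well-behaved vector field

def weighted_integral(tn, g):
    """Midpoint-rule approximation of int_0^tn (tn - u)^(alpha - 1) g(u) du."""
    u = (np.arange(M) + 0.5) * (tn / M)
    return np.sum((tn - u)**(alpha - 1) * g(u)) * (tn / M)

def extrapolation_error(n):
    """Error committed when f is replaced, over the whole of [0, t_n], by its
    linear interpolant at the two most recent grid points t_{n-1} and t_n."""
    tn, tnm1 = n * h, (n - 1) * h
    slope = (f(tn) - f(tnm1)) / h
    p = lambda u: f(tn) + slope * (u - tn)  # extrapolant outside [t_{n-1}, t_n]
    return abs(weighted_integral(tn, f) - weighted_integral(tn, p))
```

Close to the starting point (small n) the interpolant is used almost entirely inside its own data interval and the error is tiny; for large n almost the whole memory integral is computed from an extrapolation, and the error is orders of magnitude larger.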
The consequence is that completely untrustworthy results must be expected from methods based on this idea.
Note that the fundamental flaw of this approach is not the decomposition (15) but the local (and hence inappropriate) way (16) in which the history is handled. Indeed, it is possible to construct technically correct and efficient algorithms on the basis of (15), for example if one treats the increment term (15b) by a numerical method that is cheaper in computational cost than the method used for the local term [32].

Some Approaches for the Efficient, and Reliable, Treatment of the Memory Term
The non-locality of the fractional-order operator means that it is necessary to treat the memory term in an efficient way. This term is commonly identified as the source of a computational complexity which, especially in problems of large size, requires adequate strategies in order to keep the computational cost at a reasonable level, and indeed this observation has led to many investigations of (more or less successful) approaches to reduce the computational cost. It should be noted, however, that the high number of arithmetic operations is not the only potential difficulty that the memory term introduces. There is another, more fundamental issue, which seems to have attracted much less attention: the history of the process not only needs to be taken into account in the computation but, in order to be properly handled, also needs to be stored in the computer's memory. While the required amount of memory is usually easily available in algorithms for solving ordinary differential equations, the memory demand may be too high for efficient handling in the case of, e.g., time-fractional partial differential equations where finite element techniques are used to discretize the spatial derivatives.
Most finite-difference methods for FDEs require at each time step the evaluation of some convolution sum of the form

  y_n = φ_n + Σ_{j=0}^{n} c_j y_{n−j}  or  y_n = φ_n + Σ_{j=0}^{n} c_j f(t_{n−j}, y_{n−j}),  n = 1, 2, . . . , N,  (17)

where φ_n is a term which mainly depends on the initial conditions or other known information.
A naive straightforward evaluation of (17) has a computational cost proportional to O(N²) and, when integration with a small step size or over a large integration interval is required, the value of N can be extremely large, leading to prohibitive computational costs.
For this reason different approaches for a fast, efficient and reliable treatment of the memory term in non-local problems have been devised. We provide here a short description of some of the most interesting methods of this type. The influence of these approaches on the memory requirements will be addressed as well.

Nested Mesh Techniques
Several different concepts can be subsumed under the heading of so-called nested meshes. The general idea is based on the observation that the convolution sum in Equation (17) stems from a discretization of a fractional integral or differential operator that uses all the previous grid points as nodes. One can then ask whether it is really necessary to use all these nodes, or whether one could save effort by including only a subset of them, using a second, less fine mesh, i.e., a mesh nested inside the original one.

The Finite Memory Principle
The simplest idea in this class is the finite memory principle ([5], §7.3). It is based on defining a constant τ > 0, the so-called memory length, and replacing (for t > t0 + τ) the memory integral term that extends over the interval [t0, t] by the integral over [t − τ, t] with the same integrand function. Technically speaking, this amounts to "forgetting" the entire history of the process that is more than τ units of time in the past, so the memory has a finite and fixed length τ instead of the variable length t − t0 that may, in a long-running process, be very much longer. From an algorithmic point of view, the finite memory method truncates the convolution sum in Equation (17) so that only the ν most recent time steps enter the sum, for some fixed ν. This has a number of significant advantages:

•	The computational complexity of the nth time step is reduced from O(n) to O(1). Therefore, the combined total complexity of the overall method with N time steps is reduced from O(N²) to O(N).
•	At no point in time does one need to access the part of the process history that is more than ν time steps in the past. Therefore, all those previous time steps can be removed from the active memory, and the memory requirement also decreases from O(N) to O(1).
Unfortunately, this idea also has severe drawbacks. Specifically, it has been shown in [33] that the convergence order of the underlying discretization technique is lost completely. In other words, one cannot prove that the algorithm converges as the (maximal) step size goes to 0. Therefore, the method is not recommended for practical use.
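This failure is easy to reproduce even in the simplest setting of evaluating a fractional integral (the script below is our own illustration): approximating (J^α f)(T) for f(t) = t by a product rectangle sum with full memory converges as h → 0, while the same sum truncated to a fixed memory length τ stalls at a fixed error that no mesh refinement can remove.

```python
import math
import numpy as np

alpha, T = 0.5, 1.0
f = lambda t: t
exact = T**(1 + alpha) / math.gamma(2 + alpha)  # J^alpha of f(t) = t at t = T

def rect_frac_integral(h, memory):
    """Product-rectangle approximation of (J^alpha f)(T) that discards every
    subinterval lying more than `memory` time units in the past."""
    N = round(T / h)
    j = np.arange(N)                         # subintervals [jh, (j+1)h]
    keep = T - (j + 1) * h <= memory + 1e-12  # finite memory principle
    # exact kernel integrals over the kept subintervals
    b = ((N - j)**alpha - (N - j - 1)**alpha) * h**alpha / alpha
    return np.sum(b[keep] * f(j[keep] * h)) / math.gamma(alpha)

full_errs = [abs(rect_frac_integral(h, T) - exact) for h in (0.01, 0.0025)]
trunc_errs = [abs(rect_frac_integral(h, 0.2) - exact) for h in (0.01, 0.0025)]
```

With full memory the error is driven by the step size; with memory length τ = 0.2 the dominant error is the discarded tail of the weakly singular kernel, which is independent of h.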

Logarithmic Memory
To overcome the shortcomings of the finite memory principle, two related but not identical methods, both of which are also based on the nested mesh concept, have been developed in [33,34]. The common idea of both these approaches is the way in which the distant part of the memory is treated. Rather than ignoring it completely as the finite memory principle does, they do sample it, but on a coarser mesh; indeed the fundamental principle is to introduce not just one coarsening level, but to use, say, the step size h on the most recent part of the memory, step size wh (with some parameter w > 1) on the adjacent region, w 2 h on the next region, etc. The main difference between the two approaches of [33,34] then lies in the way in which the transition points from one mesh size to the next are chosen.
Specifically, as indicated in Figure 1, the method of Ford and Simpson [33] starts at the current time and fills subintervals of prescribed lengths from right to left with appropriately spaced mesh points. This leads to a reduction of the computational cost to O(N log N) while retaining the convergence order of the underlying scheme [33]. However, as indicated in Figure 1, it is common that the left end point of the leftmost coarsely subdivided interval does not match the initial point. In this case, one can either fill the remaining subinterval at the left end of the full interval with a fine mesh (which increases the computational cost but also reduces the error) or simply ignore the contribution from this subinterval (which reduces the computational complexity but slightly increases the error; however, since the memory length still grows with the number of steps, this does not imply the complete loss of accuracy observed with the finite memory principle). In either case, grid points from the fine mesh that are not currently used in the nested mesh may become active again in future steps. Therefore, all previous grid points need to be kept in memory, so the required amount of memory space remains at O(N).

In contrast, the approach of Diethelm and Freed [34] starts to fill the basic interval from left to right, i.e., it begins with the subinterval with the coarsest mesh and then moves to the finer-mesh regions. The final result is also a method with an O(N log N) computational cost and the same convergence order as the Ford-Simpson method; but its selection strategy for grid points implies that points that are inactive in the current step will never become active again in future steps, and consequently the history data for these inactive points can be eliminated from the main memory. This reduces the memory requirements to only O(log N).

A Method Based on the Fast Fourier Transform Algorithm
An effective approach for the fast evaluation of the convolution sums in (17) was proposed in [35,36]. The main idea is to split each of these sums in a way that enables the exploitation of the fast Fourier transform (FFT) algorithm. To provide a concise description, let us introduce the notations

T_p(n) = Σ_{j=p}^{n} c_{n−j} g_j,   S_{p,q}(n) = Σ_{j=p}^{q} c_{n−j} g_j,   n ≥ p,

where g_j = y_j or g_j = f(t_j, y_j) according to the formula used in (17). Thus the numerical methods described by (17) can be recast as y_n = φ_n + T_0(n), n = 1, 2, . . . , N.
The algorithm described in [35,36] is based on splitting T_0(n) into one or more partial sums of type S_{p,q}(n) and just one final convolution sum T_p(n) of a maximum (fixed) length r. The computation is simply initialized as

T_0(n) = Σ_{j=0}^{n} c_{n−j} g_j,   n ∈ {1, 2, . . . , r − 1},

and the following r values of T_0(n) are split into the two terms

T_0(n) = S_{0,r−1}(n) + T_r(n),   n ∈ {r, r + 1, . . . , 2r − 1}.
For clarity, the diagram in Figure 2 illustrates the way in which the computation on the main triangle T_0 = {(n, j) : 0 ≤ j ≤ n ≤ N} is split into partial sums identified by the (red-labeled) squares S_{p,q} = {(n, j) : q + 1 ≤ n ≤ q + ℓ(p, q), p ≤ j ≤ q} and final blocks denoted by the (blue-labeled) triangles T_p = {(n, j) : p ≤ j ≤ n ≤ p + r − 1}. Each of the final blocks T_r(n), n = ℓr, ℓr + 1, . . . , (ℓ + 1)r − 1, is computed by direct summation requiring r(r + 1)/2 floating-point operations. The evaluation of the partial sums S_{p,q}(n) can instead be performed by the FFT algorithm (see [37] for a comprehensive description), which requires a number of floating-point operations proportional to 2ℓ log₂(2ℓ), with ℓ = ℓ(p, q) the length of each partial sum S_{p,q}(n), since r is a power of 2.
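The reason the squares S_{p,q} are cheap to evaluate is that, for all the values of n handled by one square, S_{p,q}(n) = Σ_{j=p}^{q} c_{n−j} g_j is a linear convolution of the segment (g_p, . . . , g_q) with a segment of the weights c_m, and can therefore be computed for the whole range of n at once by zero-padded FFTs. A minimal numpy sketch of this observation (the sizes, data and function names are our own illustration, not the code of [35,36]):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
c = rng.standard_normal(N + 1)   # convolution weights c_0 ... c_N
g = rng.standard_normal(N + 1)   # g_j = y_j or f(t_j, y_j)

def S_direct(p, q, n):
    # S_{p,q}(n) = sum_{j=p}^{q} c_{n-j} * g_j, by direct summation
    return sum(c[n - j] * g[j] for j in range(p, q + 1))

def S_fft_block(p, q, n_lo, n_hi):
    """Evaluate S_{p,q}(n) for ALL n in [n_lo, n_hi] with a single
    zero-padded FFT convolution of g[p:q+1] with c[n_lo-q : n_hi-p+1]."""
    a = g[p:q + 1]
    b = c[n_lo - q:n_hi - p + 1]
    L = len(a) + len(b) - 1                 # length of the linear convolution
    full = np.fft.irfft(np.fft.rfft(a, L) * np.fft.rfft(b, L), L)
    # full[k] = sum_i a[i] * b[k-i]; S_{p,q}(n) sits at k = (n - n_lo) + (q - p)
    return full[q - p : q - p + (n_hi - n_lo + 1)]

p, q, n_lo, n_hi = 4, 19, 20, 35
block = S_fft_block(p, q, n_lo, n_hi)
```

One FFT-based block evaluation thus replaces (n_hi − n_lo + 1) direct inner sums, which is where the reduction from O(N²) to almost-linear cost comes from.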
In the optimal case in which both r and N are powers of 2, each partial sum S_{p,q} that must be computed, together with its length, number and computational cost, is described in Table 1.

Table 1. Partial sums, their length, number and computational cost for the evaluation of T_0(N).

Furthermore, N/r final blocks T_r, each of length r, are also computed in r(r + 1)/2 floating-point operations each, and hence the total amount of floating-point operations is proportional to N (log₂ N)², which turns out, for sufficiently large N, to be significantly smaller than the O(N²) operations required by the direct summation of T_0(N). Although the whole procedure may appear complicated and requires some extra effort in coding, it turns out to be quite efficient, since it can be applied to different methods of the form (17) and does not affect their accuracy: the technique takes into account the entire history of the process in exactly the same way as the straightforward O(N²) approach mentioned above. Thus one does need to keep the entire history data in active memory, but one avoids the requirement of using special meshes. All the Matlab codes for FDEs described in [10,21,38], and freely available on the Mathworks website [39], make use of this algorithm.

Kernel Compression Schemes
Although the terminology "kernel compression scheme" has been introduced only recently for a few specific works [40][41][42], we use it here to describe a collection of methods that were proposed at various times by various authors and are all based on essentially the same principle: approximating the solution of a non-local FDE by means of (possibly several) local ODEs. We provide here just the main ideas underlying this approach and refer the reader to the literature for a more comprehensive coverage of the subject.
Actually, these are standalone methods (usually classified as nonclassical methods [43]) and not just algorithms improving the efficiency of the treatment of the memory term; for this reason they could have been discussed in Section 3 along with the other methods for FDEs. But since one of their main achievements (and the motivation for their introduction) is to handle memory and computational issues related to the long and persistent memory of fractional-order problems, we consider it appropriate to discuss them in the present section.
For ease of presentation we consider only 0 < α < 1, but the extension to any positive α is only a technical matter. The basic idea starts from some integral representation of the kernel of the RL integral (1), e.g.,

t^{α−1}/Γ(α) = 1/(Γ(α)Γ(1−α)) ∫_0^∞ s^{−α} e^{−st} ds,   (18)

which, thanks to standard quadrature rules, can be approximated by exponential sums

t^{α−1}/Γ(α) ≈ Σ_{k=1}^{K} w_k e^{−r_k t} + e_K(t),   (19)

where the error e_K(t) and the computational complexity, related to the number K of nodes r_k and weights w_k, depend on the choice among the many possible quadrature rules. When this approximation is applied instead of the exact kernel in the integral formulation (7), the solution of the FDE (6) is rewritten as

y(t) = y_0 + Σ_{k=1}^{K} w_k y^{(k)}(t) + E_K(t),   y^{(k)}(t) = ∫_{t_0}^{t} e^{−r_k(t−u)} f(u, y(u)) du.   (20)

Each of the integrals in (20) is actually the solution of an initial value problem

d/dt y^{(k)}(t) = −r_k y^{(k)}(t) + f(t, y(t)),   y^{(k)}(t_0) = 0,   (21)

which can be numerically approximated by standard ODE solvers, yielding approximations y_n^{(k)} on some grid {t_n}. If the quadrature rule is chosen so as to make the error E_K(t) so small that it can be neglected, an approximate solution of the original FDE (6) can be obtained step-by-step as y_n = y_0 + Σ_{k=1}^{K} w_k y_n^{(k)}, where each y_n^{(k)} is advanced by the chosen ODE solver. In practice, a non-local problem (the FDE) with non-vanishing memory is replaced by K local problems (the ODEs), each demanding a smaller computational effort, and the memory storage is restricted to O(pK) if a p-step ODE solver is used for each of the ODEs (21).
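As a concrete, deliberately unoptimized illustration of the passage from (18) to (19), the following sketch builds nodes r_k and weights w_k by applying a plain equispaced quadrature to the integral representation of the kernel after the substitution s = e^x. The truncation interval [−40, 10] and the number of nodes are ad hoc choices of ours that happen to work for t between roughly 0.1 and 10, not an optimized rule from the cited literature; each resulting term w_k e^{−r_k t} is the kernel of one local ODE of type (21).

```python
import math
import numpy as np

alpha = 0.5

# Integral representation (18):
#   t^(alpha-1)/Gamma(alpha)
#     = 1/(Gamma(alpha)*Gamma(1-alpha)) * int_0^inf s^(-alpha) e^(-s t) ds.
# Substituting s = e^x turns this into an integral over the real line with
# a rapidly decaying integrand; we truncate it to [-40, 10] and discretize
# with an equispaced sum (endpoint contributions are negligible here).
x = np.linspace(-40.0, 10.0, 1001)
h = x[1] - x[0]
r = np.exp(x)                                   # exponents r_k in (19)
w = h * np.exp((1.0 - alpha) * x) / (math.gamma(alpha) * math.gamma(1.0 - alpha))

def kernel_approx(t):
    """Exponential-sum approximation (19) of t^(alpha-1)/Gamma(alpha)."""
    return np.sum(w * np.exp(-r * t))

def kernel_exact(t):
    return t ** (alpha - 1.0) / math.gamma(alpha)
```

A brute-force rule like this needs about a thousand exponentials; the point of the carefully designed quadratures in the cited works is precisely to reach comparable accuracy with far fewer terms, so that only a small number of local ODEs (21) must be solved.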
Obviously, the idea sketched above requires several further technical details to work properly. First, an accurate error analysis is needed to ensure that the overall error is below the target accuracy. This is a very delicate task because it involves the investigation of the interaction between the quadrature rule used to approximate the integral in (20) and the ODE solver applied to the system (21), which can be a highly nontrivial matter. Moreover, some substantial additional problems must be addressed. For instance, A-stable methods should generally be preferred when solving the system (21) since some of the r k > 0 can be very large and give rise to stiff problems.
A non-negligible issue is that it is not possible to find a quadrature rule that approximates (18) uniformly with respect to all relevant values of t, i.e., with the same accuracy for all t ≥ t_1 (where t_1 is the first mesh point to the right of the initial point t_0) or for all t ≥ t_0; in either case, the singularity at t_0 makes the integral quite difficult to approximate. To overcome this difficulty, several different approaches have been proposed.
In a series of pioneering works [44][45][46], where a complex contour integral is chosen to approximate the kernel, the integration interval [t 0 , T] is divided into a sequence of subintervals of increasing lengths, and different quadrature rules (on different contours C) are used in each of these intervals. While high accuracy can be obtained, this strategy is quite complicated and requires the use of more expensive complex arithmetic.
In [40][41][42] the integral in (7) is divided into local and history terms,

∫_{t_0}^{t} (t−u)^{α−1} f(u, y(u)) du = ∫_{t−δt}^{t} (t−u)^{α−1} f(u, y(u)) du  [local term]  +  ∫_{t_0}^{t−δt} (t−u)^{α−1} f(u, y(u)) du  [history term],

for a fixed δt > 0. This confines the singularity of the kernel to the local term, which can be approximated by standard methods for weakly singular integral equations (e.g., a product-integration rule) with a reduced computational cost and an insignificant memory requirement. The kernel in the history term no longer contains any singularity, since t − u ≥ δt there, and can be safely approximated by (19), which now needs to hold only for arguments larger than δt.
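The effect of the splitting can be verified on a toy case where every piece is known in closed form (an example of our own, with f(u) = u, α = 1/2 and t_0 = 0): the local term, which contains the singularity, is integrated exactly (playing the role of the product-integration rule), while the now-smooth history term is handled by an ordinary trapezoidal rule.

```python
import math
import numpy as np

alpha, t, dt = 0.5, 2.0, 0.25          # dt plays the role of delta t

# Exact RL integral of f(u) = u:  I^alpha f(t) = t^(1+alpha)/Gamma(2+alpha)
exact = t ** (1.0 + alpha) / math.gamma(2.0 + alpha)

# Local term over [t - dt, t]: contains the kernel singularity, but for
# f(u) = u it is available in closed form (substitute tau = t - u):
#   (1/Gamma(alpha)) * int_0^dt tau^(alpha-1) (t - tau) dtau
local = (t * dt ** alpha / alpha
         - dt ** (alpha + 1.0) / (alpha + 1.0)) / math.gamma(alpha)

# History term over [0, t - dt]: here t - u >= dt, so the kernel is smooth
# and an ordinary trapezoidal rule is accurate:
u = np.linspace(0.0, t - dt, 8001)
h = u[1] - u[0]
vals = (t - u) ** (alpha - 1.0) * u
history = h * (vals.sum() - 0.5 * (vals[0] + vals[-1])) / math.gamma(alpha)
```

The two contributions add up to the exact Riemann-Liouville integral to within the trapezoidal error, confirming that all the difficulty caused by the singular kernel is confined to the short local window [t − δt, t].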
To obtain the highest possible accuracy, Gaussian quadrature rules are usually preferred. A rigorous and technical error analysis is necessary to tune parameters in an optimal way. Several implementations of approaches of this kind have been proposed (e.g., see [47][48][49][50][51]) but owing to their technical nature, a comparison to decide which method is in general the most convenient is difficult; we just refer to the interesting results presented in [52].

Some Remarks about Fractional Partial Differential Equations
Even though this paper is essentially devoted to the numerical solution of ordinary differential equations of fractional order and the computational treatment of the associated differential and integral operators, a few comments should be made regarding numerical methods for fractional partial differential equations (PDEs).

Remark 3.
The issues discussed in Section 4 are relevant to partial differential equations also. Indeed, it is shown in [53] that imposing excessive smoothness requirements on the solutions to a partial differential equation (e.g., for the sake of simplifying the error analysis or for obtaining a higher convergence order) has drastic implications regarding the class of admissible problems; in particular, the choice of the forcing function f (x, t) in a linear initial-boundary value problem will then completely determine the initial condition in the problem.
Our second remark regarding partial differential equations deals with a totally different aspect.

Remark 4.
Typical algorithms for time-fractional partial differential equations contain separate discretisation techniques with respect to the time variable and the space variable(s). A current trend is to employ a very high order method for the discretisation of the (non-fractional) differential operator with respect to the space variable. While this might seem an attractive approach at first sight, it has a number of disadvantages. Specifically, while it leads to a smaller discretisation error in the space variable, it also increases the algorithm's overall complexity and makes the understanding of its properties more difficult. This complexity would be acceptable if the overall error could be reduced significantly. But since the overall error comprises not only the error from the space discretisation but also the contribution from the time approximation, it follows that to reduce the overall error one must force this latter component to be very small as well. As indicated above, we cannot expect to achieve a high convergence order in the time variable, so the only way to reach this goal is to choose the time step size very small (in comparison with the space mesh size). From Section 6 we conclude that a standard algorithm with a higher-than-linear complexity is likely to lead to prohibitive run times, and even if the time discretisation uses a method with a linear or almost linear complexity, this very small step size requirement will still imply a high overall cost. Therefore, the use of a high-order space discretisation in a time-fractional partial differential equation is usually inadvisable.

Concluding Remarks
In this paper we have tried to describe some issues related to the correct use of numerical methods for fractional-order problems. Unlike integer-order ODEs, numerical methods for FDEs are in general not taught in undergraduate courses and, very often, non-specialists are unaware of the peculiarities and major difficulties that arise in the numerical treatment of FDEs and fractional PDEs.
The availability of only a few well-organized textbooks and monographs in this field, together with the presence of many incorrect results in the literature, makes the situation even more difficult.
Some of the ideas collected in this paper were discussed in the lectures of the Training School on "Computational Methods for Fractional-Order Problems", held in Bari (Italy) on 22-26 July 2019 and promoted by the COST Action CA15225 "Fractional-order systems: analysis, synthesis and their importance for future design".
We believe that the scientific community should make an effort to raise the level of knowledge in this field by promoting specific academic courses at a basic level and/or by organizing training schools.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

CM      Complete monotonicity
FDE     Fractional differential equation
FLMM    Fractional linear multistep method
LMM     Linear multistep method
ODE     Ordinary differential equation
PDE     Partial differential equation