Reinterpretation of Multi-Stage Methods for Stiff Systems: A Comprehensive Review on Current Perspectives and Recommendations

: In this paper, we compare a multi-step method and a multi-stage method for stiff initial value problems. Traditionally, the multi-step method has been preferred than the multi-stage for a stiff problem, to avoid an enormous amount of computational costs required to solve a massive linear system provided by the linearization of a highly stiff system. We investigate the possibility of usage of multi-stage methods for stiff systems by discussing the difference between the two methods in several numerical experiments. Moreover, the advantages of multi-stage methods are heuristically presented even for nonlinear stiff systems through several numerical tests.


Introduction
Most time-dependent differential equations are usually solved by multi-stage (one-step) method or multi-step method [1][2][3]. In general, there seems to be no significant difference in the structure between them when the multi-stage method is applied to get an initial guess for the multi-step method [4]. Nonetheless, a comparison of both methods has attracted quite a lot of interest from the viewpoints of convergence, stability, practical computations, numerical efficiency, etc. [5][6][7][8][9][10][11]. Comparisons in this regard do not take into account the impact of advances in computer science and technologies such as artificial intelligence (AI) or parallel computation, etc. Considering the impact, a new perspective to compare the potentials of both methods should be investigated as well as existing comparative studies. First of all, it is well known that the highest order of an A-stable multi-step method is two, so lots of research [12][13][14][15][16][17][18][19][20][21][22][23][24] developing higher order methods have focused on either multi-step methods satisfying some less restrictive stability condition or multi-stage methods which combine A-stability with high-order accuracy [2,[25][26][27][28][29]. In addition, multi-stage methods such as Runge-Kutta (RK) type methods do not require any additional memory for function values at previous steps since it does not use any previously computed values [30][31][32]. On the other hand, multi-step methods require additional memory in the sense that they use previously computed function values and have insufficient function values for initial data. Multi-stage methods are comparable with multi-step methods for nonlinear stiff problems and have no restriction to express initial data contrast to the other. There seems not to be such a clear a priori distinction between multi-stage and multi-step methods.
Another interesting point of view to find more efficient methods is quite susceptible to stiffness and nonlinearity of the given problem. For nonlinear stiff problems, a multi-step method is needed to evaluate function values only once at each iteration in a nonlinear solver, whereas multi-stage methods require several function evaluations at each iteration. This disadvantage of the multi-stage method can be ignored by the authors' recent research [33]. The authors showed numerically that one stage of the multi-stage method is equivalent to one step of the multi-step method for simple ordinary differential equation (ODE) systems. However, the multi-step methods such as the backward differentiation formula (BDF) are usually recommended to apply nonlinear stiff problems because the process of solving the nonlinear system of equation is also expensive computationally. In the process of solving nonlinear stiff problems by a multi-stage method, it generally generates a system M d ⊗ M s , where d and s represent the dimension of the given problem and the number of stages used in the multi-stage method, respectively. Here, the notation M k represents a matrix with the size k × k and the notation ⊗ denotes a Kronecker product. On the other hand, a multi-step method needs to solve only a system of size d × d.
The purpose of this paper is to investigate and compare the properties of the multi-stage and the multi-step methods for d-dimensional stiff problems described by Most nonlinear stiff problems are solved by multi-step methods rather than multi-stage methods since the multi-stage methods usually transform nonlinear stiff problems into bigger nonlinear systems, as mentioned in the previous paragraph. To solve such nonlinear systems efficiently, one has to consider both nonlinear and linear solvers. The nonlinear systems are usually solved by using an iteration technique such as Newton-like iterations, which incur considerable computation costs. There are various Newton-like iterations. Among them, a simplified Newton iteration is developed in connection with the development of computer process capacity [34][35][36][37]. Different nonlinear system solvers generate linear systems correspondingly. It means that the nonlinear system solver should be well-selected to adapt efficient linear solvers such as the eigenvalue decomposition method. Note that efficient linear solvers have also been well-studied [1,2,38,39]. An eigenvalue decomposition combined with simplified Newton iteration can apply to a multi-stage method. The resulting multi-stage method generates the same matrix, regardless of integration or iteration, as an object of decomposition for solving a linear system induced by the simplified Newton iteration. It allows for decomposing the matrix only once throughout the whole process. As a result, applying this combination to multi-stage methods highlights the advantage of multi-stage methods by reducing computational costs to the level of the costs required from multi-step methods without any loss of the original advantages of multi-stage methods, which is the main contention of this paper.
The remaining parts of this paper are as follows. We briefly describe the multi-step and multi-stage methods and simplified Newton iteration in Section 2. To support theoretical analysis, we present preliminary numerical results in Section 3. Finally, in Section 4, all results are summarized and further possibilities are discussed.

Methods
In this subsection, we briefly describe ODE solvers classified by mathematical theory. Numerical methods for ODEs fall naturally into two categories: one is 'multi-stage method' using one starting value at each step and the other is 'multi-step method' or 'multi-value method' based on several values of the solution. We deal with the theories of two methods in terms of convergence and stability. The multi-step method has a critically bad stability property with a higher convergence rate that can not actually be used. Due to these reasons, the third-order RK method (RK3) and the third-order BDF (BDF3) are considered as examples of multi-stage methods and multi-step methods. Note that the higher order multi-step method is also available, but it has very low practical use.
The general form of the multi-step methods [1,3,7,26,38,40,41] is described by Here, the coefficients a 0 , . . . , a s , b −1 , b 0 , . . . , b s are constants. If method (2) use s + 1 previous solution values with either a s = 0 or b s = 0, the method is called an s + 1 step method. A BDF method is the most efficient linear multi-step methods among several multi-step methods [40]. It is composed of the coefficients b p−1 = · · · = b 0 = 0 and the others chosen such that the method with the convergence order of s convergence order s. Thus, the s-step BDF has s-th convergence order. The BDF3 is given by Since implicit A-stable linear multi-step methods have convergence order of at most 2, second-order BDF can be A-stable, but the method can not be A-stable with order more than 3. The stability of BDF3 is almost A-stable [38].
An explicit RK method has been developed by Runge, Heun, and Kutta based on a Euler method [3,40]. Later, an implicit RK was developed for stiff problems based on several quadrature rules. RK methods have the following form: or an equivalent form of Butcher tableau One can specify a particular RK method by providing the number of stage s and all elements of the Butcher tableau, a ij (1 ≤ i, j ≤ s), b i and c i (i = 1, . . . , s). There is a popular implicit RK method for solving the stiff problem, which is called a collocation method. The collocation method is changed depending on the choice of the collocation points. For more details on the collocation method, one can refer to [3,38]. If we select uniform collocation points defined by c i = i/3 (1 ≤ i ≤ 3), we can obtain a third-order collocation method with having the following butcher table: Note that the order of the stage and convergence for the method (5) are both three as shown in the convergence analysis in [3,38]. Furthermore, the stability of (5) demonstrated through Dahlquist's problem is almost L-stable.

Simplified Newton Iteration and Eigenvalue Decomposition Method
To explicate a simplified Newton iteration proposed by Liniger and Willoughby [10], we consider the following nonlinear system obtained by RK-type methods, Equation (6) is equivalent to a system of equations described by where and I d is d-dimensional identity matrix. By applying Newton iteration to the nonlinear system of Equation (7), we can get a linear system of the form where J is a block diagonal matrix that consists of Jacobians ∂ f Usually, one Newton iteration needs several calculations of the Jacobian which requires lots of computational costs. To reduce such costs, all Jacobians ∂ f ∂y (t n + c i h, y n + z i ) are replaced by ∂ f ∂y (t n , y n ). This process is called 'simplified Newton iteration'. The simplified Newton iteration for (7) leads (9) to the formula where J := ∂ f ∂y (t n , y n ). Each iteration requires s times evaluation of f and the calculations of a d · s-dimensional linear system. Note that, by using the simplified Newton iteration, the matrix (I − hA ⊗ J) is the same for all iterations, so the decomposition method for solving the resulting linear system can be needed only once. For the linear system, we consider an eigenvalue decomposition technique in that it decomposes the given d · s dimensional linear system into several s-dimensional linear systems. In the view of computational efficiency, it is more efficient to calculate several small size systems even if it is a complex system, rather than to calculate one big size system. Note that only a simplified Newton iteration (9) enables usage of eigenvalue decompositions that cannot be applicable to traditional Newton iteration (8). The eigenvalue decomposition method for (9) is proposed independently by Butcher [31] and Bickart [30]. The main ideas of the method are eigenvalue decomposition of the matrix A −1 = TΛT −1 and linear transformation of the vector Z k . By transforming W k = (T −1 ⊗ I)Z k , the iteration (9) becomes equivalent to In a general case of the three-stage implicit RK method such as (5), the inverse matrix of A has an eigenvalue decomposition as follows: whereγ is one real real eigenvalue,α ± iβ are one complex eigenvalue pair and u 0 , and u 1 ± v 1 are eigenvectors corresponding toγ,α ± iβ, respectively. Therefore, the matrix in (10) can be rewritten as with γ =γ/h, α =α/h, β =β/h so that (10) can be split into two linear systems of dimension d and 2d, respectively. Moreover, the 2d-dimensional real valued subsystem can be transformed to the following d-dimensional complex valued system In terms of computational cost, the number of multiplication to solve (13) is approximately 4d 3 /3, since the complex multiplication consists of four real multiplications. Then, the total multiplication number for (12) is about 5d 3 /3, while the number of multiplications for decomposing the untransformed matrix (I − hA ⊗ J) in (9) is about (3d) 3 /3. Thus, we can reduce the number of multiplications to about 80% by calculating (12) instead of directly calculating the inverse of the matrix of (I − hA ⊗ J) in (10). Finally, to solve the transformations Z k = (T ⊗ I)W k , it additionally requires a multiplication of O(n). This difference becomes more apparent as the size of the matrix (or the numbers of stage) increases.

Numerical Comparison
In this section, we experiment five commonly used physical examples for comparison of both methods. In Sections 3.1-3.3, the BDF3 method (3) and RK3 (4) with its butcher table (5) are used as an example of multi-step and multi-stage methods, respectively. The initial guess for BDF3 is taken by exact values. Both methods use the traditional Newton iteration for solving nonlinear systems. In Section 3.3, especially, we measure CPU-time to compare the two methods in terms of accuracy and efficiency and simplified Newton iteration is used for a nonlinear solver. In Sections 3.4-3.5, we use RADAU5 and ODE15s representing a multi-stage and a multi-step method, respectively, which numerical codes are well optimized and open-source. Note that RADAU5, one of multi-stage methods, has convergence order 5 and stage order 3 [38] and ODE15s, one of multi-step methods, included MATLAB library, has variable orders from 1 to 5 [42]. Remarkably, RADAU5 has applied the eigenvalue decomposition and simplified Newton iteration. All numerical simulations are executed with the software MATLAB 2010b (Mathworks, Natick, MA, USA) under OS WINDOWS 7 (Microsoft, Redmond, WA, USA). Note that most numerical results in this section are repeatable even if different computational resources are used.

Simple Linear ODE
As the first example, we consider the Prothero-Robinson problem [29], f (t, y(t)) = ν(y(t) − g(t)) + g (t), t ∈ (0, 10], y(0) = g(0), which presents a stiffness by varying the parameter ν. The analytic solution of problem is given by y(t) = g(t). To compare the error behaviors of the two methods, we set up the parameter ν = −1.0 × 10 6 so that the given problem can be highly stiff. Here, the exact solution of this problem is set by g(t) = sin(t). In Figure 1, we display absolute errors |y(t i ) − y i | at each integration step in a log scale obtained by the two methods with different time step sizes h = 2 −k , (a) k = 1, (b) k = 2 and (c) k = 3. One can see that the error of BDF3 (Red) has magnitude (a) 1.0 × 10 −7 , (b) 1.0 × 10 −8 , and (c) 1.0 × 10 −9 . The error of RK3 (Blue) has magnitude (a) 1.0 × 10 −9 , (b) 1.0 × 10 −10 , and (c) 1.0 × 10 −11 . All three graphs in Figure 1 show that RK3 has better accuracy than BDF3. Additionally, to demonstrate the meaning of the stage of multi-stage methods and the step of multi-step methods, we set up a time step size of the multi-step method, BDF3, ash = h/3. The result of BDF3 withh = h/3 is labeled as BDF3c hereafter. It can be seen that the result from BDF3c (Black) has the same accuracy, compared with RK3. Therefore, it is sufficient to see a comparison of RK3 and BDF3c for further comparison.

Nonlinear Stiff ODE System: Multi-Mode Problem
As the second example, we consider a nonlinear ODE system based on the Prothero-Robinson problem. The system is given by where g(t) = sin(t), 1 N = (1, . . . , 1) T ∈ R N and N is the number of dimension. The exact solution is Y(t) = sin(t) · 1 N . The stiffness of (15) can be controlled by the eigenvalues of the matrix Λ, where Λ is diagonal matrix that has elements λ i = 1.0e + k i (i = 1, . . . , N), k i is random integer between 0 and 6. In addition, a linearity of the problem depends on the parameter δ. In this experiment, δ = 1 and δ = 5 are taken for linear and nonlinear cases, respectively. The parameter set (N, h) = (100, 2 −3 ) is used for both linear and nonlinear cases. As similar to the previous subsection, the error behaviors of two methods for both linear and nonlinear cases are observed over time, and the results are plotted in Figure 2. The error is measured as L ∞ -norm at each integration step, ||Y(t i ) − Y i || ∞ . For the nonlinear case, a traditional Newton iteration is used for a linearization. As mentioned in the previous subsection, BDF3c uses a smaller time step sizeh = h/3 and is compared with RK3 with time step h. Just in case, we mention that BDF3 with time step h is not appropriate to compare RK3 with the same time step size because of the meaning of the stage, explained in the previous subsection. In the linear case, δ = 1, RK3, and BDF3c have similar error behaviors as 1.0 × 10 −5.544 and 1.0 × 10 −5.253 at the final time point, respectively. In the nonlinear case, δ = 5, RK3, and BDF3 also have similar error behaviors as 1.0 × 10 −5.378 and 1.0 × 10 −5.123 at the final time, respectively.

Linear PDE-Heat Equation
We consider a linear partial differential equation (PDE), the heat equation generally described by with initial value u(0, x) = sin(πx) + 1 2 sin(3πx) and boundary conditions u(t, 0) = u(t, 1) = 0. The exact solution is given by u(x, t) = e −π 2 t sin(πx) + 1 2 e −(3π) 2 t sin(3πx). This problem is intended to compare two methods for solving big size stiff problems induced from PDE by spatial discretization such as Method of Lines. For the spatial discretization, we use the second-order central difference after evaluating at x = x j (x j = j N ). Then, the resulting system becomes a N-dimensional system of time dependent ODE. That is, the resulting system can be a big size ODE system depending on the discretization. Note that, to avoid unnecessary computational costs of the multi-stage methods described in the previous sections, we employ the multi-stage methods by combining an efficient linear solver such as an eigenvalue decomposition technique.
To examine the numerical accuracy of two methods for big size stiff systems, we integrate this problem by setting the system size N = 100, step size h = 1/64 for RK3 and step sizeh = 1/192 for BDF3. For the numerical comparison, we measure L ∞ -norm error Err(t i ) = ||u(x j , t i ) − u i j || ∞ in each integration time step where u i j ≈ u(x j , t i ). The error behaviors of two methods are plotted in Figure 3, which are measured on a logarithmic scale.
It can be seen that the accuracy of the multi-stage method RK3 with time step h is quite similar to that of the multi-step method BDF3 with time step sizeh. Additionally, to observe of the efficiency for the two methods, CPU-times, and the absolute error are measured at the final time t = 1 by varying the resolution of space N = k · 10 2 from k = 1 to k = 10. The results are plotted by absolute error versus CPU-time in Figure 4 with time step sizes h = 1/100 andh = 1/300. Figure 4 can be good evidence of the conclusion that RK3 with eigenvalue decomposition technique is more efficient than the BDF3 method. More precisely speaking, BDF3 requires more computational costs to obtain a similar magnitude of accuracy. In addition, RK3 combined with the eigenvalue decomposition technique can obtain higher accuracy for the same cost.

Nonlinear PDE: Medical Akzo Nobel Problem
In this example, we consider one of nonlinear stiff PDE, a reaction-diffusion system with one spatial dimension, described by along with the following initial and boundary conditions, where v 0 is a constant and Semi-discretization of this system yields the nonlinear stiff ODE given by dy dt = f (t, y), y(0) = g, y ∈ IR 2N , 0 ≤ t ≤ 20.
The function f is given by The function φ is given by The parameters k, v 0 , and c are set to 100, 1, and 4, respectively. The integer N can be decided by the user. In this experiment, we set N as 200. Since analytic solutions are unavailable, we use reference solutions listed in Table 1 excerpted from [43]. Specifically, results independent of computational resources were measured to compare the efficiency of two methods in this example. The number of times that nonlinear solvers are called (nsolve) and the number of function evaluations (nfeval) are measured by varying relative tolerance (Rtol) and absolute tolerance (Atol) as (Rtol, Atol) = (10 −n , 10 −n−2 ) (n = 4, . . . , 11). We also measure an L ∞ -norm error at the end time for each tolerance and plot the error in a logarithm scale as a function of nsolve (left) and nfeval (right) in Figure 5. These figures show that RADAU5 generates smaller errors, compared with ODE15s, for paying a similar computational expenses. From a different perspective, RADAU5 requires less computational resources than ODE15s to get similar level of errors. Thus, we can claim that RADAU5 has better performance in terms of computational costs and accuracy than ODE15s.

Kepler Problem
In this subsection, we consider a two-body Kepler's problem to examine two conservation properties-the Hamiltonian energy and angular momentum-which are indispensable factors in physics. The Kepler's problem describes the Newton's law of gravity revolving around their center of mass placed at the origin in elliptic orbits in the (q 1 , q 2 )-plan [44]. The equations with unitary masses and gravitational constant are defined by with initial conditions p 1 (0) = 0, p 2 (0) = 2, q 1 (0) = 0.4, and p 1 (0) = 0 on the interval [0, 100π]. The dynamics are described by Hamiltonian function given by together with angular momentum L given by L(p 1 , p 2 , q 1 , q 2 ) = q 1 p 2 − q 2 p 1 .
The initial Hamiltonian and the initial angular momentum conditions are H 0 = −0.5 and L 0 = 0.8, respectively.
The conservation properties for the Hamiltonian energy H and angular momentum L are investigated by simulating with the two methods, RADAU5 and ODE15s, with time step size h = 0.1 and plot the results in Figure 6. As shown in Figure 6, RADAU5 can conserve both quantities, whereas ODE15s loses the properties as time is going on.
Next, we also consider the movement of comet in planar regulated three-body problem of Sun-Jupiter-Comet. To investigate conservation properties of the two methods, we measure the Hamiltonian energy K and the angular momentum D for the three-body Kepler problem described by , where The energy and angular momentum of the comet are constant, when µ = 0 and ν = 1. For this experiment, initial condition is set to and the initial energy and angular momentum are set to K 0 /2 = 0.3 and D 0 = 5 with parameter step size h = 1/2π and t 0 = 0. In Figure 7, one can see that the behaviors of the energy and the momentum over time interval [0, 100π]. As observed in Figure 7, RADAU5 gives a maximum variation of 8.0356 × 10 −5 and 0.0013 for the energy and the momentum, whereas ODE15s presents a variation of 5.9063 × 10 −4 and 0.0173 for them. Therefore, one can conclude that RADAU5 has better conservation properties, compared with ODE15s.

Conclusions and Further Discussion
In this work, we compare multi-stage methods with multi-step methods by investigating the numerical properties of both methods. In a classical approach, nonlinear stiff systems were usually solved by multi-step methods to avoid huge computational complexity induced from linearization of a given nonlinear system. However, the computational costs for the multi-stage method can be reduced sufficiently without loss of stability and conservation, which is possible by using suitable nonlinear and linear solvers such as a Newton-type method and eigenvalue decomposition. It means that the multi-stage method can also be applied to solve nonlinear stiff systems without any damage to computational costs, compared with the multi-step methods. Moreover, it is seen that the multi-stage methods preserve the invariants of the energy and angular momentum in Hamiltonian systems. In addition, it is well-known that a stability property of multi-stage methods is much better than that of multi-step methods.
Overall, one can conclude that the multi-stage method can be a good candidate to solve nonlinear stiff systems. It means that, without any damage to computational costs, multi-stage methods can be applied to long-time simulations and massive physical simulations in fields such as astronomy, meteorology, nuclear fusion, nuclear power, aerospace, machinery, etc.