Two Taylor Algorithms for Computing the Action of the Matrix Exponential on a Vector

Abstract: The action of the matrix exponential on a vector, e^{At}v, with A ∈ C^{n×n} and v ∈ C^n, appears in problems arising in mathematics, physics, and engineering, such as the solution of systems of linear ordinary differential equations with constant coefficients. Several state-of-the-art approximations are nowadays available for estimating this type of action. In this work, two Taylor algorithms are proposed for computing e^A v that make use of the scaling and recovering technique based on a backward or forward error analysis. A battery of highly heterogeneous test matrices has been used in the experiments performed to compare the numerical and computational properties of these algorithms, implemented in the MATLAB language. In general, both of them improve on those already existing in the literature in terms of accuracy and response time. Moreover, a high-performance computing version able to exploit the computational power of a GPU platform has been developed, making it possible to tackle high-dimensional problems with a significantly reduced execution time.


Introduction
The study of different matrix functions f(A), where A ∈ C^{n×n}, such as the exponential, the trigonometric and hyperbolic functions, the logarithm, or the sign, as well as several families of orthogonal matrix polynomials, among which are those of Hermite, Laguerre, Jacobi, or Chebyshev, forms an attractive, wide, and active field of research due to their numerous applications in different areas of science and technology.
In recent years, the matrix exponential e^A has become a constant focus of attention due to its extensive applications, from the classical theory of differential equations, where it furnishes the solution Y(t) = e^{At} of the matrix system Y'(t) = AY(t), to graph theory [1][2][3], including some recent progress on the numerical solution of fractional partial differential equations [4,5], as well as the multiple difficulties involved in its effective computation. These have motivated the development of distinct numerical methods, some of them classic and very well known, as described in [6], and others more recent and novel, using, for example, Bernoulli matrix polynomials [7].
Nevertheless, sometimes it is not the computation of the function f(A) on a square matrix A which is required by applications, but its action on a given vector v ∈ C^n, i.e., f(A)v. Once again, the motivation for this particular problem comes from its applicability in different and varied branches of science and engineering. As an example, the action of the matrix sign function on a vector is used in quantum chromodynamics (QCD); see [8] for details. In particular, the action of the matrix exponential operator on a vector appears in multiple problems arising in mathematics, as in the case of the first-order matrix differential equation with constant coefficients and initial conditions, Y'(t) = AY(t), Y(0) = v, where A ∈ C^{n×n} and v ∈ C^n, whose solution is Y(t) = e^{At}v ∈ C^n; this kind of problem occurs frequently, for example, in control theory. It also arises in applications involving the numerical solution of fractional partial differential equations [9]. Moreover, the action of the matrix exponential on a vector is used in physics and engineering fields such as electromagnetics [10], circuit theory [11], acoustic/elastic wave equations [12], seismic wave propagation [13], chemical engineering [14], robotics [15], and so on. There exist different methods in the literature to calculate the action of the matrix exponential on a vector (see, for example, those described in references [16][17][18][19][20][21]). Additionally, a comparison of recent software can be found in [22]. Among these methods, those based on Krylov subspaces [23], which reduce the e^A v problem to a corresponding one for a small matrix by means of a projection technique, and those based on polynomial or Padé approximations (see [16,17,24] and references therein) can be highlighted.
When computing e^A v, the Taylor approach, in combination with the scaling and squaring technique, consists of determining the order m of the Taylor polynomial T_m(A) and a positive integer s, called the scaling factor, so that e^A v ≈ (T_m(2^{-s}A))^{2^s} v. In this paper, two algorithms that calculate e^A v without explicitly computing e^A have been designed and implemented in the MATLAB language. Both are based on truncating and evaluating the Taylor series of the matrix exponential, after obtaining the values of m and s by means of a backward or forward error analysis.
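The relation e^A v ≈ (T_m(2^{-s}A))^{2^s} v can be exploited without ever forming a matrix power: scale A by 2^{-s}, evaluate the action of the Taylor polynomial with matrix-vector products only, and apply it 2^s times to the vector. A minimal Python/NumPy sketch of this idea follows (the paper's codes are written in MATLAB; the fixed values of m and s here are illustrative and are not the ones chosen by the paper's error analysis):

```python
import numpy as np

def taylor_poly_action(B, w, m):
    """Compute T_m(B) w = sum_{k=0}^{m} B^k w / k! using only
    matrix-vector products (each term is built from the previous one)."""
    result = w.copy()
    term = w.copy()
    for k in range(1, m + 1):
        term = (B @ term) / k
        result = result + term
    return result

def expmv_scaling_squaring(A, v, m=12, s=4):
    """e^A v ~= (T_m(2^{-s} A))^{2^s} v: scale A by 2^{-s}, then apply
    the Taylor polynomial 2^s times to the vector."""
    B = A / (2.0 ** s)
    w = np.asarray(v, dtype=float).copy()
    for _ in range(2 ** s):
        w = taylor_poly_action(B, w, m)
    return w
```

For a well-scaled matrix this already reproduces e^A v to near machine precision; the whole point of the paper's algorithms is to choose m and s so that this accuracy is guaranteed at minimum cost.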
Throughout this work, we will denote the identity matrix of order n as I_n or I. With ⌈x⌉, we will represent the result of rounding x to the nearest integer greater than or equal to x. In the same way, ⌊x⌋ will stand for the result of rounding x to the nearest integer less than or equal to x. The matrix norm ‖A‖ will refer to any subordinate matrix norm; in particular, ‖A‖_1 will be the 1-norm. A polynomial of degree m is given by an expression of the form P_m(x) = a_m x^m + a_{m-1} x^{m-1} + ... + a_1 x + a_0, where x is a real or complex variable, and the coefficients a_j, 0 ≤ j ≤ m, are complex numbers with a_m ≠ 0. Moreover, we can define the matrix polynomial P_m(A), for A ∈ C^{n×n}, by means of the formulation P_m(A) = a_m A^m + a_{m-1} A^{m-1} + ... + a_1 A + a_0 I. This work is organised as follows. First, Section 2 presents two scaling and squaring Taylor algorithms for computing the action of the matrix exponential on a vector. Then, in Section 3, these algorithms are implemented and their numerical and computational properties are compared with those of other state-of-the-art codes by means of different experiments. Next, Section 4 reports the computational performance of our two codes after their implementation on a GPU-based execution platform. Finally, conclusions are given in the last section.

Algorithms for Computing the Action of the Matrix Exponential
Let T_m(A) = Σ_{k=0}^{m} A^k/k! be the Taylor approximation of order m to the exponential of a matrix A ∈ C^{n×n}, and let v ∈ C^n be a vector. In combination with the scaling and squaring method, the Taylor-based approach computes e^A = (e^{2^{-s}A})^{2^s} ≈ (T_m(2^{-s}A))^{2^s} [6], where the nonnegative integers m and s are chosen to achieve full machine accuracy at a minimum computational cost.
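Acting with a truncated Taylor polynomial on a vector requires no matrix powers at all: by Horner's rule, T_m(A)v costs exactly m matrix-vector products. A small illustrative sketch in Python/NumPy (the paper's implementations are in MATLAB):

```python
import math
import numpy as np

def matrix_poly_action(coeffs, A, v):
    """Evaluate P_m(A) v = (a_m A^m + ... + a_1 A + a_0 I) v by Horner's
    rule, using m matrix-vector products and no explicit matrix powers.
    coeffs = [a_0, a_1, ..., a_m]."""
    w = coeffs[-1] * v
    for a in reversed(coeffs[:-1]):
        w = A @ w + a * v
    return w

def taylor_expmv(A, v, m):
    """T_m(A) v, i.e. the Taylor polynomial of e^A with coefficients 1/k!
    applied to v."""
    coeffs = [1.0 / math.factorial(k) for k in range(m + 1)]
    return matrix_poly_action(coeffs, A, v)
```

This m-products cost is the basic unit in which the computational cost of both algorithms is measured later.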
In our proposal, the values of m and s are calculated such that the absolute forward error of computing e^{s^{-1}A}v is less than or equal to u = 2^{-53}, the so-called unit roundoff in IEEE double precision arithmetic. The absolute forward error of e^{s^{-1}A}v is bounded by the tail of the Taylor series. Hence, if we calculate s so that the leading term of this bound does not exceed u, as in expression (3), and we verify that the following inequality (4) is satisfied as well, then the absolute forward error of computing e^{s^{-1}A}v will be approximately less than or equal to u. Once m and s have been calculated, e^A v can be efficiently computed by means of the recovering formula (5). In this way, Algorithm 1 computes w = e^A v, where A ∈ C^{n×n} and v ∈ C^n, starting with an initial value m ∈ N, without explicitly working out e^A. First, in lines 1-4, the vectors Av, A^2 v, ..., A^m v, A^{m+1} v, A^{m+2} v are computed and stored in the array of vectors V_1, V_2, ..., V_{m+2}, respectively. Then, lines 5-13 determine the minimum value of m and the corresponding value of s, calculated in line 7 according to expression (3), such that (4) is fulfilled. Next, in lines 15-17, m is set to the maximum value allowed if expression (3) could not be satisfied. Finally, in lines 18-29, w = e^A v is computed according to (5). Algorithm 1 Given a matrix A ∈ C^{n×n}, a vector v ∈ C^n, and minimum m and maximum M Taylor polynomial orders, m, M ∈ N, this algorithm computes w = e^A v ∈ C^n by (5).
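The scaling-and-recovering structure of Algorithm 1, e^A v = (e^{s^{-1}A})^s v, can be sketched as follows in Python/NumPy. The choice of s below (forcing ‖A‖_1/s ≤ 1) is a deliberately crude stand-in for the error-based criterion of expressions (3)-(4), and m is fixed rather than selected adaptively; the sketch only illustrates the overall structure of the computation:

```python
import numpy as np

def expmv_taylor_scaled(A, v, m=25):
    """Sketch of the e^A v = (e^{A/s})^s v idea underlying Algorithm 1:
    choose a scaling s (here crudely, so that ||A||_1 / s <= 1, NOT the
    paper's error-based criterion) and apply T_m(A/s) to the vector
    s times, using only matrix-vector products."""
    s = max(1, int(np.ceil(np.linalg.norm(A, 1))))
    B = A / s
    w = np.asarray(v, dtype=float).copy()
    for _ in range(s):
        term = w.copy()
        new_w = w.copy()
        for k in range(1, m + 1):
            term = (B @ term) / k   # B^k w / k!, built incrementally
            new_w += term
        w = new_w
    return w
```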
Since the first term of these absolute error bounds coincides in both cases, Algorithm 1 could easily be modified to compute e^{s^{-1}A}v such that E_ab ≤ u. To do this, only the condition expression in line 8, which checks whether convergence has been reached, should be replaced by ‖A^{m+2}v‖ / (s^{m+2}(m+2)!) ≤ u. An alternative formulation to expression (2) in the absolute backward error estimation would have been to consider only the first of its terms. In this way, starting from a minimum value of m, the value of s is computed from expression (3), which already ensures that the error committed is less than or equal to u. With that assumption, together with the objective of reducing the number of matrix-vector products of the previous algorithm, we designed Algorithm 2, which determines the appropriate values of m and s that satisfy both purposes. Algorithm 2 Given a matrix A ∈ C^{n×n}, a vector v ∈ C^n, and minimum m and maximum M Taylor polynomial orders, m, M ∈ N, this algorithm computes w = e^A v ∈ C^n by (5).
To this end, in lines 5-20, the number of matrix-vector products p required, obtained as m × s, is compared in each iteration of the while loop for two consecutive values of m and their corresponding values of s. The procedure concludes when the number of products associated with a given degree m of the Taylor polynomial is greater than that obtained with the immediately preceding degree. Table 1 shows the computational and storage costs of Algorithms 1 and 2. The computational cost depends on the parameters m and s and is specified in terms of the required number of matrix-vector products. The storage cost is expressed as the number of matrices and vectors with which the algorithms work.
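Algorithm 2's stopping rule, comparing the total matrix-vector work m·s for consecutive degrees and stopping at the first increase, can be sketched as follows. The bound used inside scaling_for_degree, (θ/s)^{m+1}/(m+1)! ≤ u with θ = ‖A‖, is an assumed illustrative stand-in for the paper's expression (3), not the paper's actual formula:

```python
import math

def scaling_for_degree(theta, m, u=2.0 ** -53):
    """Smallest integer s with (theta/s)^(m+1)/(m+1)! <= u, obtained in
    closed form. Illustrative stand-in for the paper's expression (3)."""
    return max(1, math.ceil(theta / (u * math.factorial(m + 1)) ** (1.0 / (m + 1))))

def choose_m_s(theta, m_min=5, m_max=60):
    """Sketch of Algorithm 2's selection: increase m while the total number
    of matrix-vector products p = m*s keeps decreasing; stop at the first
    degree whose cost exceeds that of the preceding one."""
    best_m = m_min
    best_s = scaling_for_degree(theta, best_m)
    m = m_min + 1
    while m <= m_max:
        s = scaling_for_degree(theta, m)
        if m * s > best_m * best_s:
            break
        best_m, best_s = m, s
        m += 1
    return best_m, best_s
```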

Numerical Experiments
In this section, several tests have been carried out to illustrate the accuracy and efficiency of two MATLAB codes based on Algorithms 1 and 2. All these experiments have been executed on a Microsoft Windows 10 x64 PC with an Intel Core i7 CPU Q720 @ 1.60 GHz processor and 6 GB of RAM, using MATLAB R2020b. The following MATLAB codes have been compared:

• expmvtay1: Implements Algorithm 1, where absolute backward errors are assumed. The degree m of the Taylor polynomial used in the approximation varies from 40 to 60. The maximum value allowed for the scaling parameter s, so as not to give rise to an excessively high number of matrix-vector products, is 45. The code is available at http://personales.upv.es/joalab/software/expmvtay1.m (accessed on 28 January 2022).
• expmvtay2: Based on Algorithm 2. As mentioned before, it attempts to reduce the number of matrix-vector products required by the code expmvtay1. It considers the same limits on m and s as the previous function. The implementation can be downloaded from http://personales.upv.es/joalab/software/expmvtay2.m (accessed on 28 January 2022).
• expmAvtay: Code where e^A is first explicitly computed, using the function exptaynsv3 described in [26], and the matrix-vector product e^A v is then carried out. The code exptaynsv3 is based on a Taylor polynomial approximation to the matrix exponential in combination with the scaling and squaring technique. The order m of the approximation polynomial takes values no greater than 30.
• expmv: This function, implemented by Al-Mohy and Higham [17], computes e^{tA}v without explicitly forming e^{tA}. It uses the scaling part of the scaling and squaring method together with a truncated Taylor series approximation to the matrix exponential.
• expm_newAv: Code that first explicitly calculates e^A by means of the function expm_new, developed by Al-Mohy and Higham [27], and then multiplies it by the vector v to form e^A v. The function expm_new is based on Padé approximants and implements an improved scaling and squaring algorithm.
• expleja: This code, based on the Leja interpolation method, computes the action of the matrix exponential of H × A on a vector (or a matrix) v [28]. The result is similar to e^{H×A}v, but the matrix exponential is not explicitly worked out. In our experiments, H is equal to 1 and default values of the tolerance are provided.
• expv: Implementation by Sidje [23] that calculates e^{tA}v by using Krylov subspace projection techniques with a fixed dimension for the corresponding subspace. It does not compute the matrix exponential in isolation; instead, it directly calculates the action of the exponential operator on the operand vector. The matrix under consideration interacts only via matrix-vector products (matrix-free method).
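The Krylov projection idea behind expv can be sketched in a few lines: Arnoldi builds an orthonormal basis V of the Krylov subspace and a small Hessenberg matrix H ≈ V^T A V, after which e^A v ≈ ‖v‖ V e^H e_1, so the large matrix is only ever touched through matrix-vector products. The following Python/NumPy sketch shows the general technique, not Sidje's actual implementation; small_expm is a helper assumed here for the small dense exponential:

```python
import numpy as np

def small_expm(H, m=20):
    """Dense e^H for a small matrix via Taylor with 2^s scaling and
    squaring (illustrative, not a robust implementation)."""
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(H, 1), 1e-16)))))
    B = H / (2.0 ** s)
    E = np.eye(H.shape[0])
    T = np.eye(H.shape[0])
    for k in range(1, m + 1):
        T = T @ B / k
        E = E + T
    for _ in range(s):
        E = E @ E
    return E

def krylov_expmv(A, v, m=20):
    """Krylov-subspace sketch of e^A v: build an orthonormal basis V and
    Hessenberg matrix H by Arnoldi, then e^A v ~= ||v|| * V @ e^H @ e_1."""
    n = len(v)
    m = min(m, n)
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m - 1):
        w = A @ V[:, j]
        for i in range(j + 1):               # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:              # breakdown: invariant subspace
            m = j + 1
            V = V[:, :m]
            H = H[:m, :m]
            break
        V[:, j + 1] = w / H[j + 1, j]
    else:                                    # fill the last column of H
        w = A @ V[:, m - 1]
        for i in range(m):
            H[i, m - 1] = V[:, i] @ w
    e1 = np.zeros(m)
    e1[0] = 1.0
    return beta * (V @ (small_expm(H) @ e1))
```

When the subspace dimension equals n (and no breakdown occurs), the projection is exact in exact arithmetic; in practice a small m already gives good approximations for well-behaved matrices.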
To evaluate the performance of the codes described above in terms of accuracy and speed, a test battery composed of the following sets of matrices has been used. For each matrix A, a distinct vector v with random values in the interval [−0.5, 0.5] has been generated as well. The MATLAB Symbolic Math Toolbox, with 256 digits of precision, was employed in all the computations to provide the "exact" action of the matrix exponential of A on a vector v, thanks to the vpa (variable-precision floating-point arithmetic) function:

(a) Set 1: One hundred diagonalizable 128 × 128 complex matrices of the form A = V D V^T, where D is a diagonal matrix with complex eigenvalues and V is an orthogonal matrix obtained as V = H/√128, H being a Hadamard matrix. The 2-norm of these randomly generated matrices varied from 0.1 to 339.4. The "exact" action of the matrix exponential of A on a vector v was calculated by first computing e^A = V e^D V^T (see [29], p. 10) and then e^A v.
(b) Set 2: One hundred non-diagonalizable 128 × 128 complex matrices generated as A = V J V^T, where J is a Jordan matrix composed of complex eigenvalues with an algebraic multiplicity randomly varying between 1 and 3, and V is an orthogonal matrix whose randomly obtained elements become progressively larger from one matrix to the next. The 2-norm of these matrices reached values from 3.76 to 323.59. The "exact" action of the matrix exponential on a vector was calculated as for the above set of matrices.
(c) Set 3: Fifty matrices from the Matrix Computation Toolbox (MCT) [30] and twenty matrices from the Eigtool MATLAB Package (EMP) [31], all of them of size 128 × 128. With the aim of calculating the "exact" action of the matrix exponential on a vector, the "exact" exponential E_1 of each matrix A was initially computed according to the following three steps, employing the function vpa in each of them:
1. Diagonalise the matrix A via the MATLAB function eig, obtaining matrices V and D such that A = V D V^{-1}; then compute E_1 = V e^D V^{-1}.
2. Calculate the matrix exponential of A by means of the MATLAB function expm, i.e., E_2 = expm(A).
3. Take E_1 as the "exact" matrix exponential of A only if E_1 and E_2 agree, i.e., only if their relative difference is sufficiently small.
Lastly, the "exact" action was worked out as E_1 v, obviously again using the function vpa.
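In double precision rather than 256-digit arithmetic, the spirit of this cross-check, accepting the eigendecomposition-based exponential E_1 only when an independent computation E_2 agrees with it, can be sketched as follows. The tolerance and the Taylor-based E_2 below are illustrative assumptions, not the paper's vpa-based procedure:

```python
import numpy as np

def reference_expm(A, tol=1e-10):
    """Compute e^A via eigendecomposition (E1) and independently via a
    scaled Taylor series (E2); accept E1 only if the two agree to a
    relative tolerance, mirroring the paper's three-step cross-check."""
    vals, vecs = np.linalg.eig(A)
    E1 = (vecs * np.exp(vals)) @ np.linalg.inv(vecs)   # V e^D V^{-1}
    # Independent check: scaling-and-squaring Taylor series.
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, 1), 1.0)))))
    B = A / 2.0 ** s
    E2 = np.eye(A.shape[0], dtype=complex)
    T = np.eye(A.shape[0], dtype=complex)
    for k in range(1, 25):
        T = T @ B / k
        E2 = E2 + T
    for _ in range(s):
        E2 = E2 @ E2
    if np.linalg.norm(E1 - E2, 1) / np.linalg.norm(E2, 1) > tol:
        raise ValueError("reference computations disagree; matrix rejected")
    return E1.real if np.isrealobj(A) else E1
```

Matrices for which the two computations disagree are rejected, just as the ill-conditioned matrices of Set 3 were excluded from the tests.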
Among the seventy-two matrices that initially constitute this third set, only forty-two of them (thirty-five from the MCT and seven from the EMP) could be satisfactorily processed in the numerical tests carried out. The 2-norm of these considered matrices ranged between 1 and 10,716. The reasons for the exclusion of the others are given below:

- Matrices 2 and 15 incorporated in the MCT and matrices 1, 3, 5, 10, and 15 belonging to the EMP incurred a very high relative error with some code, due to the ill-conditioning of these matrices.
- Matrices 8, 11, 13, and 16 from the EMP were repeated, as they were also part of the MCT.

Figures 1-3 show, respectively, the results of the numerical analyses carried out by means of each of the different codes for the three types of matrices considered. In more detail, these figures depict the normwise relative errors (a), the performance profiles (b), the ratios of normwise relative errors between expmvtay1 and the rest of the implementations (c), the lowest and highest relative error rates (d), the polynomial orders and the Krylov subspace dimensions (e), the ratios of matrix-vector products between expmvtay2 and the other codes (f), the response times (g), and the ratios of the execution time between expmvtay2 and the remaining functions (h).

For each of the methods under evaluation, the normwise relative error Er(A, v) committed in the computation of the action of the exponential function of a matrix A on a vector v was calculated as Er(A, v) = ‖exp(A) − exp(A, v)‖ / ‖exp(A)‖, where exp(A) denotes the exact solution and exp(A, v) represents the approximate one. Figures 1a-3a present the normwise relative error incurred by each of the seven codes under study. The solid line that appears in them plots the function k_exp u, where k_exp (or cond) is the condition number of the matrix exponential function ([29], Chapter 3) for each matrix and u represents the unit roundoff in IEEE double precision arithmetic. The matrices were ordered by decreasing value of their condition number. It is well known that a method behaves in a numerically stable way if its relative errors are positioned not far above this solid line, although it is always preferable that they lie below it.
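The error measure above is straightforward to express in code; a Python/NumPy equivalent of Er(A, v) (assuming the 2-norm, since the norm is not restated at this point in the text):

```python
import numpy as np

def normwise_relative_error(exact, approx):
    """Er(A, v) = ||exact - approx|| / ||exact||, the normwise relative
    error used to compare the codes (2-norm assumed here)."""
    return np.linalg.norm(exact - approx) / np.linalg.norm(exact)
```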
Consequently, these figures reveal that expv is the code that most frequently presents relative errors above the k_exp u curve, being therefore the least numerically stable of all the codes analysed for the matrices used in the numerical tests. The rest of the codes can be said to offer high numerical stability.
The percentage of cases in which expmvtay1 incurred a normwise relative error less than, equal to, or greater than that of the other codes is listed in Table 2. As can be appreciated, expmvtay1 always provided a higher percentage of improvement cases than the rest of its competitors, especially over expv, followed by expm_newAv, expleja, and, with very similar rates, by expmAvtay and expmv. On the other hand, the gain in accuracy of expmvtay1 over expmvtay2 is not noticeable, and it can be concluded that the latter also offers a notable improvement in the reliability of the results compared to the rest of the methods. In more detail, Table 3 collects the values corresponding to the minimum and maximum normwise relative error committed by all the functions for the three sets of matrices employed, as well as the mean, median, and standard deviation. While the minimum relative errors incurred by the codes are very similar, it is easy to observe how the maximum relative error and, consequently, the largest values of the mean, median, and standard deviation corresponded in general to expv, which turned out to be the least reliable code, closely followed by expleja and expm_newAv. For all these metrics, the other methods, all based on the Taylor approximation, provided better values, analogous to each other. Table 3. Minimum, maximum, mean, median, and standard deviation values of the relative errors committed by all the codes for Sets 1, 2, and 3, respectively.

Figures 1b-3b, corresponding to the performance profiles, depict the fraction of matrices in each set for which the error committed by each method is less than or equal to α times the smallest relative error incurred by any of them. It is immediately noticeable that expmvtay1 and expmvtay2 achieved the highest probability values in the vast majority of the plots, with expmvtay1 showing a slightly higher accuracy than expmvtay2 in Figure 1b and a similar one in the other figures. The scores achieved by expmAvtay and expmv do not differ much from each other, and they are somewhat lower than those provided by the previous codes. Clearly, expm_newAv, expleja, and expv exhibited the poorest results, with a significantly lower accuracy than the other codes.
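A performance profile of this kind can be computed directly from a table of errors; a small Python/NumPy sketch on synthetic data, purely for illustration:

```python
import numpy as np

def performance_profile(errors, alphas):
    """Performance profiles as in Figures 1b-3b: errors is a
    (codes x matrices) array; for each code and each alpha, return the
    fraction of matrices on which that code's error is at most
    alpha * (smallest error obtained by any code on that matrix)."""
    errors = np.asarray(errors, dtype=float)
    best = errors.min(axis=0)                  # smallest error per matrix
    profiles = np.empty((errors.shape[0], len(alphas)))
    for j, a in enumerate(alphas):
        profiles[:, j] = (errors <= a * best).mean(axis=1)
    return profiles
```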

These accuracy results are also confirmed by the next two types of illustrations. Figures 1c-3c reflect the ratio of the relative errors between each of the methods under study and expmvtay1. The values of these ratios are ordered decreasingly according to the quotient Er(expmvtay1)/Er(expmvtay2). Most of these values are greater than or equal to 1, showing once again the overall superiority of expmvtay1, and correspondingly of expmvtay2, over the other functions.
As pie charts, and for each of the sets, Figures 1d-3d show the percentage of matrices for which each method resulted in the lowest or highest relative error. According to the values therein, for Sets 1 and 2, expmvtay1 and expmvtay2 gave rise to the lowest errors in the highest percentage of occasions. Notwithstanding, for Set 3, these percentages were almost equally distributed among expmvtay1, expmvtay2, expmAvtay, and expmv. If our attention is now turned to the highest relative error rates, expv produced the worst results in most cases, with values equal to 65% and 81% for Sets 1 and 2, respectively.
For Set 3, this percentage dropped to 40%, followed by expleja with 29% and expm_newAv with 24%. Table 4 compares the minimum, maximum, mean, and median values of the pair (m, s), i.e., the order of the Taylor approximation polynomial and the value of the scaling parameter used by the first four methods. For the other codes, the parameter m is not comparable, as it represents the degree of the Padé approximants to the matrix exponential (expm_newAv), the selected degree of interpolation (expleja), or the dimension of the Krylov subspace employed (expv). Additionally, s denotes the scaling value (expm_newAv) or the number of scaling steps (expleja), but it is not provided for expv, because this code does not use the scaling technique. Regarding the mean values, expmv needed the highest orders of approximation polynomials, followed by expmvtay2, expmvtay1, and expmAvtay. The function expv was always invoked using the default value of m, which is 30. Concerning the value of s, also in average terms, expmAvtay always required the smallest values, while expmvtay1, in the case of matrix Sets 1 and 2, or expmv, in the case of Set 3, demanded the highest ones. Alternatively, Figures 1e-3e graphically represent the values of m required by the distinct methods in the computation of each of the matrices that compose our test battery.
In addition to the above analysis of the accuracy of the results provided by all the codes, their computational costs have also been examined, from the point of view of both the number of matrix-vector products and the execution time invested by each of them. Thus, Table 5 lists the total number of matrix-vector products carried out by the seven codes in the computation of the matrices of our three sets. As can be noted, expmvtay2 performed the lowest number of products, followed by expmvtay1. Then, in increasing order of the number of operations involved, come expmv, expleja, expmAvtay, and expm_newAv, with these last two codes exchanging positions for Set 3. By far the largest number of products was carried out by expv. In more detail, Figures 1f-3f show the ratio between the number of matrix-vector products required by expmvtay1, expmAvtay, expmv, expm_newAv, or expleja and that needed by expmvtay2 in the computation of the matrices of the test sets, ordered decreasingly according to the quotient P(expmvtay1)/P(expmvtay2). In order not to distort these figures and to better appreciate the rest of the results, the ratio with respect to expv has not been considered, due to its excessively high number of products. In the case of Sets 1 and 2, this factor reached values greater than or equal to one for the vast majority of the matrices. It should be clarified that expmAvtay computes matrix-vector products to obtain the most appropriate values of the polynomial order m and the scaling parameter s, especially in the estimation of the 1-norm of A^p or A^p B operations, where A and B are square matrices and p is the power parameter. In addition, this function performs matrix products not only in the calculation of these values, but also in the evaluation of the Taylor approximation polynomial by means of the Paterson-Stockmeyer method.
Something very similar can be said about expm_newAv, as matrix-vector products are carried out by the function expm_new in the estimation of the 1-norm of powers of matrices, and matrix products are also required when calculating the matrix exponential by means of the Padé approximation. As a consequence, the computational cost of each matrix product for expmAvtay and expm_newAv was approximated as n matrix-vector products, where n represents the dimension of the square matrices involved.
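The Paterson-Stockmeyer scheme mentioned above trades matrix-vector work for matrix products when the full matrix polynomial is needed. A hedged Python/NumPy sketch of the general scheme (not the exptaynsv3 implementation itself):

```python
import numpy as np

def paterson_stockmeyer(coeffs, A):
    """Evaluate P(A) = sum_k coeffs[k] * A^k by the Paterson-Stockmeyer
    scheme: precompute A, A^2, ..., A^q with q ~ sqrt(m), then run
    Horner's rule in A^q over coefficient chunks of length q.
    Uses about q + m/q - 2 matrix products instead of m - 1."""
    m = len(coeffs) - 1
    n = A.shape[0]
    q = max(1, int(round(np.sqrt(m))))
    powers = [np.eye(n)]                  # I, A, A^2, ..., A^q
    for _ in range(q):
        powers.append(powers[-1] @ A)

    def chunk(lo):
        # coeffs[lo]*I + coeffs[lo+1]*A + ... (at most q terms)
        C = np.zeros((n, n))
        for i, c in enumerate(coeffs[lo:lo + q]):
            C += c * powers[i]
        return C

    starts = list(range(0, m + 1, q))
    P = chunk(starts[-1])                 # highest chunk first
    for lo in reversed(starts[:-1]):      # Horner in A^q
        P = P @ powers[q] + chunk(lo)
    return P
```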
On the other hand, Table 6 reports the time required by all the codes under comparison to complete their execution. As expected from the number of matrix-vector products, expmvtay2 spent the shortest times, closely followed by expmvtay1. expv turned out to be exceedingly time-consuming, particularly for Sets 1 and 2. Its execution time was more moderate in the case of Set 3, where the response times of expmAvtay and expm_newAv were also remarkable due to the explicit computation of the matrix exponential and its subsequent product by the vector. Figures 1g-3g display these same values graphically by means of bar charts. Again, expv times have not been included so as not to distort the graphs. Finally, in Figures 1h-3h, and always following a descending sequence in the quotient T(expmvtay1)/T(expmvtay2), the ratios of the computation times spent by expmvtay1, expmAvtay, expmv, expm_newAv, and expleja versus expmvtay2 in each matrix computation are plotted. In the specific case of expmv, this ratio took values within the intervals [1.54, 7.96], [1.35, 7.23], and [0.42, 10.21], respectively, for Sets 1 to 3. As can easily be noticed in the figures, the results corresponding to expmAvtay and expm_newAv were the highest ones for any set.

Algorithm Migration to a GPU-Based Computational Platform
For the next experiment, we provide a GPU-CUDA implementation that can be executed from a MATLAB environment. The MATLAB routine receives an argument that indicates on which system we want to execute the algorithm. This way, we can easily select the system and compare execution times and accuracy. Figure 4 shows the execution time (left) and speedup (right) when executing the algorithm on a GPU environment. The system used for this experiment comprises an Intel(R) Core(TM) i9-7960X CPU @ 2.80 GHz with 16 cores, denoted as "CPU" in the figure.
The GPU device is an NVIDIA Quadro RTX 5000 (3072 cores) under CUDA Driver Version 11.2. The matrices used in the figure are randomly generated with absolute values between 0 and 1, resulting in an accuracy ‖x_c − x_g‖/‖x_c‖ ≈ 10^{-16}, where x_c and x_g denote the CPU and GPU results, respectively. The main result we can observe is the behaviour of the speedup. For problem sizes below n ≈ 2500, we do not profit from using the GPU. For larger problem sizes, the speedup grows with the size, achieving more than twice the performance of the CPU, with slightly better results for expmvtay1 than for expmvtay2.
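The platform switch described above can be hedged in Python with CuPy as a stand-in for the MATLAB + CUDA implementation: the same array code runs on the GPU when CuPy and a CUDA device are available, and transparently falls back to NumPy on a CPU-only machine. This is an illustrative sketch, not the paper's actual routine:

```python
import numpy as np
try:
    import cupy as xp          # GPU arrays, if CuPy + CUDA are present
    ON_GPU = True
except ImportError:
    xp = np                    # transparent CPU fallback
    ON_GPU = False

def expmv_taylor(A, v, m=30, s=8):
    """e^A v ~= (T_m(A/2^s))^{2^s} v using only matrix-vector products;
    the array module xp decides whether the products run on CPU or GPU."""
    B = xp.asarray(A) / 2.0 ** s
    w = xp.asarray(v, dtype=float)
    for _ in range(2 ** s):
        term = w.copy()
        acc = w.copy()
        for k in range(1, m + 1):
            term = (B @ term) / k
            acc = acc + term
        w = acc
    return w
```

With CuPy present, the result lives on the device and can be brought back to the host with `w.get()`; since the heavy work is matrix-vector products, the GPU pays off only for large n, consistent with the n ≈ 2500 crossover observed in Figure 4.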

Conclusions
In this paper, two algorithms devoted to the computation of the action of the matrix exponential on a vector have been described. Their numerical and computational performance has been evaluated in several experiments on a testbed composed of distinct state-of-the-art matrices. In general, these algorithms provided higher accuracy and a lower cost than existing state-of-the-art algorithms in the literature, with expmvtay1 achieving slightly higher accuracy on some occasions at a slightly higher cost than expmvtay2.
Both algorithms have been implemented so that they can run on, and take advantage of, either a GPU-based computational infrastructure or a traditional computer, with the choice of execution platform being configurable and fully transparent to the user from the MATLAB application itself.