An Efficient Algorithm for Basic Elementary Matrix Functions with Specified Accuracy and Application

Abstract: If the matrix function $f(At)$ possesses the property $f(At) = g\big(f\big(\tfrac{t}{k}A\big)\big)$, then the recurrence formula $f_{i-1} = g(f_i)$, $i = N, N-1, \dots, 1$, with $f(tA) = f_0$, can be established. Here, $f_N = f(A_N) = \sum_{j=0}^{m} a_j A_N^j$ and $A_N = \tfrac{t}{k^N}A$. This provides an algorithm for computing the matrix function $f(At)$. By specifying the calculation accuracy $p$, a method is presented to determine $m$ and $N$ so as to minimize the running time of the above algorithm, thus providing a fast algorithm for $f(At)$. It is important to note that $m$ depends only on the calculation accuracy $p$ and is independent of the matrix $A$ and of $t$. Therefore, $f(A_N)$ has a fixed calculation format that is easily computed. On the other hand, $N$ depends not only on $A$ but also on $t$. This provides a means to select $t$ such that $N$ equals $0$, a property of practical significance. In summary, the algorithm proposed in this article enables users to specify a desired level of accuracy and then use it to select appropriate values of $m$ and $N$ that minimize computation time. This approach ensures that both accuracy and efficiency are addressed concurrently. We develop a general algorithm, then apply it to the exponential, trigonometric, and logarithmic matrix functions, and compare its performance with that of the internal system functions of Mathematica and the Padé approximation. In the last section, an example is provided to illustrate the rapid computation of numerical solutions of linear differential equations.


Introduction
The matrix exponential function plays a crucial role in various fields, including ordinary differential equations and control theory. In particular, it is instrumental in analyzing general structural dynamic systems, represented by an initial value problem of the form $M\ddot{x} + G\dot{x} + Kx = r(t)$, where $M$, $G$, and $K$ are $n \times n$ square matrices and $r(t)$ denotes the external force. Introducing the dual variables of a Hamiltonian system results in a first-order system $\dot{U} = HU + F(t)$, $U(0) = U_0$. The solution of (3) is

$$U(t) = e^{Ht}U_0 + \int_0^t e^{H(t-s)}F(s)\,ds.$$

When a problem is oscillatory, the formal solution often involves both the sine and cosine of a matrix. A famous example is the Schrödinger equation of quantum mechanics, where $H(t)$ is a Hermitian operator and $\psi$ is a complex wave function. If $H(t) = A$ is a real and constant matrix, the solution of (6) is $\psi(t) = e^{-iAt}\psi(0) = (\cos(At) - i\sin(At))\psi(0)$. There are other examples where the computation of the sine and cosine of a matrix is of interest. Wave equations given by the generic second-order system $\ddot{x} + A^2 x = F(t)$, $x(0) = x_0$, $\dot{x}(0) = x_1$, have the solution, by the variation-of-constants method,

$$x(t) = \cos(tA)x_0 + \sin_1(tA)x_1 + \int_0^t \sin_1((t-s)A)F(s)\,ds, \qquad (9)$$

where $A = \sqrt{K}$ and $\sin_1(tA) = \int_0^t \cos(sA)\,ds$. If $A$ is invertible, then (9) can be written as

$$x(t) = \cos(tA)x_0 + \sin(tA)A^{-1}x_1 + A^{-1}\int_0^t \sin((t-s)A)F(s)\,ds. \qquad (10)$$

For (5), providing $U(k\tau)$ $(k = 1, 2, \dots)$ as accurately as possible is a central concern for engineers and mathematicians, where $\tau$ is called the step size. Similarly, for (10), computing $x(k\tau)$ $(k = 1, 2, \dots)$ accurately involves introducing the vector $x' = y$ and an analogous recursion. To improve the accuracy of $U(k\tau)$ and $x(k\tau)$, it is essential to develop high-accuracy algorithms for computing $T = e^{H\tau}$, $\cos(\tau A)$, and $\sin(\tau A)$, as well as the integrals containing these matrix functions. This paper improves the algorithms developed for the computation of $e^A$ and $\cos A$ in the existing literature, see [1-15], and also discusses the computation of other trigonometric, hyperbolic, logarithmic, and related inverse trigonometric and inverse hyperbolic matrix functions.

The General Theorem of the Fast Algorithm for Matrix Functions
In [16], we addressed the challenge of rapidly calculating elementary functions with specified precision. The implementation method involves an analytic function $f(x)$ satisfying a property of the form $f(kx) = g(f(x))$. If $C_0$ is an approximation as close as possible to $f(x/k^N)$, with $N$ a positive integer — for example, a truncated Taylor expansion, or the Padé approximation of $f(x/k^N)$ — then the calculations $C_i = g(C_{i-1})$, $i = 1, \dots, N$, are performed. Completing (14) and (15), $f(x) \approx C_N$ is obtained. This method is also applicable to the calculation of matrix functions, where the variable $x$ is replaced by the $n \times n$ square matrix $A$. Now, we take $C_0 = \sum_{j=0}^{m} a_j (\Delta A)^j$ with $\Delta = 1/k^N$, or $C_0$ as the Padé approximation of $f(\Delta A)$. Our objective is to determine $m$ and $N$ to minimize the computation time under the condition that the truncation error and computational precision are $10^{-p}$. For the matrix function calculation, the cost of the coefficient calculation in (14) can be ignored; the main cost lies in the matrix multiplications in (16) and (15). If the number of matrix multiplications needed to evaluate $g(A)$ is $K$, then the total cost of completing (16) and (15) is roughly $(m + KN)M_n$, where $M_n$ is the cost of one multiplication of $n \times n$ matrices. Based on the above discussion, we present a fast algorithm for the matrix function as follows.
Theorem 1. Let $m_p$ be an integer close to the minimum point of $h(m)$, and let $N$ be determined from $p \ln 10$ accordingly. The above algorithm is a fast algorithm for the general matrix function.
Proof. With a truncation error of $10^{-p}$, for (16) to be satisfied, a condition involving $\|A\|$ must hold, where $\|A\|$ is a norm of $A$. From (17), we obtain the stated choice of $N$. Note that the minimum point of $g(m)$ has nothing to do with $K\ln\|A\|/\ln k$. By making $q(m, N)$ as small as possible, the algorithm obtained by (1) and (2) is a fast algorithm, thus completing the proof of the theorem.

Remark 1. In specific calculations, $\|A\|$ can be replaced by the approximate eigenvalue of maximum modulus.
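As an illustration, the general scheme of Theorem 1 can be sketched in a few lines of NumPy (the paper's experiments use Mathematica; this Python sketch is ours, with the illustrative name `matrix_fun` and hand-picked values of $m$ and $N$ rather than the optimized $m_p$, $N_p$):

```python
import math
import numpy as np

def matrix_fun(A, t, coeffs, g, k, N):
    """Approximate f(tA) via the recurrence f_{i-1} = g(f_i):
    evaluate the degree-m Taylor polynomial of f at A_N = (t/k**N)*A,
    then apply the functional-equation map g exactly N times."""
    n = A.shape[0]
    A_N = (t / k**N) * A
    # Horner evaluation of sum_{j=0}^{m} a_j * A_N**j
    F = coeffs[-1] * np.eye(n)
    for a in reversed(coeffs[:-1]):
        F = F @ A_N + a * np.eye(n)
    for _ in range(N):          # undo the scaling, using f(k*x) = g(f(x))
        F = g(F)
    return F

# Instance for f = exp: g(u) = u^2, k = 2, a_j = 1/j!
A = np.array([[0.1, 0.2], [0.3, 0.4]])
coeffs = [1 / math.factorial(j) for j in range(12)]
E = matrix_fun(A, 1.0, coeffs, lambda U: U @ U, k=2, N=5)
```

For the exponential instance, one squaring costs one multiplication ($K = 1$), matching the rough cost count $(m + KN)M_n$ above.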
3. Fast Algorithm for $e^A$, $\cos A$, $\sin A$, $\sinh A$, and $\cosh A$

Let $\lambda = \|A\|$ be a norm of $A$. If we let $A = \lambda A_0$, then $\|A_0\| \le 1$. From (19), we can obtain the corresponding bounds. To keep the article concise, the algorithms for the several matrix functions are compiled into a table.
Theorem 2. For e A , cos A, sin A, cosh A, and sinh A, we devised the following fast algorithm using Theorem 1.

Remark 2.
It is worth noting that m p in Table 1 only depends on the calculation accuracy p and the matrix function, and is independent of matrix A.
Now, we utilize the general fast algorithm to demonstrate the correctness of the algorithms in Table 1. The algorithms in Table 1 are straightforward; they involve placing the specific functions $f$ and $g$ into the general algorithm, and the tricky part lies in deriving the formulas for $m_p$ and $N_p$. We calculate the formulas for $m_p$ and $N_p$ for the function $e^A$ for $12 \le p \le 36$ here. In fact, graphing the function $h(m)$ supports our results for all values of $p$. The derivation of $m_p$ and $N_p$ for the other functions is very similar.
(1) For the function $e^A$, we have $g(u) = u^2$, $k = 2$, and $K = 1$. Using Stirling's approximation formula, we obtain the corresponding estimate.
Now, substituting this into (25) and taking the derivative with respect to $x$, we set it to $0$ and multiply both sides by $(m+1)^2$ to determine the extreme values. Using $p = 12$, we obtain a lower bound for $x$: $x > 6.37$.
Thus, we have a lower bound for $m$: $m \ge 6$. We can also solve the quadratic equation to obtain the corresponding bound. (2) For $\sin A$ ($\sinh A$): for $p = 12, 16, 24$, and $32$, the corresponding integer points at which $h(m)$ attains its minimum are $3, 3, 5$, and $6$. Again, it can be estimated that $m_p = \sqrt{\dfrac{3p\ln 10}{5\ln 3}}$, with the corresponding $N_p$ proportional to $p\ln 10$. (3) For $\cos A$ ($\cosh A$), there are two cases. Case (I): the contributions of the terms with coefficients $1/(2j)!$ are very small, so we may replace $N$ with $N - 1$ and $m_p$ with $m_p + 1$.
Case (II): Similarly, for $p = 12, 16, 24$, and $32$, the corresponding integer points at which $h(m)$ attains its minimum are $4, 5, 6$, and $7$. Consequently, we can determine $m_p = \sqrt{\dfrac{3p\ln 10}{5\ln 2}}$, with $N_p$ again proportional to $p\ln 10$. The effectiveness of the algorithms in Table 1 is demonstrated by numerical computations in Mathematica. $N_p$ and $m_p$ are replaced with $N_{p,\mu} = \mu N_p$ and $m_{p,\mu}$ in Table 1, defined correspondingly for $\sin A$ and for $\cos A$, where $N_{p,1} = N_p$ and $m_{p,1} = m_p$.
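Concretely, the double-angle route for $\cos A$ and the triple-angle route for $\sin A$ can be sketched as follows (a NumPy sketch of ours, not the paper's Mathematica code; the default values of `m` and `N` are illustrative, not the optimized $m_p$, $N_p$):

```python
import math
import numpy as np

def cos_matrix(A, m=4, N=5):
    """cos A: even Taylor polynomial of degree 2m in A/2**N, then the
    double-angle map cos(2X) = 2 cos(X)**2 - I applied N times."""
    n = A.shape[0]
    X = A / 2**N
    X2 = X @ X
    C, term = np.zeros((n, n)), np.eye(n)
    for j in range(m + 1):
        C += ((-1)**j / math.factorial(2 * j)) * term
        term = term @ X2
    I = np.eye(n)
    for _ in range(N):
        C = 2 * (C @ C) - I
    return C

def sin_matrix(A, m=4, N=4):
    """sin A: odd Taylor polynomial in A/3**N, then the triple-angle map
    sin(3X) = 3 sin X - 4 sin(X)**3 applied N times."""
    n = A.shape[0]
    X = A / 3**N
    X2 = X @ X
    S, term = np.zeros((n, n)), X.copy()
    for j in range(m + 1):
        S += ((-1)**j / math.factorial(2 * j + 1)) * term
        term = term @ X2
    for _ in range(N):
        S = 3 * S - 4 * (S @ S @ S)
    return S
```

Each double-angle step costs one matrix multiplication, while each triple-angle step costs two, which is one reason the double-angle recurrence wins for $\cos A$.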
For comparison, let $A$ take the form in (32), where $U$ is a randomly generated $n \times n$ matrix. In all the examples in this paper, we agree on the relative error as the measure of accuracy. Let $A_i$ $(i = 1, 2, \dots, M)$ be $M$ matrices produced by (32). From Figure 1, it is evident that the calculation accuracy is essentially the same for $\mu = 1, 2/3, 4/3$, but $\mu = 1$ provides the highest efficiency. The double-angle formula is notably more efficient and accurate than the triple-angle formula for $\cos A$; henceforth, we exclude the case that utilizes the triple-angle formula for $\cos A$. Now let us compare the algorithms presented in Table 1 with the internal functions and commonly used algorithms found in mathematical software. For ease of reference, let $\mathrm{err}(g)$ denote the calculation error of the matrix function $f$, where a blank $g$ implies the utilization of the algorithms in Tables 1-3 of this article. If $g = \mathrm{Sym}, \mathrm{Pade}, \dots$, it signifies calculation using the internal function, the Padé approximation, and so on, respectively. When $p = 16$, $m_p = 7, 4, 3$ for $e^A$, $\cos A$ ($\cosh A$), and $\sin A$ ($\sinh A$), respectively. We denote the Padé approximation of $\sin x$ by $r_{66}(x)$ and the Padé approximations of $e^x$ and $\cos x$ by $r_{88}(x)$. In Table 1, the Padé approximations of $e^A$, $\cos A$ ($\cosh A$), and $\sin A$ ($\sinh A$) are formed accordingly. For the algorithms in Table 1, $\mathrm{err}(g)(A)$ was computed for random matrices of order 1 to 800. It is evident from Figure 2 that, for $e^A$, $\cos A$, and $\sin A$, the accuracy obtained by the three methods is essentially the same. However, when the matrix order is relatively high, the Taylor expansion method presented in Table 1 outperforms the others. In particular, for $\cos A$ and $\sin A$, the calculation time of the internal functions is approximately eight times longer than that of the algorithms in Table 1. Numerical calculations demonstrate that while the calculation of $\cos A$ and $\sin A$ via $\mathrm{Re}(e^{iA}) = \cos A$ and $\mathrm{Im}(e^{iA}) = \sin A$ is faster than that of the internal functions $\cos A$ and $\sin A$, its calculation accuracy is relatively poor.
For $\cosh A$ and $\sinh A$, the internal functions can be computed utilizing identity (35), i.e., $\cosh A = \frac{1}{2}(e^A + e^{-A})$ and $\sinh A = \frac{1}{2}(e^A - e^{-A})$. Our experiment provides the following results for $\mathrm{err}(g)(A)$ of random matrices with orders from 1 to 800.
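For reference, the identity-based route being benchmarked here looks like this (our SciPy sketch, assuming (35) is the standard pair above; it requires two matrix exponentials, hence roughly twice the work of a direct algorithm):

```python
import numpy as np
from scipy.linalg import expm  # reference matrix exponential

def cosh_sinh_via_exp(A):
    """Identity route: two expm calls, then a half-sum and half-difference."""
    E, Einv = expm(A), expm(-A)
    return (E + Einv) / 2, (E - Einv) / 2
```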
It is evident from Figure 3 that using (35) not only doubles the calculation time but also makes the calculation error substantial. Furthermore, by observing Figures 2 and 3, it becomes apparent that the algorithms in Table 1 outperform the Padé approximation algorithm.

Fast Algorithm for tan A, tanh A, sec A, and sech A
Now let us consider the fast computation of $\tan A$, $\tanh A$, $\sec A$, and $\operatorname{sech} A$. Although these matrix functions can be computed via identities such as $\tan A = \sin A(\cos A)^{-1}$, we aim to compute them directly, as demonstrated in the previous section.
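The indirect route just mentioned can be written in two lines (our NumPy/SciPy sketch; a linear solve is used instead of forming $(\cos A)^{-1}$ explicitly):

```python
import numpy as np
from scipy.linalg import cosm, sinm

def tan_via_identity(A):
    """tan A = (cos A)^{-1} sin A; sin A and cos A are functions of the
    same A, so they commute and the factor order does not matter."""
    return np.linalg.solve(cosm(A), sinm(A))
```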
While the Padé approximation contributes significantly to matrix function computation, the numerical results in the previous section indicate that it is dominant in neither calculation accuracy nor calculation speed compared with the algorithm presented in this paper. Therefore, this section does not adopt it.

Theorem 3. For $\tan A$, $\sec A$, $\tanh A$, and $\operatorname{sech} A$, we have the following fast algorithm based on Theorem 1.
In Table 2, for $\sigma(j)$ and $E_j$, we have the recursive formula given in [1]. Using the algorithms in Table 2, our experiments generated the following data for $\mathrm{err}(g)(A)$ of random matrices from order 1 to 200.
Here, Sym indicates that the matrix functions $\tan A$, $\tanh A$, $\sec A$, and $\operatorname{sech} A$, expressed as $\sin A(\cos A)^{-1}$, $(e^A - e^{-A})(e^A + e^{-A})^{-1}$, $(\cos A)^{-1}$, and $2(e^A + e^{-A})^{-1}$, respectively, are computed using the internal functions. As observed in Figure 4, the computation speed of the algorithms in Table 2 surpasses that of the internal functions, particularly for $\tan A$ and $\sec A$, where the speed is increased by 8-11 times compared to the internal function, while the accuracy remains consistent. When the order $n$ of the matrix is relatively large, for $\tanh A$ and $\operatorname{sech} A$, the calculation speed is twice that of the internal function, yet the precision remains the same. Let us now consider implementing our algorithms for calculating the functions $\cot A$, $\coth A$, $\csc A$, and $\operatorname{csch} A$. From (38), we see that when $\|A\|$ is small, the norm of $(\sin A)^{-1}$ is large, so calculation using (38) produces large errors. Therefore, we must utilize the method in (39).

Theorem 4. Using Theorem 1, we have the following algorithm for $\cot A$, $\coth A$, $\csc A$, and $\operatorname{csch} A$.
Implementation of the algorithms in Table 3 provides the following results for the matrix functions in (39).
Here, Sym signifies that the matrix functions $\cot A$, $\coth A$, $\csc A$, and $\operatorname{csch} A$ are calculated using the internal functions with the regular formulas $\cos A(\sin A)^{-1}$, $(e^A + e^{-A})(e^A - e^{-A})^{-1}$, $(\sin A)^{-1}$, and $2(e^A - e^{-A})^{-1}$, respectively. The results in Figure 5 are essentially consistent with the corresponding results in Figure 2.

Fast Algorithm for Logarithmic Matrix Function ln A and Some Related Functions
Various algorithms [17-22] have been considered for the logarithmic matrix function $\ln A$. One of the most effective methods we noticed is the following. The standard way of dealing with this problem is to use the square root operator repeatedly to bring $A$ near the identity: as $k$ increases, $A^{1/2^k} \to I$, so for sufficiently large $k$, we can apply a direct method to $A^{1/2^k}$. This procedure for the logarithm was introduced by Kenney and Laub [19]. The specific method is to choose $k$ such that $A^{1/2^k}$ is sufficiently close to $I$, and usually $k$ is selected starting from 8. We slightly optimize the method, as in (43), where $k$ is usually selected starting from 4. For $\ln x$, a series that converges faster is

$$\ln x = -2\sum_{j=0}^{\infty} \frac{y^{2j+1}}{2j+1}, \qquad y = \frac{1-x}{1+x}.$$

For $\ln A_k$, we have the corresponding matrix series. Applying the Euler-Abel transformation [21], we obtain an accelerated series with $F = 2B(I - B^2)^{-2}$. We can also use the following Padé approximation of $\ln A_k$ [20],
where $B = A_k - I$. We also provide some numerical results for the matrix function $\ln A$.
Based on (41)-(44), we provide numerical results for $\ln A$ using (46)-(48). It can be seen from Figure 6 that, through the optimization (43), the calculation accuracy of (46)-(48) is significantly improved, and the calculation speed is also improved. We used the following identities to implement our general algorithm for the computation of the inverse trigonometric and inverse hyperbolic functions; to control the length of the paper, we do not display those results here.
For $\arcsin A$, $\arctan A$, $\operatorname{arcsinh} A$, and $\operatorname{arctanh} A$, under certain conditions, we can use identities such as

$$\arcsin A = -i\ln\big(iA + (I - A^2)^{1/2}\big). \qquad (51)$$
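For instance, identity (51) can be applied directly with reference implementations of the matrix logarithm and square root (our SciPy sketch; it is valid when the spectrum of $A$ avoids the branch cuts $(-\infty, -1]$ and $[1, \infty)$):

```python
import numpy as np
from scipy.linalg import logm, sqrtm

def arcsin_matrix(A):
    """arcsin A = -i * ln(iA + (I - A^2)^(1/2)), as in identity (51)."""
    I = np.eye(A.shape[0])
    return -1j * logm(1j * A + sqrtm(I - A @ A))
```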

Improved Calculation Formula for Initial Value Problems (3) and (8)
Now, let us consider the calculation of the integral in (11). If numerical integration is used directly, it is quite time-consuming, and the calculation accuracy cannot be guaranteed. We adopt the following method instead. For $nn = 1, 2, \dots$, the relationship (53) holds, where $T_{nn}$ can be calculated iteratively (with the initialization $T_{nn} = I$, $zz = I$). Thus, (53) can be rewritten so that the integral of the product of the matrix exponential function and a vector function is converted into the integral of a pure vector function.
The improved calculation formula for (3) is given by (55) and (56), where $T = e^{H\tau}$ is calculated by the algorithm in Table 1, and $T_{nn,j}$ and $H_j$ are calculated by (55) and (56). For the integral (58), the following two algorithms can be used. (1) If $F(t)$ is an algebraic sum of functions of the following types:

$$t^{\alpha}, \quad t^l e^{-\omega t}, \quad t^l \sin\omega t, \quad t^l \cos\omega t, \qquad (59)$$

then (58) can be expressed in analytical form, so $H_j$ can also be expressed in analytical form.
In this way, the calculation of (57) can be completed.
(2) Gauss three-point integral formula. Note that, under 16-digit calculation accuracy, the algebraic accuracy of the Gauss three-point integral formula is 5. Using the Gauss three-point integral formula, with nodes $0$ and $\pm\sqrt{3/5}$ on $[-1, 1]$, to approximate (58) with high accuracy, (57) can be replaced by the following algorithm.
In particular, we choose the step so that $N_p = 0$ and $m = m_p = \sqrt{\dfrac{9p\ln 10}{8\ln 2}}$; at this point, the theoretical calculation accuracy of (62) is $p$.
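The underlying quadrature rule is standard and easy to state (our generic sketch, not the paper's full algorithm; on $[-1, 1]$ the nodes are $0, \pm\sqrt{3/5}$ with weights $8/9, 5/9, 5/9$):

```python
import numpy as np

def gauss3(f, a, b):
    """Gauss-Legendre 3-point rule on [a, b]: exact for polynomials of
    degree <= 5, i.e., algebraic accuracy 5 as stated in the text."""
    nodes = np.array([-np.sqrt(3 / 5), 0.0, np.sqrt(3 / 5)])
    weights = np.array([5 / 9, 8 / 9, 5 / 9])
    mid, half = (a + b) / 2, (b - a) / 2
    return half * sum(w * f(mid + half * x) for w, x in zip(weights, nodes))
```

Applied componentwise to the vector integrand in (58), three evaluations of $F$ per subinterval suffice.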
Remark 3. Note that $W_{i,H}$ $(i = 1, 2, 3)$ in (63) and (65) only needs to be calculated once and is independent of $F(t)$, which is not only efficient but also easy to program. Now let us consider the numerical solution of the initial value problem (8). Converting it to the initial value problem (3), and using the relations culminating in (68), we obtain the following algorithm.
Algorithm 3. Numerical solution algorithm for the initial value problem (8): for $nn = 1, 2, \dots$, with $A = \sqrt{K}$. In particular, if $nn$ is determined using (64), the theoretical accuracy of this algorithm is $p$, where $TW^{\sin}_{j,i} = \frac{(1+b_i)\tau}{2nn}\cos(\cdots)$. If $A$ is invertible, then (75) holds; therefore, (75) can be written in the form of complex functions. We first provide an example to illustrate the effectiveness of Algorithm 1. Although we skip examples for Algorithm 2, we then include an example of the effectiveness of Algorithm 3.

Example 1. The analytic solution of the initial value problem (8) is as follows. It can be seen from Figure 7 that the calculation accuracy of Algorithm 1 is higher than that of Algorithm 3, but the calculation speed of Algorithm 3 is three times that of Algorithm 1. However, the error of the numerical results of the Mathematica internal functions is far greater than the errors of Algorithms 1 and 3. In particular, the internal-function computation collapses at $t = 667.787$; after this point, its error increases exponentially with time $t$. Algorithms 1 and 3, by contrast, are quite stable: for example, the calculation error on $[0, 8000]$ is still very small.
In Algorithms 1 and 3, let $nn = \mu\, nn_A$, $\mu = 0.5, 0.25, 0.125$, where $nn_A$ is calculated according to (64). The numerical results are as follows. Comparing Figures 7 and 8, when $\mu = 0.5$, the calculation accuracy is not reduced by much, while the calculation speed is almost doubled. For $\mu = 0.125$, the accuracy of Algorithms 1 and 3 is still high, and the calculation speed is 10 times higher than that in Figure 8.

Conclusions
In this paper, we established algorithms for the computation of matrix exponential functions, matrix trigonometric functions, and matrix hyperbolic functions. Our numerical results show that these algorithms are faster, with higher accuracy, than existing algorithms and the system algorithms in Mathematica. In the series expansion of the matrix function, the number of expanded terms $m_p$ is related only to the required calculation accuracy and has nothing to do with the matrix itself. However, the number of iterations $N_p$ is related not only to the matrix itself but also to $t$. When $|3^{n_p}\lambda_A t| \le 1$ (respectively, $|2^{n_p}\lambda_A t| \le 1$), $N_p = 0$, and the calculation of the matrix function becomes an algebraic operation, so the integrals in (9)-(12) become easy to compute. In brief, our algorithm allows users to set a required level of accuracy and then use it to choose appropriate values of the parameters $m$ and $N$ to minimize computation time. In this manner, both accuracy and efficiency are addressed simultaneously.

We claim that $m_p$ is very close to $\sqrt{\dfrac{9p\ln 10}{8\ln 2}}$. Our following numerical calculation supports this claim completely. Case (a): $p = 12$. Using the formula obtained by the quadratic formula and the upper bound $m = 11$, we obtain a new lower bound: $m \ge 6.63$. Using the formula obtained by the quadratic formula and the lower bound $m = 7$, we can also obtain the upper bound: $m \le 6.63$. Using our claim $\sqrt{9p\ln 10/(8\ln 2)}$, we obtain $m = 6.70$, which clearly matches the lower and upper bounds quite well. Case (b): $p = 18$. Using the quadratic formula with the upper bound $m = 11$, we obtain a new lower bound: $m \ge 8.05$; with the lower bound $m = 7$, we obtain the upper bound $m \le 8.07$. Our claim gives $m = 8.20$; these values match very well after taking the closest integer. Case (c): $p = 24$. Similarly, we obtain the lower bound $m \ge 9.27$ and the upper bound $m \le 9.28$, while our claim gives $m = 9.47$. Case (d): $p = 36$. Similarly, we obtain the lower bound $m \ge 11.29$ and the upper bound $m \le 11.37$, while our claim gives $m = 11.60$. Using $m_p = \sqrt{9p\ln 10/(8\ln 2)}$, we can obtain the corresponding $N_p$ proportional to $p\ln 10$, and (29) holds. Compared to $I$, the values in the second part of $C = I + \sum \cdots$ are small.

Table 3. The fast algorithm for cot A, coth A, csc A, and csch A.

Figure 2. Error of e^A, cos A, and sin A for three algorithms.

Figure 3. Error of sinh A and cosh A for three algorithms.

Figure 4. Error of tan A, sec A, tanh A, and sech A for two algorithms.

Figure 5. Error of cot A, coth A, csc A, and csch A for two algorithms.

Figure 6. Error of ln A for several algorithms.

Figure 9. Numerical solutions and errors of two algorithms.

Table 1. The fast algorithm for e^A, cos A, sin A, cosh A, and sinh A.

$\cdots/\ln 2$, $N_{\cos} = \cdots$. We can again use the values of $m$ and $p = 12$ to update the lower bound of $m$: our new lower bound for $m$ is 7. With this small range of values for $m$, by inspecting the formula for the values of $m$, we can see that the value of $m$ is very close to $\sqrt{9p\ln 10/(8\ln 2)}$.

Table 2. The fast algorithms for tan A, sec A, tanh A, and sech A.