An Improved Taylor Algorithm for Computing the Matrix Logarithm

: The most popular method for computing the matrix logarithm is a combination of the inverse scaling and squaring method in conjunction with a Padé approximation, sometimes accom-panied by the Schur decomposition. In this work, we present a Taylor series algorithm, based on the free-transformation approach of the inverse scaling and squaring technique, that uses recent matrix polynomial formulas for evaluating the Taylor approximation of the matrix logarithm more efﬁciently than the Paterson–Stockmeyer method. Two MATLAB implementations of this algorithm, related to relative forward or backward error analysis, were developed and compared with different state-of-the art MATLAB functions. Numerical tests showed that the new implementations are generally more accurate than the previously available codes, with an intermediate execution time among all the codes in comparison. distinct types of matrices was used. The numerical results showed that the new Taylor-based codes are usually more accurate than the other state-of-the-art ones, with an intermediate execution time among all of them. Author Contributions: Conceptualization, J.S., J.I. and E.D.; methodology, J.S., J.I., J.M.A. and E.D.; software, J.S., J.I., J.M.A. and P.R.; validation, J.M.A. and P.R.; formal analysis, J.S. and J.I.; investigation, J.S., E.D., J.M.A. and J.I.; writing—original J.S., J.I., E.D., P.R. and J.M.A.; writing— review and editing, J.S., J.I., E.D., P.R. and J.M.A. read agreed published version manuscript.


Introduction and Notation
The calculus of matrix functions has been, for a long period of time, an area of interest in applied mathematics due to its multiple applications in many branches of science and engineering; see [1] and the references therein. Among these matrix functions, the matrix exponential stands out due to both its applications and the difficulties of its effective calculation. Basically, given a square matrix A ∈ C n×n , its exponential function is defined by the matrix series: Directly related to the exponential function, we find the matrix logarithm. Specifically, given a nonsingular matrix A ∈ C n×n whose eigenvalues lie in C − (−∞, 0 ], we define the matrix logarithm of A as any matrix X ∈ C n×n satisfying the matrix equation: Of course, there are infinite solutions of Equation (1), but we only focus on the principal matrix logarithm or the standard branch of the logarithm, denoted by log (A), which is the unique logarithm of matrix A (see Theorem 1.31 of [1]) whose eigenvalues all lie in the strip {z ∈ C; −π < Im(z) < π}. Indeed, this principal matrix logarithm is the most used in applications in many fields of research from pure science to engineering [2], such as quantum chemistry and mechanics [3,4], buckling simulation [5], biomolecular dynamics [6], machine learning [7][8][9][10], graph theory [11,12], the study of Markov chains [13], sociology [14], • An algorithm based on the arithmetic-geometric mean iteration; see [36]; • The use of contour integrals; see [37]; • Methods based on different quadrature formulas, proposed in [38,39].
Finally, we should mention the built-in MATLAB function, called logm, that computes the principal matrix logarithm by means of the algorithms described in [32,33].
Throughout this paper, we refer to the identity matrix of order n as I n , or I. In addition, we denote by σ(A) the set of eigenvalues of a matrix A ∈ C n×n , whose spectral radius ρ(A) is defined as: With x , we denote the result reached after rounding x to the nearest integer greater than or equal to x, and x is the result reached after rounding x to the nearest integer less than or equal to x. The matrix norm || · || stands for any subordinate matrix norm; in particular, || · || 1 is the usual 1-norm. Recall that if A = (a ij ) is a matrix in C n×n , its 2-norm or Euclidean norm represented by A 2 satisfies [40]: All the implemented codes in this paper are intended for IEEE double-precision arithmetic, where the unit round off is u = 2 −53 ≈ 1.11 × 10 −16 . Their implementations for other distinct precisions are straightforward. This paper is organized as follows: Section 2 describes an inverse scaling and squaring Taylor algorithm based on efficient evaluation formulas [41] to approximate the matrix logarithm, including an error analysis. Section 3 includes the results corresponding to the experiments performed in order to compare the numerical and computational performance of different codes against a test battery composed of distinct types of matrices. Finally, Section 4 presents the conclusions.

Taylor Approximation-Based Algorithm
If ρ(A) < 1, then (see [1], p. 273): Taylor approximation is given by: where all the coefficients are positive and m denotes the order of the approximation.
For the lower orders m = 1, 2, and 4, the corresponding polynomial approximations from (3) use the Paterson-Stockmeyer method for the evaluation. For the Taylor approximation orders m ≥ 8, Sastre evaluation formulas from [41,42] are used for evaluating Taylor-based polynomial approximations from (3) more efficiently than the Paterson-Stockmeyer method. For reasons that will be shown below, the Taylor approximation of − log(I − A), denoted by T m , is used in all the Sastre approximations and evaluation formulas of the matrix logarithm.
Sastre approximations and evaluation formulas based on (4) are denoted as S T m (A), where the superscript stands for the type of polynomial approximation used, in this case the Taylor polynomial (4), and the subindex m represents the maximum degree of the polynomial for the corresponding polynomial approximation. Using (3) and (4), the implementation for the matrix logarithm computation is based on: For m = 8, we see that S T 8 (A) = T 8 (A). However, for higher orders of approximation, we show that S T m (A) has some more terms than T m (A). These terms are similar to, but different from the respective ones of the Taylor series. Following [43] (Section 4), we represent this order of approach by m+, and its corresponding approximation is denoted by S T m+ (A). The next subsections deal with the different evaluation formulas and approximations.

Evaluation of T 8 (A)
For m = 8, the following evaluation formulas from [41] (Example 3.1) are used: where the second subindex in y 02 (A) and y 12 (A) expresses the maximum power of the matrix A that appears in each expression, i.e., A 2 . The sum of both subindices in y 02 (A) and y 12 (A) gives the cost in terms of matrix products for evaluating each formula, denoted from now on by C. Thus, C = 2 for y 02 (A) and C = 3 for y 12 (A).
The system of nonlinear equations to be solved in order to determine the coefficients in (6) that arise when equating y 12 (A) = P 8 (A), where P 8 (A) is a general matrix polynomial: is given by (16)-(24) of [41], i.e., From (7), one obtains c 4 = ± √ b 8 . Since (4), where the corresponding coefficient is positive to prevent c 4 from being a complex number.
This system can be solved analytically using (25)-(32) from [41] (Example 3.1), giving four sets of exact solutions for the coefficients in (6). Then, the solutions are rounded to the desired precision arithmetic. As mentioned before, we use IEEE double-precision arithmetic. From now on, for any coefficient x i ,x i denotes the corresponding one rounded to that precision.
In order to check if the rounded coefficient solutions are sufficiently accurate to evaluate (6), we follow the stability check of [41] (Example 3.1). It consists of substituting the rounded coefficients into the original system of Equations (7)- (15) and testing if the relative error given by them with respect to each coefficient b i , for i = 0, 1, . . . , 8, is lower than the unit round off in IEEE double-precision arithmetic. For instance, from (12), it follows that the relative error for b 3 is: Then, we select a set of solutions where the relative errors for all b i , for i = 0, 1, . . . , 8, are lower than u. In our experience, solutions giving relative errors of order O(u) also provide accurate results. Finally, the approximation is computed as: where −S T 8 (−A) is evaluated using (6) with the coefficients from Table 1.

Evaluation of Higher-Order Taylor-Based Approximations
For higher orders of approximation, we followed a similar procedure as that described in [43] (Sections 3.2 and 3.3), where evaluation formulas for Taylor-based approximations of the matrix exponential of orders m = 15+ and 21+ were given. Both of these formulas were generalized to any polynomial approximation in [42] (Section 3), and this type of polynomial approximation evaluation formula was first presented in [41] (Example 5.1). Following the notation given in [43] (Section 4), the suffix "+" in m means that the corresponding Taylor-based approximation is more accurate than the Taylor one of order m. This is because the approximation with the order denoted by m+ has some more polynomial terms than the Taylor one of order m, and these terms are similar to the corresponding Taylor approach terms. We numerically define this similarity in each case.
Proposition 1 and the MATLAB code fragments from [42] (Section 3.1) allow obtaining all the solutions for the coefficients of the evaluation formulas (10)-(12) from [43] (Section 3.2), not only for the exponential Taylor approximation of order m = 15+, but for whatever polynomial approximation of the same order. Unfortunately, there were no real solutions for the evaluation formulas of the logarithm Taylor approximation of order m = 15+. We also tested different variations of the formulas providing order m = 15+; however, there were no real solutions for any of them. The same also occurred in [44] (Section 2.2) with the hyperbolic tangent Taylor approximation of order m = 15+. In that case, we could find real solutions using an approximation of order m = 14+. In this case, there were also real solutions for the following logarithm Taylor approximation evaluation formulas of order m = 14+: where it is easy to show that the degree of y 22 (A) is 16. Therefore, we denote this approximation by S T 14+ (A), and similarly to [44] (Section 2.2), one obtains: where T 14 (A) is given by (4) and b 15 and b 16 depend on the coefficients c i from (18). From (4), one obtains that p 15 = 1/15 and p 16 = 1/16. From all the solutions obtained for the coefficients, a stable one was selected that incurs in relative errors lower than one with respect to the corresponding Taylor coefficients p 15 and p 16 , concretely: showing two decimal digits. For orders from m = 21+ to m = 39+, we first evaluate and store the matrix powers A i = A i for i = 2, 3, . . . , s, and then, the evaluation formulas (94)-(96) proposed in [42] (p. 21) are used, reproduced here for the sake of clarity: The degree of y 2s (A) is 8s, and the total number of coefficients of y 2s (A) is 6s + 4, i.e., 3s coefficients a i , s coefficients g i , s coefficients h i , s + 1 coefficients l i , and coefficients f 0 , j 0 , and k 0 ; see [42] (p. 21). Analogous to (6), the cost C for evaluating y ks (A), k = 0, 1, 2, from (21), (23) and (24) in terms of the matrix products is C = k + s.
Again, the corresponding approximation from (24), denoted as S T (6s+3)+ (A) = y 2s (A), allows us to evaluate Taylor-based approximations of the matrix logarithm of the type: where the coefficients b i depend on the ones from (21)- (24).
In order to obtain all coefficients from these expressions, the nested procedure used in [43] (Sections 3.2 and 3.3) was applied. Analogous to [44] (p. 7), we employed the vpasolve function from the MATLAB Symbolic Computation Toolbox (https://es. mathworks.com/help/symbolic/vpasolve.html, accessed on 20 August 2021) with the random option to obtain different solutions of the nonlinear system arising for each order of approximation. Concretely, we obtained 1000 solutions for the coefficient sets of (21), (22), and (24) for orders m = 21+ and m = 27+ and 300 solutions for m > 27+. Then, the most stable real solution according to the stability check from [41] (Example 3.1) (see (16)) was selected for each order m+.
Similar to (20), we checked that the relative error for coefficients b i , i = 6s + 4, 6s + 5, . . . , 8s from (27) with respect to the corresponding Taylor coefficients for the selected solutions was less than one.
For the orders higher than m = 39+, we combined y 2s (A) from (24) with the Paterson-Stockmeyer method, similar to (52) from [41] (Proposition 1), giving: where p is a multiple of s and the resulting Taylor-based approximation order is m = (6s + 3 + p)+. If p = 0, note that z 2ps (A) = y 2s (A). Therefore, the corresponding approximation (26) is denoted by S T (6s+3+p)+ (A), and it allows us to evaluate Taylor-based approximations of the matrix logarithm of the type: Table 2 shows the orders of approximations available using the Paterson-Stockmeyer method or S T (6s+3+p)+ (A) and their differences for a given cost C in terms of matrix products. It also shows the corresponding values of s and p for (26). Table 2 shows that the difference in the polynomial orders increases as C grows. See the function available at http://personales.upv.es/~jorsasma/Software/logm_pol.m for implementation details (accessed on 20 August 2021). Table 2. Maximum available approximation order for a cost C in terms of matrix products using the Paterson-Stockmeyer method, denoted by d PS , the maximum order using S T (6s+3+p)+ (A) from (27), denoted by d z S T , and the difference d z S T − d PS . Parameters s and p are used in z S T (A) in each case.

Error Analysis
On the one hand, the relative forward error of approximating log(I + A) by means of S T m (A) can be defined as: where S T m (A) is any of the Sastre approximations that appear in Sections 2.1 and 2.2. This error can be expressed as: Using Theorem 1.1 from [45], one obtains: where: Let Θ m be: On the other hand, the absolute backward error can be defined as the matrix ∆A that verifies: whereby the backward error can be expressed as: It can be proven that this error can be expressed as ∑ k≥m+1 b k A k ; hence, the relative backward error of approximating log(I + A) by means of −S T m (−A) can be bounded as follows: The values of Θ m k andΘ m k , listed in Table 3, were obtained by using symbolic computations. They satisfy that Θ m k ≈Θ m k and Θ m k <Θ m k , for k ∈ {1, 2, · · · , 13}. Hence, the numerical performance of two different algorithms, each based on one of these error formulations, will be very similar. Taking into account (28) and (30)-(32), if α m ≤ Θ m , then: and: i.e., the relative forward and backward errors are lower than the unit round off in IEEE double-precision arithmetic. However, if only α m ≤Θ m , then: i.e., only the relative backward error is lower than the unit round off.

Algorithm for Computing log(A)
Taking into account that log(A) can be expressed as log(I + A − I), Algorithm 1 computes log(A) by the inverse scaling and squaring Taylor algorithm based on the expressions that appear in Sections 2.1 and 2.2, where m k , k ∈ {1, 2, · · · , 13}, is a Taylor approximation order. In this algorithm, we can use either the Θ k (forward error analysis) or theΘ k (backward error analysis) of Table 3.
The preprocessing step is based on applying a similarity transformation of matrix A to obtain a permutation of a diagonal matrix T whose elements are integer powers of two and a new balanced matrix B such that B = T −1 AT. The aim of balancing is to achieve the norms of each row of B and its corresponding column nearly equal, trying to concentrate any ill-conditioning of the eigenvalues of matrix A into the diagonal scaling matrix T. To carry out this balancing, the balance MATLAB function can be employed.
Steps 2 to 11 bring A close to I by taking successive square roots of A until α ≤ Θ m k , where m k is the maximum order and α is computed as described in Step 5. To compute the matrix square roots required, we implemented a MATLAB function that uses the scaled Denman-Beavers iteration described in [1] (Equation (6.28)). This iteration is defined by: , n being the dimension of matrix A. Although convergence is not guaranteed, if it does, then X k converges to A 1/2 . In Step 5, we use the following approximation for calculating α from (29) (see [46,47]): To compute approximately the 1-norm of ||(A − I) m k || and ||(A − I) m k +1 ||, the 1-norm estimation algorithm of [48] can be employed. In order to implement Algorithm 1 in the most efficient way, it is worth noting that ||( Algorithm 1 Given a matrix A ∈ C n×n , a maximum approximation order m k , k ∈ {1, 2, 3, · · · , 13}, and a maximum number of iterations maxiter, this algorithm computes L = log(A) by a Taylor approximation of order lower that or equal to m k .
if not f inish then Then, Step 12 finds the most appropriate order of the approximation polynomial to be used, always with the aim of reducing the computational cost, but without any loss of accuracy. For efficiency reasons, an algorithm based on the well-known bisection method of function root search was adopted. In Step 13, the matrix logarithm approximation of A − I is computed by means of the efficient polynomial evaluations presented in Sections 2.1 and 2.2. Finally, the squaring phase takes place in Step 14,and in Step 15, the postprocessing operation L = TLT −1 is required to provide the logarithm of matrix A if the preprocessing stage was previously applied to it.

Numerical Experiments
In this section, we compare the following five MATLAB codes in terms of accuracy and efficiency: • logm_iss_full: It calculates the matrix logarithm using the transformation-free form of the inverse scaling and squaring method with Padé approximation. Matrix square roots are computed by the product form of the Denman-Beavers iteration ( [1], Equation (6.29)). It corresponds to Algorithm 5.2 described in [32], denoted as the iss_new code; • logm_new: It starts with the transformation of the input matrix A to the Schur triangular form A = QTQ * . Thereafter, it computes the logarithm of the triangular matrix T using the inverse scaling and squaring method with Padé approximation. The square roots of the upper triangular matrix T are computed by the Björck and Hammarling algorithm ( [1,49], Equation (6.3)), solving one column at a time. It is an implementation of Algorithm 4.1 explained in [32], designated as the iss_schur_new function; • logm: It is a MATLAB built-in function in charge of working out the matrix logarithm from the algorithms included in [32,33]. MATLAB R2015b incorporated an improved version that applies inverse scaling and squaring and Padé approximants to the whole triangular Schur factor, whereas the previous logm applied this technique to the individual diagonal blocks in conjunction with the Parlett recurrence. Although it follows a similar main scheme to the logm_new code, the square roots of the upper triangular matrix T are computed using a recursive blocking technique from the Björck and Hammarling recurrence, which works outs the square roots of increasingly smaller triangular matrices [50]. The technique is rich on matrix multiplications and has the requirement of solving the Sylvester equation. In addition, it allows parallel architectures to be exploited simply by using threaded BLAS; • logm_polf: This is an implementation of Algorithm 1, where Taylor matrix polynomials are evaluated thanks to the Sastre formulas previously detailed. Values from the set m ∈ {2, 4, 8, 14+, 21+, 27+, 33+, 39+, 45+, 52+, 59+, 67+, 75+} were used as the approximation polynomial order for all the tests carried out in this section, and forward relative errors were considered; • logm_polb: It is an identical implementation to the previous code, but it is based on relative backward errors.
A testbed consisting of the following three sets of different matrices was employed in all the numerical experiments performed. In addition, MATLAB Symbolic Math Toolbox with 256 digits of precision was the tool chosen to obtain the "exact"matrix logarithm, using the vpa (variable-precision floating-point arithmetic) function: • Set 1: The 100 diagonalizable 128 × 128 complex matrices. They have the form A = V × D × V T , where D is a diagonal matrix with complex eigenvalues and V is an orthogonal matrix such as V = H/ √ n, H being a Hadamard matrix and n the matrix order. The 2-norm of these matrices varies from 0.1 to 300. The "exact"matrix logarithm was calculated as log (A) = V × log (D) × V T ; • Set 2: The 100 nondiagonalizable 128 × 128 complex matrices computed as A = V × J × V T , where J is a Jordan matrix with complex eigenvalues whose algebraic multiplicity is randomly generated between 1 and 3. V is an orthogonal random matrix with elements in progressively wider intervals from [−2.5, 2.5], for the first matrix, to [−250, 250], for the last one. As the 2-norm, these matrices range from 3.39 to 337.72. The "exact"matrix logarithm was computed as log (A) = V × log (J) × V T ; • Set 3: The 52 matrices A from the Matrix Computation Toolbox (MCT) [51] and the 20 from the Eigtool MATLAB Package (EMP) [52], all of them with a 128 × 128 order. The "exact"matrix logarithm was calculated by means of the following three-step procedure: 1.
Calling the MATLAB function eig and providing, as a result, a matrix V and a diagonal matrix D such that A = V × D × V −1 , using the vpa function. If any of the elements of matrix D, i.e., the eigenvalues of matrix A, have a real value less than or equal to zero, this will be replaced by the result of the sum of its absolute value and a random number in the interval [0, 1]. Thus, a new matrix D will be generated. Next, matrices A and L 1 will be calculated in the form Computing the matrix logarithm thanks to the logm and vpa functions, that is L 2 = logm( A);

3.
Considering the "exact"matrix logarithm of A only if: A total of forty-six matrices, forty from the MCT and six from the EMP, were considered for the numerical experiments. The others were excluded for the following reasons: -Matrices 17, 18, and 40 from the MCT and Matrices 7, 9, and 14 from the EMP did not pass the previous procedure for the "exact"matrix logarithm computation; - The relative error incurred by some of the codes in comparison was greater than or equal to unity for Matrices 4, 6, 12, 26, 35, and 38 belonging to the MCT and Matrices 1, 4, 10, 19, and 20 pertaining to the EMP. The reason was the ill-conditioning of these matrices for the matrix logarithm function; -Matrices 2, 9, and 16, which are part of the MCT, and Matrices 15 and 18, incorporated in the EMP, resulted in a failure to execute the logm_iss_full code. More specifically, the error was due to the fact that the function sqrtm_dbp, responsible for computing the principal square root of a matrix using the product form of the Denman-Beavers iteration, did not converge after a maximum allowed number of 25 iterations; - Matrices 8,11,13, and 16 from the EMP were already incorporated in the MCT.
If we had not taken the previously described precaution of modifying all those matrices with eigenvalues less than or equal to zero, they would have been directly discarded, and evidently, the number of matrices that would be part of this set would be much more reduced.
The accuracy of the distinct codes for each matrix A was tested by computing the normwise relative error as: where log(A) means the "exact"solution and log(A) denotes the approximate one. All the tests were carried out on a Microsoft Windows 10 ×64 PC system with an Intel Core i7-6700HQ CPU @2.60Ghz processor and 16 GB of RAM, using MATLAB R2020b. First of all, three experiments were performed, one for each of the three above sets of matrices, respectively, in order to evaluate the influence of the type of error employed by comparing the numerical and computational performance of logm_polf and logm_polb. Table 4 shows the percentage of cases in which the normwise relative error of logm_polf was lower, greater than, or equal to that of logm_polb. As expected, according to the Θ and Θ values exposed in Table 3, both codes gave almost identical results, with a practically inappreciable improvement for logm_polf in the case of Sets 1 and 3. The computational cost of logm_polf and logm_polb was basically identical, although slightly lower in the case of the latter code, since the logm_polb function required, on some cases, approximation polynomials of lower orders than those of logm_polf. Both of them performed an identical number of matrix square roots.
After analyzing these results, we can conclude that there are no significant differences between the two codes, and either of them could be used in the remainder of this section to establish a comparison with the other ones. Be that as it may, the code chosen to be used for this purpose was logm_polf, and from here on, we designate it as logm_pol, a decision justified by its numerical and computational equivalence to logm_polb.
Having decided which code based on the Taylor approximation to use, we now assess its numerical peculiarities and those of the other methods for our test battery. Table 5 sets out a comparison, in percentage terms, of the relative errors incurred by logm_pol with respect to logm_iss_full, logm_new, and logm. As we can see, logm_pol outperformed logm_iss_full in accuracy in 100% of the matrices in Sets 1 and 2 and 97.83% of them in Set 3. logm_pol achieved better accuracy than logm_new and logm in 97%, 89%, and 89.13% of the matrices for Sets 1, 2, and 3, respectively.
Graphically, Figures 1-3 provide the results obtained according to the experiments carried out for each of the sets of matrices. More in detail, the normwise relative errors (a), the performance profiles (b), the ratio of the relative errors (c), the lowest and highest relative error rates (d), the polynomial or diagonal Padé orders (e), the execution times (f), and the ratio of the execution times (g) for the four codes analyzed can be appreciated. Figures 1a-3a illustrate the normwise relative errors assumed by each of the codes when calculating the logarithm of the different matrices that comprise each of the sets. The solid line that appears in them depicts the function k log u, where k log (or cond) represents the condition number of the matrix logarithm function ([1], Chapter 3) and u is the unit round off. The numerical stability of each method is evidenced if its errors are not much higher than this solid line. The matrices were ordered by decreasing value of cond.
Roughly speaking, logm_pol obtained the lowest error values for most of the test matrices, whereas the top values corresponded to logm_iss_full, for Sets 1 and 2, or to logm_new and logm, for Set 3. Indeed, in the case of the first two sets, the vast majority of errors incurred by logm_pol are below the continuous line, while those of logm_new and logm are around this line and, in the case of logm_iss_full, clearly above it. The results provided by logm_new and logm were identical in the different experiments performed, with the exception of Set 3, where they were very similar. With regard to Set 3, the high condition number of the logarithm function for some test matrices meant that the maximum relative error committed by logm_pol punctually reached in one case the order of 10 −3 , closely followed by the error recorded by logm_iss_full. Nevertheless, for this third set of matrices and except for a few concrete exceptions, all methods provided results whose errors are usually located below the k log u line.     Accordingly, these results showed that the numerical methods, on which the different codes were based, are stable, especially in the case of Taylor and its implementation in the logm_pol function, whose relative errors, as mentioned before, are always positioned in lowest positions on the graphs, thus delivering excellent results. It should be clarified that Matrices 19,21,23,27,51, and 52 from the MCT and Matrix 17 from the EMP are not plotted in Figure 3a because of the excessive condition number of the logarithm function, although it was taken into account in all other results.  Figures 1b-3b trace the performance profile of the four codes where, on the x-axis, α takes values from 1-5 with increments of 0.1. Given a specific value for the α coordinate, the corresponding amount p on the y-axis denotes the probability that the relative error committed by a particular algorithm is less than or equal to α-times the smallest relative error incurred by all of them.
In consonance with the results given in Table 5, the performance profiles showed that logm_pol was the most accurate function for the three matrix sets. In the case of the matrices belonging to the first two sets, the probability reached practically 100% in most of the figures, while it tended to be around 95% for the matrices that comprised the third set. Regarding the other codes, logm_new and logm offered practically identical results in accuracy, with a few more noticeable differences in the initial part of Figure 3b in favor of logm_new. On the other hand, logm_iss_full was significantly surpassed by logm_new and logm in Sets 1 and 2, with the opposite being true for Set 3, although in a more discrete manner.
The ratios in the normwise relative errors between logm_pol and the other tested codes are presented, in decreasing order according to Er(logm_iss_full)/Er(logm_pol), in Figures 1c-3c. For the vast majority of the matrices in Sets 1 and 2, the values plotted are always greater than one, which again confirmed the superiority in terms of accuracy of logm_pol over the other codes, especially for logm_iss_full. Similar conclusions can be extracted from the analysis of the results for Set 3, although, in this case, the improvement of logm_pol was more remarkable over logm_new and logm for a small group of matrices.
In the form of a pie chart, the percentage of matrices in our test battery in which each method yielded the smallest or the largest normwise relative error over the other three ones is displayed in Figures 1d-3d. Thus, on the positive side, logm_pol resulted in the lowest error in 94%, 86%, and 89% of the cases for, respectively, each of the matrix sets. Additionally, logm_pol never gave the most inaccurate result in the first two matrix sets and only gave the worst result in 2% of the matrices in Set 3. As can be seen, logm_iss_full gave the highest relative errors in 94% and 88% of the matrices correspondingly belonging to Set 1 and Set 2. On the contrary, the highest error rates were shared between logm_new and logm in Set 3.
The value of parameter m, or, in other words, the Taylor approximation polynomial order, in the case of logm_pol, or the Padé approximant degree, in the case of logm_iss_full, logm_new, and logm, is shown in Figures 1e-3e, respectively, for the three sets. It should be clarified that the value and meaning of m is not comparable between logm_pol and the rest of the codes. Additionally, and for the three matrix sets, Table 6 collects those values of m, together with those ones of s, i.e., the number of matrix square roots computed, in the form of maxima, minima, means, and medians. Whereas the logm_pol function always required values of m much higher than the others, the values of s were more similar among the different codes, always reaching the smallest values on average in the case of logm_pol. Consequently, logm_pol needed to calculate a lower number of square roots compared to those required by the other codes. Moreover, it should be pointed out that logm_iss_full, logm_new, and logm were forced to use an atypically high value of s for a few matrices in Set 3. In all the experiments performed, logm_new and logm always employed an identical value of m and s.
On the other hand, Figures 1f-3f and Table 7 inform us about the execution times spent by the different codes analyzed on computing the logarithm of all the matrices that constituted our testbed. To obtain these overall times, the runs were launched six times, discarding the first one and calculating the average time among the others. As we can appreciate, logm was always the fastest function, followed by logm_iss_full, for Sets 1 and 2, or by logm_pol, for Set 3. In contrast, logm_new always invested the longest time for our three experiments. Numerically speaking, and according to the times recorded for Set 3, logm_pol was 1.74-and 2.04-times faster than logm_iss_full and logm_new, respectively. This notwithstanding, logm_pol and the other codes were clearly outperformed by logm, whose speed of execution in computing the logarithm of this third set of matrices was 2.71-times higher than that of logm_pol. In more detail, Figures 1g-3g provide the ratio between the computation time of the other codes and logm_pol, individually for each of the matrices, in a decreasing order consistent with the factor T(logm_iss_full)/T(logm_pol). As an example, for Set 3, this ratio took values from 0.28 to 15.11 for logm_iss_full, from 0.32 to 28.03 for logm_new, and from 0.05 a 3.93 for logm.
Lastly, it is worth mentioning that, after performing the pertinent execution time profiles, it could be appreciated that much of the time spent by all the codes was involved in the square root computation. Indeed, observe that logm resulted in better execution times than logm_new, despite being two codes that share a similar theoretical scheme. This was mainly due to two reasons: first, because the recursive approach that logm follows to compute the square root of a triangular matrix is much more efficient than that of logm_new; second, this is due to implementation issues since, among others, logm is composed of distinct built-in functions in charge of, for example, calculating the Schur form, solving the Sylvester equation, or solving systems of linear equations, resulting in a very cost-effective code that is not possible to beat in speed as long as all its improvements are not also implemented in its competitors. Table 7. Execution time (T), in seconds, corresponding to logm_pol, logm_iss_full, logm_new, and logm for computing matrices in the three sets.

Conclusions
In this paper, an algorithm to approximate the matrix logarithm was designed. It uses recent matrix polynomial evaluation formulas corresponding to the Taylor series expansion of the logarithm function, which offers a lower computational cost than the Paterson-Stockmeyer method, but similar accuracy. From this algorithm, two MATLAB codes, according to relative forward or backward error analyses, were implemented. They were initially compared with each other and subsequently with MATLAB Codes 4.1 and 5.2 from [32], as well as with the logm built-in MATLAB function. For that, a testbed composed of three distinct types of matrices was used. The numerical results showed that the new Taylor-based codes are usually more accurate than the other state-of-the-art ones, with an intermediate execution time among all of them. Funding: This research was funded by the European Regional Development Fund (ERDF) and the Spanish Ministerio de Economía y Competitividad Grant TIN2017-89314-P.