Improving the Accuracy of the Fast Inverse Square Root by Modifying Newton–Raphson Corrections

Direct computation of functions by low-complexity algorithms is useful both under hardware constraints and in systems where storage capacity is a bottleneck for processing a large volume of data. We present improved algorithms for the fast calculation of the inverse square root function for single-precision and double-precision floating-point numbers. Higher precision is also discussed. Our approach consists in minimizing the maximum relative errors by finding optimal magic constants and modifying the Newton–Raphson coefficients. The obtained algorithms are much more accurate than the original fast inverse square root algorithm and have similar, very low computational costs.


Introduction
Efficient performance of algebraic operations in the framework of floating-point arithmetic is a subject of considerable importance [1][2][3][4][5][6]. Approximations of elementary functions are crucial in scientific computing, computer graphics, signal processing, and other fields of engineering and science [7][8][9][10]. Our aim is to compute elementary functions at a very low computational cost without using memory resources. Direct evaluation of functions is of interest in any system where storage limitations hinder the processing of large volumes of data. This problem is crucial, for instance, in high-energy physics experiments [11][12][13].
In this paper, we consider approximation and fast computation of the inverse square root function, which has numerous applications (see [8,10,[14][15][16][17]), especially in 3D computer graphics, where it is needed for the normalization of vectors [4,18,19]. The proposed algorithms are aimed primarily at floating-point platforms with limited hardware resources, such as microcontrollers, some field-programmable gate arrays (FPGAs), and graphics processing units (GPUs) that cannot use fast look-up table (LUT)-based hardware instructions, such as SSE (i.e., Streaming SIMD (single instruction, multiple data) Extensions) or Advanced Vector Extensions (AVX). We mean here devices and chips containing floating-point multipliers, adder-subtractors, and fused multiply-adders. Therefore, our algorithms can easily be implemented on such platforms. We also offer them as an alternative to library functions that provide full precision but are very time-consuming. This was the motivation for considering the cases of higher precision in Section 3.2. By selecting the precision and the number of iterations, the desired accuracy can be obtained. We propose the use of our codes as direct insertions into more general algorithms without referring to the corresponding library of mathematical functions. In the double-precision mode, most modern processors do not have SSE instructions like rsqrt (such instructions appeared only in AVX-512, which is supported only by the latest processor models). In such cases, one can use our algorithms (with the appropriate number of iterations) as a fast alternative to the library function 1/sqrt(x).
In most cases, the initial seed needed to start the approximation is taken from a memory-consuming look-up table (LUT), although the so-called "bipartite table methods" (actually used on many current processors) make it possible to considerably lower the table sizes [20,21]. The "fast inverse square root" code works in a different way. It produces the initial seed in a cheap and effective way using the so-called magic constant [4,19,[22][23][24][25]. We point out that this algorithm is still useful in numerous software applications and hardware implementations (see, e.g., [17,[26][27][28][29][30]). Recently, we presented a new approach to the fast inverse square root code InvSqrt, providing a rigorous derivation of the well-known code [31]. Then, this approach was used to construct a more accurate modification (called InvSqrt1) of the fast inverse square root (see [32]). It will be developed and generalized in the next sections, where we will show how to increase the accuracy of the InvSqrt code without losing its advantages, including the low computational cost. We will construct and test two new algorithms, InvSqrt2 and InvSqrt3.
The main idea of the algorithm InvSqrt consists in interpreting the bits of the input floating-point number as an integer [31]. In this paper, we consider positive floating-point normal numbers and, in Section 3.1, we also consider subnormal numbers. We use the IEEE-754 standard, where single-precision floating-point numbers are encoded with 32 bits. For positive numbers, the first bit is zero, the next eight bits encode the exponent e_x, and the remaining 23 bits represent the mantissa m_x ∈ [0, 1), so that

x = (1 + m_x) 2^(e_x). (1)

The same 32 bits can be treated as an integer I_x:

I_x = N_m (B + e_x + m_x), (2)

where N_m = 2^23 and B = 127. In this case, B + e_x is a natural number not exceeding 254. The case of higher precision is analogous (see Section 3.2). The crucial step of the algorithm InvSqrt consists in shifting all bits to the right by one bit and subtracting the result of this operation from a "magic constant" R (the optimum value of R has to be guessed or determined). In other words,

I_y0 = R − ⌊I_x / 2⌋. (3)

Originally, R was proposed as 0x5F3759DF (see [19,23]). Interpreted in terms of floating-point numbers, I_y0 approximates the inverse square root function surprisingly well (y_0 ≈ y = 1/√x). This trick works because (3) is close to dividing the floating-point exponent by −2. The number R is needed because the floating-point exponents are biased (see (2)).
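The bit-level trick just described can be written out as a short C sketch. This is not the paper's full code: it produces only the zeroth approximation y_0, uses the original constant 0x5F3759DF, and replaces the traditional pointer cast with memcpy to stay within strict-aliasing rules:

```c
#include <stdint.h>
#include <string.h>

/* Zeroth approximation y0 of 1/sqrt(x) via the magic constant
 * R = 0x5F3759DF quoted above; no Newton-Raphson correction yet. */
float rsqrt_seed(float x) {
    uint32_t i;
    memcpy(&i, &x, sizeof i);   /* read the bits of x as the integer I_x */
    i = 0x5F3759DFu - (i >> 1); /* shift right by one bit, subtract from R */
    float y0;
    memcpy(&y0, &i, sizeof y0); /* reinterpret the result as a float */
    return y0;
}
```

For x = 4.0f this returns a value within a few percent of the exact result 0.5; the Newton–Raphson corrections discussed below then reduce the error by orders of magnitude.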
The magic constant R is usually given as a hexadecimal integer. The same bits encode the floating-point number R_f with an exponent e_R and mantissa m_R; according to (1), R_f = (1 + m_R) 2^(e_R). In [31], we have shown that if e_R = (B − 1)/2 (e.g., e_R = 63 in the 32-bit case), then the function (3) (defined on integers) is equivalent, when interpreted in terms of the corresponding floating-point numbers, to the piecewise linear function ỹ_0(x̃, t) given by (4), where m_R is the mantissa of R (i.e., m_R := N_m^(−1) R − ⌊N_m^(−1) R⌋), and, finally, µ_x̃ = 0 or µ_x̃ = 1 depending on the parity of the last bit of the mantissa of x̃.
The function µ_x̃ is two-valued, so a given parameter t may correspond to either two values of R or one value of R (when the term containing µ_x̃ has no influence on the bits of the mantissa m_R). The function y = 1/√x, the function (3), and all Newton–Raphson corrections considered below are invariant under the scaling x̃ = 2^(−2n) x and ỹ = 2^n y for any integer n. Therefore, we can confine ourselves to numbers from the interval [1, 4). Here and in the following, the tilde always denotes quantities defined on the interval [1, 4).
In this paper, we focus on the Newton–Raphson corrections, which form the second part of the InvSqrt code. Following and developing the ideas presented in our recent papers [31,32], we propose modifications of the Newton–Raphson formulas that result in algorithms with the same or a similar computational cost as InvSqrt, but with an accuracy improved, in some cases, by several times. The modifications consist in changing both the Newton–Raphson coefficients and the magic constant. Moreover, we extend our approach to subnormal numbers and to higher-precision cases.

Modified Newton-Raphson Formulas
The standard Newton–Raphson corrections ỹ_1 and ỹ_2 for the zeroth approximation ỹ_0 given by (4) are given by the following formulas:

ỹ_1 = ỹ_0 (3/2 − (1/2) x̃ ỹ_0²), ỹ_2 = ỹ_1 (3/2 − (1/2) x̃ ỹ_1²), (6)

(analogous formulas hold for the next corrections as well; see [31]). The relative error functions δ̃_j(x̃, t) (where j = 0, 1, 2, …) can be expressed as:

δ̃_j(x̃, t) = √x̃ ỹ_j(x̃, t) − 1. (7)

The function δ̃_0(x̃, t), which is very important for the further analysis, is thoroughly described and discussed in [31]. Using (7), we substitute ỹ_j = (1 + δ̃_j)/√x̃ (for j = 0, 1, 2, …) into (6); x̃ cancels out, and the formulas (6) assume the following form:

δ̃_j = −(1/2) δ̃_{j−1}² (3 + δ̃_{j−1}), (8)

where δ̃_j = δ̃_j(x̃, t). We immediately see that every correction increases the accuracy, even by several orders of magnitude (due to the factor δ̃_{j−1}²). Thus, a very small number of corrections is sufficient to reach the machine precision (see the end of Section 4).
In this paper, we confine ourselves to the case t = t_1 (i.e., we assume t_2 = t_1) because the more general case (where the magic constant is also optimized with respect to the assumed number of iterations) is much more cumbersome, and the related increase in accuracy is negligible. Then, we get the optimal value t = t_1 and the corresponding theoretical relative errors (for details, see [31]). The superscript (0) indicates values corresponding to the algorithm InvSqrt (other superscripts will denote modifications of this algorithm).
The idea of increasing the accuracy by a modification of the Newton–Raphson formulas is motivated by the fact that δ̃_k(x̃, t) ≤ 0 for any x̃ (see [31,32]). Therefore, we can try to shift the graph of δ̃_1 upwards (making it more symmetric with respect to the horizontal axis). Then, the errors of the first correction are expected to decrease by about a factor of two, and the errors of the second correction by about a factor of eight (for more details, see [32]). Indeed, according to (8), reducing the first correction by a factor of 2 will reduce the second correction by a factor of 4. The second correction is also non-positive, so we may shift the graph of δ̃_2, once more improving the accuracy by a factor of 2. This procedure can be formalized by postulating the modification (12) of the Newton–Raphson formulas (6), where a_k + b_k = 1 for k = 1, 2. Thus, we have four independent parameters (d_1, d_2, a_1, and a_2) to be determined. In other words,

ỹ_k = ỹ_{k−1} (c_{1k} − c_{2k} x̃ ỹ_{k−1}²), k = 1, 2, (13)

where the four coefficients c_{jk} can be expressed by the four coefficients a_k and d_k. We point out that the Newton–Raphson corrections and any of their modifications of the form (13) are obviously invariant with respect to the scaling mentioned at the end of Section 1. Therefore, we can continue to confine our analysis to the interval [1, 4).
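The payoff of the upward shift can be illustrated with a small numerical scan. The seed-error range [−0.034, 0] below is a stand-in for the actual range of δ̃_0 (an assumption chosen for illustration, not a value taken from the paper):

```c
#include <math.h>

/* Worst absolute error of the first correction after adding a constant
 * shift. delta_1 = -0.5*d0^2*(3+d0) is the standard (non-positive)
 * Newton-Raphson error; d0 is sampled over an assumed seed-error range. */
double max_abs_err(double shift) {
    double worst = 0.0;
    for (int i = 0; i <= 1000; ++i) {
        double d0 = -0.034 * i / 1000.0;         /* delta_0 in [-0.034, 0] */
        double d1 = -0.5 * d0 * d0 * (3.0 + d0); /* standard NR error */
        double e = fabs(d1 + shift);
        if (e > worst) worst = e;
    }
    return worst;
}
```

With shift = 0, the worst error is some E = max|δ_1|; with shift = E/2, the error band re-centers to roughly [−E/2, E/2], so the maximum absolute error is halved. This is exactly the effect exploited by the modified corrections.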
Below, we present three different algorithms (InvSqrt1, InvSqrt2, InvSqrt3) constructed along the above principles (the last two of them are introduced for the first time in this paper). They will be denoted by superscripts in parentheses; e.g., ỹ^(N)_k means the kth modified Newton–Raphson correction of the algorithm InvSqrtN. We always assume that the zeroth approximation is given by (4), i.e., ỹ^(N)_0 = ỹ_0, and the relative error functions ∆_j are expressed as ∆^(N)_j(x̃, t) = √x̃ ỹ^(N)_j(x̃, t) − 1 (compare (7)). We point out that the coefficients of our algorithms are obtained without taking rounding errors into account. This issue will be briefly discussed at the end of Section 4.

Algorithm InvSqrt1
Assuming a_1 = a_2 = 0 and b_1 = b_2 = 1, we transform (12) into the form (17), where ỹ^(1)_1 and ỹ^(1)_2 depend on x̃, t, d_1, and d_2. The parameters are determined by minimizing ||∆^(1)_1|| (for details, see [32]). As a result, we get d^(1)_1, d^(1)_2, and t^(1)_1 (see (10)). Therefore, R^(1) = R^(0), i.e., InvSqrt1 has the same magic constant as InvSqrt. The theoretical relative errors are given by (19). The algorithm (17) can be written in the form (13), and taking into account the numerical values of d^(1)_1 and d^(1)_2, we obtain the coefficients c^(1)_{jk}. This large number of digits, which is much higher than that needed for single-precision computations, will be useful later in the case of higher precision.
Thus, finally, we obtained a new algorithm, InvSqrt1, which has the same structure as InvSqrt but different values of the numerical coefficients (see [32]). In the case of two iterations, the code InvSqrt1 has one additional multiplication in comparison with InvSqrt.

InvSqrt2 Algorithm
Assuming a_1 = a_2 = 1 and b_1 = b_2 = 0, we transform (12) into the form (22), where ỹ^(2)_1 and ỹ^(2)_2 depend on x̃, t, d_1, and d_2. The parameters t = t^(2)_1, d^(2)_1, and d^(2)_2 are determined by minimizing ||∆^(2)_1|| (see Appendix A.1 for details). The theoretical relative errors are given by (25). The coefficients in (13) follow from the numerical values of d^(2)_1 and d^(2)_2 (see (23)); the large number of digits will be useful later in the case of higher precision. Thus, we completed the derivation of the code InvSqrt2:

5. float y = *(float*) &i;
6. ...
7. y *= 1.50000057f - halfx*y*y;
8. return y;
9. }

The code InvSqrt2 contains a new magic constant (R^(2)) and has two lines (6 and 7) that were modified in comparison with the code InvSqrt. We point out that InvSqrt2 has the same number of algebraic operations as InvSqrt.

InvSqrt3 Algorithm
Now, we consider the algorithm (13) in its most general form (28), where k_1, k_2, k_3, and k_4 are constant. In Appendix A.2, we determine the parameters t = t^(3)_1, k_1, and k_2 by minimization of ||∆^(3)_1(x̃, t)||. Then, the parameters k_3 and k_4 are determined by minimization of ||∆^(3)_2||. As a result, we get t^(3)_1 and the corresponding magic constant R^(3). The resulting theoretical relative errors are significantly smaller (by 26% and 45%, respectively) than the analogous errors for InvSqrt1 and InvSqrt2 (see (19) and (25)). The comparison of the error functions for InvSqrt2 and InvSqrt3 (in the case of one correction) is presented in Figure 1. Taking into account the numerical values of the coefficients c^(3)_{ij} (compare with (13)), we obtained the following code, called InvSqrt3:

4. float y = *(float*) &i;
5. ...
6. y *= 1.50000036f - 0.500000053f*x*y*y;
7. return y;
8. }

The code InvSqrt3 has the same number of multiplications as InvSqrt1, which means that it is slightly more expensive than InvSqrt and InvSqrt2.

Generalizations
The codes presented in Section 2 can only be applied to normal numbers of the form (1) of the type float. In this section, we show how to extend these results to subnormal numbers and to higher-precision formats.

Subnormal Numbers
Subnormal numbers are smaller than any normal number of the form (1). In the single-precision case, positive subnormals can be represented as m_x · 2^(−126), where m_x ∈ (0, 1). They can also be characterized by the first nine bits being equal to zero (which also includes the case x = 0). In order to identify subnormals, we make a bitwise conjunction (AND) of a given number with the integer 0x7F800000, which has all eight exponent bits equal to 1 and all 23 mantissa bits equal to 0. This bitwise conjunction is zero if and only if the given number is subnormal (including 0).
In the case of single precision, multiplication by 2^24 transforms any subnormal number into a normal number. Therefore, we make this transformation; then, we apply one of our algorithms and, finally, make the inverse transformation (i.e., we multiply the result by 2^12, since 1/√(2^24 x) = 2^(−12)/√x). Thus, we get an approximate value of the inverse square root of the subnormal number. Note that 2^24 is the smallest power of 2 with an even exponent that transforms all subnormals into normal numbers.
In the case of InvSqrt3, the procedure described above can be written in the form of the following code:

12. if (k==0) return 4096.f*y; // 4096.f = pow(2.0f, 12)
13. return y;
14. }

The maximum relative errors for this code are presented in Section 4 (see Table 1).

Table 1. Relative numerical errors for the first and second corrections in the case of the type float (32-bit compiler) for subnormal numbers.

Higher Precision
The above analysis was confined to the single-precision floating-point format. This is sufficient for many applications (especially on microcontrollers), although the double-precision format is more popular. A trade-off between accuracy, computational cost, and memory usage is welcome [33]. In this subsection, we extend our analysis to double- and higher-precision formats. The calculations are almost the same; we just have to compute all of the involved constants with an appropriate accuracy. Low-bit arithmetic cases could be treated in exactly the same way. In this paper, however, we are focused on increasing the accuracy and on possible applications in distributed systems, so only the cases of higher precision are explicitly presented.
We present detailed results for double precision and some results (magic constants) for quadruple precision. Performing the computations in C, we use the GCC Quad-Precision Math Library (working with numbers of the type __float128). The crucial point is to express the magic constant R through the corresponding parameter t, which can be done with the formula (33). In the case of the zeroth approximation (without Newton–Raphson corrections), the resulting value of t can be compared with [31], and the corresponding magic constants are computed from the formula (33). In this paper, we focus on the case of Newton–Raphson corrections, where the value of the parameter t may depend on the algorithm; this yields the values of t and R for InvSqrt and InvSqrt1. Actually, the above value of R in the 64-bit case (i.e., R^(1D)) corresponds to µ_x̃ = 1 (the same value of R was obtained by Robertson for InvSqrt [24] with a different method). For µ_x̃ = 0, we get an R greater by 1 (the other results reported in this section do not depend on µ_x̃). In the 128-bit case, Robertson obtained an R that is 1 less than our value (i.e., R^(1Q)).
In the case of InvSqrt2, we have (compare with (A11)):

32-bit: R^(2) = 0x5F376908
64-bit: R^(2D) = 0x5FE6ED2102DCBFDA

(and an analogous 128-bit constant R^(2Q)). Finally, for InvSqrt3, we obtained the corresponding value of t (see (A29)) and the resulting magic constants. The parameters of the modified Newton–Raphson corrections for the higher-precision codes can be computed from the theoretical formulas used in the single-precision cases, taking into account an appropriate number of significant digits. In numerical experiments, we tested the algorithms InvSqrt1D, InvSqrt2D, and InvSqrt3D with the magic constants R^(1D), R^(2D), and R^(3D), respectively, and the corresponding coefficients in the modified Newton–Raphson iterations (compare with (21)). The algorithm InvSqrt and its improved versions are usually implemented in the single-precision case with no more than two Newton–Raphson corrections. However, in the case of higher precision, higher accuracy of the result is welcome. Then, a higher number of modified Newton–Raphson iterations could be considered. As an example, we present the algorithm InvSqrt2D with four iterations:

3. long long i = *(long long*) &x;
...
9. y *= 1.5000000000000000 - halfx*y*y;
10. return y;
11. }

By removing Line 9, we obtain the code InvSqrt2D with three iterations, and by also removing Line 8, we get the code defined by (44). The maximum relative errors for this code are presented in Section 4 (see (52)).

Numerical Experiments
The numerical tests for the codes derived and presented in this paper were performed on an Intel Core i5-3470 processor using the TDM-GCC 4.9.2 32-bit compiler (when repeating these tests on the Intel i7-5700 processor, we obtained the same results, and comparisons with some other processors and compilers are given in Appendix B). In this section, we discuss round-off errors for the algorithms InvSqrt2 and InvSqrt3 (the case of single precision and two Newton-Raphson iterations) and then present the final results of analogous analysis for other codes described in this paper.
Applying the algorithms InvSqrt2 and InvSqrt3, we obtain relative errors that differ slightly, due to round-off errors, from their analytical values (see Figures 2 and 3; compare with [32] for an analogous discussion concerning InvSqrt1). Although we present only randomly chosen values in the figures, calculations were done for all float numbers x such that e_x ∈ [−126, 128). In each figure, the left panel shows the theoretical error function together with errors for 4000 floating-point numbers x randomly chosen from the interval (2^−126, 2^128), and the right panel shows the relative error ε (see (47)), with dashed lines marking its minimum and maximum values and dots denoting errors for 2000 values x̃ randomly chosen from the interval [1, 4).

The errors of the numerical values returned by InvSqrt2 belong (for e_x > −126) to the interval (−6.21 · 10^−7, 6.53 · 10^−7). For e_x = −126, we get a wider interval: [−6.46 · 10^−7, 6.84 · 10^−7]. These errors differ from the errors of ỹ_2(x̃, t^(2)), which were determined analytically (compare (25)). We define the function ε^(2) (see (47)), representing the observed blur of the float approximation of the InvSqrt2 output; it is symmetric with respect to its mean value (see the right part of Figure 2) and covers a narrow range of values. Analogous results hold for the code InvSqrt3. The results produced by the same hardware with a 64-bit compiler have a greater amplitude of the error oscillations compared with the 32-bit case (also compare Appendix B).
The maximum errors for the code InvSqrt and all codes presented in the previous sections are given in Table 2 (for codes with just one Newton-Raphson iteration) and Table 3 (the same codes but with two iterations).
Looking at the last column of Table 2 (the case of one iteration), we see that the code InvSqrt1 is slightly more accurate than InvSqrt2, and both are almost two times more accurate than InvSqrt. However, it is the code InvSqrt3 that has the best accuracy. The computational costs of all of these codes are practically the same (four multiplications in every case). In the case of two iterations (Table 3), the code InvSqrt3 is the most accurate as well. Compared with InvSqrt, its accuracy is 12 times higher for single precision and 14.5 times higher for double precision. However, the computational costs of InvSqrt1 and InvSqrt3 (eight multiplications) are higher than the cost of InvSqrt (seven multiplications). Therefore, the code InvSqrt2 has some advantage: it is less accurate than InvSqrt3 but cheaper. In the single-precision case, the code InvSqrt2 is 6.8 times more accurate than InvSqrt.
We point out that the round-off errors in the single-precision case significantly decrease the gain of the accuracy of the new algorithms as compared with the theoretical values, especially in the case of two Newton-Raphson corrections (compare the third and the last column of Table 3).
The range of errors in the case of subnormal numbers (using the codes described in Section 3.1) is shown in Table 1. One can easily see that the relative errors are similar to, and in fact even slightly lower than, those in the case of normal numbers.
Although the original InvSqrt code used only one Newton–Raphson iteration, and in this paper we focus mostly on two iterations, it is worthwhile to also briefly consider the case of more iterations. Then, the increased computational cost is accompanied by increased accuracy. We confine ourselves to the code InvSqrt2 (see the end of Section 3.2), which is less expensive than InvSqrt3 (and the advantage of InvSqrt2 increases with the number of iterations). In the double-precision case, the maximum error for three Newton–Raphson corrections is much lower, and the fourth correction yields the best possible accuracy:

∆^(2)_{1D,N} = 0.87908 × 10^−3, ∆^(2)_{2D,N} = 0.57968 × 10^−6, ∆^(2)_{3D,N} = 2.5213 × 10^−13.

In the case of single precision, we already get the best possible accuracy for the third correction, given by adding the line y *= 1.5f - halfx*y*y as Line 8 in the code InvSqrt2 (see Section 2.2). The derivation of all of the numerical codes presented in this paper did not take rounding errors into account. Therefore, the best floating-point parameters can be slightly different from the rounding of the best real parameters, all the more so since the distribution of the errors is still not exactly symmetric (compare the fourth and fifth columns in Tables 2 and 3). The full analysis of this problem is much more difficult than the analogous analysis for the original InvSqrt code, because we now have several parameters to be optimized instead of a single magic constant. At the same time, the increase in accuracy is negligible. Actually, much greater differences in accuracy appear in numerical experiments as a result of using different devices (see Appendix B).
As an example, we present the results of an experimental search in the case of the code InvSqrt3 with one Newton–Raphson correction (three parameters to be optimized). The modified Newton–Raphson coefficients are found experimentally. Figure 4 summarizes the last step of this analysis. The dependence of the maximum errors on R clearly shows that the optimal value of the magic constant is slightly shifted with respect to the theoretical (real) value. The corresponding errors, given by

∆_{1,N max} = 6.50112284 · 10^−4, ∆_{1,N min} = −6.501092575 · 10^−4, (56)

are nearly symmetric. They are smaller than the maximum error ∆_{1,N} corresponding to our theoretical values, but only by about 0.001% (see Table 2).

Conclusions
We presented two new modifications (InvSqrt2 and InvSqrt3) of the fast inverse square root code in single-, double-, and higher-precision versions. Each code has its own magic constant. All of the new algorithms are much more accurate than the original code InvSqrt. One of the new algorithms, InvSqrt2, has the same computational cost as InvSqrt for any precision. The other code, InvSqrt3, has the best accuracy, but is more expensive when the number of Newton–Raphson corrections is greater than one. However, its gain in accuracy is very high: more than 12 times for two iterations (see Table 3 in Section 4).
Our approach was to modify the Newton-Raphson method by introducing arbitrary parameters, which are then determined by minimizing the maximum relative error. It is expected that such modifications will provide a significant increase in accuracy, especially in the case of asymmetric error distribution for Newton-Raphson corrections (and this is the case with the inverse square root function when these corrections are non-positive). One has to remember that due to rounding errors, our theoretical results may differ from the best floating-point parameters, but the difference is negligible (see the end of Section 4). In fact, parameters (magic constants and modified Newton-Raphson coefficients) from a certain range near the values obtained in this article seem equally good for all practical purposes.
Concerning potential applications, we have to acknowledge that for general-purpose computing, the SSE and AVX reciprocal square root instructions are faster and more accurate. We hope, however, that the proposed algorithms can be applied in embedded systems and microcontrollers without a hardware floating-point divider, and potentially in FPGAs. Moreover, in contrast to the SSE and AVX instructions, our approach can be easily extended to computational platforms of high precision, like 256-bit or 512-bit platforms.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Analytical Derivation of Modified Newton-Raphson Coefficients
Appendix A.1. Algorithm InvSqrt2

We determine the parameters d_1 and d_2 in the formulas (22) that minimize the maximum error. Substituting (16) (with n = 2) into (22), we get an expression for ∆_1(x̃, t, d_1), where δ̃_0(x̃, t) is the relative error of the zeroth approximation (the function δ̃_0(x̃, t) is presented and discussed in [31,32]).
First, we determine the values of t and d^(2)_1 that minimize the maximum absolute value of the relative error of the first correction. We have to solve the corresponding stationarity equation; its solution δ̃ corresponds to a maximum of ∆_1(x̃, t), because the second derivative with respect to x̃ is negative. In order to determine the dependence of d_1 on the parameter t, we solve the equation that (for some t = t_1) equates the maximum value of the error with the modulus of the minimum value of the error. The last step consists in equating the minimum boundary value of the error of the analyzed correction with its smallest local minimum (A10), where x̃^(II)_0 = (4 + t)/3 (see [31]). Solving Equation (A10) numerically, we obtain a value of t that corresponds to the magic constant R^(2). Taking into account (A8), we compute d^(2)_1 and, using (A4) and (A5), we obtain the extreme values of the error. In the case of the second correction, we keep the obtained value t = t_1 and determine the parameter d^(2)_2, which equates the maximum value of the error with the modulus of its global minimum. The function ∆_2 is increasing (decreasing) with respect to negative (positive) values of ∆_1 and has local minima that come only from the positive maxima and negative minima of ∆_1. Therefore, the global minimum should correspond to the global minimum of ∆_1: the deeper minima of −∆^(2)_{2 max} come from the global minimum of the first correction, and the maximum, by analogy with the first correction, corresponds to a specific value of ∆_1; solving the resulting equation yields d^(2)_2.

Appendix A.2. Algorithm InvSqrt3

The parameters k_1, k_2, k_3, and k_4 in the formula (28) are determined by minimization of the maximum error. The relative error functions for (28) are given by (A19), where j = 1, 2. Substituting (A19) into (28), we obtain expressions depending on k = (k_1, k_2, k_3, k_4) and δ̃_0(x̃, t), the relative error of the zeroth approximation (see [31,32]).
We are going to find the parameters t and k such that the error functions take extreme values, beginning with ∆_1.

Appendix B. Numerical Experiments Using Different Processors and Compilers
The accuracy of our codes depends, to some extent, on the devices used for testing. In Section 4, we limited ourselves to the Intel Core i5 with the 32-bit compiler. In this Appendix, we present, for comparison, data from other devices (Tables A1 and A2). All data are for the type float (single precision).
The first two columns of data correspond to the Intel Core i5 with the 32-bit compiler (described in more detail in Section 4), and the next two columns correspond to the same processor, but with the 64-bit compiler. Then, we have results (identical for all three) for three microcontrollers: the STM32L432KC and TM4C123GH6PM (ARM Cortex-M4), as well as the STM32F767ZIT6 (ARM Cortex-M7). The last two columns contain the results for the ESP32-D0WDQ5 system with two Xtensa LX6 microprocessors.