Fast Switch and Spline Function Inversion Algorithm with Multistep Optimization and k-Vector Search for Solving Kepler’s Equation in Celestial Mechanics

: Obtaining the inverse of a nonlinear monotonic function f ( x ) over a given interval is a common problem in pure and applied mathematics, the most famous example being Kepler’s description of orbital motion in the two-body approximation. In traditional numerical approaches, this problem is reduced to solving the nonlinear equation f ( x ) − y = 0 in each point y of the co-domain. However, modern applications of orbital mechanics for Kepler’s equation, especially in many-body problems, require highly optimized numerical performance. Ongoing efforts continually attempt to improve such performance. Recently, we introduced a novel method for computing the inverse of a one-dimensional function, called the fast switch and spline inversion (FSSI) algorithm. It works by obtaining an accurate interpolation of the inverse function f − 1 ( y ) over an entire interval with a very small generation time. Here, we describe two signiﬁcant improvements with respect to the performance of the original algorithm. First, the indices of the intervals for building the spline are obtained by k-vector search combined with bisection, thereby making the generation time even smaller. Second, in the case of Kepler’s equation, a multistep method for the optimized calculation of the breakpoints of the spline polynomial was designed and implemented in Cython. We demonstrate results that accurately solve Kepler’s equation for any value of the eccentricity e ∈ [ 0, 1 − (cid:101) ] , with (cid:101) = 2.22 × 10 − 16 , which is the limiting error in double precision. Even with modest current hardware, the CPU generation time for obtaining the solution with high accuracy in a large number of points of the co-domain can be kept to around a few nanoseconds per point.


Introduction
The inversion of a nonlinear monotonic function f (x) is a fundamental mathematical problem that appears in many scientific and technological applications. A famous example is Kepler's equation [1][2][3], describing bound orbital motion in the two body approximation, where e ∈ [0, 1) is the eccentricity; y ≡ h p 2 (e 2 − 1) become equivalent in terms of generation time. These improvements can be applied to any monotonic function. For the case of Kepler's equation, we designed a multistep method that reduces the number of breakpoints used by the FSSI. This reduction is important for values of the eccentricity e close to the limit e → 1, and allows the FSSI to solve Kepler's equation accurately for every value of the eccentricity e ∈ [0, 1 − ], with being the intrinsic computational error ( = 2.22 × 10 −16 in double precision arithmetic). For e 0.9, the accuracy can be set to 10 −15 rad, while for e 0.9 it can be as low as O(10 −14 − 10 −13 ) rad for most values of e and y. Only for e close to the limit 1 − , which is also stiff in other methods [33][34][35], is there a small region, y 10 −9 rad, for which the accuracy is found to be limited to ∼2 × 10 −11 rad.
For the numerical calculations, all the routines have been implemented in Cython [36], a superset of the Python language. This language provides special data types and control-structure semantics for efficient and optimized translation to pure C language with Python bindings (through Python's C/API function call library). The result of this strategy is that when the Cython code is compiled as a shared-object binary (with the GNU C-compiler gcc), the code executes with the speed of a normal C binary (without the Python interpreter) but can be imported and called from a Python script.

Overview of the FSSI Algorithm
Let f (x) be an input function that is monotonic over a domain x ∈ [x min , x max ]. Hereafter, f (x) will also be assumed to be ascending, as in the case of Kepler's equation. The inversion of a descending function can be computed similarly, with simple changes, or by noting that f −1 = −(− f ) −1 and applying to − f the results that will be obtained for an ascending function.
The FSSI algorithm [27] can be used to obtain a piecewise cubic spline interpolation S(y) of the inverse function f −1 (y) for any y belonging to the co-domain [y min , y max ], with y min = f (x min ) and y max = f (x max ). This algorithm consists of two parts, which will be called "setup" and "generation" procedures, respectively.

Setup Procedure
When the function f (x) and its derivative f (x) are known continuous functions, such as the case of Kepler's problem, the setup procedure of the FSSI algorithm consists of the following basic steps: (i) Grid generation. A grid {x j } of n + 1 points is generated where (x 0 , · · · , x n ), with x 0 = x min and x n = x max . In the examples given in reference [27], these points were chosen to be equally spaced.
Here, the step sizes h j = x j+1 − x j , for j = 0, · · · , n − 1, are allowed to be a function of j. The values of h j are chosen in order to control the error at the desired level over the entire interval, as shown in Section 3. (ii) Computation of the breakpoints and coefficients. First, the values of the function, y j = f (x j ), which will provide the breakpoints of the spline interpolation of f −1 , and of the derivative of its inverse, d j = 1/ f (x j ), are computed on the grid {x j }. This is done given the assumption that f (x j ) = 0 over the entire domain. In practice, this demands that | f (x j )| should be larger than the numerical error over the interval [x min , x max ]. When applied to the solution of Kepler's equation (1), this constraint becomes f (x) = 1 − e cos x > for every x, or equivalently, e < 1 − . In fact, the multistep version of the FSSI algorithm provides a manner for obtaining the solution of Kepler's equation even in the limiting case e = 1 − with an accuracy down to ∼10 −11 rad.
Second, following reference [27], and defining δy j = y j+1 − y j , the coefficients of the spline interpolation S j (y) = ∑ 3 q=0 c (q) j y − y j q of the inverse function in the interval y j ≤ y < y j+1 are computed using the equations for j = 0, . . . , n − 1. These formulas have been derived assuming that none of the differences δy j = y j+1 − y j is zero. Numerically, the expressions for the coefficients c (2) j and c (3) j in Equation (3) seem to require not only |δy j | , but also (δy j ) 2 and |δy j | 3 for every j. In double precision arithmetic, this would imply that the grid spacing should be controlled to keep |δy j | 1/3 3 × 10 −5 . Fortunately, this requirement can be relaxed since d j = 1/ f (x j ) = h j /δy j + O(h j ), so that the opposite signs in the last two Equation (3) imply that Since these two coefficients have to be multiplied by (y − y j ) 2 and (y − y j ) 3 , respectively, with y ∈ [y j , y j+1 ), their contributions to the interpolation of the inverse function, S(x), are expected to be at most of the order h 2 j . Therefore, c j and c (3) j can be set to zero when h 2 j is smaller than the numerical precision, i.e., when h j < 1.49 × 10 −8 given in double precision arithmetic. These considerations are later used in the numerical implementation of the algorithm. (iii) Computation of the k-vector. If the generation procedure includes an implementation of the k-vector search algorithm [18][19][20] (as described below), the k-vector should also be computed at the setup stage. As discussed in Section 3, the main part of this computation can be performed during the j loop in point (ii) above. The implementation of the k-vector search in the FSSI algorithm is quite natural since they both use a setup process to minimize their generation time.
Since the array y of the n + 1 breakpoints obtained in point (ii) is given in ascending order, following references [18][19][20] we can define its associated k-vector k as an array of size n + 1 whose element k l is given by the maximum index j, such that y j ≤ m l + q, for l = 0, · · · , n. The constants ξ, m, and q are chosen to be ξ = (y max − y min ) , m = (y max − y min + 2ξ)/n and q = y min − ξ. Thus m l + q describes a line of equally spaced points such that y 0 = y min − ξ and y n = y max + ξ, which can be used for a fast interpolation search as discussed below in point (i) of the generation procedure.
These setup computations are executed once for a given function f (x) and a given error level, thereby generating the breakpoints y j and the coefficients of the spline interpolation S of the inverse function f −1 , along with the values of the k-vector. These values can be stored as tables of constants either in a file or in the RAM memory. As seen in Section 4 for the Kepler equation, high accuracy can be obtained with only O(10 3 − 10 4 ) grid points.

Generation Procedure
The spline breakpoints and coefficients obtained in the setup stage can then be used to evaluate S(Y) on any set (or array, in computer language) of N points Y = {Y a }, a = 0, . . . , N − 1 belonging to the interval [y min , y max ]. The generation procedure to compute the values of S(Y) on the output array Y consists of the following steps: (i) Search for the interval. For each point Y a ∈ Y, find the index j a of the breakpoints y j such that y j a ≤ Y a < y j a +1 , with j a an integer in the range 0, · · · , n − 1 and the possible values of a being a = 0, · · · , N − 1. This search can be executed using the bisection method [32], as in the examples of reference [27], or with a different search algorithm, such as the combination of the k-vector search and the bisection method (see Section 3). In the latter case, the k-vector is used for a first bracketing of the possible values of j a by computing the integer part (floor) j y = [m Y a + q] and the k-vector values k j y and k j y +1 . Following references [18][19][20], the index j a is constrained to be in the interval k j y ≤ j a < k j y +1 . As discussed in Section 4, a higher accuracy can be obtained with FSSI by selecting a slightly more conservative choice, including one more point in the bracketing, k j y − 1 ≤ j a < k j y +1 . Finally, a binary search is performed within this reduced interval to obtain the value of the index j a . This last execution is very fast since it only involves a very small number of points.
This task only requires a few multiplications and a sum involving numbers that have been computed previously and are given in a table; thus, it is very fast.

Errors of the FSSI Algorithm
If the function f (y) is infinitely differentiable, as for Kepler's equation, an excellent approximate upper limit for the accuracy of the FSSI algorithm can be given analytically [27].
where f , f , f (3) , and f (4) are the derivatives of f of order from 1 to 4, respectively. As said in reference [27], this error is in general several orders of magnitude smaller than that expected from general spline interpolation theory. Equation (6) can be applied to the error E j = max This is due to the fact that the FSSI algorithm can be used to invert the function in each grid interval, for which x min and x max can be replaced with x j and x j+1 , respectively. Once the numerical interpolation S(y) of the inverse function is obtained, this prediction for the error can be compared with the numerical error, which can be computed a posteriori by evaluating S(Y) − S( f (S(Y))) on the output array Y [27].
One consequence of Equation (7) is that the error of the FSSI algorithm, E j , scales as the fourth power of the grid interval h j , which can be thought of as the error of x previous to executing the FSSI spline interpolation. As mentioned above, and shown in Section 4, the spline interpolation provides an Equation (7) can also be used to optimize the FSSI algorithm by choosing a variable grid spacing h j = h(x j ). The maximum step size compatible with a given global error level E can be obtained by inverting Equation (7) as, This optimal choice of the grid spacing h j , entering point 1 in the setup procedure as outlined above, is applied to Kepler's equation in Section 3.

Methods
In this section, the detailed implementation of the FSSI algorithm is discussed, including improvements such as the multistep optimization and the use of the k-vector search. Simple pseudocode listings are provided with array indices following similar conventions to those used in C and Python languages. Most of these routines can be applied to a general monotonic function f (x) with minimal or no modifications. An exception is the multistep optimization, which is given specifically for Kepler's problem, although the procedure described in this case can be adapted to more general monotonic functions.
In the case of Kepler's problem, the function for every integer q. This also implies the periodicity equations for every integer q. Therefore, without any loss of generality, it is sufficient to invert f in the interval [0, π], with x min = y min = 0 and x max = y max = π, and then the solution for every x, y can be obtained using Equation (9). For convenience, the eccentricity is considered an input variable to the program, such that the function f and its derivative f ≡ dfdx are functions of (e, x). The eccentricity is denoted ec in the code to distinguish it from Euler's number. The pseudocode defining Kepler's functions is shown in Listing 1.

Grid Generation with Multistep Optimization for Kepler's Equation
In the case of Kepler's Equation (1), the function f and its derivatives are known analytically, and the expression (8) for the step size compatible with a given error level can be written as This expression has been obtained from Equation (6), which has been shown to provide a good approximation when h j is small enough [27]. Therefore, it has to be implemented with a few changes to enforce the validity of this approximation and to avoid divisions by small numbers that may be close to machine precision. A slightly more conservative-i.e., smaller-expression for h j than the maximum value given in Equation (10)  The values of ec = e, ec2 = e 2 , and hgfac = 4.4 E 1/4 shall be passed with the calling of the function, as shown below. The last if statement has been chosen empirically to keep h j small enough so that the approximations for the error are still valid for every e.
Function hstep_kepler, as shown in Listing 2, can be used to generate the array of grid intervals h j . For a small enough h j , the minimum value in Equation (10) is obtained in one of the endpoints, x j or x j+1 , except in a few intervals. The additional factors that have been introduced in the function hstep_kepler, as compared to Equation (10), have been found to be sufficient to ensure that the choice of the endpoint that minimizes hstep_kepler always gives an error below E , if the latter is set at a level ≥10 −13 rad, except for e equal or very close to 1 − and y very close to zero, in which case a limiting accuracy ∼2 × 10 −11 rad is found when using double precision arithmetic-see Section 4. Remarkably, this function works for every e ∈ [0, 1 − ], thereby including the highly nonlinear regime of eccentricity values close to 1 that is within the computational error in double precision.
The array of grid intervals h j can then be generated with the pseudocode of Listing 3. In a computer implementation of the algorithm, the array h can be initialized by dynamic memory allocation, or by assigning a size that is greater than the maximum value of n that may be required for e ∈ [0, 1 − ] and for error level down to 10 −15 rad. This value can be chosen to be 26,000, as shall be seen in Section 4. After running the routine hstep_kepler with the while loop shown above, the value of the required number of grid intervals n will be available, along with the values of h j ≡ h[j]. This value of n will be used to allocate the memory for the arrays of the breakpoints, of the coefficients, and the k-vector.

Computation of the Breakpoints and Coefficients
Pseudocode implementing the formulas given in Section 2 for the computation of the arrays of the breakpoints y j , called y[j] in the code, and of the coefficients c  function FSSI_coef(xmin, xmax, n, ec, h): As written, the routine FSSI_coef of Listing 4 is applied here to Kepler's function f(ec, x). However, it can also be used to obtain the breakpoints of more general monotonic functions, if an array of grid steps h and the values of the derivatives are provided.
The arrays y and c have to be initialized by allocating n + 1 and 4 × (n + 1) "double" values, respectively, with the value of n that has been obtained in stage 1 of the setup phase. Then the breakpoints and the coefficients for the spline interpolation of the inverse of Kepler's function can be obtained with the call, y, c = FSSI_coef(0., PI, n, ec, h) by providing the number of grid intervals n, the array of grid steps h, and the input eccentricity ec.

Computation of the k-Vector
When the interval search in the generation procedure is performed using the k-vector method, a pre-processing computation of the k-vector has to be implemented in the setup phase. This can be done within the routine FSSI_coef, using the same for loop to compute the contributions to the k-vector. The pseudocode for computing breakpoints, coefficients of the spline, and k-vector is called with the following routine: function FSSI_coef_kv(xmin, xmax, n, ec, h, mkv, qkv): This function is based upon the pseudocode FSSI_coef, but with the modifications in the for loop over igrid that are provided in Listing 5. In Listing 5, the dots . . . represent the same code as in routine FSSI_coef. As in the previous case, the arrays y and c must be initialized together with the function FSSI_coef, while the integer arrays kv and deltakv of size n + 1 are allocated and initialized to zero. Similar to FSSI_coef, this routine can be applied to any function and is not specific to Kepler's equation as described here.

Generation Procedure: Search of the Interval
The search of the index j a corresponding to the interval to which a given point Y a ∈ Y belongs, i.e., y j a ≤ Y a < y j a +1 , can be executed using the bisection method [32] with the routine described by the pseudocode shown in Listing 6. Here, left and right are two indices that are known to bracket the correct interval, and that have to be defined in the function call of the routine, as shown below. Alternatively, as discussed in Section 2, the search can be performed using the k-vector method followed by a bisection search, as shown in Listing 7. Indeed, for Y large and sorted, many successive points Y a and Y a+1 ∈ Y fall into the same interval between breakpoints, so that j a+1 = j a . As such, it is convenient to check this condition, thereby exiting further evaluations in the loop if true. The versions of the routines find_interval_BS and find_interval_kv with this additional conditional check are referred to as find_interval_BS_us and find_interval_kv_us, respectively, where the suffix _us denotes "use sorted".

Generation Procedure: Evaluation
Once the correct interval has been found, the value of the inverse function in each of the N points of the array Y (called Yp below) can be performed as shown in Listing 8. The only remaining changes with respect to the routine poly_evaluate_BS will be lines calling find_interval_kv, which will be modified as shown in Listing 9.
Listing 9. Modification of the pseudocode for evaluating the piecewise polynomial using the k-vector.
if yval < yvali: i = find_interval_kv(y, ny, kv, mkv, qkv, yval, 0, i + 1) else: i = find_interval_kv(y, ny, kv, mkv, qkv, yval, i, ny -1) Note, this generation part of the code is not specific to function inversion. It provides a general method for cubic spline interpolation when the coefficients and breakpoints are known. If these coefficients and breakpoints are those computed with the setup phase of the FSSI algorithm when applied to a function f , the result is the interpolation of the inverse f −1 .

Results
All the routines described in Sections 2 and 3 have been implemented from scratch in Cython [36] using double-precision arithmetic, and have been applied to the solution of Kepler's equation for different values of the eccentricity e.
Briefly, a Cython program resembles a Python program but contains specialized directives, data type syntax, and function commands (that act as wrappers, called decorators). This collection of extra syntax allows the CPython interpreter (together with other standard libraries) to generate a pure C program file that can be compiled with the GNU gcc compiler, thereby producing a shared-object binary file. In all our computer code described here (of Sections 2 and 3), we wrote highly customized Cython code with explicit type and function referencing that would guarantee optimized compilation results that would produce the lowest function call overhead possible.
Additionally, the translated C file from the Cython code is compiled using appropriate gcc optimization switches that could in principle take advantage of low-level vectorization. The result is a shared-object binary that can be imported into a normal Python script, but when executed, runs as a C binary. As mentioned, on appropriate hardware, further acceleration can be obtained by compiling the Cython code with specialized C libraries such as the Intel vector extension and the Intel Math Kernel library. The numerical results presented below were obtained on a standard laptop computer with modest hardware (a 64 bit Intel i7-8565U CPU 1.8 GHz, with 16 GB memory, and with a Linux Mint operating system with 4.15.0-118 kernel). Figure 1 shows the interpolation S(y) of the solution f −1 (y) of Kepler's equation for three representative values of e, namely, e = 0.5, 0.9, and 1 − , with = 2.22 × 10 −16 . The diagonal line represents x = y, corresponding to the solution for e = 0, describing a circular orbit. Higher deviations of the solution from this linear regime are observed for increasing values of e, corresponding to more severe nonlinearity, a well-known behaviour that is described in textbooks [1][2][3]. For this reason, the multistep method that we have described in Section 3 is most important in the regime e 0.9, especially for values close to the limit e → 1. Table 1 shows the values of the minimum (fifth column) and maximum (sixth column) step sizes, along with the total number n of steps (fourth column), obtained with the multistep routine for Kepler's equation that has been described in Section 3. These results are given for four different values of the eccentricity (first column) and five different values of the input error level (second column). For e ≤ 0.9, the actual error (third column), obtained by computing the maximum value of |S(y) − S( f (S(y)))| over a large array of y points, is always smaller or at most equal to the input error level because of the conservative assumptions that were made in the design of the routine. For e = 0.9 and input accuracy 10 −15 rad they become equal. For values e > 0.9, and even close to the limit e → 1 within the numerical error, this multistep routine is able to reach an overall error as small as ∼2 × 10 −11 rad, which can be reduced to O(10 −14 − 10 −13 ) rad except for values of e very close to 1 − and for y 10 −9 rad.  Obtaining higher accuracy, if needed, would require further adjustments or data types higher than double precision. For e ≤ 0.9, the values of h min and h max do not vary by more than a factor 10. In this case, the higher cost in setup time of the multistep routine is not compensated by the reduction of the number of steps, and a single step routine with step size equal to h min may be preferred, as shown below. However, the multistep routine is very important for e > 0.9, and especially close to the limit e → 1, where the ratio h max /h min can be very large. For e = 1 − and input error level 10 −13 rad, for example, a single step routine would require n = π/h min ∼ 2.2 × 10 9 steps, while the multitep routine can attain the same precision with just n = 7874 steps. From a practical point of view, we can conclude that the multistep routine is only needed for e > 0.9, and that it is more and more necessary the closer is e to 1. Figure 2 shows the results for the CPU execution time, t CPU , of the Cython code corresponding to different variants of the FSSI algorithm, when applied to computing the solution f −1 (y) of Kepler's equation for eccentricity e = 0.9 for an increasing number N of points of the co-domain. The input accuracy level is fixed to the value 10 −15 rad. The prefix ms-indicates the routines using the multistep method described in Section 3, as opposed to those indicated with the prefix ss-that use a single step h = (x max − x min )/n ss , with n ss =13,500 (for e = 0.9) to attain the same error level.
The suffixes _bs and _kv label the search routines that are used to find the interval for building the spline in each case, based on the bisection method (_bs), or on the use of k-vector followed by bisection (_kv), respectively. These routines do not exploit the fact the Y array is sorted; it usually is since it is the time variable. However, they can be modified to account for sorted arrays with the addition of the conditional test, if x[left + 1] > xval: return left, as mentioned in Section 3. The resulting _bs_us and _kv_us routines turn out to have the same performance, within the uncertainty; thus, they are indicated with the suffix _us and without the indices _bs and _kv in Figure 2.
For N 10 4 , the CPU execution time is essentially dominated by the setup time. In this regime, for e = 0.9, the single-step routines are faster than the multistep implementations by a factor ∼1.6. In this N regime, and for these values of the eccentricity and error level, the _bs and _kv routines have approximately the same speed.
In the large N regime, the CPU execution time is dominated by the generation time. In this case, the single-step and multistep versions of any of the FSSI routines have the same speed, within a computational error. In this regime, the determining factor of execution time is the search routine. Without the _us enhancement, the FSSI_kv routines using k-vector search followed by bisection are faster than those using bisection, FSSI_bs, by a factor ∼1.6. For sorted Y arrays, however, the _us versions are significantly faster than the "bare" routines _bs and _kv.
For reference, the CPU execution times for a Cython implementation of the Newton-Raphson method-called Newton-are also given in Figure 2. (A similar result, with almost the same values for the execution time, can be obtained with an implementation of Brent's method [10], which is then not indicated in the figure). Moreover, Newton_us indicates a routine that uses the result of the previous search as a starting guess for each y, resulting in a significant reduction-by a factor ∼3.3 for N = 10 8 -of the execution time for large N and sorted Y array, similar to the improvement that has been obtained with the _us variant for the FSSI algorithm. (Such an improvement is not obtained for the Brent method). The routines ms-FSSI_bs_us, ss-FSSI_bs_us, ms-FSSI_kv_us, and FSSI_kv_us are faster by a factor ∼28, for N = 10 8 , than the routine Newton_us. Without the _us enhancement, the routines ss-FSSI_kv and ms-FSSI_kv are ∼37 times faster than Newton.
Similar results for the execution time can be obtained for any value of e. In this case, the main difference is the comparison between the single-step and the multistep methods, which becomes more and more favorable to the multistep routine for e → 1, as discussed previously.
Deeper insight into the performance and the convergence properties of the FSSI algorithm and its variants can be obtained by analyzing how its setup and generation times depend on the accuracy. Figure 3 shows the dependence of the "setup time," t CPU setup , on the input error level E . This is defined as the part of the CPU execution time that is dedicated to the computation of the spline coefficients and breakpoints and includes the time spent in the multistep routine. In theory, t setup can be expected to be proportional to n, which is proportional to E −1/4 . As shown in the figure, this dependence is satisfied, to a good approximation, for input accuracy below ∼10 −10 rad. The deviation from this dependence above this input error level is due to the introduction of the minimum step appearing in the actual multi-step routine, which also implies the-conservative-discrepancy between E and the actual error. For this value of e, the multistep routine still requires a larger setup time than the single-step routine. As said above and shown in Table 1, this is not the case for higher values of the eccentricity, where the multistep routine becomes more convenient. Finally, Figure 3 shows the small additional CPU-time cost of the setup of the k-vector, as compared with the setup routine that is used together with the bisection method. This additional cost in t CPU setup may be justified when the number of evaluation points N is such that the generation time is higher than the setup time. Finally, Figure 4 shows the dependence of the "generation time" per point of the output array Y, t CPU generation /N, on the input error level E . More precisely, the value given corresponds to t CPU /N for N = 10 7 points. Even with the modest hardware, we used in our numerical computations, t CPU generation /N is just a few nanoseconds per point. Remarkably, for the routines using the k-vector method or the _us enhancement for sorted arrays, t CPU generation /N turns out to be practically independent of the error level. This is because in these cases the error level is fixed by the setup phase, while the generation phase only requires a few arithmetic operations between numbers that have already been computed in the setup phase, together with a search that, for the _kv and _us methods, only involves a small number of points. In contrast, for the _bs routine, t CPU generation /N behaves similarly to the bisection method, whose execution time is proportional to log(n) [32], which is proportional to − 1 4 log E . This behavior can be observed in Figure 4 for E 10 −10 , the deviations for higher error level being due to the imposition of a maximum value of h in the multistep routine discussed in Section 3.

Discussion
When applied to just a single grid interval, n = 1, the FSSI algorithm provides a simple analytical approximation of the inverse function: whose coefficients can be computed through Equation (3); note that in this case and δy j = π; thus, This approximation can be shown to have an accuracy ∼0.06 rad for e = 0.5, but becomes completely unreliable for e 0.9. Recently, excellent approximate analytical solutions of Kepler's equation using a cubic polynomial have been proposed in the literature [37]. The FSSI algorithm, when applied to Kepler's equation, can be thought of as a generalization of such solutions, using piecewise cubic approximations in suitable sub-intervals. The FSSI can be convenient for computing the inverse of a monotonic function over an entire interval or a large set of points. If the inversion has to be repeated many times in different sessions, the FSSI method is efficient once the coefficients, the breakpoints, and possibly the k-vector are computed and stored. Then, all subsequent evaluations for obtaining values of the inverse function at each point are very fast, since they would correspond to generation times in the range of a few nanoseconds per point, even with the modest hardware. However, if the values of the inverse function are only required in a reduced number of points, point-by-point methods may be preferred since they can have lower setup times.
A common strategy for accelerating expensive computations of a nonlinear function is to use pre-computed lookup tables (LUT) at evenly distributed intervals and then use interpolation [38,39]. Similarly, the FSSI may constitute a useful tool for creating libraries for inverse functions. Similar libraries could also be created by using other methods, such as Newton-Raphson, by obtaining the inverse in a given grid of points and then extrapolating. However, the FSSI has unique advantages for this purpose, since the values of the inverse function in the n grid points are obtained nearly for free, consisting of just an interchange of x j and y j with no generated error, other than the intrinsic computational error . Moreover, the FSSI algorithm, with its choice of the breakpoints and coefficients, has been shown to provide a huge enhancement in precision, as compared to the prediction of error with general spline interpolation [27].
Although the FSSI algorithm, as described in reference [27] and here, has been designed to invert monotonic functions, such as Kepler's equation, it may be interesting to explore its possible use in more general cases. In this sense, it is interesting to note that the k-vector method has been used for the multi-valued inversion of non-monotonic functions [20] when applied in conjunction with the Newton-Raphson algorithm. It would then be natural to explore the possibility of replacing the latter with the FSSI and applying the methods of reference [20] to the combination of the FSSI and the k-vector search for inverting non-monotonic functions.

Conclusions
In contrast to point-by-point methods that solve the equation y = f (x) for each value of y, the FSSI algorithm [27] generates a piecewise polynomial interpolation of the inverse f −1 (y) of a monotonic function f (x) over an entire interval at once. Here, two significant improvements to this algorithm have been described. First, we introduced the use of a k-vector search for bracketing, followed in the last step by a binary search, for finding the interval of the FSSI spline to which an evaluation point y belongs. Second, in the particular case of Kepler's equation for orbital motion, we designed a multistep mechanism that significantly reduces the number of breakpoints needed for the interpolation of the inverse function.
For eccentricity e 0.9, the accuracy of the FSSI can be set to 10 −15 rad. In this case, a single step routine can be preferred to a multistep method since its setup time can be significantly smaller. However, for e > 0.9, the multistep mechanism becomes ever more efficient. It can solve Kepler's equation with an error of O(10 −14 − 10 −13 ) rad for most values of the eccentricity e > 0.9 and mean anomaly y. Only for e close to the limit 1 − , where is the intrinsic computational error, is there a small region, y 10 −9 rad, for which the accuracy is found to be limited to ∼2 ×10 −11 rad.
The use of a k-vector search for bracketing followed by bisection for finding the correct interval significantly reduces the generation time, especially when very high accuracy is desired. When the array of y points (i.e., that which contains the inverse function to be evaluated) is large and sorted, the generation time can be reduced considerably more. As this is an important case, we have separate search algorithm routines with this variant denoted _us.
Even with modest computer hardware, the Cython implementations of these versions of the FSSI algorithm efficiently solve Kepler's equation for a very large number of N of y values with a generation time of the order of a few nanoseconds per point. Remarkably, the generation times of the implementations of the FSSI algorithm using k-vector bracketing, or using the _us variants of the search routines, are practically independent of the error level. Moreover, in the large N regime, these generation times are smaller by more than an order of magnitude than those of point-to-point inversion methods like Newton-Raphson's or Brent's.
These results may be of interest, e.g., for the orbital modeling of exoplanets, which requires solving Kepler's equation for a large number of points [28,29].