Data Interpolation by Near-Optimal Splines with Free Knots Using Linear Programming

The problem of obtaining an optimal spline with free knots is tantamount to minimizing the maximum rth-order derivative of an interpolating function over a Banach space of functions on a compact set. While the problem of data interpolation by quadratic splines has been solved, interpolation by splines of higher orders is far more challenging. In this paper, to overcome the difficulties associated with the complexity of the interpolation problem, the interval over which the data points are defined is discretized and continuous derivatives are replaced by their discrete counterparts. The l∞-norm used for the maximum rth-order curvature (a derivative of order r) is then linearized, and the problem of obtaining a near-optimal spline becomes a linear programming (LP) problem, which can be solved in polynomial time by interior-point LP methods and efficiently in practice by the Simplex method implemented in modern software such as CPLEX. It is shown that, as the mesh of the discretization approaches zero, the resulting near-optimal spline approaches an optimal spline. Splines with the desired accuracy can therefore be obtained by choosing an appropriately fine discretization mesh. Using cubic splines as an example, numerical results demonstrate that the LP formulation resulting from the discretization of the interpolation problem can be solved by linear solvers with high computational efficiency, and that the resulting spline provides a good approximation to the sought-for optimal spline.


Introduction
When an exact functional form in a model is unknown, it is often described by measurements represented by available data points and some expected assumptions about the function. Traditionally, data points were interpolated by polynomials more frequently than by spline functions. At its most basic, spline theory deals with the simple but important question of how one uses a lower-degree piecewise polynomial function, rather than a single higher-degree polynomial, to represent given data and its attendant properties. Spline theory seeks appropriate problem formulations for obtaining the spline, the existence and characterization of solutions, and the properties and computation of such functions. While polynomials have many desirable properties, polynomial interpolation suffers from Runge's phenomenon [1], a polynomial oscillation at the edges of an interval. With large data sets, the derivatives of a single polynomial at each of the data points tend to grow, thereby resulting in multiple local extrema. In particular, it has been shown that polynomials, extensively used, for example, in thermometry [2], are not capable of recovering local behavior with good quantifiable accuracy. Spline interpolation, on the other hand, tends to greatly reduce oscillation between data points due to the lower degree of the polynomials. Cubic splines, with a multitude of applications, in particular have been shown to reproduce local characteristics well [2]. In addition, computationally, the problem of finding a polynomial interpolating a large number of data points is equivalent to inverting a Vandermonde matrix with a large condition number, thereby rendering the computation numerically unstable.
Studies have shown that approximations by splines can be significantly improved if knots are allowed to be free rather than fixed at a priori locations (e.g., data points) [46]. When the knots of a spline are fixed a priori (for example, when knots coincide with data points), it is called a spline with fixed knots [47,48]. When the knots are not fixed but are determined optimally, it is called a spline with free or variable knots. Generally, as one would expect, the analysis for fixed knots tends to be simpler than for free knots [25] (p. 190). A spline that is required to pass through the data points exactly, when the data are fairly accurate, is called an interpolatory spline; otherwise, it is an approximating spline. The methods given in this section focus on optimal interpolatory splines with free knots, with cubic splines being a special case. A comparative analysis showing the advantages of free-knot splines over data-smoothing techniques has been given in [49]. Typically, the degree of smoothness is controlled by both the number and the positions of the knots.
It has been shown in [26] that there always exists an optimal spline with ‖f^(r)‖∞ = k* for each polynomial comprising the spline and (n − r − 1) knots, where n is the number of data points and r is the degree of the spline that interpolates them. It is noted that, for a given value k*, an optimal spline with ‖f^(r)‖∞ = k* is not necessarily unique, since some (but not all) of the polynomials comprising the spline that interpolates the given data points may have ‖f^(r)‖∞ less than the optimal value k*. The apparatus of the proof of the above result is complex, but a simple proof for the quadratic case r = 2 is given in [50].

General Problem Formulation
For a given set of data points {(x_i, y_i)}_{i=1}^n such that A = x_1 < . . . < x_n = B, a spline function S(x) of degree r with knots at x_1^knot, . . . , x_K^knot is a real-valued function defined on the interval [A, B] such that (i) S(x) passes through all the data points, i.e., S(x_i) = y_i; (ii) in each of the intervals [A, x_1^knot], [x_1^knot, x_2^knot], . . . , [x_K^knot, B], where K is the number of knots, S(x) is given by a polynomial of degree r or less; (iii) S(x) and its derivatives of orders 1, 2, . . . , r − 1 are continuous in the interval (A, B); and (iv) the rth derivative of S(x) can be discontinuous at a finite number of points.
Thus, a spline function of degree r is a piecewise polynomial function with (r − 1) continuous derivatives. We could also say that it is a member of C^(r−1) for which the rth derivative is a step function; in other words, it is an rth-order indefinite integral of a step function. Polynomial pieces of a spline are "spliced" appropriately at the knots, where the rth derivative jumps from one value to another (often with opposite signs), enabling the spline to turn or change its shape, while all derivatives up to order (r − 1) remain continuous, imparting smoothness to the spline. It should be noted that the x-coordinates of the given data points {(x_i, y_i)}_{i=1}^n are generally not the knot points. The problem of finding the optimal spline of degree r is

min_f ‖f^(r)‖∞    (1)

subject to f(x_i) = y_i, i = 1, . . . , n,    (2)

where {(x_i, y_i)}_{i=1}^n are given data points. The formulation (1) and (2), though simple and elegant, is unfortunately not easy to solve. It amounts to optimization over a Banach space [51] of continuous functions on a compact set [A, B], which is well known to be notoriously difficult ([31], p. 3). Within (1) and (2), the decision variable is a function f that minimizes ‖f^(r)‖∞. While the problem of data interpolation by quadratic splines has been solved [4], interpolation by splines of higher orders is far more challenging. Our key contributions to resolve this difficulty are to obtain a near-optimal solution in a computationally efficient manner, and to prove that the near-optimal splines approach optimal splines as the discretization mesh is refined. This is achieved by discretizing the entire interval into segments over which the optimization is performed, replacing continuous functions with their discrete counterparts, and converting the resulting problem into a linear programming problem that is solvable in polynomial time by interior-point methods and efficiently in practice by the Simplex method [52–54], as will be explained next.
This section presents a numerical approach for finding near-optimal splines in the maximum-norm space (typically referred to as the L∞ space), since we minimize ‖f^(r)‖∞ = max |f^(r)|. The main idea of the approach is to formulate a linear optimization problem whose solution approximates, to within a certain tolerance, the ‖f^(r)‖∞ of the optimal spline interpolating the given n data points {(x_i, y_i)}_{i=1}^n. The idea is presented using n equidistant data points (in the following development of the formulation there is no requirement of equidistance, and the data points are not restricted to be equidistant) such that A = x_1, B = x_n, and x_{i+1} − x_i = H, i = 1, . . . , n − 1. Each interval [x_i, x_{i+1}] is then subdivided into m equal subintervals of width h, giving l = (n − 1)m subintervals in total.

Linear Optimization Formulation for Near-Optimal Solution to (1) and (2)
Note that the number of points defining the l subintervals is l + 1 and that the points are denoted by x̃_j, such that the relation x̃_{(i−1)m+1} = x_i, i = 1, . . . , n, holds between the original data points x_i and the points x̃_j created by the subdivisions.
While H denotes the width of each interval [x_i, x_{i+1}] specified in the data, h denotes the width of each smaller interval [x̃_j, x̃_{j+1}] created after m subdivisions of each interval. In other words, H = mh.
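The subdivision described above can be sketched as follows (the helper name `subdivide` and the use of NumPy are illustrative assumptions, not from the paper):

```python
import numpy as np

def subdivide(x, m):
    """Split each data interval [x_i, x_{i+1}] into m equal subintervals,
    returning the (n-1)*m + 1 grid points x~_j described in the text."""
    x = np.asarray(x, dtype=float)
    pieces = [np.linspace(x[i], x[i + 1], m, endpoint=False)
              for i in range(len(x) - 1)]
    return np.concatenate(pieces + [x[-1:]])

# with n = 5 data points and m = 2 subdivisions, there are 9 grid points,
# and the original data points reappear at (0-based) indices 0, 2, 4, 6, 8
grid = subdivide([1, 2, 3, 4, 5], m=2)
```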
After the interval [A, B] is discretized, the derivatives are replaced by their discrete counterparts. Derivatives of a function are often approximated by finite differences. A discrete analog of the rth derivative of a continuous function is the rth-order difference divided by h^r. A general formula of the rth-order forward difference, based on r consecutive (sub)intervals, is [55–57]

Δ^r f(x̃_j) = Σ_{k=0}^{r} (−1)^{r−k} C(r, k) f(x̃_{j+k}),

where C(r, k) is the binomial coefficient. Central differences are known to give more accurate approximations than forward (or backward) differences: errors in forward differences are of the order h, while in central differences they are of the order h². Essentially, this is because central differences consider the values of the function at a distance h/2 both to the left and to the right of the evaluation point, rather than the entire distance h on only one side, as in forward differences, resulting in a better approximation of the derivative.
For instance, the first-order forward difference f(x̃_j + h) − f(x̃_j) divided by h can be expanded into a Taylor series in the following way:

(f(x̃_j + h) − f(x̃_j)) / h = f′(x̃_j) + (h/2) f″(x̃_j) + O(h²) = f′(x̃_j) + O(h).

Additionally, for the central difference,

(f(x̃_j + h/2) − f(x̃_j − h/2)) / h = f′(x̃_j) + (h²/24) f‴(x̃_j) + O(h⁴) = f′(x̃_j) + O(h²).

Similar arguments hold for higher-order finite differences.
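The O(h) versus O(h²) behavior above can be illustrated numerically (a sketch; the test function sin and the point x0 = 1.0 are arbitrary choices, not from the paper):

```python
import numpy as np

f, fprime, x0 = np.sin, np.cos, 1.0

def fwd_err(h):
    # forward difference: error of order O(h)
    return abs((f(x0 + h) - f(x0)) / h - fprime(x0))

def cen_err(h):
    # central difference: error of order O(h^2)
    return abs((f(x0 + h / 2) - f(x0 - h / 2)) / h - fprime(x0))

# halving h roughly halves the forward error but quarters the central error
ratio_fwd = fwd_err(0.1) / fwd_err(0.05)
ratio_cen = cen_err(0.1) / cen_err(0.05)
```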
Even though the forward difference approximates the rth derivative at x̃_j with an error of the order of O(h), it approximates the rth derivative at the midpoint of its stencil with an error of the order of O(h²), as in the discussion of central differences above. Indeed, for r = 3, if we denote ẑ_j = (x̃_{j+1} + x̃_{j+2})/2, then Δ³f(x̃_j) coincides with the third-order central difference centered at ẑ_j, and therefore the overall accuracy of the approximation is of the order of O(h²). Our approach to finding a near-optimal solution of (1) and (2) is to minimize the maximum of the rth-order differences directly:

min max_j |Δ^r f(x̃_j)| / h^r    (3)

subject to f(x̃_{(i−1)m+1}) = y_i, i = 1, . . . , n.

Note that in this discretized version (and for the rest of the paper), the values f(x̃_j) are decision variables, where x̃_j, j = 1, . . . , (n − 1)m + 1, are the endpoints of the subintervals.
Since the objective function in formulation (3) is clearly nonlinear due to the max and absolute-value operators, the problem is converted into an equivalent linear counterpart (similar to ([58], p. 28)):

min k    (4)

subject to −k ≤ Δ^r f(x̃_j) / h^r ≤ k, j = 1, . . . , (n − 1)m + 1 − r,

f(x̃_{(i−1)m+1}) = y_i, i = 1, . . . , n.

Since f(x̃_{(i−1)m+1}) = y_i are given as data, the actual decision variables to determine are the f(x̃_j) with the exception of f(x̃_{(i−1)m+1}), i = 1, . . . , n, although for more convenient implementation, f(x̃_{(i−1)m+1}) can still be treated as decision variables. Additionally, due to its linearity, (4) can be solved with much higher efficiency than (3): regardless of the problem instance, formulation (4) remains an LP problem and is solvable in polynomial time, as discussed above. The reduction of complexity is generally the goal of all linearization processes used to deal with nonlinearity from a computational perspective.

Theorem 1. The optimal value of formulation (4) approaches k*, the optimal value of formulation (1) and (2), as the number of subintervals approaches infinity (m → ∞).
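The paper solves the LP with CPLEX (see Appendix A); as an illustrative sketch, formulation (4) can also be set up with SciPy's `linprog`. The function name `near_optimal_spline` and the equidistant-data assumption below are ours:

```python
import numpy as np
from math import comb
from scipy.optimize import linprog

def near_optimal_spline(x, y, r=3, m=2):
    """Sketch of formulation (4): minimize k subject to
    |Delta^r f(x~_j)| / h^r <= k and f(x~_{(i-1)m+1}) = y_i.
    Equidistant data points are assumed for simplicity."""
    n = len(x)
    h = (x[1] - x[0]) / m
    N = (n - 1) * m + 1                  # number of grid points x~_j
    # r-th order forward-difference weights divided by h^r
    w = np.array([(-1) ** (r - i) * comb(r, i) for i in range(r + 1)]) / h ** r
    c = np.zeros(N + 1)
    c[-1] = 1.0                          # variables: f_1..f_N, k; minimize k
    A_ub, b_ub = [], []
    for j in range(N - r):               # Delta^r f / h^r <= k and >= -k
        for sign in (1.0, -1.0):
            row = np.zeros(N + 1)
            row[j:j + r + 1] = sign * w
            row[-1] = -1.0
            A_ub.append(row)
            b_ub.append(0.0)
    A_eq = np.zeros((n, N + 1))
    for i in range(n):                   # interpolation constraints
        A_eq[i, i * m] = 1.0
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)], method="highs")
    return res.x[:N], res.x[N]           # grid values f(x~_j) and optimal k

vals, k = near_optimal_spline([1, 2, 3, 4, 5], [3, 5, 4.5, 6.6, 2.6], r=3, m=2)
```

For the five data points of Example 1 below with m = 2, this sketch reproduces the reported near-optimal value k = 10.4.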
Proof. As established above, formulation (4) minimizes the maximum of the rth-order differences. As the number of subintervals approaches infinity (m → ∞), the third-order difference approaches the third derivative, so formulation (4) becomes

min max_j |f‴(x̃_j)|    (5)

subject to f(x̃_{(i−1)m+1}) = y_i, i = 1, . . . , n.

Despite having the same objective function, formulation (5) is theoretically not equivalent to the formulation of the original problem (1) and (2). Any countable (even countably infinite) set of decision variables f(x̃_j) is a set of measure zero (a set of points that can be enclosed by intervals of arbitrarily small total width), whereas the space over which problem (1) and (2) is optimized is continuous, namely, the entire interval [A, B]. In other words, formulations (1) and (2), and (5), though they have the same objective function, are optimized over different decision-variable spaces, namely, continuous and discrete, respectively.
Notwithstanding the differences between formulations (1) and (2), and (5) noted above, their respective optimal values differ by an infinitesimally small amount as the number of subintervals approaches infinity. Indeed, for an arbitrarily small ε > 0, we can always find a large enough number m of subintervals (of each interval) such that x̃_{j+1} − x̃_j < ε. Since the third derivative of f exists almost everywhere for a cubic spline by definition, we can define central differences at the midpoints of the subintervals, which approximate the third derivative there with an error of the order of O(h²). Therefore, the optimal solution to (5) differs from the optimal solution to (1) and (2) by a quantity of the order of O(h²). In other words, for any two arbitrarily close decision variables f(x̃_j) and f(x̃_{j+1}), the third derivative of f evaluated at any point between them differs from the third derivative evaluated at either of these points by a quantity that is arbitrarily small, except possibly at the points of discontinuity of f^(r), which constitute a set of measure zero since f^(r) exists almost everywhere. Therefore, the respective optimal values of formulations (5) and (1) and (2) differ by a quantity of the order of O(h²).

Numerical Testing
The algorithms described above were implemented in CPLEX and run on an Intel Core i7 Q 720 CPU at 1.60 GHz with 4.00 GB RAM (see the CPLEX code in Appendix A).

Numerical Examples
In this section, five examples are presented. Example 1 provides a step-by-step walkthrough of the algorithm. Examples 2–4 illustrate the performance of the algorithm in terms of CPU time for different test functions. Scalability as well as accuracy results are presented in Example 5. In addition, Example 5 gives computational evidence that the optimal value of formulation (4) approaches the optimal value of (1) and (2) as O(h²).

Example 1. Walk-Through Example. Consider a problem with five data points: (x_1 = 1, y_1 = 3), (x_2 = 2, y_2 = 5), (x_3 = 3, y_3 = 4.5), (x_4 = 4, y_4 = 6.6), and (x_5 = 5, y_5 = 2.6), and let r = 3. We want to find a near-optimal cubic spline while illustrating all steps of the formulation clearly and providing all basic details. With m = 2, note that some of the x̃_j, j = 1, . . . , 9, are problem data values x_i, i = 1, . . . , 5: x̃_1 = x_1, x̃_3 = x_2, x̃_5 = x_3, x̃_7 = x_4, and x̃_9 = x_5. Formulation (4) takes the following form:

min k subject to −k ≤ (−f(x̃_j) + 3f(x̃_{j+1}) − 3f(x̃_{j+2}) + f(x̃_{j+3})) / h³ ≤ k, j = 1, . . . , 6, with h = 0.5.

The values f(x̃_1), f(x̃_3), f(x̃_5), f(x̃_7), and f(x̃_9) are known to be y_1, y_2, y_3, y_4, and y_5, respectively, so the actual decision variables are only f(x̃_2), f(x̃_4), f(x̃_6), and f(x̃_8). The solution is f(x̃_2) = 4.9625, f(x̃_4) = 4.4125, f(x̃_6) = 5.6625, f(x̃_8) = 6.0125, and k = 10.4. Figures 1 and 2 present a near-optimal cubic spline and the cubic polynomials comprising the spline. The splines are represented by discrete points interpolating the above data points by using m = 10.

After being discretized, the functions f within Examples 2–4 can be represented by their values at the discretization points, which are shown in Table 1. Note that in most practical cases, the exact functional form that the cubic spline is used to approximate is not necessarily polynomial and is unknown, so the optimal k* of optimal splines interpolating given points may differ from the exact k of the given test functions (see Table 2). When the test function is polynomial, the optimal k* is very close to the exact k (see Example 5).
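As a quick sanity check of the walkthrough, the scaled third-order differences of the reported Example 1 solution can be recomputed directly (a sketch; h = 0.5 follows from m = 2 subdivisions of unit-width data intervals):

```python
# Grid values from Example 1: known data values interleaved with the
# reported near-optimal decision-variable values.
f = [3, 4.9625, 5, 4.4125, 4.5, 5.6625, 6.6, 6.0125, 2.6]
h = 0.5
# third-order forward differences divided by h^3
diffs = [(-f[j] + 3 * f[j + 1] - 3 * f[j + 2] + f[j + 3]) / h ** 3
         for j in range(len(f) - 3)]
k = max(abs(d) for d in diffs)
print([round(d, 6) for d in diffs])  # -> [10.4, 10.4, 3.2, -10.4, -10.4, -10.4]
print(round(k, 6))                   # -> 10.4, matching the reported value
```

Five of the six differences attain ±10.4 in magnitude, consistent with k = 10.4 being the binding bound in the LP.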

Scalability Results
To test the scalability of our approach, formulation (4) is applied to test problems with a varying number of subintervals. The computation times are summarized in Table 3.
As one can see in Table 3, the algorithm scales well, since the increase in the computation time is almost linear in the number of subintervals. Moreover, as the number of subintervals doubles, the accuracy increases (the gap, i.e., the relative difference between k̂ and k*, decreases) by roughly four times.

Comparison with Other Methods
Our method is compared with the methods implemented in MATLAB R2021a using the data of Example 1. With a discretization mesh of 0.01, the default spline interpolation [59] computes splines within 0.0156 s and obtains an l∞-norm (max norm) of the third derivative of 12.15, while our method computes splines within 0.071 s and obtains a norm of 10.767, which is 12.84% lower than the value obtained by MATLAB. The spline interpolation with "optimal" knots [60] places a knot at x = 3, which is one of the data points, and the l∞-norm of the third derivative is the same as that obtained with [59]. The difference is explained by the fact that the results in [60] are motivated by works such as [47,48]; therefore, optimality is with respect to knots selected from a priori given points, while within our method the positions of the knots are not restricted to be fixed, thereby leading to a lower objective function value.

Conclusions
Used in very diverse areas of application, classical data interpolation by a general spline with free knots is formulated as a linear programming problem that minimizes the spline's l∞-norm (max norm) of the derivative of order r, for reduced complexity, and the problem is efficiently solved by using linear programming solvers. Theoretical convergence to the true optimal splines is proven, indicating that the interpolatory data points obtained by solving the LP problem are arbitrarily close to the points on the true optimal splines. Testing results based on cubic interpolatory splines provide good approximations to optimal splines within a fast computation time. The main benefit of the new method is the significant reduction in the complexity of obtaining near-optimal splines with near-optimal curvature. Another advantage is that the method has the potential to be generalized to two-dimensional (and higher) cases, whereby "interpolatory surfaces" can be obtained by using 2D finite differences. Moreover, as indicated in the Introduction, since the splines are expected to recover local behavior with good quantifiable accuracy, the proposed method is suitable for quickly finding local as well as global minima of the implied underlying function. The complexity involved in finding a global optimum of the resulting spline, based on "merge sort", is subquadratic, O(n · log(n)). A limitation of this study is that the near-optimal positions of the knots cannot be efficiently obtained in online implementations, since the method only outputs points belonging to the near-optimal splines, although near-optimal positions of the knots can be estimated offline from the points where the third derivative (for cubic splines) changes its sign. Future research will aim to obtain an exact functional representation of the optimal splines as well as the exact positions of the knots.