Data Interpolation by Optimal Splines with Free Knots Using Linear Programming

Studies have shown that in many practical applications, data interpolation by splines leads to better approximation and higher computational efficiency as compared to data interpolation by a single polynomial. Data interpolation by splines can be significantly improved if knots are allowed to be free rather than at a priori fixed locations such as data points. In practical applications, the smallest possible curvature is often desired. Therefore, optimal splines are determined by minimizing a derivative of the required order of the continuously differentiable functions comprising the spline. The problem of obtaining an optimal spline is tantamount to minimizing derivatives of a nonlinear differentiable function over a Banach space on a compact set. While the problem of data interpolation by quadratic splines has been solved analytically, interpolation by splines of higher orders or in higher dimensions is challenging. In this paper, to overcome difficulties associated with the complexity of the interpolation problem, the interval over which the data points are defined is discretized and continuous derivatives are substituted by their discrete counterparts. It is shown that as the mesh of the discretization approaches zero, the resulting near-optimal spline approaches an optimal spline. Splines with the desired accuracy can be obtained by choosing an appropriate mesh of the discretization. Using cubic splines as an example, numerical results demonstrate that the linear programming (LP) formulation, resulting from the discretization of the interpolation problem, can be solved by linear solvers with high computational efficiency, and the resulting splines provide a good approximation to the optimal splines.

When an exact functional form in a model is unknown, it is often described by measurements represented by data points. Traditionally, data points were interpolated by polynomials more frequently than by spline functions. While polynomials have many desirable properties, polynomial interpolation suffers from Runge's phenomenon [69], a polynomial oscillation at the edges of an interval. With large data sets, the derivatives of a single polynomial at each of the data points tend to grow, thereby producing multiple local extrema. In particular, it has been shown that polynomials, extensively used in thermometry [70], are not capable of recovering local behavior with good quantifiable accuracy. Spline interpolation, on the other hand, tends to greatly reduce oscillation between data points, and (cubic) splines have been shown to be able to reproduce local characteristics well [70]. In addition, the problem of finding a polynomial interpolating a large number of data points is equivalent to inverting a Vandermonde matrix with a large condition number, thereby rendering the interpolation problem intractable, inefficient and unstable in computer implementations. On the other hand, the great flexibility of splines, resulting from the use of several lower degree polynomials smoothly connected at optimally chosen points (usually referred to as "knots"), effectively resolves the major problems mentioned above. This is why splines have turned out to be so useful in a wide variety of applications.
The aim of this paper is two-fold: 1) to give a perspective on applications of spline functions in Mathematical Programming, Dynamic Programming, Statistics, Optimal Control Theory, and Computer Graphics in Section 2; and 2) to formulate a linear optimization program that finds near-optimal curvature of splines with free knots interpolating a given set of data points efficiently and with the desired level of accuracy in Section 3. Section 4 provides numerical results, demonstrating the computational efficiency of the method developed. Section 5 gives brief concluding remarks.

Literature Review
Spline functions were introduced as early as the nineteenth century by Lobachevsky [71, p.165]. In the mid-1940s, B-splines were investigated by Schoenberg [72]. Since then the subject has grown vigorously and many excellent texts are available on the subject [37,73-78]. A good introduction is given in [74]. Early computational algorithms were developed and are available in [75]. A comprehensive theory can be found in [76], and a briefer treatment in [37]. Reference [38] finds [77] "one of the most useful such volumes." A general survey can be found in the overviews given in these texts. Moreover, the selection of topics largely depends on the application area of the reader's interest.
In a basic way, spline theory deals with a simple but important question of how one uses a lower degree piecewise polynomial function rather than a single larger degree polynomial to represent given data and attendant properties. Spline theory attempts to obtain appropriate problem formulations, existence and characterization of solutions, and properties and computation of such functions.
Representation of the function by lower degree polynomials, efficient computation, good approximation of the original function between the given data points being interpolated, and continuous or essentially bounded derivatives (often the second and third derivatives) are some of the important requirements in spline applications. For example, the second derivative is used as a measure of nonlinearity of models, such as in convex separable programming [28,29] and nonlinear statistical estimation problems [79]. Boundedness of the third derivative in interpolation by cubic splines is natural, since one can always find a function interpolating the given points with arbitrarily high curvature. Therefore, to estimate the most favorable possible error in function approximation, one would like to determine the smallest curvature possible.
One of the first books on approximation by least squares cubic splines was published in 1968 by de Boor and Rice [80,81]. Studies have shown that approximations by splines can be significantly improved if knots are allowed to be free rather than at a priori fixed locations (e.g., data points). When the knots of a spline are fixed a priori (for example, when knots coincide with data points) we call it a spline with fixed knots. When the knots are not fixed but are determined optimally, it is a spline with free or variable knots. Generally, as one would expect, the analysis for fixed knots tends to be simpler than for free knots [37] (p. 190). A spline that is required to pass through the data points exactly (when data are fairly accurate) is called an interpolatory spline; otherwise it is an approximating spline. The methods given in this section focus on optimal interpolatory cubic splines with free knots. A comparative analysis showing advantages of free-knot splines over data-smoothing techniques has been given in [82]. Typically the degree of smoothness is controlled by both the number and the position of the knots. In [83] the knots are estimated by solving a non-linear least-squares optimization problem.
It has been shown in [39] that there always exists an optimal spline with ||f^(r)||_∞ = k* for each polynomial comprising the spline and (n − r − 1) knots, where n is the number of data points and r is the degree of the spline that interpolates them. It is noted that for a given value k*, an optimal spline with ||f^(r)||_∞ = k* is not necessarily unique (since some of the polynomials comprising the spline (but not all) that interpolate the given data points may have ||f^(r)||_∞ less than the optimal value k*).
Supported by substantial mathematical theory and effective computational methods, splines have a very wide spectrum of general applications. Applications of splines range from modeling insect natality and DNA coils in biology [84,85], pattern recognition of handwritten Japanese in linguistics [86], study of geomagnetism in physics [87], analysis of liquid alloys in chemistry [88], characterization of vibrations in acoustics [89], to solving spectroscopic orbits in astronomy [90].
Among many of these, cubic splines have also come in common use in multiple disciplines such as robotics and statistics. Cubic splines are used in the problem of coordination and planning of the robotic motion [91]. The trajectories for robot motion are frequently described by smooth cubic splines [92]. Cubic splines are used to characterize the dependency of the Markov model parameters on the conditional parameters [93]. Many filtering techniques require the use of splines. Splines are used to extract curves to model approximation to the shapes of segmented images that result from smoothing the lines of an object in the presence of its shape uncertainty [94]. In digital signal processing, splines are used to generate new sequences with a higher sampling rate (up-sampling) [95]. Numerical shape-preserving quadratic splines were developed in [96].

Spline Applications
Spline applications include areas of mathematical programming, dynamic programming, engineering, statistics, and optimal control theory that will be discussed below in detail.

Dynamic Programming Applications
Dynamic programming has had multiple important applications in the area of sequential decision making. Its ability to handle nonlinearities and the stochastic nature of the functional relations describing the model empowers it with great versatility. However, equally well-known is the "curse of dimensionality." As the dimension of the state space and the number of decision stages increase, the computational burden grows enormously and becomes computationally prohibitive very quickly. In such a case, to use the dynamic programming methodology, approximation of the space by discretization at certain grid points is often necessary. Clearly, the greater the number of grid points, the better the approximation and the larger the computational effort. We briefly describe here a stochastic, continuous state space water reservoir model in water/power resource management, where the use of cubic splines instead of a more common piecewise linearization leads to rather impressive computational performance.
Such models have been used for real world problems in the Shasta/Trinity system in Northern California, Brazil's large hydropower system, and Egypt's High Aswan Dam [10,11]. For an important and well discussed 4-reservoir, 3-period water release problem [12][13][14], the computational time to obtain an optimal decision was reduced very significantly. For this problem, grid points varying in number from 3 to 17 were used to discretize the values of functions representing the starting volume of water, the inflow of water, the amount of water released at time t, and related variables. Rather than the usual piecewise linear interpolation, cubic splines with continuous second derivatives were used for this purpose.
Since (1) cubic polynomials allowed for better approximation, and (2) the continuity of the second derivative made possible the use of Quasi-Newton methods in the solution of the subproblems, very significant computational savings have been achieved. For a 1% error, the spline dynamic programming algorithm reduced the computational time by a factor of 255. For a 0.5% error the reduction factor was even greater: 330 [15]. For practical reservoir problems with more than 2-3 dimensions, while linear interpolation needs prohibitive amounts of computation time for reasonable accuracy, the use of cubic splines for 5-dimensional problems led to significant computational improvement and an overall advancement in the area [15]. The more traditional approaches for containing a sprawling dynamic program include the use of principal component analysis, sampling, duality, Benders decomposition, substitution of deterministic equivalents for stochastic components, and partition of the original problem into separable components [15][16][17][18][19][20].
There are many other problems that are modeled with continuous state spaces, including manufacturing process control, management of forests, fisheries, crop pest control, and portfolio selection under fluctuating interest rates [8,9,[21][22][23][24][25][26][27]. Though realistic parameters would be different in these applications, the use of splines offers a promising strategy to estimate them.

Mathematical Programming Applications (Estimating the objective function and solution error for piecewise linear approximations)
Piecewise linear functions often appear in optimization largely because of their increased mathematical tractability, and "one of the most useful applications of piecewise linear representation is for approximating nonlinear functions" [33, p. 382]. Since approximations obviously affect the computed quantities of interest (e.g., the optimal objective value), how do we measure the quality of this approximation quantitatively, beyond the common assertion that 'the greater the number of linear pieces, the better the approximation', and how do we determine the errors in the values of interest? Optimal splines help find a bound on this error. The minimal required curvature value k* of ||f''||_∞ of a smooth function taking specified values at the given points, found by computing an optimal quadratic spline, provides a bound on the error in such approximations. For example, if f̂ is the piecewise linear function resulting from joining the adjacent points {(x_i, y_i)}_{i=1}^{r+2} obtained, say, from an experiment [52, p. 171], or from a function via its values at the x_i's, then k* gives a bound on the function error max_{a≤x≤b} |f(x) − f̂(x)| [34]: for subintervals of width at most δ, this error is bounded by k*δ²/8. Since we have determined that k* ≤ ||f''||_∞ for any admissible function f, one specific use of such a solution k* is in the computation of error bounds on the optimal objective value and the optimal solution (vector) of convex separable programs. Similarly, considering a convex separable program Z with a nonempty feasible space and its piecewise linear approximation Z_1 with a subdivision interval δ, the bounds on the maximum deviation between the optimum objective values Z*_1 and Z* can be calculated by using quantities D_j and E_ij, which are computed from the solutions k* of the quadratic splines discussed above. Similarly, a method to compute bounds on each component of the optimal solution vector X* of Z using k* is given in [35].
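As an illustrative sketch of this use of k*, the following code compares the standard piecewise-linear interpolation error bound, max |f − f̂| ≤ k*δ²/8 with k* ≥ ||f''||_∞, against the actual error for the test function f(x) = x² (for which ||f''||_∞ = 2); function names are ours, not from [34] or [35]:

```python
import numpy as np

def pw_linear_error_bound(x, k_star):
    """Bound on max |f(x) - fhat(x)| for the piecewise linear interpolant
    fhat, given k_star >= ||f''||_inf: error <= k_star * delta**2 / 8,
    where delta is the largest subdivision interval."""
    delta = np.max(np.diff(x))
    return k_star * delta**2 / 8

# Test function f(x) = x**2, so ||f''||_inf = 2 (and k* = 2 is admissible).
x = np.linspace(0.0, 1.0, 5)          # breakpoints, delta = 0.25
f = lambda t: t**2
xs = np.linspace(0.0, 1.0, 10001)     # dense evaluation grid
fhat = np.interp(xs, x, f(x))         # piecewise linear interpolant
actual = np.max(np.abs(f(xs) - fhat))
bound = pw_linear_error_bound(x, k_star=2.0)
print(actual, bound)                  # actual error never exceeds the bound
```

For a quadratic test function the bound is tight: the worst error occurs at the midpoints of the subdivision intervals and equals the bound exactly.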
These bounds, in turn, have also been used to develop more efficient algorithms for convex separable programming problems [28] and [29], and non-convex programming problems [36].
As discussed above, separable programming solutions are inexact since they correspond only to piecewise-linear approximations and not the original functions. Estimation of the errors in the optimal quantities is therefore important in practice. This is even more so because, for large problems without any special structure, the technique may often be the only nonlinear optimization technique available with large commercial linear programming systems; the systems with which practitioners solving large, real-world problems have been familiar for a long time. Estimation of k * , obtained by optimal quadratic splines, will also allow such systems to give estimates of the errors in the solutions produced.

Applications in Statistics (Density estimation and nonparametric regression models)
The great flexibility of splines, which interpolate data with lower degree polynomials and thereby avoid both the computational burden and the large oscillations between data points that result from interpolation by higher degree polynomials, is invaluable in statistical models. Since data representation and its analysis is one of the core issues in statistics, it is not surprising that there are applications of splines starting from the basic problem of smoothing a histogram. For pioneering work on this subject, an interested reader is referred to the age distribution of Hungarian mothers in 1963 and ears of the Iowa variety of corn [44][45][46], and various problems relating to the estimation of density functions and higher moments of distributions [39,47]. For a comprehensive survey of uses of splines in general statistics, an interested reader is referred to [38], and in approximate regression models to [48].
The text [47] discusses splines in nonparametric regression and their "intriguing connection" with Bayesian estimation in a very general framework. Reference [49] provides a very readable yet elementary and concise discussion of fitting spline functions with fixed knots by standard regression methods, e.g., using SPSS package computations. While in parametric regression we need very specific quantitative information, generally based on theory or past experience, the flexibility of spline functions in nonparametric regression lets the "data speak for itself" [47]. In addition, such regression allows one to examine the validity of an assumed parametric form of the regression function, and in turn, a spline function obtained may suggest an appropriate functional form for a parametric analysis. Regression problems in statistics under order restrictions are discussed in [121]. For other models that require splines refer to [37,50].

Applications to Optimal Control Problems (Optimal timing policies)
Many optimization problems requiring an optimal function rather than only an optimal point in R^n for their solution are best handled within the framework of control theory [58][59][60][61][62]. An excellent introduction to splines in control theory intended to be read by 'ordinary mortals' (p. vi) is [50]. The problems analyzed in the applications area include several problems in inventory and production [63,64], construction project scheduling [59], timing of acceleration to achieve takeoff with minimum energy consumption [52], optimal extraction timing of natural resources such as timber or oil reserves [65], system or user optimization in dynamic network traffic assignment problems [66], and dynamic scheduling and routing for flexible manufacturing systems [67]. In certain optimal control situations the optimal function must be a spline. For example, consider a time-optimal control problem solved in [68]. A particle moves along the y-axis according to y = F(t), and the velocities of different orders are given by F^(ν)(t), ν = 0, 1, ..., n − 1, which are absolutely continuous. If we are given a bound on the n-th derivative, |F^(n)(t)| ≤ A for fixed A, and the initial and final states of the particle (the position and the velocities of all orders at the start and end of the motion) are specified, then the problem is to find the shortest time T_0 to complete the motion of the particle from 0 to the height l on the y-axis and to determine the optimal function F(t) that achieves this minimal T_0. The optimal solution F(t) is given by a perfect spline (that is, the n-th derivative F^(n) changes sign at the knots but has the same absolute value |F^(n)|) with minimum ||F^(n)||_∞. The above applications give a perspective on applications of splines. The general definition of optimal splines of r-th order and the corresponding problem formulation to determine optimal splines will be provided in the following section.
Perhaps the most intuitively clear need for smoothness appears in free-form curves and surfaces, the fundamental tools in computer-aided design, modeling, and graphics. Splines generate free-form curves and surfaces and are used in the design of airplanes [100,101], automobiles [102,103], hulls of ships and tankers [104,105], and bottles [106].
Here splines seem to fall almost as a natural requirement. The continuity of specified derivatives in splines can provide the desirable amount of smoothness. In CAD/CAM and computer graphics [107][108][109][110], a recurring basic problem is to connect points specified by the user on the screen. When we need to join them "smoothly" by a curve, straight lines are not sufficient and splines are often used.
For example, smoothness would be required for the access ramps in highway systems, for conveyors in material flow systems, for minimizing resistance in aerodynamic or hydraulic systems, or even for visual esthetics in display or plotting systems. An extensive graphic system SAS/GRAPH [111] provides a (cubic) spline option for displaying plots of given or computed data. Smooth spirals have been used in highway design for many decades [112,113], where smoothness and controlling the curvature is of concern since a vehicle's turning ability is limited and sharp turns are undesirable. Cubic splines have been used for such applications [113]. Applications of splines in this area have grown very rapidly due to the great progress made in computer technology during past decades. Several survey articles in this area are [114][115][116]. Book-length expositions of splines on computer-aided geometric design include [101,110,117]. Briefer discussions of splines in computer graphics are in [118,119] and with more emphasis on computer-aided manufacturing in [120].

General Problem Formulation
For a given set of data points {x_i, y_i}_{i=1}^n such that A = x_1 < ... < x_n = B, a spline function S(x) of degree r with knots at x^knot_1, ..., x^knot_K is a real valued function defined on the interval [A, B] such that: (i) S(x) passes through all the data points, i.e., S(x_i) = y_i; (ii) in each of the intervals [A, x^knot_1], [x^knot_1, x^knot_2], ..., [x^knot_K, B], where K is the number of knots, S(x) is given by a polynomial of degree r or less; (iii) S(x) and its derivatives of orders 1, 2, ..., r − 1 are continuous in the interval (A, B); (iv) the r-th derivative of S(x) can be discontinuous at a finite number of points.
Thus, a spline function of degree r is a piecewise polynomial function with (r − 1) continuous derivatives. We could also say that it is a member of C^{r−1} whose r-th derivative is a step function; in other words, it is an r-th order indefinite integral of a step function. Polynomial pieces of a spline are 'spliced' appropriately at the knots, where the r-th derivative jumps from one value to another (often with opposite signs), enabling the spline to turn or change its shape, while all derivatives up to the (r − 1)-th order remain continuous, imparting smoothness to the spline. It should be noted that the x-coordinates of the given data points {x_i, y_i}_{i=1}^n are generally not the knot points.
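These smoothness conditions can be checked numerically. The following sketch uses SciPy's CubicSpline (a fixed-knot cubic spline whose knots coincide with the data points, used here only to illustrate conditions (i)-(iv) for r = 3); the data values are arbitrary:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Interpolate 5 arbitrary points; CubicSpline places knots at the data points.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 0.0, 2.0, 1.0])
s = CubicSpline(x, y)

eps = 1e-6
knot = x[2]
# First and second derivatives are continuous across the interior knot...
d1_jump = abs(s(knot + eps, 1) - s(knot - eps, 1))
d2_jump = abs(s(knot + eps, 2) - s(knot - eps, 2))
# ...while the third derivative is a step function and may jump at a knot.
d3_left, d3_right = s(knot - eps, 3), s(knot + eps, 3)
print(d1_jump, d2_jump, d3_left, d3_right)
```

The one-sided third-derivative values d3_left and d3_right are the constant values of the step function on the two adjacent polynomial pieces.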
The problem of finding the optimal spline of r-th degree is

minimize ||f^(r)||_∞ (1)
subject to f(x_i) = y_i, i = 1, ..., n, (2)

where {x_i, y_i}_{i=1}^n are given data points. The formulation (1)-(2), though simple and elegant, is unfortunately not easy to solve: it amounts to optimization over a Banach space of continuous functions on a compact set [A, B], which is well known to be notoriously difficult [51, p. 3]. In problem (1)-(2), the decision variable is a function f that minimizes ||f^(r)||_∞. In order to mitigate the computational burden, the entire interval is discretized into segments over which the optimization can be done with much less computational effort, while the solution obtained still satisfies an acceptable accuracy criterion.

Linear Optimization Formulation of the Problem (1)-(2)
This section presents a numerical approach for finding optimal splines in the maximum norm space (typically referred to as the L_∞ space), since we are minimizing ||f^(r)||_∞ = max |f^(r)|. The main idea of the approach is to formulate a linear optimization problem whose solution approximates, with a certain tolerance, the value of ||f^(r)||_∞ of the optimal spline interpolating the given n data points {x_i, y_i}_{i=1}^n. All n data points are assumed to be equidistant, such that A = x_1, B = x_n and x_{i+1} − x_i = H, i = 1, ..., n − 1. In order to discretize the problem, we choose to divide each interval [x_i, x_{i+1}], i = 1, ..., n − 1 into m subintervals of equal width. Thus we partition the interval [A, B] into l evenly spaced subintervals, where l = (n − 1)m. Note that the number of points defining the l subintervals is l + 1, and the points are denoted by {x̂_j}_{j=1}^{(n−1)m+1=l+1}, such that the relation x̂_{(i−1)m+1} = x_i, i = 1, ..., n holds between the original data points x_i and the points x̂_j created by the subdivisions.
It should also be pointed out that the general formula for x̂_j is x̂_j = A + (j − 1)h, so that x̂_1 = A and x̂_{l+1} = B. Also, while H denotes the width of each interval [x_i, x_{i+1}] specified in the data, h denotes the width of each smaller interval [x̂_j, x̂_{j+1}] created after m subdivisions of each interval. In other words, H = mh.
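The subdivision scheme above can be sketched as follows (the function name is ours; equidistant data are assumed, as in the text):

```python
import numpy as np

def refine_grid(x, m):
    """Split each data interval [x_i, x_{i+1}] into m equal subintervals,
    returning the l + 1 = (n - 1) * m + 1 points xhat_j = A + (j - 1) h."""
    n = len(x)
    A, B = x[0], x[-1]
    H = (B - A) / (n - 1)      # equidistant data assumed: H = x_{i+1} - x_i
    h = H / m                  # refined spacing, H = m h
    l = (n - 1) * m            # number of subintervals
    return A + h * np.arange(l + 1)

x = np.linspace(2.0, 6.0, 5)   # n = 5 equidistant data points, H = 1
xhat = refine_grid(x, m=4)     # l + 1 = 17 refined points, h = 0.25
# Original data points sit at every m-th refined point:
# xhat_{(i-1)m+1} = x_i (1-based), i.e. index (i - 1) * m in 0-based Python.
print(len(xhat), xhat[::4])
```

The slice `xhat[::4]` recovers exactly the original data points, which is the relation x̂_{(i−1)m+1} = x_i.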
After the interval [A, B] is discretized, the derivatives are replaced by their discrete counterparts. Derivatives of a function are often approximated by finite differences. A discrete analog of the r-th derivative of a continuous function is the r-th order difference divided by h^r. The general formula for the r-th order forward difference is based on r consecutive (sub)intervals and is represented as follows [122][123][124]:

Δ^r f(x̂_j) = Σ_{k=0}^{r} (−1)^{r−k} C(r, k) f(x̂_{j+k}),

where C(r, k) denotes the binomial coefficient. Central differences provide more accurate approximations as compared to forward (or backward) differences: the error in forward differences is of the order h, while in central differences it is of the order h². Essentially, this is because central differences consider the values of the function at a distance h/2 away both to the left and to the right of the point, rather than the entire distance h on only one side of the point as in forward differences, resulting in a better approximation of the function derivative.
Briefly, the first order difference divided by h can be expanded into a Taylor series in the following way:

[f(x̂_{j+1}) − f(x̂_j)]/h = f'(x̂_j) + (h/2) f''(x̂_j) + O(h²) = f'(x̂_j) + O(h).

Similar arguments hold for higher order finite differences.
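The O(h) and O(h²) error orders of forward versus central differences can be verified numerically. A quick sketch (f = sin is an arbitrary smooth test function; the observed order is estimated from successive halvings of h):

```python
import numpy as np

f = np.sin   # smooth test function with known derivative f'(x) = cos(x)
x0 = 1.0

def orders(hs, approx, exact):
    """Observed convergence order p from errors at successively halved h."""
    errs = np.array([abs(approx(h) - exact) for h in hs])
    return np.log(errs[:-1] / errs[1:]) / np.log(2.0)

hs = np.array([1e-2, 5e-3, 2.5e-3, 1.25e-3])
fwd = lambda h: (f(x0 + h) - f(x0)) / h                # forward: error O(h)
cen = lambda h: (f(x0 + h / 2) - f(x0 - h / 2)) / h    # central: error O(h^2)
print(orders(hs, fwd, np.cos(x0)))   # observed order close to 1
print(orders(hs, cen, np.cos(x0)))   # observed order close to 2
```

Halving h roughly halves the forward-difference error but quarters the central-difference error, matching the orders stated above.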
Even though the forward difference approximates the third derivative at x̂_j with an error of the order of O(h), it approximates the third derivative at the midpoint (x̂_{j+1} + x̂_{j+2})/2 with an error of the order of O(h²), as in the discussion of central differences above. Indeed, if we denote ẑ_j = (x̂_{j+1} + x̂_{j+2})/2, then

[f(x̂_{j+3}) − 3f(x̂_{j+2}) + 3f(x̂_{j+1}) − f(x̂_j)]/h³ = f'''(ẑ_j) + O(h²),

which is the third-order central difference centered at ẑ_j, and therefore the overall accuracy of the approximation is of the order of O(h²). A straightforward approach to find a near-optimal solution of (1)-(2) is to minimize the maximum of the r-th order differences directly:

minimize max_{j=1,...,l−r+1} |Δ^r f(x̂_j)| / h^r
subject to f(x̂_{(i−1)m+1}) = y_i, i = 1, ..., n. (3)

Note that in this discretized version (and for the rest of the paper), the values f(x̂_j) are decision variables, where x̂_j, j = 1, ..., (n − 1)m + 1 are the endpoints of the subintervals.
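The claim that the third-order forward difference is second-order accurate at the midpoint ẑ_j can also be checked numerically (a sketch with f = sin as an arbitrary test function, so f''' = −cos):

```python
import numpy as np

f = np.sin
d3 = lambda t: -np.cos(t)   # third derivative of sin
x0 = 1.0

def third_diff_err(h):
    """Error of the forward third difference based at x0, measured
    against f''' at the midpoint z = x0 + 1.5 h of the stencil."""
    approx = (f(x0 + 3*h) - 3*f(x0 + 2*h) + 3*f(x0 + h) - f(x0)) / h**3
    z = x0 + 1.5 * h        # midpoint (xhat_{j+1} + xhat_{j+2}) / 2
    return abs(approx - d3(z))

hs = [1e-1, 5e-2, 2.5e-2]
errs = [third_diff_err(h) for h in hs]
p = np.log(errs[0] / errs[1]) / np.log(2.0)
print(errs, p)   # errors shrink roughly by 4x per halving: order ~ 2
```

Measured at x̂_j instead of the midpoint, the same stencil would only show first-order convergence, which is exactly the distinction made in the text.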
Since the objective function in the formulation (3) is clearly non-linear due to the max and absolute value operators, in order to deal with this nonlinearity, the problem is converted into an equivalent linear counterpart (similar to [52, p. 28]):

minimize t
subject to −t ≤ Δ^r f(x̂_j)/h^r ≤ t, j = 1, ..., l − r + 1,
f(x̂_{(i−1)m+1}) = y_i, i = 1, ..., n. (4)

Since the values f(x̂_{(i−1)m+1}) = y_i are given as data, the actual decision variables to determine are the f(x̂_j) with the exception of f(x̂_{(i−1)m+1}), i = 1, ..., n, although for more convenient implementation, the f(x̂_{(i−1)m+1}) can still be treated as decision variables. Also, due to linearity, (4) can be solved with much higher efficiency than (3). This is generally the goal of all linearization processes used to deal with non-linearity from a computational perspective. Theorem 3.1. The optimal value of the formulation (4) approaches k*, the optimal value of the formulation (1)-(2), as the number of subintervals approaches infinity (m → ∞).
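A minimal sketch of formulation (4) for r = 3 (cubic splines), using SciPy's linprog as the linear solver; function and variable names are ours, and equidistant data are assumed as in the text. The test data are sampled from x³, whose third derivative is identically 6, so the near-optimal k* should be close to 6:

```python
import numpy as np
from scipy.optimize import linprog

def near_optimal_cubic_spline(x, y, m):
    """LP (4) for r = 3: minimize t subject to
    -t <= (f_{j+3} - 3 f_{j+2} + 3 f_{j+1} - f_j) / h^3 <= t
    for all j, with f fixed to y_i at the original data points."""
    n = len(x)
    H = x[1] - x[0]                      # equidistant data assumed
    h = H / m
    l = (n - 1) * m                      # number of subintervals
    N = l + 1                            # grid values f_1 .. f_{l+1}
    nd = N - 3                           # number of third differences

    c = np.zeros(N + 1)                  # variables: f_1..f_{l+1}, then t
    c[-1] = 1.0                          # minimize t

    D = np.zeros((nd, N + 1))
    for j in range(nd):
        D[j, j:j + 4] = np.array([-1.0, 3.0, -3.0, 1.0]) / h**3
        D[j, -1] = -1.0                  # row encodes  Df - t <= 0
    A_ub = np.vstack([D, -D])
    A_ub[nd:, -1] = -1.0                 # lower rows encode -Df - t <= 0
    b_ub = np.zeros(2 * nd)

    A_eq = np.zeros((n, N + 1))
    for i in range(n):
        A_eq[i, i * m] = 1.0             # f(xhat_{(i-1)m+1}) = y_i
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)])
    return res.x[:-1], res.x[-1]         # grid values and k* estimate

x = np.linspace(0.0, 4.0, 5)             # 5 equidistant data points
grid_vals, k_star = near_optimal_cubic_spline(x, x**3, m=8)
print(round(k_star, 4))
```

Because every third difference of x³ on an equidistant grid equals 6h³, the LP's optimal t recovers the true minimal curvature value for this data set.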
Proof. As established above, the formulation (4) is equivalent to (3). As the number of subintervals approaches infinity (m → ∞), the third-order difference approaches the third derivative, so the formulation (4) becomes

minimize max_j |f'''(x̂_j)|
subject to f(x_i) = y_i, i = 1, ..., n. (5)

Despite having the same objective function, the formulation (5) is theoretically not equivalent to the formulation of the original problem (1)-(2). Any countable (even countably infinite) set of decision variables f(x̂_j) is a set of measure zero (a set of points that can be enclosed by intervals of arbitrarily small total width), whereas the space over which formulation (1)-(2) is optimized is continuous, namely, the entire interval [A, B]. In other words, formulations (5) and (1)-(2), though they have the same objective function, are optimized over different decision variable spaces, namely, discrete and continuous, respectively.
Notwithstanding the differences between formulations (5) and (1)-(2) noted above, their respective optimal values differ by an infinitesimally small amount as the number of subintervals approaches infinity. Indeed, for an arbitrarily small ε > 0, we can always find a large enough number m of subintervals (of each interval) such that x̂_{j+1} − x̂_j < ε. Since the third derivative of f exists almost everywhere for a cubic spline by definition, we can approximate it at the midpoint of each subinterval [x̂_j, x̂_{j+1}] by the central difference Δ³f(x̂_{j−1})/h³, and therefore the optimal solution to (5) differs from the optimal solution of (1)-(2) by a quantity of the order of O(h²). In other words, for any two arbitrarily close decision variables f(x̂_j) and f(x̂_{j+1}), the third derivative of f evaluated at any point between them differs from the third derivative evaluated at either of these points by a quantity that is arbitrarily small, except possibly at the points of discontinuity of f^(r), which constitute a set of measure zero since f^(r) exists almost everywhere. Therefore, the respective optimal values of formulations (5) and (1)-(2) differ by a quantity of the order of O(h²).

Numerical Examples
In this case the formulation (4) takes a form in which f(x̂_1), f(x̂_3), f(x̂_5), f(x̂_7) and f(x̂_9) are known to be y_1, y_2, y_3, y_4 = 6.6 and y_5 = 2.6 respectively, so the actual decision variables are only f(x̂_2), f(x̂_4), f(x̂_6) and f(x̂_8). [Table of data points for the test functions (x_i, y_i values for the three test functions at x = 9, ..., 12) not reproduced here.] Note that in most practical cases the exact functional form which the cubic spline is used to approximate is not necessarily polynomial and is unknown, so the optimal k* of optimal splines interpolating given points may differ from the exact k of given test functions. When the test function is polynomial, the optimal k* is very close to the exact k (see Example 4.5).

Scalability Results
To test the scalability of our approach, formulation (4) is used on test problems with a varying number of subintervals. The computation times are summarized in Table 3. The data points, including (5, 124.75), were chosen in such a way that the optimal spline has the known value of the third derivative equal to 6. As one can see in Table 3, the algorithm scales well, since the increase in computation time is almost linear in the number of subintervals.

Conclusions
Classical data interpolation by a general spline with free knots is formulated as a linear programming problem that minimizes the spline curvature, and the problem is efficiently solved by using linear programming solvers. Theoretical convergence to the true optimal splines is proved, indicating that the interpolatory data points obtained by solving the LP problem are arbitrarily close to the points on the true optimal splines. Testing results based on cubic interpolatory splines provide good approximations to optimal splines within a fast computation time. A future research direction is to obtain an exact functional representation of the optimal splines as well as the exact positions of the knots.