Optimal Piecewise Polynomial Approximation for Minimum Computing Cost by Using Constrained Least Squares

In this paper, an optimal approximation algorithm is proposed to simplify nonlinear functions and/or discrete data into piecewise polynomials using constrained least squares. In time-sensitive applications or in embedded systems with limited resources, the runtime of the approximate function is as crucial as its accuracy. The proposed algorithm searches for the optimal piecewise polynomial (OPP) with the minimum computational cost while ensuring that the error remains below a specified threshold. This is accomplished by using smooth piecewise polynomials with the optimal order and number of intervals. The computational cost depends only on the polynomial complexity, i.e., the order and the number of intervals, at the runtime function call. In previous studies, the user had to choose the order and/or the number of intervals; in contrast, the OPP approximation algorithm determines both. For the optimal approximation, the computational costs for all possible combinations of piecewise polynomials are calculated and tabulated in ascending order for the specific target CPU offline. Each combination is optimized through constrained least squares and a random selection method for the given sample points. Afterward, it is examined whether the approximation error is below the predetermined value. If the error is permissible, the combination is selected as the optimal approximation; otherwise, the next combination is examined. To verify the performance, several representative functions were examined and analyzed.


1. Introduction

1.1. Importance of Approximation and Previous Studies
In many applications in the scientific and engineering fields, the approximation of a complex function or data set is important, for example, in the compression of ECG signals [1]; in various voice processing applications such as speech recognition, synthesis, and conversion [2,3]; and in the correction of sensor data [4,5]. Numerous methods have been studied for approximation, with the least squares method being the preferred choice. The least squares method offers high accuracy by minimizing the residuals and is also simple to implement in a computer program. Formulating the optimal coefficients as a least squares problem dates back to Legendre (1805) and Gauss (1809). The first application was performed by Gergonne in 1815 [6]. Therefore, in this paper, the least squares method is employed for approximation.
The approximation error at the sample points generally tends to decrease as higher-order polynomials are used, but this causes overfitting, which leads to significant oscillation between the sample points. To resolve the overfitting, it is recommended to use lower-order polynomials by splitting the domain into sections. Piecewise polynomials take less computation time than a single high-order polynomial with the same approximation error. Since the maximum number of intervals and orders depends on the number of samples and the complexity, several studies have been proposed to find the appropriate order and number of intervals. In [7-10], piecewise polynomial fitting was proposed. Additionally, ref. [7] proposed least squares piecewise fitting using a cubic polynomial, ref. [9] used a method of adjusting the boundaries of the segments to increase the operation speed, and ref. [10] used piecewise polynomial modeling to lower the error rate. However, in [8-10], there was the limitation that the order of the polynomial must be determined by the user. In [4], the whole domain was evenly divided into several intervals, and each interval was approximated by a cubic polynomial using the least squares method with the constraint of continuity at all boundaries. The studies cited in [11-15] proposed approximation methods using piecewise linear approximation (PLA), while ref. [16] proposed a method consisting of using several linear functions and then moving the endpoints of the intervals appropriately to reduce the approximation error of each interval.
There are two considerations when approximating with piecewise polynomials: the order of the polynomials and the number of intervals. Higher-order polynomials require higher computational costs and may also cause overfitting; therefore, it is necessary to determine the appropriate order of the polynomials. When using the approximated function, i.e., the piecewise function, additional steps are required to determine the subdomain corresponding to the input value. If there are many intervals, the approximation precision increases, but more time is also needed to determine the subdomain. Thus, the order and the number of polynomials should be balanced and optimized. To accomplish this, an optimization scheme is proposed, in contrast with [4-16], which utilized a predetermined polynomial order and number of polynomials.
Several studies have adopted optimization methods for curve fitting. Among them, the hp-adaptive pseudo-spectral method, proposed in [17], is the most similar to the algorithm proposed in this paper. The method determines the locations of segment breaks and the order of the polynomial in each segment. To be more specific, when the error within a segment displays a consistent pattern, the number of collocation points is increased. On the other hand, if there are isolated points with significantly higher errors than the rest of the segment, then the segment is split at those points. This produces solutions with better accuracy than global pseudo-spectral collocation while utilizing less computational time and resources. The hp-adaptive pseudo-spectral method and the optimal piecewise polynomial (OPP) approximation algorithm proposed in this paper are similar in that they increase accuracy by increasing the order and the number of intervals. The OPP approximation algorithm, however, compares computational costs to obtain the approximation with the minimum cost while satisfying the error norm constraint.

1.2. OPP Approximation Algorithm
As mentioned in the introduction, polynomial approximation methods are necessary for sensor approximation and other approximation tasks. Through a literature review of such methods, we propose the optimization of piecewise polynomials. As an improvement over previous approximation methods, the proposed OPP approximation algorithm automatically determines both the order and the number of intervals while focusing on computational efficiency, yielding the approximate polynomial with the smallest computational cost within the specified error norm.
As seen in Figure 1, the approximation is processed by finding the optimal order and number of intervals that satisfy the error norm constraint with the minimum computational cost. To achieve this, the algorithm adopts two cost functions. One is the approximation error used by the least squares method; the calculated approximation error is compared with the error norm constraint for the approximation. The other is the computational cost function. It indicates the program run time needed to calculate the value of the function for an input value, x, and is composed of two runtime costs: the cost of the polynomial function call and the cost of a binary search tree to determine the relevant interval. In other words, the computational cost is calculated given the order of the polynomial and the number of intervals. The costs for all possible combinations are calculated and sorted in ascending order offline. This table is used to search for the order and number of intervals satisfying the error constraint with the minimum cost. Therefore, the OPP approximation algorithm is useful in systems that must be computationally efficient, such as real-time systems and embedded systems. When approximating, if the approximation functions of each interval are placed independently, discontinuities occur over the entire domain. To address this, a constraint is introduced using the Lagrange multiplier method to smoothly connect the approximation functions of adjacent intervals. Applying the approximation function obtained using the OPP approximation algorithm to a system can minimize the computational cost while satisfying the error norm constraint; obtaining the approximation function before deployment reduces both computational time and memory.
This paper is organized as follows. In Section 2, previous knowledge related to the algorithm proposed in Section 3 is described. In Section 4, the algorithm is examined with representative test functions. Subsequently, the OPP approximation algorithm is discussed, and further work to improve the performance is proposed in Section 5.
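At runtime, the approximated function described above reduces to an interval lookup followed by a single polynomial evaluation, which is exactly the two costs the algorithm balances. A minimal sketch of that call path (the names and the use of `bisect` are illustrative, not the paper's implementation on the target CPU):

```python
from bisect import bisect_right

def make_piecewise_eval(boundaries, coeffs):
    """boundaries: sorted values v_0..v_n; coeffs[k]: [a_{k,0},...,a_{k,m}]
    for the k-th interval [v_k, v_{k+1})."""
    def f(x):
        # binary search for the interval index (the c_b cost in the paper)
        k = bisect_right(boundaries, x) - 1
        k = min(max(k, 0), len(coeffs) - 1)   # clamp to valid intervals
        # Horner evaluation of the polynomial (the c_p cost)
        acc = 0.0
        for a in reversed(coeffs[k]):
            acc = acc * x + a
        return acc
    return f

# two linear pieces: y = x on [0, 1) and y = 2x - 1 on [1, 2]
f = make_piecewise_eval([0.0, 1.0, 2.0], [[0.0, 1.0], [-1.0, 2.0]])
print(f(0.5))   # 0.5
print(f(1.5))   # 2.0
```

Note that the two pieces join with matching value and slope at x = 1, the kind of smoothness the constraint Gq = 0 enforces.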


2. Preliminaries

2.1. Nomenclature
The algorithm and all formulas in this study use the notation in Table 1. The symbols necessary to understand the algorithm are as follows: a_{k,i} denotes the coefficient of the i-th order term of the k-th polynomial. q_k represents the coefficients of the approximation polynomial in the k-th interval as a vector, and q is an (m + 1)n vector concatenating all q_k. c_a is the approximation cost, and c_c is the computational cost at runtime. m* and n* indicate the optimal polynomial order and number of intervals, respectively; in general, the superscript * denotes a value optimized so that c_a is within ϵ_a at the minimum c_c. Scalars, vectors, and matrices are written in lowercase, lowercase boldface, and uppercase boldface, respectively. The entries of Table 1 are:

x_{k,j}: the j-th sample in the k-th interval
y_{k,j}: the j-th function value in the k-th interval
η(k): number of samples in the k-th interval
q_k: (m + 1) × 1 vector of polynomial coefficients in the k-th interval
q: (m + 1)n × 1 vector concatenating all q_k
a_{k,i}: coefficient of the i-th order term of the k-th polynomial
v: (n + 1) × 1 vector of boundary values
c_a: average sum of error squares at the sample points
ϵ_a: error norm constraint for the approximation
p_k(x): approximation polynomial for the k-th interval

2.2. Formulation of Constrained Least Squares
The least squares method is a representative approach in regression analysis for approximating functions or discrete samples by minimizing the sum of the squares of the residuals of the individual equations. The residual is the difference between a given sample and its approximate value, and the approximation cost c_a is the average of the squared residuals over all sample points, as shown in (1):

c_a = (1/N) Σ_{k=1..n} Σ_{j=1..η(k)} ( p_k(x_{k,j}) − y_{k,j} )²,   (1)

where p_k(x) = Σ_{i=0..m} a_{k,i} x^i and N is the total number of samples. The least squares method finds the q that makes the partial derivative of the error, i.e., the gradient, equal to 0. Consequently, this leads to optimal polynomials with the minimum average residual sum of squares at the sample points. For the sake of simplicity, Equation (1) can be shortened into a single expression using the sparse block-diagonal matrix F, whose k-th block collects the monomials of the samples in the k-th interval:

e = (Fq − y)^T (Fq − y).   (2)

Each polynomial p_k should be smooth at its boundaries, v_{k−1} and v_k, where it meets the adjacent polynomials p_{k−1} and p_{k+1}, respectively. Without a loss of generality, in the proposed algorithm, 1st-order differentiability at the interior boundaries was appended as a linear constraint, Gq = 0, for the smoothness. Note that, if necessary, it is possible to increase the order of differentiability at the intersections of adjacent intervals. Consequently, Equation (2) was modified with the Lagrange multiplier, λ, as follows:

e = (Fq − y)^T (Fq − y) + λ^T G q.   (3)

By partial differentiation of (3) with respect to q and λ, two equations are obtained, i.e., ∂e/∂q = 0 and Gq = 0. By solving them, the optimal coefficients, which minimize c_a, are obtained:

q = (−P G^T (G P G^T)^{−1} G P + P) F^T y,   (4)

where P = (F^T F)^{−1}.
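As a numerical illustration, the same optimum can be obtained by solving the gradient condition ∂e/∂q = 0 together with Gq = 0 as one linear (KKT) system. A small NumPy sketch; the toy data and the intercept-pinning constraint are illustrative, not from the paper:

```python
import numpy as np

def constrained_lsq(F, y, G):
    """Minimize ||F q - y||^2 subject to G q = 0 by solving the KKT system
    [[2 F^T F, G^T], [G, 0]] [q; lam] = [2 F^T y; 0]."""
    p, c = F.shape[1], G.shape[0]
    K = np.block([[2 * F.T @ F, G.T],
                  [G, np.zeros((c, c))]])
    rhs = np.concatenate([2 * F.T @ y, np.zeros(c)])
    return np.linalg.solve(K, rhs)[:p]   # optimal coefficients q

# toy fit: line a0 + a1*x through (0,1), (1,2), (2,3),
# constrained so that a0 = 0 (G selects the first coefficient)
F = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])
G = np.array([[1.0, 0.0]])
q = constrained_lsq(F, y, G)
print(q)  # approximately [0, 1.6]
```

Solving the KKT system directly avoids forming matrix inverses explicitly and behaves better numerically when F^T F is ill-conditioned.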

3. OPP Approximation Algorithm

3.1. Overall Algorithm Flow
The proposed algorithm is intended for use in systems where computing time is critical, such as real-time systems and/or embedded systems. At runtime, it is particularly important to reduce the function call time. Since the approximate function is composed of several polynomials, it must be known how much time is required to execute the function at runtime. To accomplish this, the computational cost function was defined from the four arithmetic operations and the binary search tree; this is explained in more detail in the next subsection. Offline, based on the computational cost function, the approximation polynomial with the smallest calculation time is obtained and then used in the system.
The overall flow of the OPP approximation is described in Figure 2. First, the maximum values of the polynomial order, m_max, and the number of intervals, n_max, to explore were set. Afterward, the computational costs for all possible combinations of the order and the number of intervals were calculated and tabulated in ascending order (m ∈ [2, m_max], n ∈ [1, n_max]). Subsequently, q and c_a were calculated using the constrained least squares for the k-th pair (m, n)_k, the sample points (x, y), and the polynomial intervals v. This was repeated N times with randomly selected v, and the case with the smallest c_a was selected. If c_a was greater than ϵ_a, the next pair (m, n)_{k+1} was examined and q and c_a were calculated again. The loop was repeated until c_a was less than ϵ_a. When c_a became smaller than ϵ_a, the piecewise polynomials with that order and number of intervals were determined as the optimal approximation. If the c_a of the last pair (m, n)_max was still greater than ϵ_a, finding the piecewise polynomial approximation function was considered to have failed in the specified range. In this case, it would be possible to continue searching by using larger m_max and/or n_max values, or by relaxing ϵ_a.
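The search loop above can be sketched compactly. Here `fit` is a hypothetical helper standing in for one constrained least squares solve with a random choice of boundaries v, and the (m, n) pairs are assumed to be pre-sorted by ascending computational cost:

```python
def opp_search(pairs_by_cost, fit, eps_a, trials=100):
    """Walk (m, n) pairs in ascending computational-cost order; for each,
    keep the best of `trials` random-boundary fits and stop at the first
    pair whose approximation error c_a is within eps_a."""
    for m, n in pairs_by_cost:
        q, c_a = min((fit(m, n) for _ in range(trials)), key=lambda r: r[1])
        if c_a <= eps_a:
            return m, n, q              # cheapest acceptable combination
    return None                         # failed within (m_max, n_max)

# toy stand-in for the least squares step: pretend the error
# halves with every extra unit of order or interval count
fake_fit = lambda m, n: (None, 1.0 / 2 ** (m + n))
pairs = [(2, 1), (2, 2), (3, 1), (3, 2)]
print(opp_search(pairs, fake_fit, eps_a=0.05))  # (3, 2, None)
```

Because the pairs are visited in cost order, the first pair that passes the error test is by construction the cheapest one that does.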


3.2. Computational Cost
In this section, the computational cost is described. Since the approximated function is piecewise, composed of several polynomials, its runtime involves two computation times: the time taken for a single polynomial evaluation and the time taken for selecting the specific polynomial. Therefore, the computational cost function is defined as follows:

c_c(m, n) = c_p(m) + c_b(n).   (5)

The computing time for a single polynomial depends on the order of the polynomial. This is defined as c_p(m), as follows:

c_p(m) = n_a r_a + m (n_r r_r + n_f r_f),

where n_a, n_r, and n_f are the numbers of assignment, arithmetic, and 'for' instructions, respectively. In addition, r_a, r_r, and r_f are the cycle counts of the assignment, arithmetic, and 'for' instructions. m, the order of the polynomial, indicates the repetition count of the evaluation loop.
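A small sketch of this cost model follows. The instruction counts and cycle values are illustrative placeholders for a Horner-style evaluation loop, not the paper's measured ARM Cortex-M7 numbers:

```python
def c_p(m, n_a=1, r_a=1, n_r=2, r_r=1, n_f=1, r_f=1):
    """c_p(m) = n_a*r_a + m*(n_r*r_r + n_f*r_f): one initial assignment
    plus, per loop pass, n_r arithmetic operations and n_f 'for'
    bookkeeping steps. All counts and cycle values are placeholders."""
    return n_a * r_a + m * (n_r * r_r + n_f * r_f)

print(c_p(3))  # 1 + 3*(2 + 1) = 10
```

With real per-instruction cycle counts for the target CPU, this gives the polynomial-evaluation half of c_c(m, n).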
Since the approximate function is implemented with several polynomials, clock cycles are required for the binary search, which finds the suitable interval for a given input x. This cost is defined as c_b and is determined as the average of the short-path and long-path cases in a binary search tree. Two examples are illustrated in Figure 3.

Since the average number of clock cycles needed to determine the corresponding polynomial depends on the probability that the input lies in a certain interval, c_b is a function of the probability distribution of the input value. For the sake of simplicity, the probability that an input is in a particular interval is assumed to be the same for all intervals. Consequently, the cost associated with the number of intervals is as follows:

c_b(n) = p_s c_bs + p_l c_bl,

where c_bs = (d_m − 1)(n_a r_a + n_d r_d + n_r r_r + n_w r_w + n_i r_i), p_s = n_s/n, c_bl = d_m (n_a r_a + n_d r_d + n_r r_r + n_w r_w + n_i r_i), and p_l = n_l/n. c_bs and c_bl represent the binary search tree costs for the short and the long paths, respectively. Similarly, n_s and n_l are the numbers of short paths and long paths, respectively; note that n_s + n_l = n. d_m indicates the depth of the tree and is the integer value satisfying 2^(d_m − 1) ≤ n < 2^(d_m). In addition, n_d, n_w, and n_i are the numbers of division, 'while', and 'if' instructions, respectively. Finally, r_d, r_w, and r_i are the cycle counts of the division, 'while', and 'if' instructions for the ARM Cortex-M7 core (see [18]), respectively. The calculated c_c(m, n) is organized into a table by m and n (see Figure 4) and sorted in ascending order. In this paper, when costs are equal, priority is given to increasing m.
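Under the uniform-input assumption, the expected lookup cost can be computed from the tree depth alone. A sketch in which the per-level cycle cost n_a r_a + n_d r_d + n_r r_r + n_w r_w + n_i r_i is collapsed into a single placeholder `step`, not the paper's measured Cortex-M7 instruction mix:

```python
from math import floor, log2

def c_b(n, step=5):
    """Average binary-search cycles to pick one of n intervals, assuming
    a uniform input distribution. 'step' is a placeholder for the
    per-level instruction-cycle sum."""
    d_m = floor(log2(n)) + 1          # depth: 2**(d_m - 1) <= n < 2**d_m
    n_l = 2 * (n - 2 ** (d_m - 1))    # leaves on the long path
    n_s = n - n_l                     # leaves on the short path
    c_bs, c_bl = (d_m - 1) * step, d_m * step
    return (n_s / n) * c_bs + (n_l / n) * c_bl

print(c_b(4))  # 10.0  (all leaves at depth 2: 2 * step)
print(c_b(5))  # 12.0  ((3/5) * 10 + (2/5) * 15)
```

When n is a power of two, all leaves lie on the short path and the average collapses to (d_m − 1) steps, which matches the Figure 3 examples.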

3.3. Application of the OPP Approximation Algorithm
The approximation function determined through the OPP algorithm is applied to the system and used there. Implementing the OPP algorithm has several advantages. First, the OPP algorithm improves the calculation process by minimizing the resources required for real-time calculations. Second, it speeds up the execution time and improves the overall performance of the system. It makes resource management easier by reducing the memory and computational load of the system, which is especially important for using resources efficiently in systems with limited hardware capabilities. Third, the algorithm can perform pre-computations depending on the size and complexity of the data, allowing for efficient scaling. Thanks to this scalability, the OPP algorithm is suitable for a wide range of applications across various domains. Including the OPP approximation algorithm in the system design not only simplifies the computational process, but also improves the system's ability to manage resources efficiently and handle complex datasets.



4. Experimental Results
In order to verify the performance of the proposed OPP algorithm, it was implemented in MATLAB and tested with several nonlinear functions. In this experiment, we set the number of samples to 100, m_max = 10, n_max = 20, and ϵ_a = 0.0001. The boundary values were determined as the values giving the smallest approximation errors after 100 random changes. Figure 5 shows the result of the approximation of the logarithmic function. The logarithmic function was chosen as a test function because many sensors, such as temperature sensors and distance sensors, have logarithmic outputs. Figure 6 shows the approximation results for the sine function, which is frequently used in many applications. Figures 7 and 8 show the approximation results for the triangular and square functions, respectively. The triangular function was adopted to examine the approximation of a continuous nonlinear function with undifferentiable points. In Figure 7, the approximation is shown to be a smooth curve at the sharp point. The square function was used for curve fitting of a function with discontinuous points. Figure 8 shows that overfitting occurs in the section where the value changes rapidly, resulting in oscillation. Figure 9 shows the result of the approximation of the sigmoid function. The sigmoid function, also known as the logistic function, is commonly used in various fields such as neural networks, machine learning, and statistics. Figure 10 shows the result of the approximation of the ReLU function, one of the most commonly used activation functions in deep learning models. Figure 11 shows the result for a user-specified function, used to test a nonlinear function that is more complex than the previous functions, which are differentiable across all intervals.
Overall, the approximation of functions that are continuous and differentiable over all intervals gives satisfactory results. However, functions with continuous but undifferentiable points and functions with discontinuous points show lower accuracy and higher c_c(m, n) than smooth continuous functions. This can be addressed by increasing the number of sample points. Another method is to perturb the boundary values, which will be studied in future work.


5. Conclusions
To approximate a complex function or discrete sample points, the optimal piecewise polynomial (OPP) approximation algorithm was proposed. The maximum values of the polynomial order and the number of intervals to explore were preset, and the computational cost was calculated to determine the order and number of intervals for the approximation function. Then, the combinations of the order and the number of intervals were sorted in ascending order based on the computational cost offline. Subsequently, the coefficients of the polynomial and the approximation error at the sample points were determined using constrained least squares. If the error was greater than the given error bound, the next combination was examined until the error was less than the bound. Ultimately, the OPP approximation algorithm determined the fastest approximation function at runtime within a permissible error margin. The performance of the proposed algorithm was demonstrated through several nonlinear functions.
There is some further work needed to obtain a more accurate approximation function. In this paper, the optimal intervals were obtained by random sampling, i.e., the Monte Carlo method. To improve the performance, a gradient-based method will also be applied to determine the optimal boundaries, which will minimize the approximation error. Moreover, the performance will be verified and analyzed in experiments on actual embedded systems.

Figure 2. Overall algorithm flow for OPP approximation.

Figure 4. Computational cost according to the order of polynomial (m) and the number of intervals (n) (m_max = 10, n_max = 20).

Table 1. Meanings of symbols.