A New Perturbation Approach to Optimal Polynomial Regression

Abstract: A new approach to polynomial regression is presented using order-of-magnitude concepts from perturbation theory. The data set is first normalized by the maximum values of the data. Polynomial regression of arbitrary order is then applied to the normalized data. Theorems on special properties of the regression coefficients, as well as criteria for determining the optimum degree of the regression polynomial, are posed and proven. The new approach is tested numerically, and the criteria for determining the best degree of the regression polynomial are discussed.


Introduction
In experimentation, one usually obtains a data set in the form $(x_i, y_i)$, $i = 1, 2, \ldots, N$. The continuous functional relation of the dependent variable $y_i$ on the independent variable $x_i$ is of practical importance, since data are needed for the missing intervals of the variables. Regression analysis is one of the most widely used methods to determine such a relationship, the most common form being linear regression. Depending on the nature of the data, nonlinear regression might be inevitable, since the data may not always be well approximated by a straight line. Nonlinear regression may be performed with any simple functional relationship or with a polynomial of degree $n$, $n \ge 2$.
In this study, the polynomial regression problem is handled with a slightly different approach from the existing analyses in the literature. First, the given data are normalized by dividing each variable by its maximum value. The polynomial regression is then applied to the normalized data. Note that this normalization does not alter the form of the polynomial representation; it only affects the magnitudes of the coefficients. Through this process, the regression coefficients acquire some important properties that are outlined in three theorems. These theorems employ the order-of-magnitude concept of perturbation theory and concern the properties of the regression coefficients. Two additional theorems concern the error introduced in the regression analysis. The theory is then applied to several data sets to outline the algorithm and determine the optimal value of the degree n in a polynomial regression.
A good review of nonlinear regression that emphasizes concepts rather than mathematical formulations is presented by Motulsky and Ransnas [1]. Anderson's procedure [2] for determining the optimum degree in a polynomial regression starts from a prescribed value of n and tests in sequence whether the coefficients are zero. If the coefficient of the highest degree is zero, a polynomial of one degree lower is taken, until a non-zero coefficient for the highest degree term is obtained. In the present analysis, the requirement of a zero highest degree coefficient is somewhat relaxed for the normalized data; if the coefficient of the highest degree is small, i.e., of O(ε), ε << 1, then a lower degree polynomial can be selected. For some theoretical results on the topic and optimal designs for Anderson's procedure, see Dette [3] and Dette and Studden [4]. Nonparametric regression techniques were utilized by Jayasuri [5] to test the validity of a kth-order polynomial regression model. Tomašević et al. [6] used Chebyshev polynomials to approximate the polynomial obtained by the regression analysis with a lower degree polynomial of the required accuracy, while keeping the polynomial coefficients within the estimated intervals.

Regression Analysis
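Assume that a set of data $(x_i, y_i)$, $i = 1, 2, \ldots, N$, is given or obtained from experimentation and that the data can be approximated by an nth degree polynomial:

$$y^{(n)}(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \qquad (1)$$

The total difference, obtained by squaring all the differences between the data and the approximation, is

$$\sum_{i=1}^{N} \left( y_i - y^{(n)}(x_i) \right)^2 \qquad (2)$$

The aim is to find the coefficients that make the above total difference a minimum; hence,

$$\frac{\partial}{\partial a_j} \sum_{i=1}^{N} \left( y_i - y^{(n)}(x_i) \right)^2 = 0, \qquad j = 0, 1, \ldots, n \qquad (3)$$

which is called the method of least squares [7]. The above set of equations leads to a matrix equation

$$X(n)\, A = Y \qquad (4)$$

where

$$X(n) = \begin{bmatrix} N & \sum x_i & \cdots & \sum x_i^{n} \\ \sum x_i & \sum x_i^{2} & \cdots & \sum x_i^{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_i^{n} & \sum x_i^{n+1} & \cdots & \sum x_i^{2n} \end{bmatrix}, \quad A = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}, \quad Y = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^{n} y_i \end{bmatrix} \qquad (5)$$

The optimum coefficients are then

$$A = X^{-1}(n)\, Y \qquad (6)$$

The standard regression error is defined to be [7]

$$S_{y/x} = \sqrt{\frac{\sum_{i=1}^{N} \left( y_i - y^{(n)}(x_i) \right)^2}{N - (n+1)}} \qquad (7)$$

where $N - (n + 1)$ is called the degree of freedom.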
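The least-squares procedure described here can be sketched in a few lines of Python. This is only an illustrative sketch, assuming NumPy is available; the function name `fit_polynomial` and its return values are hypothetical and do not come from the paper. It forms the square matrix X(n) and the vector Y from powers of the data, solves for the coefficients, and evaluates the standard regression error with N − (n + 1) degrees of freedom.

```python
import numpy as np

def fit_polynomial(x, y, n):
    """Least-squares polynomial fit of degree n via the normal equations.

    Hypothetical helper for illustration. Returns the coefficients ordered
    a_0, a_1, ..., a_n, the standard regression error, and det X(n).
    Requires more data points than coefficients, i.e. N > n + 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns 1, x, x^2, ..., x^n evaluated at every data point.
    V = np.vander(x, n + 1, increasing=True)
    X = V.T @ V          # entries are sums of x_i^(j+k)
    Y = V.T @ y          # entries are sums of x_i^j * y_i
    a = np.linalg.solve(X, Y)
    residuals = y - V @ a
    dof = len(x) - (n + 1)
    s_yx = np.sqrt(np.sum(residuals**2) / dof)
    return a, s_yx, np.linalg.det(X)
```

Solving the linear system directly, rather than forming the inverse explicitly, is a common numerical choice; the determinant is returned only because it is one of the indicators tabulated in the numerical section.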

New Theorems
In perturbation theory, one deals with dimensionless quantities rather than dimensional ones to enable meaningful comparisons of the magnitudes of terms. The data set obtained from experimentation may have any unit, and the simplest way to obtain dimensionless quantities is to divide the data by the maximum value in the set

$$x_i \to \frac{x_i}{x_{\max}}, \qquad y_i \to \frac{y_i}{y_{\max}} \qquad (8)$$

where the rescaled $x_i$, $y_i$ are called the normalized data. If one assumes that the measured quantities are all positive, the normalized data set is confined to a square region in the first quadrant $(0 \le x_i \le 1,\ 0 \le y_i \le 1)$.
The regression analysis will be carried out for the normalized data. The normalization is crucial, and many interesting features of the coefficients can be derived with it.
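A minimal sketch of the normalization step follows, assuming NumPy and positive data; the helper names `normalize` and `denormalize_coefficients` are hypothetical. The second helper relies only on the algebraic fact that substituting x/x_max and y/y_max into a polynomial rescales the kth coefficient by y_max / x_max^k, which is why normalization changes the magnitudes of the coefficients but not the form of the polynomial.

```python
import numpy as np

def normalize(x, y):
    """Divide each variable by its maximum value (data assumed positive)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_max, y_max = x.max(), y.max()
    return x / x_max, y / y_max, x_max, y_max

def denormalize_coefficients(a, x_max, y_max):
    """Map coefficients a_0..a_n of the normalized fit back to raw units.

    If y/y_max = sum_k a_k (x/x_max)^k, then y = sum_k (a_k * y_max / x_max^k) x^k.
    """
    a = np.asarray(a, dtype=float)
    k = np.arange(len(a))
    return a * y_max / x_max**k
```

For example, raw data following y = 2x with x_max = 4 and y_max = 8 normalize to the line y = x, and mapping the normalized coefficients (0, 1) back gives (0, 2).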

Theorem 1
For the polynomial regression of degree n for the normalized data, if all $a_i \ge 0$, then $a_i = O(\varepsilon^{k_i})$, $k_i \ge 0$ $(\varepsilon \ll 1)$.

Proof
The theorem states that no coefficient can have an order of magnitude greater than 1 if all coefficients turn out to be positive. Since the data are confined in a square, for $x = 1$, $y^{(n)} \le O(1)$. Hence, from (9)

$$a_n + a_{n-1} + \cdots + a_1 + a_0 \le O(1) \qquad (10)$$

or, replacing the terms by their magnitudes,

$$\varepsilon^{k_n} + \varepsilon^{k_{n-1}} + \cdots + \varepsilon^{k_1} + \varepsilon^{k_0} \le O(1) \qquad (11)$$

If at least one of the $k_i$ is less than zero, there is an unbalanced large term of $O(1/\varepsilon^{-k_i})$, $-k_i > 0$, which violates the inequality. Hence, if all coefficients are positive, the coefficients can be at most O(1).

Theorem 2
For the polynomial regression of degree n for the normalized data, if there are large coefficients, i.e., $a_i = O(\varepsilon^{k_i})$, $k_i < 0$ $(\varepsilon \ll 1)$, then the coefficients cannot all have the same sign, and other large term(s) $a_m = O(\varepsilon^{k_m})$, $k_m < 0$, should appear with opposite signs.

Proof
For $x = 1$, $y^{(n)} \le O(1)$. Hence, from (12)

$$a_n + a_{n-1} + \cdots + a_1 + a_0 \le O(1)$$

If there is a large term with $k_i$ less than zero, this term must be balanced by other large term(s) $a_m = O(\varepsilon^{k_m})$, $k_m < 0$, with opposite signs so that the sum adds up to at most an O(1) term.

Theorem 3
For the polynomial regression of degree n for the normalized data, the sum of the regression coefficients is bounded such that

$$a_n + a_{n-1} + \cdots + a_1 + a_0 \le 1$$

with the equality sign holding for the specific set of data where $x_N = x_{\max}$, $y_N = y_{\max}$.

Proof
For $x = 1$, due to normalization, $y \le O(1)$. Using the normalized regression equation and substituting $x = 1$,

$$y(1) = a_n + a_{n-1} + \cdots + a_1 + a_0$$

If $y_{\max}$ corresponds to $x_{\max}$, then $y(1) \cong 1$ due to the approximation of the regression; hence, the equality sign holds for (15).

The previous three theorems all concern features of the regression coefficients. They can be used to check the accuracy of the computed coefficients. The following theorem is about the standard regression error for the normalized data.
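The coefficient checks suggested by Theorems 1 to 3 can be collected in a small diagnostic routine. The following is a sketch under stated assumptions: the function name `coefficient_diagnostics` and the threshold `large=10.0` used to flag a "large" coefficient are hypothetical choices for illustration, since the theorems are stated in terms of orders of magnitude rather than a fixed cutoff.

```python
import numpy as np

def coefficient_diagnostics(a, large=10.0):
    """Summarize regression coefficients of a fit to normalized data.

    `large` is an assumed, illustrative cutoff for an order of magnitude
    greater than 1; it is not a value given in the paper.
    """
    a = np.asarray(a, dtype=float)
    return {
        "sum": float(a.sum()),                       # compare with 1 (Theorem 3)
        "all_nonnegative": bool(np.all(a >= 0)),     # premise of Theorem 1
        "has_large": bool(np.any(np.abs(a) > large)),
        "mixed_signs": bool(np.any(a > 0) and np.any(a < 0)),  # Theorem 2
    }
```

If `has_large` is true, Theorem 2 implies that `mixed_signs` should be true as well for a correctly computed fit of normalized data.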

Theorem 4
For a good polynomial regression of the normalized data, the standard regression error is bounded by

$$S_{y/x} \le O(\varepsilon) \quad (\varepsilon \ll 1) \qquad (16)$$

Proof
If the representation is good, the curve passes close enough to each data point; since the data are within a square box of side length 1, the distance between each datum and the curve can only be a small fraction of 1. Hence,

$$\left| y_i - y^{(n)}(x_i) \right| \le O(\varepsilon) \quad (\varepsilon \ll 1) \qquad (17)$$

Substituting the above into (7) yields

$$S_{y/x} \le \sqrt{\frac{N}{N - (n+1)}}\; O(\varepsilon) \qquad (18)$$

For a good representation, the number of data points N should be much larger than the number of coefficients $n + 1$; hence, $\frac{N}{N - (n+1)} = O(1)$. Substituting this order of magnitude into (18) yields

$$S_{y/x} \le O(\varepsilon) \qquad (19)$$

Theorem 4 can be used as a rough check of the appropriateness of the polynomial regression. As $\varepsilon \to 0$, the fit becomes better.
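As a rough worked illustration of the factor appearing in this proof, take N = 20 data points and a quadratic fit, n = 2. These values are assumptions chosen only for the example and are not taken from the tables. With each residual bounded in magnitude by ε, the standard regression error satisfies:

```latex
S_{y/x}
  = \sqrt{\frac{\sum_{i=1}^{N}\left(y_i - y^{(n)}(x_i)\right)^2}{N-(n+1)}}
  \le \sqrt{\frac{N\,\varepsilon^{2}}{N-(n+1)}}
  = \sqrt{\frac{20}{17}}\;\varepsilon
  \approx 1.08\,\varepsilon
```

The prefactor stays close to 1 as long as N is much larger than n + 1, which is the sense in which it is treated as O(1).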
The last theorem gives a criterion for using a polynomial with degree n instead of n + 1.

Theorem 5
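Assume that the normalized data are expressed with a polynomial regression of degree n:

$$y^{(n)}(x) = b_n x^n + b_{n-1} x^{n-1} + \cdots + b_1 x + b_0 \qquad (20)$$

with $b_n \sim O(1)$. For the polynomial regression of degree n + 1 of the same data, if $a_{n+1} \sim O(\varepsilon)$ $(\varepsilon \ll 1)$, then the average approximation error for using the degree n polynomial instead of the degree n + 1 polynomial is bounded by

$$e(n, n+1) = \frac{1}{N} \sum_{i=1}^{N} \left| y^{(n+1)}(x_i) - y^{(n)}(x_i) \right| \le O(\varepsilon) \qquad (21)$$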

Proof
Since $x_i = O(1)$ due to normalization, $y^{(n)}(x_i) \cong \sum_{j=0}^{n} b_j = O(1)$ via Theorem 3. Likewise, $y^{(n+1)}(x_i) \cong \sum_{j=0}^{n+1} a_j = O(1)$, also via Theorem 3. Since $a_{n+1} = O(\varepsilon)$, $\sum_{j=0}^{n} a_j \sim O(1)$. Substituting these orders of magnitude into (21),

$$e(n, n+1) \le \frac{1}{N}\, N\, O(\varepsilon) \le O(\varepsilon)$$

Theorem 5 leads to a useful criterion for determining the degree of the regression polynomial. If the leading coefficient in a regression polynomial of degree n is of order 1 and, on increasing the degree to n + 1, the new leading coefficient becomes much smaller than 1, there is no point in using the more complicated regression formula. Hence, one may stop at degree n.
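The comparison behind Theorem 5 can be tried numerically with a short sketch. It assumes NumPy and data that have already been normalized; the function name `compare_degrees` is hypothetical. It fits polynomials of degree n and n + 1 with `numpy.polyfit` (which returns the highest degree coefficient first) and reports the leading coefficient of the higher degree fit together with the average absolute difference between the two fits at the data points.

```python
import numpy as np

def compare_degrees(x, y, n):
    """Return (a_{n+1}, average |y^(n+1)(x_i) - y^(n)(x_i)|) for normalized data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lower = np.polyfit(x, y, n)        # degree n fit
    higher = np.polyfit(x, y, n + 1)   # degree n + 1 fit, leading term first
    avg_diff = np.mean(np.abs(np.polyval(higher, x) - np.polyval(lower, x)))
    return higher[0], avg_diff
```

When the returned leading coefficient is much smaller than 1, the average difference is expected to be small as well, which is the situation in which the lower degree suffices.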

Numerical Applications
Applications of the theory are illustrated by numerical simulations. In this work, the study is confined to polynomial-type regression analysis. Usually, the data obtained from an experiment do not give precise information about the degree of the polynomial or about whether a polynomial representation is suitable at all. To outline our algorithm, carefully selected data of polynomial and functional types are examined. The data are somewhat distorted so that an exact match between the data points and the regression curves is not possible, in accordance with real cases. Applications of the theorems are indicated for each data set.

In Table 1, an artificial data set distorted from a linear relationship is chosen. Here, n represents the degree of the polynomial. For each data set, the determinant of X(n) (see Equation (5)), the regression coefficients, the sum of the coefficients, the standard regression errors, and the difference between the (n − 1)th degree and nth degree regression errors are given. For n = 1, since the data are suitable for a linear relationship, all indicators are good, i.e., the determinant is not too close to singular, the coefficients are all positive and not larger than O(1) (Theorem 1), the highest order coefficient is of order 1 (Theorem 5), the sum of the coefficients is close to 1 (Theorem 3), and the standard error is small (Theorem 4). When one tries to fit a parabolic curve to the same data set, the coefficient of the highest order term becomes small ($a_2 = -0.0628$), an obvious indication that the data set does not need a polynomial of degree 2 (Theorem 5). The standard error increases slightly, which is another indicator that the parabolic curve is unnecessary. The difference of the standard errors between the two steps is negative, which indicates that one should stop at this stage and use the regression of the previous step (n = 1).

Table 2 is produced from a distorted set of parabolic data. When n = 2, the coefficient of the highest order term is not small, the sum of the coefficients is approximately 1, and the standard error is reduced to a much smaller number compared to the previous case of n = 1 (Theorem 4). However, when a cubic polynomial is fitted to the data, the coefficient of the highest order term is small compared to 1 ($a_3 = 0.1414$) (Theorem 5), the sum of the coefficients starts deviating from 1 (Theorem 3), the standard error increases compared to the previous step, the difference of the standard errors becomes negative, and the determinant is much closer to singular, all indicating that n = 3 is not a suitable choice and one should stop at n = 2. Note that, if the data were exactly parabolic, $a_3$ would be 0. Hence, the criterion should not be to require the highest degree coefficient to be zero; instead, if it appears to be small, there is no point in using a more complex representation. Higher degree representations may lower the standard error but may lead to a phenomenon called polynomial wiggling, which casts doubt on the accuracy of the values estimated between the data points.

Table 3 is produced from distorted data of a cubic polynomial. At n = 4, the determinant becomes very close to singular, the highest degree coefficient appears to be small ($a_4 = 0.1626$) (Theorem 5), the sum of the coefficients starts deviating from 1 (Theorem 3), the standard regression error is higher than in the previous case, and the difference turns out to be negative, all indicating that n = 3 is the best choice.

In Table 4, functional-type data are considered. The data are a distorted form of a logarithmic relationship. Given the previous criteria, the ideal representation of the data is a cubic polynomial because the determinant is too close to singular for n = 4, the sum of the coefficients starts deviating from 1, and the difference of the standard errors becomes negative. Note that, since the original data are not of a polynomial form, the highest order coefficient at n = 4 is not small, but, at this stage, one has larger coefficients of opposite signs (Theorem 2), which is an indicator that one should stop and take the n value of the previous stage.

Finally, the distorted data of a function with fractional power are considered in Table 5. Starting at n = 4, the determinant is very close to singular, there are large coefficients of opposite signs (Theorem 2), the sum starts deviating from 1, and, although the difference of the errors does not become negative, the gain is very marginal. Therefore, n = 2 or n = 3 may be used to express the given data.

Concluding Remarks
For normalized data, new theorems are posed and proven for the polynomial regression coefficients and errors. The concept of orders of magnitude from perturbation theory is utilized. These theorems turn out to be useful in determining the optimum value of the highest degree in a polynomial representation. The steps of the algorithm are given below:
(1) Start from a linear relationship with n = 1 and increase n by one at each step;
(2) Calculate the determinant, coefficients, sum of coefficients, standard regression error, and difference of errors (for n ≥ 2);
(3) Check the singularity of the determinant, the highest degree coefficient, the remaining coefficients, the sum of the coefficients, the standard regression error, and the difference of errors;
(4) Stop at n if the highest order coefficient turns out to be small, and/or the difference of errors is negative, and use n − 1 as an ideal representation; and
(5) Stop at n if the determinant is too singular, there are large coefficients of opposite signs, the sum of coefficients starts deviating from the ideal value, or the standard error is small. One may use n − 1 as an ideal representation.
If the data are suitable for a polynomial representation, then indicators at stage 4 are observed and the decision of the optimal n value is clearer.When the functional dependence is not suitable for a polynomial representation, one criterion, or both criteria, of stage 4 need not be observed, and the fainter criterion of stage 5 may be employed.
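The stage 4 criteria of the algorithm can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the cap `max_degree=6`, and the cutoff `small=0.2` for a "small" leading coefficient are all hypothetical, and the fainter stage 5 indicators (determinant, large coefficients of opposite signs, sum of coefficients) are deliberately left out.

```python
import numpy as np

def standard_error(x, y, coeffs):
    """Standard regression error with N - (n + 1) degrees of freedom."""
    residuals = y - np.polyval(coeffs, x)
    dof = len(x) - len(coeffs)
    return np.sqrt(np.sum(residuals**2) / dof)

def select_degree(x, y, max_degree=6, small=0.2):
    """Pick a polynomial degree using only the stage 4 stopping criteria.

    `small` and `max_degree` are assumed, illustrative parameters.
    """
    # Normalize by the maximum values (data assumed positive).
    x = np.asarray(x, dtype=float) / np.max(x)
    y = np.asarray(y, dtype=float) / np.max(y)
    degree = 1
    prev_error = standard_error(x, y, np.polyfit(x, y, 1))
    for n in range(2, max_degree + 1):
        if len(x) <= n + 1:          # need N > n + 1 data points
            break
        coeffs = np.polyfit(x, y, n)  # highest degree coefficient first
        error = standard_error(x, y, coeffs)
        # Stop at n and keep n - 1 if the leading coefficient is small
        # or the standard error did not decrease.
        if abs(coeffs[0]) < small or prev_error - error < 0:
            return degree
        degree, prev_error = n, error
    return degree
```

For exactly parabolic data such as y = x² sampled on (0, 1], this sketch returns 2, because the cubic fit has a leading coefficient that is numerically zero.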
For the normalized data, this work suggests a simple and alternative way of determining optimal n values in contrast to the standard statistical approaches.

Table 1. Polynomial regression for normalized data close to a linear function.

Table 2. Polynomial regression for normalized data close to a parabolic function.

Table 3. Polynomial regression for normalized data close to a cubic function.

Table 4. Polynomial regression for normalized data close to a logarithmic function.

Table 5. Polynomial regression for normalized data close to a fractional power function.