Kernel-Free Quadratic Surface Support Vector Regression with Non-Negative Constraints

In this paper, a kernel-free quadratic surface support vector regression with non-negative constraints (NQSSVR) is proposed for the regression problem. The task of NQSSVR is to find a quadratic function as the regression function. By utilizing the quadratic surface kernel-free technique, the model avoids the difficulty of choosing a kernel function and the corresponding parameters, and it is interpretable to a certain extent. In practice, data may carry a priori information that the value of the response variable increases as an explanatory variable grows on a non-negative interval. To ensure that the regression function is monotonically increasing on the non-negative interval, non-negative constraints on the regression coefficients are introduced when constructing the optimization problem of NQSSVR, and the theoretical analysis proves that the resulting regression function matches this a priori information. In addition, the existence and uniqueness of the solutions to the primal and dual problems of NQSSVR, and the relationship between them, are addressed. Experimental results on two artificial datasets and seven benchmark datasets validate the feasibility and effectiveness of our approach. Finally, the effectiveness of our method is verified on real air quality examples.


Introduction
For regression problems, there is sometimes a priori information, such as the response variable increasing as an explanatory variable increases. For example, it is natural to expect that air quality will decrease when the concentration of pollutant gases increases. However, a model sometimes produces regression coefficients that do not match this a priori information, which can reduce its credibility and prediction accuracy. To address this problem, we restrict the range of the regression coefficients to ensure the soundness of the model. Several types of constraints have been utilized for this purpose, including non-negative constraints [1][2][3][4], monotonicity constraints [5][6][7][8], and smoothing constraints [9][10][11]. Powell et al. [12] proposed a Bayesian hierarchical model for estimating constrained conditional random fields to analyze the relationship between air pollution and health. Non-negative constraints in particular have been applied to a variety of problems. The non-negative least squares (NNLS) problem was introduced by Lawson [13]. Chen et al. [14] presented non-negative distributed regression as an effective method for analyzing data in wireless sensor networks. Shekkizhar et al. [15,16] proposed non-negative kernel regression to handle graph construction from data and dictionary learning. Additionally, Chapel et al. [17] proposed non-negative penalized linear regression to address the challenge of unbalanced optimal transport. Owing to its excellent generalization ability, support vector regression (SVR) [18] has been widely used in various fields, such as the financial industry [19,20] and construction.
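For context, the NNLS problem [13] mentioned above minimizes ‖Aw − y‖ subject to w ≥ 0. A minimal sketch, with hypothetical data and dimensions chosen only for illustration:

```python
import numpy as np
from scipy.optimize import nnls

# Toy illustration of non-negative least squares (NNLS):
# minimize ||A w - y||_2 subject to w >= 0.
rng = np.random.default_rng(0)
A = rng.random((50, 3))
w_true = np.array([1.5, 0.0, 2.0])           # non-negative ground truth
y = A @ w_true + 0.01 * rng.standard_normal(50)

w_hat, residual = nnls(A, y)
print(w_hat)   # every entry is >= 0 by construction of the solver
```

Note that the returned coefficients respect the sign constraint exactly, which is the behavior the non-negative constraints in NQSSVR are meant to generalize.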
The main contributions of this paper are summarized as follows:

1. NQSSVR is proposed by utilizing the kernel-free technique, which avoids the complexity of choosing kernel functions and their parameters, and is interpretable to some extent. The task of NQSSVR is to find a quadratic regression function to fit the data, so it can achieve better generalization ability than linear regression methods.

2. Non-negative constraints on the regression coefficients are added when constructing the optimization problem of NQSSVR, which yields a regression function that is monotonically increasing in the explanatory variables on a non-negative interval. In some cases, the value of the response variable grows as an explanatory variable grows; for example, in air quality applications, the air quality index increases as the concentration of gases in the air increases.

3. Both the primal and dual problems can be solved directly, since our method does not involve kernel functions. In the theoretical analysis, the existence and uniqueness of solutions to the primal and dual problems, as well as their interconnections, are analyzed. In addition, the properties of the regression function on its domain of definition are given.

4. Numerical experiments on artificial datasets visualize the regression functions obtained by our NQSSVR. The results on benchmark datasets show that the comprehensive performance of the method is better than that of lin-SVR and NNSVR. More importantly, a practical air quality application shows that our method is more applicable than QLSSVR and ε-SQSSVR.
The paper is structured as follows. Section 2 gives a brief introduction to the ε-SQSSVR model, together with some definitions and notations. In Section 3, we construct the primal and dual problems for NQSSVR and analyze the corresponding properties. Section 4 presents the results of numerical experiments. Finally, Section 5 provides conclusions from this study.

Background
In this section, we give the related definitions and notations, and review the ε-SQSSVR model.

Definition and Notations
The following mathematical notations are used in this paper. Lowercase bold and uppercase bold letters represent vectors and matrices, respectively. I is the identity matrix of appropriate size, S^m is the set of m × m symmetric matrices, and R^{n×m} is the set of n × m matrices. Next, the operators are defined as follows.
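Since Definitions 1–3 are not reproduced here, the following sketch only illustrates the kind of half-vectorization operator (called lvec below; the exact definition in the paper is assumed) that maps the lower-triangular entries of a symmetric W to a vector, so that a quadratic form can be rewritten as a linear form in that vector:

```python
import numpy as np

def lvec(W):
    """Half-vectorization sketch (assumed form): stack the lower-triangular
    entries of a symmetric matrix W into a vector, so that
    (1/2) x^T W x + b^T x can be rewritten as a linear form s^T z."""
    idx = np.tril_indices(W.shape[0])
    return W[idx]

W = np.array([[1.0, 2.0],
              [2.0, 3.0]])   # symmetric 2x2 matrix
print(lvec(W))               # [1. 2. 3.]
```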

ε-SQSSVR
Given the training set

$$T = \{(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)\}, \quad (4)$$

where $x_i \in R^m$, $y_i \in R$, $i = 1, 2, \cdots, n$, the task of ε-SQSSVR is to seek the quadratic regression function

$$f(x) = \frac{1}{2} x^T W x + b^T x + c, \quad (5)$$

where $W \in S^m$, $b \in R^m$, and $c \in R$. To obtain the regression function (5), the optimization problem is established as follows:

$$\min_{W, b, c, \xi^{(*)}} \ \frac{1}{2}\left(\|W\|_F^2 + \|b\|^2\right) + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad (6)$$
$$\text{s.t.} \ \ \frac{1}{2} x_i^T W x_i + b^T x_i + c - y_i \le \varepsilon + \xi_i, \quad i = 1, \cdots, n, \quad (7)$$
$$y_i - \frac{1}{2} x_i^T W x_i - b^T x_i - c \le \varepsilon + \xi_i^*, \quad i = 1, \cdots, n, \quad (8)$$
$$\xi_i, \xi_i^* \ge 0, \quad i = 1, \cdots, n, \quad (9)$$

where $C > 0$ is a penalty parameter, $\varepsilon > 0$ is the insensitivity parameter, and $\xi^{(*)} = (\xi_1, \xi_1^*, \cdots, \xi_n, \xi_n^*)^T$ is the slack vector. The optimization problem (6)–(9) is a quadratic programming problem, so it can be solved directly. In addition, this model uses the quadratic surface kernel-free technique, which avoids the difficulty of choosing a kernel function and the corresponding parameters.

Kernel-Free QSSVR with Non-Negative Constraints (NQSSVR)
In this section, we establish the primal and dual problems of the kernel-free QSSVR with non-negative constraints (NQSSVR). The properties of the primal and dual problems are discussed, and the properties of the regression function under the non-negative constraints are proved.

Primal Problem
Given the training set T (4), to find the regression function (5), the following optimization problem is formulated:

$$\min_{W, b, c, \xi^{(*)}} \ \frac{1}{2}\left(\|W\|_F^2 + \|b\|^2\right) + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad (10)$$
$$\text{s.t.} \ \ \frac{1}{2} x_i^T W x_i + b^T x_i + c - y_i \le \varepsilon + \xi_i, \quad i = 1, \cdots, n, \quad (11)$$
$$y_i - \frac{1}{2} x_i^T W x_i - b^T x_i - c \le \varepsilon + \xi_i^*, \quad i = 1, \cdots, n, \quad (12)$$
$$\xi_i, \xi_i^* \ge 0, \quad i = 1, \cdots, n, \quad (13)$$
$$w_{kl} \ge 0, \ b_k \ge 0, \quad k, l = 1, \cdots, m, \quad (14)$$

where $W = (w_{kl})_{m \times m} \in S^m$, $b = (b_1, \cdots, b_m)^T \in R^m$, and $c \in R$. The conditions $w_{kl} \ge 0$, $b_k \ge 0$, $k, l = 1, \cdots, m$ mean that each component of W and b is greater than or equal to zero. $C > 0$ is the penalty parameter, and $\xi^{(*)} = (\xi_1, \xi_1^*, \cdots, \xi_n, \xi_n^*)^T$ is a slack vector. In the optimization problem (10)–(14), we impose constraints on the regression coefficients, namely $w_{kl} \ge 0$, $b_k \ge 0$, $k, l = 1, \cdots, m$. Restricting the range of the regression coefficients helps us obtain regression functions that are more consistent with the a priori information. In addition, the optimization problem does not involve kernel functions, which avoids the complicated process of selecting kernel functions and their parameters and further reduces computation time.
According to Definitions 1–3, let $z$ collect the entries of W and b, and let $s_i$ be the corresponding vector built from $x_i$ via lvec. Then the primal optimization problem (10)–(14) is simplified to the following form:

$$\min_{z, c, \xi^{(*)}} \ \frac{1}{2} z^T G z + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad (15)$$
$$\text{s.t.} \ \ s_i^T z + c - y_i \le \varepsilon + \xi_i, \quad i = 1, \cdots, n, \quad (16)$$
$$y_i - s_i^T z - c \le \varepsilon + \xi_i^*, \quad i = 1, \cdots, n, \quad (17)$$
$$\xi_i, \xi_i^* \ge 0, \quad i = 1, \cdots, n, \quad (18)$$
$$z \ge 0, \quad (19)$$

where $c \in R$, and $z \ge 0$ means that each component of z is greater than or equal to zero. The matrix G is positive semidefinite.
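As an illustration, the primal problem above can be handed to a general-purpose solver. The following sketch is a hypothetical one-dimensional example (m = 1) solved with SciPy's SLSQP rather than the paper's MATLAB implementation; it encodes the ε-insensitive constraints and the non-negative bounds on w and b:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the NQSSVR primal (10)-(14) for one feature:
# f(x) = 0.5*w*x^2 + b*x + c with w >= 0, b >= 0.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, 20)
y = 0.5 * 1.0 * x**2 + 0.8 * x + 0.3 + 0.02 * rng.standard_normal(20)

C, eps, n = 10.0, 0.05, len(x)

def f(x, w, b, c):
    return 0.5 * w * x**2 + b * x + c

def objective(v):
    # v = [w, b, c, xi_1..xi_n, xi*_1..xi*_n]
    w, b = v[0], v[1]
    xi, xis = v[3:3 + n], v[3 + n:]
    return 0.5 * (w**2 + b**2) + C * np.sum(xi + xis)

cons = [
    # f(x_i) - y_i <= eps + xi_i, written as g(v) >= 0
    {"type": "ineq",
     "fun": lambda v: eps + v[3:3 + n] - (f(x, v[0], v[1], v[2]) - y)},
    # y_i - f(x_i) <= eps + xi*_i
    {"type": "ineq",
     "fun": lambda v: eps + v[3 + n:] - (y - f(x, v[0], v[1], v[2]))},
]
# Bounds encode the non-negative constraints w >= 0, b >= 0 and xi >= 0.
bounds = [(0, None), (0, None), (None, None)] + [(0, None)] * (2 * n)

res = minimize(objective, np.zeros(3 + 2 * n), bounds=bounds,
               constraints=cons, method="SLSQP")
w_hat, b_hat, c_hat = res.x[:3]
print(w_hat, b_hat, c_hat)   # w_hat and b_hat are >= 0 by the bounds
```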

Some Theoretical Analysis
In this subsection, the theoretical properties of the primal and dual problems, as well as of the regression function after adding the non-negative constraints, are analyzed.

Theorem 1. Given the training set T (4) and C > 0, if G is a positive definite matrix and $(z^*, c^*, \xi^*, \xi^{**})$ is an optimal solution to the primal problem (15)–(19), then $z^*$ is unique.

Theorem 2. For the training set T (4) and C > 0, if the matrix G is positive definite, the optimal solution $\alpha^{(*)} = (\alpha_1, \alpha_1^*, \cdots, \alpha_n, \alpha_n^*)^T$ of the dual problem (32)–(35) exists and is unique, and the optimal solution of the primal problem (15)–(19) can be expressed in terms of it.

Proof. By Equation (21) in the KKT conditions, we obtain $z^*$. If $\alpha^{(*)}$ has components $\alpha_j$ and $\alpha_k^*$ satisfying the corresponding conditions, then $c^*$ can be recovered from the associated constraints.

Next, the properties of the regression function (5) after adding non-negative constraints on the regression coefficients are analyzed. The domain D is defined as the non-negative orthant, $D = \{x \in R^m \mid [x]_k \ge 0, \ k = 1, \cdots, m\}$, and $w_{kl} \ge 0$, $b_k \ge 0$, $k, l = 1, \cdots, m$ mean that each component of W and b is greater than or equal to zero.

Theorem 3. The regression function (5) is monotonically non-decreasing on D if and only if $w_{kl} \ge 0$ and $b_k \ge 0$ for all $k, l = 1, \cdots, m$.
Proof. The function g(x) can be written componentwise, so it remains to justify the claim for the k-th component of x. The part of the quadratic function containing $[x]_k$ can be expressed as a quadratic in $[x]_k$, and taking its derivative with respect to $[x]_k$ yields a linear function of the regression coefficients. On the domain D, g being monotonically non-decreasing in $[x]_k$ is equivalent to this derivative being non-negative, so it suffices to prove that the latter holds if and only if the corresponding coefficients are non-negative. Sufficiency is obvious, and we only need to prove necessity. Suppose $b_k < 0$ and take $x = ([x]_1, \cdots, [x]_m)^T = 0$; then the derivative at 0 is negative, which contradicts the known conditions. Now suppose there exists $w_{kl} < 0$ with all other components of x being zero; clearly, when $[x]_l$ is sufficiently large, the derivative becomes negative. This is a contradiction. Similarly, it can be shown that Theorem 3 holds.
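The monotonicity property just proved can also be checked numerically. This sketch draws random non-negative coefficients (an assumption made only for illustration) and verifies that increasing any single coordinate on the non-negative orthant never decreases g:

```python
import numpy as np

# Numerical check of the monotonicity property: if every entry of W and b
# is non-negative, then g(x) = 0.5*x^T W x + b^T x + c is non-decreasing
# in each coordinate on the non-negative orthant.
rng = np.random.default_rng(2)
W = rng.uniform(0.0, 1.0, (3, 3))
W = 0.5 * (W + W.T)                 # symmetrize; entries stay >= 0
b = rng.uniform(0.0, 1.0, 3)
c = -0.7

def g(x):
    return 0.5 * x @ W @ x + b @ x + c

ok = True
for _ in range(1000):
    x = rng.uniform(0.0, 5.0, 3)
    k = rng.integers(0, 3)
    x2 = x.copy()
    x2[k] += rng.uniform(0.0, 1.0)  # increase exactly one coordinate
    if g(x2) < g(x) - 1e-12:
        ok = False
print(ok)   # True: no decrease is ever observed
```

The check succeeds because the gradient Wx + b is componentwise non-negative whenever x lies in the non-negative orthant.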

Numerical Experiments
To verify the validity of the proposed NQSSVR model, we compare it with other methods, including linear SVR (lin-SVR), SVR with the Gaussian kernel (rbf-SVR) and the polynomial kernel (poly-SVR), linear SVR with non-negative constraints (NNSVR), as well as QLSSVR and ε-SQSSVR. The primal and dual problems of the NQSSVR method are denoted as NQSSVR(p) and NQSSVR(d), respectively. The experiments are conducted on 2 artificial datasets, 7 UCI [30] datasets, and the AQCI datasets. All numerical experiments in this section are run on a computer equipped with a 2.50 GHz Intel i7-9700 CPU and 8 GB RAM using MATLAB R2016a.
To validate the fitting performance of the various methods, four evaluation criteria are introduced, as shown in Table 1. Without loss of generality, let $\hat{y}_i$ denote the predicted value and $\bar{y}$ the mean of the response values. The penalty parameter C, the ε-insensitive parameter ε, and the Gaussian kernel parameter σ are selected from $\{2^i \mid i = -6, -5, \cdots, 5, 6\}$, while the polynomial kernel degree p is selected from {1, 2}. The optimal parameters of all methods are obtained through 5-fold cross-validation.
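For reference, the first three criteria of Table 1 can be computed as follows; their standard definitions are assumed here, since the table's formulas are not reproduced in this extraction:

```python
import numpy as np

# Standard regression evaluation criteria (assumed forms of Table 1):
# RMSE, MAE, and the coefficient of determination R^2.
def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])       # hypothetical observations
y_hat = np.array([1.1, 1.9, 3.2, 3.8])   # hypothetical predictions
print(rmse(y, y_hat), mae(y, y_hat), r2(y, y_hat))
```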

Table 1. Evaluation criteria and formulas.
RMSE: sqrt((1/n) Σ (y_i − ŷ_i)²)
MAE: (1/n) Σ |y_i − ŷ_i|
R²: 1 − Σ (y_i − ŷ_i)² / Σ (y_i − ȳ)²
T1: average test time
T2: time to select parameters

Artificial Datasets
Two artificial datasets are used to validate the performance of the NQSSVR model.

Example 1.
The fitting results on the artificial dataset are shown in Figure 2, where the data points are denoted by "·".

Table 2 shows the results of our proposed model and SVR with kernel functions on the two datasets mentioned above. On the linear regression problem, the five models yield similar outcomes. However, on the non-linear data, our method demonstrates superior performance compared to the other three methods, as evidenced by smaller average values of RMSE and MAE. Moreover, the difference in R² between our method and the optimal result is minimal. Notably, the T2 values reveal that our method exhibits faster computation than SVR with kernel functions. This advantage comes from the fact that it does not contain a kernel function, thus eliminating the need for kernel parameter selection.

For Example 2, the influence of the parameters on the accuracy of our proposed method is analyzed. As can be seen in Figure 3, the penalty parameter C and the ε-insensitive parameter ε have a considerable impact on the accuracy of the NQSSVR model, so reasonable parameters can improve its accuracy. In the following experiments, we choose the optimal parameters for the model within the defined parameter range.

Next, the average test time is compared for NQSSVR(p), NQSSVR(d), rbf-SVR, and poly-SVR. Since lin-SVR is only applicable to linear regression, it is not compared here. The CPU running times of the above four methods for different dimensions and numbers of data points are shown in Table 3, where the input dimension m of the data points is 2, 4, 8, or 16 and the number n of data points is 200, 400, 600, 800, or 1000. It is noteworthy that, for a fixed input dimension, the running time of NQSSVR(p) varies little as the number of data points increases, outperforming both the rbf-SVR and poly-SVR methods. Moreover, for a fixed number of data points, NQSSVR(p) exhibits shorter average test times than rbf-SVR and poly-SVR. Furthermore, as the input dimension of the data points increases, the average test time for the dual problem is found to be shorter than that for the primal problem.

Benchmark Datasets
In this section, to further validate the reliability of the proposed method, the NQSSVR model is compared with the lin-SVR, poly-SVR, rbf-SVR, NNSVR, QLSSVR, and ε-SQSSVR models on seven benchmark datasets. Details of all datasets are listed in Table 4. All datasets are normalized before the experiments and are divided into training, test, and validation sets in a ratio of 3:1:1. All methods are compared on the evaluation criteria MAE, RMSE, T1, and T2. The top two results are highlighted in bold. All experiments are repeated 5 times and the mean values are reported.

Table 5 lists the regression results of the eight methods on the seven datasets. In terms of RMSE and MAE, NQSSVR is significantly better than lin-SVR and NNSVR. For most of the datasets, the NQSSVR model outperforms QLSSVR and ε-SQSSVR, and is not significantly different from rbf-SVR and poly-SVR. In terms of time, our method is second only to QLSSVR and outperforms the other methods. To compare the performance of our proposed method with that of the other seven methods, the Friedman test and a post hoc test are employed. First, the Friedman test is conducted under the null hypothesis that all methods have the same performance. The Friedman statistic for each evaluation criterion is computed as

$$\chi_F^2 = \frac{12N}{K(K+1)}\left(\sum_{i=1}^{K} R_i^2 - \frac{K(K+1)^2}{4}\right), \qquad F_F = \frac{(N-1)\chi_F^2}{N(K-1) - \chi_F^2}, \quad (59)$$

where N and K are, respectively, the numbers of datasets and methods, and $R_i$ is the average rank of the i-th method.
According to Formula (59), the Friedman statistics corresponding to the three criteria are 12.2124, 13.8361, and 35.1600, respectively. For α = 0.05, the critical value of the Friedman statistic is calculated to be $F_\alpha = 2.2371$. Since the Friedman statistic for each regression criterion is greater than $F_\alpha$, we reject the null hypothesis. That is, these 8 methods have significantly different performance on the 3 evaluation criteria. To further compare the methods, we proceed with a post hoc test. Specifically, if the difference in average ranks of two methods is larger than the critical difference (CD), then their performance is considered significantly different, where the CD value is calculated by

$$CD = q_\alpha \sqrt{\frac{K(K+1)}{6N}}. \quad (60)$$

For α = 0.05, $q_\alpha = 3.0308$, and thus we obtain CD = 3.9685 by Formula (60). Figure 4 visually displays the results of the Friedman test and the Nemenyi post hoc test on the three regression evaluation criteria. The average rank of each method for each criterion is marked along an axis, with the axis oriented so that the lowest (best) rank is to the right. Groups of methods that are not significantly different are linked by a red line. Statistically, the performance of NQSSVR(p) is not significantly different from that of rbf-SVR and poly-SVR in terms of RMSE and MAE, and our method ranks better than both the kernel-free quadratic surface models and lin-SVR on RMSE and MAE. In terms of time, our model ranks third and fourth, outperforming the SVR with kernel functions and ε-SQSSVR. In general, the comprehensive performance of our method is similar to that of rbf-SVR and poly-SVR, and completely superior to that of lin-SVR and NNSVR.
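The Friedman statistic and the Nemenyi critical difference can be reproduced as follows; this is a sketch with the paper's setting of N = 7 datasets and K = 8 methods, with $q_\alpha$ taken from the text:

```python
import math

# Friedman F statistic and Nemenyi critical difference for
# N = 7 datasets and K = 8 methods.
N, K = 7, 8

def friedman_F(avg_ranks, N, K):
    # Chi-square form of the Friedman statistic, then its F-distributed form.
    chi2 = (12.0 * N / (K * (K + 1))) * (
        sum(r ** 2 for r in avg_ranks) - K * (K + 1) ** 2 / 4.0)
    return (N - 1) * chi2 / (N * (K - 1) - chi2)

# Nemenyi critical difference with q_alpha = 3.0308 (alpha = 0.05, K = 8).
q_alpha = 3.0308
CD = q_alpha * math.sqrt(K * (K + 1) / (6.0 * N))
print(round(CD, 4))   # close to the paper's CD = 3.9685
```

If all methods tied (every average rank equal to 4.5), the statistic is 0, as expected under the null hypothesis.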

Air Quality Composite Index Dataset (AQCI)
This section uses two AQCI datasets: the monthly AQCI dataset and the daily AQCI dataset, containing 18 and 841 data points, respectively. Each data point has six input features: nitrogen dioxide (NO2), sulfur dioxide (SO2), PM2.5, ozone (O3), carbon monoxide (CO), and PM10. The output response is the AQCI. Our method is compared with QLSSVR, ε-SQSSVR, and NNSVR.
In Figure 5, the value of the AQCI tends to increase as the values of the input features increase, except for the O3 feature. Therefore, the regression function we obtain should be monotonically increasing on the monthly AQCI dataset.

Table 6 shows the experimental results of our NQSSVR and the other three methods on these two datasets. The accuracy of our model is better than that of QLSSVR and ε-SQSSVR on these datasets because NQSSVR imposes non-negative constraints on the regression coefficients. In addition, our model is more accurate than the NNSVR model, because NNSVR can only obtain a linear regression function, whereas NQSSVR can obtain a quadratic one. To investigate the effect of the non-negative constraints on the regression function, we compare the regression coefficients W and b obtained by NQSSVR(p) with those obtained by the other three methods. Since NNSVR is a linear model, it has only linear-term coefficients b and does not involve the nonlinear-term coefficients W. The regression coefficients obtained by the four methods are small, so for comparison purposes we enlarge W and b by factors of 100 and 10, respectively, before drawing the figures. When a regression coefficient is negative, the color of the corresponding block is closer to blue.
We want to obtain a regression function that is monotonically increasing, which by Theorem 3 is equivalent to the non-negative constraints on the regression coefficients. Figures 6 and 7 show that the W and b obtained by ε-SQSSVR and QLSSVR contain negative entries, so their regression functions do not match the a priori information. In contrast, all the regression coefficients produced by our model match the a priori information. Therefore, adding non-negative constraints can improve the accuracy and reasonableness of the model. Moreover, since our method obtains a quadratic regression function, it is more accurate than the linear regression function obtained by NNSVR.

Conclusions
For the regression problem, a novel kernel-free quadratic surface support vector regression with non-negative constraints (NQSSVR) is proposed by utilizing the kernel-free technique and introducing non-negative constraints on the regression coefficients. Specifically, by using a quadratic surface to fit the data, the regression function is nonlinear yet does not involve kernel functions, so the model does not need to select kernel functions and their parameters, and the obtained regression function has better interpretability. Moreover, adding non-negative constraints on the regression coefficients ensures that the obtained regression function is monotonically non-decreasing on the non-negative interval. In fact, when exploring air quality examples, there is prior information that air quality indices increase with the concentrations of gases in the atmosphere. We have proven that the quadratic regression function obtained by NQSSVR is monotonically non-decreasing on the non-negative interval if and only if the non-negative constraints on the regression coefficients hold. The results of numerical experiments on two artificial datasets, seven benchmark datasets, and the air quality datasets demonstrate that our method is feasible and effective.
In this paper, we imposed a non-negative restriction on the regression coefficients based on prior information. In future work, different restrictions can be placed on the regression coefficients according to the available prior information; for example, some of the regression coefficients may be restricted to non-negative values while the rest remain unrestricted.

Conflicts of Interest:
The authors declare no conflict of interest.