An Improvement of GM (1, N) Model Based on Support Vector Machine Regression with Nonlinear Cross Effects

This paper presents GM (1, N) models with a linear cross effect and with a nonlinear cross effect, and discusses how the driving factors differ between these two types of models, in order to address the cross effects in the GM (1, N) model. The model with a linear cross effect preserves the whitenization solution of the GM (1, 1) model, while the model with a nonlinear cross effect integrates the sequences of systemic features, driving factors, and the cross effect of these driving factors. By applying support vector machine (SVM) regression, it transfers the nonlinear relationship among these sequences to a linear relationship. To test the SVM-based GM (1, N) model with a nonlinear cross effect, the study applies it to forecast the total output of the pharmaceutical industry. The data cover 2005–2017, of which the data from 2005–2013 are used to fit the model. The GM (1, N) model based on SVM with a nonlinear cross effect achieves 0.6566 and 0.2956 in its fitted total of relative error and forecast total of relative error, respectively. The new model achieves higher fitting and forecasting precision than the classic GM (1, N) model and the GM (1, N) model with a linear cross effect.


Introduction
Grey system theory is an interdisciplinary scientific area that Deng first introduced in the early 1980s [1,2]. Since then, the theory has become popular for its ability to deal with systems that have partially unknown parameters [3]. Unlike conventional statistical models, grey models require only a limited amount of data to estimate the behaviour of unknown systems [4]. As an important part of grey system theory, the grey forecasting method has been widely used in recent years because of its advantages in small-sample modeling, simplicity, and ease of calculation. In theoretical and methodological research, scholars at home and abroad have improved the single-variable grey forecasting model represented by the GM (1, 1) model; its extension and optimization have been studied extensively, with abundant results. Deng proposed the GM (1, N) model on the basis of the GM (1, 1) model and applied it to the coordination of economic, social, and scientific and technological systems, in order to analyze the influence of a set of related factors on the system characteristic sequence and to predict its trend [5]. Although the GM (1, N) model has dynamic characteristics similar to those of the GM (1, 1) model, it is prone to large prediction errors because the grey differential equation lacks a practical exact solution [6]. Compared with the GM (1, 1) model, the GM (1, N) model has a new feature: it implies driving effects among the influencing factors.
Given this feature of the GM (1, N) model, the question arises of how the driving effects of the influencing factors can be quantified. Xiao and Deng (2001) modified the GM (1, N) model by applying linear regression to its definition formulas [7]. On this basis, Mao and Chirwa (2006) assumed that the driving term $\sum_{i=2}^{N} b_i x_i^{(1)}(k)$ is a grey constant and derived a solution from the whitenization differential equation [8]. However, the solution based on the whitenization differential equation has been criticized as complex for GM (1, N) (Hsu, 2009; Wu et al., 2013) [9,10]. Tien (2009) applied the Simpson integration rule to simplify the whitenization differential equation in the GM (1, N) model [11].
Apart from the problem of quantifying driving effects, some studies have also addressed the time-lag problem in the GM (1, N) model (Jones et al., 2004; Wu and Chen, 2005) [12,13]. Hao et al. (2011) developed a new grey GM (1, N) model with multi-variable and hysteresis features [14]. This model finds the optimal lag parameter by applying particle swarm optimization (PSO). PSO yields more accurate predictions in the GM (1, N) model, and Han et al. (2013) further improved its application and created another time-lag model [15], which determines the order of GM (1, N) by applying grey correlation and the average absolute relative error. Further studies continue to improve GM (1, N). Wu et al. (2013) introduced fractional-order generating operators into GM (1, N) and extended the model to GM (1, N, T), which predicts more accurately [16]. Wu et al. (2015) addressed the situation in which the accumulated sequence of the correlated variables is too large for the influencing factors to be treated as a grey constant [17]. To remedy this weakness, Yuan et al. (2016) improved the initial value and background value in the GM (1, N) model in order to reduce the impact of the correlated variables on prediction [18].
Current studies seek to improve the practical feasibility of the GM (1, N) model. Feng et al. (2014) argued that the coefficient of the whitenization differential equation should be dynamic, and developed the MGM (1, N) model [19], which treats the coefficient of the accumulated sequence dynamically. Ma and Liu (2015) analysed the discrete GM (1, N) model and applied a convolution integral method to improve it [20]; the improved model can solve the GM (1, N) model with multiple variables. To be more realistic, Kayacan et al. (2010) treated the relationship among the accumulated sequences as nonlinear [21], but did not discuss the cross effect among the driving factors. Wang et al. (2011) also extended the linear relationship among the accumulated sequences to a nonlinear one [22]. Compared with Kayacan et al. (2010), this study considered the cross effect among the driving factors, but it still imposes the strong constraint that the relationship among the accumulated sequences is linear.
In reality, the relationships among the accumulated sequences of the system, the driving factors, and the cross effects may be linear or nonlinear, and a nonlinear relationship is more common [23–28]. Current GM (1, N) models can hardly address this. Our study applies support vector machine (SVM) regression to handle the nonlinear cross effect in the GM (1, N) model. The study is structured as follows: Section 2 presents the GM (1, N) model with cross effect. Section 3 extends the GM (1, N) model to the new GM (1, N) model with a nonlinear cross effect. Section 4 applies a numerical example to test the new model. The final section concludes the study.

Classic GM (1, N) Model
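Definition 1. Assume that $X_1^{(0)} = (x_1^{(0)}(1), x_1^{(0)}(2), \ldots, x_1^{(0)}(n))$ is the sequence of the data in a system, and the correlation factor sequences are as follows:

$$X_i^{(0)} = (x_i^{(0)}(1), x_i^{(0)}(2), \ldots, x_i^{(0)}(n)), \quad i = 2, 3, \ldots, N,$$

and "first accumulated sequence" means $x_i^{(1)}(k) = \sum_{j=1}^{k} x_i^{(0)}(j)$, with $z_1^{(1)}(k) = \frac{1}{2}\left(x_1^{(1)}(k) + x_1^{(1)}(k-1)\right)$ the adjacent-mean background value of $X_1^{(1)}$. Then

$$x_1^{(0)}(k) + a z_1^{(1)}(k) = \sum_{i=2}^{N} b_i x_i^{(1)}(k). \tag{2}$$

Equation (2) is the classic GM (1, N) model.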
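The parameter estimation behind the classic GM (1, N) model can be sketched in a few lines. The following is a minimal pure-Python sketch, assuming the textbook form $x_1^{(0)}(k) + a z_1^{(1)}(k) = \sum_{i=2}^{N} b_i x_i^{(1)}(k)$ with adjacent-mean background values and ordinary least squares via the normal equations. The function names (`ago`, `background`, `solve`, `fit_gm1n`) are hypothetical and do not come from the paper.

```python
from itertools import accumulate


def ago(seq):
    """First-order accumulated generating operation (1-AGO)."""
    return list(accumulate(seq))


def background(x1_ago):
    """Adjacent-mean background values z(k) for k = 2..n."""
    return [0.5 * (x1_ago[k] + x1_ago[k - 1]) for k in range(1, len(x1_ago))]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_gm1n(x1, drivers):
    """Least-squares estimate of [a, b_2, ..., b_N].

    x1      : raw system feature sequence
    drivers : list of raw driving-factor sequences (same length as x1)
    """
    z = background(ago(x1))
    d_ago = [ago(d) for d in drivers]
    # Row k of B is [-z(k), x_2^(1)(k), ..., x_N^(1)(k)]; Y holds x_1^(0)(k).
    B = [[-z[k - 1]] + [d[k] for d in d_ago] for k in range(1, len(x1))]
    Y = x1[1:]
    p = len(B[0])
    BtB = [[sum(row[i] * row[j] for row in B) for j in range(p)] for i in range(p)]
    BtY = [sum(row[i] * y for row, y in zip(B, Y)) for i in range(p)]
    return solve(BtB, BtY)
```

Calling `fit_gm1n(x1, [x2, x3])` returns the estimates in the order `[a, b_2, b_3]`. The normal equations are used here only to keep the sketch dependency-free; a library least-squares routine would usually be the more numerically robust choice.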

Definition 2. Assume that a cross effect exists among the driving factors of the system feature sequence, and that the cross effect of the driving factors is given by the correlation coefficients of a regression. The GM (1, N) model with cross effect, Equation (3), then extends the classic model by a cross-effect term.
In Equation (3), $a$ is the evolution coefficient of the system feature sequence, and $\sum_{i=2}^{N} b_i x_i^{(1)}(k)$ is the sum of the individual influential effects of the driving factors.
The term built from $f(b_{rs})$ is the cross effect of the driving factors that are correlated, where $r$ and $s$ refer to two different influence factors with a cross effect, taken from the set of all influence factors in the GM (1, N) model.

$f(b_{rs})$ is the cross-effect regression of the relative coefficients. When it vanishes, the driving factors are uncorrelated and the model reduces to the classic GM (1, N) model. $f(b_{rs})$ may be a linear regression, but in most scenarios it is nonlinear. Therefore, we first consider $f(b_{rs})$ as a linear regression and then as a nonlinear one.

Lemma 1. The solution of the GM (1, N) model with a linear cross effect can be obtained as follows.
Assume that the driving term and the linear cross-effect term, both formed from the accumulated sequences $x_i^{(1)}(k)$, are grey constants. By following the solution of the GM (1, 1) model, we can obtain a similar time response for the GM (1, N) model with a linear cross effect. Generally, however, the change in the accumulated sequences of the driving factors is random and large, which makes the above solution unrealistic. Therefore, we develop another solution in the following section.
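For orientation, the well-known approximate time response of the classic GM (1, N) model, obtained by treating the driving term as a grey constant, is shown below. This is the textbook expression for the model without a cross effect. The shorthand $D(k)$ is introduced here for illustration only, and the assumption that a linear cross-effect term would simply be added to $D(k)$ is ours rather than a statement of the lemma.

```latex
% Driving term treated as a grey constant at each step (shorthand, not from the paper)
D(k) = \sum_{i=2}^{N} b_i \, x_i^{(1)}(k)

% Approximate time response of the accumulated sequence
\hat{x}_1^{(1)}(k+1) = \left[ x_1^{(0)}(1) - \frac{1}{a} D(k+1) \right] e^{-a k} + \frac{1}{a} D(k+1)

% Restored (inverse accumulated) value
\hat{x}_1^{(0)}(k+1) = \hat{x}_1^{(1)}(k+1) - \hat{x}_1^{(1)}(k)
```

The expression is only as good as the grey-constant assumption: when $D(k)$ changes sharply from step to step, the exponential form no longer tracks the data well.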

GM (1, N) Model with Nonlinear Cross Effect
In the previous section, we assumed that the relationship among the accumulated sequence of the system feature, the accumulated sequences of the driving factors $x_i^{(1)}(k)$ ($i = 2, \ldots, N$), and the accumulated cross effect of the driving factors is linear. This assumption is unrealistic, since nonlinear relationships are more common in reality. Therefore, we relax the linear relationship among these three kinds of sequences to a nonlinear one. Subsequently, we apply SVM regression to determine the nonlinear mapping among them. SVM is a supervised machine learning model with associated learning algorithms that can be used for regression analysis (Sebald and Bucklew, 2001). It is built on VC dimension theory and structural risk minimization, which enable SVM to better handle problems with small sample sizes, nonlinearity, high dimensionality, and local minima (Cortes and Vapnik, 1995). Applying SVM can therefore solve the new GM (1, N) model with a nonlinear cross effect by transferring the nonlinear relationship of the driving factors to a linear one. To this end, we assume a sample of input-output pairs $\{(x_i, y_i)\}$. Under suitable criteria on the sample, the nonlinear problem can be transferred into an optimization problem that seeks the maximum margin of the SVM, which can be presented as the following equation.
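A standard way to write such an optimization problem is the $\varepsilon$-insensitive support vector regression primal, given below as a sketch. The symbols $w$, $b$, $\xi_i$, $\xi_i^{*}$, $\varepsilon$ and the sample size $l$ are the usual textbook notation and are assumptions here; the exact formulation used in this study may differ.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{l} \left( \xi_i + \xi_i^{*} \right) \\
\text{s.t.} \quad & y_i - w^{\top}\varphi(x_i) - b \le \varepsilon + \xi_i, \\
& w^{\top}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i \ge 0,\ \xi_i^{*} \ge 0, \quad i = 1, \ldots, l.
\end{aligned}
```

Minimizing $\lVert w \rVert^{2}$ corresponds to maximizing the margin, while the slack variables $\xi_i$ and $\xi_i^{*}$ allow individual samples to fall outside the $\varepsilon$-tube at a cost weighted by $C$.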
In Equation (7), $C$ is the penalty parameter that controls the complexity of the model. By solving the above problem, we obtain a discriminant function. To develop a solution, our study applies the radial basis function (RBF) kernel. The input variables are the accumulated sequences of the driving factors, $x_i^{(1)}(k)$ ($i = 2, \ldots, N$), and the accumulated cross effect of the driving factors; the output variable is the corresponding value of the system feature sequence.
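The input construction and the kernel evaluation can be illustrated with a minimal pure-Python sketch. It makes two assumptions that are not taken from the paper: the cross effect of two drivers is represented by the accumulated element-wise product of their raw sequences, and the discriminant function has the usual kernel-expansion form $f(x) = \sum_i \beta_i K(x_i, x) + b$. The names `build_inputs`, `rbf_kernel`, and `svr_predict` are hypothetical, and the sketch evaluates a fitted function without performing the training step.

```python
import math
from itertools import accumulate


def ago(seq):
    """First-order accumulated generating operation (1-AGO)."""
    return list(accumulate(seq))


def build_inputs(drivers, cross_pairs):
    """Return one input vector per time step.

    Each vector holds the accumulated driving factors followed by the
    accumulated cross terms for the index pairs in cross_pairs.
    """
    acc = [ago(d) for d in drivers]
    # Assumption: cross effect of drivers r and s = accumulated product.
    cross = [
        ago([u * v for u, v in zip(drivers[r], drivers[s])])
        for r, s in cross_pairs
    ]
    columns = acc + cross
    return [[col[k] for col in columns] for k in range(len(drivers[0]))]


def rbf_kernel(u, v, sigma=1.0):
    """Gaussian RBF kernel exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, coeffs, bias, sigma=1.0):
    """Evaluate f(x) = sum_i coeffs[i] * K(sv_i, x) + bias."""
    return bias + sum(
        c * rbf_kernel(sv, x, sigma) for sv, c in zip(support_vectors, coeffs)
    )
```

In practice the coefficients and bias would come from an SVR solver, and the inputs would normally be scaled before the kernel is applied, since the RBF kernel is sensitive to differences in magnitude between features.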

Empirical Analysis
To test the feasibility of the GM (1, N) model with a nonlinear cross effect in prediction, our study applies it to forecast the total output of the pharmaceutical industry. The influencing factors that we select include investment in industrial capital, total product sales revenue, total profit from product sales, total health costs, gross domestic product, total factor productivity, average disposable income of urban residents, and the consumer price index. The data are drawn from the Chinese Industrial Yearbook, Industrial Statistics Yearbook, Chinese Medical Yearbook, and China Statistical Yearbook. The data cover 2005-2017: the data from 2005-2013 are used to fit the models, and the data from 2014-2017 are used to evaluate the predictions of the fitted models. In this numerical example, $X_1$ denotes the total output of the pharmaceutical industry, $X_2$ is total investment in the pharmaceutical industry, $X_3$ is total product sales revenue, $X_4$ is total profit from product sales, $X_5$ is total health costs, $X_6$ is gross domestic product, $X_7$ is average disposable income of residents, and $X_8$ is the consumer price index. The results of model fitting and prediction are presented in the following tables.
In Table 1, the values of two variables in 2009 are missing in the original dataset. These missing values are imputed so that the data sample remains representative of the population: a fitting function obtained by nonlinear regression gives estimated values of 547,000 and 45,333, respectively. Table 1 presents the original data from 2005-2017. It is evident that the dimensions (units and magnitudes) of the variables are inconsistent. The original data sequences therefore undergo initial value processing in order to avoid drift in the multivariate grey modelling data matrix. After the prediction sequence is obtained, the inverse initial value transformation is applied, and the dimension and magnitude of the system characteristic sequence are restored. The classic GM (1, N) model, the GM (1, N) model with a linear cross effect, and the SVM-based GM (1, N) model with a nonlinear cross effect are established on the data series from 2005-2013 after initial value processing. Tables 2 and 3 show the simulation and prediction results obtained after the inverse initial value transformation. Based on Tables 2 and 3, the GM (1, N) model based on SVM with a nonlinear cross effect performs better in fitting precision and prediction accuracy. Specifically, the fitted total of relative error of the classic GM (1, N) model is 0.7801 and its forecast total of relative error is 0.5018, while the GM (1, N) model with a linear cross effect is more accurate on both measures, at 0.6566 and 0.2956, respectively. The GM (1, N) model with a linear cross effect is more accurate because it considers the linear cross effect among the three influencing factors, including the crowding-out effect among the factors. On this basis, our improved GM (1, N) model considers an integrated driving effect that includes the nonlinear cross effect as well as crowding-out effects. Therefore, it achieves better fitting precision and prediction accuracy than the previous two models.
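The initial value processing and the error measure used in this comparison can be sketched as follows. This is a minimal sketch under two assumptions: initial value processing divides each series by its first entry, and the "total of relative error" is the sum of the absolute relative errors over the fitted or forecast years. The function names are hypothetical, and the numbers in the usage note are made up for illustration rather than taken from Tables 1 to 3.

```python
def initial_value_transform(seq):
    """Scale a series by its first entry so that every series starts at 1."""
    first = seq[0]
    return [value / first for value in seq]


def inverse_initial_value(scaled, first):
    """Restore the original dimension and magnitude of a scaled series."""
    return [value * first for value in scaled]


def relative_errors(actual, predicted):
    """Absolute relative error |actual - predicted| / |actual| per point."""
    return [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]


def total_relative_error(actual, predicted):
    """Sum of the per-point relative errors (assumed definition)."""
    return sum(relative_errors(actual, predicted))
```

For example, `initial_value_transform([200.0, 250.0, 300.0])` yields a series starting at 1, and `inverse_initial_value` with `first=200.0` restores the original values. Scaling every variable this way removes the differences in units before the models are fitted.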

Conclusions
Our study develops a new GM (1, N) model and discusses the difference between linear and nonlinear cross effects. For the GM (1, N) model with a linear cross effect, the solution can be developed by applying the whitenization differential equation, as in the traditional GM (1, 1) model. For the GM (1, N) model with a nonlinear cross effect presented in our study, SVM regression is first used to transfer the nonlinear relationship to a linear one, and a solution is then developed. Both methods follow the accumulation principle of the GM (1, 1) model, but our new GM (1, N) model addresses the nonlinear relationship among the accumulated sequence of the system feature, the accumulated sequences of the driving factors, and the cross effect, and finds a solution by using SVM regression. Testing fitting and prediction on data for the pharmaceutical industry in China, we find that the GM (1, N) model based on SVM regression with a nonlinear cross effect performs better on both indicators than the classic GM (1, N) model and the GM (1, N) model with a linear cross effect.
In the linear cross effect, the relative coefficient of the driving factors $r$ and $s$ enters together with the undetermined parameters $b_{rs}$ and $c$. The parameter list is obtained by least-squares estimation, which relies on a non-singular matrix built from $\tilde{B}$. When the relationship is nonlinear, a nonlinear mapping $\varphi(x)$ can be created to map the sample from the input space to the high-dimensional feature space $H$, and linear SVM regression is then developed in the feature space $H$. The sample should satisfy certain criteria in order to operate this procedure.

Table 1. Original data of the choice variables from 2005 to 2017 in China.

Table 2. Comparison of fitted values and relative errors in three models.

Table 3. Comparison of the prediction and relative errors in three models.