Product Design Time Forecasting by Kernel-Based Regression with Gaussian Distribution Weights

Abstract: Design time forecasting faces two problems: small samples and heteroscedastic noise. To solve them, a kernel-based regression with Gaussian distribution weights (GDW-KR) is proposed. GDW-KR maintains a Gaussian distribution over the weight vectors of the regression and seeks the least informative distribution among those that keep each target value within the confidence interval of its forecast value. Inheriting the benefits of Gaussian margin machines, GDW-KR simultaneously offers a point forecast and its confidence interval, thus providing more information about product design time. Experiments with real examples verify the effectiveness and flexibility of GDW-KR.


Introduction
Product design is a complex and dynamic process whose duration is affected by many factors, most of which are fuzzy, random and uncertain. Because product design tasks are carried out in different companies, these uncertain characteristics may vary from product to product; heteroscedasticity is therefore another important feature of product design. The mapping from these factors to design time is highly nonlinear and cannot be described by definite mathematical models. The reasonableness of the assumed distribution of product design time is a key factor in product development control and decision-making [1][2][3].
Cho and Eppinger [1] chose the triangular probability distribution to represent design task durations and proposed a process modeling and analysis technique for managing complex design projects using advanced simulation. However, if the assumed distribution of design activity durations does not reflect the true state, the proposed algorithm may fail to obtain ideal results. Yan and Wang [2] proposed a time-computing model with its corresponding design activities in the concurrent product development process. Yang and Zhang [3] presented an evolution and sensitivity design-structure matrix to reflect overlapping and its impact on the degree of activity sensitivity and evolution in the process model; the model supports better project planning and control by identifying overlapping and risk for process improvement. However, both of these algorithms require the normal duration of each design activity to be determined before they are executed, and if these durations are incompatible with the actual ones, the algorithms may not function well. Clearly, the accuracy of the predetermined design time is crucial to the planning and control of product development processes.
Traditionally, approximate design time has been analyzed by qualitative approaches. With the rapid development of computer and regression techniques, new forecast methods keep emerging. Bashir and Thomson [4] proposed a modified Norden model to estimate project duration in conjunction with an effort-estimation model. Griffin [5] related the length of the product development cycle to project, process and team structure factors by a statistical method and quantified the impact of project newness and complexity on the increase in development cycle length, but did not propose a design time forecast. Jacome and Lapinskii [6] developed a model to forecast electronic product design effort based on a structure and process decomposition approach; however, the model takes only a small portion of the time factors into account. Xu and Yan [7] proposed a design time forecast model based on a fuzzy neural network, which performs well when sample data are sufficient. However, only a small number of design cases are available to a company, which weakens the validity of the fuzzy neural network. Therefore, a novel approach is needed.
Recently, kernel methods have become one of the leading means for pattern classification and function approximation and have been successfully applied in various fields [8][9][10][11][12][13][14]. The support vector machine (SVM), initially developed by Vapnik for pattern classification, is one of the most widely used models. With the introduction of the ε-insensitive loss function, the SVM was extended to nonlinear regression problems, in which case it is called support vector regression (SVR). The ε-insensitive loss function gives SVR its sparseness property, but the value of ε, chosen a priori, is hard to determine. A new parameter ν was then introduced in ν-SVR to control the number of support vectors and training errors [11], which overcomes the difficulty of determining ε. In recent years, much research has been done on kernel methods. Kivinen et al. [15] considered online learning in a reproducing kernel Hilbert space. Liu et al. [16] proved that the kernel least-mean-square algorithm is well posed in reproducing kernel Hilbert spaces without adding the extra regularization term to penalize solution norms suggested in [15]. Chen et al. developed a quantized kernel least mean square algorithm based on a simple online vector quantization method [17] and proposed quantized kernel least squares regression [18]. Wu et al. [19] derived the kernel recursive maximum correntropy algorithm in kernel space under the maximum correntropy criterion.
Furthermore, by combining fuzzy theory with ν-SVR, Yan and Xu [20] proposed Fv-SVM to forecast design time, which can solve regression problems with uncertain input variables. However, both Fv-SVM and ν-SVR assume that the noise level is uniform throughout the domain, or at least that its functional dependency is known beforehand [21]. Design time forecasts based on Fv-SVM are therefore deficient because of the heteroscedasticity of product design. For better planning and control of the product development process, a good forecast method is expected to yield not only precise forecast values but also valid forecast intervals.
In Gaussian margin machines [22], the weight vector of a binary classifier follows a Gaussian distribution, and the goal is to find the least informative distribution that classifies the training samples with high probability. Gaussian margin machines thus provide the probability that a sample belongs to a certain class. This paper extends the idea of Gaussian margin machines to regression for forecasting product design time. Shang and Yan [23] proposed Gaussian margin regression (GMR) by combining Gaussian margin machines with kernel-based regression. However, GMR assumes that all forecast variances are the same, which is inconsistent with the heteroscedasticity present in design time forecasting; like Fv-SVM, GMR therefore fails to provide valid forecast intervals. By combining Gaussian margin machines with the extreme learning machine [24,25], a confidence-weighted extreme learning machine was proposed for regression problems with large samples [26].
The present study proposes kernel-based regression with Gaussian distribution weights (GDW-KR), which combines Gaussian margin machines with kernel-based regression to solve the problems of small samples and heteroscedastic noise in design time forecasting and to provide both forecast values and forecast intervals. Inheriting the merits of Gaussian margin machines, GDW-KR maintains a Gaussian distribution over weight vectors and seeks the least informative distribution that keeps each target within its corresponding confidence interval. The optimization problem of GDW-KR is simplified, and an approximate solution of the simplified problem is obtained from the results of regularized kernel-based regression. On the basis of this model, a forecast method for product design time and the corresponding parameter-determination algorithm are then put forward.
The rest of this paper is organized as follows. Section 2 introduces Gaussian margin machines. Section 3 describes GDW-KR and the method for solving its optimization problem. Section 4 presents the application to injection mold design, compares GDW-KR with other models, and gives an extended application of GDW-KR. Section 5 draws the final conclusions.

Gaussian Margin Machines
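Suppose the samples are $\{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^m$ is a column vector and $y_i \in \{-1, 1\}$ is a scalar output. The weight vector $w$ of a linear classifier is assumed to follow a multivariate normal distribution $N_m(\mu_1, \Sigma_1)$ with mean $\mu_1 \in \mathbb{R}^m$ and covariance matrix $\Sigma_1 \in \mathbb{R}^{m \times m}$. For the sample $x_i$, the signed output is therefore normally distributed:
$$y_i w^T x_i \sim N\left(y_i \mu_1^T x_i,\; x_i^T \Sigma_1 x_i\right). \quad (1)$$
The linear classifier is designed to classify each sample correctly with high probability, that is:
$$\Pr_{w \sim N_m(\mu_1, \Sigma_1)}\left[y_i w^T x_i \ge 0\right] \ge \rho, \quad (2)$$
where $\rho \in (0.5, 1]$ is the confidence value. Combining Equations (1) and (2) gives:
$$y_i \mu_1^T x_i \ge \Phi^{-1}(\rho) \sqrt{x_i^T \Sigma_1 x_i}. \quad (3)$$
Gaussian margin machines (GMM) seek the least informative distribution that classifies the training set with high probability, namely the multivariate normal distribution $N_m(\mu_1, \Sigma_1)$ with minimum Kullback-Leibler divergence with respect to an isotropic distribution $N_m(0, aI_m)$. This divergence is denoted by $D_{KL}(N_m(\mu_1, \Sigma_1) \,\|\, N_m(0, aI_m))$ (the subscript KL abbreviates Kullback-Leibler, and D stands for divergence) and is calculated as:
$$D_{KL}\left(N_m(\mu_1, \Sigma_1) \,\|\, N_m(0, aI_m)\right) = \frac{1}{2}\left(\ln\frac{\det(aI_m)}{\det\Sigma_1} + \frac{\operatorname{tr}(\Sigma_1)}{a} + \frac{\mu_1^T\mu_1}{a} - m\right). \quad (4)$$
The optimization problem of GMM is described as:
$$\min_{\mu_1, \Sigma_1} D_{KL}\left(N_m(\mu_1, \Sigma_1) \,\|\, N_m(0, aI_m)\right) \quad \text{s.t.} \quad \Pr_{w \sim N_m(\mu_1, \Sigma_1)}\left[y_i w^T x_i \ge 0\right] \ge \rho,\; i = 1, \ldots, l. \quad (5)$$
After omitting the constant terms in the objective function and transforming the constraints of Equation (5) by Equation (3), we get:
$$\min_{\mu_1, \Sigma_1} -\frac{1}{2}\ln\det\Sigma_1 + \frac{1}{2a}\operatorname{tr}(\Sigma_1) + \frac{1}{2a}\mu_1^T\mu_1 \quad \text{s.t.} \quad y_i \mu_1^T x_i \ge \Phi^{-1}(\rho)\sqrt{x_i^T\Sigma_1 x_i},\; i = 1, \ldots, l, \quad (6)$$
where $\Phi^{-1}(\rho)$ is the inverse cumulative distribution function of the standard normal distribution. Moreover, $\Phi^{-1}(\rho) = \sqrt{2}\,\operatorname{erf}^{-1}(2\rho - 1)$, where $\operatorname{erf}^{-1}$ denotes the inverse Gauss error function.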
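The step from Equation (2) to Equation (3) can be checked numerically. The following Python sketch (standard library only; the function name `gmm_margin` and the example numbers are hypothetical) evaluates, for one sample, both the slack of Equation (3) and the probability in Equation (2); the slack is non-negative exactly when the probability reaches $\rho$. It assumes $\rho < 1$, because $\Phi^{-1}(1)$ is infinite.

```python
from statistics import NormalDist


def gmm_margin(mu, sigma, x, y, rho):
    """Check one sample of the GMM constraint for w ~ N(mu, sigma).

    Returns (slack, prob), where
      slack = y * mu^T x - Phi^{-1}(rho) * sqrt(x^T sigma x)   (Equation (3))
      prob  = Pr[y * w^T x >= 0]                                (Equation (2))
    so slack >= 0 exactly when prob >= rho.  Requires 0.5 < rho < 1.
    """
    n = len(x)
    mean = y * sum(mu[j] * x[j] for j in range(n))                              # y * mu^T x
    var = sum(x[r] * sigma[r][c] * x[c] for r in range(n) for c in range(n))    # x^T Sigma x
    std = var ** 0.5
    slack = mean - NormalDist().inv_cdf(rho) * std
    prob = NormalDist().cdf(mean / std)    # y w^T x ~ N(mean, var), Equation (1)
    return slack, prob


# Hypothetical two-dimensional example.
slack, prob = gmm_margin(mu=[1.0, 0.5], sigma=[[0.2, 0.0], [0.0, 0.1]],
                         x=[1.0, 2.0], y=1, rho=0.9)
print(f"slack={slack:.3f}, Pr={prob:.3f}")
```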
Proof of Theorem 1. See Appendix A.

Optimization Problem of GDW-KR
A finite number of independent, non-duplicate observations $\{(x_i, t_i)\}_{i=1}^{l}$ with $x_i \in \mathbb{R}^m$ and $t_i \in \mathbb{R}$ are considered. A kernel-based regression model approximates the unknown regression function $f(x)$ as follows:
$$f(x) = \sum_{j=1}^{l} w_j k(x, x_j),$$
where $k(x, x_j)$ is a predefined kernel function and $w = (w_1, \ldots, w_l)^T$.
Definition 1. (kernel function) A kernel is a function $k$ that, for all $x, z$ from a space $\chi$ (which need not be a vector space), satisfies $k(x, z) = \langle \phi(x), \phi(z) \rangle$, where $\phi$ is a mapping from $\chi$ to a Hilbert space $F$, usually called the feature space, $\phi: x \in \chi \mapsto \phi(x) \in F$ [28].
By assuming $w \sim N_l(\mu, \Sigma)$ with $\mu \in \mathbb{R}^l$ and a positive definite covariance matrix $\Sigma \in \mathbb{R}^{l \times l}$, we maintain a distribution over alternative weight vectors rather than committing to a single specific vector. Let $y_i$ denote the value forecast by the model for a given observation $x_i$; then:
$$y_i = K_i w \sim N\left(K_i\mu,\; K_i\Sigma K_i^T\right),$$
where $K_i$ is the $i$th row of the symmetric kernel matrix $K$, with $K_{ij} = k(x_i, x_j)$, $i, j = 1, \ldots, l$.
The weight vectors are required to keep each target value $t_i$ within the confidence interval of its forecast value, an interval around $K_i\mu$ whose width is proportional to the standard deviation $\sqrt{K_i\Sigma K_i^T}$ and scaled by $\eta$. This yields the constraint conditions (11). The confidence interval needs to be wide enough to achieve a high confidence level: for a level higher than 95%, $\eta$ should be greater than $1.96 = \Phi^{-1}(1 - (1 - 0.95)/2)$. Considering that the noise is independent between samples, $K_i\Sigma K_j^T$ is set to 0 for $j \ne i$. Since the row vector $K_i$ cannot be a zero vector and $\Sigma$ is positive definite, $K_i\Sigma K_i^T > 0$. Hence, the covariance matrix $K\Sigma K^T$ of the forecasts should be a positive definite diagonal matrix:
$$K\Sigma K^T = \operatorname{diag}\left(K_1\Sigma K_1^T, \ldots, K_l\Sigma K_l^T\right), \quad (12)$$
which implies that the kernel matrix $K$ must be invertible, because $\operatorname{rank}(K\Sigma K^T) = l$ and $\operatorname{rank}(K\Sigma K^T) \le \operatorname{rank}(K) \le l$.
Entropy 2016, 18, 231 5 of 17
Under the constraint conditions (11) and (12), GDW-KR seeks the least informative distribution, that is, the one with the smallest Kullback-Leibler divergence with respect to an isotropic Gaussian distribution $N_l(0, aI_l)$ for some constant scalar $a > 0$. Thus, the optimization problem of GDW-KR is expressed as:
$$\min_{\mu, \Sigma} D_{KL}\left(N_l(\mu, \Sigma) \,\|\, N_l(0, aI_l)\right) \quad \text{s.t. (11) and (12)}. \quad (13)$$
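As a sketch of the quantities in this subsection, the Python code below computes $\eta = \Phi^{-1}(1 - (1 - 0.95)/2)$ and the forecast distribution $y_i = K_i w \sim N(K_i\mu, K_i\Sigma K_i^T)$ for one observation, then checks whether a target lies in the interval $K_i\mu \pm \eta\sqrt{K_i\Sigma K_i^T}$. The symmetric interval is an assumption made for illustration; the function name `forecast_interval` and the numbers are hypothetical.

```python
from statistics import NormalDist

# eta for a two-sided 95% interval: Phi^{-1}(1 - (1 - 0.95) / 2), about 1.96.
eta = NormalDist().inv_cdf(1 - (1 - 0.95) / 2)


def forecast_interval(K_row, mu, Sigma, eta):
    """Mean and eta-interval of y_i = K_i w with w ~ N(mu, Sigma)."""
    n = len(mu)
    mean = sum(K_row[j] * mu[j] for j in range(n))              # K_i mu
    var = sum(K_row[r] * Sigma[r][c] * K_row[c]                 # K_i Sigma K_i^T
              for r in range(n) for c in range(n))
    half = eta * var ** 0.5
    return mean, mean - half, mean + half


# Hypothetical observation with a 2 x 2 covariance over the weights.
mean, lo, hi = forecast_interval([1.0, 0.0], [2.0, 3.0],
                                 [[0.25, 0.0], [0.0, 1.0]], eta)
target = 2.5
print(round(eta, 4), lo <= target <= hi)
```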

Simplification of Optimization Problem
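In problem (13), the number of unknown parameters is $l + l(l+1)/2$, which can be reduced by handling its constraints properly. First of all, let us suppose:
$$\Sigma = K^{-1}\Lambda K^{-1}, \quad (14)$$
where $\Lambda = \operatorname{diag}(\lambda_1^2, \ldots, \lambda_l^2)$ and $\lambda_i > 0$, $i = 1, \ldots, l$. Then $K\Sigma K^T = \Lambda$ is diagonal, so Equation (12) holds and $\sqrt{K_i\Sigma K_i^T} = \lambda_i$. If the diagonal elements of $\Lambda$ are treated as unknown parameters in place of $\Sigma$, the number of unknown parameters in problem (13) is reduced to $2l$. The objective function of problem (13) is then rewritten as:
$$\frac{1}{2}\left(\ln\frac{a^l}{\det(K^{-1}\Lambda K^{-1})} + \frac{\operatorname{tr}(K^{-1}\Lambda K^{-1})}{a} + \frac{\mu^T\mu}{a} - l\right). \quad (15)$$
As $\ln\det(K^{-1}\Lambda K^{-1}) = \ln\det(K^{-1}K^{-1}\Lambda)$, we have:
$$\ln\det(K^{-1}\Lambda K^{-1}) = \ln\det P + \sum_{i=1}^{l}\ln\lambda_i^2, \quad (16)$$
where $P = K^{-1}K^{-1}$. Since $\operatorname{tr}(K^{-1}\Lambda K^{-1}) = \operatorname{tr}(K^{-1}K^{-1}\Lambda)$, and both $K$ and $K^{-1}$ are symmetric and invertible matrices, we obtain:
$$\operatorname{tr}(K^{-1}\Lambda K^{-1}) = \sum_{i=1}^{l}(P)_{ii}\lambda_i^2, \quad (17)$$
where $(P)_{ii} > 0$ because $P = (K^{-1})^T K^{-1}$ is positive definite. Disregarding the constant term $-\frac{1}{2}\ln\det P$ (together with the other constants) in the objective function, problem (13) is rewritten as:
$$\min_{\mu, \lambda_1, \ldots, \lambda_l} \frac{1}{2a}\mu^T\mu + \sum_{i=1}^{l}\left(\frac{(P)_{ii}}{2a}\lambda_i^2 - \ln\lambda_i\right) \quad \text{s.t. (11) with } \sqrt{K_i\Sigma K_i^T} = \lambda_i,\; \lambda_i > 0,\; i = 1, \ldots, l. \quad (18)$$
Assuming $\lambda_i = \lambda$ for $i = 1, \ldots, l$, the problem of GMR is obtained as:
$$\min_{\mu, \lambda} \frac{1}{2a}\mu^T\mu + \frac{\operatorname{tr}(P)}{2a}\lambda^2 - l\ln\lambda \quad \text{s.t. (11) with } \sqrt{K_i\Sigma K_i^T} = \lambda,\; \lambda > 0. \quad (19)$$
Comparing problems (18) and (19) reveals that GMR is a special case of GDW-KR.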
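The identities (16) and (17), and the fact that the substitution (14) makes $K\Sigma K^T$ diagonal, can be verified numerically. The Python sketch below uses only the standard library with small hand-written linear-algebra helpers; the kernel matrix and the values of $\lambda_i$ are hypothetical.

```python
import math


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_inv(A):
    """Gauss-Jordan inverse with partial pivoting (A assumed nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def log_abs_det(A):
    """log|det A| by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] for row in A]
    total = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        total += math.log(abs(M[col][col]))
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return total


# Hypothetical symmetric positive definite kernel matrix and standard deviations.
K = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
lam = [0.5, 1.2, 2.0]
K_inv = mat_inv(K)
Lam = [[lam[i] ** 2 if i == j else 0.0 for j in range(3)] for i in range(3)]
Sigma = mat_mul(mat_mul(K_inv, Lam), K_inv)      # Equation (14)
P = mat_mul(K_inv, K_inv)                        # P = K^{-1} K^{-1}

# Equation (16): ln det(Sigma) = ln det P + sum ln lambda_i^2
lhs16 = log_abs_det(Sigma)
rhs16 = log_abs_det(P) + sum(math.log(v ** 2) for v in lam)
# Equation (17): tr(Sigma) = sum (P)_ii lambda_i^2
lhs17 = sum(Sigma[i][i] for i in range(3))
rhs17 = sum(P[i][i] * lam[i] ** 2 for i in range(3))
# K Sigma K^T recovers Lambda, so (12) holds with sqrt(K_i Sigma K_i^T) = lambda_i.
KSK = mat_mul(mat_mul(K, Sigma), K)
print(abs(lhs16 - rhs16) < 1e-9, abs(lhs17 - rhs17) < 1e-9)
```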

Analysis of Optimization Problem
Theorem 1, based on the two-sided PAC-Bayesian theorem, guarantees that GDW-KR generalizes properly. Here, the generalization of GDW-KR is further analyzed through problem (18) by means of the empirical Rademacher complexity [29].
Definition 2. (empirical Rademacher complexity) Let $G$ be a family of functions mapping from $X$ to $[a, b]$, and let $(x_1, \ldots, x_l)$ be a fixed sample of size $l$ with elements in $X$. Then, the empirical Rademacher complexity of $G$ with respect to $(x_1, \ldots, x_l)$ is defined as:
$$\hat{S}(G) = \mathbb{E}_{\sigma}\left[\sup_{g \in G}\frac{1}{l}\sum_{i=1}^{l}\sigma_i g(x_i)\right],$$
where $\sigma = (\sigma_1, \ldots, \sigma_l)^T$, with the $\sigma_i$ independent uniform random variables taking values in $\{-1, +1\}$ [30].
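For a finite function class, the expectation in Definition 2 can be computed exactly by enumerating all $2^l$ sign vectors. The Python sketch below (hypothetical helper `empirical_rademacher`; each function is represented by its values on the fixed sample) is practical only for small $l$, but it illustrates the definition and the monotonicity $G_1 \subseteq G_2 \Rightarrow \hat{S}(G_1) \le \hat{S}(G_2)$ that the proof of Theorem 2 relies on.

```python
from itertools import product


def empirical_rademacher(G):
    """Exact empirical Rademacher complexity of a finite class.

    G is a list of value vectors (g(x_1), ..., g(x_l)), one per function g.
    The expectation over sigma is taken over all 2^l equally likely sign vectors.
    """
    l = len(G[0])
    total = 0.0
    for sigma in product((-1, 1), repeat=l):
        total += max(sum(s * v for s, v in zip(sigma, g)) / l for g in G)
    return total / 2 ** l


single = [[1.0, 1.0]]                 # one function: its complexity is 0
pair = [[1.0, 1.0], [-1.0, -1.0]]     # a superset of `single`
print(empirical_rademacher(single), empirical_rademacher(pair))
```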
Theorem 2. GDW-KR generalizes properly, provided that a balance is kept between the empirical Rademacher complexity and the fitting error.
Proof. The objective function of problem (18) is rewritten as Equation (20). Suppose the function set $Q_c$ is given by Equation (21), where $c$ is a positive real number, and let $\hat{S}(Q_c)$ denote the empirical Rademacher complexity of $Q_c$. Suppose further that another function set $H_c$ is defined through $\phi$, the feature mapping corresponding to the kernel $k$.
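For any function $\sum_{j=1}^{l}\mu_j k(x, x_j)$ in $Q_c$, letting $\beta = \sum_{j=1}^{l}\mu_j\phi(x_j)$ shows that it also belongs to $H_c$; hence, $H_c$ is a superset of $Q_c$. Based on the derivation in [30], we obtain $\hat{S}(Q_c) \le \hat{S}(H_c)$ together with an upper bound on $\hat{S}(H_c)$ that is proportional to $c$. In view of Equation (21), $c$ can be minimized by minimizing $\mu^TK\mu$, and by the Cauchy-Schwarz inequality, $\mu^TK\mu$ is bounded above by $\mu^T\mu$ times a factor that depends only on $K$. Since the kernel function is predefined, the term $\frac{1}{2}\mu^T\mu$ in Equation (20) therefore reduces the empirical Rademacher complexity of $Q_c$.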
Under the constraints of problem (18), the smaller $\lambda_i$ is, the smaller the fitting error. The remaining term of the objective, which combines $(P)_{ii}\lambda_i^2$ and $-\ln\lambda_i$, prevents $\lambda_i$ from becoming too small or too large, so that the model $\sum_{j=1}^{l}\mu_j k(x^*, x_j)$ neither overfits nor underfits the training data. This term can therefore be regarded as a special loss function. Consequently, proper values of $a$ and $\eta$ keep the empirical Rademacher complexity and the fitting error in balance, and GDW-KR promises a desirable generalization performance. This proves Theorem 2. Theorem 2 shows that balancing the empirical Rademacher complexity and the fitting loss is consistent with the two-sided PAC-Bayesian theorem for GDW-KR.

Solution of Optimization Problem
The results of regularized kernel-based regression are used to obtain an approximate solution of problem (18). Regularized kernel-based regression is formulated as problem (27), where $C$ is the regularization parameter. Let $\bar{\mu}$ be the solution to problem (27); using the KKT conditions, $\bar{\mu}$ is computed analytically by Equation (28). Then, fixing $\mu$ at $\bar{\mu}$ and ignoring the term $\frac{1}{2a}\mu^T\mu$ in the objective function, we rewrite problem (18) as problem (29), which decouples into one problem per $\lambda_i$. The second derivative of the objective function of problem (29) is $\lambda_i^{-2} + (P)_{ii}/a$, which is larger than 0 for $\lambda_i > 0$, so the problem is convex. Let $\bar{\lambda}_i$ be the solution to problem (29); it is determined by Equation (30). Thus, the algorithm consists of the following steps:
Step 1: Make independent, non-duplicate observations $\{(x_i, t_i)\}_{i=1}^{l}$.
Step 2: Select the kernel function, and choose the proper relevant parameter(s).
Step 3: Compute $K^{-1}$ and $P$.
Step 4: Solve problem (27), and let $\bar{\mu}$ be its solution.
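Steps 1 to 4 above, followed by the $\lambda_i$ update, can be sketched end to end in Python (standard library only). Several parts are assumptions rather than the exact method of this paper: the RBF form $\exp(-\|x - z\|^2 / (2\sigma^2))$; a kernel-ridge-style mean $\bar{\mu} = (K + I/C)^{-1} t$ standing in for Equation (28); and a symmetric constraint $|t_i - K_i\bar{\mu}| \le \eta\lambda_i$ standing in for the constraints (11). Under that constraint, the per-sample objective $\frac{(P)_{ii}}{2a}\lambda_i^2 - \ln\lambda_i$ is convex with unconstrained minimum at $\sqrt{a/(P)_{ii}}$, so the minimizer is $\bar{\lambda}_i = \max\left(\sqrt{a/(P)_{ii}},\ |t_i - K_i\bar{\mu}|/\eta\right)$. For a new input with kernel vector $k^*$, the forecast mean is $k^{*T}\bar{\mu}$ and the variance is $k^{*T}\Sigma k^*$ with $\Sigma = K^{-1}\Lambda K^{-1}$ from Equation (14). All function names, data and parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def rbf(x, z, sigma):
    # Assumed RBF form: exp(-||x - z||^2 / (2 sigma^2)).
    return math.exp(-sum((p - q) ** 2 for p, q in zip(x, z)) / (2 * sigma ** 2))


def mat_inv(A):
    """Gauss-Jordan inverse with partial pivoting (A assumed nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def fit_gdw_kr(X, t, sigma, C, a, eta):
    l = len(X)
    K = [[rbf(xi, xj, sigma) for xj in X] for xi in X]               # Step 2
    K_inv = mat_inv(K)                                                # Step 3
    P_diag = [sum(K_inv[i][k] * K_inv[k][i] for k in range(l)) for i in range(l)]
    # Step 4 stand-in (assumption): kernel-ridge mean mu = (K + I/C)^{-1} t.
    A = [[K[i][j] + (1.0 / C if i == j else 0.0) for j in range(l)] for i in range(l)]
    mu = matvec(mat_inv(A), t)
    resid = [abs(ti - fi) for ti, fi in zip(t, matvec(K, mu))]
    # Minimiser of (P)_ii lam^2 / (2a) - ln lam subject to lam >= resid / eta.
    lam = [max(math.sqrt(a / P_diag[i]), resid[i] / eta) for i in range(l)]
    return {"X": X, "sigma": sigma, "K_inv": K_inv, "mu": mu, "lam": lam}


def predict(model, x_new, eta):
    k = [rbf(x_new, xj, model["sigma"]) for xj in model["X"]]
    mean = sum(kj * mj for kj, mj in zip(k, model["mu"]))
    # k^T Sigma k with Sigma = K^{-1} Lambda K^{-1} equals sum lam_i^2 (K^{-1} k)_i^2.
    u = matvec(model["K_inv"], k)
    var = sum((li * ui) ** 2 for li, ui in zip(model["lam"], u))
    half = eta * math.sqrt(var)
    return mean, mean - half, mean + half


# Hypothetical normalised data; parameter values are illustrative only.
X = [[0.0], [0.5], [1.0]]
t = [0.1, 0.6, 0.3]
eta = NormalDist().inv_cdf(0.975)
model = fit_gdw_kr(X, t, sigma=0.3, C=100.0, a=0.001, eta=eta)
print(predict(model, [0.25], eta))
```

At a training input the variance reduces to $\bar{\lambda}_i^2$, so in this sketch each training target falls inside its own interval by construction; at new inputs the width depends on the $\bar{\lambda}_i$ and hence on $a$ and the residuals.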

Kernel Function and Model Selection
The kernel function plays an important role in kernel methods. There are three common types of kernel functions: the linear function, the polynomial function and the radial basis function (RBF). Many practical applications demonstrate that the RBF tends to perform well under general smoothness assumptions. Since no additional knowledge of the data set is available, we adopt the RBF kernel [31], whose width is controlled by the parameter $\sigma$.
Hyper-parameters also strongly affect the generalization performance of kernel methods. Model selection seeks proper values of the hyper-parameters, commonly by means of cross-validation and grid search [32]. The k-fold cross-validation [12,13] partitions the training data into k disjoint subsets of approximately equal size. A series of k models is then trained, each using a different combination of k − 1 subsets. The model selection criterion, such as the mean squared error, is then evaluated for each model on the subset of the data not used to train it. Recently, evolutionary algorithms, such as the genetic algorithm and particle swarm optimization, have been adopted to guide the parameter selection process [33][34][35][36]. Here, a genetic algorithm is used to seek the proper values of $\sigma$ and $C$ for regularized kernel-based regression. An individual in the genetic algorithm represents a possible parameter combination, and its fitness is calculated by k-fold cross-validation.
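To make the model-selection ingredients concrete, here is a Python sketch of an RBF kernel matrix and a k-fold partition. The RBF form $\exp(-\|x - z\|^2 / (2\sigma^2))$ is an assumed parameterization with width $\sigma$, and the helper names and inputs are hypothetical. In the procedure described above, each candidate $(\sigma, C)$ proposed by the genetic algorithm would be scored by training on the union of $k - 1$ folds and evaluating on the held-out fold.

```python
import math


def rbf(x, z, sigma):
    # Assumed RBF form: exp(-||x - z||^2 / (2 sigma^2)); sigma is the width.
    return math.exp(-sum((p - q) ** 2 for p, q in zip(x, z)) / (2 * sigma ** 2))


def kernel_matrix(X, sigma):
    return [[rbf(xi, xj, sigma) for xj in X] for xi in X]


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k disjoint, nearly equal folds.

    Folds are contiguous; shuffle the data first if its order is not random.
    """
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


X = [[0.0, 0.1], [0.4, 0.9], [1.0, 0.5]]   # hypothetical normalised inputs
K = kernel_matrix(X, sigma=0.5)
for fold in k_fold_indices(10, 3):
    train = [i for i in range(10) if i not in fold]
    print("validate on", fold, "train on", train)
```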

Experiments
Experiments were performed to verify the effectiveness of the proposed GDW-KR. The models were built in MATLAB 7.7, and the quadratic programming problems involved were solved with the QP solver of the MATLAB Optimization Toolbox. The experiments were run on a computer with a 3.1-GHz Intel Core i5-3450 processor and 4 GB RAM under 32-bit Windows 7.

Formulation of Product-Design Time Forecast
To validate the proposed method, the design of plastic injection molds is studied. An injection mold is a one-of-a-kind product whose design process is usually driven by customer orders. Injection mold design is involved in many product development projects, so forecasting its design time is meaningful for optimizing the whole product development process.
Factor values of product design time are obtained by the fuzzy measurable house of quality (FM-HOQ) [7]. Suppose that a design order for a kind of injection mold and the specification of the molding product are given. The customer demands should then be analyzed and useful mold characteristics extracted, with the technical customer demands taken into account. Some demands are originally described as quantitative information (e.g., the mold life is 3000 h), while others are expressed as qualitative information (e.g., the molding product precision is high). A unified fuzzy measurement scheme with five linguistic levels is established for all these demands [7], and the importance degrees of these demands are represented by fuzzy weight sets.
For a specific mold design, the designer should specify the grades of membership of demand weights and demand measures. These assignments can be made based on the customer demands given in the design order and on the designer's objective evaluation of the importance and scope of the demands. A survey-based methodology, using self-administered questionnaires collected from several mold companies in Nanjing, is applied to identify engineering characteristics and time factors. Nine engineering characteristics are selected: mold structure, cavity number, wainscot gauge variation, injection pressure, injection capacity, ejector type, runner shape, manufacturing precision and form feature number. A planning FM-HOQ can then be constructed to map and measure these characteristics against the technical demands. The time characteristics with large influencing weights are structure complexity (SC), model difficulty (MD), wainscot gauge variation (WGV), cavity number (CN), mold size (MS) and form feature number (FFN); the first three are expressed as linguistic variables and the last three as numerical ones. Here, the influencing weights, which indicate the degree of influence on product design time, differ from the importance indexes in FM-HOQs. Figure 1 presents the application procedure of our model.

Product-Design Time Forecast Based on GDW-KR
In our experiments, 72 sets of molds with corresponding design times were obtained from a typical company. The detailed characteristic data and design times of these molds compose the corresponding patterns, as shown in Table 1. Numerical variables were normalized to be within [0, 1] by:
$$\bar{x}_i^d = \frac{x_i^d - \min_{1 \le j \le l} x_j^d}{\max_{1 \le j \le l} x_j^d - \min_{1 \le j \le l} x_j^d},$$
where $l$ denotes the number of samples, $d$ indexes the numerical variables, $x_i^d$ is the original value of the $d$th numerical variable for sample $i$, and $\bar{x}_i^d$ is its normalized value. The linguistic variables VL, L, M, H and VH were transformed into the crisp values 0.1, 0.25, 0.5, 0.75 and 0.95, respectively, based on expert knowledge. First of all, $\eta$ should be determined, mainly based on the confidence level at which the forecast interval includes the target. To make this confidence level higher than 95%, $\eta$ should be greater than $1.96 = \Phi^{-1}(1 - (1 - 0.95)/2)$. The value of $\eta$ is therefore set to 1.96, and so is that of $\eta^*$.
The target outputs were normalized to be within [0, 1].
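The preprocessing in this subsection can be sketched in a few lines of Python. The crisp values for VL to VH are those stated above and the min-max formula follows the normalization equation; the treatment of a constant column, the function names and the sample values are hypothetical.

```python
# Crisp values for the linguistic levels, as listed in the text.
LINGUISTIC = {"VL": 0.1, "L": 0.25, "M": 0.5, "H": 0.75, "VH": 0.95}


def min_max(column):
    """Normalise one numerical variable over all l samples to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: map to 0 (hypothetical convention)
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def encode(linguistic_column):
    return [LINGUISTIC[level] for level in linguistic_column]


# Hypothetical values for cavity number (CN) and structure complexity (SC).
cn = min_max([2, 4, 6, 8])
sc = encode(["H", "VL", "M", "VH"])
print(cn, sc)
```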
The root mean square error (RMSE), the mean absolute percentage error (MAPE) and the mean absolute error (MAE) are the three criteria used to evaluate forecast performance:
$$\mathrm{RMSE} = \sqrt{\frac{1}{l}\sum_{i=1}^{l}(t_i - \hat{t}_i)^2}, \quad \mathrm{MAPE} = \frac{1}{l}\sum_{i=1}^{l}\left|\frac{t_i - \hat{t}_i}{t_i}\right|, \quad \mathrm{MAE} = \frac{1}{l}\sum_{i=1}^{l}\left|t_i - \hat{t}_i\right|,$$
where $\hat{t}_i$ is the forecast value for $x_i$. The underlying assumption for using the RMSE is that the errors are unbiased and follow a normal distribution [37]. The MAPE cannot be used if there is a zero value in $\{t_1, \ldots, t_l\}$, and it puts a heavier penalty on negative errors ($t_i < \hat{t}_i$) than on positive errors. The MAE is suitable for uniformly distributed errors. Because model errors are more likely to follow a normal distribution than a uniform distribution, the RMSE is a better criterion than the MAE [37]. Thus, we apply the RMSE as the criterion for optimizing model parameters.
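The three criteria, and the MAPE asymmetry mentioned above, can be checked with this short Python sketch (hypothetical function names; the targets must be non-zero for the MAPE).

```python
import math


def rmse(t, f):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, f)) / len(t))


def mae(t, f):
    return sum(abs(a - b) for a, b in zip(t, f)) / len(t)


def mape(t, f):
    if any(a == 0 for a in t):
        raise ValueError("MAPE is undefined when a target value is zero")
    return sum(abs((a - b) / a) for a, b in zip(t, f)) / len(t)


t, f = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
print(rmse(t, f), mae(t, f), mape(t, f))
# Same absolute error of 1: over-forecast (t < f) versus under-forecast (t > f).
print(mape([1.0], [2.0]), mape([2.0], [1.0]))
```

The last line shows that an over-forecast (negative error) by the same absolute amount produces a larger MAPE than an under-forecast.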
The whole data set is divided into several disjoint subsets. One subset is chosen as the testing set and the others form the training set. A genetic algorithm combined with 5-fold cross-validation is used to seek the optimal parameters that minimize the RMSE on the training set: each individual in the genetic algorithm is evaluated by 5-fold cross-validation on the training set. After the optimal parameters are obtained, the model is trained on the whole training set, and the forecast values and the three criteria are calculated for the testing set. This procedure is repeated until each subset has been used once as the testing set, and the testing results are averaged over the disjoint testing sets, which cover the entire data set. The selection ranges of $\sigma$ and $C$ are $[0.01, 5]$ and $[0.01, 10^6]$, respectively, and the value of $a$ is selected from $[10^{-6}, 10^6]$.
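The evaluation protocol above is a nested cross-validation. The Python sketch below mirrors its structure under stated simplifications: an exhaustive search over a hypothetical list of candidate parameters stands in for the genetic algorithm, folds are contiguous (shuffle first if the data order is not random), and `fit_predict` is a hypothetical callback that trains a model with the given parameters and returns forecasts.

```python
import math


def folds(n, k):
    """Contiguous disjoint folds of indices 0..n-1."""
    base, extra = divmod(n, k)
    out, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        out.append(list(range(start, start + size)))
        start += size
    return out


def rmse(t, f):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, f)) / len(t))


def cv_rmse(X, t, k, params, fit_predict):
    """Inner loop: mean validation RMSE of one parameter setting."""
    scores = []
    for val in folds(len(X), k):
        held = set(val)
        tr = [i for i in range(len(X)) if i not in held]
        pred = fit_predict([X[i] for i in tr], [t[i] for i in tr], [X[i] for i in val], params)
        scores.append(rmse([t[i] for i in val], pred))
    return sum(scores) / len(scores)


def nested_evaluation(X, t, outer_k, inner_k, candidates, fit_predict):
    """Outer loop: average testing RMSE over outer_k disjoint testing subsets."""
    test_scores = []
    for test in folds(len(X), outer_k):
        held = set(test)
        tr = [i for i in range(len(X)) if i not in held]
        X_tr, t_tr = [X[i] for i in tr], [t[i] for i in tr]
        # Parameter search on the training part only (exhaustive stand-in for the GA).
        best = min(candidates, key=lambda p: cv_rmse(X_tr, t_tr, inner_k, p, fit_predict))
        pred = fit_predict(X_tr, t_tr, [X[i] for i in test], best)
        test_scores.append(rmse([t[i] for i in test], pred))
    return sum(test_scores) / len(test_scores)


# Hypothetical usage: a predictor that forecasts p times the training mean.
def shrunk_mean(X_train, t_train, X_test, p):
    return [p * sum(t_train) / len(t_train)] * len(X_test)


X = [[float(i)] for i in range(12)]
t = [1.0] * 12
print(nested_evaluation(X, t, outer_k=6, inner_k=5, candidates=[0.5, 1.0], fit_predict=shrunk_mean))
```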
The whole data set is first divided into six disjoint subsets. When subset 6 is used as the testing set, the optimal parameters of regularized kernel-based regression are $\sigma = 2.119$ and $C = 998746.999$, and the optimal parameter of GDW-KR is $a = 910.190$. As illustrated in Figure 2, GDW-KR gives valid forecast intervals for all testing samples except T1 and T10. In T10, the forecast interval fails to cover the corresponding target value; in T1, the interval is too wide to provide useful information.
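Interval validity, as judged in this paragraph, has two sides: an interval must cover its target (T10 fails this) and must be narrow enough to be informative (T1 fails this). A small Python sketch of both checks follows; the width threshold `max_width`, the function name and the numbers are hypothetical, since the text gives no numeric criterion for an interval being too wide.

```python
def interval_report(targets, lower, upper, max_width):
    """Coverage rate of forecast intervals plus indices of misses and overly wide intervals.

    max_width is a hypothetical threshold for an interval too wide to be useful.
    """
    n = len(targets)
    misses = [i for i in range(n) if not lower[i] <= targets[i] <= upper[i]]
    too_wide = [i for i in range(n) if upper[i] - lower[i] > max_width]
    coverage = (n - len(misses)) / n
    return coverage, misses, too_wide


# Illustrative numbers only (not the values behind Figure 2).
targets = [0.20, 0.50, 0.90]
lower = [-0.60, 0.40, 0.95]
upper = [1.00, 0.60, 1.10]
print(interval_report(targets, lower, upper, max_width=1.0))
```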
Actual forecast values are listed in Table 2 for comparison of the models. The RMSE, the MAPE, the MAE and the average testing time are used to compare the forecast performance of the different models. Here, the testing time is the time spent on solving the optimization problem and obtaining the testing results once the hyper-parameters are given. Table 3 shows the results of the four forecast models: GDW-KR achieves precision comparable to that of the other models while also generating forecast intervals, which can facilitate product development planning. The whole data set is then divided into four disjoint subsets. Figure 3 illustrates the results of GDW-KR from the first 54 training samples and shows that GDW-KR still performs well. Table 4 shows the error statistics of the four forecast models. These results indicate that GDW-KR provides satisfactory performance with small samples and is therefore appropriate for small-sample cases.

Extended Application of GDW-KR
Besides design time forecasting, GDW-KR can be extended to other regression problems with small samples. The Concrete Slump Test dataset, the Machine CPU dataset and the Yacht Hydrodynamics dataset, all from the UCI repository [38], are used to evaluate this extended application. Because these datasets contain no fuzzy variables, Fv-SVM behaves the same as ν-SVR on them, so the results of Fv-SVM are not presented. Each dataset is divided into six disjoint subsets. In our experiments, both the target output and the numerical attributes were normalized to be within [0, 1].
The Concrete Slump Test dataset has seven input variables, three output variables and 103 data points. The 28-day compressive strength is taken as the desired output variable, and the results of GDW-KR are compared with those of the other two models. The Concrete Slump Test results are shown in Figure 4, and the three error indices of the three models are given in Table 5. On this dataset, GDW-KR offers forecast values with high accuracy and forecast intervals with good validity.
For the Machine CPU and Yacht Hydrodynamics datasets, the error statistics of the three forecast models are presented in Tables 6 and 7, respectively. Figures 5 and 6 show the forecast results when subset 6 is used as the testing set.

Conclusions
Control and decision-making in product development depend on how reasonable the assumed distribution of product design time is. In design time forecasting, the problems of small samples and heteroscedastic noise must be considered.
This paper has presented a new model of kernel-based regression with Gaussian distribution weights for product-design time forecasts, which combines Gaussian margin machines with kernel-based regression.The kernel method performs well for the problem of small samples.Unlike GMR, which assumes that the covariance matrix of the forecast values in the training set is an identity matrix multiplied by a positive scalar, GDW-KR assumes that this matrix is a positive definite diagonal matrix.GDW-KR is more suitable for addressing the problem of heteroscedastic noise than GMR, and has the advantage of providing both point forecasts and confidence intervals simultaneously.
Plastic injection mold design was studied before modeling. For a convincing evaluation, experiments with 72 real samples were conducted. The results verify that GDW-KR achieves forecast accuracy as high as that of Fv-SVM and ν-SVR while also providing forecast intervals, which are crucial to the control and decision-making of product development. GDW-KR clearly benefits from the merits of Gaussian margin machines.

Figure 1 .
Figure 1.The application procedure of the GDW-KR model.

Figure 2 .
Figure 2. Testing results of GDW-KR when using subset 6 as the testing set.

Figure 3 .
Figure 3. Testing results of GDW-KR from the first 54 training samples.

Figure 4 .
Figure 4. Concrete slump test results of GDW-KR when using subset 6 as the testing set.

Figure 5 .
Figure 5. Machine CPU results of GDW-KR when using subset 6 as the testing set.

Figure 6 .
Figure 6. Yacht Hydrodynamics results of GDW-KR when using subset 6 as the testing set.

Table 1 .
Training and testing data of injection mold design.

Table 2 .
Forecast results from four different models when using subset 6 as the testing set.

Table 3 .
Error statistics of four forecast models.

Table 4 .
Error statistics of four forecast models from 54 training samples.

Table 5 .
Error statistics of three forecast models on the Slump Test dataset.

Table 6 .
Error statistics of three forecast models on the Machine CPU dataset.

Table 7 .
Error statistics of three forecast models on the Yacht Hydrodynamics dataset.
