Radial Basis Function Cascade Correlation Networks

A cascade correlation learning architecture has been devised for the first time for radial basis function processing units. The proposed algorithm was evaluated with two synthetic data sets and two chemical data sets by comparison with six other standard classifiers. The abilities to detect a novel class and to model an imbalanced class were demonstrated with synthetic data. In the chemical data sets, the growth regions of Italian olive oils were identified by their fatty acid profiles, and mass spectra of polychlorobiphenyl compounds were classified by chlorine number. The prediction results by bootstrap Latin partition indicate that the proposed neural network is useful for pattern recognition.


Introduction
Artificial neural networks (ANNs) are widely used pattern recognition tools in chemometrics. The most commonly used neural network for chemists is the back-propagation neural network (BNN). The BNN is a feed-forward neural network, usually trained by error back-propagation [1,2]. BNNs have been applied to a broad range of chemical applications. Recent analytical applications of BNNs in fields such as differential mobility spectrometry [3] and near infrared spectroscopy [4] have been reported in the literature.
BNNs have proven to be a useful type of ANN in chemometrics. However, BNNs converge slowly during training, especially when the network contains many hidden neurons. This slow and chaotic convergence is partially caused by the simultaneous adjustment of the weights of all hidden neurons during the training of BNNs, which is referred to as the "moving target problem". To avoid this problem, a network architecture named the cascade correlation network (CCN) was proposed by Fahlman and Lebiere [5]. A CCN begins its training with a minimal network, which has only an input layer and an output layer. During training, the CCN determines its topology by adding and training one hidden neuron at a time, resulting in a multilayer structure. In this training strategy, the moving target problem is avoided because only the weights of a single hidden neuron are allowed to change at any time. CCNs have been applied to the prediction of protein secondary structure [6] and the estimation of various ion concentrations in river water for water quality monitoring [7].
A temperature constrained cascade correlation network (TCCCN) [8], which combines the advantages of cascade correlation and computational temperature constraints, was devised to provide reproducible models. By modifying the sigmoid transfer function, a temperature term is added to constrain the length of the weight vector in the hidden transfer function. The temperature is adjusted so that the magnitude of the first derivative of the covariance between the output and the residual error is maximized. As a result, fast training can be achieved because of the large weight gradient. TCCCNs have been successfully applied to many areas in analytical chemistry, such as identification of toxic industrial chemicals by their ion mobility spectra [9], classification of official and unofficial rhubarb samples based on their infrared reflectance spectra [10], and prediction of substructure and toxicity of pesticides from low-resolution mass spectra [11].
Besides BNNs and CCNs, the radial basis function network (RBFN) is another important type of neural network. A RBFN is a three-layered feed-forward network, which applies a radial basis function (RBF) as its hidden layer transfer function. The most commonly applied RBF is the Gaussian function. The number, centroids, and radii of the hidden units of a RBFN can be determined in different ways, such as random generation, clustering, and genetic algorithms. The RBFN can also be trained by back-propagation. Wan and Harrington developed a self-configuring radial basis function network (SCRBFN) [12]. In a SCRBFN, a linear averaging (LA) clustering algorithm is applied to determine the parameters of the hidden units. Class memberships of the training objects are used during clustering in the LA algorithm.
Recently, many novel supervised learning methods have gained increasing popularity, such as the support vector machine (SVM) and random forest (RF). The SVM was introduced by Vapnik [13]. The SVM first maps the training data into a high-dimensional feature space by using kernel functions. An optimal linear decision hyperplane is then determined by maximizing the margin between the objects of two classes. The RF method was developed by Breiman [14]. It is derived from the decision tree algorithm. During RF training, many decision trees are trained by ensemble learning techniques. The classification result is then calculated by voting among all the trees built.
A radial basis function cascade correlation network (RBFCCN) that combines the advantages of CCNs and RBFNs was devised in the present work. The RBFCCN benefits from the RBF as the hidden transfer function instead of the commonly used sigmoid logistic function. The RBFCCN also has a cascade-correlation structure. The network performance was tested using both synthetic and actual chemical data sets. Partial least squares-discriminant analysis (PLS-DA) was also tested as a standard reference method; the theory of PLS-DA can be found in the literature [15,16]. Comparisons were made with the BNN, RBFN, SCRBFN, PLS-DA, SVM, and RF methods. Two synthetic data sets, a novel-class detection data set and an imbalanced data set, and two chemical data sets, an Italian olive oil data set and a polychlorobiphenyl (PCB) data set, were evaluated. The bootstrap Latin partitions (BLPs) [17] validation method was used in this study.

Theory
The network architectures of a RBFN and a RBFCCN are given in Figures 1 and 2, respectively. By applying the cascade correlation algorithm, the RBFCCN has a different network topology compared with conventional RBFNs. In RBFCCNs, the transfer function applied in the hidden neurons is the Gaussian function. Unlike a RBFN, which usually has only one hidden layer, the RBFCCN has a multilayered structure. Each hidden layer contains only one neuron. In RBFCCNs, the kth hidden neuron is connected with k + l − 1 inputs, where l denotes the number of input neurons. The output of the ith object from the kth hidden neuron, o_ik, is given by:

o_ik = g_k(x_ik) = exp( −Σ_p (x_ikp − μ_kp)² / (2σ_k²) )    (1)

for which g_k denotes the Gaussian function; x_ik is the input vector, and x_ikp is the corresponding pth element of x_ik. The μ_kp term denotes the pth element of the centroid μ_k, and σ_k denotes the kth radius. The o_ik term depends on two factors: the Euclidean distance between the sample and the centroid, and the radius. In the cascade-correlation training architecture, the hidden units are added and trained sequentially during training.
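As a reading of the hidden-unit equation above, here is a minimal sketch of the Gaussian response; the function name and the 2σ² form of the exponent are assumptions of this illustration, not code from the paper:

```python
import numpy as np

def rbf_hidden_output(x, mu, sigma):
    # Gaussian response: maximal (unity) at the centroid, decaying with
    # the squared Euclidean distance scaled by the radius sigma.
    d2 = np.sum((np.asarray(x, float) - np.asarray(mu, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

An object at the centroid excites the unit fully (output 1.0); objects far from the centroid produce outputs near zero, which is what later enables novel-class detection.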

Initialize the RBFCCN
The RBFCCN initialization is given in Figure 3. The RBFCCN begins its training with a minimal network, which has only an input layer and an output layer. The number of input neurons l is equal to the number of variables of the data set. The number of output neurons n is equal to the number of classes in the training set. The neurons in the output layer are linear.
In this work, binary coding is used to determine the training target value.Each class has a corresponding binary sequence of unity or zero in which an element of unity indicates the identity of the object's class membership.For example, the output vector for objects belonging to the second class in a training set of four classes will be encoded (0, 1, 0, 0) as the training target value, i.e., the desired output vector of the trained network model is (0, 1, 0, 0).
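The binary coding scheme can be sketched as follows; the helper name and the 1-based class labels are illustrative assumptions:

```python
import numpy as np

def binary_targets(class_labels, n_classes):
    # One-of-n ("binary") coding: row i holds a 1 in the column of object i's class.
    labels = np.asarray(class_labels, int)
    Y = np.zeros((labels.size, n_classes))
    Y[np.arange(labels.size), labels - 1] = 1.0  # classes numbered 1..n
    return Y
```

For a four-class training set, an object of the second class is encoded as (0, 1, 0, 0), matching the example above.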

Add and initialize a hidden neuron
Figures 4 and 5 demonstrate adding the first and second hidden neurons to the RBFCCN, respectively. Unlike the CCN, which adds and trains a pool of candidate neurons, the RBFCCN adds and trains only one hidden neuron at a time because the initialization method applied in the RBFCCN is deterministic.
The trained neuron of the RBFCCN is unique. Once the kth hidden neuron is added to the RBFCCN, the centroid μ_k is initialized with the mean vector of the target objects, and the initial radius σ_k is given by the mean of the standard deviations of the target objects. The target objects of the kth hidden neuron are the training objects from the t_k th training class. When k ≤ n, for which n denotes the number of training classes, t_k = k. When k > n, t_k is the class that contains the maximum total residual error among all training classes. According to the central limit theorem, it is assumed that objects from the same class tend to be normally distributed in the input space, so the Gaussian function can represent a class of objects in the input space. The initial hidden units represent clusters of the training data, just as LA clustering does. This initialization method has an advantage over random initialization in that the initial values are fixed, so the network converges faster.
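A sketch of this deterministic initialization; the interpretation of the radius as the mean of the per-variable standard deviations, and the sample-variance `ddof=1`, are assumptions of this illustration:

```python
import numpy as np

def init_hidden_unit(X, classes, target_class):
    # Centroid: mean vector of the target-class objects.
    # Radius: mean of the standard deviations of those objects.
    Xt = np.asarray(X, float)[np.asarray(classes) == target_class]
    mu = Xt.mean(axis=0)
    sigma = Xt.std(axis=0, ddof=1).mean()
    return mu, sigma
```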

Train the hidden neuron
The training strategy of the hidden neuron is adopted from that of the CCN. After initialization, the centroid μ_k and radius σ_k are trained by maximizing the covariance between the output and the target value of the hidden unit with an appropriate optimization algorithm. The covariance C_k for the kth hidden unit is given by:

C_k = | Σ_{i=1}^{m} (o_ik − ō_k)(y_ik − ȳ_k) |    (2)

for which o_ik is the output of the ith observation from the kth hidden neuron and ō_k is its mean over the training objects; y_ik is the corresponding target value and ȳ_k is its mean; m is the total number of training objects. Once a hidden neuron is trained, its centroid and radius remain unchanged for the rest of the network training process.
Instead of using all training objects as targets, only objects in the t_k th training class are selected as the target for the training of the hidden neuron, for which t_k is the target class membership used in initializing the kth hidden neuron. As a result, the target value y_ik of the ith object and the kth hidden neuron is given by:

y_ik = 1 if c_i = t_k, and y_ik = 0 otherwise    (3)

where c_i is the class membership of the ith training object.
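The target selection and the covariance objective can be sketched together; the covariance form (magnitude of the mean-centered cross product, as in the CCN) is a reconstruction for illustration:

```python
import numpy as np

def hidden_targets(classes, t_k):
    # Target is 1 for objects of the target class t_k, 0 otherwise.
    return (np.asarray(classes) == t_k).astype(float)

def hidden_unit_covariance(o, y):
    # Magnitude of the covariance between hidden-unit outputs and targets;
    # mu_k and sigma_k are adjusted to maximize this quantity.
    o = np.asarray(o, float)
    y = np.asarray(y, float)
    return abs(np.sum((o - o.mean()) * (y - y.mean())))
```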

Train the weights in the output layer
The weights in the output layer are recalculated and stored after each hidden neuron is added and trained. As in TCCCNs, the input units do not connect to the output units directly. The predicted value Ŷ_k of the network with k hidden neurons is calculated as the product of the output matrix O_k and the weight matrix W_k:

Ŷ_k = O_k W_k    (4)

for which O_k is the output matrix of the hidden neurons. The matrix is augmented with a column of ones, which allows a bias value to be calculated. Therefore, the output matrix O_k has m rows and k + 1 columns, for which m denotes the total number of training objects. The weight matrix W_k stores the weight vectors of the output layer. The W_k matrix has k + 1 rows and n columns, for which n denotes the number of classes of the training objects. Singular value decomposition (SVD) is applied to determine the values of the weight vectors. The SVD of O_k is given by:

O_k = U_k S_k V_k^T    (5)

in which U_k and V_k are eigenvectors that respectively span the column and row spaces of the O_k matrix, and S_k is the singular value matrix. By using SVD, the pseudoinverse of O_k can be computed as O_k⁺ = V_k S_k⁻¹ U_k^T. According to Eq. 4, W_k is given by:

W_k = O_k⁺ Y    (6)

for which Y is the target value matrix of the whole training set.
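The output-layer solution can be sketched with NumPy's pseudoinverse, which is itself SVD-based, like the MATLAB `pinv` the authors used; the function name is illustrative:

```python
import numpy as np

def train_output_weights(O_hidden, Y):
    # Augment the hidden-unit output matrix with a column of ones (bias),
    # then solve the least-squares weights W_k = pinv(O_k) Y.
    O_hidden = np.atleast_2d(np.asarray(O_hidden, float))
    Ok = np.hstack([O_hidden, np.ones((O_hidden.shape[0], 1))])
    Wk = np.linalg.pinv(Ok) @ np.asarray(Y, float)
    return Ok, Wk
```

The network prediction is then `Y_hat = Ok @ Wk`.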

Evaluate the stopping condition of RBFCCN
The RBFCCN can be trained until a given number of hidden units have been added and trained, or until a given error threshold is achieved. The relative root mean square error of calibration (RRMSEC) was used in this work. The RRMSEC is given by:

RRMSEC = sqrt( Σ_{i=1}^{m} Σ_{j=1}^{n} (y_ij − ŷ_ij)² / Σ_{i=1}^{m} Σ_{j=1}^{n} (y_ij − ȳ_j)² )    (7)

for which m is the total number of training objects, n is the number of classes, y_ij is the target value for the ith object and class j, ŷ_ij is the network model output for object i and class j, and ȳ_j is the average target value for class j. To give a relative metric, the standard error of calibration is corrected by the standard deviation of the targets. By applying RRMSEC thresholds, the experimental results depend only on the different network topologies. Different training algorithms, such as QuickProp, Rprop, and the Bayesian approach, affect the convergence time but achieve equivalent classification accuracies for the training sets. Figure 6 shows the RRMSEC with respect to the number of hidden units trained by the RBFCCN. The RRMSEC thresholds were determined by training a RBFCCN model using one training data set from the bootstrapped Latin partition until the RRMSEC was no longer significantly improved. Once the RRMSEC threshold was determined, it was applied to train all the other neural networks. Of course, this method is biased in favor of the RBFCCN, but it is required so that all the reference classifiers are trained to the same training-set performance. The primary goal of this research is to compare the prediction accuracies and the abilities of the different classifiers to generalize when trained to similar classification accuracies. Because the training methods of the diverse set of classifiers used for comparison are inherently different, it is important to note that the RRMSEC threshold is only applied to train the network models to the same classification accuracy for the training sets.
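A sketch of the RRMSEC under the reading above (the residual sum of squares relative to the targets' deviation from their class-mean target values; this normalization is a reconstruction):

```python
import numpy as np

def rrmsec(Y, Y_hat):
    # Root of the summed squared residuals divided by the summed squared
    # deviations of the targets from their per-class mean target values.
    Y = np.asarray(Y, float)
    Y_hat = np.asarray(Y_hat, float)
    num = np.sum((Y - Y_hat) ** 2)
    den = np.sum((Y - Y.mean(axis=0)) ** 2)
    return np.sqrt(num / den)
```

A perfect calibration gives an RRMSEC of 0, while a model that only predicts the class means gives 1, which is why a fixed threshold makes the different classifiers comparable.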

Identify the class membership
The class membership of an object is determined from its corresponding output vector from the network model using the following strategies. When all the outputs are below a given threshold, the object is labeled as unknown. The threshold was 0.5 in this study. Otherwise, the class is determined by the winner-take-all method, in which the unknown object is classified by the index of the maximum element in the output vector. The SVM and RF have their own novel class evaluation procedures, which are not discussed in this paper. Therefore, the SVM and RF methods were excluded from the novel class evaluation.
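The decision rule can be sketched as follows; returning 0 for "unknown" is an arbitrary convention of this illustration:

```python
import numpy as np

def assign_class(output, threshold=0.5):
    # If every output is below the threshold, the object is novel/unknown;
    # otherwise winner-take-all picks the class of the maximum output.
    output = np.asarray(output, float)
    if output.max() < threshold:
        return 0  # unknown
    return int(np.argmax(output)) + 1  # classes numbered 1..n
```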

Advantages of RBFCCN
The RBFCCN offers several advantages. The cascade-correlation architecture has the ability of incremental learning. The term incremental learning means that the network builds its topology during training by adding and training one hidden unit at a time. The first advantage is that the incremental learning ability avoids the moving target problem of the BNN, so the network converges rapidly. Second, by training to a threshold of residual error, the cascade-correlation architecture does not require the number of hidden units in the network to be determined before training. Third, multiple networks can be obtained by training only once; these are the networks with hidden units ranging from one to the total number of hidden units added to the cascade-correlation network. Fourth, by using RBF transfer functions, RBFCCNs are suitable for performing novel class evaluation, i.e., identifying unknown data or outliers in a data set.

General information
All calculations were performed on an AMD Athlon XP 3000+ personal computer running the Microsoft Windows XP SP3 operating system. The programs were in-house scripts written in MATLAB version 7.5, except for the analysis of variance (ANOVA), SVM, and RF. ANOVA was performed in Microsoft Excel version 12.0. The SVM calculations were performed with the LIBSVM software version 2.89 with the MATLAB interface [18]. The RF program was obtained from reference [19]. The training of the RBFCCN was implemented through the fminbnd and fminunc functions with their default parameters from the Optimization Toolbox version 3.1.2 of MATLAB. The fminunc function uses the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method with a cubic line search procedure. The fminbnd function is based on the golden section search and parabolic interpolation algorithm. In the RBFCCN, RBFN, and SCRBFN, the weights of the output neurons were updated by the SVD algorithm. The SVD algorithm was implemented by the MATLAB function pinv. All ANNs and PLS-DA applied binary coding to determine the classes from the outputs.
Instead of training the neural networks to achieve the minimum error of an external validation set, all the neural networks (the BNNs, SCRBFNs, RBFNs, and RBFCCNs) compared in this work were trained to a given RRMSEC for each data set. All the neural networks and PLS-DA applied the binary coding method to set the training target values, and the method to identify the class membership stated above. The BNNs used in this work consist of three layers: one input layer, one hidden layer, and one output layer. Sigmoid neurons were used in the hidden layer, and the output layer was linear. The two-stage training method of the RBFN was applied. The centroids and radii of the RBFN were initialized by K-means clustering and optimized by back-propagation. The centroid of the kth hidden neuron μ_k was initialized by the mean of the objects in the kth cluster, and the radius of the kth hidden neuron σ_k was initialized by:

σ_k = sqrt( (1/3) Σ_{q=1}^{3} ||μ_k − μ_q||² )    (8)

for which the μ_q are the three nearest neighboring centroids of μ_k. The details of this method are described in reference [20]. In the SCRBFNs, the parameter λ in the linear averaging clustering algorithm was adjusted gradually to achieve the RRMSEC.
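A sketch of the nearest-neighbor radius heuristic described above; the root-mean-square form over the three nearest centroids follows the common heuristic attributed to reference [20], and details may differ:

```python
import numpy as np

def init_rbfn_radius(centroids, k, p=3):
    # sigma_k: RMS distance from centroid k to its p nearest neighbor centroids.
    C = np.asarray(centroids, float)
    d2 = np.sum((C - C[k]) ** 2, axis=1)
    nearest = np.sort(d2)[1:p + 1]  # skip the zero self-distance
    return np.sqrt(nearest.mean())
```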
For the RBFN models, the number of hidden neurons h equals the number of training classes. For the BNN models, h is empirically proposed by:

h = round( sqrt(l·n) )    (9)

for which l denotes the number of variables of the data set, n denotes the number of classes of the training objects, and round denotes rounding to the closest integer. Because the two synthetic data sets were relatively simple, with few variables and classes, h was fixed without further evaluation. To demonstrate that this number of hidden neurons was appropriate, independent tests were performed by evaluating BNNs on the two chemical data sets with 0.5h and 2h hidden neurons, so that the network performance could be observed when the hidden layer size was significantly decreased or increased. Table 1 gives the average prediction accuracies of the BNN models for the Italian olive oil data set and the training set of the PCB data set. For the Italian olive oil data set, the BNN models with 7 and 14 hidden neurons did not significantly differ with respect to prediction accuracy. The BNN models with four hidden neurons had too few hidden units to model the data sufficiently. For the PCB data set, the effect of the three different numbers of hidden neurons on the prediction results was not significant. BNNs with extra hidden neurons will not overfit the data if trained to the same RRMSEC. As a result, the heuristic equation for h was appropriate. To determine the learning rates and momenta of the BNNs and RBFNs, these networks were trained with three different sets of learning rates and momenta with BLPs. The number of bootstraps was 30 and the number of partitions was two. Table 2 gives the prediction results for the Italian olive oil data set and the training set of the PCB data set. The training parameters of the back-propagation networks did not significantly affect the comparison of the modeling methods. These sets of learning parameters were also trained on the two synthetic data sets, and the same results were obtained. For each data set, there was no statistical difference between the BNN and RBFN prediction results at a 95% confidence interval by two-way ANOVA with interaction. Therefore, the learning rates and momenta were fixed at 0.001 and 0.5, respectively, for all further evaluations. The PLS-DA was implemented by the nonlinear iterative partial least squares (NIPALS) algorithm. The number of latent variables was determined by minimizing the root mean squared prediction error in each test. As a result, the PLS-DA was a biased reference method. The numbers of latent variables in the PLS-DA models may vary between runs.
All the SVMs used the Gaussian RBF as their kernel function. Two SVM parameters, the cost c and the RBF kernel parameter γ, must be adjusted before each prediction. A grid search over parameter pairs (c, γ), in which c = 2^i, i = −2, −1, 0, …, 20, and γ = 2^j, j = −10, −9, −8, …, 10, was performed to determine their values by achieving the best training accuracies. The defaults of the remaining parameters were used. Because the result of the RF algorithm is not sensitive to the parameters selected, 1,000 trees with the default setting of the number of variables to split on at each node were used in all evaluations.
The BLP method generates precision measures of the classification. Bootstrapping is a method that resamples the data. Latin partitioning is a modified cross-validation method in which the class distributions are maintained at constant proportions between the entire data set and the randomized splits into training and prediction sets. After the data set was partitioned during each bootstrap, it was evaluated by all the modeling methods in the study. Because bootstrapping runs the evaluation repeatedly, confidence intervals of the prediction errors can also be obtained. The number of bootstraps was 30 and the number of partitions was two for evaluating all the data sets in this study. The results are reported as prediction accuracy, which is the percentage of correctly predicted objects. To determine the classification ability of the RBFCCN, four data sets were tested: the novel class data set, the imbalanced data set, the Italian olive oil data set, and the PCB data set. The numbers of variables, objects, and classes of the data sets are given in Table 3. The modeling parameters of the ANNs, PLS-DA, SVM, and RF methods are given in Table 4. Similar to the latent variables used in the PLS-DA models, the numbers of hidden neurons used to train the SCRBFN and RBFCCN models may vary between runs. Therefore, only typical numbers of latent variables and hidden neurons are reported.
a This number is the number of variables after the modulo method of preprocessing.
b The PCB congeners that contain 0, 1, 9 and 10 chlorine atoms were considered as one class.
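The Latin-partition idea, constant class proportions between the full data set and each split, can be sketched as follows (the function and its conventions are illustrative assumptions):

```python
import numpy as np

def latin_partition(classes, n_partitions, rng):
    # Within each class, shuffle the objects and deal them out to the
    # partitions in turn, preserving class proportions in every partition.
    classes = np.asarray(classes)
    part = np.empty(classes.size, int)
    for c in np.unique(classes):
        idx = np.flatnonzero(classes == c)
        rng.shuffle(idx)
        part[idx] = np.arange(idx.size) % n_partitions
    return part
```

Bootstrapping repeats this randomized split (30 times in this work) so that confidence intervals of the prediction accuracy can be computed.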

Detection of a novel class using a synthetic data set
This synthetic data set was designed to test the abilities of the BNN, RBFN, and RBFCCN to respond to a novel class during prediction. The training set comprised two variables and four classes. Each training class and the test set had 100 objects. Each class was normally distributed with means of (0.0, 0.0), (40.0, 0.0), (0.0, 40.0), and (40.0, 40.0), respectively, and a standard deviation of 1.5. The test objects were distributed about a mean of (20.0, 20.0) with a standard deviation of 1.5. The networks were trained repeatedly 30 times on this data set to obtain statistically reliable results.
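The synthetic novel-class data described above can be generated as follows (a sketch; the helper name and random generator are arbitrary choices of this illustration):

```python
import numpy as np

def make_novel_class_data(rng, n_per_class=100, sd=1.5):
    # Four training classes at the corners of a square, and a test class
    # centered at (20, 20) that is unlike any training class.
    means = [(0.0, 0.0), (40.0, 0.0), (0.0, 40.0), (40.0, 40.0)]
    X = np.vstack([rng.normal(m, sd, size=(n_per_class, 2)) for m in means])
    y = np.repeat(np.arange(1, 5), n_per_class)
    X_test = rng.normal((20.0, 20.0), sd, size=(n_per_class, 2))
    return X, y, X_test
```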

Synthetic imbalanced data set
An imbalanced data set is one in which the numbers of objects are not equal in each class. This data set was designed to compare the performances of the methods when the data set is highly imbalanced. The data set had two variables. The training set comprised three normally distributed classes. Two classes were majority classes, which had 300 objects each, distributed with means of (3.0, 0.0) and (−3.0, 0.0) and with standard deviations of unity. The other training class was the minority class, which had only 10 objects distributed about a mean of (0.0, 0.0) with a standard deviation of 0.1. The test class had the same distribution as the minority training class. The ANNs were trained to an RRMSEC threshold of 0.2. The network performances were evaluated by predicting the minority class in the training set. All modeling methods were reconstructed 30 times to obtain statistically reliable results.

Italian olive oil data set
The Italian olive oil data were obtained from references [21,22]. This data set is a well-studied standard reference data set. Different source regions of Italian olive oil were classified by the profile of eight different fatty acids. To minimize the effect of class imbalance and obtain fair comparison results, objects from smaller classes that had fewer than 50 objects were removed from the evaluation data. The number of classes was six. Each variable in the training sets was scaled between 0 and 1. The variables of the test sets in each Latin partition were scaled using the range acquired from the training set to obtain unbiased results. The training RRMSEC threshold was 0.4.

PCB data set
In the PCB data set, PCB congeners with different numbers of chlorine atoms were classified by their electron ionization mass spectra. The data set was used previously [8,12]. The mass spectra were obtained from reference [23]. These spectra were split into a training set and an external validation set. The PCB congeners in the training set contained 2 to 8 chlorine atoms. Most of the PCB congeners have duplicate spectra of variable quality. Among these duplicate spectra, the one with the lowest record number was selected as the training spectrum, because it was the spectrum of highest quality. The PCB congeners in the external validation set contained 0 to 10 chlorine atoms. The external validation set was built from the remaining duplicate spectra, PCB congener classes that had fewer than 10 objects, and 27 non-PCB compounds. The congeners that contain 0, 1, 9, and 10 chlorine atoms were uniquely different from any of the training classes. The external validation set contained 45 unique spectra.
Each spectrum was centered by its mean and normalized to unit vector length. The spectra were transformed to a unit mass-to-charge ratio scale that ranged from 50 to 550 Th, and any peaks outside this range were excluded. Because the raw data were underdetermined, i.e., there were more variables than objects, the dimensions of the PCB data set were further reduced by using the modulo method of preprocessing [24,25]. This compression method is especially effective for mass spectral data. Based on a previous study [8] by principal component analysis (PCA), a divisor value of 18 was chosen. The compressed spectra were centered about their mean and normalized to unit vector length. The training RRMSEC threshold was 0.1.
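The modulo compression can be sketched as follows; the exact convention of references [24,25] (e.g., summing versus averaging the intensities within a residue) is an assumption of this illustration:

```python
import numpy as np

def modulo_compress(intensities, mz_values, divisor=18):
    # Intensities whose integer m/z values share the same residue modulo
    # `divisor` are summed, reducing each spectrum to `divisor` variables.
    compressed = np.zeros(divisor)
    for mz, inten in zip(np.asarray(mz_values, int), np.asarray(intensities, float)):
        compressed[mz % divisor] += inten
    return compressed
```

With a divisor of 18, a 501-point spectrum (m/z 50 to 550) collapses to 18 variables, which resolves the underdetermined raw data.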

Detection of a novel class using a synthetic data set
The bivariate plot of the synthetic data set is given in Figure 7. The response surface of the BNN is given in Figure 8. The RBFN and RBFCCN networks have similar response surfaces, which are given in Figure 9. For each sampling point, the maximum of the output neurons is plotted. Because of the different shapes and properties of the sigmoid function and the Gaussian function, these networks have unique response surfaces. The BNN model gave an open, sigmoid-shaped response surface that divides the output space into regions that correspond to the four classes. When the BNN model extrapolates outside the region defined by the data objects, the response can be larger than unity, which occurs when the output units are linear. Alternatively, the RBFCCN and RBFN had Gaussian-shaped response surfaces with a finite span of the output space, which is closed and compact. The maximum response of the RBFCCN is unity.
The test set was designed to be uniquely different from the data in the training set. The ideal prediction result for these test objects would be no excitation of any of the output neurons, i.e., outputs of (0, 0, 0, 0). Because in most cases one output element was larger than 0.5, the BNN models misclassified most of the test objects as one of the training classes. The RBFCCN and RBFN models correctly identified all test objects as unknown. Compared to the RBFN models, the prediction results of the RBFCCN models were closer to the ideal solution, because the outputs from the RBFN models spread more widely than those of the RBFCCN models.

Synthetic imbalanced data set
The bivariate plot of the synthetic imbalanced data set is given in Figure 11. It can be observed that the objects in the two majority classes have larger spans than the minority class in the input space. The predictions of the minority class by the different methods are given in Table 5. The prediction results of the SCRBFN and RBFCCN models were better than those of the BNN and PLS-DA models. The RBFCCN, SVM, and RF methods gave the best predictions among all seven methods. The RBFN models had slightly worse prediction results than these three methods. The trained ANN models have a relatively loose fit to the training set when the training error threshold is set to 0.2. The BNN and PLS-DA models tend to model the majority classes first. As a result, their predictions of the minority class were poor.

PCB data set
The principal component scores of the PCB data are given in Figure 13. The principal components and mean were calculated only from the training set. The training set was labeled with upper case letters. The average prediction accuracies of the SVM, RF, RBFCCN, and BNN models (Table 8) were better than those of the SCRBFN, RBFN, and PLS-DA models. After internal validation, the entire training set was trained and the external validation set was predicted repeatedly 30 times. The prediction accuracies of the external validation set are given in Table 9. The prediction accuracy without unknowns is the prediction accuracy calculated from the external validation set excluding the congeners that contain 0, 1, 9, and 10 chlorine atoms. The total prediction accuracy is calculated from the complete external validation set. Because the prediction set contained low-quality spectra that make the data set more difficult to classify, the results are generally worse than the BLP validation. The SVM, BNN, and RF methods obtained better results than the other methods. The RBFCCN models yielded an average prediction accuracy of 81.7% without unknowns, which ranked fifth among the seven methods. Both the RBFCCN and SCRBFN models correctly identified most of the unknown objects. The BNN and RBFN models were capable of classifying the test objects, but they could hardly identify the unknown objects. This result is consistent with the result from the synthetic novel class data set. As a result, the BNN and PLS-DA models yielded total prediction accuracies lower than 65%.
Table 9. Average numbers of correctly predicted spectra with 95% confidence intervals of the PCB external validation data set. All modeling methods were reconstructed 30 times. The prediction accuracy without unknowns is calculated from the external validation set excluding the congeners that contain 0, 1, 9, and 10 chlorine atoms. The total prediction accuracy is calculated from the complete external validation set.

Conclusions
The proposed RBFCCN combines the concepts of the RBFN and the CCN. During the training of the RBF hidden units, a RBFCCN applies both an initialization technique similar to that of the SCRBFN and the optimization technique of CCNs. The cascade correlation algorithm furnishes the incremental learning ability of the RBFCCN. The incremental learning ability ensures that the RBFCCN automatically builds its network topology during training. Before training RBFCCNs, no prior information about the network topology is required. As a result, training RBFCCNs is more convenient than training BNNs. Another advantage of the cascade-correlated structure is that it avoids the moving target problem and converges more rapidly than the BNNs. RBFCCNs, BNNs, RBFNs, SCRBFNs, PLS-DA, SVMs, and RFs were tested with four data sets. The test results were obtained with statistical measurements of confidence intervals. The SVM and RF methods proved their excellence over the neural network approaches on these classification problems. The neural networks generally yielded better prediction performance than PLS-DA. Compared with the RBFN and SCRBFN models on the four test data sets, the RBFCCN models generally yielded better prediction accuracies. The RBF transfer function applied in RBFCCNs makes them a reliable approach for novel class evaluation. RBFCCNs generally yielded better novel class evaluation ability than RBFNs, BNNs, and PLS-DA with an output threshold of 0.5. The RBFCCN is also capable of modeling imbalanced data sets. The RBFCCN was statistically shown to be a robust and effective classification algorithm for chemometrics, especially in novel class evaluation and outlier detection.
Future work will involve developing novel training methods to train the networks more rapidly. Investigations of different optimization algorithms, such as genetic algorithms and particle swarm optimization, for training RBFCCNs are warranted. In addition, it is important to compare RBFCCNs with other methods for outlier or novel class evaluation, such as the one-class SVM, on chemical data sets.

Figure 1. Network architecture of a RBFN. This network has three input neurons, two hidden neurons, and two output neurons.

Figure 2. Network architecture of a RBFCCN. This network has three input neurons, two hidden neurons, and two output neurons.

Figure 3. Network initialization of a RBFCCN. This network has three input neurons and two output neurons.

Figure 4. Adding the first hidden neuron to a RBFCCN. This network has three input neurons, one hidden neuron, and two output neurons. The neurons and connections being trained are marked in red. µ1, σ1, and W1 are the parameters to be trained.

Figure 5. Adding the second hidden neuron to a RBFCCN. This network has three input neurons, two hidden neurons, and two output neurons. The neurons and connections being trained are marked in red. µ2, σ2, and W2 are the parameters to be trained.
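The hidden units added during cascade training are Gaussian radial basis functions parameterized by a center µ, a width σ, and output weights W. The sketch below shows the forward pass through one such unit; the parameter values are illustrative, and the exact parameterization in the trained networks may differ.

```python
import numpy as np

def rbf_unit(x, mu, sigma):
    """Gaussian RBF activation: exp(-||x - mu||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - mu) ** 2) / (2.0 * sigma ** 2))

# Illustrative parameters for one hidden unit with three inputs,
# feeding two output neurons (matching the topology in the figures).
mu = np.array([0.0, 1.0, 0.0])  # trained center
sigma = 1.0                     # trained width
W = np.array([0.8, -0.3])       # trained weights to the two outputs

x = np.array([0.0, 1.0, 0.0])   # an input exactly at the center
h = rbf_unit(x, mu, sigma)      # activation is 1.0 at the center
outputs = W * h                 # this unit's contribution to the outputs
```

The activation peaks at 1 when the input coincides with the center and decays toward 0 with distance, which is what gives the network its localized, novelty-sensitive response.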

Figure 6. The RRMSEC with respect to the number of hidden units trained by the RBFCCN, using one training data set from the bootstrapped Latin partition. Magenta line with cross markers: novel class data set; black line with square markers: imbalanced data set; red line with plus markers: Italian olive oil data set; blue line with circle markers: PCB data set.

Figure 6 shows the RRMSEC with respect to the number of hidden units trained by the RBFCCN. The RRMSEC thresholds were determined by training a RBFCCN model with one training data set from the bootstrapped Latin partition until the RRMSEC was no longer significantly improved. Once the RRMSEC threshold was determined, it was applied to the training of all the other neural networks. This method is biased in favor of the RBFCCN, but it is required so that all the reference classifiers are trained to the same calibration performance. However, the primary goal of this research is to compare the prediction accuracies

Figure 7. Two-variable plot of the synthetic novel class data set. A, B, C, and D denote the training sets, and E denotes the test set. The 95% confidence intervals were calculated around each training class.

Figure 8. The BNN response surface of the synthetic novel class data set. For each sampling point, the maximum of the output neurons is plotted.

Figure 9. The RBFN and RBFCCN response surfaces of the synthetic novel class data set. For each sampling point, the maximum of the output neurons is plotted.

Figure 10. Average prediction outputs from the test set. The BNN, RBFCCN, and RBFN models were obtained by training each network 30 times. The 95% confidence intervals are indicated as thin lines around the BNN outputs. Different colors represent excitations from different output neurons.

Figure 11. Two-variable plot of the synthetic imbalanced data set. A (red), B, and C denote the training classes. D (green) denotes the test class. The 95% confidence intervals were calculated around each training class.

Figure 13. A principal component score plot for the PCB data set. Upper case letters represent the training set; underlined lower case letters represent the external validation set. The external validation set was projected onto the first two principal components of the training set. Each axis is labeled with the percentage of total variance and the absolute eigenvalue from the training set. The 95% confidence intervals are given as an ellipse around each class of the training set. The PCB congeners are: (A) 2; (B) 3; (C) 4; (D) 5; (E) 6; (F) 7; (G) 8; (H) 9; (i) 10; (j) 1; (k) 0, where the number denotes the number of chlorine atoms in the congener.

Table 1. Average prediction accuracies with 95% confidence intervals of the BNN models for the Italian olive oil data set and the training set of the PCB data set. The BNN was trained with different numbers of hidden neurons using 30 BLPs.

Table 2. Average prediction accuracies with 95% confidence intervals of the BNN and RBFN models for the Italian olive oil and PCB data sets. The BNN and RBFN were trained with three different sets of learning rates and momenta using 30 BLPs.

Table 3. The numbers of variables, objects, and classes of the data sets evaluated.

Table 4. The modeling parameters of the ANN, PLS-DA, SVM, and RF methods. Hidden units is the number of hidden units in the trained network model. Latent variables is the number of latent variables used in the PLS-DA models. The RBF kernel parameter is denoted by γ in the SVM method. Mtry is the number of variables to split on at each node in the RF method.

Table 5. Average numbers of correctly predicted objects with 95% confidence intervals from class D of the imbalanced data set by different models. All modeling methods were reconstructed 30 times.

Table 6. Average numbers of correctly predicted objects with 95% confidence intervals for the Italian olive oil data set by different modeling methods with 30 BLPs.

Table 7. ANOVA table of the Italian olive oil data set by different source regions and modeling methods. F crit is the critical value.

Table 8. Average numbers of correctly predicted spectra with 95% confidence intervals for the PCB data set by different modeling methods with 30 BLPs.