Enhancing Electronic Nose Performance Based on a Novel QPSO-KELM Model

A novel multi-class classification method for bacteria detection based on electronic nose (E-nose) technology, termed the quantum-behaved particle swarm optimization-based kernel extreme learning machine (QPSO-KELM), is proposed in this paper. Time and frequency domain features are extracted from E-nose signals used to detect four classes of wounds (uninfected, and infected with Staphylococcus aureus, Escherichia coli or Pseudomonas aeruginosa). KELM is compared with five existing classification methods: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), extreme learning machine (ELM), k-nearest neighbor (KNN) and support vector machine (SVM). In addition, three traditional optimization methods, namely the particle swarm optimization algorithm (PSO), the genetic algorithm (GA) and the grid search algorithm (GS), and four kernel functions for KELM (Gaussian, linear, polynomial and wavelet kernels) are examined. Finally, the QPSO-KELM model is applied to two further E-nose datasets from previous experiments. The experimental results demonstrate the superiority of QPSO-KELM in various E-nose applications.


Introduction
An electronic nose (E-nose), combined with artificial intelligence algorithms, is designed to mimic the mammalian olfactory system in order to recognize gases and odors. The gas sensor array in an E-nose comprises several non-specific sensors and generates characteristic patterns when exposed to odorant materials. Patterns of known odorants can be used to construct a database and train a pattern recognition model. In this way, an unknown sample that can be discriminated by its odor can be classified [1][2][3]. Over the past decades, much work has been done on E-nose technology, which has been widely used in a multitude of fields, such as food quality control [4][5][6][7], disease diagnosis [8][9][10][11], environment quality assessment [12,13] and agriculture [14][15][16].
Previous work has demonstrated the effectiveness of detecting bacteria by investigating volatile organic compounds (VOCs) emitted from cultures and swabs taken from patients with infected wounds [17][18][19]. In pattern recognition, training data are first used to train the classifier. The performance of the classifier is then assessed on the remaining independent testing samples, and the final accuracy is computed by comparing the predicted classes with the true classes. So far, various kinds of classification models have been explored in E-nose applications, which can generally be divided into two categories. One is the linear classifier, such as k-nearest neighbor (KNN) [20][21][22], linear discriminant analysis (LDA) [23,24], partial least squares regression (PLSR) [25]

Materials and Experiments
The datasets used in this paper were obtained with a home-made E-nose, details of which can be found in our previous publication [42]. However, to make the paper self-contained, the system structure and experimental setup are briefly repeated here.

E-Nose System
The sensors in the array were selected for their high sensitivity and quick response to the metabolites of the three bacteria. The E-nose system consists of 15 sensors: fourteen metal oxide gas sensors (TGS800, TGS813, TGS816, TGS822, TGS825, TGS826, TGS2600, TGS2602, TGS2620, WSP2111, MQ135, MQ138, QS-01 and SP3S-AQ2) and one electrochemical sensor (AQ sensor). A 14-bit data acquisition system (DAS) is used as the interface between the sensor array and a computer. The DAS converts the analog signals from the sensor array into digital signals, which are stored in the computer for further processing. Figure 1 shows the schematic diagram of the experimental system. The E-nose system is composed of an E-nose chamber, a DAS, a pump, a rotor flowmeter, a three-way valve, a filter, a wide-mouth glass bottle and a computer. The filter purifies the air. The pump conveys the VOCs and clean air over the sensor array. The rotor flowmeter controls the flow rate during the experiments. The three-way valve switches between VOCs and clean air. The experimental setup has also been described in [33]. The experimental procedure in this paper can be summarized as follows.
Each mouse was put in a large glass bottle with a rubber stopper. Two holes were made in the rubber stopper, and a thin glass tube was inserted through each. The longer tube served as the outlet pipe and hung as close as possible above the wound, while the shorter tube served as the intake pipe and was inserted only slightly into the bottle, close to the bottleneck. The gases containing the VOCs of the wound flowed out through the longer tube and into the sensor chamber, while air flowed into the bottle through the shorter tube. Each test comprises three stages: the baseline stage, the response stage and the recovery stage. In the baseline stage, the three-way valve was switched to Port 1 and clean air purified by the filter flowed through the sensor chamber for 3 min. In the response stage, the three-way valve was switched to Port 2 and the gases containing the VOCs of the wound flowed through the sensor chamber for 5 min. In the recovery stage, the three-way valve was switched to Port 1 again and clean air flowed through the sensor chamber for 15 min. The DAS sampled and stored the data continuously during all three stages of a test. Between consecutive tests, the sensor chamber was purged with clean air for 5 min to eliminate the influence of residual odors; the DAS did not sample data during purging.

Experimental Setup
Four groups of mice were tested in this research: one control group and three groups infected with Staphylococcus aureus, Escherichia coli and Pseudomonas aeruginosa, respectively. Twenty tests were made for each group of mice under the same conditions, giving 80 samples in total for the four groups. Figure 2 illustrates the sensor responses when the sensors are exposed to the four different target wounds, where the X-axis is the response time of the sensors and the Y-axis is their output voltage.

KELM
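ELM [36,41][43][44][45] is designed as a single-hidden-layer feedforward network (SLFN) and has been shown to learn extremely fast. It provides efficient unified solutions to generalized SLFNs, whose hidden nodes can be any piecewise nonlinear function. KELM generalizes ELM from an explicit activation function to an implicit mapping function and produces better generalization in most applications. A brief introduction to KELM is as follows.

Given N training samples $(x_i, t_i)$, $i = 1, \ldots, N$, $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in R^n$ denotes one sample point in the n-dimensional space and $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in R^m$ is the sample class label. The SLFN with activation function $g$ is defined as:

$$\sum_{j=1}^{L} \beta_j \, g(w_j \cdot x_i + b_j) = o_i, \quad i = 1, \ldots, N \quad (1)$$

where $x_i$ is the i-th sample, $L$ is the number of hidden nodes, and $w_j$ and $\beta_j$ denote the input weights to the hidden layer and the output weight linking the j-th hidden node to the output layer, respectively. Meanwhile, $b_j$ is the bias of the j-th hidden node and $o_i$ is the output vector for the input sample $x_i$.

This SLFN can approximate the N samples with zero error, which means that:

$$\sum_{i=1}^{N} \left\| o_i - t_i \right\| = 0 \quad (2)$$

where $t_i$ is the class label vector of the input sample $x_i$. That is to say, there exist $\beta_j$, $w_j$ and $b_j$ such that:

$$\sum_{j=1}^{L} \beta_j \, g(w_j \cdot x_i + b_j) = t_i, \quad i = 1, \ldots, N \quad (3)$$

These N equations can be written compactly as:

$$H \beta = T \quad (4)$$

Written out in matrix form, the terms of Equation (4) are:

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m} \quad (5)$$

where $H$ is the hidden layer output matrix.

Training such an SLFN is then equivalent to finding a least-squares solution:

$$\beta = H^{+} T \quad (6)$$

where $H^{+}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$.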
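Huang et al. suggested adding a positive value 1/C (where C is the regularization coefficient) when calculating the output weights, according to ridge regression theory:

$$\beta = H^T \left( \frac{I}{C} + H H^T \right)^{-1} T \quad (7)$$

where $I$ is the identity matrix.

Sensors 2016, 16, 520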

The output function of the SLFN is:

$$f(x_i) = h(x_i) \beta \quad (8)$$

where $h(x_i)$ is the output of the hidden nodes, which maps the data from the input space to the hidden layer feature space H. Substituting Equation (7) into Equation (8), the output function becomes:

$$f(x_i) = h(x_i) H^T \left( \frac{I}{C} + H H^T \right)^{-1} T \quad (9)$$

We define a kernel function k as:

$$k(x_i, x_j) = h(x_i) \cdot h(x_j) \quad (10)$$

so that a KELM can be constructed using the kernel function exclusively, without having to consider the mapping explicitly. For given classes p and q, we express this kernel function by Equation (11):

$$(k_{lk})_{pq} = k(x_{pl}, x_{qk}) \quad (11)$$

where $x_{pl}$ denotes the l-th sample of the p-th class. Let K be an $N \times N$ matrix composed of inner products in the feature space:

$$K = (K_{pq})_{p = 1, 2, \ldots, S; \; q = 1, 2, \ldots, S}, \quad K_{pq} = (k_{lk})_{l = 1, 2, \ldots, N_p; \; k = 1, 2, \ldots, N_q} \quad (12)$$

where S is the total number of classes, $N_p$ and $N_q$ are the numbers of samples in the p-th and q-th classes, respectively, $K_{pq}$ is an $N_p \times N_q$ matrix, and K is a symmetric matrix such that $K_{pq}^T = K_{qp}$. From Equation (10), the kernel matrix can be written as $K = HH^T$, and the output function of KELM becomes:

$$f(x_i) = \left[ k(x_i, x_1), \ldots, k(x_i, x_N) \right] \left( \frac{I}{C} + K \right)^{-1} T \quad (13)$$

Common kernel functions, including the linear, polynomial, Gaussian and wavelet kernel functions, are applied. The kernel parameters, together with the regularization coefficient C in Equation (13), are optimized by QPSO. The index of the output node with the highest output value is taken as the label of the input data [44].
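As an illustration of the KELM output function in Equation (13), the following minimal Python sketch trains a Gaussian-kernel KELM by solving (I/C + K) α = T for one-hot targets T and labels a sample by the arg-max of its outputs. The Gaussian parameterization exp(-||u - v||^2 / (2σ^2)), the helper names (`gaussian_kernel`, `solve`, `kelm_train`, `kelm_predict`) and the toy data are assumptions made for this example and are not taken from the experiments.

```python
import math

def gaussian_kernel(u, v, sigma):
    # Assumed parameterization: k(u, v) = exp(-||u - v||^2 / (2 sigma^2)).
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, B):
    # Solve A X = B by Gauss-Jordan elimination with partial pivoting.
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def kelm_train(X, labels, n_classes, C, sigma):
    # Returns alpha = (I/C + K)^{-1} T, where T holds one-hot class labels.
    N = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) for j in range(N)] for i in range(N)]
    A = [[K[i][j] + (1.0 / C if i == j else 0.0) for j in range(N)] for i in range(N)]
    T = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(N)]
    return solve(A, T)

def kelm_predict(x, X, alpha, sigma):
    # Evaluates [k(x, x_1), ..., k(x, x_N)] alpha and returns the arg-max class index.
    k = [gaussian_kernel(x, xi, sigma) for xi in X]
    n_classes = len(alpha[0])
    scores = [sum(k[i] * alpha[i][c] for i in range(len(X))) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: scores[c])

# Toy data (made up for illustration): two well-separated 2-D clusters.
X_train = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
y_train = [0, 0, 1, 1]
alpha = kelm_train(X_train, y_train, n_classes=2, C=100.0, sigma=0.5)
pred_a = kelm_predict([0.05, 0.05], X_train, alpha, sigma=0.5)
pred_b = kelm_predict([0.95, 0.95], X_train, alpha, sigma=0.5)
print(pred_a, pred_b)
```

Solving the linear system once and reusing α for every prediction avoids forming the matrix inverse explicitly; C and σ are the two quantities that the optimizer would tune for the Gaussian kernel.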

QPSO-KELM Model
It is well known that the parameters of an algorithm affect its performance. Therefore, QPSO [46] is used to optimize the value of C in Equation (13) and the parameters of the kernel function. The dimension of the search space corresponds to the number of parameters of KELM with a given kernel function, and the position of each particle represents a candidate set of parameter values. Because QPSO is used to optimize the generalization performance of KELM, the testing accuracy is used as the fitness function of QPSO. The specific steps of QPSO-KELM are described as follows.
Step 1: Normalize all the data extracted from the E-nose signals into the range [0,1], and set the maximum number of iterations and the population size to 400 and 30, respectively.
Step 2: Initialize the position and local optimal position of each candidate particle, as well as global best position of the swarm.
Step 3: Calculate each particle's fitness value according to the fitness function. Update the local optimal positions and global best position.
Step 4: Update the position of each candidate particle in each iteration according to the QPSO position update rule [46].
Step 5: Check the termination criterion. If the maximum number of iterations has not yet been reached, return to Step 3; otherwise go to Step 6.
Step 6: The combination of parameters of the kernel function that results in the maximal fitness value is obtained.
The flowchart of this procedure is illustrated in Figure 3.
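The steps above can be sketched in Python as follows. The position update is the commonly used QPSO rule (the mean of the personal best positions, a random local attractor between each personal best and the global best, and a contraction-expansion coefficient that decreases linearly from 1.0 to 0.5); whether this matches the exact variant in [46] is an assumption. The function names, the search bounds and the stand-in fitness function are hypothetical. In QPSO-KELM the fitness would instead be the classification accuracy of a KELM trained with the candidate parameters.

```python
import math
import random

def qpso_maximize(fitness, bounds, n_particles=30, n_iters=400, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(value, d):
        return min(max(value, bounds[d][0]), bounds[d][1])

    # Step 2: initialise positions, personal bests and the global best.
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_fit = [fitness(p) for p in x]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for it in range(n_iters):
        beta = 1.0 - 0.5 * it / n_iters  # contraction-expansion coefficient
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            # Step 4: move the particle around its local attractor.
            for d in range(dim):
                phi = rng.random()
                u = 1.0 - rng.random()  # lies in (0, 1], so log(1/u) is finite
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                step = beta * abs(mbest[d] - x[i][d]) * math.log(1.0 / u)
                sign = 1.0 if rng.random() < 0.5 else -1.0
                x[i][d] = clip(attractor + sign * step, d)
            # Step 3: evaluate the fitness and update personal and global bests.
            f = fitness(x[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = x[i][:], f
    # Steps 5 and 6: stop after n_iters iterations and return the best position.
    return gbest, gbest_fit

def toy_fitness(p):
    # Hypothetical stand-in for KELM accuracy: maximum value 0 at (3, -1).
    return -((p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2)

# The two coordinates could be read as log2(C) and log2(kernel parameter);
# the bounds below are an assumption, not a setting reported for QPSO.
best, best_fit = qpso_maximize(toy_fitness, [(-25.0, 25.0), (-25.0, 25.0)], n_iters=200)
print(best, best_fit)
```

Searching in the logarithm of the parameters is a design choice of this sketch: it lets one swarm cover values of C and the kernel parameter that differ by many orders of magnitude.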

Results and Discussion
Different features that effectively represent the sensor responses are extracted from the time domain and the frequency domain in order to evaluate the effectiveness of the proposed model. The peak value, the integral over the response stage, Fourier coefficients (the DC component and the first-order harmonic component) and the approximation coefficients of the db1 wavelet of the sensor response curve are chosen to represent the characteristics of the E-nose signals in the two domains [47][48][49][50]. The leave-one-out cross validation (LOO-CV) method is then employed to evaluate the performance of the different methods, so as to make full use of the data set.

Five further classification models, namely ELM, SVM, KNN, LDA and quadratic discriminant analysis (QDA), are applied for comparison with KELM. ELM is a training algorithm for single-hidden-layer feedforward networks that trains quickly and requires little human supervision. The main idea of ELM is that the hidden layer parameters need not be learned but can be randomly assigned. The only parameter is the number of hidden nodes in the hidden layer of the SLFN, which is normally obtained by trial and error. Here, the input weights are drawn from (−1, 1) and the hidden layer biases from (0, 1). One hundred experiments were carried out, with the number of hidden nodes varying from 1 to 100. Because the input weights and the hidden layer biases are chosen randomly, this experiment was repeated 100 times, and the best performance over all results is regarded as the final classification result of ELM. For SVM, LIBSVM, developed by Chang and Lin [51], is employed. KNN has two parameters to tune: the number of neighbors k and the distance metric. In this work, k varies from 1 to 20, and the distance metrics used are the Euclidean, cityblock, cosine and correlation distances. The best classification accuracy over the different values of k and distance metrics is regarded as the final result of KNN.
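The four feature types named above can be sketched for a single sensor response curve as follows. This is an illustrative sketch only: the trapezoidal rule, the 1/N scaling of the Fourier coefficients, the use of the harmonic magnitude, the number of wavelet decomposition levels, the padding of odd-length signals and the toy signal are all assumptions, since the text does not specify them.

```python
import cmath
import math

def peak_value(signal):
    # Maximum of the response curve.
    return max(signal)

def response_integral(signal, dt):
    # Trapezoidal-rule integral of a curve sampled every dt seconds.
    return dt * (sum(signal) - 0.5 * (signal[0] + signal[-1]))

def fourier_dc_and_first_harmonic(signal):
    # DC component and magnitude of the first harmonic of the DFT, scaled by 1/N.
    n = len(signal)
    dc = sum(signal) / n
    first = sum(s * cmath.exp(-2j * math.pi * k / n) for k, s in enumerate(signal)) / n
    return dc, abs(first)

def db1_approximation(signal, levels):
    # Approximation coefficients of a db1 (Haar) wavelet decomposition.
    approx = list(signal)
    for _ in range(levels):
        if len(approx) % 2:  # assumed padding: repeat the last sample
            approx.append(approx[-1])
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2.0)
                  for i in range(0, len(approx), 2)]
    return approx

# Made-up response curve with 8 samples taken 1 s apart.
curve = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0]
peak = peak_value(curve)
area = response_integral(curve, dt=1.0)
dc, h1 = fourier_dc_and_first_harmonic(curve)
approx = db1_approximation(curve, levels=2)
print(peak, area, dc, h1, approx)
```

In a full pipeline, the features of all 15 sensors would be concatenated into one vector per sample before normalization and classification.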
Tables 1-4 list the classification results of the four feature extraction techniques and five classification models. The kernel function of KELM is set to the Gaussian kernel. The bold numbers on the diagonal indicate the number of samples classified correctly, while the other entries indicate the number of samples misclassified.
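The leave-one-out protocol and the confusion-matrix layout of these tables (rows as true classes, columns as predicted classes, correct counts on the diagonal) can be sketched with a small KNN classifier. The function names and the toy data are hypothetical, and the row and column orientation is an assumption about how the tables are arranged.

```python
from collections import Counter

def cityblock(u, v):
    # Cityblock (L1) distance, one of the metrics compared for KNN.
    return sum(abs(a - b) for a, b in zip(u, v))

def knn_predict(x, X, y, k, dist):
    # Majority vote among the k training samples closest to x.
    nearest = sorted(range(len(X)), key=lambda i: dist(x, X[i]))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

def loo_confusion(X, y, n_classes, k, dist):
    # Leave-one-out: each sample is predicted by a model built on all the others.
    cm = [[0] * n_classes for _ in range(n_classes)]
    for i in range(len(X)):
        X_rest = X[:i] + X[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        cm[y[i]][knn_predict(X[i], X_rest, y_rest, k, dist)] += 1
    return cm

# Made-up, well-separated two-class data.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
y = [0, 0, 0, 1, 1, 1]
cm = loo_confusion(X, y, n_classes=2, k=3, dist=cityblock)
accuracy = sum(cm[c][c] for c in range(2)) / len(X)
print(cm, accuracy)
# prints: [[3, 0], [0, 3]] 1.0
```

With 80 samples, the same loop would train 80 models per parameter setting, each tested on the single held-out sample.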
It can be observed from the above four tables that the classification accuracy for the four wound classes is influenced by both the features and the classification model. In general, features extracted from the frequency domain achieve better results than features extracted from the time domain. The wavelet coefficient feature works best no matter which classifier is used, while the peak value feature performs worst. QPSO-KELM always performs better than the other four classifiers regardless of which features are used, and SVM invariably performs better than the remaining three classifiers. For uninfected wounds, the best performance is achieved when the wavelet feature is fed into the QPSO-KELM model, with no sample misclassified; for wounds infected with Staphylococcus aureus, QPSO-KELM performs best when the peak value is used as the feature, with only one sample misclassified; for wounds infected with Escherichia coli, the highest classification accuracy is achieved by QDA with the Fourier coefficient feature; for wounds infected with Pseudomonas aeruginosa, QPSO-KELM performs best with the integral value and wavelet coefficient features.

Figures 4 and 5 show how the classification rate varies with the number of hidden nodes in the hidden layer of ELM and with the k value of KNN, respectively, for the best-performing wavelet coefficient feature. Figure 4 shows the results of only one of the 100 repeated experiments, to display how the classification rate changes as the number of hidden nodes varies from 1 to 100. The classification rate gradually improves as the number of hidden nodes increases from 1 to 34 and from 79 to 96, while it gradually declines as the number of hidden nodes increases from 55 to 79. Moreover, ELM achieves its best classification accuracy of 85% when the number of hidden nodes is 45, 51 or 55. Figure 5 shows that, on the whole, the classification rate gradually declines as k increases. Among the distance metrics, the cityblock distance performs worst except at k = 8, 10 and 20. Meanwhile, the cosine distance achieves the best classification accuracy of 86.25% at small values of k and also performs best at large values of k.
Three traditional optimization methods are also investigated and used to evaluate the effectiveness of the proposed model when wavelet coefficients are used as features. PSO [52], the genetic algorithm (GA) [53,54] and the grid search algorithm (GS) are employed to optimize the parameters of KELM. For GA and PSO, the maximum number of iterations and the population size are also 400 and 30, respectively, the same as those of QPSO. For GS, the ranges of the model parameters are set according to [44].
The range of the cost parameter C and that of the kernel parameter of the Gaussian kernel function are both $[2^{-25}, 2^{25}]$, and the step length is set as $2^{0.5}$. Their classification performances are shown in Table 5.

Table 3. Classification results of Fourier coefficients.

Table 4. Classification results of wavelet coefficients.
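To give a sense of the size of the grid search described above, the short sketch below enumerates the grid, reading the step length 2^0.5 as a multiplicative step (an exponent step of 0.5), which is an assumption. It also assumes, for illustration only, that each grid point is scored by LOO-CV on the 80 wound samples. The function name `log2_grid` is hypothetical.

```python
def log2_grid(lo_exp, hi_exp, step_exp):
    # Values 2**lo_exp, 2**(lo_exp + step_exp), ..., 2**hi_exp.
    n_points = int(round((hi_exp - lo_exp) / step_exp)) + 1
    return [2.0 ** (lo_exp + i * step_exp) for i in range(n_points)]

c_values = log2_grid(-25.0, 25.0, 0.5)       # candidate values of C
kernel_values = log2_grid(-25.0, 25.0, 0.5)  # candidate Gaussian kernel parameters
n_pairs = len(c_values) * len(kernel_values)

n_samples = 80                   # samples in the wound dataset
loo_fits = n_pairs * n_samples   # KELM trainings if every pair is scored by LOO-CV
swarm_evals = 30 * 400           # population size times iterations for PSO, GA and QPSO

print(len(c_values), n_pairs, loo_fits, swarm_evals)
# prints: 101 10201 816080 12000
```

Under these assumptions the exhaustive grid visits 10,201 parameter pairs, compared with 12,000 candidate evaluations for a swarm of 30 particles over 400 iterations, so the two budgets are of similar size while the swarm is not restricted to grid points.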

It is well known that the choice of kernel function plays a crucial role in recognition and generalization capability. Thus, in order to further explore the effects of different kernel functions on the QPSO-KELM model, four common kernel functions combined with wavelet features are investigated in this experiment. Their classification performances are shown in Table 6. It can be clearly concluded from Table 6 that the QPSO-KELM model with the Gaussian kernel function performs best, while the linear kernel function achieves the worst accuracy. Meanwhile, the performance of the polynomial kernel function is close to that of the wavelet kernel function, at 91.25% and 92.50%, respectively. Overall, the proposed model performs best among all of the above methods.
We also apply the proposed model to two further experimental E-nose datasets: (1) a dataset from an E-nose that recognizes seven bacteria: Pseudomonas aeruginosa, Escherichia coli, Acinetobacter baumannii, Staphylococcus aureus, Staphylococcus epidermidis, Klebsiella pneumoniae and Streptococcus pyogenes. The classification results of various classification models based on the steady-state signals of the sensors are shown in Table 7, and more details concerning the experiment can be found in [55]; (2) a dataset from an E-nose that detects six indoor air contaminants: formaldehyde (HCHO), benzene (C6H6), toluene (C7H8), carbon monoxide (CO), ammonia (NH3) and nitrogen dioxide (NO2). The classification results are shown in Table 8, and more details, including the dataset generation, can be found in [56].

Table 7. Accuracy results of various feature extraction techniques and classification models for datasets in [55].

It can be clearly concluded that the proposed QPSO-KELM model achieves the best classification accuracy among all of the above classification models on the different datasets. KELM achieves the best recognition performance of 100% for the dataset in [55], and also obtains the best recognition accuracy for the dataset in [56], except for NH3, whose recognition rate is 70%. This demonstrates that the QPSO-KELM approach generalizes well to other datasets and that its efficacy does not depend on a particular dataset.

Conclusions
In this paper, a new methodology based on the QPSO-KELM model has been presented to enhance the performance of an E-nose for wound infection detection. Four kinds of features extracted from the time and frequency domains have been used to demonstrate the effectiveness of this classification model for four classes of wounds. The paper introduces the kernel method based on the extreme learning machine into E-nose applications, which provides a new idea for the signal processing of E-nose data. Moreover, it provides a good solution for optimizing the kernel function parameters by QPSO, a contraction mapping algorithm that outperforms ordinary optimization algorithms in convergence rate and convergence ability. Experimental tests verify that the proposed QPSO-KELM model leads to a higher accuracy rate and show that it can clearly enhance E-nose performance in various applications. The model also provides an efficient approach for classification or prediction tasks beyond E-nose applications.