A Discriminant Distance Based Composite Vector Selection Method for Odor Classification

We present a composite vector selection method for an effective electronic nose system that performs well even in noisy environments. Each composite vector generated from an electronic nose data sample is evaluated by computing its discriminant distance. By quantitatively measuring the amount of discriminative information in each composite vector, composite vectors containing informative variables can be distinguished, and the final composite features for odor classification are extracted from the selected composite vectors. Using only the informative composite vectors, rather than all generated composite vectors, also helps to extract better composite features. Experimental results with different volatile organic compound data show that, compared to other methods, the proposed system achieves good classification performance even in a noisy environment.


Introduction
An electronic nose is an instrument intended to identify the specific components of an odor. While human olfaction fatigues easily, an electronic nose can detect odors consistently, including those harmful to the human body [1][2][3][4]. Electronic nose systems are used for various purposes, such as quality control in the food and cosmetics industries, the detection of odors associated with specific diseases for medical diagnosis, and the detection of gas leaks for environmental protection [3][5][6][7][8][9].
An electronic nose consists of a sensor array for chemical detection (in this work, made of polymer-carbon composite materials) and a classifier based on pattern recognition techniques. Hence, the sensitivity of the sensor array and the design of the classifier are crucial to improving electronic noses. There are several types of sensor arrays for electronic noses [10][11][12][13][14][15]. Among them, conducting polymer composites, intrinsically conducting polymers, and metal oxides are the most commonly used sensing materials in conductivity sensors. Once volatile organic compounds (VOCs) are adsorbed on the sensor surface, the electronic interface converts the specific sensor response into a numerical variable.
In classification problems, the process can be decomposed into a few steps: feature selection, feature extraction, and the choice of a classifier. Various static or dynamic information for odor classification can be obtained from the sensor response curve [16][17][18]. In [17,18], five features were extracted from the response curves of six metal oxide sensors: the relative change in resistance, the curve integrals over the gas adsorption and desorption processes, and the phase space integrals, again over adsorption and desorption. The dynamic features of metal oxide sensors were analyzed to classify four volatile compounds, namely acetone, acetic acid, acetaldehyde and butyric acid [16], and active analyses were proposed to deal with gas mixtures [19,20]. In [21][22][23], various compensation methods were proposed to address sensor drift, i.e., random temporal variation of the sensor response under identical conditions. The features extracted from the sensor array are fed into a classifier such as the NN (Nearest Neighbor) rule [2] or an SVM (Support Vector Machine) [9] to predict the class label. To improve classifier performance, various feature extraction methods can be used for discriminant analysis and dimensionality reduction [24][25][26][27]. Since each method has its pros and cons, an appropriate method must be selected considering the properties of the data and the problem to be solved. For instance, the PCA (Principal Component Analysis) method [28] does not use the class information of data samples; it finds the projection vectors that correspond to the largest eigenvalues of the total scatter matrix of the data samples. Thus, PCA is more appropriate for data representation than for data classification.
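As a generic illustration of this point (not code from [28] or from the compared systems), the NumPy sketch below computes PCA projection vectors as the eigenvectors of the total scatter matrix with the largest eigenvalues; the class labels never enter the computation. The function name `pca_projection` and the toy data are assumptions made for this example.

```python
import numpy as np

def pca_projection(X, n_components):
    """Top principal axes of the rows of X (samples x variables).

    Class labels are not used: the axes are the eigenvectors of the total
    scatter matrix associated with its largest eigenvalues.
    """
    Xc = X - X.mean(axis=0)                  # center the samples
    S_T = Xc.T @ Xc                          # total scatter matrix
    eigvals, eigvecs = np.linalg.eigh(S_T)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]                 # columns are projection vectors

# Toy data whose variance is concentrated in the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
W = pca_projection(X, 2)
Z = (X - X.mean(axis=0)) @ W                 # 2-D representation of the data
```

Because the axes are chosen only by variance, a high-variance direction is retained even when it carries no class information, which is why PCA is better suited to representation than to discrimination.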
On the other hand, the LDA (Linear Discriminant Analysis) method [29] seeks the linear transformation that maximizes the ratio of the between-class scatter, given by the between-class scatter matrix ($S_B$), to the within-class scatter, given by the within-class scatter matrix ($S_W$). While it performs well in classification problems, it suffers from the SSS (Small Sample Size) problem [29] for high-dimensional data: when the data dimension exceeds the number of training samples, $S_W$ is singular and cannot be inverted.
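The small sample size problem can be checked numerically. The sketch below assumes NumPy and synthetic Gaussian data; the helper `scatter_matrices` is hypothetical. With $N = 10$ samples from $c = 2$ classes in 50 dimensions, the rank of $S_W$ is at most $N - c = 8$, so $S_W$ is singular and the usual LDA solution through $S_W^{-1} S_B$ is not available.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (S_B) and within-class (S_W) scatter of the rows of X."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean_all)[:, None]           # column vector, shape (d, 1)
        S_B += len(Xc) * diff @ diff.T
        S_W += (Xc - mc).T @ (Xc - mc)
    return S_B, S_W

rng = np.random.default_rng(0)
n_per_class, n_classes, dim = 5, 2, 50            # 10 samples in 50 dimensions
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(size=(n_per_class * n_classes, dim)) + y[:, None]
S_B, S_W = scatter_matrices(X, y)
rank_SW = np.linalg.matrix_rank(S_W)              # at most N - c, far below dim
print(rank_SW, dim)
```

Electronic nose samples of 32,000 dimensions with only 160 samples fall deep into this regime.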
The above methods extract features based on covariance matrices, which differ according to their objective functions. In contrast, methods such as MatFLDA (Matrixized Fisher Linear Discriminant Analysis) [30], 2DFLD (Two-Dimensional Fisher Linear Discriminant) [31], and CLDA (Composite LDA) [32,33] use a different type of covariance matrix, called an image covariance matrix. The elements of an image covariance matrix are defined as the expectations of the inner products of predefined vectors. These methods are often effective for data with large correlations between primitive variables, or for high-dimensional data such as electronic nose data [34], because they exploit the statistical dependency among multiple primitive variables and reduce computational effort.
The composite features are extracted using the covariance of composite vectors, each composed of a number of primitive variables taken through windows of various shapes. However, the generated composite vectors are likely to be redundant with one another. Moreover, if there were problems in the data collection process, or if the collected primitive variables include attributes unrelated to the classification problem, feature extraction does not yield optimal solutions and classification performance degrades [24]. Therefore, identifying good composite vectors containing informative primitive variables before feature extraction is important for extracting better composite features for classification.
In this paper, we propose a method to select the composite vectors that contain informative variables in an electronic nose data sample measured by a sensor array. We measure the amount of discriminative information in each composite vector based on its discriminant distance [35], rank the composite vectors in descending order of their discriminant scores, and keep the top $n_{cf}$. The informative composite vectors are identified before feature extraction, and the composite features used by the classifier are then extracted from the selected composite vectors only. This selection process can reduce computation, storage and processing time in addition to improving prediction performance. In composite feature extraction, the computational effort grows on the order of $v^2$ with the number of composite vectors $v$, so selecting $n_{cf} < v$ composite vectors reduces this cost roughly by a factor of $(v/n_{cf})^2$. By feeding the extracted composite features to the classifier of an electronic nose, we design an electronic nose system that is robust to noisy environments (Figure 1). The experimental results show that the proposed method achieves high classification rates even in a noisy environment. The rest of this paper is organized as follows. Section 2 introduces the discriminant distance and presents how composite vectors are selected based on their discriminant scores. Section 3 explains the acquisition of electronic nose data and how composite features are extracted from the selected composite vectors for odor classification. Section 4 describes the experimental results, and conclusions follow in Section 5.

Composite Vector Selection Based on Discriminant Distance
Composite vectors can be defined in various ways depending on the shape of a window. The data acquired from a sensor array are stored in an $n$-dimensional vector, and a composite vector $x_i \in \mathbb{R}^l$ consists of $l$ ($l < n$) primitive variables. Composite vectors are generated by shifting a window by a step size $s$, which is usually smaller than the length of a composite vector, so composite vectors overlap with each other, as shown in Figure 2. The covariance of composite vectors therefore makes better use of the correlation between neighboring variables. The number of composite vectors is $v = \lfloor (n-l)/s \rfloor + 1$, where $\lfloor \cdot \rfloor$ is the floor operator, which gives the largest integer not greater than its argument. Then, the $k$-th data sample is represented by the set of composite vectors $X(k) = [x_1(k), \ldots, x_v(k)]^T \in \mathbb{R}^{v \times l}$. The final composite features for classification are extracted using the covariance of these composite vectors [36]. However, overlapping composite vectors, as in Figure 2, may introduce redundancy into the extracted composite features. Therefore, it is necessary to find the composite vectors that promise good separability between different classes while keeping the samples of the same class as close as possible. Motivated by the method in [35], which selects individual variables based on a distance discriminant, we define the distance within classes ($D_W^i$) and the distance between classes ($D_B^i$) to compute the discriminant distance of the $i$-th composite vector $x_i(k) = [x_i^1(k), \ldots, x_i^l(k)]^T$. These distances are computed from $m_i^j$, $m^j$ and $N_i$, which denote the $j$-th element of the mean of class $c_i$, the $j$-th element of the mean of all training samples, and the number of samples in class $c_i$, respectively. Then, the discriminant distance of the $i$-th composite vector is computed as $D_B^i - \beta D_W^i$, which reflects its discriminative information. The value of $\beta$ can be chosen according to the distribution of the data samples.
For example, for a distribution with good class separability but large within-class variance, a small penalty $\beta$ on $D_W^i$ is preferable. After examining the classification performance over different values of $\beta$, we set $\beta = 2$. For composite vector selection, we define the measure vector $S \in \mathbb{R}^v$ whose $i$-th element is $S_i = D_B^i - \beta D_W^i$, i.e., the discriminant distance of the $i$-th composite vector. Finally, the $n_{cf}$ composite vectors with the largest $S_i$ are selected for extracting the final composite features.
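The generation and selection steps above can be sketched in NumPy as follows. The window count follows $v = \lfloor (n-l)/s \rfloor + 1$ and the score follows $S_i = D_B^i - \beta D_W^i$ with $\beta = 2$. The defining equations of $D_W^i$ and $D_B^i$ are not reproduced in this text, so the sketch ASSUMES one plausible reading: $D_B^i$ sums, over classes, the squared distance between the class mean and the overall mean of the $i$-th composite vector, and $D_W^i$ sums, over classes, the average squared distance of samples to their class mean. All function names are hypothetical.

```python
import numpy as np

def composite_vectors(sample, l, s):
    """Overlapping windows of length l shifted by s; v = floor((n - l)/s) + 1 rows."""
    v = (sample.shape[0] - l) // s + 1
    return np.stack([sample[i * s: i * s + l] for i in range(v)])   # (v, l)

def discriminant_scores(X, y, beta=2.0):
    """S_i = D_B^i - beta * D_W^i for every composite vector i.

    X has shape (N, v, l): N samples, each split into v composite vectors.
    ASSUMPTION: squared Euclidean distances summed over the l elements; these
    are an illustrative reading, not the paper's exact definitions.
    """
    m_all = X.mean(axis=0)                         # (v, l) mean of all samples
    D_B = np.zeros(X.shape[1])
    D_W = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        m_c = Xc.mean(axis=0)                      # (v, l) class mean
        D_B += ((m_c - m_all) ** 2).sum(axis=1)    # between-class distance
        D_W += ((Xc - m_c) ** 2).sum(axis=2).mean(axis=0)  # 1/N_i-averaged within-class
    return D_B - beta * D_W

def select_composite_vectors(X, y, n_cf, beta=2.0):
    """Indices (ascending) of the n_cf composite vectors with the largest S_i."""
    S = discriminant_scores(X, y, beta)
    return np.sort(np.argsort(S)[::-1][:n_cf])

# Window count for n = 32,000, l = 400, s = 200.
print(composite_vectors(np.arange(32000.0), l=400, s=200).shape)   # (159, 400)

# Toy data: 40 samples, 6 composite vectors of length 4; only vectors 0 and 3
# differ between the two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(scale=0.1, size=(40, 6, 4))
X[:, [0, 3], :] += 3.0 * y[:, None, None]
print(select_composite_vectors(X, y, n_cf=2))                      # [0 3]
```

Returning the selected indices in ascending order keeps the selected composite vectors in their original temporal order when they are stacked into $X'(k)$.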

Acquisition of Electronic Nose Data
The sensor array used in our system was fabricated by dispensing a carbon black (CB)-polymer composite-solvent solution onto a micromachined gas sensor array chip [15]. Although polymer composites have drawbacks such as sensor drift, limited sensor life, and sensitivity to temperature and humidity, they offer many advantages over other gas-sensing materials, e.g., a wide range of available polymeric materials, low cost, stable operation at room temperature, and low power consumption [10]. The sensor array consists of 16 separate sensors, each channel having an interdigitated electrode, a microheater, and a micromachined membrane for temperature-controlled measurement applications (Table 1). The resistance change of each polymer composite film in response to the incorporation of chemical vapor was monitored, amplified 20 times, and recorded every 0.1 s (Figure 3). Each measurement consisted of three steps: stabilization (30 s), exposure (60 s), and purge (110 s). It was performed after the sensor array was placed in the chamber and the resistance signal had stabilized. The flow control unit in our system lets the vapors flow in at the desired concentration for about 60 s and afterward flushes the remainder with air for about 110 s [37]. The measured data were collected on a PC using a DAQ6062E data acquisition (DAQ) board and LabVIEW (National Instruments, USA). The voltage divider operated in the range from -10 V to +10 V, and the gains of the 16 identical amplifiers were set to 10 (output/input voltage) for maximum DAQ resolution [15].
Ch 6    Poly(4-methylstyrene)
Ch 7    Poly(styrene-co-methyl methacrylate)
Ch 8    Poly(ethylene-co-vinyl acetate)
Ch 9    Poly(bisphenol A carbonate)
Ch 10   Poly(4-vinyl pyridine)
Ch 11   Poly(vinyl butyral-co-vinyl alcohol-co-vinyl acetate)
Ch 12   Poly(vinyl stearate)
Ch 13   Ethyl cellulose
Ch 14   Polystyrene-block-polyisoprene-block-polystyrene
Ch 15   Hydroxypropyl cellulose
Ch 16   Cellulose acetate

Figure 3. Typical time responses of the 16-channel sensor array to an inflow of acetone vapor at 5,000 ppm [2].
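The timing figures above are consistent with one another: at one reading every 0.1 s (10 Hz), the 30 s, 60 s and 110 s phases give 300, 600 and 1,100 readings, i.e., 2,000 points per channel. The small sketch below maps each phase to its sample-index range, which helps relate the position of a composite vector to a measurement phase. It ASSUMES the phases follow one another without gaps and last exactly their nominal durations (the text says "about" for exposure and purge); `phase_boundaries` is a hypothetical helper.

```python
FS_HZ = 10                                                         # one reading every 0.1 s
PHASES = [("stabilization", 30), ("exposure", 60), ("purge", 110)]  # durations in seconds

def phase_boundaries(phases, fs_hz):
    """Map each measurement phase to its [start, stop) sample-index range."""
    bounds, start = {}, 0
    for name, seconds in phases:
        stop = start + seconds * fs_hz
        bounds[name] = (start, stop)
        start = stop
    return bounds

bounds = phase_boundaries(PHASES, FS_HZ)
n_points = bounds["purge"][1]
print(bounds)     # {'stabilization': (0, 300), 'exposure': (300, 900), 'purge': (900, 2000)}
print(n_points)   # 2000
```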

Extraction of Composite Features from Selected Composite Vectors
Patterns are easy to classify when the within-class variance is small and the between-class variance is large. Similar to LDA, a discriminant analysis using the covariance of composite vectors is derived from the between-class covariance matrix ($C_B$) and the within-class covariance matrix ($C_W$) [29]. Assume that each training sample belongs to one of $c$ classes and that there are $N_i$ samples in class $c_i$. Let $X'(k) \in \mathbb{R}^{n_{cf} \times l}$ denote the set of selected composite vectors of the $k$-th sample. Then, $C_W \in \mathbb{R}^{n_{cf} \times n_{cf}}$ is defined as

$$C_W = \sum_{i=1}^{c} p_i \frac{1}{N_i} \sum_{X'(k) \in c_i} (X'(k) - M_i)(X'(k) - M_i)^T,$$

where

$$M_i = \frac{1}{N_i} \sum_{X'(k) \in c_i} X'(k).$$

Here, $p_i$ is the prior probability that a sample belongs to class $c_i$. $C_B \in \mathbb{R}^{n_{cf} \times n_{cf}}$ is defined as

$$C_B = \sum_{i=1}^{c} p_i (M_i - M)(M_i - M)^T,$$

where $M$ is the mean of all training samples $X'(k)$. The image covariance can also be interpreted from a point of view other than that of the composite vectors. Letting $\chi_j(k)$, $m_{i,j}$ and $m_j$ be the $j$-th columns of $X'(k)$, $M_i$ and $M$, respectively, $C_W$ and $C_B$ can be rewritten as

$$C_W = \sum_{i=1}^{c} p_i \frac{1}{N_i} \sum_{X'(k) \in c_i} \sum_{j=1}^{l} (\chi_j(k) - m_{i,j})(\chi_j(k) - m_{i,j})^T, \qquad C_B = \sum_{i=1}^{c} p_i \sum_{j=1}^{l} (m_{i,j} - m_j)(m_{i,j} - m_j)^T.$$

$\chi_j(k) \in \mathbb{R}^{n_{cf}}$ consists of the $j$-th elements of the selected composite vectors, i.e., it is sampled from the data sample at regular intervals that are multiples of the step size $s$. This has an effect similar to generating $l$ times as many data samples of smaller size, and the larger number of samples makes the performance more robust to variations caused by noise. Composite features are obtained as linear combinations of the composite vectors, and each feature is a vector whose dimension equals that of a composite vector. For composite feature extraction, the projection matrix $W \in \mathbb{R}^{n_{cf} \times m}$ is found by maximizing the objective function

$$J(W) = \frac{|W^T C_B W|}{|W^T C_W W|}.$$

The set of composite features $Y(k)$ is obtained by projecting $X'(k)$ onto $W$ as

$$Y(k) = W^T X'(k),$$

where $Y(k) \in \mathbb{R}^{m \times l}$ consists of $m$ composite features $[y_1(k), \ldots, y_m(k)]^T$. The window length ($l$), the number of composite features ($m$) and the step size of the shift ($s$) are important parameters that influence classification performance, so we investigated the classification rates with respect to $l$, $m$ and $s$.
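A minimal NumPy sketch of this extraction step, under stated assumptions: the priors $p_i$ are estimated as $N_i/N$, and $W$ is taken as the $m$ leading generalized eigenvectors of $C_B w = \lambda C_W w$, the standard maximizer of $|W^T C_B W| / |W^T C_W W|$. The generalized problem is solved by Cholesky whitening of $C_W$, and the small ridge `reg` added to $C_W$ is an implementation choice of this sketch, not part of the method described above. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def composite_scatter(Xs, y):
    """C_W and C_B, each (n_cf, n_cf), from Xs of shape (N, n_cf, l)."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()                   # p_i estimated as N_i / N
    M = Xs.mean(axis=0)                              # overall mean, (n_cf, l)
    n_cf = Xs.shape[1]
    C_W = np.zeros((n_cf, n_cf))
    C_B = np.zeros((n_cf, n_cf))
    for c, p in zip(classes, priors):
        Xc = Xs[y == c]
        M_c = Xc.mean(axis=0)                        # class mean M_i
        D = Xc - M_c                                 # (N_i, n_cf, l)
        # sum_k (X'(k) - M_i)(X'(k) - M_i)^T, divided by N_i
        C_W += p * np.einsum("kal,kbl->ab", D, D) / len(Xc)
        C_B += p * (M_c - M) @ (M_c - M).T
    return C_W, C_B

def composite_projection(C_W, C_B, m, reg=1e-6):
    """Top-m generalized eigenvectors of C_B w = lambda C_W w, as columns of W."""
    C_W = C_W + reg * np.eye(C_W.shape[0])           # small ridge for stability
    L = np.linalg.cholesky(C_W)                      # C_W = L L^T
    L_inv = np.linalg.inv(L)
    vals, U = np.linalg.eigh(L_inv @ C_B @ L_inv.T)  # whitened symmetric problem
    W = L_inv.T @ U                                  # map back: w = L^{-T} u
    return W[:, np.argsort(vals)[::-1][:m]]          # shape (n_cf, m)

# Synthetic data: 3 classes x 10 samples, n_cf = 8 composite vectors of length l = 5.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 10)
Xs = rng.normal(size=(30, 8, 5)) + y[:, None, None] * np.linspace(0, 1, 8)[None, :, None]
C_W, C_B = composite_scatter(Xs, y)
W = composite_projection(C_W, C_B, m=2)
Y = np.einsum("am,kal->kml", W, Xs)                  # Y(k) = W^T X'(k), shape (N, m, l)
```

Note that $C_W$ here is only $n_{cf} \times n_{cf}$, so it stays well conditioned even when each raw sample has tens of thousands of dimensions, which is how the image covariance side-steps the small sample size problem discussed for LDA.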
Table 2 shows the classification rates with respect to $l$ and $m$, with $s = l/2$ as in [32]. As can be seen in Table 2, the classification rates are not sensitive to $l$ if $m$ is chosen properly. We set $l$ and $m$ to 400 and 25, respectively. Then, we investigated the classification rates with respect to $s$. As Table 3 shows, the classification rates are not sensitive to $s$, and the rate for $s = 200$ was slightly better than those for other values of $s$. Therefore, we set $s$ to 200. To find the optimal number of selected composite vectors, we also checked the classification rates on the electronic nose data while increasing $n_{cf}$, and set $n_{cf}$ to 150. The overall procedure of our system can be summarized as follows (Figure 4): (1) Generate $v$ composite vectors $x_i(k) \in \mathbb{R}^l$, $i = 1, \ldots, v$, from an e-nose data sample by shifting a window of length $l$ by the step size $s$.

Experimental Results
The VOC measurement data consist of 8 classes: acetone, benzene, cyclohexane, ethanol, heptane, methanol, propanol, and toluene [15]. For each class, we obtained 20 samples, so the total data set contains 160 samples. Figure 5 shows the distribution of the data samples in the subspace spanned by two principal component axes. The e-nose sensor used in this experiment samples vapors at 10 Hz, which yields 2,000 points over 200 s. Each data sample was measured on 16 channels over 2,000 time points and represented as a 16 × 2,000 matrix. The raw data were then transformed into a 32,000-dimensional vector by the lexicographic ordering operator for feature extraction (Figure 2).
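The vectorization step can be sketched with a single NumPy reshape. The text does not state whether the lexicographic ordering stacks channels one after another or interleaves them in time; this sketch ASSUMES row-major ordering, so the 2,000 readings of channel 1 come first, then those of channel 2, and so on.

```python
import numpy as np

N_CHANNELS, FS_HZ, DURATION_S = 16, 10, 200
n_points = FS_HZ * DURATION_S                  # 2,000 time points per channel

raw = np.zeros((N_CHANNELS, n_points))         # one 16 x 2,000 data sample
raw[3, 10] = 1.0                               # mark channel index 3, time index 10
x = raw.reshape(-1)                            # row-major: channel after channel
print(x.shape)                                 # (32000,)
print(np.flatnonzero(x))                       # [6010], i.e., 3 * 2000 + 10
```

Under this ordering, a window of $l = 400$ shifted by $s = 200$ lies within a single channel except when it straddles a channel boundary (for example, the window starting at index 1,800).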
With $l = 400$ and $s = 200$, a total of 159 composite vectors are generated from a 32,000-dimensional data sample. We computed the discriminant score of each composite vector using the proposed method. Out of the 159 composite vectors, we marked those with the top 60 and top 120 scores as '1' and the rest as '0' (Figure 6). Figure 6 shows that the 'stabilization' and 'purge' periods, as well as the 'exposure' period, contain discriminative information for odor classification. We compared the classification performance of the proposed method (CVS) with that of the LDA method [26], the FF (Feature Feedback) method [38], the CC-PCA (Component Correction by PCA) method [39], and the CC-CPCA (Component Correction by Common PCA) method [22]. We applied PCA after CC-PCA and CC-CPCA, which slightly increased their classification rates. Each method was evaluated using an 8-fold cross validation strategy [40]. In this scheme, the data are first randomly partitioned into 8 equally sized folds. Then, 8 iterations of training and testing are performed; in each, a different fold (20 data samples) is used for testing while the remaining 7 folds (140 data samples) are used for training. The nearest neighbor rule was used as the classifier, with the $l_2$ norm as the distance between two samples. We repeated this test 8 times and computed the average classification rate. All data samples were normalized using the mean and variance of the training set.
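The evaluation protocol can be sketched as follows: random partitioning into 8 equal folds, normalization with training-set statistics only, and a 1-nearest-neighbor rule with the $l_2$ distance. This is an illustrative reconstruction, not the authors' code. It ASSUMES z-score normalization (mean and standard deviation per variable), omits the 8 repetitions of the whole test, and uses synthetic 10-dimensional data in place of the composite features; the function names are hypothetical.

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """Nearest-neighbor rule with the squared Euclidean (l2) distance."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d2, axis=1)]

def cross_validate_1nn(X, y, n_folds=8, seed=0):
    """Mean 1-NN accuracy over n_folds random, equally sized folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        mu = X[train_idx].mean(axis=0)                # training-set statistics only
        sd = X[train_idx].std(axis=0) + 1e-12         # guard against zero variance
        Xtr = (X[train_idx] - mu) / sd
        Xte = (X[test_idx] - mu) / sd
        pred = one_nn_predict(Xtr, y[train_idx], Xte)
        accs.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accs))

# Synthetic stand-in: 8 classes x 20 samples = 160, so each fold holds 20 samples.
rng = np.random.default_rng(1)
y = np.repeat(np.arange(8), 20)
X = rng.normal(size=(160, 10)) + 5.0 * np.eye(8, 10)[y]
print(cross_validate_1nn(X, y))
```

Computing the normalization statistics inside each fold keeps information from the test fold out of training.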
Since noise is likely to occur in sensing data, we added Gaussian noise with a standard deviation of 3 to each data sample and evaluated the robustness of each method to the noise (Figure 7). For the original data, all methods classified each vapor well, with high classification rates, as can be seen in Figure 8a. When Gaussian noise was added, the classification rates of the other methods dropped sharply (Figure 8b). In contrast, the proposed method gave consistently high classification rates of 97.3% to 98.4%, showing that our system performs reliably in a noisy environment.
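The noise experiment amounts to adding zero-mean Gaussian noise with standard deviation 3 to each sample. A minimal sketch, ASSUMING the noise is added to the raw 32,000-dimensional vectors (in the units of the recorded data) before normalization and feature extraction; the text does not specify the stage, and `add_gaussian_noise` is a hypothetical helper. A fixed seed makes the noisy data set reproducible across methods.

```python
import numpy as np

def add_gaussian_noise(X, sigma=3.0, seed=0):
    """Return a copy of X plus i.i.d. zero-mean Gaussian noise with std sigma."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(loc=0.0, scale=sigma, size=X.shape)

clean = np.zeros((16, 32000))              # small placeholder batch of raw samples
noisy = add_gaussian_noise(clean, sigma=3.0)
print(round(float(noisy.std()), 1))        # close to 3.0
```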

Conclusions
We have presented a method to select useful composite vectors for odor classification. Composite vectors, which are generated from an electronic nose data sample by shifting a window, are likely to contain information that is redundant for extracting discriminant features, as well as noise introduced during measurement with the sensor array. Thus, we evaluated the class separability of each composite vector based on a discriminant distance and selected only the composite vectors with large discriminative information. This selection process has the advantage of viewing the electronic nose response holistically while focusing on the extraction of informative response characteristics. The proposed composite vector selection method not only reduced the computational complexity but also helped to extract better features. Extracting good features both mitigates the influence of noise in the measured data and improves the performance of classifiers such as SVM and NN. When SVM was used without any feature extraction, the classification rate was 98.0% for the original electronic nose data but dropped to 51.2% for the data with Gaussian noise. In contrast, NN with the features extracted by the proposed method gave classification rates of 99.8% and 98.4% for the same data sets, respectively. Hence, the proposed method can be combined with other parts of the classification process, such as feature selection or classifier design, to improve the performance of the overall classification system.
In this paper, we focused on classifying gas data classes without interference. It is also important to classify e-nose data that contain gas mixtures, different concentrations, and so on. In the near future, we will address interference between gases and gas mixtures.