This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
A great deal of work has been done to develop techniques for odor analysis by electronic nose systems. These analyses mostly focus on identifying a particular odor by comparing it with a known odor dataset. However, in many situations, it would be more practical if each individual odorant could be determined directly. This paper proposes two methods for such odor component analysis in electronic nose systems. First, a K-nearest neighbor (KNN)-based locally weighted nearest neighbor (LWNN) algorithm is proposed to determine the components of an odor. The odor training data are first categorized by component into several groups, each of which is represented by its centroid. The examined odor is then assigned to the class of the nearest centroid. The distance between the examined odor and each centroid is calculated with a weighting scheme that captures the local structure of each predefined group. To further determine the concentration of each component, odor models are built by regression. Then, a weighted and constrained least-squares (WCLS) method is proposed to estimate the component concentrations. Experiments were carried out to assess the effectiveness of the proposed methods. The LWNN algorithm is able to classify mixed odors with different mixing ratios, while the WCLS method provides good estimates of component concentrations.
An electronic nose is a biomimetic olfactory system developed based on chemical sensor principles, electronic system design and data analysis techniques. In the biological olfactory system, there are about 350 different odorant receptors in humans and about 1,000 in mice. Different odors are recognized by different combinations of odorant receptors [
The state-of-the-art techniques for sensor array data analysis and the applicability of each technique have been discussed by Jurs [
Dimensionality reduction methods, such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), seek to reduce the data size required for classification. PCA is an unsupervised method that finds a set of orthogonal projection directions capturing the largest amount of variation in the data, without using the class information of the data. LDA, on the other hand, makes use of the class labels to find a lower-dimensional vector space that gives the best class separation. For example, a 100% classification rate was achieved by LDA for classification of different tomato maturity states and different qualities of green tea samples [
Regression analysis is a statistical data analysis approach that seeks a continuous fitting function of the independent variables to model the dependent variables. The least-squares method can be used to find such a fitting function by minimizing the sum of squared differences between each known data point and the fitting function. NASA's Jet Propulsion Laboratory (JPL) used a set of self-developed polymer composite sensors to quantify single and mixed contaminants [
Although classification methods represent a promising technology for analyzing electronic nose data, their applications mainly focus on discriminating between different odors. Moreover, odors containing the same components but with different mixing ratios are generally perceived as different smells. For this reason, a traditional classification method is not applicable to differentiating such smells. A more practical solution is to partition the odor space into subspaces and classify an odor into one of the subspaces. This paper adopts a supervised strategy to categorize the mixed odor dataset into several groups according to the components. The nearest neighbor method is then used to classify the response pattern into one of the predefined groups. A weighting scheme is proposed to rescale the distance between two data points so that the classification accuracy can be improved. Another solution for the analysis of odor mixtures is to directly determine the concentration of each component present in the examined mixture by analyzing the response pattern. Regression methods are applied in this paper to build odor models. The component concentrations are estimated by solving a weighted and constrained least-squares problem, in which each squared error term is weighted to reflect the reliability of the corresponding estimated sensor response.
The rest of this paper is organized as follows: Firstly, the proposed methods for analyzing mixed odors will be described in Section 2. Then, the data collection methods and experimental results will be provided to evaluate and support the proposed methods in Section 3. Finally, Section 4 will conclude the contribution of this work.
Traditionally, an electronic nose is not designed to analyze mixed odors but merely to differentiate between different smells. This paper proposes to determine the components that are most significant in a mixture by analyzing the sensor response pattern of the odor mixture. This work is based on the following two assumptions [
♦
♦
Based on the assumption of homogeneity, the normalized mixed odor dataset could be categorized according to the contained components without considering the concentration of each component. For example, the categorization results for odors of three components would be like the one shown in
Assume that there are
Note that
As aforementioned, the proposed
Although the proposed LWNN method can be used to efficiently determine the set of components present in an odor mixture, the concentration of each component is still unknown. Nevertheless, a regression method could be used to estimate the component concentration. According to the assumption of homogeneity, the sensor generated by the
Based on the linear additive assumption, the response of the
Note that the response of each component is weighted with a weighting term β_{i,j} and an offset term β_{i,offset} is introduced in
The parameters in
In order to get a closed-form expression for each of the weighting terms, the product of the weighting terms is set to one:
According to
The proposed methodology that uses a weighted and constrained least-squares method (WCLS) to estimate the component concentrations of a mixed odor is presented in
Although mixing odors can yield a linear additive trend, this is not necessarily common. The effect of mixing can often lead to (1) masking or dominance by a stronger component [
This section presents the performance of the KNN-based methodologies, which are listed below:
KNN: KNN using the default Euclidean distance metric.
PCA+KNN: KNN over the reduced space generated by Principal Component Analysis (PCA).
LDA+KNN: KNN over the reduced space generated by Linear Discriminant Analysis (LDA).
LWNN: The proposed Locally Weighted Nearest Neighbor method.
For each method except LWNN, in which the K value is fixed to one, the value of K varies from one to five. The performance of the four KNN-based methods was evaluated using the collected odor data. The training dataset was partitioned into seven component sets according to the components:
♦ M: methanol.
♦ E: ethanol.
♦ A: acetone.
♦ ME: mixture of methanol and ethanol.
♦ EA: mixture of ethanol and acetone.
♦ AM: mixture of acetone and methanol.
♦ MEA: mixture of methanol, ethanol and acetone.
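As a point of reference for the baseline, the plain KNN rule with Euclidean distance and K between one and five can be sketched as below. This is only an illustrative sketch: the function name `knn_predict`, the tie-breaking rule and the toy two-sensor patterns are assumptions made for the example and are not taken from the experiments; only the labels M, E and ME follow the component sets listed above.

```python
import math
from collections import Counter

def knn_predict(train, x, k=1):
    """Classify x by majority vote among its k nearest training patterns.

    train: list of (response_vector, component_set_label) pairs.
    Ties are broken in favor of the closest neighbor (an assumption).
    """
    # Sort training patterns by Euclidean distance to x and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Walk the neighbors from nearest to farthest and return the first
    # label that reached the highest vote count.
    for _, label in neighbors:
        if votes[label] == top:
            return label

# Hypothetical normalized two-sensor patterns, for illustration only.
train = [
    ((1.0, 0.0), "M"), ((0.9, 0.1), "M"),
    ((0.0, 1.0), "E"), ((0.1, 0.9), "E"),
    ((0.5, 0.5), "ME"),
]
for k in range(1, 6):
    print(k, knn_predict(train, (0.95, 0.05), k=k))
```

Because the best K is not known in advance, each baseline has to be evaluated once per K value, which is the extra computation that a method with K fixed to one avoids.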
The results are summarized in
The reason is that PCA seeks to separate all the data points as widely as possible. However, the local correlation structure of each component set may be distorted. As shown in
Although KNN applied with LDA outperforms the proposed LWNN method, LWNN is the most efficient of the examined KNN-based methods, since no additional computation is needed to determine the best K value. Moreover, LWNN does not require solving a costly eigenvalue problem, which is necessary for both PCA and LDA. The proposed LWNN method nevertheless yields acceptable accuracy in classifying and identifying the component set.
This section reports the performance of the proposed methodology that uses a weighted and constrained least-squares method to estimate the concentration of each component present in an odor mixture. The randomly assigned training dataset is used to build the odor models:
P_{M}: the odor model for methanol.
P_{E}: the odor model for ethanol.
P_{A}: the odor model for acetone.
M: the odor model for mixtures of methanol, ethanol and acetone.
Two methodologies for estimating component concentrations are tested and compared in this section:
CLS: the constrained least-squares method.
WCLS: the proposed weighted and constrained least-squares method.
The metric of
This study aimed to determine the mixture components and estimate the concentration of each contained component, assuming homogeneity and linear additivity. A KNN-based method, LWNN, is proposed to determine the components present in a mixed odor by classifying its sensor responses into the closest of the previously partitioned component sets. Furthermore, a local weighting scheme, which associates each component set with an independent weighting vector, is proposed to rescale the distance between a testing data point and the centroid of a component set. For each component set, a higher weight is assigned to a sensor response when the sensor yields a very consistent response to that class.
To further estimate the component concentrations, odor models have been built by regression. Based on these odor models, a weighted and constrained least-squares problem is solved to estimate the concentration of each component present in the examined mixture. A weighting scheme is adopted to reflect the reliability of each estimated sensor response. If the estimated response value of a sensor is close to the observed response, a large weight is assigned to the squared error between the estimated and observed sensor responses.
To evaluate the effectiveness of the proposed methods, a set of odor data has been collected by mixing three highly volatile solvents with different mixing ratios. LDA has been noted for its ability to discriminate between different component sets, despite its high computational cost. Furthermore, the proposed LWNN method is shown to be comparable to the commonly applied KNN-based methodology but with lower computational cost, since no additional computation is needed to determine the best K value. However, LWNN is not suitable for estimating component concentrations and becomes complex when the number of components increases. The proposed weighted and constrained least-squares (WCLS) method is also shown to provide good estimates of component concentrations, especially for odor mixtures, although it may give erroneous concentration estimates for pure odors.
The authors would like to acknowledge the support of the National Science Council of Taiwan, under Contract No. NSC 97-2220-E-007-036 and NSC 98-2220-E-007-017. We also acknowledge the support of the Chung-Shan Institute of Science and Technology, under Contract No. CSIST808V207. The authors would like to thank the National Chip Implementation Center (CIC) for technical support.
A schematic plot of the normalized data set of mixed odors consisting of three odor components: A, B and C. The data points are partitioned into seven component sets according to the contained odor components: A, B, C, AB, BC, CA, and ABC.
A schematic plot of the experimental setup for data collection.
The normalized odor patterns of three vaporized solvents with different concentrations: (a) methanol, (b) ethanol and (c) acetone.
The response of each sensor over three vaporized solvents:
The projections of
The estimated errors of the CLS method and the proposed WCLS over the testing dataset for
The root mean squared error of all the estimated results in
The training stage of LWNN.
(1) Start with a set of predefined classes, 
(2) Compute its centroid

The testing stage of LWNN.
For each data point 
(1) Compute the weighted distances between 
(2) Classify 
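The two LWNN stages above can be sketched in a few lines. The exact weighting formula is not recoverable from this text, so the sketch assumes one plausible choice: each sensor's weight within a class is the inverse of its within-class variance, rescaled so that the product of the weights equals one, which gives consistent sensors a higher weight. The names `train_lwnn` and `classify_lwnn`, the `eps` regularizer and the toy data are hypothetical.

```python
import math

def train_lwnn(classes, eps=1e-6):
    """Training stage: one centroid and one weight vector per class.

    classes: dict mapping a label to a list of normalized response vectors.
    The inverse-variance weights are an assumption of this sketch.
    """
    model = {}
    for label, rows in classes.items():
        n, d = len(rows), len(rows[0])
        centroid = [sum(r[j] for r in rows) / n for j in range(d)]
        # Within-class variance of each sensor; eps avoids division by zero.
        var = [sum((r[j] - centroid[j]) ** 2 for r in rows) / n + eps
               for j in range(d)]
        raw = [1.0 / v for v in var]
        # Divide by the geometric mean so the product of the weights is one.
        geo_mean = math.exp(sum(math.log(w) for w in raw) / d)
        model[label] = (centroid, [w / geo_mean for w in raw])
    return model

def classify_lwnn(model, x):
    """Testing stage: assign x to the class with the nearest weighted centroid."""
    def weighted_dist(centroid, weights):
        return sum(w * (xj - cj) ** 2
                   for w, xj, cj in zip(weights, x, centroid))
    return min(model, key=lambda label: weighted_dist(*model[label]))

# Toy data: sensor 0 is consistent for class "A", sensor 1 for class "B".
classes = {
    "A": [[1.0, 0.0], [1.0, 0.4], [1.0, -0.4]],
    "B": [[0.0, 1.0], [0.4, 1.0], [-0.4, 1.0]],
}
model = train_lwnn(classes)
print(classify_lwnn(model, [0.95, 0.3]))
```

Only one distance per class is computed at test time, so the cost grows with the number of component sets rather than with the number of training points.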
The training stage of WCLS.
For each pure odor, 
(1) Build the pure odor model according to 
(2) Build the mixed odor model according to 
Compare each of the weighting terms by 
The testing stage of WCLS.
For each testing odor data, 
(1) Estimate the component concentration by solving a weighted least-squares problem as 
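The testing stage can be illustrated with a simplified sketch. It is not the paper's formulation: it assumes a linear mixed-odor model (sensor response i is approximately `b[i]` plus the sum of `A[i][j]` times concentration j), takes the constraint to be non-negativity of the concentrations, and uses weights of the form 1/(residual² + eps), so that sensors whose estimated response is close to the observed one receive a large weight. The names `A`, `b`, `estimate_wcls`, `weighted_nnls` and `solve_linear` are hypothetical.

```python
from itertools import combinations

def solve_linear(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            return None  # treat as singular
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = s / aug[i][i]
    return z

def weighted_nnls(A, y, w):
    """Minimize sum_i w[i] * (y[i] - A[i] . c)^2 subject to c >= 0.

    Enumerates every subset of components allowed to be non-zero, which is
    practical only for a handful of components (three here).
    """
    m, n = len(A), len(A[0])
    best, best_cost = [0.0] * n, sum(wi * yi * yi for wi, yi in zip(w, y))
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            # Weighted normal equations restricted to the components in S.
            G = [[sum(w[i] * A[i][p] * A[i][q] for i in range(m)) for q in S]
                 for p in S]
            g = [sum(w[i] * A[i][p] * y[i] for i in range(m)) for p in S]
            z = solve_linear(G, g)
            if z is None or min(z) < 0:
                continue  # singular or infeasible candidate
            c = [0.0] * n
            for p, zp in zip(S, z):
                c[p] = zp
            cost = sum(w[i] * (y[i] - sum(A[i][j] * c[j] for j in range(n))) ** 2
                       for i in range(m))
            if cost < best_cost:
                best, best_cost = c, cost
    return best

def estimate_wcls(A, b, r, n_iter=5, eps=1e-6):
    """Reweighted constrained least squares; the first pass is plain CLS."""
    y = [ri - bi for ri, bi in zip(r, b)]
    w = [1.0] * len(y)
    for _ in range(n_iter):
        c = weighted_nnls(A, y, w)
        resid = [yi - sum(a * cj for a, cj in zip(row, c))
                 for row, yi in zip(A, y)]
        # Small residual -> large weight (assumed weighting rule).
        w = [1.0 / (e * e + eps) for e in resid]
    return c

# Hypothetical 4-sensor, 3-component model (columns: M, E, A).
A = [[1.0, 0.2, 0.1], [0.3, 1.0, 0.2], [0.1, 0.3, 1.0], [0.5, 0.5, 0.5]]
b = [0.1, 0.2, 0.0, 0.05]
r = [bi + sum(a * cj for a, cj in zip(row, [34.0, 34.0, 51.0]))
     for row, bi in zip(A, b)]
print([round(c, 2) for c in estimate_wcls(A, b, r)])
```

With all weights fixed to one, the same routine reduces to the CLS baseline, which makes the two estimators easy to compare on identical data.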
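Accuracy (%) of the KNN-based methods for K = 1 to 5. An asterisk marks the best accuracy of each method.
Method  K = 1  K = 2  K = 3  K = 4  K = 5
KNN  93.94  93.94  *95.45  93.94  93.94
LDA + KNN  96.97  *100.00  98.48  98.48  96.97
PCA + KNN  *48.48  39.39  39.39  40.91  25.76
LWNN  *95.45  —  —  —  —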
The concentration estimation results of the proposed methodology for mixed odors with three components.
Sample  Actual M  Actual E  Actual A  Estimated M  Estimated E  Estimated A  RMSE
MEA1  34  34  34  33.32  30.12  38.61  3.50 
MEA2  34  34  51  37.29  33.34  49.54  2.11 
MEA3  34  34  68  34.69  31.50  68.82  1.57 
MEA4  34  51  34  38.85  49.33  29.74  3.85 
MEA5  34  68  34  35.82  64.28  38.30  3.45 
MEA6  51  34  34  56.11  36.80  26.81  5.34 
MEA7  68  34  34  66.88  29.93  38.24  3.45 
The concentration estimation results of the proposed methodology for mixed odors with two components.
Sample  Actual M  Actual E  Actual A  Estimated M  Estimated E  Estimated A  RMSE
ME1  34  34  0  31.70  31.19  0.46  2.11 
ME2  34  51  0  37.79  45.23  2.84  4.31 
ME3  34  68  0  39.66  64.52  2.67  4.13 
ME4  51  34  0  47.44  32.20  0.00  2.30 
ME5  51  51  0  52.46  47.45  0.00  2.22 
ME6  68  34  0  64.05  31.94  0.00  2.57 
 
EA1  0  34  34  0.48  32.50  28.64  3.23 
EA2  0  34  51  2.82  28.63  49.35  3.63 
EA3  0  34  68  2.93  24.32  68.49  5.85 
EA4  0  51  34  5.77  39.01  38.21  8.06 
EA5  0  51  51  5.39  37.60  54.68  8.61 
EA6  0  68  34  9.61  48.33  40.86  13.25 
 
AM1  34  0  34  19.27  13.62  24.29  12.87 
AM2  34  0  51  20.29  7.45  46.22  9.42 
AM3  34  0  68  20.92  2.77  65.17  7.89 
AM4  51  0  34  33.37  7.59  32.14  11.13 
AM5  51  0  51  36.08  3.06  51.03  8.79 
AM6  68  0  34  51.07  1.96  47.15  12.43 
The concentration estimation results of the proposed methodology for pure odors.
Sample  Actual M  Actual E  Actual A  Estimated M  Estimated E  Estimated A  RMSE
M1  28  0  0  7.55  16.20  0.00  15.06 
M2  45  0  0  28.94  11.38  0.00  11.36 
M3  57  0  0  40.94  9.07  0.00  10.65 
M4  68  0  0  53.96  6.57  0.00  8.95 
M5  85  0  0  73.01  3.92  0.00  7.28 
 
E1  0  28  0  0.02  29.57  0.00  0.91 
E2  0  45  0  1.32  41.71  0.00  2.05 
E3  0  56  0  6.52  48.33  2.49  5.99 
E4  0  68  0  8.00  54.99  4.30  9.16 
E5  0  85  0  6.12  68.56  15.00  13.33 
 
A1  0  0  28  0.00  20.17  0.00  19.92 
A2  0  0  45  0.00  15.19  18.17  17.80 
A3  0  0  56  0.00  11.64  32.61  15.08 
A4  0  0  68  0.00  7.54  47.22  12.76 
A5  0  0  85  0.00  0.97  71.11  8.04 
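The final column of each concentration table is consistent with the root mean squared error taken over the three components (methanol, ethanol, acetone) of a sample. The short check below recomputes it for rows MEA1 and M1; the helper name `rmse` is introduced here for illustration only.

```python
import math

def rmse(actual, estimated):
    """Root mean squared error between actual and estimated concentrations."""
    return math.sqrt(
        sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual)
    )

# Row MEA1: actual (34, 34, 34), estimated (33.32, 30.12, 38.61).
print(round(rmse([34, 34, 34], [33.32, 30.12, 38.61]), 2))  # 3.5
# Row M1: actual (28, 0, 0), estimated (7.55, 16.20, 0.00).
print(round(rmse([28, 0, 0], [7.55, 16.20, 0.00]), 2))  # 15.06
```

Both values agree with the tabulated errors of 3.50 and 15.06.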