Least Square Regression Method for Estimating Gas Concentration in an Electronic Nose System

We describe an Electronic Nose (ENose) system that is able to identify the type of analyte and to estimate its concentration. The system consists of seven sensors: five gas sensors (supplied with different heater voltage values), a temperature sensor, and a humidity sensor. To identify a new analyte sample and then estimate its concentration, we use both machine learning techniques and the least squares regression principle. In fact, we apply two different training models: the first, based on the Support Vector Machine (SVM) approach, teaches the system how to discriminate among different gases, while the second uses the least squares regression approach to predict the concentration of each type of analyte.


Introduction
The paper deals with the problems of gas detection and recognition, as well as concentration estimation. Because of their fast evaporation rate and toxic nature, many Volatile Organic Compounds (VOCs) can be dangerous to human health at high concentration levels in air and in workplaces; the detection of these compounds has therefore become a serious and important task in many fields. VOCs are also considered the main cause of allergic pathologies and of lung and skin diseases.

Other applications of gas detection systems include environmental monitoring, food quality assessment [1], disease diagnosis [2,3], and airport security [4].
Many research contributions have addressed the design of electronic nose systems that combine a tin oxide gas-sensor array with Artificial Neural Networks (ANNs) to identify the VOCs relevant to environmental monitoring. Srivastava [5] used a new data transformation technique, based on the mean and variance of individual gas-sensor combinations, to improve the classification accuracy of a neural network classifier; his simulation results demonstrated that the system was capable of successfully identifying target vapors even under noisy conditions. Daqi et al. [6] made simultaneous estimates of many kinds of odor classes and concentrations, formulating the problem as a multi-input/multi-output (MIMO) function approximation problem.
We formulate the problem of gas detection and recognition as a two-class or multiclass classification problem over a given set of analytes. To identify the type of analyte we use the support vector machine (SVM) approach, which was introduced by Vapnik [12] as a classification tool and relies strongly on statistical learning theory. Classification is based on finding the best separating hyperplane (in terms of classification error and separation margin) between two point sets in the sample space, which in our case is the seven-dimensional Euclidean space, since each sample corresponds to the measurements reported by the seven sensors that constitute the core of our system. Our classification approach allows kernel transformations within the SVM context, so that inner products can be calculated directly in the feature space without explicitly applying the mapping [13]. As previously mentioned, we adopt a multi-sensor scheme in which useful information is gathered by combining the outputs of the different sensors. In general, a single sensor does not allow a gas to be identified, since the same sensor output may correspond to different concentrations of many different analytes. By combining the information coming from several sensors of different types, operated at different heater voltage values, we are able to identify the gas and to estimate its concentration.
The paper is organized as follows. Section 2 describes our Electronic Nose (ENose), and Section 3 gives a brief overview of the SVM approach. Section 4 is devoted to our experiments involving five different types of analytes (acetone, benzene, ethanol, isopropanol, and methanol). Finally, conclusions are drawn in Section 5.

Electronic Nose
An electronic nose is an array of gas sensors whose response constitutes an odor pattern [14]. A single sensor in the array should not be highly specific in its response but should respond to a broad range of compounds, so that different patterns can be related to different odors. To achieve high recognition rates, several sensors with different selectivity patterns are used, and pattern recognition techniques must be coupled with the sensor array [10]. Our system (Figure 1) consists of five gas sensors of the TGS class from FIGARO USA, Inc., supplied with different heater voltages to improve their selectivity and sensitivity. The sensing element is a tin dioxide (SnO₂) semiconductor layer. In particular, three of the sensors are of the TGS-822 type, each supplied with a different heater voltage (5.0 V, 4.8 V, and 4.6 V, respectively; see Figure 2), one is of the TGS-813 type, and the last one is of the TGS-2600 type. Because the gas sensor response is heavily affected by environmental changes, two auxiliary sensors are used to measure temperature (LM-35 sensor from National Semiconductor Corporation) and humidity (HIH-3610 sensor from Honeywell). The gas sensors and the auxiliary sensors are placed in a box with an internal volume of 3,000 cm³, which also contains a fan that helps the solvent drops evaporate.

All sensors are connected to a multifunction board (NI DAQPad-6015), which serves as the interface between the box and the PC. The National Instruments DAQPad-6015 multifunction data acquisition (DAQ) device provides plug-and-play USB connectivity for acquiring, generating, and logging data; it offers 16-bit resolution at up to 200 kS/s, 16 analog inputs, 8 digital I/O lines, two analog outputs, and two counter/timers. The DAQPad-6015 comes with the NI-DAQmx measurement services software, which can be quickly configured to take measurements with the DAQ device.
In addition, NI-DAQmx provides an interface to LabWindows/CVI [15], which runs on our Pentium 4 PC.
The integrated LabWindows/CVI environment provides code generation tools and prototyping utilities for fast C code development, within an interactive ANSI C environment that gives access to the full C language. Because LabWindows/CVI is designed for developing measurement applications, it includes a large set of run-time libraries for instrument control, data acquisition, analysis, and user interface, as well as many features that make developing such applications easier than in traditional C environments.
For support vector machine (SVM) training and testing in multi-class classification we use the LIBSVM-2.82 package [16]. LIBSVM-2.82 uses the one-against-one approach [17]: given k distinct classes, k(k − 1)/2 binary classifiers are constructed, each trained on data from two different classes. LIBSVM provides a parameter selection tool for different kernels and supports cross-validation. For medium-sized problems, cross-validation may be the most reliable way to select parameters. The training data are first partitioned into several folds; each fold in turn is used as the validation set while the remaining folds are used for training. The average accuracy on the validation sets is the cross-validation accuracy [18]. In particular, the leave-one-out cross-validation scheme uses folds that are singletons, i.e., each fold consists of just one sample.
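The one-against-one scheme can be illustrated with a short Python sketch. It enumerates the k(k − 1)/2 class pairs for the five analytes and combines the pairwise decisions by majority voting. The binary classifier `toy_binary_predict` is a hypothetical placeholder for a trained binary SVM, and the tie-breaking rule is an assumption of this sketch; LIBSVM may resolve ties differently.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """All unordered class pairs; one binary classifier is trained per pair."""
    return list(combinations(classes, 2))


def one_vs_one_predict(x, pairs, binary_predict):
    """Combine the pairwise decisions for sample x by majority voting."""
    votes = Counter(binary_predict(pair, x) for pair in pairs)
    # Ties go to the class that was counted first (an assumption of this sketch).
    return votes.most_common(1)[0][0]


def toy_binary_predict(pair, x):
    """Hypothetical stand-in for a trained binary SVM: it always favours ethanol."""
    return "ethanol" if "ethanol" in pair else pair[0]


analytes = ["acetone", "benzene", "ethanol", "isopropanol", "methanol"]
pairs = one_vs_one_pairs(analytes)
print(len(pairs))                                            # 10 = 5 * 4 / 2
print(one_vs_one_predict(None, pairs, toy_binary_predict))   # ethanol
```

With five analytes, ten binary classifiers are trained; the toy predictor gives ethanol four votes, more than any other class.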

Support Vector Machine (SVM)
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression of multidimensional data sets [19,14]. They belong to the family of generalized linear classifiers, which both minimize the empirical classification error and maximize the geometric margin; for this reason an SVM is also known as a maximum margin classifier [9]. In this section we summarize the main features of SVMs; detailed surveys can be found in [3,14,20,21]. An SVM looks for a separating hyperplane between the two data sets. The equation of such a hyperplane is:

$$\mathbf{w} \cdot \mathbf{x} + b = 0 \tag{1}$$

where w is the weight vector, which defines a direction perpendicular to the hyperplane, x is the input data point, and b is the (scalar) bias. With a proper normalization, the margin is equal to $\|\mathbf{w}\|^{-1}$; therefore maximizing the margin is equivalent to minimizing $\|\mathbf{w}\|$. The advantages of this maximum margin criterion are robustness against noise and uniqueness of the solution.
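The hyperplane equation and the margin can be made concrete with a few lines of NumPy. The weight vector `w` and bias `b` below are arbitrary illustrative values, not the result of training; the sketch only shows how a point is assigned to a side of the hyperplane and how the margin ‖w‖⁻¹ follows from w.

```python
import numpy as np


def decision_values(w, b, X):
    """f(x) = w . x + b for every row x of X."""
    return X @ w + b


def classify(w, b, X):
    """+1 on the positive side of w . x + b = 0, -1 otherwise (0 counts as +1)."""
    return np.where(decision_values(w, b, X) >= 0.0, 1, -1)


def margin(w):
    """Margin ||w||^-1, assuming the canonical normalization |f(x)| = 1 at the closest points."""
    return 1.0 / np.linalg.norm(w)


w, b = np.array([3.0, 4.0]), -5.0   # illustrative values, ||w|| = 5
X = np.array([[2.0, 1.0], [0.0, 0.0]])
print(classify(w, b, X))   # [ 1 -1]
print(margin(w))           # 0.2
```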
In many practical cases the data are not linearly separable; the hyperplane must then both maximize the margin and minimize the sum of classification errors. Let $y_i \in \{-1, +1\}$ represent the class membership of the point $\mathbf{x}_i$, and let $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$. The error $\xi_i$ of the point $(\mathbf{x}_i, y_i)$ with respect to a target margin $\gamma$ and for a hyperplane defined by f is:

$$\xi_i = \max\left(0,\ \gamma - y_i f(\mathbf{x}_i)\right) \tag{2}$$

$\xi_i$ is called the margin slack variable; it measures how much a point fails to have margin $\gamma$. If

$$\xi_i > \gamma \quad \left(\text{equivalently } y_i f(\mathbf{x}_i) < 0\right) \tag{3}$$

the point $\mathbf{x}_i$ is misclassified. The error $\xi_i$ is also greater than zero when $\mathbf{x}_i$ is correctly classified but with a margin smaller than $\gamma$.
Finally, the further $\mathbf{x}_i$ falls into the wrong region, i.e., satisfies equation (3), the larger the error. With the canonical normalization $\gamma = 1$, the cost function to be minimized is:

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \tag{4}$$

where n is the number of training points and C is a positive constant that determines the trade-off between classification accuracy and margin width [20,21]. This constant can therefore be regarded as a regularization parameter. For small values of C, the optimal separating hyperplane tends to maximize the distance to the closest points, while for large values of C it tends to minimize the number of misclassified points.
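The slack variables and the cost function can be evaluated directly for a given hyperplane. The Python sketch below assumes the normalization γ = 1 and uses a hypothetical three-point data set, all labelled +1, chosen so that one point lies beyond the margin, one is correctly classified but inside the margin, and one is misclassified. It only evaluates the cost for a fixed (w, b); it does not perform the minimization, which LIBSVM carries out internally.

```python
import numpy as np


def slack(w, b, X, y, gamma=1.0):
    """Margin slack variables xi_i = max(0, gamma - y_i * (w . x_i + b))."""
    return np.maximum(0.0, gamma - y * (X @ w + b))


def soft_margin_cost(w, b, X, y, C):
    """Cost 0.5 * ||w||^2 + C * sum_i xi_i, with the normalization gamma = 1."""
    return 0.5 * float(w @ w) + C * float(slack(w, b, X, y).sum())


w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[2.0, 0.0],    # beyond the margin: xi = 0
              [0.5, 0.0],    # correct side but inside the margin: 0 < xi < 1
              [-1.0, 0.0]])  # wrong side: xi > 1, i.e. misclassified
y = np.array([1.0, 1.0, 1.0])

xi = slack(w, b, X, y)
print(xi)                                    # [0.  0.5 2. ]
print(xi > 1.0)                              # [False False  True]
print(soft_margin_cost(w, b, X, y, C=10.0))  # 25.5
print(soft_margin_cost(w, b, X, y, C=0.1))   # 0.75
```

The two costs show the role of C: with C = 10 the slack term dominates, so an optimizer would trade margin width for fewer errors, while with C = 0.1 the ‖w‖² term dominates and a wide margin is favoured.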
If the original patterns are not linearly separable, they can be mapped by means of appropriate kernel functions into a higher-dimensional space called the feature space. A linear separation in the feature space corresponds to a non-linear separation in the original input space [11]. Kernels are a special class of functions that allow inner products to be calculated directly in the feature space, without explicitly applying the mapping. The kernel functions adopted in machine learning range from simple linear and polynomial mappings to sigmoid and radial basis functions [22]. In this paper a linear kernel is used.
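The kernel idea can be checked numerically. For the degree-2 homogeneous polynomial kernel K(x, z) = (x · z)² on two-dimensional inputs, the explicit feature map φ(x) = (x₁², √2 x₁x₂, x₂²) gives the same inner product, so an SVM can work with K without ever building φ. The example is illustrative only; this paper uses the linear kernel K(x, z) = x · z, whose feature map is simply the identity.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = x . z (the kernel used in this paper)."""
    return sum(a * b for a, b in zip(x, z))


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return linear_kernel(x, z) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z))                                   # 1.0
# Same value computed explicitly in the 3-D feature space (up to rounding).
print(linear_kernel(poly2_features(x), poly2_features(z)))
```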

Experiments and Results
In our experiments we used five different types of volatile species at different concentrations: acetone, methanol, ethanol, benzene, and isopropanol. The data set for these volatile species is made up of samples in ℝ⁷, where each sample corresponds to the outputs of the five gas sensors and the two auxiliary sensors.

Sample Preparation
Our box contains the PCB (printed circuit board) on which the two types of sensors, i.e., the gas sensors and the auxiliary sensors, are mounted. It also contains a fan that circulates the analyte inside the box during the test. The box has one inlet for air coming from an air compressor, which is used to clean the box and the gas sensors after each test, and one outlet for the exhaust air. The inner dimensions of the box are 22 cm (length) × 14.5 cm (width) × 10 cm (height), while the effective volume is 3,000 cm³. The amount of each volatile compound needed to create the desired concentration in the sensor chamber (our box) was introduced in the liquid phase using a high-precision liquid chromatography syringe. Since temperature, pressure and volume were known, the liquid volume needed to create the desired concentration of the volatile species inside the box could be calculated using the ideal gas law, as explained below. The analyte concentration versus the injected analyte volume is shown in Table 1.
A 10 µL syringe is used for injecting the test volatile compounds. We take methanol as an example to show how the quantity needed for a given concentration in ppm (parts per million) is calculated. Methanol has a molecular weight MW = 32.04 g/mol and a liquid density $\rho_{liq}$ = 0.7918 g/cm³. The volume of the box is 3,000 cm³; therefore, to get, for example, 100 ppm inside the box, the methanol vapor must occupy 3,000 cm³ × 100 × 10⁻⁶ = 0.3 cm³ (Table 1). The density of methanol vapor is given by the ideal gas law:

$$\rho_{gas} = \frac{P \cdot MW}{R \cdot T} \tag{6}$$

where $\rho_{gas}$ is the density of methanol vapor in g/L; P is the pressure, taken as the standard atmospheric pressure (1 atm), which is the reference for gas densities and volumes; MW is the molecular weight in g/mol; R is the universal gas constant (0.0821 L·atm/(mol·K)); and T is the temperature in kelvin ($T_K = T_C + 273.15$). As a result we get $\rho_{gas}$ = 1.33 g/L, which corresponds to T ≈ 293 K (about 20 °C). The mass of methanol is the same in the gas and in the liquid phase:

$$\text{Mass} = v_{gas} \cdot \rho_{gas} = v_{liq} \cdot \rho_{liq} \tag{7}$$

where $v_{gas}$ is the volume occupied by the methanol vapor, equal to 0.3 × 10⁻³ L, $\rho_{gas}$ is the vapor density calculated above, and $\rho_{liq}$ is the constant density of liquid methanol. Therefore:

$$v_{liq} = \frac{v_{gas} \cdot \rho_{gas}}{\rho_{liq}} = \frac{0.3 \times 10^{-3}\ \text{L} \times 1.33\ \text{g/L}}{0.7918\ \text{g/cm}^3} \approx 0.503 \times 10^{-3}\ \text{cm}^3 = 0.503 \times 10^{-6}\ \text{L}$$

This means that to get 100 ppm of methanol we must inject 0.503 µL of liquid methanol into the box with the syringe. Table 2 shows different concentrations of methanol (in ppm) versus the corresponding liquid quantities (in µL).
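The calculation above can be wrapped in a small Python function that converts a target concentration into the liquid volume to inject. It assumes ppm by volume, a pressure of 1 atm and a temperature of 20 °C; the temperature is not stated in the text, and 20 °C is an assumption that reproduces ρ_gas ≈ 1.33 g/L. The function name and parameters are hypothetical, chosen for illustration.

```python
R_L_ATM = 0.0821  # universal gas constant in L·atm/(mol·K), as in the text


def liquid_volume_ul(ppm, box_volume_cm3, mw_g_mol, rho_liq_g_cm3,
                     temp_c=20.0, pressure_atm=1.0):
    """Liquid volume (µL) to inject so that the vapour reaches `ppm` by volume.

    temp_c = 20 °C is an assumption; the text does not state T explicitly.
    """
    # Volume the vapour must occupy, converted from cm^3 to L.
    v_gas_l = box_volume_cm3 * ppm * 1e-6 / 1000.0
    # Ideal gas law: vapour density rho_gas = P * MW / (R * T), in g/L.
    rho_gas_g_l = pressure_atm * mw_g_mol / (R_L_ATM * (temp_c + 273.15))
    # Mass balance (7): v_gas * rho_gas = v_liq * rho_liq.
    mass_g = v_gas_l * rho_gas_g_l
    v_liq_cm3 = mass_g / rho_liq_g_cm3   # g / (g/cm^3) = cm^3
    return v_liq_cm3 * 1000.0            # 1 cm^3 = 1000 µL


# Methanol: MW = 32.04 g/mol, liquid density 0.7918 g/cm^3, box of 3,000 cm^3.
print(f"{liquid_volume_ul(100, 3000, 32.04, 0.7918):.3f} µL")   # 0.504 µL
```

The result agrees with the hand calculation to within rounding of ρ_gas. Because the relation is linear in ppm, doubling the target concentration doubles the volume to inject.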

Results
In the first analysis, we used an SVM with a linear kernel and performed multi-class classification with the LIBSVM-2.82 package [16]. The regularization parameter C was tuned experimentally by minimizing the leave-one-out cross-validation error over the training set.
In practice, the classifier was trained as many times as there are samples, each time leaving one sample out of the training set and using that omitted sample as a test sample to check the classification correctness. The classification correctness rate is the ratio of the number of correctly classified samples to the total number of samples. The results are shown in Table 3 for different values of C. We used 22 concentration samples for acetone, 22 for benzene, 20 for ethanol, 23 for isopropanol, and 21 for methanol (108 in total). For each concentration the experiment was repeated twice, so a total of 216 classification calculations was performed. Using the linear kernel, we obtained a 100.00% classification correctness rate for C = 1,000 with the leave-one-out cross-validation scheme. These results are better than those obtained by supplying all sensors with the same heater voltage (in that case, the best classification correctness rate was 94.74%).

Once the classification process has been completed, the next step is to estimate the concentration of the classified analyte. To this end we use the least squares regression approach: we build an approximation of the response (sensor resistance versus analyte concentration) for each sensor and each analyte, and then use this approximation to find the concentration for each analyte type.
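The leave-one-out procedure that yields the classification correctness rate can be sketched in Python. To keep the example self-contained, a nearest-centroid classifier is swapped in as a stand-in for the linear SVM, and the data are synthetic, well-separated points in ℝ⁷, not the measurements of this paper. In the paper's setting, `fit_predict` would wrap the LIBSVM classifier, and the whole loop would be repeated for each candidate value of C.

```python
import numpy as np


def nearest_centroid_fit_predict(X_train, y_train, x_test):
    """Stand-in classifier (not an SVM): pick the class with the nearest mean."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == k].mean(axis=0) for k in classes])
    return classes[np.argmin(np.linalg.norm(centroids - x_test, axis=1))]


def leave_one_out_rate(X, y, fit_predict):
    """Train n times, each time testing on the single held-out sample."""
    n = len(y)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i          # every sample except the i-th
        if fit_predict(X[keep], y[keep], X[i]) == y[i]:
            correct += 1
    return correct / n                    # correctly classified / total


# Synthetic, well-separated data: 3 classes x 10 samples in R^7.
rng = np.random.default_rng(0)
centers = 10.0 * np.eye(3, 7)
X = np.vstack([c + 0.5 * rng.standard_normal((10, 7)) for c in centers])
y = np.repeat(np.arange(3), 10)
print(leave_one_out_rate(X, y, nearest_centroid_fit_predict))   # 1.0
```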
For a sintered SnO₂ gas sensor, the concentration dependence of the response to a simple analyte exposure is nonlinear and can be described by a power law [23]. Since the inverse of a power law is again a power law, for each sensor and each analyte we fit the concentration as a function of the sensor resistance R:

$$\hat{c} = a R^{b} \tag{8}$$

Taking logarithms, $\ln \hat{c} = \ln a + b \ln R$, so the coefficients follow from linear least squares on the pairs $(\ln R_i, \ln c_i)$:

$$b = \frac{n \sum_{i=1}^{n} \ln R_i \ln c_i - \sum_{i=1}^{n} \ln R_i \sum_{i=1}^{n} \ln c_i}{n \sum_{i=1}^{n} (\ln R_i)^2 - \left(\sum_{i=1}^{n} \ln R_i\right)^2}, \qquad \ln a = \frac{1}{n}\left(\sum_{i=1}^{n} \ln c_i - b \sum_{i=1}^{n} \ln R_i\right)$$

where $R_i$ and $c_i$ are the measured resistance and the known concentration of the i-th sample, and n is the number of samples, which are indexed by i. Figure 3 shows, as an example, the original concentrations with respect to the corresponding sensor resistances, together with the estimated curve, for acetone. There are five such curves, one for each gas sensor. In our model, the optimal estimate of the concentration is a combination of the outputs of the different sensors, and we adopt the least squares regression model to find the optimal weights from the experimental data. Each analyte sample thus yields five concentration estimates, one per gas sensor. The weights $\alpha_j$ are obtained by solving the following minimization problem:

$$\min_{\alpha_1, \ldots, \alpha_M} \sum_{i=1}^{n} \left( c_i - \sum_{j=1}^{M} \alpha_j \hat{c}_{ij} \right)^2$$

where n is the number of analyte samples, $c_i$ is the true concentration of sample i, M is the number of sensors (in our case M = 5), and $\hat{c}_{ij}$ is the concentration of sample i previously estimated from sensor j by means of equation (8). Tables 4–8 compare the real concentrations with the results of the proposed method. For comparison purposes, the tables also include the results obtained by simply averaging the estimates provided by the five sensors.

Finally, we used the correlation coefficient (C.C) as a measure of estimation accuracy (Table 9) [8]. The correlation coefficient takes values between −1.0 and 1.0; for a useful regressor it lies between 0.0 and 1.0. If there is no relationship between the predicted and actual values, the correlation coefficient is 0.0 or very low (the predicted values are no better than random numbers). As the strength of the relationship between the predicted and actual values increases, so does the correlation coefficient, and a perfect fit gives a coefficient of 1.0. Thus, the closer the correlation coefficient is to 1.0, the better the regressor [7].
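Both least squares steps can be sketched with NumPy. `fit_power_law` fits the per-sensor power law (equation 8) by linear least squares in log-log space, so it minimizes errors in ln c rather than in c; this is an assumption about how the fit is done. `fit_weights` solves the unconstrained weight problem with `np.linalg.lstsq`. Both function names are hypothetical, and the data are synthetic, chosen so that the exact coefficients are known.

```python
import numpy as np


def fit_power_law(R, c):
    """Fit c = a * R**b for one sensor/analyte via ln c = ln a + b * ln R."""
    A = np.column_stack([np.ones_like(R), np.log(R)])
    (ln_a, b), *_ = np.linalg.lstsq(A, np.log(c), rcond=None)
    return np.exp(ln_a), b


def fit_weights(c_hat, c):
    """Weights alpha minimising sum_i (c_i - sum_j alpha_j * c_hat[i, j])**2.

    c_hat has shape (n_samples, M): column j holds the estimates of sensor j.
    """
    alpha, *_ = np.linalg.lstsq(c_hat, c, rcond=None)
    return alpha


# Synthetic single-sensor data that follow c = 2 * R**-1.5 exactly.
R = np.array([10.0, 20.0, 40.0, 80.0])
a, b = fit_power_law(R, 2.0 * R ** -1.5)
print(round(float(a), 6), round(float(b), 6))   # 2.0 -1.5

# Synthetic per-sensor estimates for M = 5 sensors and n = 12 samples.
rng = np.random.default_rng(1)
c_hat = rng.uniform(50.0, 500.0, size=(12, 5))
true_alpha = np.array([0.10, 0.20, 0.30, 0.25, 0.15])
c_true = c_hat @ true_alpha
alpha = fit_weights(c_hat, c_true)
print(np.round(alpha, 6))   # recovers true_alpha
```

The simple-average baseline used for comparison in the tables corresponds to fixing every weight at 1/M instead of fitting them.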
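The correlation coefficient is calculated as follows:

$$\mathrm{C.C} = \frac{n \sum_{i=1}^{n} X_i \hat{X}_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} \hat{X}_i}{\sqrt{\left[n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]\left[n \sum_{i=1}^{n} \hat{X}_i^2 - \left(\sum_{i=1}^{n} \hat{X}_i\right)^2\right]}}$$

where C.C is the correlation coefficient, $X_i$ are the actual values, $\hat{X}_i$ are the predicted values, and n is the number of data points.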
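The correlation coefficient translates directly into Python. This sketch uses the standard Pearson product-moment formula in summation form; the two checks show a perfect positive and a perfect negative linear relationship, which give 1.0 and −1.0 respectively. The function name is hypothetical.

```python
import math


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual values X_i and predicted values X̂_i."""
    n = len(actual)
    sx, sp = sum(actual), sum(predicted)
    sxp = sum(a * p for a, p in zip(actual, predicted))
    sxx = sum(a * a for a in actual)
    spp = sum(p * p for p in predicted)
    return (n * sxp - sx * sp) / math.sqrt((n * sxx - sx ** 2) * (n * spp - sp ** 2))


print(correlation_coefficient([1, 2, 3], [2, 4, 6]))   # 1.0
print(correlation_coefficient([1, 2, 3], [3, 2, 1]))   # -1.0
```

The formula is undefined when either series is constant (the denominator is zero), which does not occur for real concentration data spanning several levels.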

Conclusions
The results demonstrate that our system is able to identify the type of analyte and then estimate its concentration. The best classification correctness rate was 100.00%, and the concentration estimates also appear quite satisfactory. Supplying three identical sensors (TGS-822) with different heater voltages improved the performance of the system. Future work will be devoted to identifying binary mixtures of gases and estimating the concentration of each component.