Electronic Nose Based on Independent Component Analysis Combined with Partial Least Squares and Artificial Neural Networks for Wine Prediction

The aim of this work is to propose an alternative way for wine classification and prediction based on an electronic nose (e-nose) combined with Independent Component Analysis (ICA) as a dimensionality reduction technique, Partial Least Squares (PLS) to predict sensorial descriptors and Artificial Neural Networks (ANNs) for classification purpose. A total of 26 wines from different regions, varieties and elaboration processes have been analyzed with an e-nose and tasted by a sensory panel. Successful results have been obtained in most cases for prediction and classification.


Introduction
The most important sensory characteristic in the study of the quality of wine is its aroma. Wine aroma has a great chemical complexity because it is produced by the simultaneous perception of many volatile compounds. This property contributes a 70% of weight in sensory panels with respect to texture and taste. Electronic nose (e-nose) technology is shown as a possibility for aroma profile analysis. Essentially the e-nose is an array of gas sensors with partially overlapped selectivities, a signal collecting unit and a pattern recognition software [1]. On the other hand, pattern analysis constitutes a really important building block in the development of gas sensor instruments given its ability of detecting, identifying and measuring volatile compounds. For these reasons this technology has been proposed as an artificial substitute of the human olfactory system. To successfully design a pattern analysis system for machine olfaction, a rigorous study is necessary on the diverse issues involved in processing multivariate data (see Figure 1): signal preprocessing and feature extraction, dimensionality reduction, classification, regression or clustering and validation [1,2]. A variety of Pattern Recognition and Machine Learning techniques is available for each module and only the preprocessing module is sensor dependent. The first block in Figure 1 represents the e-nose hardware. After sensor signals have been acquired and saved into the computer, the first computational stage (i.e., signal preprocessing) starts and serves various purposes, including compensating for sensor drift, extracting descriptive parameters from sensor array response and preparing the feature vector for further processing. A dimensionality reduction stage projects this initial feature vector onto a lower dimensional space in order to avoid problems associated with high-dimensional, sparse datasets and redundancy. The resulting low-dimensional feature vector is further used to solve a given prediction problem, generally classification, regression or clustering.
The problem of identifying an unknown sample as one from a previously learned odorant set is addressed in classification tasks. In regression tasks, the aim is to predict a set of properties (e.g., concentration for an analyte in a complex mixture or the value of a sensory panel descriptor) from other variables. Finally, in clustering tasks the goal is to learn the structural relationship among different odorants. A final step, sometimes overlooked, is the selection of models and parameter setting and the estimation of true rates for a trained model by means of validation techniques.

Material and Methods
To perform the measurements of wine samples, a home-made and home-developed electronic nose was used. The e-nose and data processing system are described in this section.

Wine Samples
A total of 26 wines (19 red and 7 white) from different regions were tasted by a sensory panel and analyzed by the electronic nose for the experiment. A list of the wine samples is shown in Table 1. Wine samples were acquired in specialized shops and supplied by collaborator wine cellars.

Sensory Panel
A group of 30 people with previous experience in wine analysis were trained in recognizing 45 aromatic compounds from wine. In this training, all compounds were presented to the panelists at their threshold concentration in water and in test wines [3]. Wines were presented in random order to the panelists. No information was given to the assessors about the origin of the samples and no colored lighting assessed panelists. The wine tasting took place in an air-conditioned room (21°C) with isolated booths. Judges assessed the aroma using a tasting evaluation sheet that included 10 sensory descriptors (herbaceous, fruity, flower-like, spicy, vegetal, phenolic, microbiologic, chemical, oxidation and woody-like) and 5 quality parameters (wine quality, aromatic intensity, alteration, persistence and off-flavours). The different terms were evaluated in a scale from 1 to 5 (1, null, very weak; 2, weak; 3, medium; 4, strong; 5, very strong). The average of all the panelists was calculated to build the prediction models. All the sensory evaluations were realized under Spanish Standardization Rules (UNE) [4].

Electronic Nose
An array of 16 thin film tin oxide sensors was prepared by RF sputtering onto alumina and doped with chromium and indium. Sensor thickness varied between 200 and 800 nm. Some sensors were doped with Chromium and Indium, either on the surface or in an intermediate layer. Array composition is described in Figure 2(a). The array was placed in a 24 cm 3 stainless steel cell with a heater and a thermocouple. The operating temperature of sensors is controlled to 250°C with a PID temperature controller. Two sampling methods were available for analysis: purge and trap and static headspace followed by a dynamic injection. In purge and trap, a Tekmar 3100 was used [5]. In headspace, 10 mL of solution were kept in a 50 mL Dreschel bottle at 30°C for 30 minutes followed by 20 minutes carrying the volatile compounds to the sensor cell. The carrier gas was high purity nitrogen (99.9998%) at a constant flow of 200 mL/min [6]. In Figure 3, a block diagram of the measuring system is shown.  The resistance of the sensors was measured with a Keithley 2001 8 1/2 digits digital multimeter (DMM) with a Keithley 7001 40-channels multiplexer connected to the personal computer through a GPIB interface. More details of the system can be found in [7]. Responses of the individual sensors are defined relative to the minimum resistance to a 12% (V/V) solution of ethanol in deionised water for all measurements (calibration solution). The response of the sensors is low, so no special requirements are needed in instrumentation. Two measurements are made each minute, and it is enough to obtain the response curve profile for data analysis. Figure 2(b) shows the typical transient responses of four chemoresistive sensors of the array, operating at 250°C, exposed towards the headspace of one of the wine samples (Pioz-W1). The response of the sensors corresponds to several pulses of 20 minutes of exposition to the tested wine flavour followed by a pure nitrogen purge for 40 minutes. These responses were stored in hard disk and processed later by the data processing system for prediction and classification purposes.

Data Processing System
This section shows the system designed for the analysis of e-nose responses of the wine samples under test. The aim of this system is to perform the classification of the wine samples and the prediction of the sensory panel attributes from the e-nose responses. The data collected were analyzed by means of pattern recognition techniques using commercial software packages.
Matlab ® [8] was used for linear methods like Independent Component Analysis (ICA) [9,10] and non-linear methods based on Artificial Neural Networks (ANNs) like backpropagation and probabilistic neural networks. Partial Least Squares Regression was performed with Unscrambler ® [11].

Preprocessing and Feature Extraction
The aim of feature extraction is to find a low-dimensional mapping f : x ∈ N → y ∈ M (M < N ) that preserves most of the information in the original feature vector x. In our case, a previous optimization of the method of feature extraction and selection were performed [12]; the responses of the individual sensors were defined relative to the minimum resistance to 12% (V/V) of ethanol for all the measurements: where R wine was the minimum resistance of the sensor in the measurement of wine and R calibration was the minimum resistance of the sensor in a solution of 12% of ethanol. This calibration was performed once a week, to eliminate the sensors drift [2]. Data were centered and scaled for further analysis.

Dimensionality Reduction and Classification
The traditional method for reduction in dimensionality and visualization of multivariate measurement data is Principal Component Analysis (PCA) [13]. This work introduces the higher order statistical method called Independent Component Analysis (ICA) [12] as an alternative method of data processing for an e-nose data. Moreover, ICA is an appropriate approach for signal processing as it is demonstrated in [14].
ICA is a statistical and computational technique for revealing hidden factors that underlie sets of random variables or measurements. This technique defines a generative model or the observed multivariate data, which is typically given as a large database of samples. In the model, the data variables are assumed to be linear or nonlinear mixtures of some unknown latent variables, and the mixing system is also unknown. The aim of ICA is the decomposition for multivariate signals in statistically independent component contributions with minimum loss of information [15]. ICA can be seen as an extension to principal component analysis. While PCA is an effective method for data compression, ICA is an effective method for the extraction of independent features [16,17]. It is much more powerful technique, even, capable of finding the underlying factors or sources when classic methods fail completely.
The ICA bilinear model can be written: where X (i × j) is the original experimental raw data matrix and A (i × n) and S T (n × j) are the so-called mixing and source matrices, respectively, in ICA notation. In addition, E (i × j) is the error matrix. The i and j index are respectively the numbers of rows and columns of the data matrix X, and n the number of components included in the bilinear decomposition of Equation (1). ICA algorithms try to find "unmixing" matrix W according to: whereŜ T is the estimation of the source matrix S T . Then, when W is the pseudoinverse of the mixing matrix A, the estimated source signal will be equal to the original source signal S T (provided that error tends to zero):Ŝ The two main assumptions of ICA are: 1. The mixing vectors in A are linearly independent.
2. The components in S T are mutually statistically independent.
On the other hand regression problems constitute a more challenging domain for e-nose instruments [2]. The goal of regression is to establish a predictive model from a set of independent variables (e.g., gas sensor responses) to another set of continuous dependent variables.
Partial Least Squares regression (PLS) combines the properties of multiple linear regression and principal component analysis to produce a technique that is able to accept collinear data and separate the sample noise in order to make linear combinations in the dependent concentration matrix. PLS is the "gold standard" in chemometrics due to its ability to handle collinear data and reduce the number of required calibration samples [18].
As opposed to PCR, which extracts the "latent variables" from the directions of maximum variance in the sensor matrix Y (the eigenvectors of YTY), PLS finds the directions of maximum correlation between the sensor response matrix and the calibration mixtures matrix (Y and C) in a sequential fashion. The first PLS latent variable (t = Yω) is obtained by projecting Y along the eigenvector ω corresponding to the largest eigenvalue of Y T CC T Y [19]. To find the second and subsequent latent variables, the current PLS latent variable is deflated by its ordinary least squares prediction (a simple approach to regression is to assume that the dependent variables can be predicted from a linear combination of the sensor responses) and the eigen-analysis is repeated. A stopping point for the sequential expansion is determined through cross-validation. Published works on multicomponent analysis using gas sensor arrays and PLS may be found in references [20][21][22][23].

Artificial Neural Networks
The discrimination of the tested wines belonging to the same class has been tackled with a pattern recognizer based on ANN providing nonlinearity in the multivariate classification performance. Two types of neural networks have been tested for the classification of the wine aroma.
Also, Leave-One-Out (LOO) cross validation was applied to check the performance of the network [24]. LOO consists in training N distinct nets (where N is the number of measurements) by using N − 1 training vectors, while the validation of the trained network is carried out by using the remaining vector, excluded from the training set. This procedure is repeated N times until all vectors are validated [25].

Backpropagation Networks
Backpropagation algorithm is the most suitable learning rule for multi-layer perceptrons. It involves two stages: firstly, a feedforward stage where the exterior input information on the input nodes is propagated forward in order to compute the output information indicators at the output unit, and secondly, a backward phase where alterations to the connection weights are adjusted based on the differences between the computed and the actual indications at the output units [26,27].
The information processing that it carry out is basically an approximation of a mapping or function: by means of training over a set of example mapping, y k = f (y k ), where k = 1, 2, 3..., N The backpropagation network architecture used has a learning rate of 0.01 and training times less than 3 minutes. Also, it is formed by three layers: the input layer has 16 neurons corresponding to the 16 sensors, a variable number of neurons with tansig function for the hidden layer, and nine neurons with hardlim function for the output layer, the same number of existing classes.

Probabilistic Networks
PNN is a kind of feedforward neural network. The original PNN structure is a direct neural network implementation of Parzen nonparametric Probability Density Function (PDF) estimation and Bayes classification rule [28,29]. The standard training procedure of PNN requires a single pass over all the patterns of the training set [29].
If X ∈ R d is a d-dimensional pattern vectors and its associated class is i ∈ (S 1 , S 2 , S 3 , ..., S k ), where k is the number of possible classes, and a posteriori probability P r (S i |x), that is from class S i , is by Bayes' rule: where P r (x|S i ), with i = 1, 2, 3, ..., k is a priori pdf of the pattern in classes to be separated. And P r (S i ) with i = 1, 2, 3, ..., k are a priori probabilities of the classes. P (x) is assumed to be a constant. The decision rule is to select class S i for with P r (S i |x) is maximum. This will happen if for all j = i A Probabilistic Neural Network (PNN) composed of three layers, with radial basis transfer functions in the hidden layer and a competitive one in the output [2], was used for classification purposes.

Results and Discussion
Wine samples were tasted by the sensory panel and analyzed by the electronic nose. Only headspace sampling method have been used in electronic nose because of the purge and trap did not offer good prediction ability due to the concentration and elimination of several chemical compounds in the trap [30]. Matlab® program was used for preprocessing and feature extraction. This program automatically processes the measurements files and generates the output vectors for dimensionality reduction. Besides, the program uses the neural networks toolbox and it performs the dimensionality reduction using PCA, LDA or ICA and classification with several types of Neural Networks (BackPropagation and Probabilistic Neural Networks). Finally it generates the corresponding plots and the confusion matrix obtained with the validation of ANNs. A summary of the results obtained in these analyses is shown in the following paragraphs.

Sensory Panel Responses
Taste of the samples was performed by the panel. The data obtained were processed and the average of all the panelists was calculated for each wine and descriptor. The sensory attributes obtained for young white, young red and oak aged wines are shown in Table 2. In young white wines the flower and fruity aromas predominate, whereas in young red wines the fruity and spicy aromas predominate, and the fruity, spicy and woody-like aromas predominate in oak aged wines.

E-Nose Response
As mentioned in Section 2.3, Figure 2 represents the typical transient responses of four chemoresistive sensors, operating at 250°C, exposed towards the headspace of the blank wine. The response of the sensors corresponds to several pulses of 20 minutes of exposition to the sample of wine, followed by a pure nitrogen purge for 40 minutes. When electrovalves are switched on and wine aroma is carried to the sensors cell, the resistance of the sensors decrease and when electrovalves are switched off, the response increase to the equilibrium values. Figures 4-6 show the radial plot of the 16 sensors responses to the three sets of samples measured: white, red and aged wines respectively. It can be noticed that the sensors give different signal for each different sample, illustrating the discrimination capabilities of the array.
No relation among sensors response and aromatic profile of each wine can be established according to Figures 4-6 and a classification of samples cannot be performed. The variation in response intensity is due to different headspace composition of the wines. Each sample has its characteristic organic volatile compounds profile.

Dimensionality Reduction and Classification
For a better visualization of the data and to eliminate redundancy, dimensionality reduction techniques are generally used. In this case ICA was carried out using signals corresponding to five repeated exposures collected in different days. The plot of the two first Independent Components for the measurements of young white, young red and aged red wines are shown in Figures 7, 8 and 9 respectively. As shown in Figure 7, the clusters corresponding to each wine are well separated except some partial overlapping between samples of W7 and W5. The young white wines are clearly separated among them. Figure 8 shows the ICA plot for young red wines, where a partial overlapping among some samples of R1, R2 and R9 can be observed. The clusters of the other wines are well separated. Figure 9 shows the ICA plot for aged red wines. In this figure there are several overlapping zones between some wines due to the similarity of the wines. Some of these zones correspond to measurements of the same wine but aged in a different type of oak barrel, e.g., M5-M6 and M9-M10. On the contrary, other wine samples, such as M1-M2, M3-M4 and M7-M8 are well separated. A regression model to estimate sensory panel indicators from electronic nose was built by partial least squares regression (PLS). Models were cross-validated by the leave-one-out method. Preprocessing, modelling and validation were performed with Unscrambler ® . Parameters used to evaluate the models prediction ability were: root mean square prediction error and correlation coefficient between real and predicted Y variables. All variables were normalized prior to the analysis.

Sensory Data versus E-Nose Data
The correlation coefficients and mean squared error values calculated for all the parameters in this extraction technique are listed in Table 3. These results show that good correlation coefficients and mean squared error values are obtained except for woody-like parameter. Nevertheless, results could be improved by analyzing more wine samples with sensory panel and e-nose in order to train the model and obtain more accuracy in the predictions. Models were built in order to predict the sensorial descriptors and they exhibit good prediction ability, with correlations coefficients from 0.50 to 0.95 and mean squared error values between 4.31 ×10 −4 and 1.29 ×10 −3 . Figure 10 shows the relationship between estimated (y-axis) and real values of sensorial descriptors (x-axis) based on the PLS model calculated from e-nose responses using headspace method. Validation points are shown in figures obtained with cross-validation. As shown in Figure 10, there is a close relationship between sensory panel attributes from sensory evaluation and those estimated by the PLS model. Therefore, a good prediction of these variables could be performed from the e-nose measurements.

Artificial Neural Networks
Two types of neural networks were used for classification: Backpropagation and Probabilistic Neural Networks. In general the output of the neural network is given as a confusion matrix. From the confusion matrix a success rate is defined as the number of correct classified measures in each class over the total number of measures in this class.

Backpropagation Networks
In the training of feedforward networks, several numbers of neurons in the hidden layer have been tested. The optimal number turned out to be 14 neurons. The network was trained with the data obtained with the wine samples measurements. Leave-one-out (LOO) validation was performed in order to check the performance of the network. The classification success was 97% for young white wines, 87% for young red wines and 84% for aged red wines. As an example, the confusion matrix obtained for white wines is shown in Table 4.

Probabilistic Networks
Probabilistic neural network were trained with the same data than backpropagation networks. Leave-one-out (LOO) validation was also performed. The classification success was 94% for young white wines, 84% for young red wines and 82% for aged red wines.

Conclusions and Future Works
An electronic nose has been designed in this work for prediction and classification of wine samples. This e-nose consist of a gas sensor array combined with Partial Least Squared and a pattern recognition system based on Independent Component Analysis for dimensionality reduction. Results obtained through wine headspace technique have been used for predicting the sensory panel descriptors of the wines analyzed. In general, good results have been obtained for the correlation between sensory panel and e-nose due to the similarity between this sampling system and the natural process of taste in humans. This is a consequence to the global response of receptors (human nose and chemical sensors) to the whole aroma, allowing to predict wine quality parameters and sensory descriptors of sensory panel.
The signals recorded from the multisensor array exposed to the aroma of the wine samples have also been used for classifying these samples through ICA as explorative techniques and dimensionality reduction combined with a pattern recognizer based on ANN. Results show classification success ratios higher than 87% for all types of wine.
Finally, it is important to remark that, results could be improved by measuring more wine samples in order to increase the database and to train the networks and model for classification and prediction. Samples should be selected in order to present a wide range in the parameters to predict.