Identification of Tea Storage Times by Linear Discrimination Analysis and Back-Propagation Neural Network Techniques Based on the Eigenvalues of Principal Components Analysis of E-Nose Sensor Signals

An electronic nose (E-nose) was employed to detect the aroma of green tea after different storage times. Dry leaves, beverages and residues of Longjing green tea were each measured with the E-nose. In order to decrease the data dimensionality and optimize the feature vector, the E-nose sensor response data were analyzed by principal components analysis (PCA), and the values of the first five principal components were extracted as the input for the discrimination analysis. The storage time (0, 60, 120, 180 and 240 days) was discriminated by linear discrimination analysis (LDA) and predicted by the back-propagation neural network (BPNN) method. The results showed that the discrimination and testing results based on the tea leaves were better than those based on tea beverages and tea residues. For the tea leaf data, the mean errors at the five storage times were 9, 2.73, 3.93, 6.33 and 6.8 days, respectively.


Introduction
Tea is one of the most widely consumed drinks in the world because of its health-giving, dietetic and even therapeutic qualities. There are numerous beneficial components in green tea, such as vitamins, catechins, amino acids, and many inorganic nutritional components. In particular, green tea antioxidants may have a protective role in inhibiting cancer [1]. However, storage time can affect tea quality: the quality can decrease and many of the beneficial components can be gradually lost during storage. When this happens, the aroma of the green tea changes.
Aroma plays an important role in the evaluation of tea quality. The aroma and the taste, which are the main indicators of the quality of a tea, are closely related and interact with each other, so aroma is an important index for evaluating tea quality. Tea can lose its aroma gradually during storage under the influence of temperature, humidity, sunlight and oxidation. The longer the storage time, the fainter the aroma and the poorer the tea quality, so tea quality can be evaluated by detecting the changes in its aroma during storage.
Usually, tea quality is evaluated by a human taste panel, but these sensory test results can be affected by subjective factors such as emotion, exhaustion and physiological condition. The E-nose is a fast, reliable and non-destructive technology that can be made easy to use and cost-efficient, and many successful applications of E-nose technology have been reported in numerous fields [2][3][4]. For example, E-noses have been used to evaluate the quality of modified atmosphere packaged poultry meat [5], spoiled beef [6], fish [7], milk [8] and apricot cultivars [9]. An E-nose was used to discriminate four types of red wine that were made from the same variety of grapes and came from the same cellar [10]. Successful discrimination by an E-nose of different Spanish wines made from different grapes has also been reported [11].
There is some literature on the application of E-noses to the discrimination of different types of tea [12,13]. In these studies, E-noses were used to classify tea samples that had undergone different processing methods, and metal oxide semiconductor (MOS) sensors showed very good sensitivity to the tea aroma and satisfactory stability over time. The discriminant analysis methods used were also discussed. Recently, Bhattacharyya et al. monitored the volatile components of black tea during the fermentation process and detected the optimum fermentation time on the basis of peaks in the sensor outputs [14]. In another of their papers, a new E-nose-based approach for monitoring tea aroma during fermentation was proposed. Two methods, namely the 2-Norm method (2NM) and the Mahalanobis distance method (MDM), were tested, and the results were correlated with those of colorimetric tests and human expert evaluation [15].
In E-nose research, datasets are usually analyzed by pattern recognition methods, but little detailed information is available on the pretreatment of the data obtained by the E-nose. In this paper, in order to decrease the data dimensionality and optimize the feature vector, principal component analysis (PCA) was employed for data pretreatment. The values of the first five principal components were extracted and used as the input of the LDA and BP neural network studies to examine the applicability of an E-nose for assessing the storage time of tea.

Electronic Nose and Data Acquisition
The experimental instrument was a portable electronic nose (E-nose, PEN2) provided by WNA Airsense Analysentechnik GmbH (Schwerin, Germany). The device was equipped with 10 different metal oxide sensors positioned in a small chamber. The E-nose system consisted of a sampling apparatus, a detector unit containing the array of sensors, and pattern recognition software for data recording and analysis. The sensors used and their main attributes were described in our previous reports [4,16,17,18].
During the measurement process, the headspace gas was pumped into the sensor chamber at a constant rate of 100 mL/min via Teflon tubing connected to a needle. When the gas accumulated in the headspace of the vials was pumped into the sensor chamber, the conductance of each sensor changed. The sensor response was expressed as the ratio of conductance G/G0, where G and G0 are the conductivities of the sensor when exposed to the sample gas and to the zero gas, respectively. The measurement procedure was controlled by a computer program. The measurement phase lasted for 60 s, which was long enough for the sensors to reach stable values. Data were collected at 1 s intervals and recorded by a computer. When the measurement was completed, the acquired data were stored for later use, and a cleaning phase of 70 s began in order to clean the circuit and return the sensors to their baseline values.
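The G/G0 response described above can be illustrated with a minimal sketch. The function names, the synthetic trace and the choice of averaging the last few samples as a "stable value" are assumptions made for illustration only; the source does not state how features were taken from the 60 s traces.

```python
def response_ratio(conductance, baseline):
    """Return G/G0 for every sample of one sensor trace."""
    if baseline <= 0:
        raise ValueError("baseline conductance G0 must be positive")
    return [g / baseline for g in conductance]


def stable_value(ratios, window=5):
    """Mean of the last `window` samples (assumed steady-state estimate)."""
    tail = ratios[-window:]
    return sum(tail) / len(tail)


# Hypothetical 60 s trace sampled once per second (arbitrary units):
# the conductance ramps up for 20 s and then stays constant.
trace = [1.0 + 0.05 * min(t, 20) for t in range(60)]
ratios = response_ratio(trace, baseline=1.0)
steady = stable_value(ratios)
```

With a 1 s sampling interval and a 60 s measurement phase, each sensor contributes 60 ratio values per sample in this sketch.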

Experimental Samples and Storage of the Tea
Longjing green tea (AAA grade, ¥ 2,400/kg in international trade) was produced and obtained on 1 July 2005 from the Tea Academy of Zhejiang University. The tea samples were sealed in small tin bags, each of which contained 5 g of tea. Two hundred and twenty-five tea packages were prepared, and 45 packages were measured in each experiment. The first 45 samples were measured on 1 July 2005; the others were kept in cold storage in a refrigerator at 4 °C. The second 45 packages were taken out of the refrigerator and measured after two months (1 September 2005), the third measurement was carried out after four months (1 November 2005), the fourth after six months (1 January 2006) and the fifth after eight months (1 March 2006).

Tea Leaves Testing Sample Preparation
During this experiment, 45 packages of tea samples were taken from the refrigerator and placed into 45 vials (500 mL), which were tightly sealed for 45 min. All measurements were carried out at a constant temperature of 25 ± 1 °C. Headspace gas was pumped into the sensor chamber of the E-nose.

Tea Beverage and the Tea Residue Testing Sample Preparation
After the tea leaf samples had been measured, they were brewed according to the criteria of the sensory panel assessment (SB/T 10157-93) [19]. Five grams of tea leaves were brewed with 250 mL of boiled table water (the ratio of tea leaves to water was 1:50), and the tea beverage was filtered after 5 min. The tea beverages and tea residues were separated, sealed in 500 mL vials and kept for 45 min. The tea beverages were cooled to a temperature of 25 ± 1 °C during the headspace generation time. In order to protect the sensors, silica gel was used to absorb the vapor in the vials.

Data Analysis
PCA [3] is a projection method that allows an easy visualization of all the information contained in a dataset. In addition, PCA helps to find out in what respect a sample differs from others and which variables contribute the most to this difference. The data were standardized before carrying out the PCA in order to avoid the effect of the different scales of the variables. The values of the first five principal components were extracted and then used as the input for the discrimination analysis. The choice of five principal components is justified by the results (Table 1).
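As a rough illustration of this pretreatment, the sketch below standardizes each variable and projects the data onto the first five principal components using NumPy. The function name `pca_scores`, the random placeholder data and the 225 x 80 array shape are assumptions for the example; this is not the software actually used in the study.

```python
import numpy as np


def pca_scores(X, n_components=5):
    """Standardize the columns, then project onto the leading components."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant columns stay at zero after centering
    Z = (X - X.mean(axis=0)) / std
    # SVD of the standardized data; the rows of Vt are the principal axes.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Eigenvalues of the correlation matrix, in descending order.
    eigenvalues = s**2 / (Z.shape[0] - 1)
    scores = Z @ Vt[:n_components].T
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    return scores, eigenvalues[:n_components], cumulative[:n_components]


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(225, 80))  # placeholder data, not E-nose signals
scores, eigvals, cum_rate = pca_scores(X_demo)
```

Here `scores` holds the five values per sample that would be passed on to a classifier, and `cum_rate` gives the accumulated share of variance retained by the first one to five components.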
LDA is one of the most used classification procedures, which has proven successful in many applications. The method maximizes the variance between categories and minimizes the variance within categories.
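One common way to write this objective is the Fisher criterion, shown here as a sketch; the source does not state which formulation was used, and the symbols $S_B$, $S_W$, $\mu_k$, $\mu$, $n_k$ and $W$ are introduced only for this illustration.

```latex
J(W) = \frac{\left| W^{\mathsf{T}} S_B W \right|}{\left| W^{\mathsf{T}} S_W W \right|},
\qquad
S_B = \sum_{k=1}^{K} n_k (\mu_k - \mu)(\mu_k - \mu)^{\mathsf{T}},
\qquad
S_W = \sum_{k=1}^{K} \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^{\mathsf{T}}
```

Here $C_k$ is the set of samples in category $k$ (for this study, one category per storage time, so $K = 5$), $n_k$ is its size, $\mu_k$ is its mean and $\mu$ is the overall mean. Maximizing $J(W)$ favors projections with large between-category scatter $S_B$ relative to within-category scatter $S_W$.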
The back-propagation neural network (BPNN) [3] is one of the more promising technologies in computing. The network processes the inputs and compares the resulting outputs against the desired outputs. Errors are then propagated back through the system, causing the system to adjust the weights that control the network. This process is repeated many times: during the training of a network, the same set of data is processed again and again as the connection weights are refined.
Figure 1 shows the typical responses of the E-nose for the tea leaves, tea beverages and tea residues. Each curve, marked R(1)…R(10), represents the transient of a different sensor, that is, the conductivity of that sensor against time after the electro-valve action, when the volatiles reached the measurement chamber. In the initial period, the conductivity of each sensor increases rapidly and reaches its maximum. The curve then descends rapidly and finally stabilizes after about 50 s. The response curve of each sensor differed among the tea leaves, tea beverages and tea residues.

Principal Component Analysis (PCA)
The original feature vector of the E-nose response for the tea, $X = (x_1, x_2, \ldots, x_{80})$, contains too much information, some of which is certainly of little significance for the subsequent discrimination analysis. In order to improve the calculation efficiency, the dimensionality of the original feature vectors must be decreased: the information of little significance should be neglected and the useful information preserved. To achieve this goal, PCA was carried out on $X = (x_1, x_2, \ldots, x_{80})$. Table 1 shows the eigenvalues and the accumulated contribution rate of the first five principal components of the tea leaf, tea beverage and tea residue samples. In this way, the dimensionality of the original feature vectors was decreased from 80 to 5. The results in Table 1 show that the accumulated contribution rate of the first five principal components was more than 80% in all three cases, so most of the information in the original feature vectors was conserved. The five principal components were therefore used as the inputs for the discrimination analysis.
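For reference, the accumulated contribution rate is conventionally defined from the PCA eigenvalues as below. The symbols $\lambda_i$, $\eta_k$ and $p$ are notation assumed for this sketch, and the source does not print the definition.

```latex
\eta_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0
```

With $p = 80$ original features and $k = 5$ retained components, the statement that the rate exceeds 80% corresponds to $\eta_5 > 0.8$.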

Linear Discrimination Analysis (LDA)
The values of the first five principal components extracted by PCA were analyzed by LDA. All the samples were separated into two groups: a training set containing 150 samples and a test set containing 75 samples. The analytical results of LDA based on tea leaves, tea beverages and tea residues are shown in Figures 2-4 (solid marks: training samples; hollow dots: test samples).
In Figure 2, good analytical results are obtained for both the training set and the test set, and the different storage times are clearly discriminated. This can be explained by the larger differences in the volatile components of tea leaves with different storage times, which change the responses of the sensors.
As shown in Figures 3-4, the analytical results based on tea beverages or tea residues are worse than those based on tea leaves. In Figure 3, the tea beverage samples for fresh tea are clearly separated, but the other samples overlap to a certain extent. In Figure 4, the analytical results based on the tea residue samples are similar to those based on the tea beverage samples.
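The train/test procedure can be sketched as follows with a small NumPy implementation of a linear discriminant classifier. The synthetic data, the helper names `fit_lda` and `predict_lda`, and the stratified 30/15 split per storage time are assumptions for illustration; the source reports only the 150/75 totals for the LDA step.

```python
import numpy as np


def fit_lda(X, y):
    """Estimate class means, pooled within-class covariance and priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    pooled = np.zeros((X.shape[1], X.shape[1]))
    for c, m in zip(classes, means):
        d = X[y == c] - m
        pooled += d.T @ d
    pooled /= len(X) - len(classes)
    priors = np.array([(y == c).mean() for c in classes])
    return classes, means, np.linalg.inv(pooled), priors


def predict_lda(model, X):
    """Assign each row to the class with the largest linear score."""
    classes, means, inv_cov, priors = model
    # Score: x^T S^-1 m_k - 0.5 m_k^T S^-1 m_k + log(prior_k)
    linear = X @ inv_cov @ means.T
    offset = -0.5 * np.sum(means @ inv_cov * means, axis=1) + np.log(priors)
    return classes[np.argmax(linear + offset, axis=1)]


rng = np.random.default_rng(1)
days = np.array([0, 60, 120, 180, 240])
# Placeholder PCA scores: 45 samples per storage time, 5 features each.
X = np.vstack([rng.normal(loc=k, scale=0.3, size=(45, 5)) for k in range(5)])
y = np.repeat(days, 45)

# Assumed stratified split: 30 training and 15 test samples per class.
train_idx, test_idx = [], []
for d in days:
    idx = rng.permutation(np.flatnonzero(y == d))
    train_idx.extend(idx[:30])
    test_idx.extend(idx[30:])
train_idx, test_idx = np.array(train_idx), np.array(test_idx)

model = fit_lda(X[train_idx], y[train_idx])
accuracy = float(np.mean(predict_lda(model, X[test_idx]) == y[test_idx]))
```

The accuracy on this artificial data says nothing about the tea measurements; the sketch only shows how held-out samples are classified with parameters estimated from the training set alone.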

Back-Propagation Neural Network (BPNN)
To further optimize the discrimination, a BPNN study was performed. The five principal components acted as the inputs of the BPNN, and the output of the BPNN was the storage time (0, 60, 120, 180 and 240 days) of the tea samples. The network topology was 5-10-5, with net.trainParam.lr = 0.08 and net.trainParam.lr_inc = 1.08. The software used was Matlab.
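A 5-10-5 network of this kind can be sketched in NumPy as below. Several details are assumptions not given in the source: sigmoid activations, one-hot targets (one output unit per storage time), a mean-squared-error loss, argmax decoding, and plain batch gradient descent with a fixed learning rate of 0.08. The adaptive learning-rate increase implied by `lr_inc` is not reproduced, and the input data are synthetic.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bpnn(X, T, hidden=10, lr=0.08, epochs=2000, seed=0):
    """Batch gradient descent for a one-hidden-layer sigmoid network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    losses = []
    for _ in range(epochs):
        # Forward pass through hidden and output layers.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        err = Y - T
        losses.append(float(np.mean(err**2)))
        # Backward pass: MSE gradient (up to a constant factor)
        # propagated from the output layer to the hidden layer.
        dY = err * Y * (1 - Y)
        dH = (dY @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dY / len(X)
        b2 -= lr * dY.mean(axis=0)
        W1 -= lr * X.T @ dH / len(X)
        b1 -= lr * dH.mean(axis=0)
    return (W1, b1, W2, b2), losses


def predict_days(params, X, days):
    """Decode the most active output unit as a storage time."""
    W1, b1, W2, b2 = params
    Y = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return days[np.argmax(Y, axis=1)]


rng = np.random.default_rng(0)
days = np.array([0, 60, 120, 180, 240])
labels = np.repeat(np.arange(5), 30)
# Placeholder inputs: 150 training samples with 5 features each.
X = rng.normal(loc=(labels - 2)[:, None], scale=0.3, size=(150, 5))
T = np.eye(5)[labels]
params, losses = train_bpnn(X, T)
pred = predict_days(params, X, days)
```

The `losses` list makes the repeated weight refinement visible: the same 150 samples are presented in every epoch and the training error falls as the weights are adjusted.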
All the samples were again randomly separated into two groups. One group containing 150 samples (30 samples for each storage time) was used to train the BPNN, and another group containing 75 samples (15 samples for each storage time) was used for testing. The test errors of the BPNN were recorded, and the mean error $\bar{X}_i$ for each storage time was calculated from $x_i$, the true storage time ($x_i$ = 0, 60, 120, 180, 240), and $x_{ij}$, the predicted storage time of the $j$-th test sample. The computed results are shown in Table 2. As shown in Table 2, $\bar{X}_i$ is smaller for the tea leaves except for "0 day", and is slightly higher when based on the tea beverages and tea residues. The BPNN results showed that the best prediction of the tea storage time was obtained from the tea leaves, using the eigenvalues of the E-nose signals extracted by PCA.
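One plausible reading of this error measure is the mean absolute deviation of the predictions from the true storage time over the test samples of that storage time, i.e. the sum of |x_ij - x_i| over j divided by the number of test samples. The absolute value and the function name below are assumptions, since the defining equation is not given in the text, and the predicted values are invented for the example.

```python
def mean_abs_error(true_day, predicted_days):
    """Mean absolute deviation of predictions from one true storage time."""
    total = sum(abs(p - true_day) for p in predicted_days)
    return total / len(predicted_days)


# Hypothetical predictions for 15 test samples stored for 60 days.
predicted = [60, 58, 63, 60, 55, 61, 60, 66, 59, 60, 57, 60, 64, 60, 62]
error_60 = mean_abs_error(60, predicted)  # 27 / 15 = 1.8 days
```

Under this reading, a mean error such as 2.73 days would correspond to absolute deviations summing to about 41 days over 15 test samples.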

Conclusions
First, this study demonstrates the feasibility of using an E-nose as an analytical tool for the recognition of volatile components emitted by tea of different ages. Second, it is feasible to extract feature values by PCA and to use the values of the first five principal components as the input of the discrimination analysis. This method decreases the data dimensionality and optimizes the feature vector. The LDA results show that the classification of the tea storage time was best when based on the tea leaves, and the BPNN results show that the prediction error was smaller for tea leaf samples than for tea beverage and tea residue samples.