Mining Feature of Data Fusion in the Classification of Beer Flavor Information Using E-Tongue and E-Nose

Multi-sensor data fusion can provide more comprehensive and more accurate analysis results. However, it also brings some redundant information, which is an important issue with respect to finding a feature-mining method for intuitive and efficient analysis. This paper demonstrates a feature-mining method based on variable accumulation to find the best expression form and variables’ behavior affecting beer flavor. First, e-tongue and e-nose were used to gather the taste and olfactory information of beer, respectively. Second, principal component analysis (PCA), genetic algorithm-partial least squares (GA-PLS), and variable importance of projection (VIP) scores were applied to select feature variables of the original fusion set. Finally, the classification models based on support vector machine (SVM), random forests (RF), and extreme learning machine (ELM) were established to evaluate the efficiency of the feature-mining method. The result shows that the feature-mining method based on variable accumulation obtains the main feature affecting beer flavor information, and the best classification performance for the SVM, RF, and ELM models with 96.67%, 94.44%, and 98.33% prediction accuracy, respectively.


Introduction
The consumption of beer, as a beverage, ranks third in the world after water and tea. It is rich in various amino acids, vitamins, and other nutrients needed by the human body [1,2], which is euphemistically known as 'liquid bread'. Barley germination is the main raw material for beer brewing, which makes beer a low-alcohol and high-nutrition drink. Additionally, it promotes digestion, spleen activity, appetite, and other functions [3][4][5].
Flavor information is one of the reference factors that reflects the beer's features, which consists of taste and olfactory information. Due to different manufacturing processes, both the taste and the smell of different beers are different. According to consumer preference, they choose beers with different flavors. Therefore, accurately and efficiently identifying different beers, and finding important features, are particularly significant. Meanwhile, it is also meaningful for quality control, storage, and authenticity recognition. A crucial observation was obtained in the psychology literature that the intensity of the senses could be overlapped, and people usually mistake volatile substances as 'taste' [6]. When we cannot smell, it is difficult to distinguish apple and potato, red wine and coffee. The food odor can stimulate people to salivate, which improves our sensation. When drinking fruit juice with the nose squeezed, sweet and sour can be felt by the tongue. Setting the nose free after drinking, the fruit juice flavor information will appear, so the sensory experience of food must be fully dependent on both the tongue and nose [7]. The conventional physical and chemical analysis methods cannot reflect the flavor characteristics of beer [8][9][10]. The most used method is sensory evaluation [11], but this method is quite subjective since the evaluation result changes with the physical condition and evaluation [11], but this method is quite subjective since the evaluation result changes with the physical condition and environment. It is time-consuming and has low efficiency. As objective and effective intelligent bionic instruments that are easy to operate, offer high precision, and are timesaving, among other advantages, e-tongue [12][13][14] and e-nose [15][16][17] are gradually replacing the traditional detection methods.
The e-tongue and e-nose could be applied to analyze the beer [18,19]. However, the flavor features of beer are complicated due to its composition and concentration. Therefore, the e-tongue and e-nose fusion system has a significant advantage of obtaining comprehensive information of taste and olfactory characteristics. The combined information based on the instruments is called data fusion [20]. The data-level (low-level) fusion combines the original sensing information of multiple detection instruments to obtain new data. The feature-level (medium-level) fusion combines features extracted from the original sensing information of multiple detection instruments. The decision-level (high-level) fusion combines sensor information after each sensor has made a preliminary determination, then fuses that information to obtain a final decision. Multi-sensor data fusion based on e-tongue and e-nose has been used widely, for instance, in the blending ratio prediction of old frying oil [21], classification of different honey and rice samples [22,23], nondestructive detection of fish freshness [24], and the evaluation of tea and strawberry juice flavor [25,26]. The previous studies showed that multi-sensor data fusion in the classification of food and quality assessment were much closer to the human perception mode and improved the analysis results. However, it also brings some irrelevant information, and even noisy information. The studies cannot give an effective feature selection method, which could lead to an unfavorable final classification and prediction, and increase the complexity of the model prediction. More importantly, we should adopt fewer features to reduce the sample detection difficulty and detection time, and the best fusion and identification methods are applied to improve the detection efficiency and classification accuracy rate of samples in real projects. For the feature method research, experts took advantage of GA-PLS to reduce the number of variables of electrochemical signals [27], extracted the features from sensor response signals in the time and frequency domains [28], optimized the number of channel inputs of the olfactory nervous system bionic model based on PCA [29], selected the variable characteristics of the multi-sensor based on the analysis of variance (ANOVA) [30], and applied multidimensional projection techniques (interactive document map) to analyze the capacitance data [31]. These studies were significant for reducing the feature dimension and achieving high-precision prediction of the sample. However, studies lacked a method to assess the importance of variables, and little information on the variables' behavior and correlation were offered.
This study provides a feature-mining method to obtain the best form of expression affecting beer flavor information. It improves the accuracy rate of identification and reduces the complexity of model prediction. Under the practical application background, it is significant that models are fast and accurate. Here, we select the SVM, RF, and ELM as the appraisal models, which have good generalizability and fast running time. We observe the classification performance of evaluation models to find the main feature and the variable's behavior, and contrast the analysis results of the different evaluation models to verify the validity and universality of the method. Figure 1 shows the technical route for this paper.

Beer
Five different beers were used in this study, and their alcohol degree, original wort concentration, and raw materials were obtained from the beer bottle labels. Table 1 lists all of these. The SA-402B e-tongue, developed by the Japan Insent Company, was used to gather beer taste information. The instrument includes a sensor array, an automatic detection system, a data acquisition system, and data analysis software. The sensor array consists of five taste sensors, and each sensor is composed of a unique artificial lipid-based membrane. Two Ag/AgCl electrodes containing an inner solution containing 3.33 M KCl and saturated AgCl were used for the reference electrode. The sensor AAE was applied to detect umami substances. The sensor CT0 was applied to detect salty substances. The sensor CA0 was applied to detect sour substances. The sensor C00 was applied to detect bitter substances. The sensor AE1 was applied to detect astringent substances. The positive sensor array consisted of C00, AE1, and a reference electrode. The negative sensor array consisted of CT0, CA0, AAE, and a reference electrode. Figure 2 shows the SA-402B e-tongue system.

Beer
Five different beers were used in this study, and their alcohol degree, original wort concentration, and raw materials were obtained from the beer bottle labels. Table 1 lists all of these. The SA-402B e-tongue, developed by the Japan Insent Company, was used to gather beer taste information. The instrument includes a sensor array, an automatic detection system, a data acquisition system, and data analysis software. The sensor array consists of five taste sensors, and each sensor is composed of a unique artificial lipid-based membrane. Two Ag/AgCl electrodes containing an inner solution containing 3.33 M KCl and saturated AgCl were used for the reference electrode. The sensor AAE was applied to detect umami substances. The sensor CT0 was applied to detect salty substances. The sensor CA0 was applied to detect sour substances. The sensor C00 was applied to detect bitter substances. The sensor AE1 was applied to detect astringent substances. The positive sensor array consisted of C00, AE1, and a reference electrode. The negative sensor array consisted of CT0, CA0, AAE, and a reference electrode. Figure 2 shows the SA-402B e-tongue system. The sample solution, reference solution, positive cleaning solution, and negative cleaning solution were put into the reagent tank. The automatic detection device manipulated the robot arm to collect the sample's taste information by setting the system parameter. When the taste substances were absorbed by the unique artificial lipid-based membrane, the potential difference between the working electrode and the reference electrode was measured. Forty milliliter beer samples were placed into the clean measuring cups. Before the test began, the sensor was cleaned in the positive and negative cleaning solution for 90 s, after which it was cleaned in the reference solution for 120 s, and then repeated in another reference solution. After the balance was reached in the reference The sample solution, reference solution, positive cleaning solution, and negative cleaning solution were put into the reagent tank. The automatic detection device manipulated the robot arm to collect the sample's taste information by setting the system parameter. When the taste substances were absorbed by the unique artificial lipid-based membrane, the potential difference between the working electrode and the reference electrode was measured. Forty milliliter beer samples were placed into the clean measuring cups. Before the test began, the sensor was cleaned in the positive and negative cleaning solution for 90 s, after which it was cleaned in the reference solution for 120 s, and then repeated in another reference solution. After the balance was reached in the reference solution, the test was started. The sensor tested each sample for 30 s, then the sensor was cleaned sensor twice, quickly, and returned to the reference solution to measure the aftertaste value (cpa), the measurement was completed once. After each measurement, the sensors were cleaned automatically. Six samples of each beer were prepared for measuring three times. Finally, a total of 90 samples were obtained. The experimental temperature was 20 ± 0.5 • C, and the relative humidity was 65 ± 2% RH. The intensity value of each sensor at the 30th second was extracted and analyzed in this study.

E-Nose Data Acquisition
The PEN3 e-nose, developed by the Airsense Analytics Inc. (Schwerin, Germany), was used to gather beer olfactory information. The instrument includes a gas collection device, a gas detection unit, and an air purification device. The gas detection unit includes a sensor array, and a pattern recognition analysis and processing system. The sensor array contains 10 metal oxide gas sensors, which can achieve the detection of olfactory cross-sensitive information. The components to be detected by sensors were listed as below: aromatic (W1C), hydrocarbon (W5S), aromatic (W3C), hydrogen (W6S), arom-aliph (W5C), broad-methane (W1S), sulfur-organic (W1W), broad-alcohol (W2S), sulfur-chlorine (W2W), and methane-aliphatic (W3S). Figure 3 shows the PEN3 e-nose system.
Six samples of each beer were prepared for measuring three times. Finally, a total of 90 samples were obtained. The experimental temperature was 20 ± 0.5 °C, and the relative humidity was 65 ± 2% RH. The intensity value of each sensor at the 30th second was extracted and analyzed in this study.

E-Nose Data Acquisition
The PEN3 e-nose, developed by the Airsense Analytics Inc. (Schwerin, Germany), was used to gather beer olfactory information. The instrument includes a gas collection device, a gas detection unit, and an air purification device. The gas detection unit includes a sensor array, and a pattern recognition analysis and processing system. The sensor array contains 10 metal oxide gas sensors, which can achieve the detection of olfactory cross-sensitive information. The components to be detected by sensors were listed as below: aromatic (W1C), hydrocarbon (W5S), aromatic (W3C), hydrogen (W6S), arom-aliph (W5C), broad-methane (W1S), sulfur-organic (W1W), broad-alcohol (W2S), sulfur-chlorine (W2W), and methane-aliphatic (W3S). Figure 3 shows the PEN3 e-nose system.
Five milliliters of a beer sample was put into a cork-tightened 50-mL sampling chamber for 10 min to ensure sufficient volatility. Before the test began, the gas chamber was cleaned with a gas flow to normalize the sensor signal, which was filtered by active charcoal at a flow rate of 300 mL/min for 60 s. The detection time was 80 s at the gas flow speed of 300 mL/min, so that the sensor reached a stable value. The sensor response value was defined as G/G0 (G0/G), where G is the conductivity of the sensor when the sample to be tested entered the sensor gas detection unit and G0 is the conductivity of the sensor when the pure gas entered the sensor gas detection unit. Eighteen samples of each beer were prepared for measurement. Finally, a total of 90 samples were obtained. The experimental temperature was 20 ± 0.5 °C, and the relative humidity was 65 ± 2% RH. The intensity value of each sensor at the 60th second was extracted and analyzed in this study.

Variable Selection
Principal component analysis (PCA) is a multivariate statistical analysis method which can transform the data into a new coordinate system, converting the multivariate information into several synthetic variables [32]. PCA preserves the useful information of the original variable, reducing the dimension of the multidimensional dataset, and extracts the principal component. The number of principal components is calculated according to the maximum variance principle. We determined the number of principal components according to the cumulative contribution rate and practical Five milliliters of a beer sample was put into a cork-tightened 50-mL sampling chamber for 10 min to ensure sufficient volatility. Before the test began, the gas chamber was cleaned with a gas flow to normalize the sensor signal, which was filtered by active charcoal at a flow rate of 300 mL/min for 60 s. The detection time was 80 s at the gas flow speed of 300 mL/min, so that the sensor reached a stable value. The sensor response value was defined as G/G0 (G0/G), where G is the conductivity of the sensor when the sample to be tested entered the sensor gas detection unit and G0 is the conductivity of the sensor when the pure gas entered the sensor gas detection unit. Eighteen samples of each beer were prepared for measurement. Finally, a total of 90 samples were obtained. The experimental temperature was 20 ± 0.5 • C, and the relative humidity was 65 ± 2% RH. The intensity value of each sensor at the 60th second was extracted and analyzed in this study.

Variable Selection
Principal component analysis (PCA) is a multivariate statistical analysis method which can transform the data into a new coordinate system, converting the multivariate information into several synthetic variables [32]. PCA preserves the useful information of the original variable, reducing the dimension of the multidimensional dataset, and extracts the principal component. The number of principal components is calculated according to the maximum variance principle. We determined the number of principal components according to the cumulative contribution rate and practical requirements. In this work, in order to obtain as much information as possible from the original fusion set, we made sure that the principal component with a cumulative variance contribution was 99%.
The feature variables can be screened by genetic algorithm-partial least squares (GA-PLS) to remove redundant variables for constructing the classification model [33,34]. In the process of variable selection, a randomization test is used to determine whether it can be applied; usually, the randomization test value is less than 5. As the number of variables increases, the cross-validated exceptions variance (CV%) value gradually increases to reach a maximum, and finally maintains a relatively stable state. Meanwhile, the root mean square error of cross-validation (RMSECV) gradually decreases to reach a minimum, and finally maintains a relatively stable state. In the calculation process, the chromosome corresponds to the highest CV%, and smallest RMSECV is the best optimal variable subset.
In the PLS, the explanatory ability of the independent variable to the dependent variable is measured by the variable importance of projection (VIP) scores [35]. The marginal contribution of the independent variable to the principal component is called VIP. The VIP definition is based on the fact that the explanatory ability of the independent variable to the dependent variable is passed through t, and if the explanatory ability of t to the dependent variable is strong, and the independent variable plays a very important role for t, we think that the explanatory ability of the independent variable to the dependent variable will be large. In this study, the variable importance of the e-tongue and e-nose fusion set is sorted based on the VIP scores.

Support Vector Machines (SVM)
SVM was first proposed by Cortes and Vapnik [36], and is a supervised learning model for classification and regression. The main idea is to establish a classification hyperplane as a decision plane. The SVM uses the kernel function to map the data to the high-dimensional space, making it as linear as possible. The kernel functions include linear kernel, polynomial kernel, radial basis kernel (RBF), Fourier kernel, spine kernel, and sigmoid nucleus in SVM. Compared with the kernel function and previous studies, the RBF kernel function gave an excellent classification performance [37][38][39]. Whether the sample is small or large, high dimension or low dimension, the RBF kernel function is applicable. Therefore, this paper used the RBF as the SVM classification kernel function.
The SVM algorithm proceeds as follows: Set the detected data vector to be N-dimensional, and then the L sets can be represented as The hyperplane constructed as: where is the weight coefficient of the decision plane, ϕ(x) is a nonlinear mapping function, and b is the domain value for the category division. In order to minimize the structural risk, the optimal classification plane satisfies the condition as: Introducing the nonnegative slack variable ξ i , the classification error is allowed within a certain range. Therefore, the optimization problem is translated into: where c is the penalty factor to control the complexity and approximation error of the model, and determine the generalizability of the SVM. Here, introducing the Lagrange multiplier algorithm, the optimization problem is transformed into dual form: In this paper, the RBF kernel function is introduced: where g is the kernel function parameter, which is related to the input space range or width. The larger the sample input space range is, the larger the value is. In contrast, the smaller the sample input space range is, the smaller the value is. The above optimization problem is translated into: Therefore, the minimization problem depends on the parameters c and g, and the correct and effective selection of parameters would show a good classification performance for SVM. Thus, GA was combined with SVM to optimize the penalty factor c and the kernel function parameter g. The parameters of the GA are initialized as follows: maximum generation was 100, population was 20, the search range of c was 0 to 100, and that of g was 0 to 1000.

Random Forests (RF)
RF is a nonlinear classification and regression algorithm, which was first proposed by Tin Kam in 1995. In 2001, Breiman conducted a deeper research [40]. The method combines bootstrap aggregating and random subspace successfully. The essence of RF is a classifier that contains a number of decision trees that are not associated with each other. When the data is input into a random forest, the classification result is recorded by each decision tree. Finally, the category of data is voted by decision trees. The RF shows good efficiency in practical applications, such as image processing, environmental monitoring, and medical diagnosis [41][42][43].
The RF algorithm proceeds as follows: (1) Using bootstrap sampling to generate T training sets S 1 , S 2 , · · · , S T randomly; (2) Each training set is used to generate the decision tree C 1 , C 2 , · · · , C T . The value of the split property set for each tree is mtry. The mtry value is the square root of the number of input variables. In general, the value of mtry remains stable throughout the forest development process; (3) Each tree has a complete development without taking pruning; (4) For testing set X, each decision tree is used to test and obtain the category C 1 (X), C 2 (X), · · · , C T (X); and (5) The category of the testing set is voted by decision trees.

Extreme Learning Machine (ELM)
Extreme learning machine (ELM) is a new algorithm for regression and classification, which was first proposed by Huang of Nanyang Technological University [40]. The essence of ELM is a single hidden layer feed-forward neural network (SLFN). The difference with other SLFN is that ELM randomly generates the connection weights (w) and threshold (b), and without adjustment in the training process. The optimal results can be obtained by adjusting the number of neurons in the hidden layer. Compared with the traditional training methods, this method has the advantages of fast learning speed and good generalization performance. In recent years, ELM has gained attention widely, such as with on-line fault monitoring, price forecasting, and control chart pattern recognition [44][45][46].
The ELM algorithm proceeds as follows: The activation function of neurons in the hidden layer is G. The ELM model can be represented as: where a i = [a 1 , a 2 , · · · , a n ] T is the weight vector of the ith hidden layer node and the input node. β i = [β 1 , β 2 , · · · , β n ] T is the weight vector of the ith hidden layer node and the input node. b i is the threshold of the ith hidden node. N is the number of hidden neurons. Equation (8) can be abbreviated as: where: where H is the output matrix of the hidden layer, and the output weights can be obtained by solving the least squares solution of the linear equations: The least squares solution as: where H + is the Moore-Penrose generalized inverse of the hidden layer output matrix H.

Allocation of Datasets and the Model Prediction Process
The 90 samples of original data were divided into two groups randomly. One group included 72 samples, which was used as a training set to build the model. The remaining 18 samples were used as a testing set to verify the classification performance of the model.
In SVM, in order to avoid over-learning and insufficient learning when searching the best parameters, the fitness function value of the GA was the highest accuracy rate of the training set under five-fold cross-validation, and the best c and g were selected when the highest CVAccuracy was obtained. In order to eliminate the impact of randomness, 10 prediction models were established, and then the average of their accuracy rates was used to describe the classification performance of the SVM.
In RF, the mtry value is the square root of the number of input variables. Therefore, the mtry value of single e-tongue, single e-nose, the dimensionality reduction set by PCA, and the feature screening set by GA-PLS were 3. The mtry value of original fusion set was 4. The 20 subsets based on variable accumulation, the mtry values of 1-3, 4-8, 9-15, and 16-20 were 1, 2, 3, and 4, respectively. Then, we observed the classification performance of RF by changing the number of random forest decision trees. The number of decision trees was taken from 2-100 at intervals of 2. In order to eliminate the impact of randomness, 10 prediction models were established, and then the average of their accuracy rates was used to describe the classification performance of the RF under the current decision tree.
In ELM, when the activation function of neurons in the hidden layer was determined, we changed the number of hidden layer neurons to observe the ELM classification performance. The number of hidden layer neurons was taken from 2-100 at intervals of 2. In order to eliminate the impact of randomness, 10 prediction models were established, and then the average of the accuracy rates was used to describe the classification performance of the ELM under the current neurons.

Pre-Processing
The detection data of the e-tongue and the e-nose contained 10-dimension feature variables, respectively. Data from the two systems were combined to form new data for describing the beer flavor information. A normalization between (−1, +1) was implemented on the original feature set from the different sensors of the e-tongue and e-nose. Figure 4 shows the averaged-value radar plot of the normalized values. According to the sensor response information, it was difficult to identify the different samples, and the relationship for each sensor was extremely complex. It was difficult to find the main features and their contribution to the beer flavor information. Thus, mining the data features and obtaining the variables' behavior are particularly important for distinguishing beer brands correctly. In RF, the mtry value is the square root of the number of input variables. Therefore, the mtry value of single e-tongue, single e-nose, the dimensionality reduction set by PCA, and the feature screening set by GA-PLS were 3. The mtry value of original fusion set was 4. The 20 subsets based on variable accumulation, the mtry values of 1-3, 4-8, 9-15, and 16-20 were 1, 2, 3, and 4, respectively. Then, we observed the classification performance of RF by changing the number of random forest decision trees. The number of decision trees was taken from 2-100 at intervals of 2. In order to eliminate the impact of randomness, 10 prediction models were established, and then the average of their accuracy rates was used to describe the classification performance of the RF under the current decision tree.
In ELM, when the activation function of neurons in the hidden layer was determined, we changed the number of hidden layer neurons to observe the ELM classification performance. The number of hidden layer neurons was taken from 2-100 at intervals of 2. In order to eliminate the impact of randomness, 10 prediction models were established, and then the average of the accuracy rates was used to describe the classification performance of the ELM under the current neurons.

Pre-Processing
The detection data of the e-tongue and the e-nose contained 10-dimension feature variables, respectively. Data from the two systems were combined to form new data for describing the beer flavor information. A normalization between (−1, +1) was implemented on the original feature set from the different sensors of the e-tongue and e-nose. Figure 4 shows the averaged-value radar plot of the normalized values. According to the sensor response information, it was difficult to identify the different samples, and the relationship for each sensor was extremely complex. It was difficult to find the main features and their contribution to the beer flavor information. Thus, mining the data features and obtaining the variables' behavior are particularly important for distinguishing beer brands correctly.

Extraction of Sensor Feature Variables
In order to acquire as much information as possible from the original fusion set, the 10 principal components were extracted by PCA, and the accumulated variance contribution rate was as high as 99.99%.
Before applying the GA-PLS to select variables, a randomization test was required to determine whether it could be applied. Figure 5 shows that randomization test result of the original fusion set. It can be seen that the randomization test value was less than 5, indicating that the GA-PLS was reliable. Figure 6 shows the GA-PLS search process for the best number of variables, it can be seen that the CV% increased rapidly and then gradually slowed down as the number of variables increases. When CV% reached a maximum 82.169%, the number of variables reached 12, and the number of variables continued to increase, the CV% decreased slightly and then stayed in a relatively stable state. On the contrary, in Figure 7, RMSECV decreased rapidly, and then slowly with the increase in the number of variables. When the number of variables reached 12, the RMSECV arrived at its minimum value of 0.5937, and then as the number of variables continued to increase, RMSECV increased slightly and maintained a relatively stable state. Finally, 12 feature variables were extracted from the original fusion set, which were CA0, C00, AE1, AAE, cpa(C00), W1C, W5S, W6S, W1S, W1W, W2S, and W2W.

Extraction of Sensor Feature Variables
In order to acquire as much information as possible from the original fusion set, the 10 principal components were extracted by PCA, and the accumulated variance contribution rate was as high as 99.99%.
Before applying the GA-PLS to select variables, a randomization test was required to determine whether it could be applied. Figure 5 shows that randomization test result of the original fusion set. It can be seen that the randomization test value was less than 5, indicating that the GA-PLS was reliable. Figure 6 shows the GA-PLS search process for the best number of variables, it can be seen that the CV% increased rapidly and then gradually slowed down as the number of variables increases. When CV% reached a maximum 82.169%, the number of variables reached 12, and the number of variables continued to increase, the CV% decreased slightly and then stayed in a relatively stable state. On the contrary, in Figure 7, RMSECV decreased rapidly, and then slowly with the increase in the number of variables. When the number of variables reached 12, the RMSECV arrived at its minimum value of 0.5937, and then as the number of variables continued to increase, RMSECV increased slightly and maintained a relatively stable state. Finally, 12 feature variables were extracted from the original fusion set, which were CA0, C00, AE1, AAE, cpa(C00), W1C, W5S, W6S, W1S, W1W, W2S, and W2W.

Extraction of Sensor Feature Variables
In order to acquire as much information as possible from the original fusion set, the 10 principal components were extracted by PCA, and the accumulated variance contribution rate was as high as 99.99%.
Before applying the GA-PLS to select variables, a randomization test was required to determine whether it could be applied. Figure 5 shows that randomization test result of the original fusion set. It can be seen that the randomization test value was less than 5, indicating that the GA-PLS was reliable. Figure 6 shows the GA-PLS search process for the best number of variables, it can be seen that the CV% increased rapidly and then gradually slowed down as the number of variables increases. When CV% reached a maximum 82.169%, the number of variables reached 12, and the number of variables continued to increase, the CV% decreased slightly and then stayed in a relatively stable state. On the contrary, in Figure 7, RMSECV decreased rapidly, and then slowly with the increase in the number of variables. When the number of variables reached 12, the RMSECV arrived at its minimum value of 0.5937, and then as the number of variables continued to increase, RMSECV increased slightly and maintained a relatively stable state. Finally, 12 feature variables were extracted from the original fusion set, which were CA0, C00, AE1, AAE, cpa(C00), W1C, W5S, W6S, W1S, W1W, W2S, and W2W.     Figure 8 shows the VIP score of the original variables of the e-tongue and e-nose. C00 and AE1 had larger VIP scores than the other variables, and this means that bitter and astringent were significant for beer classification. Cpa(AAE) and cpa(CA0) had smaller VIP scores, and this means that the aftertaste of umami and sour were less important for beer flavor. Thus, we generated the variable subsets which could be used to build the classification models. Each subset was generated with those variables based on the best VIP score. Subset #1 included C00, subset #2 included C00 and AE1, and subset #20 contained all the variables of the e-tongue and the e-nose. We then gradually accumulated the number of variables, and observed the classification results of the models to achieve the purpose of filtering redundant information [47].  Table 2 shows the classification results of single e-tongue, single e-nose, and the original fusion set based on the SVM, RF, and ELM. The classification accuracy rate of the SVM for e-tongue was 83.33%, for e-nose it was 80.56%, and the original fusion set was 88.89%. Figure 9 shows the parameter  Figure 8 shows the VIP score of the original variables of the e-tongue and e-nose. C00 and AE1 had larger VIP scores than the other variables, and this means that bitter and astringent were significant for beer classification. Cpa(AAE) and cpa(CA0) had smaller VIP scores, and this means that the aftertaste of umami and sour were less important for beer flavor. Thus, we generated the variable subsets which could be used to build the classification models. Each subset was generated with those variables based on the best VIP score. Subset #1 included C00, subset #2 included C00 and AE1, and subset #20 contained all the variables of the e-tongue and the e-nose. We then gradually accumulated the number of variables, and observed the classification results of the models to achieve the purpose of filtering redundant information [47].  Figure 8 shows the VIP score of the original variables of the e-tongue and e-nose. C00 and AE1 had larger VIP scores than the other variables, and this means that bitter and astringent were significant for beer classification. Cpa(AAE) and cpa(CA0) had smaller VIP scores, and this means that the aftertaste of umami and sour were less important for beer flavor. Thus, we generated the variable subsets which could be used to build the classification models. Each subset was generated with those variables based on the best VIP score. Subset #1 included C00, subset #2 included C00 and AE1, and subset #20 contained all the variables of the e-tongue and the e-nose. We then gradually accumulated the number of variables, and observed the classification results of the models to achieve the purpose of filtering redundant information [47].  Table 2 shows the classification results of single e-tongue, single e-nose, and the original fusion set based on the SVM, RF, and ELM. The classification accuracy rate of the SVM for e-tongue was 83.33%, for e-nose it was 80.56%, and the original fusion set was 88.89%. Figure 9 shows the parameter  Table 2 shows the classification results of single e-tongue, single e-nose, and the original fusion set based on the SVM, RF, and ELM. The classification accuracy rate of the SVM for e-tongue was 83.33%, for e-nose it was 80.56%, and the original fusion set was 88.89%. Figure 9 shows the parameter optimization process of the SVM based on GA. The best c and g were selected when the highest CVAccuracy was obtained to build the model. The classification accuracy rate of the RF for e-tongue was 83.33%, for e-nose it was 77.78%, and the original fusion set was 88.89%. Figure 10 shows the classification performance of the RF based on the number of decision trees. The classification accuracy rate of the ELM for e-tongue was 82.78%, for e-nose it was 78.89%, and the original fusion set was 88.33%. Figure 11 shows the classification performance of the ELM based on the number of hidden neurons. It can be seen that the classification accuracy rate increased by using data fusion.

Results of the Models
However, we cannot be sure that the 20-dimensional feature variables were the main variables. It is uncertain that each feature variable contributed to the beer overall flavor and affected the classification results. Therefore, the following three feature selection methods were discussed to find the main feature affecting the beer flavor.   Table 3 shows the classification results of the original fusion set, the dimensionality reduction set by PCA, and the feature screening set by GA-PLS. It can be seen that the PCA extracted principal components did not work desirably to improve the classification results. SVM and ELM classification results rise slightly to 91.11% and 89.44%, respectively. The RF classification result was still 88.89%. This may be used as an unsupervised learning method without introducing classified information. It may lose effective authentication information and does not remove redundant information effectively. Figure 12 shows the parameter optimization fitness curve of the SVM, the classification performance of the RF based on the number of decision trees, and the classification performance of the ELM based on the number of hidden neurons for the dimensionality reduction set by PCA. However, compared with the 10 principal components extracted by PCA and the original fusion set, the 12-dimensional feature variables selected by GA-PLS obtained a better classification result. The classification accuracy rate of the SVM was 96.67%, for RF it was 94.44%, and for ELM it was 94.44%. This shows that GA-PLS removed some redundant information of the original fusion set and selected the effective feature variables. Figure 13 shows the parameter optimization fitness curve of the SVM, the classification performance of the RF based on the number of decision trees, and the classification performance of the ELM based on the number of hidden neurons for the feature screening set by GA-PLS. However, this method cannot find the combined behavior among variables, and cannot obtain the importance evaluation of each variables' contribution.    Table 4 shows 20 subsets, which were generated with those variables based on the best VIP score. The classification performance of SVM and ELM in subset #7 and RF in subset #9 could be equal to the original fusion set, respectively, which meant that the original fusion set contained a large amount of redundant information. With the number of variables increased, the classification performance of SVM, RF, and ELM models appeared to have a small range fluctuation, and it showed that these variables had a relevant impact on the contribution of beer flavor features. The highest classification accuracy rate of SVM was up to 96.67% in subset #12, RF was 94.44% in subset #11, and ELM was 98.33% in subset #12, respectively. With the number of variables continuing to increase, the classification accuracy rate of the models decreased and did not exceed the highest value. Figure 14 shows the parameter optimization fitness curve of the SVM for subset #12, the classification performance of the RF based on the number of decision trees for subset #11, and the classification performance of the ELM based on the number of hidden neurons for subset #12. In this way, we not only obtained the best subset to achieve the purpose of reducing redundant variables, but also obtained the variables' behavior by observing the classification tendency of the model when variables were added gradually and the highest classification accuracy rate for beer flavor information.  Table 3 shows the classification results of the original fusion set, the dimensionality reduction set by PCA, and the feature screening set by GA-PLS. It can be seen that the PCA extracted principal components did not work desirably to improve the classification results. SVM and ELM classification results rise slightly to 91.11% and 89.44%, respectively. The RF classification result was still 88.89%. This may be used as an unsupervised learning method without introducing classified information. It may lose effective authentication information and does not remove redundant information effectively. Figure 12 shows the parameter optimization fitness curve of the SVM, the classification performance of the RF based on the number of decision trees, and the classification performance of the ELM based on the number of hidden neurons for the dimensionality reduction set by PCA. However, compared with the 10 principal components extracted by PCA and the original fusion set, the 12-dimensional feature variables selected by GA-PLS obtained a better classification result. The classification accuracy rate of the SVM was 96.67%, for RF it was 94.44%, and for ELM it was 94.44%. This shows that GA-PLS removed some redundant information of the original fusion set and selected the effective feature variables. Figure 13 shows the parameter optimization fitness curve of the SVM, the classification performance of the RF based on the number of decision trees, and the classification performance of the ELM based on the number of hidden neurons for the feature screening set by GA-PLS. However, this method cannot find the combined behavior among variables, and cannot obtain the importance evaluation of each variables' contribution.       Table 4 shows 20 subsets, which were generated with those variables based on the best VIP score. The classification performance of SVM and ELM in subset #7 and RF in subset #9 could be equal to the original fusion set, respectively, which meant that the original fusion set contained a large amount of redundant information. With the number of variables increased, the classification performance of SVM, RF, and ELM models appeared to have a small range fluctuation, and it showed that these variables had a relevant impact on the contribution of beer flavor features. The highest classification accuracy rate of SVM was up to 96.67% in subset #12, RF was 94.44% in subset #11, and ELM was 98.33% in subset #12, respectively. With the number of variables continuing to increase, the classification accuracy rate of the models decreased and did not exceed the highest value. Figure 14 shows the parameter optimization fitness curve of the SVM for subset #12, the classification performance of the RF based on the number of decision trees for subset #11, and the classification performance of the ELM based on the number of hidden neurons for subset #12. In this way, we not only obtained the best subset to achieve the purpose of reducing redundant variables, but also obtained the variables' behavior by observing the classification tendency of the model when variables were added gradually and the highest classification accuracy rate for beer flavor information.

Conclusions
The main conclusions are as follows: (1) Compared with the single e-tongue and single e-nose, the classification accuracy rate of beer flavor information was improved by using multi-sensor data fusion, and the classification accuracy rate of SVM was 88.89%, RF was 88.89%, and ELM was 88.33%; (2) The feature selection method based on PCA did not obtain the best form of beer flavor information. The feature selection method based on GA-PLS improved the beer flavor classification rate and reduced the feature dimension obviously, and SVM showed the best classification performance at 96.67%. However, it did not give the contribution behavior of each variable for the overall information; and (3) By variable accumulation based on the best VIP score, the classification accuracy rate of SVM and ELM in subset #7 was 88.89% and 88.33%, respectively, and the classification accuracy rate of the RF in subset #9 was 88.89%, which meant that the original fusion set contained a lot of redundant information. Finally, ELM showed the best classification performance 98.33% in subset #12. Thus, C00, AE1, W1C, W3S, W3C, W5C, W1W, CA0, cpa(C00), W2S, AAE, and W1S

Conclusions
The main conclusions are as follows: (1) Compared with the single e-tongue and single e-nose, the classification accuracy rate of beer flavor information was improved by using multi-sensor data fusion, and the classification accuracy rate of SVM was 88.89%, RF was 88.89%, and ELM was 88.33%; (2) The feature selection method based on PCA did not obtain the best form of beer flavor information.
The feature selection method based on GA-PLS improved the beer flavor classification rate and reduced the feature dimension obviously, and SVM showed the best classification performance at 96.67%. However, it did not give the contribution behavior of each variable for the overall information; and (3) By variable accumulation based on the best VIP score, the classification accuracy rate of SVM and ELM in subset #7 was 88.89% and 88.33%, respectively, and the classification accuracy rate of the RF in subset #9 was 88.89%, which meant that the original fusion set contained a lot of redundant information. Finally, ELM showed the best classification performance 98.33% in subset #12. Thus, C00, AE1, W1C, W3S, W3C, W5C, W1W, CA0, cpa(C00), W2S, AAE, and W1S were considered as the main features.
Among the contributions of this study, a variable accumulation strategy based on the best VIP score was proposed and applied to beer flavor information identification. It provided a vital method, which used the least characteristic variables and the best fusion method, combined with excellent pattern recognition methods, to identify beer flavor information more efficiently and more accurately. It not only obtained the important evaluation of each variable, but also obtained the correlation behavior by observing the classification tendency of the model. Meanwhile, it also provided a more efficient and accurate method to monitor product quality in the actual process of industrialization.