Feature Extraction and Classification of Citrus Juice by Using an Enhanced L-KSVD on Data Obtained from Electronic Nose

Aroma plays a significant role in the quality of citrus fruits and processed products. Citrus volatiles can be detected and analyzed by an electronic nose (E-nose); in this paper, an E-nose is employed to classify juice that has been stored for different numbers of days. Feature extraction and classification are two important requirements for an E-nose. During training, a classifier can optimize its own parameters to achieve a better classification accuracy, but it cannot influence its input data, which are produced by the feature extraction method, so the classification result is not always ideal. Label consistent KSVD (L-KSVD) is a novel technique that extracts features and classifies the data at the same time, and this joint operation can improve the classification accuracy. We propose an enhanced L-KSVD, called E-LCKSVD, for the E-nose. In E-LCKSVD, we introduce a kernel function into the traditional L-KSVD and present a new technique for initializing its dictionary; finally, the weighted coefficients of the different parts of its objective function are studied, and enhanced quantum-behaved particle swarm optimization (EQPSO) is employed to optimize these coefficients. In the experimental section, we first find that the classification accuracy of KSVD and L-KSVD is improved with the help of the kernel function, which indicates that their ability to deal with nonlinear data is improved. Then, we compare the results of different dictionary initialization techniques and show that our proposed method is better. Finally, we find the optimal values of the weighted coefficients of the objective function of E-LCKSVD that make the E-nose reach a better performance.


Introduction
Citrus juice is famous for its rich nutrition, delicious taste and aroma, and aroma is a significant factor that affects the quality of citrus fruits and processed products. It is important to study the aroma of citrus processed products because the characteristics of aroma can serve as a testing standard in the food field. When tasting a citrus fruit, we use multiple senses, such as smell and taste, to perceive the stimulus. Flavor is also a crucial factor in determining citrus quality and nutritional value. At present, the main methods of testing citrus quality are sensory analysis and precision instrumental analysis.
Sensory analysis is used extensively in traditional processing. Although this test method is extremely consumer-friendly, human assessors are too subjective to apply a consistent standard to the same juice. In addition, varying degrees of fatigue affect the assessors' sensory analysis, which may lead them to inaccurate or even wrong conclusions and make the results subjective and unscientific.
Extracting more features from an E-nose response can help improve the classification accuracy. Traditionally, the maximum value of the steady-state response of each sensor is used to construct the original feature matrix, and this matrix is used for classification, but the accuracy is not ideal. To get a better performance, some classical algorithms are used to reprocess the original feature matrix. Principal component analysis (PCA) finds several comprehensive indicators to represent many original features, so that these indicators reflect the original variables as much as possible while being uncorrelated with each other [20]. Independent component analysis (ICA) [21] is a statistical method that converts observed multidimensional vectors into statistically independent components in order to eliminate the redundancy of the original data. These algorithms help the classifier improve its accuracy. However, PCA and ICA are both linear methods, whose performance is not ideal on nonlinear data. Kernel PCA (KPCA) is a nonlinear method that finds a computationally tractable solution through a simple kernel function, which implicitly constructs a nonlinear mapping from the input space to a high-dimensional space, and then performs PCA in that high-dimensional space [22].
Once the feature matrix is obtained, it is fed into the classifier. The most common classifiers used in an E-nose are linear discriminant analysis (LDA) [23], the radial basis function neural network (RBFNN) [24] and the support vector machine (SVM) [25]. LDA is a linear classifier, and its target during training is to shorten the distance between samples from the same class and enlarge the distance between samples from different classes. With the help of a kernel function, LDA can classify nonlinear data. RBFNN, which uses a radial basis function as its nonlinear mapping function, does not easily fall into a local optimum, but its training takes a relatively long time. SVM is a widely used classifier in E-noses. Its training target is not only the highest classification accuracy on the training data set but also the highest classification accuracy on the test data set, so it has a good generalization ability.
This raises the question of whether unifying feature extraction and classification under a joint objective can yield a better recognition rate. Inspired by this, we turn to the label consistent KSVD (L-KSVD), which combines feature extraction with classification. KSVD is a dictionary learning algorithm that creates a dictionary for sparse representations via a singular value decomposition approach. KSVD is a generalization of the k-means clustering method, and it works by iteratively alternating between sparse coding the input data based on the current dictionary and updating the atoms in the dictionary to better fit the data [26]. The innovations of this paper are as follows: (a) The traditional L-KSVD cannot handle nonlinear data very well, so a kernel function is adopted in this paper to help L-KSVD deal with the nonlinear data obtained by the E-nose. (b) Choosing a proper dictionary is the first and most important step of L-KSVD, and a novel dictionary initialization method is proposed according to the data characteristics of the E-nose. With the help of enhanced quantum-behaved particle swarm optimization (EQPSO), this method generates random binary numbers and uses the recognition rate as the fitness function to decide which sensor responses will be used to initialize the dictionary. (c) The weighted coefficients of the objective function of L-KSVD have a large impact on the classification accuracy, so these coefficients are normalized and then optimized with the help of EQPSO in this paper.
In the rest of this paper, we present the experiment of this paper in Section 2 and introduce the proposed L-KSVD in Section 3. The results and discussion will be presented in Section 4. Finally, the conclusions are drawn in Section 5.

Materials and Methods
In this project, we use cold-press technology to process Valencia oranges of the same maturity to obtain orange juice. In total, 50 kg of oranges are used, and the shape and size of each orange are almost the same. These oranges come from 10 trees of similar age and growth in the same orchard and are picked at the same position on each tree (about 5 kg of oranges from each tree). During the process of obtaining the experimental juice, filtration, sterilization and canning are implemented in turn. Then, the juice is stored in a storage tank with a volume of 30 L; the top of the tank is filled with nitrogen to isolate the juice from air, and the pressure in the tank is 202.6 kPa. At the bottom of the tank, there is a tap which is used for sampling the orange juice. Inside the tank, there is a blender that rotates once per second; before each sampling, we turn it on for 1 min to stir the orange juice thoroughly in case the juice has delaminated or precipitated. Sampling is carried out every 15 days, and more information can be found in Figure 1. After each sampling, the orange juice is put into 4 sterilized and identical glass vessels, each of which contains 500 mL of orange juice.
The analysis of the orange juice aroma components is shown in Table 1, and the schematic diagram of the experimental E-nose system is shown in Figure 2.
Table 1 (excerpt). Aroma components of the orange juice include nonanal (C9H18O); No. 25, octyl acetate (C10H20O2); No. 49, 2,4-diphenyl-4-methyl-1-pentene (C18H20); and No. 50, 2,4-diphenyl-4-methyl-2-pentene (C18H20).
Figure 1. The obtainment of experimental juice and the timeline of an E-nose sampling experiment.
During each sampling experiment of the E-nose, we set the temperature and humidity of the chamber of the E-nose at 25 °C and 40%, respectively. The rules used when we design and construct the sensor array are as follows: (a) the sensor array can respond to all classes of the juice odor; (b) each sensor is most sensitive to its own target odor and can also respond to other odors of the juice; and (c) the sensors should have a high performance-to-cost ratio and be easy to purchase. Table 2 lists the sensors selected by us and their sensitive characteristics. The gas sensor array was located in a Teflon chamber with a volume of 0.24 L, and the flow rate was controlled by a flowmeter and a micro pump with a set value of 0.08 L/min.
The practical E-nose system designed by us is shown in Figure 3. In the rubber stopper, there are two holes with two thin Teflon tubes inserted, one of which is fixed as close as possible to the Valencia orange juice. The output gas containing the VOCs of the Valencia orange juice flows out of the bottle and then into the chamber through a Teflon tube. For each sampling experiment, the following three steps are performed:
Step (a) expose all sensors to clean air for 5 min to obtain the baseline.
Step (b) introduce the target gas into the chamber for 7 min.
Step (c) expose the sensor array to clean air for 5 min again to clean the sensors and restore the baseline.
In each experiment, each cup of orange juice is tested by an E-nose 6 times so that 24 sample data are recorded in total. After 4 sampling experiments, the data set of this project, which contains 96 sample data, is obtained.
The sensors used by the E-nose of this paper are all metal oxide sensors (MOS). When the resistance wire of a MOS contacts different gases, its resistivity changes, which leads to a change of its resistance value. The E-nose hardware circuit designed in this paper transforms the change of resistance value into a change of voltage value, and we record this voltage value. Therefore, the voltage value is taken as the response value of the sensor in this paper. We first use analog circuits to filter and amplify this voltage; then, the response of the sensor array is sampled by a commercially available 14-bit data acquisition system (DAS), and the output signal of the DAS is sent to the computer through a USB data line. Figure 4 illustrates the response of the sensors when Valencia orange juice odor is introduced into the chamber. Each response curve rises clearly from the fifth minute, when the target gas begins to pass over the sensor array, and recovers to the baseline after the twelfth minute, when clean air is conveyed to wash the sensors.
Note: Strictly speaking, the response of a metal oxide sensor (MOS) is its resistance value, whereas the vertical coordinate in Figure 4 is the voltage value produced by the circuit (designed by ourselves) that transforms the change of resistance value into a change of voltage value. The resistance of the sensor differs with the gas environment, and so does the voltage value, so we take the voltage value as the response value of the sensors in this paper.
Sensors 2019, 19, 916
Then, the maximum value of the steady-state response of each sensor is extracted to create the feature matrix of the E-nose. There are 96 samples in this matrix, and the dimension of each sample is 15. We randomly select 2/3 of the samples of each class to establish the training data set, and the rest are used as the test data set. This feature matrix is called the original feature matrix, and it is the input of the proposed KSVD algorithm.
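The feature extraction and the per-class split described above can be sketched as follows. This is only an illustration under stated assumptions: the array layout (samples × time steps × sensors), the synthetic data, and the helper names `max_response_features` and `stratified_split` are hypothetical and not taken from the original experiment.

```python
import numpy as np

def max_response_features(responses):
    """responses: array (n_samples, n_timesteps, n_sensors) of sensor voltages.
    Returns (n_samples, n_sensors): the maximum response of each sensor."""
    return np.asarray(responses).max(axis=1)

def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Randomly pick train_fraction of the samples of each class for training."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        n_train = int(round(train_fraction * idx.size))
        train_idx.extend(idx[:n_train])
    train_idx = np.sort(np.array(train_idx))
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx

# Synthetic stand-in for the recordings: 96 samples, 15 sensors, 4 classes.
rng = np.random.default_rng(0)
responses = rng.random((96, 200, 15))
labels = np.repeat(np.arange(4), 24)

features = max_response_features(responses)      # original feature matrix
train_idx, test_idx = stratified_split(labels)   # 16 train / 8 test per class
```

With 24 samples per class, the split yields 64 training samples and 32 test samples.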

KSVD and L-KSVD
KSVD is a generalization of the k-means clustering method, and it works by iteratively alternating between sparse coding the input data based on the current dictionary and updating the atoms in the dictionary to better fit the data. KSVD is widely used in applications such as image processing, audio processing, biology and document analysis. KSVD learns a shared dictionary by optimizing the following objective function:

$$\langle D, X \rangle = \arg\min_{D,X} \|Y - DX\|_2^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T \qquad (1)$$

where $Y = [y_1, y_2, \cdots, y_N] \in \mathbb{R}^{n \times N}$ are N input signals, each of dimension n; $D = [d_1, d_2, \cdots, d_K] \in \mathbb{R}^{n \times K}$ ($K \gg n$, making D over-complete) is a dictionary with K atoms; $X = [x_1, x_2, \cdots, x_N] \in \mathbb{R}^{K \times N}$ are the N sparse codes of the input signals Y; and T is a constant that limits the number of nonzero elements in each $x_i$ to at most T. Equation (1) is minimized by a two-step iterative algorithm. Firstly, the dictionary is fixed, and the sparse coefficients X are found. This is the sparse coding problem, which can be solved by orthogonal matching pursuit (OMP) [27,28]. Secondly, the sparse coefficient matrix X is fixed, and the dictionary D is updated one atom at a time while all other atoms in D are kept fixed.
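The two-step iteration can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the helper names `omp` and `ksvd`, the random test data, the initialization from randomly chosen signals and the fixed iteration count are all assumptions. The atom update replaces each atom by the leading left singular vector of the error matrix restricted to the samples that use it, which is the standard KSVD update.

```python
import numpy as np

def omp(D, y, T):
    """Greedy orthogonal matching pursuit: code y with at most T atoms of D."""
    residual, support = y.copy(), []
    x = np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(T):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:          # no new atom improves the fit
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def ksvd(Y, K, T, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed initialization: K randomly chosen signals, scaled to unit norm.
    D = Y[:, rng.choice(Y.shape[1], K, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Step 1: sparse coding with the dictionary fixed.
        X = np.column_stack([omp(D, y, T) for y in Y.T])
        # Step 2: update the dictionary one atom at a time.
        for k in range(K):
            omega = np.flatnonzero(X[k])          # samples that use atom k
            if omega.size == 0:
                continue
            X[k, omega] = 0.0
            E_R = Y[:, omega] - D @ X[:, omega]   # restricted error without atom k
            U, S, Vt = np.linalg.svd(E_R, full_matrices=False)
            D[:, k] = U[:, 0]                     # new atom
            X[k, omega] = S[0] * Vt[0]            # new coefficients
    return D, X

rng = np.random.default_rng(1)
Y = rng.standard_normal((15, 64))   # 64 signals of dimension 15 (synthetic)
D, X = ksvd(Y, K=20, T=3)
```

Each column of `X` holds at most `T` nonzero coefficients, and every atom of `D` has unit norm.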
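For each atom $d_k$ and the corresponding $k$th row of the coefficient matrix X, denoted by $x_T^k$, define the group of samples that use this atom as $\omega_k = \{ i \mid 1 \le i \le N,\ x_T^k(i) \neq 0 \}$. The overall representation error matrix $E_k = Y - \sum_{j \neq k} d_j x_T^j$ is computed and restricted to the columns indexed by $\omega_k$, which gives $E_k^R$. Then, the following problem is solved:

$$\langle d_k, x_R^k \rangle = \arg\min_{d_k, x_R^k} \| E_k^R - d_k x_R^k \|_F^2 \qquad (2)$$

where a singular value decomposition (SVD) is performed, $E_k^R = U \Delta V^T$, and the atom and its coefficients are updated as $d_k = U(:, 1)$ and $x_R^k = \Delta(1, 1)\, V(:, 1)^T$.
A label consistent KSVD (L-KSVD) algorithm to learn a discriminative dictionary for sparse coding was presented by Zhuolin Jiang et al. This algorithm jointly learns a single over-complete dictionary and an optimal linear classifier. It yields dictionaries such that feature points with the same class labels have similar sparse codes.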
In order to use the class labels of the training data, label information is associated with each dictionary item (column of the dictionary matrix), which enforces discriminability in the sparse codes during the dictionary learning process [29].
The aim is to leverage the supervised information of the input signals to learn a reconstructive and discriminative dictionary and to include the classification error as a term in the objective function for dictionary learning. Each dictionary item is chosen to represent a subset of the training signals, ideally from a single class, so that each dictionary item $d_k$ can be associated with a particular label. Thus, there is an explicit correspondence between the dictionary items and the labels. L-KSVD then adds a label-consistent regularization term, so that the objective function combines the reconstruction error, the classification error and the label-consistent regularization term and the learned dictionary balances reconstructive and discriminative power.
The performance of the linear classifier depends on the discriminability of the input sparse codes x. To obtain discriminative sparse codes with the learned D, the objective function for dictionary construction is defined as follows:

$$\langle D, W, A, X \rangle = \arg\min_{D,W,A,X} \|Y - DX\|_2^2 + \alpha \|Q - AX\|_2^2 + \beta \|H - WX\|_2^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T \qquad (3)$$

where α controls the relative contribution of the reconstruction and the label-consistent regularization, $\|H - WX\|_2^2$ represents the classification error, and W denotes the parameters of the linear predictive classifier $f(x; W) = Wx$. $Q = [q_1, \cdots, q_N] \in \mathbb{R}^{K \times N}$ are the discriminative sparse codes of the input signals Y for classification, where $q_i = [q_i^1, \cdots, q_i^K]^t \in \mathbb{R}^K$ is the discriminative sparse code corresponding to the input signal $y_i$: its nonzero entries occur at the indices where $y_i$ and the dictionary item $d_k$ share the same label. A is a linear transformation matrix, and L-KSVD defines the linear transformation $g(x; A) = Ax$, which transforms the original sparse code x into the most discriminative one in the sparse feature space $\mathbb{R}^K$. The term $\|Q - AX\|_2^2$ represents the discriminative sparse-code error, which forces X to approximate the discriminative sparse codes Q. It forces the signals from the same class to have very similar sparse representations, so that a simple linear classifier can achieve a good classification performance. $H = [h_1, \cdots, h_N] \in \mathbb{R}^{m \times N}$ contains the class labels of Y, where $h_i = [0, 0, \cdots, 1, \cdots, 0, 0]^t \in \mathbb{R}^m$ is the label vector of the input signal $y_i$ and the nonzero position indicates the class of $y_i$. α and β are scalars controlling the relative contribution of the corresponding terms.
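The label matrices H and Q can be built directly from the class labels, as the sketch below shows. The helper names, the toy labels, and the assignment of a class label to each atom are illustrative assumptions. The stacking step follows the original LC-KSVD formulation (which the text above does not spell out): appending $\sqrt{\alpha}\,Q$ and $\sqrt{\beta}\,H$ below Y turns the three-term objective into a single reconstruction term that plain KSVD can minimize.

```python
import numpy as np

def label_matrices(sample_labels, atom_labels, n_classes):
    sample_labels = np.asarray(sample_labels)
    atom_labels = np.asarray(atom_labels)
    # H[c, i] = 1 when sample i belongs to class c (m x N).
    H = (np.arange(n_classes)[:, None] == sample_labels[None, :]).astype(float)
    # Q[k, i] = 1 when atom k and sample i share the same label (K x N).
    Q = (atom_labels[:, None] == sample_labels[None, :]).astype(float)
    return H, Q

def stack_for_ksvd(Y, Q, H, alpha, beta):
    """Stack [Y; sqrt(alpha) Q; sqrt(beta) H] so that plain KSVD can be applied."""
    return np.vstack([Y, np.sqrt(alpha) * Q, np.sqrt(beta) * H])

# Toy example: 4 samples of dimension 5, 2 classes, 3 atoms.
Y = np.arange(20, dtype=float).reshape(5, 4)
H, Q = label_matrices(sample_labels=[0, 0, 1, 1], atom_labels=[0, 1, 1], n_classes=2)
Y_ext = stack_for_ksvd(Y, Q, H, alpha=0.5, beta=0.25)
```

The squared error of the stacked problem equals the weighted sum of the three terms in the objective, so no separate solver is needed for A and W.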
Dictionaries learned with this method adapt to the structure of the training data (resulting in a good representation of each member of the set under strict sparsity constraints) and yield discriminative sparse codes X regardless of the dictionary size. These sparse codes can be used directly by classifiers, such as in Reference [30]. The discriminative property of the sparse codes x is very important for the performance of linear classifiers.
KSVD and L-KSVD have been briefly introduced above. When we apply L-KSVD to the data processing of an E-nose, we find that three problems need to be solved: (a) The distribution of the data obtained by the E-nose is nonlinear, so the performance of KSVD/L-KSVD is not ideal when they are used to process the data directly. (b) Since the sensors of an E-nose are cross-responsive, the data are redundant; when the dictionary of KSVD/L-KSVD is initialized, different sensor responses can be selected, and the corresponding recognition rates differ. (c) The weighted coefficients of the three parts of the L-KSVD objective function determine the influence of each part on the final result.
In Sections 3.2-3.4, we solve the above three problems in turn. The L-KSVD enhanced by the techniques from Sections 3.2-3.4 is called E-LCKSVD.

Kernel Function
L-KSVD has contributed a lot to sparse coding, mainly by explicitly integrating the discriminative sparse codes and a single predictive linear classifier into the objective function for dictionary learning. However, this algorithm cannot handle nonlinear data very well. According to pattern recognition theory, a pattern that is linearly inseparable in a low-dimensional space may become linearly separable after a nonlinear mapping to a high-dimensional feature space. However, if classification or regression is carried out directly in the high-dimensional space, there is a big obstacle, the curse of dimensionality, which affects operations in the high-dimensional feature space.
It has been shown that the kernel function can be used to solve this problem effectively. In kernel methods, a kernel is a symmetric, positive semi-definite function k(x, y) that equals the inner product of the two samples after they are mapped to a feature space. Several types of kernel functions are commonly used in many fields, especially in the support vector machine (SVM). SVM maps the sample space to a feature space of high or even infinite dimension through a nonlinear mapping so that problems that are not linearly separable in the original sample space are transformed into linearly separable problems in the feature space. For classification and regression problems, it is likely that the sample set cannot be processed linearly in the low-dimensional sample space, but a linear partition (or regression) can be achieved through a linear hyperplane in the high-dimensional feature space. Because the kernel function evaluates inner products in the feature space directly from the original samples, establishing a linear learning machine in the high-dimensional feature space hardly increases the computational complexity compared with the linear model, and the curse of dimensionality is avoided to some extent. All of this is due to the kernel function expansion and computational theory.
The kernel function can be combined with different algorithms to form a variety of kernel-based methods, and the kernel and the algorithm can be designed separately. The combination of L-KSVD and the kernel function can solve the nonlinear problem of the E-nose. In this paper, we first use the RBF kernel to map the data of the E-nose and then run KSVD/L-KSVD in the high-dimensional space. The expression of the RBF kernel is

$$k(x, y) = \exp\left( -\frac{\|x - y\|^2}{2\sigma^2} \right) \qquad (4)$$

where σ is the scale factor, which determines the distribution of the data mapped to the high-dimensional space, so it is a very important parameter. In this paper, an optimization algorithm named EQPSO is used to set its value.
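A small sketch of an RBF kernel matrix is given below, assuming the common Gaussian form $\exp(-\|x - y\|^2 / (2\sigma^2))$. The function name `rbf_kernel_matrix` and the two toy points are illustrative. How the kernel values are passed on to KSVD/L-KSVD is not detailed in the text; using the rows of the kernel matrix computed against the training samples as the mapped features is one common choice, stated here only as an assumption.

```python
import numpy as np

def rbf_kernel_matrix(A, B, sigma):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)) for rows a_i of A and b_j of B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

pts = np.array([[0.0, 0.0], [3.0, 4.0]])    # two points at distance 5
K = rbf_kernel_matrix(pts, pts, sigma=5.0)  # off-diagonal entries equal exp(-0.5)
```

A small σ makes the kernel matrix close to the identity, while a large σ makes all entries close to 1, which is why σ has to be tuned.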

Dictionary
Choosing a proper dictionary is the first and most important step of sparse-representation-based classification, which has produced encouraging results [31]. In particular, dictionaries learned from training data have attracted researchers' attention because they usually lead to a better representation and have achieved much success in classification.
The goal of dictionary learning is to learn an over-complete dictionary matrix $D \in \mathbb{R}^{n \times K}$ that contains K signal-atoms (in this notation, the columns of D). A signal vector $y \in \mathbb{R}^n$ can be represented sparsely as a linear combination of these atoms; to represent y, the representation vector x should satisfy the exact condition $y = Dx$ or the approximate condition $y \approx Dx$, made precise by requiring that $\|y - Dx\|_p \le \varepsilon$ for some small value ε and some $L_p$ norm. The vector $x \in \mathbb{R}^K$ contains the representation coefficients of the signal y. Typically, the norm is selected as $L_1$, $L_2$ or $L_\infty$.
If n < K and D is a full-rank matrix, there are an infinite number of solutions to the representation problem, so constraints should be set on the solution. To ensure sparsity, the solution with the fewest nonzero coefficients is preferred, which means the sparse representation is the solution of either

$$(P_0) \quad \min_x \|x\|_0 \quad \text{subject to} \quad y = Dx$$

or

$$(P_{0,\varepsilon}) \quad \min_x \|x\|_0 \quad \text{subject to} \quad \|y - Dx\|_2 \le \varepsilon,$$

where $\|x\|_0$ counts the nonzero entries in the vector x.
The dictionary D is constructed by minimizing the reconstruction error while satisfying the sparsity constraints:

$$\arg\min_{D,X} \|Y - DX\|_2^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T \qquad (5)$$

Although random dictionary initialization works very well for image compression and restoration, it is not good enough for identifying gases with an E-nose because of the cross-responsiveness of the sensor array: every sensor responds to the same gas, which means that the responses of the sensors are overlapping and redundant.
To select the best combination of sensors, we use a random binary number to choose the atoms with which the dictionary is initialized: 1 means that the sensor is selected, and 0 means that it is not. In this way, redundant information is removed and the representative data are retained. The binary parameter is provided by EQPSO, and the way to generate the random binary number can be found in Section 3.5.
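The selection by a binary number can be illustrated as follows. How a sensor response is turned into a dictionary atom is not specified in the text, so this sketch assumes a hypothetical matrix `candidates` with one candidate atom (column) per sensor; the function name `init_dictionary`, the random data and the example mask are likewise illustrative only.

```python
import numpy as np

def init_dictionary(candidates, mask):
    """candidates: (n, 15) array, one candidate atom per sensor (assumed layout).
    mask: 15 binary flags; 1 keeps the sensor, 0 drops it."""
    mask = np.asarray(mask).astype(bool)
    if not mask.any():
        raise ValueError("at least one sensor must be selected")
    D0 = np.asarray(candidates, float)[:, mask]
    # Scale every kept atom to unit norm, as is usual for KSVD dictionaries.
    return D0 / (np.linalg.norm(D0, axis=0, keepdims=True) + 1e-12)

rng = np.random.default_rng(0)
candidates = rng.random((64, 15))      # synthetic stand-in for sensor responses
mask = [1] * 14 + [0]                  # example mask: the last sensor is dropped
D0 = init_dictionary(candidates, mask)  # initial dictionary with 14 atoms
```

The number of ones in the mask fixes the size of the initial dictionary, so the optimizer searches over both which sensors are used and how many.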

Weighted Coefficients
At first, we use a linear predictive classifier f(x; W) = Wx. An objective function for learning a dictionary D having both reconstructive and discriminative power is defined as Equation (6). In this paper, we change Equation (6) into Equation (7), in which the three parts of the objective function are weighted by α, β and γ, where α + β + γ = 1 and α, β, γ ∈ (0, 1). We define this process as normalization processing. In Equation (7), the values of α, β and γ set the proportion of each part in the whole expression, which makes it more intuitive to see which part has the bigger impact on the recognition rate. Table 3 shows that the classification accuracy of E-LCKSVD is very different when the values of α and β are different, so the values of α, β and γ have a great influence on the classification accuracy. An optimization technique must therefore be employed to set their values.
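One plausible reading of the normalized objective is sketched below. The assignment of the weights to the terms (α to the reconstruction error, β to the label-consistency error, γ to the classification error) is an assumption, as is the function name `weighted_objective`; the text only states that the three weights lie in (0, 1) and sum to 1.

```python
import numpy as np

def weighted_objective(Y, D, X, Q, A, H, W, alpha, beta):
    """Value of alpha*||Y-DX||^2 + beta*||Q-AX||^2 + gamma*||H-WX||^2
    with gamma = 1 - alpha - beta (assumed ordering of the weights)."""
    gamma = 1.0 - alpha - beta
    if not (0 < alpha < 1 and 0 < beta < 1 and 0 < gamma < 1):
        raise ValueError("alpha, beta and gamma must all lie in (0, 1)")
    rec = np.linalg.norm(Y - D @ X) ** 2   # reconstruction error
    lab = np.linalg.norm(Q - A @ X) ** 2   # discriminative sparse-code error
    cls = np.linalg.norm(H - W @ X) ** 2   # classification error
    return alpha * rec + beta * lab + gamma * cls

# Toy shapes: n = 2, N = 3, K = 4, m = 2; zero codes make the errors easy to check.
Y, Q, H = np.ones((2, 3)), np.ones((4, 3)), np.ones((2, 3))
D, A, W, X = np.zeros((2, 4)), np.zeros((4, 4)), np.zeros((2, 4)), np.zeros((4, 3))
value = weighted_objective(Y, D, X, Q, A, H, W, alpha=0.5, beta=0.25)
```

Because the weights sum to 1, only α and β need to be searched; γ follows from them.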

PSO, QPSO and EQPSO
Particle swarm optimization (PSO) is an evolutionary computation algorithm based on swarm intelligence theory. This algorithm for continuous search space problems is widely used because of its simple programming and fast convergence and is used in Reference [32]. In its standard form, the velocity and position of particle i in dimension d are updated as

$$v_{id}(t+1) = w\, v_{id}(t) + c_1 r_{1d}(t) \left( P_{id} - x_{id}(t) \right) + c_2 r_{2d}(t) \left( P_{gd} - x_{id}(t) \right) \qquad (8)$$

$$x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) \qquad (9)$$

where t is the current iteration number of the algorithm; w is the inertia weight; $c_1$ and $c_2$ are acceleration coefficients; $r_{1d}(t)$ and $r_{2d}(t)$ are random numbers in [0,1]; $P_i$ is the current optimal position of the particle; and $P_g$ is the global optimal position of the group. However, in a classic PSO, the particle search process is realized in the form of an orbit and the particle's flight speed is limited. Therefore, the search of each particle is confined to a limited region that cannot cover the whole feasible search space. A general PSO cannot guarantee convergence to the global optimal solution with probability 1, which is the deficiency of PSO. To overcome this shortcoming, quantum-behaved PSO (QPSO) has been proposed [33]. QPSO combines PSO with quantum mechanics. QPSO has great advantages in terms of search ability, convergence speed, accuracy and robustness. Compared with other algorithms, one of the biggest characteristics of QPSO is its simple calculation and few control parameters. It is superior to the general PSO not only in search ability but also in accuracy, and QPSO can guarantee global convergence with probability 1.
In QPSO, the wave function ϕ(X, t) is used to describe the state of each particle, and the average optimal position mbest is defined as the center of the personal optimal positions of all particles. The position of the particle is then updated by Equation (10):

$$X_{id}(t+1) = p_{id}(t) \pm \alpha \left| mbest_d(t) - X_{id}(t) \right| \ln\left( 1/u \right) \qquad (10)$$

where $p_{id}(t) = \theta P_{id}(t) + (1 - \theta) P_{gd}(t)$ is the local attractor of the particle, θ and u are random numbers uniformly distributed in (0, 1), and α is called the contraction-expansion factor, which regulates the rate of convergence of the particles. However, when the number of iterations is finite, QPSO cannot guarantee that the global optimal value is found, and in practice the number of iterations is always limited. Enhanced QPSO (EQPSO) [34,35] is an improved QPSO algorithm that aims to find the value closest to the optimal value within a limited number of iterations. It keeps the particles more diverse at the early stage of the iteration and has a better local search ability at the later stage. Meanwhile, the convergence speed of EQPSO is faster than that of the other considered algorithms, PSO and QPSO. Therefore, EQPSO is selected as the optimization algorithm in this paper.
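The position update of standard QPSO can be sketched as follows. This is the plain QPSO scheme, not the EQPSO variant used in the paper, whose modifications are not described here; the function name `qpso`, the linearly decreasing contraction-expansion factor, the swarm size and the sphere test function are all assumptions made for illustration. The sketch minimizes its fitness function, so a classification accuracy would be passed with a negative sign.

```python
import numpy as np

def qpso(fitness, dim, bounds, n_particles=20, n_iter=200, seed=0):
    """Minimise `fitness` with standard QPSO (not the EQPSO variant)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_particles, dim))
    P = X.copy()                                   # personal best positions
    P_val = np.array([fitness(x) for x in X])
    for t in range(n_iter):
        g = P[np.argmin(P_val)]                    # global best position
        mbest = P.mean(axis=0)                     # mean of the personal bests
        alpha = 1.0 - 0.5 * t / n_iter             # contraction-expansion factor
        theta = rng.random((n_particles, dim))     # random mixing weight
        u = 1.0 - rng.random((n_particles, dim))   # in (0, 1], avoids log(1/0)
        attractor = theta * P + (1.0 - theta) * g
        sign = np.where(rng.random((n_particles, dim)) < 0.5, -1.0, 1.0)
        X = attractor + sign * alpha * np.abs(mbest - X) * np.log(1.0 / u)
        X = np.clip(X, lo, hi)
        vals = np.array([fitness(x) for x in X])
        better = vals < P_val
        P[better], P_val[better] = X[better], vals[better]
    best = np.argmin(P_val)
    return P[best], P_val[best]

# Toy problem: minimise the sphere function in 3 dimensions.
best_x, best_val = qpso(lambda x: float(np.sum(x ** 2)), dim=3, bounds=(-5.0, 5.0))
```

Unlike PSO, the update has no velocity term, and the contraction-expansion factor is the only control parameter besides the swarm size and the iteration count.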

Optimization Problem of the Proposed E-LCKSVD
The first step is to generate α and β (γ = 1 − α − β); these two numbers are real, and their value range is 0 to 1. Then, 15 binary numbers are generated to determine which sensor responses will be used to initialize the dictionary: the response of a sensor is selected only when the corresponding binary number is 1. The classification accuracy of the E-nose is the fitness function, and a set of parameters is chosen only if its corresponding fitness value reaches the maximum. The detailed information of the parameters that need optimization is shown in Table 4.
Table 4. Details of the parameters needing optimization.
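The parameter vector handled by the optimizer can be decoded as in the sketch below. The encoding is an assumption: a particle with 17 real components in [0, 1] is used, the first two are read as α and β, and the remaining 15 are turned into binary numbers by thresholding at 0.5. The text does not state how the binary numbers are actually generated, and the name `decode_particle` and the example mask are hypothetical.

```python
import numpy as np

N_SENSORS = 15

def decode_particle(position):
    """Split a 17-dimensional particle in [0, 1] into (alpha, beta, gamma, mask, feasible)."""
    position = np.asarray(position, float)
    alpha, beta = float(position[0]), float(position[1])
    gamma = 1.0 - alpha - beta
    # Assumed binarisation rule: a component >= 0.5 selects the sensor.
    mask = (position[2:2 + N_SENSORS] >= 0.5).astype(int)
    # A particle is usable only if all weights are positive and a sensor is selected.
    feasible = bool(alpha > 0 and beta > 0 and gamma > 0 and mask.any())
    return alpha, beta, gamma, mask, feasible

# Example particle: alpha and beta as real numbers, then 15 sensor flags
# (the flag pattern is illustrative only).
particle = [0.7661, 0.0351] + [1.0] * 14 + [0.0]
alpha, beta, gamma, mask, feasible = decode_particle(particle)
```

An infeasible particle (for example α + β ≥ 1) can simply be given the worst fitness so that the swarm moves away from it.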

Results and Discussion
As mentioned before, there are 96 samples of 4 classes (24 samples in each class). We randomly select 16 samples from each class to build the training data set (16 × 4 = 64 samples), and the rest of the samples are used to build the test data set (8 samples in each class). Figure 5 gives the work flow of E-LCKSVD. In order to verify that the kernel function can help L-KSVD deal with the data of the E-nose more effectively, we carried out several sets of comparative experiments. Taking the use of the kernel function to map the data to a high-dimensional space as the controlled variable, we feed both the kernel-processed and the unprocessed data into KSVD and L-KSVD, respectively. Since KSVD does not have the ability of pattern recognition by itself, we input the sparse matrix obtained by it into an extreme learning machine (ELM) [36] for pattern recognition. Each set of procedures is repeated 10 times, and the run with the highest recognition rate is taken as the final result, which is shown in Table 5.

It can be seen that the addition of the kernel function improves the classification accuracy significantly. As shown in Figure 6, with the kernel function, KSVD combined with ELM processes the data more effectively, as does L-KSVD. Next, we compare the dictionary initialization method proposed in this paper with the previous dictionary initialization methods. We found that the initialization of the dictionary can affect the classification accuracy.
As shown in Figure 7, a dictionary that is too small loses information and results in a low recognition rate. Conversely, training becomes more complex when the dictionary is too large, and a large dictionary does not guarantee a high classification accuracy because the sensor array data are redundant. In addition, even when the dimension is the same, the result differs depending on which sensors are selected to initialize the dictionary, as shown in Table 6. The classification accuracies of the different dictionary initialization methods are shown in Table 7. The dictionary size finally adopted is 14. Compared with random initialization, selecting the initial dictionary purposefully is clearly beneficial. The training set recognition rate of E-LCKSVD using the optimized dictionary initialization method is as high as 98.4%, and the test set recognition rate reaches 96.9%.
Figure 8 compares the recognition rates obtained with several different weighted coefficients and gives the weights at which the highest recognition rate is reached. The weights clearly have a great influence on the performance of the whole system, so finding appropriate weight values is an important step. Searching over weight combinations can get trapped in local optima; we use EQPSO to avoid this and find the global optimum at α = 0.7661, β = 0.0351.
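For orientation, the objective on which the weights α and β act can be written in the standard form of the original label consistent K-SVD formulation. The symbols below follow that formulation and are assumptions with respect to this paper: Y is the training data, D the dictionary, X the sparse codes, Q the discriminative sparse codes, A a linear transform, H the label matrix, W the classifier parameters and T the sparsity limit. The kernelized and standardized objective of E-LCKSVD defined in the method section may differ in form and takes precedence.

```latex
\langle D, A, W, X \rangle = \arg\min_{D, A, W, X}
  \|Y - DX\|_F^2 + \alpha \|Q - AX\|_F^2 + \beta \|H - WX\|_F^2
  \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T
```

In this form, α weights the label consistency term and β weights the classification error term, so the pair (α, β) sets the balance between reconstruction, discriminative sparse coding and classification.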

No.  Sensors    Binary Number
1    TGS813     1 0 0 1 1 1 1 1 0
2    TGS816     1 1 0 1 1 1 1 0 1
3    TGS822     1 0 1 1 1 0 1 1 0
4    TGS2600    1 1 1 0 0 1 0 1 1
5    TGS2602    1 0 1 1 1 1 0 1 1
6    TGS2610C   1 1 1 1 0 0 1 1 1
7    TGS2611E   1 0 1 0 1 1 0 1 1
8    TGS2620    0 1 0 1 0 1 0 0 1
9    MQ135A     1 0 0 0 1 1 1 1 1
10   MQ135      1 1 0 0 1 1 1 1 1
11   MQ136      0 1 1 0 1 1 1 1 0
12   MQ137      0 1 1 1 0 1 1 0 0
13   MS1100     0 1 1 1 1 0 0 0 0
14   MP4        1 1 1 1 0 0 1 0 1
15   MP503      0 1 1 1 1 0 1

Then, we compare the proposed method with the feature extraction methods and classifiers commonly used in E-noses. We use the maximum steady-state response of the original response curve, PCA (as the representative linear feature extraction algorithm) and KPCA (as the representative nonlinear feature extraction algorithm) to construct the feature matrix, and select SVM, RBFNN and K-LDA (kernel LDA) as classifiers. The data processing results are shown in Table 8. From Table 8, the best result among these combinations is obtained when KPCA is used for feature extraction and SVM is the classifier, but this result (96.6%/90.6% on the training/test sets) is still lower than that of E-LCKSVD (98.4%/96.9%).
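The binary numbers listed with the sensors can be read as selection masks, one bit per sensor. The sketch below decodes the first binary column of the sensor table. Reading each column as one selection scheme, with 1 meaning that the sensor is selected, is an assumption about how the table is coded, and the helper names `selected_sensors` and `select_columns` are hypothetical.

```python
# Sensor names in the row order of the sensor table.
SENSORS = ["TGS813", "TGS816", "TGS822", "TGS2600", "TGS2602", "TGS2610C",
           "TGS2611E", "TGS2620", "MQ135A", "MQ135", "MQ136", "MQ137",
           "MS1100", "MP4", "MP503"]

# First binary column of the table, read top to bottom (one bit per sensor).
FIRST_COLUMN = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0]

def selected_sensors(bits, names=SENSORS):
    """Return the names of the sensors whose bit is 1 (assumed to mean 'selected')."""
    if len(bits) != len(names):
        raise ValueError("one bit per sensor is required")
    return [name for name, bit in zip(names, bits) if bit == 1]

def select_columns(response_row, bits):
    """Keep only the responses of the selected sensors for one sample."""
    if len(response_row) != len(bits):
        raise ValueError("one response per sensor is required")
    return [value for value, bit in zip(response_row, bits) if bit == 1]

chosen = selected_sensors(FIRST_COLUMN)
print(len(chosen), chosen)
```

Under this reading, every column that is complete in the table selects 10 of the 15 sensors, which matches the statement that the compared selections have the same dimension but use different sensors.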
The advantages and drawbacks of all the techniques used in this paper are shown in Table 9.

Table 9. The advantages and drawbacks of all the techniques used in this paper.

PCA
This linear feature extraction algorithm obtains new features according to the variance contribution rate, but its performance is unsatisfactory on nonlinear data.

KPCA
With the help of the kernel function, the data can be mapped to a high-dimensional space and then analyzed by PCA, which makes it possible to process nonlinear data, but the high-dimensional mapping increases the computational complexity.

K-LDA
LDA is a supervised linear classifier. With the help of the kernel function, it can classify nonlinear data to some extent, but the improvement in recognition rate brought by the kernel function is limited.

RBFNN
An artificial neural network used early on in E-noses. With a radial basis function as the nonlinear mapping function, its recognition rate is better than that of K-LDA but still lower than that of SVM.

SVM
SVM has long been considered an optimal classifier. With the help of the kernel function, SVM has an excellent ability to process data, but its recognition rate is affected by the quality of the input data.

E-LCKSVD
Feature extraction and classification are integrated into one method that accounts for the influence of the dictionary initialization, the kernel function and the weight coefficients of the objective function on the recognition rate. If semi-supervised learning were added, the method could also exploit unlabeled data, which is cheap and easily available, and would be more valuable.

Conclusions
In this paper, an E-nose is used to classify orange juice stored for different numbers of days. An intelligent algorithm is important for the E-nose, and feature extraction and classification are its two key steps. Existing techniques generally study these two steps separately, so during training the classifier can only adjust itself and cannot influence the feature extraction, and the result is not satisfactory. In this paper, an enhanced L-KSVD technique, E-LCKSVD, is proposed. This technique combines feature extraction with pattern recognition and adjusts both according to the training results. Starting from the traditional L-KSVD algorithm and taking the characteristics of the E-nose data into account, we make three improvements. Firstly, the nonlinear response of the E-nose is mapped by the kernel function into a high-dimensional space, where it can be treated as linear data. Then, a dictionary initialization method based on EQPSO is proposed for KSVD/L-KSVD, which yields an initialization suited to the characteristics of E-nose data. Finally, the weighted coefficients of the different parts of the objective function of L-KSVD are standardized, their influence on the recognition rate is studied, and an EQPSO-based method for setting them is proposed. In the experiments, E-LCKSVD achieves a higher recognition accuracy than the compared methods (98.4% on the training set and 96.9% on the test set). In short, we conclude from this series of results that E-LCKSVD is a well-suited solution for classifying gas data from an E-nose.