3.1. KSVD and L-KSVD
KSVD is a generalization of the k-means clustering method; it works by iteratively alternating between sparse coding the input data with the current dictionary and updating the atoms of the dictionary to better fit the data. KSVD is widely used in applications such as image processing, audio processing, biology, and document analysis. KSVD learns a shared dictionary by optimizing the following objective function:
$$\min_{D,X} \|Y - DX\|_F^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T_0, \tag{1}$$

where $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{n \times N}$ are the $N$ input signals and each is of dimension $n$; $D = [d_1, d_2, \ldots, d_K] \in \mathbb{R}^{n \times K}$ ($K \gg n$, making $D$ over-complete) is a dictionary with $K$ atoms; $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{K \times N}$ are the $N$ sparse codes of the input signals $Y$; and $T_0$ is a constant, which keeps the number of nonzero elements in each $x_i$ at most $T_0$.
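To make the notation concrete, the following sketch (in Python with NumPy; all sizes are illustrative choices, not values from this work) evaluates the objective of Equation (1) on random toy data:

```python
import numpy as np

# A minimal numeric check of Equation (1) on random toy data; the sizes
# n, K, N, T0 are illustrative choices, not values from the paper.
n, K, N, T0 = 8, 32, 100, 3          # signal dim, atoms (K >> n), signals, sparsity

rng = np.random.default_rng(0)
Y = rng.standard_normal((n, N))      # input signals, one per column
D = rng.standard_normal((n, K))
D /= np.linalg.norm(D, axis=0)       # dictionary atoms, l2-normalized

X = np.zeros((K, N))                 # sparse codes: at most T0 nonzeros per column
for i in range(N):
    X[rng.choice(K, size=T0, replace=False), i] = rng.standard_normal(T0)

objective = np.linalg.norm(Y - D @ X, "fro") ** 2              # ||Y - DX||_F^2
assert max(np.count_nonzero(X[:, i]) for i in range(N)) <= T0  # ||x_i||_0 <= T0
```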
Equation (1) is minimized by a two-step iterative algorithm. First, the dictionary $D$ is fixed and the sparse coefficients $X$ are found; this is the sparse coding problem, which can be solved by orthogonal matching pursuit (OMP) [27,28]. Second, the sparse coefficient matrix $X$ is fixed and the dictionary $D$ is updated one atom at a time, while all other atoms in $D$ are kept fixed. For each atom $d_k$, with the corresponding $k$-th row of the coefficient matrix $X$ denoted by $x_T^k$, define the group of samples that use this atom as $\omega_k = \{i \mid 1 \le i \le N,\ x_T^k(i) \ne 0\}$. The error matrix $E_k = Y - \sum_{j \ne k} d_j x_T^j$ is computed; restricting $E_k$ by choosing only the columns corresponding to $\omega_k$ gives $E_k^R$. Then, the following problem is solved:

$$\min_{d_k,\, x_R^k} \left\| E_k^R - d_k x_R^k \right\|_F^2, \tag{2}$$

where a singular value decomposition (SVD) $E_k^R = U \Delta V^T$ is performed: $d_k$ is updated to the first column of $U$, and the nonzero coefficients $x_R^k$ to the first column of $V$ multiplied by $\Delta(1,1)$.
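The two-step procedure can be sketched as follows. This is a simplified illustration using NumPy and scikit-learn's OMP solver, not the original KSVD implementation; the variable names (Y, D, X, T0) follow the text, and everything else is our own choice:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def ksvd_step(Y, D, T0):
    """One KSVD iteration: OMP sparse coding, then atom-by-atom SVD update."""
    # Step 1: fix D, sparse-code every signal with OMP (at most T0 nonzeros).
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=T0, fit_intercept=False)
    X = omp.fit(D, Y).coef_.T                 # X is K x N

    # Step 2: fix the supports in X, update atoms one at a time.
    for k in range(D.shape[1]):
        omega = np.flatnonzero(X[k, :])       # samples that use atom d_k
        if omega.size == 0:
            continue
        X[k, omega] = 0                       # exclude d_k's own contribution
        E_R = Y[:, omega] - D @ X[:, omega]   # restricted error matrix E_k^R
        U, s, Vt = np.linalg.svd(E_R, full_matrices=False)
        D[:, k] = U[:, 0]                     # new atom: first left singular vector
        X[k, omega] = s[0] * Vt[0, :]         # new coefficients for its users
    return D, X
```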
A label consistent KSVD (L-KSVD) algorithm that learns a discriminative dictionary for sparse coding was presented by Zhuolin Jiang et al. The algorithm jointly learns a single over-complete dictionary and an optimal linear classifier, and it yields dictionaries such that feature points with the same class labels have similar sparse codes.
In order to use the class labels of the training data, label information is associated with each dictionary item (column of the dictionary matrix) to enforce discriminability of the sparse codes during the dictionary learning process [29].
The aim is to leverage the supervised information of the input signals to learn a dictionary that is both reconstructive and discriminative, and to include the classification error as a term in the objective function for dictionary learning. Ideally, each dictionary item is chosen to represent a subset of the training signals from a single class, so that each dictionary item can be associated with a particular label and there is an explicit correspondence between dictionary items and labels. L-KSVD then adds a label-consistent regularization term so as to balance reconstructive and discriminative power, giving an objective function with both a classification error term and a label-consistent regularization term.
The performance of the linear classifier depends on the discriminability of the input sparse codes $X$. To obtain discriminative sparse codes with the learned dictionary $D$, an objective function for dictionary construction is defined as follows:

$$\min_{D, W, A, X} \|Y - DX\|_2^2 + \alpha \|Q - AX\|_2^2 + \beta \|H - WX\|_2^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T_0, \tag{3}$$

where $\alpha$ balances the reconstruction and label-consistent regularization terms, $\|H - WX\|_2^2$ represents the classification error, and $W$ is the classifier parameters that make the classification dictionary optimal. L-KSVD uses the linear predictive classifier $f(x; W) = Wx$. Here, $Q = [q_1, q_2, \ldots, q_N] \in \mathbb{R}^{K \times N}$ are the discriminative sparse codes of the input signals $Y$ for classification, and $q_i = [q_i^1, q_i^2, \ldots, q_i^K]^T \in \mathbb{R}^K$ denotes the discriminative sparse code consistent with the input signal $y_i$: its entries are nonzero on those indices where $y_i$ and the dictionary item $d_k$ share the same label. For instance, it can be assumed that $Y = [y_1, \ldots, y_6]$ and $D = [d_1, \ldots, d_6]$, where $y_1$, $y_2$, $d_1$, and $d_2$ are from class 1; $y_3$, $y_4$, $d_3$, and $d_4$ are from class 2; and $y_5$, $y_6$, $d_5$, and $d_6$ are from class 3. $Q$ could then be defined as

$$Q = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}.$$

$A$ is a linear transformation matrix, and L-KSVD identifies it so that, in the sparse feature space, the original sparse codes are transformed into the most discriminative ones. The term $\|Q - AX\|_2^2$, representing the discriminative sparse-code error, forces $AX$ to approximate the discriminative sparse codes $Q$; it drives signals from the same class to have very similar sparse representations, so that simple linear classifiers achieve a good classification performance. $H = [h_1, h_2, \ldots, h_N]$ contains the class labels of $Y$; $h_i = [0, \ldots, 1, \ldots, 0]^T$ is the label vector corresponding to the input signal $y_i$, and its nonzero position indicates the class. $\alpha$ and $\beta$ are scalars controlling the relative contribution of the corresponding terms.
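For illustration, the following sketch builds the label matrix $H$ and the discriminative sparse-code matrix $Q$ for the three-class example above; the construction from label vectors is our own shorthand for the definitions in the text:

```python
import numpy as np

# Labels of the six input signals y_1..y_6 and six dictionary atoms d_1..d_6
# in the three-class example above.
signal_labels = np.array([1, 1, 2, 2, 3, 3])
atom_labels   = np.array([1, 1, 2, 2, 3, 3])
n_classes = 3

# H: one-hot label vectors h_i, one column per input signal.
H = np.zeros((n_classes, len(signal_labels)))
H[signal_labels - 1, np.arange(len(signal_labels))] = 1

# Q: entry (k, i) is 1 where signal y_i and atom d_k share a label, else 0.
Q = (atom_labels[:, None] == signal_labels[None, :]).astype(float)

print(Q)   # reproduces the 6 x 6 block matrix shown above
```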
Dictionaries learned with this method adapt to the structure of the training data (yielding a good representation of each member of the set under strict sparsity constraints) and produce discriminative sparse codes $X$ regardless of the dictionary size. These sparse codes can be used directly by a classifier, such as the one in Reference [30]. The discriminative quality of the sparse codes $X$ is very important to the performance of linear classifiers.
A brief introduction to KSVD and L-KSVD has been given above. When we apply L-KSVD to the data-processing problem of an E-nose, we find several problems that need to be solved: (1) the distribution of the data gained by an E-nose is nonlinear, so the effect of KSVD/L-KSVD is not ideal when they are used to process the data directly; (2) since the sensors of an E-nose are cross-responsive, the data are redundant, and when the dictionary of KSVD/L-KSVD is initialized, different sensor responses will be selected and the corresponding recognition rates will differ; (3) the weighting coefficients of the three parts of the L-KSVD objective function determine the influence of each part on the final result.
3.2. Kernel Function
L-KSVD has contributed much to sparse coding by explicitly integrating the discriminative sparse codes and a single predictive linear classifier into the objective function for dictionary learning. However, the algorithm cannot handle nonlinear data very well. Nonlinear dynamical systems are difficult to solve in mathematics and science; according to pattern recognition theory, a model that is linearly inseparable in a low-dimensional space may become linearly separable through a nonlinear mapping to a high-dimensional feature space. However, if this technique is used to perform classification or regression directly in the high-dimensional space, a major obstacle known as the curse of dimensionality arises in the high-dimensional feature-space operations.
It has been proved that the kernel function can be used to solve this problem effectively. A kernel is a nonnegative real-valued integrable function; for most applications, it is desirable for the function to satisfy two additional requirements, normalization and symmetry. Several types of kernel functions are commonly used in many fields, especially in the support vector machine (SVM). An SVM maps the sample space to a feature space of high or even infinite dimension through a nonlinear mapping, so that nonlinearly separable problems in the original sample space are transformed into linearly separable problems in the feature space. For classification and regression problems, the sample set often cannot be handled linearly in the low-dimensional sample space, but linear partitioning (or regression) can be achieved by a linear hyperplane in the high-dimensional feature space. Because the linear learning machine is established in the high-dimensional feature space via the kernel, it hardly increases the computational complexity and, compared with the linear model, avoids the curse of dimensionality to some extent. All of this is due to the theory of kernel function expansion and computation.
Kernel functions can be combined with different algorithms to form a variety of kernel-based methods, and the design of these two parts can be carried out separately. The combination of L-KSVD with a kernel function can solve the nonlinear problem of the E-nose.
In this paper, we first use the RBF kernel to map the data of the E-nose into a high-dimensional space and then run KSVD/L-KSVD in that space. The expression of the RBF kernel is

$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right),$$

where $\sigma$ is the scale factor that determines the distribution of the data mapped to the high-dimensional space, so it is a very important parameter. In this paper, an optimization algorithm named EQPSO is used to set its value.
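A plain NumPy sketch of the corresponding Gram-matrix computation is given below; in this work $\sigma$ is tuned by EQPSO, whereas here it is simply passed as an argument, and the sample data are arbitrary toy values:

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)   # guard tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Example: 5 toy E-nose samples with 4 sensor features each.
samples = np.random.default_rng(1).standard_normal((5, 4))
K = rbf_kernel_matrix(samples, sigma=0.5)
```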
3.3. Dictionary
Choosing a proper dictionary is the first and most important step of sparse-representation-based classification, which has shown encouraging results [31]. In particular, dictionaries learned from training data have attracted researchers' attention because learned dictionaries usually lead to a better representation and achieve considerable success in classification.
The goal of dictionary learning is to learn an over-complete dictionary matrix $D \in \mathbb{R}^{n \times K}$, which contains $K$ signal-atoms (in this notation, the columns of $D$). A signal vector $y \in \mathbb{R}^n$ can be represented sparsely as a linear combination of these atoms; to represent $y$, the representation vector $x$ should satisfy the exact condition $y = Dx$ or the approximate condition $y \approx Dx$, made precise by requiring $\|y - Dx\|_p \le \epsilon$ for some small value $\epsilon$ and some $\ell_p$ norm. The vector $x \in \mathbb{R}^K$ contains the representation coefficients of the signal $y$. Typically, the norm $p$ is selected as 1, 2, or $\infty$.
If $n < K$ and $D$ is a full-rank matrix, there is an infinite number of solutions to the representation problem, so constraints must be placed on the solution. To ensure sparsity, the solution with the fewest nonzero coefficients is preferred; that is, the sparse representation is the solution of either

$$\min_x \|x\|_0 \quad \text{s.t.} \quad y = Dx \qquad \text{or} \qquad \min_x \|x\|_0 \quad \text{s.t.} \quad \|y - Dx\|_2 \le \epsilon,$$

where $\|x\|_0$ counts the nonzero entries in the vector $x$.
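The following sketch (with toy sizes of our choosing) contrasts the dense minimum-$\ell_2$-norm solution of such an underdetermined system with the sparse solution returned by scikit-learn's greedy OMP solver, which approximates the $\ell_0$-constrained problem above:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

# With n < K and a full-rank D, y = Dx is underdetermined; a sparsity
# prior picks one solution among infinitely many (toy sizes, our choice).
rng = np.random.default_rng(2)
n, K = 10, 50
D = rng.standard_normal((n, K))
D /= np.linalg.norm(D, axis=0)

x_true = np.zeros(K)
x_true[rng.choice(K, size=3, replace=False)] = rng.standard_normal(3)
y = D @ x_true

x_lstsq = np.linalg.pinv(D) @ y                  # minimum-l2-norm solution: dense
x_omp = orthogonal_mp(D, y, n_nonzero_coefs=3)   # greedy sparse solution

print(np.count_nonzero(np.round(x_lstsq, 12)), np.count_nonzero(x_omp))
```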
The dictionary $D$ is constructed by minimizing the reconstruction error while satisfying the sparsity constraints. Although random dictionary initialization performs well for image compression and restoration, it is not good enough for gas identification with an E-nose because of the cross-responsiveness of the sensor array: every sensor responds to the same gas, so the sensor responses are overlapping and redundant.
To select the best combination of sensors, we use a random binary number to filter the atoms used for dictionary initialization: 1 means the corresponding sensor is selected, and 0 means it is not. In this way, redundant information is removed and the representative feature data are retained. The binary parameter is provided by EQPSO, and the way to generate the random binary number can be found in Section 3.5.
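A sketch of this sensor-selection step is shown below; the mask here is drawn at random for illustration, whereas in this work it is provided by EQPSO (Section 3.5), and the dictionary initialization from filtered training columns is a common choice rather than a detail specified above:

```python
import numpy as np

# Binary sensor selection: a 0/1 vector marks which sensor rows of the
# E-nose response matrix contribute to the initial dictionary atoms.
rng = np.random.default_rng(3)
n_sensors, n_samples = 16, 40
responses = rng.standard_normal((n_sensors, n_samples))  # toy raw E-nose data

mask = rng.integers(0, 2, size=n_sensors).astype(bool)   # 1 = sensor kept
filtered = responses[mask, :]          # redundant sensor rows removed

# Initialize dictionary atoms from the filtered responses (a common choice:
# random training columns, l2-normalized).
D0 = filtered[:, rng.choice(n_samples, size=8, replace=False)]
D0 = D0 / np.linalg.norm(D0, axis=0)
```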