2.1. Classifiers
Numerous feasible and efficient classification models have emerged and proven promising in e-nose applications over the past few decades [14,15,16,17,18]. They can be broadly categorized into two types [19,20]. The first is the linear classifier. Early research by Martín et al. [21] utilized linear discriminant analysis (LDA) in an e-nose system to accomplish classification tasks on vegetable oils, offering excellent classification and prediction performance. Song et al. [22] employed partial least squares regression (PLSR) analysis to determine the predictive correlations between e-nose responses, the chemical parameters of the controlled oxidation of chicken fat, free fatty acid profiles, and gas chromatography-mass spectrometry (GC-MS) data, thereby demonstrating the promising application of e-nose systems in the control of chicken fat oxidation. Thaler et al. [23] used an e-nose with the logistic regression method to perform binary classification of bacteria data. Hassan et al. [24] combined a probabilistic framework with spike latency patterns in an e-nose for the quantification or classification of carcinogenic formaldehyde, using a naive Bayes classifier to evaluate the stochastic variability in the spike latency patterns. The linear classifier is relatively easy to establish and generally efficient, but its capability is limited when handling nonlinear problems.
As previous research has demonstrated, some e-nose data are characterized by an innately nonlinear attribute [6]. More specifically, when analyzing volatile organic compounds (VOCs), the data structure of the feature matrix derived from the e-nose response curves is nonlinear. In addition, some exceptional conditions can render the data structure nonlinear and complex [25]. To better cope with this nonlinear characteristic of e-nose data, nonlinear classifiers have been introduced into e-nose applications. Artificial neural networks (ANNs), which typically possess nonlinear attributes, were used in an e-nose system by Gardner et al. [26]; this work illustrated the superiority of the ANN over conventional methodologies. Pardo et al. [27] applied the SVM to e-nose data classification and found this technique efficient, but strongly sensitive to the regularization parameter. Tang et al. [28] constructed an e-nose system with a KNN-embedded microprocessor for smell discrimination and demonstrated its excellent performance in distinguishing the chemical volatiles of three kinds of fruits. In addition, the decision tree, a tree structure comprising internal and terminal nodes, was used for both the discrimination and dimensionality reduction of e-nose data by Cho et al. [29]. The nonlinear classifier can model the complicated nonlinear relationship between inputs and desired outputs and exhibits distinguished robustness and fault tolerance. Nevertheless, it converges slowly and easily falls into local optima.
2.2. ELM
ELM, first put forward by Huang et al. [30] in 2004, is a single-hidden-layer feedforward neural network (SLFN)-based learning algorithm that selects hidden nodes randomly and computes the output weights of SLFNs analytically rather than tuning parameters iteratively. In this way, it achieves excellent generalization performance at an exceedingly fast learning speed. Subsequently, Qiu et al. [31] applied ELM to e-nose data processing for both qualitative classification and quantitative regression of strawberry juice data and concluded that ELM performed best in comparison with other pattern recognition approaches such as learning vector quantization (LVQ) neural networks and SVMs. Over the last few decades, recognizing the remarkable nature of ELM, researchers have proposed a wide range of ELM variants to tackle the open questions remaining in this promising research field. As an example, the fully complex ELM (C-ELM) was designed by Li et al. [32] to extend ELM from the real domain to the complex domain. Similarly, Huang et al. [33,34] proposed the incremental extreme learning machine (I-ELM), which incrementally adds randomly generated hidden nodes, as well as an improved form of I-ELM with fully complex hidden nodes that extends it from the real domain to the complex domain. They showed that I-ELM and C-ELM, with fully complex activation functions and randomly generated hidden nodes that do not rely on the training data, can serve as universal approximators.
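To make the two defining steps concrete, the following minimal sketch (in Python with NumPy) trains a basic ELM: the input weights and biases are drawn randomly and never tuned, and the output weights are obtained analytically through the Moore-Penrose pseudoinverse. The sigmoid activation, the number of hidden nodes L, and the one-hot target matrix T are illustrative assumptions, not details fixed by the cited papers.

```python
import numpy as np

def elm_train(X, T, L=100, seed=0):
    """Basic ELM: random hidden layer, analytically computed output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))  # random input weights (never tuned)
    b = rng.standard_normal(L)                # random hidden biases (never tuned)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # hidden-layer output matrix (sigmoid)
    beta = np.linalg.pinv(H) @ T              # output weights via Moore-Penrose inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                           # predicted class = argmax over columns
```

Because the only learned quantity is the single least-squares solve for beta, training is extremely fast, which is precisely the property the literature above emphasizes.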
The kernel method, one of the various improvements to ELM, has aroused much interest and has since been utilized to enhance a variety of systems. Pioneering work by Huang et al. [35] succeeded in extending ELM to kernel learning; that is, ELM can use various feature mappings (hidden-layer output functions) involving not only random hidden nodes, but also kernels. In other words, in the kernel ELM (KELM), which has been proven more efficient and stable than the original ELM, the hidden-layer feature mapping is determined by means of a kernel matrix. Furthermore, KELM retains the characteristic of ELM that the hidden nodes are randomly assigned rather than tuned. Fernández-Delgado et al. [36] subsequently proposed the so-called direct kernel perceptron (DKP) on the basis of KELM. Fu et al. [37] achieved fast determination of impact location using KELM. More recently, Peng et al. [38] successfully applied KELM to e-nose signal classification with high efficiency.
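To make the closed-form KELM solution concrete, the hedged sketch below computes the output weights as alpha = (I/C + Omega)^(-1) T, where Omega is the kernel matrix, following the formulation of Huang et al. [35]. The RBF kernel and the parameter names gamma and C are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    # Pairwise RBF kernel matrix between the rows of A and the rows of B.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def kelm_train(X, T, C=1.0, gamma=0.1):
    # alpha = (I/C + Omega)^(-1) T; the kernel matrix Omega replaces the
    # explicit random hidden-layer mapping of the original ELM.
    Omega = rbf_kernel(X, X, gamma)
    return np.linalg.solve(Omega + np.eye(X.shape[0]) / C, T)

def kelm_predict(Xnew, Xtrain, alpha, gamma=0.1):
    return rbf_kernel(Xnew, Xtrain, gamma) @ alpha
```

Note that no hidden-layer size appears anywhere: the kernel matrix plays the role of the feature mapping, which is the source of the stability advantage cited above.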
Despite this broad applicability, a multitude of research works have demonstrated that the generalization ability of KELM is closely related to the kernel function, and how to select or construct an effective kernel function adapted to the practical problem remains an open issue in the study of ELM. A simple KELM is generally implemented with a single kernel function, which can only reflect the characteristics of one class or one facet of the data and is therefore bound to cause defects. The performances of KELMs with different kernels and model parameters differ enormously, and the trained model parameters remain intensely sensitive to the samples. Consequently, KELM suffers from poor generalization ability and robustness due to the fixed form and relatively narrow range of variation of a single kernel.
Recently, to address specific problems more suitably, a popular idea for kernel function construction, called multiple kernel learning (MKL), has been developed. MKL creates a feasible composite kernel by properly combining a series of base kernels [39,40]. One of these techniques, the weighted kernel technique, has been further explored and has proven strikingly efficient in various studies. To name just a few, Sonnenburg et al. [41] offered an approach that convexly combines several kernels with a sparse weighting to overcome the problems of traditional kernel methods. Additionally, in 2014, Jia et al. [25] proposed a novel weighted approach to building the kernel function of kernel principal component analysis (KPCA) and utilized it in an e-nose to predict wound infection ratings by extracting the data structure of the original feature matrix of wound infection data. Their weighted KPCA (WKPCA) method accomplished higher classification accuracy than many other classical feature extraction methods under the same conditions.
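As a minimal illustration of the weighted kernel idea, the sketch below forms a composite kernel as a convex combination of precomputed base kernel matrices, K = sum_m w_m K_m with nonnegative weights summing to one; the function name and the simplex normalization are assumptions made here for illustration.

```python
import numpy as np

def composite_kernel(base_kernels, weights):
    """Convex combination of precomputed base kernel matrices:
    K = sum_m w_m * K_m, with w_m >= 0 and sum_m w_m = 1."""
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    w = w / w.sum()                           # project the weights onto the simplex
    return sum(wm * Km for wm, Km in zip(w, base_kernels))
```

In a weighted KELM, such a composite matrix would simply replace the single kernel matrix Omega in the closed-form solution sketched earlier.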
Moreover, research works have revealed the broad applicability of the weighted multiple kernel methodology in the field of ELM. Liu et al. [42] carried out pioneering work that employed the weighted multiple kernel idea to solve two previously unconsidered issues in KELM and ELM, namely how to select an optimal kernel in a specific application context of KELM and how to cope with information fusion in ELM when there are heterogeneous sources of data, and proposed sparse, non-sparse, and radius-incorporated multiple kernel ELM (MK-ELM) methods. Furthermore, Zhu et al. [43] put forward the distance-based multiple kernel ELM (DBMK-ELM), in which the kernel is a linear combination of base kernels and the combination coefficients are learned by solving a regression problem. It attains an extremely fast learning speed and can be adopted for both classification and regression, which was not achieved by previous MK-ELM methods. Li et al. [44] proposed two formulations of multiple kernel learning for ELM by formulating it as convex programs, so that globally optimal solutions are guaranteed, and showed them to be competitive with the conventional ELM algorithm. These different MK-ELMs are learned by solving constrained-optimization problems with different constraints. Usually, only the combination coefficients of the base kernels and the structural parameters of the classifier (the output weights of the SLFN) are learned, obtained analytically by a matrix inverse operation, while the regularization parameter C is specified arbitrarily [42,43]. In a different study, the regularization parameter C is jointly optimized with the combination coefficients of the base kernels and the structural parameters of the classifier, which works better in most cases than pre-specifying C [44]. This means that all of these algorithms treat the combination coefficients of the base kernels (the weights) as inner parameters of the SLFN and obtain the optimal weights by incorporating them as constraints in the joint optimization objective function. In addition, none of these algorithms optimizes the kernel parameters of the base kernels, which are simply fixed at several special values. However, the kernel parameters of the base kernels strongly affect the spatial distribution of the data in the high-dimensional feature space implicitly defined by the kernel. On the other hand, the regularization parameter C is of great importance for the generalization performance of MK-ELMs. Consequently, the kernel parameters of the base kernels and the regularization parameter C need to be properly selected. All existing MK-ELM algorithms emphasize the constrained-optimization problems for learning and overlook the effectiveness of intelligent optimization algorithms for parameter optimization. Furthermore, from a practical point of view, the application of MK-ELM in e-noses has not yet been explored.
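To illustrate why these parameters matter in practice, the hedged sketch below performs a naive hold-out grid search over the RBF kernel parameter gamma and the regularization parameter C; it is a simple stand-in for the intelligent optimization algorithms argued for above, and it reuses the illustrative kelm_train and kelm_predict sketches given earlier (Ttr is assumed one-hot, yval an integer label vector).

```python
import numpy as np

def select_params(Xtr, Ttr, Xval, yval, gammas, Cs):
    # Score every (gamma, C) pair on a hold-out set and keep the best combination.
    best_gamma, best_C, best_acc = None, None, -1.0
    for gamma in gammas:
        for C in Cs:
            alpha = kelm_train(Xtr, Ttr, C=C, gamma=gamma)
            pred = kelm_predict(Xval, Xtr, alpha, gamma=gamma).argmax(axis=1)
            acc = float((pred == yval).mean())
            if acc > best_acc:
                best_gamma, best_C, best_acc = gamma, C, acc
    return best_gamma, best_C, best_acc
```

Even this crude search varies both the spatial distribution of the data in the kernel-induced feature space (via gamma) and the bias-variance trade-off (via C), underscoring why fixing these values arbitrarily, as the existing MK-ELM algorithms do, can sacrifice generalization performance.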