A New Mixed-Gas-Detection Method Based on a Support Vector Machine Optimized by a Sparrow Search Algorithm

Zhang, Haitao; Han, Yaozhen

doi:10.3390/s22228977

Open Access Article


Haitao Zhang and Yaozhen Han *

School of Information Science and Electrical Engineering, Shandong Jiaotong University, Jinan 250357, China

* Author to whom correspondence should be addressed.

Sensors 2022, 22(22), 8977; https://doi.org/10.3390/s22228977

Submission received: 30 September 2022 / Revised: 15 November 2022 / Accepted: 17 November 2022 / Published: 20 November 2022

(This article belongs to the Special Issue Electronic Gas Sensors, Sensor Arrays, and Electronic Noses for Indoor and Outdoor Environment Monitoring)


Abstract

To address the low recognition rate of mixed gases and the low prediction accuracy of traditional gas-concentration-prediction methods on nonlinear data, this paper proposes a mixed-gas-identification and gas-concentration-prediction method based on a support vector machine (SVM) optimized by a sparrow search algorithm (SSA). Principal component analysis (PCA) is applied to reduce the dimensionality of the input data, and SSA is adopted to optimize the SVM hyperparameters, improving both the recognition rate and the concentration-prediction accuracy for mixed gases. For mixed-gas identification, the proposed SSA-optimized SVM (SSA-SVM) achieves significantly higher classification accuracy than random forest (RF), extreme-learning machine (ELM), and BP neural network methods. For gas-concentration prediction, the maximum fitting degree (coefficient of determination, R²) reaches 99.34% for single-gas concentrations and 97.55% for mixed-gas concentrations. The experimental results show that the SSA-SVM method achieves a high recognition rate and high concentration-prediction accuracy in gas-mixture detection.

Keywords:

mixed-gas detection; support vector machine; sparrow search algorithm; prediction; classification

1. Introduction

Mixed-gas detection is of great significance in food safety testing [1], agricultural production management [2], urban environmental quality monitoring [3], and life science research [4], and particularly in industrial production, where it is one of the key means of preventing fires, explosions, poisoning, and other accidents and of ensuring safe production. For example, a leak of flammable and explosive gases such as methane and ethylene can easily cause a fire or an explosion.

Gas sensors are a key component of gas detection, and some researchers have focused on improving sensor performance to raise the detection level for mixed gases. For instance, metals were doped into nanomaterials to increase sensor sensitivity [5,6]; polyaniline/single-walled carbon nanotube composites were used to improve the selectivity and room-temperature stability of a sensor [7]; and the soft-template method and nanocasting strategy were adopted in the chemical synthesis of porous metal oxide sensors to improve their sensitivity, response, and recovery time [8].

Composite materials, preparation processes, and doping have thus improved the performance of gas sensors. However, hardware improvements alone are far from sufficient. Machine learning, an interdisciplinary field, is a research hotspot in artificial intelligence and pattern recognition, with applications across many areas of artificial intelligence [9,10,11], including mixed-gas detection. Peng et al. [12] proposed a mixed-gas-identification method based on a deep convolutional neural network (DCNN), exploiting the high accuracy and robustness of DCNNs.

Because DCNNs have a complex network structure and may fall into local optima, Zhang et al. [13] adopted a BP neural network with a simple structure to realize anti-interference detection of carbon monoxide and methane. However, the network's optimization objective function was complex, which slowed convergence and made real-time performance difficult to achieve in practical applications.

To further improve the time efficiency of gas detection, random forest (RF) and k-nearest neighbor (KNN) algorithms were adopted [14,15], and experiments showed that both methods reduce the response time of gas detection to some extent. However, the computational cost of KNN grows with the number of training samples and it tolerates errors in the training data poorly, while RF is overly sensitive to small changes in the training dataset and tends to overfit during classification [16].

Support vector machines (SVMs) are machine-learning algorithms that are computationally simple, versatile, and robust. SVMs have been widely applied in pattern classification and nonlinear regression, for example, in image recognition [17,18], text classification [19], and fault detection [20,21]. In mixed-gas detection, Rachid et al. [22] compared partial least squares with SVM for monitoring gas concentrations in a confined environment and found that SVM estimated the concentrations more accurately. Zhao et al. [23] applied a BP neural network, SVM, and an extreme-learning machine (ELM) to identify ethanol, acetone, and formaldehyde.

The results showed that SVM achieved the highest recognition accuracy. Zhang et al. [24] achieved semi-quantitative detection of toxic and harmful gas mixtures in the kitchen with an electronic nose. Whether SVM is used for pattern classification or for nonlinear regression, its parameters (mainly the penalty parameter c and the kernel function parameter g) must be tuned to obtain good predictions. Grid search is commonly used to select these parameters [25].

Grid search can find the best parameter combination within the grid, but the hyperparameter space itself is unbounded, and expanding the search range to locate the global optimum quickly becomes time-consuming. Researchers have therefore proposed various optimization algorithms for SVM hyperparameter selection, including genetic algorithms (GA) [26,27], particle swarm optimization (PSO) [28,29], ant colony algorithms [30], and grey wolf optimization (GWO) [31].

In mixed-gas detection, the hyperparameters of support vector regression (SVR) have been optimized by PSO to improve the accuracy of mixed-gas prediction [32]. Deng et al. [33] proposed a GWO-optimized SVM that effectively suppresses the effects of temperature and pressure on carbon monoxide detection. Li et al. [34] presented a GA-optimized SVR that increased the prediction accuracy of gas content at drilling sites; this method achieved precise control of the differences outside the prediction point.

These optimization algorithms have improved SVM/SVR performance to some extent. However, PSO and GWO tend to fall into local optima on complex problems and converge slowly in later iterations, and GA is usually less efficient than other traditional optimization algorithms.

Sparrow search algorithm (SSA), a new swarm intelligence optimization algorithm, was proposed by Xue and Shen in 2020 [35]. The algorithm mainly simulates the foraging and anti-predation behavior of sparrows. SSA has attracted wide attention because of its fast convergence speed and strong optimization ability. Wang et al. [36] adopted SSA to solve the optimal configuration model of distributed generations, and the effectiveness and superiority of SSA were verified by experimental simulation.

Song et al. [37] used an SSA-optimized ELM to evaluate water quality, overcoming the instability and nonlinearity of water-quality parameter data. Liu et al. [38] modified SSA to solve unmanned aerial vehicle (UAV) path planning; the modified SSA provides the best route for UAVs in complex 3D flight environments.

In image processing, Wu et al. [39] proposed a modified SSA to address the poor peak signal-to-noise ratio and feature similarity of threshold-based image segmentation. Zhang et al. [40] introduced SSA into an adaptive boosting (AdaBoost) classifier to improve lung CT image classification and the probability of early-stage cancer detection. To address the low recognition rate of mixed gases and the low prediction accuracy of gas concentrations, this paper proposes a mixed-gas-detection method based on SSA-SVM. Its contributions are summarized as follows.

(1) A method that uses the sparrow search algorithm to optimize the hyperparameters of support vector machines is designed for mixed-gas identification and gas-concentration prediction.

(2) SSA is used to select the SVM/SVR parameters. In mixed-gas identification, combining PCA with SSA-SVM improves identification accuracy. In gas-concentration prediction, SSA-SVR improves the fitting degree of concentration prediction for both single gases and mixed gases. Compared with the GWO and GA optimization algorithms, SSA converges faster and yields a higher prediction fitting degree.

The structure of this paper is organized as follows. Section 2 introduces the mixed-gas-detection method. In Section 3, the mixed-gas identification and gas-concentration prediction are experimentally verified, and the results are analyzed and compared. Finally, our summary is given in Section 4.

2. Mixed-Gas-Detection Methods

To further improve the accuracy of mixed-gas identification and concentration prediction, this paper proposes a mixed-gas-detection method based on SSA-SVM. The flow chart of mixed-gas identification based on SSA-SVM is shown in Figure 1, and that of mixed-gas-concentration prediction based on SSA-SVR is shown in Figure 2. Simulation tests are run in MATLAB R2021b on a dataset from the University of California Irvine (UCI) Machine Learning Repository. This section describes the proposed method in detail.

2.1. Mixed-Gas Classification and Gas-Concentration Prediction Based on SVM

2.1.1. Mixed-Gas Classification

SVM is a machine-learning algorithm commonly used for pattern recognition with gas sensor arrays. Its main idea is to construct a classification hyperplane as a decision surface that separates the gas classes. Because the relationship between the sensor responses and the gas mixture is usually nonlinear, a hyperplane that separates the classes correctly often does not exist in the original sample space. The original low-dimensional space is therefore mapped into a higher-dimensional space in which a suitable separating hyperplane can be found.

In the experiment, the mixed-gas classification dataset has 41,785 samples. The training set in the given feature space is then represented as

D = \{(x_{1}, y_{1}), \dots, (x_{41785}, y_{41785})\} \in (X \times Y)^{41785}

(1)

where x_{i} \in X and y_{i} \in Y = \{1, 2, 3, 4\} (i = 1, 2, \dots, 41785); x_{i} is the main feature vector extracted by PCA; and y_{i} is the gas classification label.

The optimal hyperplane in the high-dimensional space can be defined as

f(x) = w^{T} φ(x) + b

(2)

where φ(·) is the nonlinear mapping into the high-dimensional space, w is the normal vector of the classification hyperplane, and b is the intercept.

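After the input data are mapped to the high-dimensional space, the SVM optimization problem becomes

\min_{w, b, ξ} \frac{1}{2} \|w\|^{2} + c \sum_{i = 1}^{m} ξ_{i} \quad s.t. \quad y_{i} (w^{T} φ(x_{i}) + b) \geq 1 - ξ_{i}, \; ξ_{i} \geq 0, \; i = 1, 2, \dots, m

(3)

where c is the penalty (regularization) coefficient, ξ_{i} is the slack variable of sample i, and m is the number of training samples. This binary formulation assumes labels y_{i} \in \{-1, +1\}; a four-class problem such as this one is typically handled by combining several binary SVMs.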

Because the mapping sends linearly inseparable low-dimensional data into a high-dimensional space, computing in that space directly would suffer from the “curse of dimensionality.” Introducing a kernel function K(x, x_{i}) = φ(x)^{T} φ(x_{i}) performs the low-to-high-dimensional transformation implicitly, avoiding the curse of dimensionality and reducing computation. An appropriate kernel function K(x, x_{i}) and penalty parameter c are then selected, and the following dual optimization problem is constructed and solved.

\max_{λ} Q(λ) = \sum_{i = 1}^{m} λ_{i} - \frac{1}{2} \sum_{i = 1}^{m} \sum_{j = 1}^{m} λ_{i} λ_{j} y_{i} y_{j} K(x_{i}, x_{j}) \quad s.t. \quad \sum_{i = 1}^{m} λ_{i} y_{i} = 0, \; 0 \leq λ_{i} \leq c, \; i = 1, 2, \dots, m

(4)

Solving Equation (4) gives the optimal solution λ^{*} = (λ_{1}^{*}, λ_{2}^{*}, \dots, λ_{m}^{*})^{T}.

From the optimal solution λ^{*}, w^{*} is calculated as

w^{*} = \sum_{i = 1}^{m} λ_{i}^{*} y_{i} φ(x_{i})

(5)

Choosing a component λ_{j}^{*} of λ^{*} with 0 < λ_{j}^{*} < c, the threshold is calculated as

b^{*} = y_{j} - \sum_{i = 1}^{m} λ_{i}^{*} y_{i} K(x_{i}, x_{j})

(6)

Finally, the decision function is constructed as

f (x) = sgn (\sum_{i = 1}^{m} λ_{i}^{*} y_{i} K (x, x_{i}) + b^{*})

(7)
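To make Equations (6) and (7) concrete, the NumPy sketch below evaluates the threshold b^{*} and the decision function for given dual multipliers λ^{*}. It assumes an RBF kernel of the form K(x, x_i) = exp(−g‖x − x_i‖²), which is the LIBSVM convention; the text above does not state the exact form. It does not solve the dual problem (4) itself. The usage lines use a two-sample problem with labels +1 and −1, for which the constraint forces λ_1 = λ_2 = λ and Q(λ) = 2λ − λ²(1 − K(x_1, x_2)) is maximized at λ = 1/(1 − K(x_1, x_2)). The function names are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, g):
    """Assumed RBF form K(x, x_i) = exp(-g * ||x - x_i||^2) for every row pair."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-g * np.sum(diff ** 2, axis=2))

def threshold_b(lam, X, y, g, c):
    """Eq. (6): b* = y_j - sum_i lam_i y_i K(x_i, x_j) for one j with 0 < lam_j < c."""
    j = int(np.flatnonzero((lam > 0) & (lam < c))[0])
    return y[j] - np.sum(lam * y * rbf_kernel(X, X[j:j + 1], g)[:, 0])

def decide(x_new, lam, X, y, b, g):
    """Eq. (7): labels sgn(sum_i lam_i y_i K(x, x_i) + b*) plus the raw scores."""
    scores = rbf_kernel(x_new, X, g) @ (lam * y) + b
    return np.sign(scores), scores

# Two training samples labelled +1 / -1; the dual optimum of Eq. (4) is
# lam_1 = lam_2 = 1 / (1 - K(x_1, x_2)) provided that value stays below c.
X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, -1.0])
g, c = 0.5, 10.0
k12 = rbf_kernel(X[:1], X[1:], g)[0, 0]          # exp(-0.5 * 2) = exp(-1)
lam = np.full(2, 1.0 / (1.0 - k12))
b = threshold_b(lam, X, y, g, c)
labels, scores = decide(X, lam, X, y, b, g)
```

Both training points land exactly on the margin (scores of +1 and −1), as expected for free support vectors; a four-class problem would combine several such binary machines.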

2.1.2. Mixed-Gas-Concentration Prediction

SVR, an important branch of SVM for regression, performs well in gas-concentration prediction. Rather than finding an optimal surface that separates the samples, SVR seeks a regression function (hyperplane) that minimizes the deviation of all training samples from that function.

For the regression problem, the training data T = \{(x_{1}, y_{1}), \dots, (x_{n}, y_{n})\}, y_{i} \in R, are given, and f(x) = w^{T} x + b is sought such that f(x_{i}) is as close as possible to y_{i}. The loss is calculated from the difference z = f(x_{i}) - y_{i} between the predicted gas concentration f(x_{i}) and the real gas concentration y_{i}. The corresponding ε-insensitive loss function is

L(z) = \max(0, |z| - ε) = \begin{cases} 0, & \text{if } |z| < ε \\ |z| - ε, & \text{otherwise} \end{cases}

(8)

Thus, the optimization problem of SVR can be expressed as

\min_{w, b} \frac{1}{2} \|w\|^{2} + C \sum_{i = 1}^{n} L(f(x_{i}) - y_{i})

(9)

Slack variables ξ_{i} and ξ_{i}^{*} are then introduced to replace the loss function, and Equation (9) is rewritten as

\min_{w, b, ξ, ξ^{*}} \frac{1}{2} \|w\|^{2} + C \sum_{i = 1}^{n} (ξ_{i} + ξ_{i}^{*}) \quad s.t. \quad y_{i} - f(x_{i}) \leq ε + ξ_{i}, \; f(x_{i}) - y_{i} \leq ε + ξ_{i}^{*}, \; ξ_{i}, ξ_{i}^{*} \geq 0, \; i = 1, 2, \dots, n

(10)

where C is the regularization coefficient, which mainly prevents SVR overfitting; if C is too large or too small, the generalization ability of SVR deteriorates.

Lagrange multipliers are then introduced to obtain the Lagrange function and the dual problem, and solving the dual problem gives the SVR solution

f(x) = \sum_{i = 1}^{n} (λ_{i}^{*} - λ_{i}) x_{i}^{T} x + b

(11)

Finally, the kernel function is added to obtain the decision function of the gas-concentration prediction.

f (x) = \sum_{i = 1}^{n} (λ_{i}^{*} - λ_{i}) K (x, x_{i}) + b

(12)
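The sketch below implements the ε-insensitive loss of Equation (8) and the kernel prediction of Equation (12) in NumPy. It assumes an RBF kernel of the form K(x, x_i) = exp(−g‖x − x_i‖²) and that the coefficients (λ_i^* − λ_i) and b come from an already-solved dual problem; the toy values are hypothetical, not results from this paper. The ε = 0.01 used here matches the loss-function value set in Section 3.3.2.

```python
import numpy as np

def eps_insensitive(z, eps):
    """Eq. (8): zero inside the epsilon-tube, |z| - eps outside it."""
    return np.maximum(0.0, np.abs(z) - eps)

def rbf_kernel(A, B, g):
    """Assumed RBF form K(x, x_i) = exp(-g * ||x - x_i||^2), row-wise."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-g * np.sum(diff ** 2, axis=2))

def svr_predict(x_new, X_train, coef, b, g):
    """Eq. (12): f(x) = sum_i (lam_i* - lam_i) K(x, x_i) + b, coef[i] = lam_i* - lam_i."""
    return rbf_kernel(x_new, X_train, g) @ coef + b

# Hypothetical toy values, for illustration only
residuals = np.array([-0.3, -0.005, 0.0, 0.02])
losses = eps_insensitive(residuals, eps=0.01)     # only |z| > 0.01 is penalized
X_train = np.array([[0.0], [1.0]])
coef = np.array([0.5, -0.2])
pred = svr_predict(np.array([[0.0], [2.0]]), X_train, coef, b=0.1, g=1.0)
```

Residuals inside the tube contribute nothing to the loss, which is why only the support vectors (non-zero coefficients) appear in the prediction.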

2.2. Sparrow Search Algorithm

For both SVM and SVR, performance depends on the choice of the two hyperparameters, c and g. This paper introduces SSA to select them, improving both the identification accuracy for mixed gases and the accuracy of gas-concentration prediction. SSA simulates the foraging and anti-predation behavior of sparrow flocks. During foraging, sparrows are divided into finders and joiners according to their positions and energy reserves: the finders search for food and provide the joiners with foraging areas and directions, and the joiners find food based on the information provided by the finders.

Assuming that there are p sparrows, the sparrow population can be expressed as

A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,d} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p,1} & a_{p,2} & \cdots & a_{p,d} \end{bmatrix}

(13)

where p is the number of sparrows, d is the number of optimization parameters, and each row is one candidate solution.

In each iteration, if no predator is detected, the finders keep searching widely for food within the search range; if a finder detects a predator, it sounds an alarm, and the whole flock quickly flies to other safe areas to feed. The positions of the finders are updated as

a_{i,j}^{t+1} = \begin{cases} a_{i,j}^{t} \cdot \exp\left( \frac{-i}{β \cdot MaxT} \right), & \text{if } R_{2} < ST \\ a_{i,j}^{t} + Q \cdot L, & \text{if } R_{2} \geq ST \end{cases}

(14)

where t is the current iteration; MaxT is the maximum number of iterations; a_{i,j}^{t} is the value of the j-th parameter of the i-th sparrow at iteration t; β \in (0, 1] is a random number; Q is a normally distributed random number; L is a 1 × d matrix of ones; and R_{2} \in [0, 1] and ST \in [0.5, 1] are the warning value and the safety threshold, respectively. While t < MaxT, SSA sorts the sparrows by fitness in each iteration to find the current best and worst sparrows.

During foraging, each sparrow's strategy is determined by its energy level (fitness). A joiner immediately leaves its current position to compete for food as soon as it finds that a finder has found better food. The positions of the joiners are updated as

a_{i}^{t+1} = \begin{cases} Q \cdot \exp\left( \frac{a_{Worst}^{t} - a_{i}^{t}}{i^{2}} \right), & \text{if } i > \frac{p}{2} \\ a_{P}^{t+1} + \left| a_{i}^{t} - a_{P}^{t+1} \right| X^{T} (X X^{T})^{-1} L, & \text{otherwise} \end{cases}

(15)

where a_{i}^{t} is the position (row vector) of the i-th sparrow and the exponential is applied element-wise; a_{P} is the best position currently occupied by the finders; a_{Worst} is the current global worst position; X is a 1 × d matrix whose elements are randomly set to 1 or −1; Q and L are as in Equation (14); and i > p/2 means that the i-th joiner, whose fitness is poor, is starving and must fly elsewhere to forage.

Sparrows at the edge of the population, which have poor fitness, are extremely vulnerable to predators and fly quickly to safety as soon as they perceive danger, whereas a sparrow in the middle of the population that perceives danger moves closer to other sparrows to reduce its own risk. For these danger-aware sparrows, the position is updated as

a_{i,j}^{t+1} = \begin{cases} a_{best,j}^{t} + λ \left| a_{i,j}^{t} - a_{best,j}^{t} \right|, & \text{if } f_{i} > f_{g} \\ a_{i,j}^{t} + K \cdot \left( \frac{\left| a_{i,j}^{t} - a_{Worst,j}^{t} \right|}{(f_{i} - f_{w}) + ε} \right), & \text{if } f_{i} = f_{g} \end{cases}

(16)

where a_{best} is the current global optimal position; a_{Worst} is the current global worst position; λ is a normally distributed random step-size control parameter; K \in [-1, 1] is a random number that sets the direction of movement; ε is a small constant that avoids division by zero; f_{i} is the fitness of the current sparrow; and f_{g} and f_{w} are the current global best and worst fitness, respectively.

During the iteration process, if the sparrow’s new position is better than the previous position, the current position will be updated until the global optimal position and the global optimal fitness are found. During this period, the identity of the sparrow is also constantly updated and alternated. Every sparrow can be a finder if it is well adapted; however, the proportions of finders and joiners in the population are constant.
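The NumPy sketch below puts Equations (13)–(16) together as a minimizer. It is a minimal reading of SSA under stated assumptions, not the authors' code: the finder share (20%), the share of danger-aware sparrows (10%), ST = 0.8, clipping positions to [lb, ub], and keeping a new position only if it improves that sparrow's own best fitness are common choices assumed here (the settings actually used are listed in Table 2). For the hyperparameter search, the fitness would be a validation error of an SVM/SVR trained with the candidate (c, g); that wiring is omitted, and the usage line minimizes a simple sphere function instead. The name ssa_minimize is illustrative.

```python
import numpy as np

def ssa_minimize(fitness, lb, ub, p=30, max_t=100, fd_ratio=0.2, sd_ratio=0.1,
                 st=0.8, seed=0):
    """Minimal sparrow search sketch (Eqs. (13)-(16)); returns (best_x, best_f)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    d = lb.size
    n_fd = max(1, int(round(fd_ratio * p)))       # number of finders
    n_sd = max(1, int(round(sd_ratio * p)))       # danger-aware sparrows per iteration
    X = rng.uniform(lb, ub, size=(p, d))          # population matrix, Eq. (13)
    P = X.copy()                                  # best position found by each sparrow
    p_fit = np.array([fitness(x) for x in P])
    g = int(np.argmin(p_fit))
    best_x, best_f = P[g].copy(), p_fit[g]
    for _ in range(max_t):
        order = np.argsort(p_fit)                 # order[0] is the fittest sparrow
        worst, f_w = P[order[-1]].copy(), p_fit[order[-1]]
        r2 = rng.random()                         # warning value R_2
        for rank in range(n_fd):                  # finders, Eq. (14)
            k = order[rank]
            if r2 < st:                           # no predator: wide search
                beta = 1.0 - rng.random()         # random number in (0, 1]
                X[k] = P[k] * np.exp(-(rank + 1) / (beta * max_t))
            else:                                 # alarm: jump by Q * L
                X[k] = P[k] + rng.normal() * np.ones(d)
            X[k] = np.clip(X[k], lb, ub)
        fd_fit = [fitness(X[k]) for k in order[:n_fd]]
        a_p = X[order[int(np.argmin(fd_fit))]].copy()    # best finder position
        for rank in range(n_fd, p):               # joiners, Eq. (15)
            k = order[rank]
            if rank + 1 > p / 2:                  # starving joiner flies elsewhere
                X[k] = rng.normal() * np.exp((worst - P[k]) / (rank + 1) ** 2)
            else:                                 # follow the best finder
                signs = rng.choice([-1.0, 1.0], size=d)  # 1 x d matrix of +/-1
                # |P_k - a_p| X^T (X X^T)^-1 L reduces to a scalar times a row of ones
                X[k] = a_p + np.dot(np.abs(P[k] - a_p), signs / d) * np.ones(d)
            X[k] = np.clip(X[k], lb, ub)
        fit = np.array([fitness(x) for x in X])
        for k in rng.choice(p, size=n_sd, replace=False):  # danger-aware, Eq. (16)
            if p_fit[k] > best_f:                 # edge of the flock: move toward best
                X[k] = best_x + rng.normal(size=d) * np.abs(P[k] - best_x)
            else:                                 # centre of the flock: random step
                K = rng.uniform(-1.0, 1.0)
                X[k] = P[k] + K * np.abs(P[k] - worst) / (p_fit[k] - f_w + 1e-50)
            X[k] = np.clip(X[k], lb, ub)
            fit[k] = fitness(X[k])
        improved = fit < p_fit                    # keep a new position only if better
        P[improved], p_fit[improved] = X[improved], fit[improved]
        g = int(np.argmin(p_fit))
        if p_fit[g] < best_f:
            best_x, best_f = P[g].copy(), p_fit[g]
    return best_x, best_f

# Usage: minimize a 2-D sphere function on [-5, 5]^2 (minimum 0 at the origin)
best_x, best_f = ssa_minimize(lambda x: float(np.sum(x ** 2)), lb=[-5, -5], ub=[5, 5])
```

Because the first branch of Equation (14) multiplies positions by a factor below 1, finders drift toward the origin, so a sphere centered at zero is an easy test case; in the SVM setting the search runs over the [lb, ub] box of c and g instead.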

3. Experimental Results and Analysis

3.1. Data Preparation

This paper uses the UCI dataset “Gas sensor array under dynamic gas mixtures”. The data were recorded with 16 chemical sensors of four types, TGS-2600, TGS-2602, TGS-2610, and TGS-2620, four units of each (channel order: TGS-2602, TGS-2602, TGS-2600, TGS-2600, TGS-2610, TGS-2610, TGS-2620, TGS-2620, repeated twice). In the experiment, the array of 16 gas sensors is placed in a 60 mL sealed chamber.

Gas samples are injected at a constant flow rate of 300 mL/min, and the sensor conductivity (S/m) is sampled continuously at 100 Hz. Each measurement consists of the signals acquired continuously from all 16 sensors of the array. The gas concentrations vary randomly; the samples consist mainly of methane and ethylene in air, with ethylene ranging from 0 to 20 ppm and methane from 0 to 300 ppm. More details on the dataset can be found in [41]. After sorting and screening, 41,785 gas samples were selected for the experiment.

3.2. Mixed-Gas Identification

3.2.1. Feature Extraction

The input data of this system are the outputs of the 16 sensors in the array. The output data are the gas-classification labels, with desired outputs

\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

(17)

whose rows represent methane, ethylene, air, and mixed gases, respectively.

To improve the recognition accuracy for gas mixtures, features are first extracted from the input data, which eliminates or reduces the effect of gas concentration on the sensor-array response. PCA, a common data-analysis method, extracts the main feature vectors of the data, reducing the dimensionality of high-dimensional data while preserving as much of its feature information as possible. Because the input features have inconsistent scales, which would distort the calculation, the original data are first standardized:

a_{i} = \frac{x_{i} - \bar{x}}{\sqrt{\frac{1}{m} \sum_{k = 1}^{m} (x_{k} - \bar{x})^{2}}}

(18)

Here, \bar{x} is the mean of the m samples of one sensor. Standardization transforms each sensor's data to a mean of 0 and a standard deviation of 1, which prevents sensors with larger output ranges from dominating the principal components. The input data then become

X = \begin{bmatrix} a_{1,1} & a_{2,1} & \cdots & a_{16,1} \\ a_{1,2} & a_{2,2} & \cdots & a_{16,2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1,41785} & a_{2,41785} & \cdots & a_{16,41785} \end{bmatrix} = \begin{bmatrix} b_{1} & b_{2} & \cdots & b_{16} \end{bmatrix}

(19)

where b_{j} = [a_{j,1}, a_{j,2}, \dots, a_{j,41785}]^{T} is the standardized output of the j-th sensor.

After standardization, the covariance matrix is

C = \frac{1}{41784} X^{T} X

(20)

The eigenvalues κ_{1}, κ_{2}, \dots, κ_{16} of the covariance matrix and the corresponding eigenvectors λ_{1}, λ_{2}, \dots, λ_{16} are then obtained. The eigenvalues are sorted in descending order, and the contribution rate of each eigenvalue is

p_{i} = \frac{κ_{i}}{\sum_{k = 1}^{16} κ_{k}}

(21)

The cumulative contribution rate of the first n principal components is the sum of the contribution rates of the first n eigenvalues; it reflects how well these components explain the original variables. The required principal components, X = (x_{1}, x_{2}, \dots, x_{n})^{T}, are selected according to the cumulative contribution rate. The sixteen eigenvalues and contribution rates obtained from this dataset after PCA are shown in Table 1.

As Table 1 shows, each additional principal component adds less to the contribution rate. With four principal components, the cumulative contribution rate exceeds 99%, that is, the four components explain more than 99% of the total variance. The original data are therefore reduced from 16 to 4 dimensions in this paper. After PCA, the sample order of the classification dataset is randomly shuffled, and the data are split into a training set and a test set at a ratio of 8:2.
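The standardization and PCA steps of Equations (18)–(21) can be sketched in NumPy as follows. The 0.99 threshold mirrors the 99% criterion above; the synthetic 16-channel data in the usage lines (four hidden factors plus small noise) are a hypothetical stand-in for the sensor array, not the UCI measurements.

```python
import numpy as np

def standardize(raw):
    """Eq. (18): column-wise z-score with the population (1/m) standard deviation."""
    return (raw - raw.mean(axis=0)) / raw.std(axis=0)

def pca_by_contribution(raw, threshold=0.99):
    """Return (scores, rates, k) for the fewest components whose cumulative
    contribution rate reaches the threshold."""
    X = standardize(raw)
    C = X.T @ X / (X.shape[0] - 1)              # covariance matrix, Eq. (20)
    eigvals, eigvecs = np.linalg.eigh(C)        # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # sort from large to small
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    rates = eigvals / eigvals.sum()             # contribution rates, Eq. (21)
    cumulative = np.cumsum(rates)
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, rates.size)
    return X @ eigvecs[:, :k], rates, k         # projected samples, n x k

rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 4))             # hypothetical hidden factors
mixing = rng.normal(size=(4, 16))               # hypothetical sensor sensitivities
raw = latent @ mixing + 0.01 * rng.normal(size=(2000, 16))
scores, rates, k = pca_by_contribution(raw)
```

With real data, k is read off from the cumulative rates in the same way as from Table 1.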

3.2.2. Classification Results for Mixed Gases

In the experiment, the value ranges of the SVM hyperparameters c and g are set to [2^{-2}, 2^{8}] and [2^{-5}, 2^{5}], respectively, and the radial basis function (RBF) is used as the kernel function. The related parameters of SSA are listed in Table 2.

In Table 2, d is the number of optimization parameters; lb is the lower limit of the optimization parameters in SVM; ub is the upper limit of the optimization parameters in SVM; p is the sparrow population size; MaxT is the maximum number of iterations; ST is the safety threshold; FD is the finder; JD is the joiner.

With these SSA settings, iterating the updates of Equations (14)–(16) yields the optimal parameters c = 53.248 and g = 5.979. This parameter combination is used to train the SSA-SVM model on the training set, and the classification test is then performed on the test set. The classification results are shown in Figure 3.

As Figure 3 shows, the per-class accuracy rates for methane, ethylene, air, and mixed gas are 95.5%, 96.1%, 95.9%, and 96.7%, respectively. Of the 2006 methane test samples, 96% were correctly identified, and of the 1695 mixed-gas test samples, 96.3% were correctly identified as mixed gas. Overall, SSA-SVM correctly classified 8025 of the 8357 test samples, an accuracy of 96%.
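The overall figure follows directly from the counts, and the test-set size agrees with the 8:2 split of the 41,785 samples described in Section 3.2.1:

\frac{8025}{8357} \approx 0.9603, \qquad 41785 \times 0.2 = 8357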

Different tasks call for different evaluation criteria. Accuracy is the simplest and most intuitive evaluation index for classification, but when the classes are imbalanced it is dominated by the larger classes. In multi-class problems, the arithmetic mean of the per-class accuracy is therefore generally used as the evaluation index of the model:

ACC = \frac{1}{4} \sum_{i = 1}^{4} \frac{TP_{i}}{TP_{i} + FP_{i}}

(22)

where TP_{i} is the number of samples correctly predicted as class i and FP_{i} is the number of samples of other classes incorrectly predicted as class i.
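Equation (22), read as the arithmetic mean over the four classes described above, can be computed from a confusion matrix as in the sketch below (rows are true classes, columns are predicted classes, labels 1 to 4 as in Y). The toy labels are hypothetical; they show how the class-averaged value differs from plain accuracy. Treating a class that is never predicted as contributing 0 is an assumption of this sketch.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=4):
    """M[t-1, p-1] counts samples of true class t predicted as class p."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[t - 1, p - 1] += 1
    return M

def class_averaged_acc(M):
    """Eq. (22): mean over classes of TP_i / (TP_i + FP_i)."""
    tp = np.diag(M).astype(float)
    predicted = M.sum(axis=0)            # TP_i + FP_i: all samples predicted as class i
    per_class = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    return float(per_class.mean())

# Hypothetical labels: 1 = methane, 2 = ethylene, 3 = air, 4 = mixture
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 1, 2, 1, 3, 3, 4, 2]
M = confusion_matrix(y_true, y_pred)
acc22 = class_averaged_acc(M)            # (2/3 + 1/2 + 1 + 1) / 4
overall = np.trace(M) / M.sum()          # plain accuracy, 6 / 8
```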

To further demonstrate the performance of SSA-SVM, we compared it with RF, ELM, and BP neural networks. The relevant parameters of the RF, ELM, and BP neural networks are shown in Table 3. The classification results are shown in Figure 4.

From the combination of Figure 4 and Equation (22), the accuracies of the four methods can be calculated as 96%, 92.9%, 86.8%, and 81.3%. SSA-SVM improved the accuracy by 3.1%, 8.2%, and 14.7% compared with the RF, ELM, and BP neural network.

To compare the mixed-gas recognition ability of SSA-SVM, RF, ELM, and BP neural networks in more detail, the accuracy rate of each class is calculated for each method. The comparison in Figure 5 shows that SSA-SVM outperforms the other three methods in identifying both single gases and mixed gases.

3.3. Prediction of Mixed-Gas Concentration

3.3.1. Data Processing

In this paper, a feature subset for each gas concentration is selected from the classification dataset. The input data are unchanged, and the output data are the gas concentration values. Both the input and output data are normalized to [0, 1] to improve the computational efficiency of fitting the regression model:

y_{i} = \frac{x_{i} - \min_{1 \leq j \leq n} x_{j}}{\max_{1 \leq j \leq n} x_{j} - \min_{1 \leq j \leq n} x_{j}}

(23)
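A NumPy sketch of Equation (23) and of its inverse, which maps normalized predictions back to concentration units (ppm). Storing the column minima and maxima and reusing them for new data is an assumption of this sketch; the text does not say from which data the extremes are taken. The concentration values are hypothetical.

```python
import numpy as np

def minmax_fit(data):
    """Column-wise minimum and maximum used in Eq. (23)."""
    return data.min(axis=0), data.max(axis=0)

def minmax_apply(data, lo, hi):
    """Eq. (23): scale each column to [0, 1]."""
    return (data - lo) / (hi - lo)

def minmax_invert(scaled, lo, hi):
    """Inverse of Eq. (23): map [0, 1] values back to the original units."""
    return scaled * (hi - lo) + lo

# Hypothetical methane concentrations in ppm
conc = np.array([[0.0], [75.0], [150.0], [300.0]])
lo, hi = minmax_fit(conc)
scaled = minmax_apply(conc, lo, hi)      # [[0.0], [0.25], [0.5], [1.0]]
restored = minmax_invert(scaled, lo, hi)
```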

Once preprocessing is complete, the sample order of each feature subset is randomly shuffled, and the subset is divided into a training set and a test set at a ratio of 9:1; five-fold cross-validation is performed during training. The sample sizes of the gas-concentration feature subsets are shown in Table 4.
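The shuffled 9:1 split and the five-fold cross-validation can be sketched with index arrays as below. How the folds feed the SSA fitness (for example, as the mean validation error over the five folds) is not specified in the text, so that use is only a plausible assumption; the function names are illustrative.

```python
import numpy as np

def shuffled_split(n, test_ratio=0.1, seed=0):
    """Shuffle sample indices and split them into (train, test), 9:1 by default."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(test_ratio * n))
    return idx[n_test:], idx[:n_test]

def k_fold(train_idx, k=5):
    """Yield (fit_idx, val_idx); every training index is validated exactly once."""
    folds = np.array_split(train_idx, k)
    for i in range(k):
        fit_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield fit_idx, folds[i]

train, test = shuffled_split(103)
splits = list(k_fold(train))
```

Keeping the test indices out of every fold means the final test score is never used to choose c and g.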

3.3.2. Mixed-Gas-Concentration Prediction Results

The value ranges of the two SVR hyperparameters, c and g, are both set to [2^{-5}, 2^{10}], and the ε of the loss function (Equation (8)) is set to 0.01. The relevant parameters of SSA are listed in Table 5.

In Table 5, d is the number of optimization parameters; lb is the lower limit of optimization parameters in SVR; ub is the upper limit of optimization parameters in SVR; p is the sparrow population size; MaxT is the maximum number of iterations; ST is the safety threshold; FD is the finder; JD is the joiner.

According to Equations (14)–(16), the two hyperparameters of SVR are optimized, and the optimal parameter combination is finally determined. The optimal parameter combination of each feature subset is shown in Table 6.

The gas-concentration-prediction test is then conducted on the test set, and the predictions are de-normalized (by inverting Equation (23)) to obtain the final concentration predictions. The SSA-SVR gas-concentration-prediction results are shown in Figure 6.

The coefficient of determination R^{2} is a standard performance indicator for regression models that reflects how well the model fits the data. R^{2} is at most 1 (100%), and the closer it is to 1, the better the model fits the data; on test data it can even be negative if the model predicts worse than the sample mean.

R^{2} = \left( 1 - \frac{\sum_{i = 1}^{n} (y_{i} - f(x_{i}))^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}} \right) \times 100\%

(24)

where f(x_{i}) is the predicted value and \bar{y} is the mean of the true values.
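Equation (24) is short enough to check by hand; the NumPy version below makes the ratio of the residual to the total sum of squares explicit. The four values in the usage line are hypothetical.

```python
import numpy as np

def r2_percent(y_true, y_pred):
    """Eq. (24): coefficient of determination, expressed in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    return (1.0 - ss_res / ss_tot) * 100.0

score = r2_percent([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])  # 1 - 0.10 / 5.0, about 98%
```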

To further validate SSA as an optimizer for SVR, it is compared with the GWO and GA optimization algorithms. The minimum mean square error over the iterations is shown in Figure 7, and the parameter settings of GWO and GA are listed in Tables 7 and 8.

In Table 7, SA is the number of wolves, Maxt is the maximum number of iterations, dim is the number of optimization parameters, lb is a lower bound on the optimization parameters, and ub is an upper bound on the optimization parameters.

In Table 8, Nind is the population size; Maxgen is the maximum number of generations; ggap is the generation gap; px and pm are the crossover probability and mutation probability, respectively; and select is the selection function.

With the same population size and maximum number of iterations, SSA reaches its minimum mean square error after six iterations, whereas GWO needs about 12 iterations and GA about 15. SSA therefore converges faster than GWO and GA.

To compare the fitting degree of SSA-SVR, GWO-SVR, and GA-SVR more directly, the coefficient of determination of each model is calculated; the fitting results of the gas-concentration prediction are shown in Figure 8. For both single gases and gas mixtures, SSA-SVR fits the concentrations better than GWO-SVR and GA-SVR. The fitting degree of SSA-SVR exceeds 97% for single-gas concentration prediction, and it reaches 92.36% and 97.55% for methane and ethylene concentrations in the mixture, respectively.

4. Conclusions

Mixed-gas detection is of great importance in industrial production, where it helps ensure the green development and safe production of enterprises. This paper presents a mixed-gas-detection method based on SSA-SVM. Our conclusions are summarized as follows.

(1) In the mixed-gas identification experiment, SSA-SVM was combined with PCA for mixed-gas identification. PCA was adopted to reduce the original data from 16 dimensions to 4 dimensions to achieve dimensionality reduction of the high-dimensional data. The influence of redundant data on the SSA-SVM model was reduced. The classification accuracy of SSA-SVM on the test set was over 96%. Compared with the RF, ELM, and BP neural network models, the classification accuracy of SSA-SVM was improved by 3.1%, 8.2%, and 14.7%, respectively.

(2) In the gas-concentration-prediction experiment, the prediction fit and convergence speed of SSA-SVR were better than those of GWO-SVR and GA-SVR. For single-gas concentration prediction, the fitting degree of SSA-SVR exceeded 97%, reaching 99.34% for methane. For methane and ethylene in the mixture, the fitting degrees reached 92.36% and 97.55%, respectively.

(3) Long-term use inevitably causes sensor drift, and the resulting data distortion can affect data analysis. Future work will focus on reducing the effect of drift on the system.

Author Contributions

Conceptualization, H.Z. and Y.H.; methodology, H.Z. and Y.H.; software, H.Z.; validation, H.Z. and Y.H.; formal analysis, H.Z.; investigation, H.Z. and Y.H.; resources, H.Z.; data curation, H.Z.; writing—original draft preparation, H.Z.; writing—review and editing, Y.H. and H.Z.; visualization, H.Z. and Y.H.; supervision, Y.H. and H.Z.; project administration, Y.H. and H.Z.; funding acquisition, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (61803230); Projects of Shandong Province College Youth Innovation Technology Support Program (2019KJN023); the Graduate’s Scientific Research Foundation of Shandong Jiaotong University (2022YK060).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analyzed in this study. These data can be found here: https://archive-beta.ics.uci.edu (accessed on 30 September 2022).

Acknowledgments

The authors would like to thank the anonymous reviewers and the editors for their helpful comments, and the UCI Machine Learning Repository for providing the dataset.

Conflicts of Interest

The authors declare no conflict of interest.

References

Ali, M.M.; Hashim, N.; Abd Aziz, S.; Lasekan, O. Principles and recent advances in electronic nose for quality inspection of agricultural and food products. Trends Food Sci. Technol. 2020, 99, 1–10.
Sharma, A.; Singh, S.P.; Solanki, V.; Sethuramalingam, S.; Singh, S.P. SVM-based compliance discrepancies detection using remote sensing for organic farms. Arab. J. Geosci. 2021, 14, 1–10.
Arroyo, P.; Herrero, J.L.; Suárez, J.I.; Lozano, J. Wireless sensor network combined with cloud computing for air quality monitoring. Sensors 2019, 19, 691.
Balasubramaniam, V. Artificial intelligence algorithm with SVM classification using dermascopic images for melanoma diagnosis. J. Artif. Intell. Capsul. Netw. 2021, 3, 34–42.
Djeziri, M.A.; Djedidi, O.; Morati, N.; Seguin, J.L.; Bendahan, M.; Contaret, T. A temporal-based SVM approach for the detection and identification of pollutant gases in a gas mixture. Appl. Intell. 2022, 52, 6065–6078.
Singh, G.; Virpal; Singh, R.C. Highly sensitive gas sensor based on Er-doped SnO2 nanostructures and its temperature dependent selectivity towards hydrogen and ethanol. Sensors Actuators B Chem. 2019, 282, 373–383.
Motaghedifard, M.H.; Pourmortazavi, S.M.; Mirsadeghi, S. Selective and sensitive detection of Cr (VI) pollution in waste water via polyaniline/sulfated zirconium dioxide/multi walled carbon nanotubes nanocomposite based electrochemical sensor. Sensors Actuators B Chem. 2021, 327, 128882.
Zhou, X.; Cheng, X.; Zhu, Y.; Elzatahry, A.A.; Alghamdi, A.; Deng, Y.; Zhao, D. Ordered porous metal oxide semiconductors for gas sensing. Chin. Chem. Lett. 2018, 29, 405–416.
Sarker, I.H. Machine learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021, 2, 1–21.
Benos, L.; Tagarakis, A.C.; Dolias, G.; Berruto, R.; Kateris, D.; Bochtis, D. Machine learning in agriculture: A comprehensive updated review. Sensors 2021, 21, 3758.
Zhang, L.; Wen, J.; Li, Y.; Chen, J.; Ye, Y.; Fu, Y.; Livingood, W. A review of machine learning in building load prediction. Appl. Energy 2021, 285, 116452.
Peng, P.; Zhao, X.; Pan, X.; Ye, W. Gas classification using deep convolutional neural networks. Sensors 2018, 18, 157.
Zhang, J.; Xue, Y.; Sun, Q.; Zhang, T.; Chen, Y.; Yu, W.; Xiong, Y.; Wei, X.; Yu, G.; Wan, H.; et al. A miniaturized electronic nose with artificial neural network for anti-interference detection of mixed indoor hazardous gases. Sensors Actuators B Chem. 2021, 326, 128822.
Wei, G.; Zhao, J.; Yu, Z.; Feng, Y.; Li, G.; Sun, X. An effective gas sensor array optimization method based on random forest. In Proceedings of the 2018 IEEE SENSORS, New Delhi, India, 28–31 October 2018; pp. 1–4.
Xu, Y.; Zhao, X.; Chen, Y.; Zhao, W. Research on a mixed gas recognition and concentration detection algorithm based on a metal oxide semiconductor olfactory system sensor array. Sensors 2018, 18, 3264.
Boateng, E.Y.; Otoo, J.; Abaye, D.A. Basic tenets of classification algorithms K-nearest-neighbor, support vector machine, random forest and neural network: A review. J. Data Anal. Inf. Process. 2020, 8, 341–357.
Chandra, M.A.; Bedi, S. Survey on SVM and their application in image classification. Int. J. Inf. Technol. 2021, 13, 1–11.
Hu, L.; Cui, J. Digital image recognition based on Fractional-order-PCA-SVM coupling algorithm. Measurement 2019, 145, 150–159.
Goudjil, M.; Koudil, M.; Bedda, M.; Ghoggali, N. A novel active learning method using SVM for text classification. Int. J. Autom. Comput. 2018, 15, 290–298.
Saari, J.; Strömbergsson, D.; Lundberg, J.; Thomson, A. Detection and identification of windmill bearing faults using a one-class support vector machine (SVM). Measurement 2019, 137, 287–301.
Zidi, S.; Moulahi, T.; Alaya, B. Fault detection in wireless sensor networks through SVM classifier. IEEE Sensors J. 2017, 18, 340–347.
Laref, R.; Losson, E.; Sava, A.; Adjallah, K.; Siadat, M. A comparison between SVM and PLS for E-nose based gas concentration monitoring. In Proceedings of the 2018 IEEE International Conference on Industrial Technology (ICIT), Lyon, France, 19–22 February 2018; pp. 1335–1339.
Zhao, L.; Li, X.; Wang, J.; Yao, P.; Akbar, S.A. Detection of formaldehyde in mixed VOCs gases using sensor array with neural networks. IEEE Sensors J. 2016, 16, 6081–6086.
Zhang, J.; Xue, Y.; Zhang, T.; Chen, Y.; Wei, X.; Wan, H.; Wang, P. Detection of Hazardous Gas Mixtures in the Smart Kitchen Using an Electronic Nose with Support Vector Machine. J. Electrochem. Soc. 2020, 167, 147519.
Laref, R.; Losson, E.; Sava, A.; Siadat, M. On the optimization of the support vector machine regression hyperparameters setting for gas sensors array applications. Chemom. Intell. Lab. Syst. 2019, 184, 22–27.
Tao, Z.; Huiling, L.; Wenwen, W.; Xia, Y. GA-SVM based feature selection and parameter optimization in hospitalization expense modeling. Appl. Soft Comput. 2019, 75, 323–332.
Huang, S.; Zheng, X.; Ma, L.; Wang, H.; Huang, Q.; Leng, G.; Meng, E.; Guo, Y. Quantitative contribution of climate change and human activities to vegetation cover variations based on GA-SVM model. J. Hydrol. 2020, 584, 124687.
Cuong-Le, T.; Nghia-Nguyen, T.; Khatir, S.; Trong-Nguyen, P.; Mirjalili, S.; Nguyen, K.D. An efficient approach for damage identification based on improved machine learning using PSO-SVM. Eng. Comput. 2022, 38, 3069–3084.
Zhang, L.; Shi, B.; Zhu, H.; Yu, X.B.; Han, H.; Fan, X. PSO-SVM-based deep displacement prediction of Majiagou landslide considering the deformation hysteresis effect. Landslides 2021, 18, 179–193.
Pan, M.; Li, C.; Gao, R.; Huang, Y.; You, H.; Gu, T.; Qin, F. Photovoltaic power forecasting based on a support vector machine with improved ant colony optimization. J. Clean. Prod. 2020, 277, 123948.
Liao, X.; Zhou, G.; Zhang, Z.; Lu, J.; Ma, J. Tool wear state recognition based on GWO–SVM with feature selection of genetic algorithm. Int. J. Adv. Manuf. Technol. 2019, 104, 1051–1063.
Fan, S.; Li, Z.; Xia, K.; Hao, D. Quantitative and qualitative analysis of multicomponent gas using sensor array. Sensors 2019, 19, 3917.
Deng, J.; Chen, W.L.; Liang, C.; Wang, W.F.; Xiao, Y.; Wang, C.P.; Shu, C.M. Correction model for CO detection in the coal combustion loss process in mines based on GWO-SVM. J. Loss Prev. Process Ind. 2021, 71, 104439.
Li, D.; Peng, S.; Du, W.; Guo, Y. New method for predicting coal seam gas content. Energy Sources Part A Recover. Util. Environ. Eff. 2019, 41, 1272–1284.
Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34.
Wang, H.; Xianyu, J. Optimal configuration of distributed generation based on sparrow search algorithm. In Proceedings of the IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2021; Volume 647, p. 012053.
Song, C.; Yao, L.; Hua, C.; Ni, Q. Comprehensive water quality evaluation based on kernel extreme learning machine optimized with the sparrow search algorithm in Luoyang River Basin, China. Environ. Earth Sci. 2021, 80, 1–10.
Liu, G.; Shu, C.; Liang, Z.; Peng, B.; Cheng, L. A modified sparrow search algorithm with application in 3d route planning for UAV. Sensors 2021, 21, 1224.
Wu, D.; Yuan, C. Threshold image segmentation based on improved sparrow search algorithm. Multimed. Tools Appl. 2022, 81, 1–34.
Zhang, J.; Xia, K.; He, Z.; Yin, Z.; Wang, S. Semi-supervised ensemble classifier with improved sparrow search algorithm and its application in pulmonary nodule detection. Math. Probl. Eng. 2021, 2021, 18.
Fonollosa, J.; Sheik, S.; Huerta, R.; Marco, S. Reservoir computing compensates slow response of chemosensor arrays exposed to fast varying gas concentrations in continuous monitoring. Sensors Actuators B Chem. 2015, 215, 618–629.

Figure 1. The SSA-SVM mixed-gas-identification method.

Figure 2. The SSA-SVR mixed-gas-concentration-prediction method.

Figure 3. Mixed-gas-identification confusion matrix. ‘1’ is methane, ‘2’ is ethylene, ‘3’ is air, and ‘4’ is mixed gas.

Figure 4. Identification results of the four mixed-gas-identification methods. (a) The SSA-SVM gas-identification results; (b) the RF gas-identification results; (c) the ELM gas-identification results; and (d) the BP neural network gas-identification results.

Figure 5. Gas identification accuracy of the four mixed-gas-identification methods.

Figure 6. The SSA-SVR gas-concentration prediction results. (a) Predicted methane concentration for the single gas; (b) predicted ethylene concentration for the single gas; (c) predicted methane concentration in the gas mixture; and (d) predicted ethylene concentration in the gas mixture.

Figure 7. The mean square error of the three optimization algorithms.

Figure 8. Gas-concentration-prediction fits.

Table 1. Eigenvalues and contribution rates of the 16 principal components.

Principal Component	Eigenvalue	Contribution Rate (%)	Cumulative Contribution Rate (%)
PC1	10.650	66.562	66.562
PC2	3.779	23.620	90.182
PC3	1.067	6.669	96.851
PC4	0.413	2.585	99.436
PC5	0.039	0.246	99.682
PC6	0.036	0.224	99.906
PC7	0.007	0.049	99.955
PC8	0.003	0.017	99.972
PC9	0.002	0.012	99.984
PC10	0.001	0.006	99.990
PC11	0	0.006	99.996
PC12	0	0.002	99.998
PC13	0	0.001	99.999
PC14	0	0.001	100.000
PC15	0	0	100.000
PC16	0	0	100.000
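The contribution rates in Table 1 appear to equal each eigenvalue divided by 16, the trace of the correlation matrix of 16 standardized features (for example, 10.650/16 ≈ 66.56%). Under that assumption, the sketch below recomputes the rates from the tabulated eigenvalues and finds the smallest number of leading components whose cumulative rate reaches a cutoff. The 99% cutoff and the helper names are hypothetical: this cutoff happens to select the four components retained in this work, whereas a 95% cutoff would keep only three. Small differences from Table 1 (for example, 23.619% versus 23.620% for PC2) come from the rounding of the listed eigenvalues.

```python
# Eigenvalues of PC1-PC10 copied from Table 1; PC11-PC16 are listed as 0.
EIGENVALUES = [10.650, 3.779, 1.067, 0.413, 0.039, 0.036, 0.007, 0.003, 0.002, 0.001]
N_FEATURES = 16  # assumed: eigenvalues of a 16x16 correlation matrix sum to 16


def contribution_rates(eigenvalues, total):
    """Return per-component and cumulative contribution rates in percent."""
    rates = [100.0 * lam / total for lam in eigenvalues]
    cumulative, running = [], 0.0
    for rate in rates:
        running += rate
        cumulative.append(running)
    return rates, cumulative


def n_components_for(cumulative, threshold):
    """Smallest number of leading components whose cumulative rate reaches threshold (%)."""
    for k, value in enumerate(cumulative, start=1):
        if value >= threshold:
            return k
    return len(cumulative)


rates, cumulative = contribution_rates(EIGENVALUES, N_FEATURES)
k = n_components_for(cumulative, threshold=99.0)  # hypothetical cutoff
print(k, f"{cumulative[k - 1]:.2f}")  # 4 99.43
```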

Table 2. The SSA-related parameters in SSA-SVM.

Parameter	d	lb	ub	p	MaxT	ST	FD	JD
Value	2	$[2^{- 2}, 2^{- 5}]$	$[2^{8}, 2^{5}]$	50	100	0.6	90%	10%
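Table 2 lists the SSA settings used to tune the two SVM hyperparameters (d = 2), with a population of p = 50 sparrows, MaxT = 100 iterations, and ST read here as the safety threshold. The following is a minimal pure-Python sketch of the canonical SSA update rules (producers, scroungers, and scouts) described by Xue and Shen; it is an illustration, not the authors' implementation. How FD and JD map onto the producer and scout proportions is not stated in this section, so the sketch exposes them as the hypothetical parameters `producer_frac` and `scout_frac` with illustrative defaults, and `ssa_minimize` is likewise a hypothetical name.

```python
import math
import random


def ssa_minimize(f, lb, ub, pop=50, max_iter=100, st=0.6,
                 producer_frac=0.2, scout_frac=0.1, seed=0):
    """Minimal sparrow search algorithm minimizing f over box bounds.
    producer_frac and scout_frac are illustrative defaults (assumptions)."""
    rng = random.Random(seed)
    dim = len(lb)

    def clip(x):
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lb, ub)]

    X = [[rng.uniform(lo, hi) for lo, hi in zip(lb, ub)] for _ in range(pop)]
    fit = [f(x) for x in X]
    n_prod = max(1, int(producer_frac * pop))
    n_scout = max(1, int(scout_frac * pop))
    best_f = min(fit)
    best_x = X[fit.index(best_f)][:]

    for _ in range(max_iter):
        order = sorted(range(pop), key=lambda i: fit[i])  # best first
        X = [X[i] for i in order]
        worst = X[-1][:]
        r2 = rng.random()  # alarm value
        # Producers: contract when safe (r2 < st), random walk when alarmed.
        for i in range(n_prod):
            if r2 < st:
                alpha = 1.0 - rng.random()  # in (0, 1]
                X[i] = [v * math.exp(-(i + 1) / (alpha * max_iter)) for v in X[i]]
            else:
                X[i] = [v + rng.gauss(0.0, 1.0) for v in X[i]]
            X[i] = clip(X[i])
        prod_fit = [f(x) for x in X[:n_prod]]
        xp = X[prod_fit.index(min(prod_fit))][:]  # best producer position
        # Scroungers: the hungriest fly elsewhere; the rest follow the best producer.
        for i in range(n_prod, pop):
            if i + 1 > pop / 2:
                X[i] = [rng.gauss(0.0, 1.0) * math.exp((w - v) / (i + 1) ** 2)
                        for v, w in zip(X[i], worst)]
            else:
                signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
                step = sum(abs(v - p) * s for v, p, s in zip(X[i], xp, signs)) / dim
                X[i] = [p + step for p in xp]
            X[i] = clip(X[i])
        fit = [f(x) for x in X]
        # Scouts: a random subset reacts to danger, using this iteration's best/worst.
        g, w = fit.index(min(fit)), fit.index(max(fit))
        xg, fg, xw, fw = X[g][:], fit[g], X[w][:], fit[w]
        for i in rng.sample(range(pop), n_scout):
            if fit[i] > fg:
                X[i] = [b + rng.gauss(0.0, 1.0) * abs(v - b) for v, b in zip(X[i], xg)]
            else:
                k = rng.uniform(-1.0, 1.0)
                X[i] = [v + k * abs(v - u) / (fit[i] - fw + 1e-50) for v, u in zip(X[i], xw)]
            X[i] = clip(X[i])
            fit[i] = f(X[i])
        if min(fit) < best_f:
            best_f = min(fit)
            best_x = X[fit.index(best_f)][:]
    return best_x, best_f
```

For SSA-SVM, `f` would map a candidate hyperparameter pair inside the Table 2 bounds, read as lb = (2^-2, 2^-5) and ub = (2^8, 2^5), to a validation error; that objective is not shown here.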

Table 3. Related parameters of the comparative test.

Method	Related Parameters
RF	trees = 500, mtry = 4
ELM	the number of hidden layer neurons is 100
BP-NN	three layers; 10 neurons in the hidden layer; sigmoid activation function

Table 4. Number of samples in each feature subset.

Data	CH4 (Single Gas)	C2H4 (Single Gas)	CH4 (Mixed Gas)	C2H4 (Mixed Gas)
Train	9010	8589	7637	7637
Test	1002	955	849	849
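The sample counts in Table 4 are consistent with a roughly 90%/10% split between training and test data for every subset. The short check below, using a hypothetical helper `train_fraction`, computes that ratio from the tabulated counts; the split ratio itself is inferred from the numbers, not stated in this section.

```python
# (train, test) sample counts copied from Table 4.
COUNTS = {
    "single CH4": (9010, 1002),
    "single C2H4": (8589, 955),
    "mixed CH4": (7637, 849),
    "mixed C2H4": (7637, 849),
}


def train_fraction(train, test):
    """Fraction of the samples assigned to the training set."""
    return train / (train + test)


for name, (train, test) in COUNTS.items():
    print(f"{name}: {train + test} samples, {train_fraction(train, test):.1%} train")
```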

Table 5. SSA-related parameters in SSA-SVR.

Parameter	d	lb	ub	p	MaxT	ST	FD	JD
Value	2	$[2^{- 5}, 2^{- 5}]$	$[2^{10}, 2^{10}]$	50	100	0.8	40%	60%

Table 6. The best parameter combination.

Parameter	CH4 (Single Gas)	C2H4 (Single Gas)	CH4 (Mixed Gas)	C2H4 (Mixed Gas)
q	66.6915	5.9072	14.8780	11.9311
t	7.8229	66.3515	13.8491	32.6786

Table 7. The parameters related to the GWO optimization algorithm.

Parameter	SA	Maxt	dim	lb	ub
Value	50	100	2	$[2^{- 5}, 2^{- 5}]$	$[2^{10}, 2^{10}]$
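Table 7 configures the grey wolf optimizer (GWO) baseline with SA = 50 search agents, Maxt = 100 iterations, and a two-dimensional search space. Below is a minimal pure-Python sketch of the standard GWO update, in which each wolf moves to the average of three positions guided by the best three wolves (alpha, beta, and delta) while the coefficient a decreases linearly from 2 toward 0. The function `gwo_minimize` is hypothetical and is not the code used in this study.

```python
import random


def gwo_minimize(f, lb, ub, n_agents=50, max_iter=100, seed=0):
    """Minimal grey wolf optimizer minimizing f over box bounds."""
    rng = random.Random(seed)
    dim = len(lb)

    def clip(x):
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lb, ub)]

    wolves = [[rng.uniform(lo, hi) for lo, hi in zip(lb, ub)] for _ in range(n_agents)]
    best_x, best_f = None, float("inf")
    for t in range(max_iter):
        scored = sorted(((f(x), x) for x in wolves), key=lambda s: s[0])
        if scored[0][0] < best_f:
            best_f, best_x = scored[0][0], scored[0][1][:]
        leaders = [x for _, x in scored[:3]]  # alpha, beta, delta wolves
        a = 2.0 - 2.0 * t / max_iter  # decreases linearly from 2 toward 0
        new_wolves = []
        for x in wolves:
            pos = []
            for j in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - x[j])
                    total += leader[j] - A * D
                pos.append(total / 3.0)  # average of the three guided moves
            new_wolves.append(clip(pos))
        wolves = new_wolves
    for x in wolves:  # also score the final positions
        fx = f(x)
        if fx < best_f:
            best_f, best_x = fx, x[:]
    return best_x, best_f
```

For the SVR comparison, the bounds in Table 7 would be passed as `lb = [2**-5, 2**-5]` and `ub = [2**10, 2**10]`, together with an objective that returns the validation MSE of a candidate parameter pair.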

Table 8. The parameters related to the GA optimization algorithm.

Parameter	Nind	Maxgen	ggap	px	pm	select
Value	50	100	0.9	0.7	0.1	rws
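Reading Nind in Table 8 as the population size, Maxgen as the number of generations, ggap as the generation gap, px and pm as the crossover and mutation probabilities, and rws as roulette-wheel selection (these readings are assumptions), the sketch below shows fitness-proportional selection. Because the GA minimizes MSE, the sketch converts each MSE to a non-negative fitness with 1/(1 + MSE), which is an illustrative choice; `roulette_wheel_select` is a hypothetical helper.

```python
import random


def roulette_wheel_select(fitness, n_select, rng):
    """Pick n_select indices with probability proportional to fitness.
    Fitness values must be non-negative with a positive sum."""
    total = sum(fitness)
    if total <= 0 or any(v < 0 for v in fitness):
        raise ValueError("fitness values must be non-negative with a positive sum")
    cumulative, running = [], 0.0
    for v in fitness:
        running += v
        cumulative.append(running)
    last_positive = max(i for i, v in enumerate(fitness) if v > 0)
    chosen = []
    for _ in range(n_select):
        r = rng.random() * total  # spin the wheel
        for i, c in enumerate(cumulative):
            if r < c:
                chosen.append(i)
                break
        else:
            chosen.append(last_positive)  # guard against float round-off
    return chosen


rng = random.Random(0)
mse = [0.02, 0.05, 0.10, 0.40]            # invented MSE values, not from the paper
fitness = [1.0 / (1.0 + e) for e in mse]  # assumed MSE-to-fitness conversion
# Assumed: with ggap = 0.9 and Nind = 50, 45 individuals are selected per generation.
parents = roulette_wheel_select(fitness, n_select=round(0.9 * 50), rng=rng)
```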

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, H.; Han, Y. A New Mixed-Gas-Detection Method Based on a Support Vector Machine Optimized by a Sparrow Search Algorithm. Sensors 2022, 22, 8977. https://doi.org/10.3390/s22228977
