Combustible Gas Classification Modeling using Support Vector Machine and Pairing Plot Scheme

Combustible gases, such as CH4 and CO, directly or indirectly affect the human body. Thus, leakage detection of combustible gases is essential at industrial sites and in daily life. Many types of gas sensors are used to identify these gases, but because gas sensors generally have low selectivity among gases, coupling issues often arise that adversely affect detection accuracy. To solve this problem, we built a decoupling algorithm for different gas sensors using machine learning. Commercially available semiconductor sensors were employed to detect CH4 and CO, and a support vector machine (SVM) was then applied as a supervised learning algorithm for gas classification. We also introduced a pairing plot scheme to classify gas type more effectively. The proposed model classified CH4 and CO gases 100% correctly at all levels above the minimum concentration the gas sensors could detect. Consequently, SVM with a pairing plot is a memory-efficient and promising method for more accurate gas classification.


Introduction
The need to detect various gases in industrial and public areas has been continuously increasing as environment, health, and safety issues arise. Combustible gas detection is the most important due to the risk of fire and explosion [1][2][3][4]. Various gas sensor types have been used to detect combustible gases in the atmosphere, including electrochemical, semiconductor, photoelectric, and MEMS sensors [5,6]. Semiconductor gas sensors offer many advantages, including low cost, small size, a wide range of detectable gases, fast response time, and high sensitivity to combustible gases. However, high and broad sensitivity leads to relatively low selectivity and, consequently, to coupling problems in which a sensor also responds to gases other than its target (cross-response). Sensor response can be greatly degraded by coupling. Considerable research and development efforts have focused on physical parameters, such as materials, sensor structure, and sensor driving conditions, to solve this problem, but these approaches have not yet reached a level suitable for commercialization. Therefore, a gas classification algorithm that compensates for the coupling problem may be a more viable solution. Consequently, many studies have considered gas classification models incorporating various machine learning methods [7][8][9][10][11][12][13].
In this study, we constructed a decoupling algorithm with two different SnO2 semiconductor gas sensors based on support vector machine (SVM) to classify CH4 and CO as representative combustible gases. We also introduced a new pairing plot scheme in the gas classification algorithm to obtain gas detection signal behavior patterns that could be classified into two classes by SVM. An experimental calibrating gas environment was set up and gas sensing experiments were conducted under specific gas injection conditions. After data acquisition, first data selection (FDS) was applied to include only meaningful data in the classification model, and then behavior patterns for each gas were analyzed using pairing plots. Gas sensor responses showed distinguishable patterns. Subsequently, second data selection (SDS) was performed to reduce computational costs. Finally, we built a gas classification model based on non-linear SVM and verified reliability for the final model using a confusion matrix.

Materials and Methods
We designed the experimental setup to provide a controllable gas environment, as shown in Figure 1a. The setup included a gas chamber connected to gas cylinders, data acquisition equipment (DAQ) for gas sensor control and measurement, a digital multimeter (DMM) to verify electrical signals in the circuit, a source measure unit (SMU) to supply a specific voltage to the circuit, a mass flow controller (MFC) for accurate CH4 and CO flow control, and a computer to control these components and run the gas classification algorithms.
We employed commercially available MQ4 and MQ7 sensors [14,15] (Zhengzhou Winsen Electronics Technology Corporation, Zhengzhou, China) to detect CH4 and CO, respectively, as shown in Figure 1a. These are SnO2-based n-type semiconductor sensors [16][17][18][19] that operate through reactions with combustible gases at the SnO2 surface. When the sensor is heated, oxygen is actively adsorbed on the surface and takes electrons from the SnO2, forming an electron depletion region beneath the surface. When CH4 and CO gases are present around the SnO2 with sufficient energy, they react with the adsorbed oxygen atoms, releasing electrons back to the SnO2 and, hence, reducing the sensor resistance.
Therefore, a load resistor is required in the data collection circuit, and the voltage drop across the load resistor increases as the sensor resistance decreases due to the gas interaction. Thus, we collected load resistor voltages as gas detection signals. The load resistance was set to 10 kΩ to obtain high sensing resolution, which is determined by Equations (1) and (2), where V_r is the output signal range, V_CC is the operating voltage, R_L is the load resistance, R_s_max is the maximum sensor resistance, and R_s_min is the minimum sensor resistance.
Next, the maximum V_r in Equation (2) can be obtained by finding only the minimum value of R_s_max × R_s_min / R_L + R_L, because V_CC, R_s_min, and R_s_max are constant.
As shown in Equations (3) and (4), the minimum value of R_s_max × R_s_min / R_L + R_L is calculated using the arithmetic-geometric mean inequality, and R_L is decided accordingly.
Moreover, the operating temperature also affects the sensors' performance [20][21][22]. Practically, the best temperatures for CH4 and CO to be adsorbed on the SnO2 surface are very different, presenting less cross-selectivity with respect to other gases. However, even if the sensors operate at the best operating temperature for each gas, the cross-selectivity issue still cannot be fully ignored. This study employed the selectivity differences between the two types of sensor for decoupling. Thus, the operating temperature of each sensor should be held constant, which was achieved by fixing the operating voltage at 5 V (V_CC) and the heating coil resistance of each gas sensor. Figure 1b shows that the sensor circuit comprised three MQ4 and three MQ7 sensors in a cross arrangement for effective gas detection.
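The load-resistance argument can be written out explicitly. The block below is a reconstruction under stated assumptions, not the original Equations (1) to (4): it assumes that the sensor and the load resistor form a simple voltage divider across V_CC and that the output is the voltage across R_L. The symbol V_out is introduced here for illustration only. The result is consistent with the term R_s_max × R_s_min / R_L + R_L discussed in the text.

```latex
\begin{align}
V_{\mathrm{out}}(R_s) &= V_{CC}\,\frac{R_L}{R_s + R_L} \\
V_r &= V_{\mathrm{out}}(R_{s\_\min}) - V_{\mathrm{out}}(R_{s\_\max})
     = \frac{V_{CC}\,\bigl(R_{s\_\max} - R_{s\_\min}\bigr)}
            {\dfrac{R_{s\_\max}\,R_{s\_\min}}{R_L} + R_L + R_{s\_\max} + R_{s\_\min}} \\
\frac{R_{s\_\max}\,R_{s\_\min}}{R_L} + R_L &\ge 2\sqrt{R_{s\_\max}\,R_{s\_\min}} \\
R_L &= \sqrt{R_{s\_\max}\,R_{s\_\min}} \quad \text{(equality case, which maximizes } V_r\text{)}
\end{align}
```

Under this reading, the chosen load resistance would correspond to the geometric mean of the maximum and minimum sensor resistances.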
In this study, we conducted the gas detection experiments in a single-gas environment. More specifically, we assumed a situation in which the gas present must be identified as either CH4 or CO using multiple semiconductor gas sensors. Gas injection experiments commenced with an aging time to heat and, hence, stabilize the sensors. After sufficient aging time, gas was injected at a specific flow rate (in standard cubic centimeters per minute, sccm) for 20 s. Injection then stopped and a 5 min reaction time was allowed to ensure the gas sensors fully reacted. The same cycle was repeated with increasing gas levels until the target gas concentration was attained. The experiments were carried out under ambient atmosphere, i.e., air, for both CH4 and CO gas detection, since metal oxide semiconductor sensors are not operational without oxygen. N2 gas was employed only to purge and remove CH4 and CO gases remaining in the gas lines. After each gas experiment, we evacuated the CH4 or CO gas inside the chamber to initialize the experimental environment. Table 1 shows the gas injection conditions for the CH4 and CO gas detection experiments.
The experiments were carried out for each gas over a concentration range that could be fatal to humans, assuming that CH4 or CO leaked at actual industrial sites or public places. Thus, the target gas concentrations were different, because the hazardous concentration of CH4 for humans is higher than that of CO according to the dangerous concentration criteria of the Korea Gas Safety Corporation and Korea Environment Corporation.
We designed the SVM for classifying CH4 and CO gases using MATLAB®. In general, SVM is a machine learning method for classifying two or more data classes [23][24][25][26][27]. This study built a classification model with non-linear SVM to classify curved behavioral patterns. In the SVM algorithm, a kernel function helps to model a non-linear hyperplane with reduced computational cost. Thus, we employed a Gaussian RBF kernel function, which is one of the most commonly used high-performance kernels and can be expressed as K(X_i, Y_i) = exp(-γ ||X_i - Y_i||^2), where X_i and Y_i are data set vectors corresponding to CH4 and CO gas, respectively, and γ is a parameter controlling the deviation of the Gaussian function [28,29]. After gas classification modeling, we verified the classification model's reliability using a confusion matrix with test data sets extracted from a distinct gas detection experiment [30][31][32]. A confusion matrix is a method for visualizing classification model performance and reliability. The reliability verification using the confusion matrix proceeded with new data sets that did not belong to the training data. The confusion matrix visualizes the matches between the predicted class and the true class. We also used several dummy data sets to double-check classification model reliability.
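The behavior of the Gaussian RBF kernel can be illustrated numerically. The following is a minimal sketch in Python rather than the MATLAB implementation used in the study; it assumes the common form exp(-γ ||x - y||^2), and the function name and the sample voltages are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical (MQ4, MQ7) load-resistor voltages in volts; not measured data.
p = (1.20, 0.80)
q = (1.25, 0.90)
print(rbf_kernel(p, p, gamma=1.0))  # identical points give the maximum value 1.0
print(rbf_kernel(p, q, gamma=1.0))  # the value decays as the squared distance grows
```

A larger γ makes the kernel value fall off faster with distance, which tends to produce a more tightly curved decision boundary.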

Results
As shown in Figure 2, the overall procedure of the gas classification consisted of gas sensing experiments, gas classification modeling, and two-step verifications. Moreover, gas classification modeling involves four steps: first data selection (FDS), pairing plot scheme, second data selection (SDS), and SVM.
Figure 3 shows the output voltages of the load resistor resulting from the MQ4 and MQ7 sensor reactions with respect to gas concentration. These output voltages were logged every 2 s by the DAQ. Therefore, we could confirm the response of each gas sensor by observing the voltage changes across the load resistor. Although the MQ4 sensor is intended to detect CH4 gas, it also reacted to CO gas, and a similar issue arose for the MQ7 sensor. Thus, both sensors exhibit low selectivity and, hence, coupling problems for gas signals. Therefore, it was not possible to clearly identify CH4 or CO gas levels from either sensor alone. Even using both gas sensors, it was difficult to classify gas type from the output voltages alone. Therefore, we proposed SVM with a pairing plot method.

Pairing Plot Scheme for Support Vector Machine
There were concentration ranges where the sensing signals were indistinguishable (Figure 3, blue marked area) due to the gas sensors' physical limitations. Thus, we needed to select meaningful data before pairing the data, i.e., FDS. It was necessary to avoid confusing the initial response of the sensors with the noise voltages under ambient atmosphere. To extract meaningful data for machine learning, the SnO2 gas sensor signals should be distinguishable from the initial detection value (V_initial) at which the sensors start detecting gases. The noise voltage difference (V_noise.diff) is the difference between the maximum and minimum noise voltage values before gas injection. V_initial should be at least two times larger than V_noise.diff. Based on these criteria, we specified the indistinguishable sensing value ranges shown in Figure 3 (blue marked area). We set the FDS criteria based on the V_initial for each gas sensor, defined by the ambient atmosphere voltage (V_ambient) and V_noise.diff. In short, V_initial was determined by the minimum detectable voltage (V_det), expressed in terms of V_ambient, the average output voltage in ambient atmosphere, and V_noise.diff, the difference between the maximum and minimum noise voltage in ambient atmosphere. Only output voltages above V_initial were selected for the pairing plot. The selected data were plotted in the form (MQ4, MQ7), considering all possible pairing cases in each gas detection experiment. For example, since there were three MQ4 and three MQ7 sensors, nine (MQ4, MQ7) pairing cases were extracted from each experiment. Figure 4a shows the pairing plots for the gas detection experiment after FDS. CH4 and CO had distinguishable behavior patterns that enabled them to be clearly classified.
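The FDS and pairing steps can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code. It assumes that the threshold takes the form V_ambient + 2 × V_noise.diff, which is only one reading of the criteria described above, and that a paired sample is kept only when both voltages exceed their thresholds. The function names and all voltages are hypothetical.

```python
from itertools import product

def initial_threshold(ambient_samples, margin=2.0):
    """Assumed V_initial: mean ambient voltage plus margin times the noise spread."""
    v_ambient = sum(ambient_samples) / len(ambient_samples)
    v_noise_diff = max(ambient_samples) - min(ambient_samples)
    return v_ambient + margin * v_noise_diff

def first_data_selection(mq4_traces, mq7_traces, mq4_thresholds, mq7_thresholds):
    """Pair every MQ4 trace with every MQ7 trace; keep samples where both exceed V_initial."""
    pairs = []
    for (i, v4), (j, v7) in product(enumerate(mq4_traces), enumerate(mq7_traces)):
        for a, b in zip(v4, v7):
            if a > mq4_thresholds[i] and b > mq7_thresholds[j]:
                pairs.append((a, b))
    return pairs

# Hypothetical voltages (V), two time samples per sensor; not measured data.
mq4_traces = [[0.50, 0.90], [0.60, 1.00], [0.40, 1.10]]
mq7_traces = [[0.50, 0.70], [0.50, 0.80], [0.30, 0.90]]
threshold = initial_threshold([0.50, 0.52, 0.51, 0.49])
pairs = first_data_selection(mq4_traces, mq7_traces, [threshold] * 3, [threshold] * 3)
print(len(pairs))  # 9: one surviving sample for each of the 3 x 3 sensor pairings
```

In this toy data the first time sample is rejected for every pairing because the MQ7 voltages are still within the noise band, so only the second sample contributes to the pairing plot.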
In SVM training, the hyperplane is determined using only the boundary data of each class. Thus, for the data sets selected by FDS, additional data selection was performed using the concentration in each injection cycle. This provided significant memory and computational efficiency for the learning process. The inset of Figure 4a shows that the number of data points at a particular gas concentration can be reduced to two through SDS by pairing the maximum MQ4 value with the corresponding MQ7 value and the maximum MQ7 value with the corresponding MQ4 value, i.e., (MQ4_max, MQ7) and (MQ4, MQ7_max), respectively, providing the pairing plot with the minimum number of essential data (Figure 4b). We subsequently applied non-linear SVM to these paired data sets.
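The SDS reduction described above can be sketched in a few lines. This is a hedged Python illustration with hypothetical names and data: the dictionary keys stand for concentration labels of the injection cycles, and collapsing the two selected points into one when they coincide is an assumption made here, not something stated in the text.

```python
def second_data_selection(pairs_by_cycle):
    """Keep only (MQ4_max, MQ7) and (MQ4, MQ7_max) for each injection cycle."""
    selected = {}
    for cycle, pairs in pairs_by_cycle.items():
        by_mq4 = max(pairs, key=lambda p: p[0])  # pair holding the largest MQ4 voltage
        by_mq7 = max(pairs, key=lambda p: p[1])  # pair holding the largest MQ7 voltage
        # Assumption: if both maxima fall on the same pair, keep it once.
        selected[cycle] = [by_mq4] if by_mq4 == by_mq7 else [by_mq4, by_mq7]
    return selected

# Hypothetical FDS output, keyed by an illustrative concentration label; not measured data.
pairs_by_cycle = {
    200: [(1.0, 0.8), (1.2, 0.7), (1.1, 0.9)],
    400: [(1.5, 1.0), (1.4, 1.0)],
}
selected = second_data_selection(pairs_by_cycle)
print(selected[200])  # [(1.2, 0.7), (1.1, 0.9)]
print(selected[400])  # [(1.5, 1.0)]
```

Because at most two points survive per cycle, the training set size grows with the number of injection cycles rather than with the number of logged samples.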

Gas Classification Model Using Non-Linear Support Vector Machine
Selected data sets were randomly divided into training and testing data sets at a 4:1 ratio. The output voltage was used as the feature, since all data sets included only output voltages in this study. K-fold cross-validation was used to avoid overfitting to the training data sets [33][34][35]. The most important SVM step is to find the hyperparameters defining the optimal hyperplane. We used the Gaussian RBF kernel method for the non-linear SVM to define the hyperplane and, hence, establish the classification model. Subsequent verification with the testing data sets confirmed 100% classification accuracy. Figure 5 visualizes the defined hyperplane, the support vectors, and all data sets.
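The data partitioning described above can be sketched as follows. This is a minimal Python illustration, not the MATLAB workflow used in the study. The 4:1 split follows the text, but the number of folds is not given in the source, so k = 5 is a hypothetical choice, as are the function names and the fixed random seed.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle the items and split them 4:1 (by default) into training and testing lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size

# Hypothetical sample identifiers standing in for paired (MQ4, MQ7) data points.
train, test = train_test_split(range(100))
print(len(train), len(test))  # 80 20
for train_idx, val_idx in k_fold_indices(len(train), k=5):
    print(len(train_idx), len(val_idx))  # 64 16 for each of the five folds
```

Each training sample is used for validation exactly once across the folds, so hyperparameters such as γ can be compared on data the model did not see during fitting.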

To verify classification model reliability, we extracted paired data sets from a new CH4 and CO gas detection experiment. The numbers of data sets for CH4 and CO were 102 and 126, respectively. Figure 6a shows the gas classification confusion matrix for the new paired data sets, confirming 100% classification accuracy for each gas. Moreover, we intentionally created 10 paired data sets with incorrect values for each gas to double-check the model's reliability. Figure 6b shows the confusion matrix after adding these dummy data sets. Consequently, the reliability of the non-linear SVM gas classification model was verified again by fully classifying all 20 incorrect paired data sets.
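The confusion matrix used for verification can be computed as sketched below. This Python snippet is illustrative only: the class counts (102 CH4 and 126 CO) are taken from the text, but the predicted labels are stand-ins that simply reproduce the reported 100% result and are not output from the trained model. The function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes and columns are predicted classes, in the order of `classes`."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Class sizes from the verification experiment; predictions are placeholders.
true_labels = ["CH4"] * 102 + ["CO"] * 126
predicted_labels = list(true_labels)  # stand-in for a model that classifies every pair correctly
matrix = confusion_matrix(true_labels, predicted_labels, classes=["CH4", "CO"])
print(matrix)            # [[102, 0], [0, 126]]
print(accuracy(matrix))  # 1.0
```

Any misclassified pair would appear as a non-zero off-diagonal entry, which is what makes the matrix a convenient check for each gas separately.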

Conclusions
Although combustible gas detection in industrial and public areas is essential, it is difficult to accurately identify gases due to the limited performance of semiconductor gas sensors. In particular, selectivity issues cause significant coupling problems among sensing signals, making accurate gas identification difficult. Thus, it is necessary to introduce an algorithmic approach to compensate for this issue. In this work, we proposed a classification algorithm based on support vector machine by introducing a pairing plot technique. Furthermore, we achieved a memory-efficient gas classification model using the data selection method. Model reliability was verified by classifying CH4 and CO gases with 100% accuracy in additional tests using a confusion matrix. Thus, the proposed method classified CH4 and CO gases simultaneously with 100% accuracy even in the presence of gas sensor selectivity issues. The proposed approach is not specific to semiconductor gas sensors and could also be applied to most or all other sensor types that have sensing signal coupling problems. Therefore, modeling with non-linear support vector machine and the pairing plot technique could be an effective way to identify gases.
