1. Introduction
Engineering innovation refers to solving social and industrial problems through innovative engineering technologies and approaches. With the rise of industrialization and the narrowing of the socio-economic gap between different strata of society, the use of chemicals has been on the rise. Assistive technology is the technological domain consisting of systems, built from software, hardware, or both, that are designed to enhance and maintain human capabilities in situations that require special attention. Solutions in assistive technology range from surveillance applications based on unmanned vehicles to healthcare applications such as automated wheelchairs and pose estimation. In this work, we propose an assistive technology solution for the highly relevant problem of gas detection and identification in domestic, industrial, and outdoor environments.
Industrial hazards can cause chemical and/or radioactive damage to the surrounding environment. With the rapid development of industrialization and automated chemical plants, gas leakage has become a common issue. Explosions, fires, spills, leaks, and waste emissions are some of the consequences of industrial accidents [1, 2]. Residential cooking and careless waste disposal generate unnecessary fumes and are significant causes of fume leakage. An article presented in the media revealed that burning wood, biomass, and dung led to 326,000 of the estimated 645,000 premature deaths from outdoor air pollution, which constitutes about 50% of the total deaths due to outdoor pollution [3]. Harmful gases such as Liquefied Petroleum Gas (LPG), Compressed Natural Gas (CNG), methane, propane, and other flammable and toxic gases may lead to accidents and, in some cases, disastrous consequences if they are not handled carefully. A gas leak occurs through an unintended crack, hole, or porosity in a joint or piece of machinery that is meant to contain or exclude fluids and gases, allowing the enclosed medium to escape. In any plant or industrial setup, a gas leak test is a quality control step that must be performed before a device is set up. As a precautionary measure, gas sensors are installed near leakage-prone equipment. However, these sensors are not able to detect a gas reliably in a mixed-gas environment, and they are limited by their operating characteristics.
Human intervention is not always possible in leakage situations, primarily because of the hazardous nature of the gases. Smoke emitted during a leak impairs visibility, and fire and smoke leakages demand the immediate evacuation of persons with mobility impairments. Breathing these dangerous fumes may lead to dizziness, unconsciousness, and mass disaster if not treated properly. In chemical factories, gas leakage can cause explosions. Therefore, detecting gas leakages and explosions within a short period is of utmost importance. Early detection of gas leakage with high accuracy and reliability using state-of-the-art techniques is an essential assistive technology solution. Detecting a particular gas, or several gases, in a mixture is also challenging and requires technological attention. Existing methods of mixed gas detection include the use of colorimetric tape [4]. In this method, the dry material of the tape reacts with the emitted gas and leaves a characteristic stain for each gas under consideration. The higher the gas concentration, the darker the stain on the tape [5]. Gas chromatography is another method; it separates mixtures of gases based on differences in boiling point, polarity, and vapor pressure [6]. This method has high separation efficiency but requires a large apparatus and workforce to operate [7]. Beyond chemical methods of gas detection, and with advances in interdisciplinary technologies, various Artificial Intelligence (AI) based techniques are also reported in the literature. Machine learning algorithms such as Logistic Regression, Random Forest, and Support Vector Machines (SVM) have been proposed for gas detection [8]. However, these methods require tuning of multiple hyperparameters and statistical calculations for accurate and robust gas classification, which increases the processing time, power consumption, and computation [9]. Abdul Majeed [10] provided a methodology that selects the top-weighted features from complex datasets to improve the time complexity as well as the accuracy of machine learning models.
Khalaf [11] proposed an electronic nose system for classification and concentration estimation that uses least squares regression. An array of eight different gas sensors is used to identify gas concentrations in [12]. In that work, deep convolutional neural networks are employed for gas classification, and it was shown that deep learning algorithms can learn features from gas sensor measurements more effectively and achieve higher classification accuracy. Bilgera et al. [13] presented a fusion of different AI models for gas source localization to determine the point of leakage on the ground using six gas sensors. Pan et al. [14] presented a deep learning approach consisting of a hybrid framework of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to extract sequential information from transient response curves. A fast gas recognition algorithm based on a hybrid CNN and Recurrent Neural Network (RNN) is presented in [9]. It was shown that the fusion model outperforms Support Vector Machine (SVM), Random Forest, and k-nearest neighbors. Liu et al. [15] described two network structures, Deep Belief Networks and Stacked Autoencoders, to extract abstract gas features from E-nose data. Softmax classifiers are then constructed using these features. These reported approaches apply sequential methods directly to the gas sensor data.
However, there are several issues with an approach that relies only on gas sensors for detection and identification. The primary one is that the proportion of gas in the air is very low in some cases, and such gases are not identifiable with standard gas sensors. This generates false negatives or false positives and hence hampers the detection accuracy of the system. Additionally, low-cost sensors are typically less sensitive and may not provide accurate measurements. Another method reported for gas detection is thermal imaging. When a gas leaks, the surrounding temperature increases compared with normal conditions. This increase in temperature can be characterized and analyzed with thermal imaging cameras, and the concept can be utilized to detect leakages [16, 17]. A system for methane and ethane gas leak detection using a thermal camera is proposed in [18]. Jadin and Ghazali [19] presented a method for detecting gas leaks using infrared image analysis. Their system follows an image processing pipeline consisting of data acquisition, image preprocessing, image processing, feature extraction, and classification.
Single-modality sensing methods may not achieve the accuracy and robustness required of the system, as such systems are limited by the characteristics of their sensors. Individual sensors are limited in their temporal and spatial characteristics [20]. A thermal imaging system can identify the presence of gas but fails to identify its type. Hence, the concept of multimodal/multi-sensor data fusion came into existence. Data fusion combines information from multiple sources to obtain a better output than any individual modality taken alone [20]. The Kalman filter proposed in [21] is one of the most widely used sensor fusion algorithms in robotics applications such as position and orientation estimation and guided vehicles. However, it requires the input data from the two sensors to be in a similar format. In the situation under consideration, each gas sensor reading is a scalar value, whereas the input from the thermal camera is a two-dimensional image. Hence, the Kalman filter cannot be used directly in this application to fuse 1D and 2D data. With the advancement and flexibility of AI frameworks, a combination of different AI algorithms can be used to extract important features efficiently and to improve classification accuracy [22, 23, 24]. This paper presents an AI-based methodology that employs Deep Learning (DL) frameworks to fuse multimodal data from multiple sources in order to detect and classify gases. The system is equipped with several gas sensors and a thermal imaging camera, and sensor fusion is performed using DL algorithms.
The focus of the proposed method is to extract features using two different deep learning paradigms and to apply an early fusion method that fuses these features to train a classifier for detecting and subsequently identifying the gas. The proposed method can be used to detect a particular gas in a mixed-gas environment. It does not require a manual operator and is a more robust solution because it incorporates measurements from multiple gas sensors and a thermal imaging camera. If one modality generates a false negative, fusion with the other modality can help identify the correct outcome. Conversely, if one modality produces a false positive, the other modality helps to lower the combined confidence of the fused output for that class, thereby providing more accurate predictions.
The main contributions of the paper can be listed as follows:
an innovative multimodal AI-based framework for the fusion of two separate modalities for robust and more reliable gas detection is proposed and presented;
the use of early fusion of the outputs of the deep learning architectures CNN and LSTM is demonstrated for the detection and identification of leaked gases.
In summary, the main contributions of this work are twofold. Firstly, a multimodal AI-based fusion framework for gas detection and identification is presented. This framework is fast, easy to deploy, and generic. Secondly, early fusion of the outputs from a CNN and an LSTM is demonstrated for the detection and identification of leaked gases. Vanilla architectures are considered for the implementation of the CNN and LSTM frameworks. Advanced frameworks such as AlexNet [25] and ResNet [26] would add to the computational complexity of the system because of their very deep architectures. The use of a vanilla CNN facilitates faster processing and is also suitable for deployment in real-time systems. The results show that false positives and false negatives in the fused output are lower than in the individual modalities. The experimental setup is designed to collect real-time data using a gas sensor array and a thermal camera, to preprocess the collected data, and to validate the developed framework. Our approach is highly generic and can be extended to a number of other applications involving multiple sensors and their data fusion. The innovation lies in the development of state-of-the-art AI techniques for solving the highly relevant social and industrial issue of identifying gas leakage and controlling it in time to reduce the loss of property and, in extreme cases, human lives.
The paper is organized as follows: Section 2 provides a brief overview of AI-based multimodal fusion methods. The frameworks for data collection and preprocessing, along with the proposed system architecture, are presented in Section 3. Section 4 provides a detailed discussion of the obtained results, and Section 5 concludes the paper and outlines the future scope.
3. Framework for System Design and Experimentation
The system consists of gas sensors, which measure gas concentrations, and a thermal camera, which captures thermal images of the gases. The block diagram of the data collection process is presented in Figure 1. Figure 2 shows the structure and steps followed for training the network, and Figure 3 shows the testing phase. The processes indicated in these figures are described in detail in the following sections.
3.1. Gas Sensors
Gas sensors detect the presence of gas by converting chemical information into electrical information. Metal Oxide Semiconductor (MOS) gas sensors of the MQ series are appropriate as they are compact and have a fast response speed and a long service life [34, 35]. Each sensor contains a heating element and produces an analog output voltage proportional to the gas concentration. The performance of a gas sensor depends on various sensor characteristics such as sensitivity, selectivity, detection limit, and response time [36]. Seven gas sensors, namely MQ2, MQ3, MQ5, MQ6, MQ7, MQ8, and MQ135, are used in the present work. These sensors are sensitive to various gases and quantities such as methane, butane, LPG, alcohol, smoke, natural gas, carbon monoxide, and air quality (Table 1).
3.2. Thermal Camera
A thermal camera is a device that measures temperature variations using infrared light. Every pixel of the camera's image sensor is an infrared temperature sensor, so the temperature of all points is captured at the same time. The images are generated from the temperature readings and displayed in RGB form. Unlike normal imaging cameras, a thermal camera is not constrained by dark surroundings and can work in any environment regardless of its shape and texture [37]. The Seek Thermal camera used in this work is a compact thermal camera with a 206 × 156 thermal sensor (32,136 thermal pixels, enough to view a thermal image easily), a 36-degree field of view, a temperature measurement range of −40 ℃ to 330 ℃, and a frame rate of <9 Hz.
The gas sensors and thermal camera are used simultaneously to collect data for training and testing of the developed fusion model. The next part of the paper describes the data collection and its preprocessing in detail.
3.3. Data Collection and Preprocessing
To the best of the authors’ knowledge, no dataset combining thermal images and gas sensor measurements for the representation of gases has yet been collected and made available in the open domain for direct use. Hence, in this work, data from the sensors and the thermal imaging camera are collected manually for model training and validation.
The experimental data are collected using an array of 7 gas sensors together with the Seek Thermal camera. The gas sensors were placed 1 mm apart.
In the experimentation, two specific gas sources are used, namely, gases originating from perfumes and gases emitted by incense sticks. The experimental setup and workflow for the data collection are shown in Figure 4.
Sensor readings and thermal images for each of these two gas sources were recorded continuously at a time interval of 2 s for one and a half hours. During this time, gas was released at intervals of 15 s for the first 30 min, 30 s for the next 30 min, and 45 s for the final 30 min. A few representative samples for three classes (no gas, perfume, and smoke), with the thermal image and the corresponding gas sensor array data, are shown in Table 2. The sensors provide an analog voltage corresponding to the gas concentration. The analog value is converted to a 10-bit digital value using an analog-to-digital converter. These 10-bit digital values are shown for representation purposes in Table 1. Each sensor is sensitive to more than one gas, and hence the sensors are calibrated appropriately. The data set consists of 6400 samples in total, of which 1600 samples belong to perfume, 1600 to smoke, 1600 to a mixture of perfume and smoke, and 1600 to a neutral environment (no gas).
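The 10-bit conversion described above can be illustrated with a minimal sketch. The 5 V reference voltage, the truncating quantization model, and the function name `adc_counts` are assumptions made for illustration; the text does not state the converter's reference voltage or rounding behavior.

```python
def adc_counts(voltage, v_ref=5.0, bits=10):
    """Quantize an analog voltage to an unsigned ADC code.

    v_ref=5.0 is an assumed reference voltage, not a value from the paper.
    """
    max_code = (1 << bits) - 1                 # 1023 for a 10-bit converter
    clamped = min(max(voltage, 0.0), v_ref)    # keep the input inside 0..v_ref
    return int(clamped / v_ref * max_code)     # truncate (simple ADC model)


# Hypothetical sensor voltages, for illustration only
for v in (0.0, 2.5, 5.0):
    print(v, "->", adc_counts(v))
```

Under these assumptions, every sensor reading lies between 0 and 1023, which is the range of the digital values fed to the sequence model.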
3.4. Data Preprocessing
Deep learning models require a large amount of training data to operate appropriately and efficiently. Because only limited data were available, data augmentation techniques are used to increase the dataset size. The diversity of the limited thermal images is increased using data augmentation techniques such as rescaling and resizing. Figure 5 shows the ground truth image (Figure 5a) and all images generated using rotation and tilting operations (Figure 5b).
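As a rough illustration of the rescaling and resizing operations mentioned above, the following sketch works on a tiny image stored as a list of rows. It assumes that "rescaling" means mapping 8-bit pixel values to the range [0, 1] and that resizing uses nearest-neighbour sampling; neither detail is stated in the text, and the function names are hypothetical.

```python
def rescale(image, factor=1.0 / 255.0):
    """Scale every pixel value; with 8-bit input the result lies in [0, 1]."""
    return [[px * factor for px in row] for row in image]


def resize_nearest(image, new_h, new_w):
    """Resize a 2D image (list of rows) with nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


# Tiny made-up 2x2 grayscale image standing in for a thermal frame
img = [[0, 255],
       [128, 64]]
scaled = rescale(img)                # pixel values mapped into [0, 1]
enlarged = resize_nearest(img, 4, 4) # each source pixel becomes a 2x2 block
```

In practice these operations would be applied to the full thermal frames by an image library; the sketch only shows what each operation does to the pixel grid.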
3.5. Feature Extraction from Thermal Images Using CNN
A total of 6400 thermal images and the corresponding 6400 labels (No Gas, Alcohol, Smoke, and a mixture of Alcohol and Smoke) are considered in this experimentation. An 80:20 train-test split was applied, and the training portion was further divided 80:20 into training and validation sets: of the 6400 images, 4096 are used for training, 1024 for validation, and 1280 for testing.
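The split sizes quoted above can be reproduced with a short calculation. The sketch assumes a two-stage split in which 20% is held out for testing and 20% of the remainder is used for validation; the second fraction is inferred from the reported counts, and the function name is hypothetical.

```python
def split_counts(total, test_frac=0.2, val_frac=0.2):
    """Return (train, validation, test) sizes for a two-stage split."""
    test = round(total * test_frac)         # held-out test set
    remaining = total - test                # data left for model fitting
    val = round(remaining * val_frac)       # validation share of the remainder
    train = remaining - val
    return train, val, test


print(split_counts(6400))  # (4096, 1024, 1280)
```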
During the development of the CNN model, multiple experiments were carried out with different architectures, and various hyperparameter tuning approaches were applied. It was found that an architecture of three convolution-pooling layers followed by a dropout layer (dropout of 0.25) provides the best accuracy and recall. The model was optimized with different optimizers, and the best-performing optimizer was selected for further processing. An Adam optimizer with a learning rate of 0.001 and a decay of 1 × 10 is used, and L1-L2 regularization (0.005) is applied in the first two Conv-Max Pool pairs to avoid overfitting of the model. The model is trained for 300 epochs, which resulted in a testing accuracy of 93%.
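To make the regularization term concrete, the sketch below computes a combined L1-L2 penalty for a small set of weights. It assumes that the factor 0.005 is used for both the L1 and the L2 term, which the text does not state explicitly; the weights and the function name are made up for illustration.

```python
def l1_l2_penalty(weights, l1=0.005, l2=0.005):
    """Penalty added to the loss: l1 * sum(|w|) + l2 * sum(w^2)."""
    l1_term = sum(abs(w) for w in weights)   # encourages sparse weights
    l2_term = sum(w * w for w in weights)    # discourages large weights
    return l1 * l1_term + l2 * l2_term


kernel = [0.5, -1.0, 2.0]            # hypothetical convolution weights
penalty = l1_l2_penalty(kernel)      # 0.005 * 3.5 + 0.005 * 5.25
```

During training this penalty is added to the classification loss, so larger weights in the regularized layers make the total loss higher.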
3.6. Feature Extraction from Gas Sensor Measurements Using LSTM
Sensor measurements are sequential, and hence a sequence model, namely an LSTM network, is used for extracting features from these measurements. The architecture of the LSTM model consists of an input layer followed by a single LSTM layer with 5 cells. The LSTM layer is regularized with L2 regularization and is followed by a classifier layer with the Softmax activation function.
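The following sketch shows what a single LSTM layer computes at each time step for a 7-sensor input. It interprets "5 cells" as 5 hidden units, uses small random untrained weights, and feeds a made-up, pre-scaled three-step sequence; all of these are assumptions for illustration rather than details of the trained model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for input vector x.

    params maps each gate name ("i", "f", "o", "g") to a tuple (W, U, b):
    W has shape [units][inputs], U has shape [units][units], b has [units].
    """
    def gate(name, activation):
        W, U, b = params[name]
        return [
            activation(
                sum(w * xi for w, xi in zip(W[j], x))
                + sum(u * hi for u, hi in zip(U[j], h_prev))
                + b[j]
            )
            for j in range(len(h_prev))
        ]

    i = gate("i", sigmoid)      # input gate
    f = gate("f", sigmoid)      # forget gate
    o = gate("o", sigmoid)      # output gate
    g = gate("g", math.tanh)    # candidate cell state
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def random_params(n_inputs, n_units, rng):
    """Small random weights, for illustration only (no training)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        name: (mat(n_units, n_inputs), mat(n_units, n_units), [0.0] * n_units)
        for name in ("i", "f", "o", "g")
    }


rng = random.Random(0)
params = random_params(n_inputs=7, n_units=5, rng=rng)

# Made-up sequence: 3 time steps, 7 sensor values each, scaled to [0, 1]
sequence = [[0.10, 0.20, 0.15, 0.30, 0.05, 0.25, 0.40],
            [0.12, 0.22, 0.18, 0.33, 0.06, 0.27, 0.45],
            [0.20, 0.35, 0.25, 0.40, 0.10, 0.30, 0.60]]

h, c = [0.0] * 5, [0.0] * 5
for x in sequence:
    h, c = lstm_step(x, h, c, params)   # h is the 5-dimensional feature vector
```

The final hidden state `h` plays the role of the extracted sensor features; in the described model it would be passed to the Softmax classifier layer.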
This LSTM network was trained with different optimizers at a fixed learning rate of 0.001 to find the best optimizer. Through trial and error, it was observed that the Adam optimizer fitted the model best and also converged quickly. Hence, the Adam optimizer is selected for the analysis and experimentation. The model is trained for 300 epochs, and a testing accuracy of 83% is obtained.
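For readers unfamiliar with Adam, the update rule can be sketched for a single scalar parameter. The learning rate of 0.001 comes from the text; the values of `beta1`, `beta2`, and `eps` are the commonly used defaults and are assumed here, and the quadratic objective is a toy example rather than the model's loss.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step."""
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective f(w) = (w - 3)^2, started from w = 0
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)
```

Because the step is normalized by the running gradient magnitude, each update moves the parameter by roughly the learning rate at the start of training, regardless of the raw gradient scale.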
3.7. Multimodal Fusion of Image and Sequence Data
In this phase of the work, the features extracted from the thermal images and the gas sensor measurements are fused for accurate decision making. The proposed architectures of the image and sequence data fusion model are presented in Figure 6 (early fusion) and Figure 7 (late fusion). The focus of the work was to build a fused classifier that uses both the gas sensor sequence data and the thermal images. In the fusion process, the outputs of the LSTM and the CNN must be in the same feature space before fusion can be performed.
The fusion model is optimized with an Adam optimizer with a learning rate of 0.001 and a decay of 1 × 10. Regularization (0.005) is applied to avoid overfitting of the fusion model. The model is trained for 300 epochs, which resulted in a testing accuracy of 96%.
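A minimal sketch of feature-level (early) fusion is given below. It assumes that the two feature vectors are fused by concatenation and classified by a single dense Softmax layer; the feature dimensions, the random untrained weights, and the function names are hypothetical and do not reproduce the architecture in Figure 6.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def early_fusion_predict(cnn_features, lstm_features, weights, biases):
    """Concatenate both feature vectors, then apply a dense softmax layer."""
    fused = list(cnn_features) + list(lstm_features)   # feature-level fusion
    scores = [
        sum(w * f for w, f in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


rng = random.Random(0)
cnn_features = [rng.random() for _ in range(8)]    # hypothetical CNN output
lstm_features = [rng.random() for _ in range(5)]   # hypothetical LSTM output
n_classes = 4                                      # no gas, perfume, smoke, mixture

# Untrained random classifier weights: one row of 8 + 5 = 13 values per class
weights = [[rng.uniform(-0.5, 0.5) for _ in range(13)] for _ in range(n_classes)]
biases = [0.0] * n_classes

probs = early_fusion_predict(cnn_features, lstm_features, weights, biases)
predicted_class = probs.index(max(probs))
```

Because the classifier sees both feature vectors at once, its weights can capture interactions between the thermal and sensor features, which decision-level fusion cannot.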
A late fusion model is also implemented for the fusion of the gas sensor array data with the thermal images. Since late fusion is decision-level fusion, the predictions of the individual models, namely the LSTM model and the CNN model, are first obtained separately. The late fusion process is then applied in two ways. In the first trial, the maximum of the predictions from the individual models is taken as the final fusion value; hereafter, this is referred to as Max fusion. In the second trial, the arithmetic average of the individual model predictions is taken as the final fusion value, referred to as Average fusion.
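The two late fusion rules can be sketched as follows. The sketch assumes that each model outputs a vector of class probabilities and that the maximum and the average are taken element-wise per class before choosing the highest-scoring class; the probability values are made up to show a case where the two rules disagree.

```python
def max_fusion(p_cnn, p_lstm):
    """Element-wise maximum of the two class-probability vectors."""
    return [max(a, b) for a, b in zip(p_cnn, p_lstm)]


def average_fusion(p_cnn, p_lstm):
    """Element-wise arithmetic mean of the two class-probability vectors."""
    return [(a + b) / 2.0 for a, b in zip(p_cnn, p_lstm)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda k: values[k])


# Made-up softmax outputs for classes (no gas, perfume, smoke, mixture)
p_cnn = [0.70, 0.30, 0.00, 0.00]
p_lstm = [0.00, 0.60, 0.40, 0.00]

max_class = argmax(max_fusion(p_cnn, p_lstm))       # class 0: CNN's confident vote wins
avg_class = argmax(average_fusion(p_cnn, p_lstm))   # class 1: both models give it support
```

Max fusion follows the single most confident model, whereas Average fusion favors classes that both models support.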
The presented models of early and late fusion are implemented and validated with the dataset available. The next section describes the results obtained and comparison between the fusion models.
4. Results and Discussion
A multimodal AI-based fusion model for gas detection and identification is presented in this work. Two modalities, namely thermal images and gas sensor measurements, are considered. The CNN architecture is applied for extracting features from the thermal images, whereas the LSTM framework is used for extracting features from the sequences of gas sensor measurements. The proposed model is implemented in Python 3 using the Keras framework on the TensorFlow platform. A Google Colab GPU runtime, based on an Intel Xeon processor with 13 GB of RAM, is used for training and testing the proposed model. The CNN model starts converging at around the 20th epoch, whereas the LSTM reaches convergence at around the 90th epoch. It was observed that the fused model stabilizes at around the 20th epoch as well. The accuracy of the gas sensor array is comparatively lower, since the outcome of one sensor (out of the 7 sensors considered) is typically not very accurate due to the mixing of gases in the air. The thermal camera-based model performs comparatively better on its own; however, in the air, the thermal signature of gaseous emissions may be generated by multiple gases or multiple sources of exhaust, so having gas sensors to validate the type of gas is extremely helpful for identification. It was observed that the individual models underperform compared to the fusion models. In the fusion models, each modality either corroborates or opposes the outcome of the other modality, thereby making the system more reliable and accurate. With regularization applied to the individual models, namely the CNN and the LSTM, a testing accuracy of 93% with thermal images and 82% with gas sensor sequences is achieved. However, the early fusion of features from both the CNN and the LSTM provides a testing accuracy of 96%, which is greater than the accuracies of the individual models.
In the case of late fusion (max fusion and average fusion), the accuracy was observed to be around 96%.
Table 3 shows the individual training and testing accuracy, loss, precision, recall, and F1 scores for all four classes considered in this study. The accuracy comparison of the models is shown in Figure 8. It can be observed that the fusion models outperform the individual models, as the predictions of the fusion models are based on the data of both modalities. It can also be noticed that the accuracy of early fusion is slightly higher than that of the late fusion models, since in this case the fusion happens at the feature level, which allows interaction between the modalities. The confusion matrices are also plotted for all the frameworks and are shown in Figure 9.
The fusion model is trained for 300 epochs, and analysis of its accuracy and loss curves shows that it provides better performance than the individual models.
It is evident from the confusion matrices that the false positives and false negatives obtained from the fusion models are considerably fewer than those of the individual models. Hence, it can be concluded that the fusion models outperform the individual models. The higher testing accuracy of the fusion models also demonstrates that the resultant fusion system is more robust and reliable than the individual models and performs gas identification and classification better. False positives and false negatives arise from various aspects of the model and the data. The primary reason could be the mixing of gases to an extent that makes it difficult for the model to classify them clearly. The majority of the false predictions occur when a class is predicted with only moderate probability. Training the model more rigorously with more, and more varied, data samples can reduce false predictions caused by such borderline probabilities.
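The relation between a confusion matrix and the reported precision, recall, and F1 scores can be sketched as follows. The matrix below contains made-up counts for two classes and does not reproduce any result from this study; the function name is hypothetical.

```python
def per_class_metrics(confusion):
    """Precision, recall, and F1 per class.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp   # predicted k, but wrong
        fn = sum(confusion[k]) - tp                        # true k, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1))
    return metrics


# Made-up 2-class counts, NOT results from this study
confusion = [[8, 2],
             [1, 9]]
metrics = per_class_metrics(confusion)
```

False positives lower the precision of a class and false negatives lower its recall, so a reduction in both, as reported for the fusion models, raises the F1 score.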