Gas Detection and Identification Using Multimodal Artificial Intelligence Based Sensor Fusion

With rapid industrialization and technological advancement, innovative engineering technologies that are cost-effective, faster, and easier to implement are essential. One area of concern is the rising number of accidents caused by gas leaks in coal mines, chemical industries, home appliances, etc. In this paper, we propose a novel approach to detect and identify gaseous emissions using multimodal AI fusion techniques. Most gases and their fumes are colorless, odorless, and tasteless, which challenges normal human senses. Sensing based on a single sensor may not be accurate, and sensor fusion is essential for robust and reliable detection in several real-world applications. We manually collected 6400 gas samples (1600 samples per class for four classes) using two specific sensors: an array of seven semiconductor gas sensors and a thermal camera. The early fusion method of multimodal AI is applied. The network architecture consists of a feature extraction module for each modality; the extracted features are fused using a merge layer followed by a dense layer, which provides a single output for identifying the gas. We obtained a testing accuracy of 96% for the fused model, as opposed to individual model accuracies of 82% (gas sensor data using an LSTM) and 93% (thermal image data using a CNN). The results demonstrate that the fusion of multiple sensors and modalities outperforms a single sensor.


Introduction
Engineering innovation refers to solving social and industrial problems through innovative engineering technologies and approaches. With the rise of industrialization and the narrowing of the socio-economic gap between different strata of society, the use of chemicals has been on the rise. Assistive technology is the technological domain consisting of systems, in software, hardware, or both, designed to enhance and maintain human capabilities in situations that require special attention. Solutions in assistive technology range from unmanned-vehicle-based surveillance applications to healthcare applications such as automated wheelchairs, pose estimation, etc. In this work, we propose an assistive technology solution for the highly relevant problem of gas detection and identification in domestic, industrial, and outdoor environments.
Industrial hazards can cause chemical and/or radioactive damage to the surrounding environment. With rapid developments in industrialization and automated chemical plants, gas leakage is a common issue. Explosions, fires, spills, leaks, and waste emissions are some of the consequences of industrial accidents [1,2]. Residential cooking and careless waste disposal, which generate unnecessary fumes, are significant causes of fume leakage. An article presented in the media revealed that burning wood, biomass, and dung led to 326,000 of the estimated 645,000 premature deaths from outdoor air pollution, which constitutes about 50% of the total deaths due to outdoor pollution [3]. Harmful gases such as Liquefied Petroleum Gas (LPG), Compressed Natural Gas (CNG), methane, propane, and other flammable and toxic gases, if not used carefully and adequately, may lead to accidents and, in some cases, disastrous consequences. A gas leak occurs through an unintended crack, hole, or porosity in a joint or machinery that allows an enclosed fluid or gas to escape. In any plant or industrial setup, a gas leak test is a quality control step that must be performed before a device is set up. As a precautionary measure, gas sensors are installed near leakage-prone equipment. However, these sensors are not able to detect a specific gas in a mixed-gas environment, and they are limited by their operating characteristics. Human intervention is not always possible in leakage situations, primarily due to the hazardous nature of the gases. Smoke emitted during leakages impairs visibility, and fire and smoke demand the immediate evacuation of persons with mobility disabilities. Breathing these dangerous fumes may lead to dizziness, unconsciousness, and mass casualties if not treated properly. Gas leakage in chemical factories can also cause explosions.
Therefore, detecting gas leakages and explosions within a short period is of utmost importance. Early detection of gas leakage with high accuracy and reliability using state-of-the-art techniques is an essential assistive technology solution. Detecting a particular gas, or different gases, in a mixture of gases is also challenging and requires technological attention. Existing methods of mixed gas detection include the use of colorimetric tape [4]. In this method, a dry tape material reacts with the emitted gas and leaves a distinct stain for each gas under consideration. The higher the gas concentration, the darker the stain on the tape [5]. Gas chromatography is another methodology; it separates mixtures of gases based on differences in boiling point, polarity, and vapor pressure [6]. This method has high separation efficiency but requires a large apparatus and a workforce to operate it [7]. Beyond chemical methods of gas detection, and with advancements in interdisciplinary technologies, various Artificial Intelligence (AI) based techniques are also reported in the literature. Machine learning algorithms such as Logistic Regression, Random Forest, and Support Vector Machines (SVM) have been proposed for gas detection [8]. However, these methods require tuning multiple hyperparameters and statistical calculations for accurate and robust gas classification, which increases the processing time, power consumption, and computation [9]. Abdul Majeed [10] provided a methodology that selects the top-weighted features from complex datasets to improve both the time complexity and the accuracy of machine learning models.
Khalaf [11] proposed an electronic nose system for classification and concentration estimation that uses least squares regression. An array of eight different gas sensors is used to identify gas concentrations in [12]. In that work, deep convolutional neural networks are employed for gas classification. It was shown that deep learning algorithms can learn features from gas sensor measurements more effectively and achieve higher classification accuracy. Bilgera et al. [13] presented a fusion of different AI models for gas source localization to determine the point of leakage in a ground using six gas sensors. Pan et al. [14] presented a deep learning approach consisting of a hybrid framework comprising a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to extract sequential information from transient response curves. A fast gas recognition algorithm based on a hybrid CNN and Recurrent Neural Network (RNN) is presented in [9]. It was shown that the fusion model outperforms the Support Vector Machine. Then the Softmax classifiers are constructed using these features. These reported approaches apply sequential methods directly to gas sensor data. However, there are several issues with a detection and identification approach based only on gas sensors. The primary one is that the proportion of gas in the air is very low in some cases, and the gases are then not identifiable with standard gas sensors. This produces false negatives or false positives and hence hampers the detection accuracy of the system. Additionally, low-cost sensors are typically less sensitive and may not provide accurate measurements. Another method reported for gas detection is thermal imaging. When a gas leaks, the surrounding temperature increases compared with normal conditions. This increase in temperature can be characterized and analyzed by thermal imaging cameras, a concept that can be utilized to detect leakages [16,17]. A system for methane and ethane gas leak detection using a thermal camera is proposed in [18]. Jadin and Ghazali [19] presented a method for detecting gas leaks using infrared image analysis. The system follows an image processing pipeline consisting of data acquisition, image preprocessing, image processing, feature extraction, and classification.
Single-modality sensing methods may not achieve the accuracy and robustness a system requires, as such systems are limited by sensor characteristics. Individual sensors are limited in their temporal and spatial characteristics [20]. A thermal imaging system can identify the presence of a gas but fails to identify its type. Hence, the concept of multimodal/multi-sensor data fusion came into existence. Data fusion combines information from multiple sources to obtain a better output than any individual modality taken alone [20]. The Kalman filter proposed in [21] is one of the most widely used sensor fusion algorithms in robotics applications such as position and orientation estimation, guided vehicles, etc. However, it requires the input data from the two sensors to be in a similar format. In the situation under consideration, the gas sensor data is a scalar value, whereas the input from a thermal image is a two-dimensional array. Hence, the Kalman filter cannot be used in this application to fuse 1D and 2D data. With the advancement and flexibility of AI frameworks, a combination of different AI algorithms can be used to extract important features efficiently and improve classification accuracy [22,23,24]. This paper presents an AI-based methodology that employs Deep Learning (DL) frameworks to fuse multimodal data from multiple sources in order to detect and classify gases. The system is equipped with several gas detecting sensors and a thermal imaging camera, and sensor fusion is performed using DL algorithms.
The focus of the proposed method is to extract features using two different deep learning paradigms and apply an early fusion method to fuse these features to train a classifier for detecting and subsequently identifying the gas. The proposed method can be used to detect a particular gas in a mixed environment of gases. It does not require a human operator and is a more robust solution, as it incorporates the measurements from multiple gas sensors and a thermal imaging camera. If one modality generates false negatives, fusion with the other modality can help identify the correct outcome more effectively. Conversely, if one modality gives false positives, the other modality helps to lower the fused confidence in the erroneous class, thereby providing more accurate predictions.
The main contributions of the paper can be listed as follows:
1. An innovative multimodal AI-based framework for the fusion of two separate modalities for robust and more reliable gas detection is proposed and presented.
The paper is organized as follows: Section 2 provides a brief overview of AI-based multimodal fusion methods. The frameworks for data collection and preprocessing, along with the proposed system architecture, are presented in Section 3. Section 4 provides a detailed discussion of the obtained results, and Section 5 concludes the paper and outlines the future scope.

Theoretical Background
Fusing the data from multiple sensors makes the system more robust and reliable than the single sensor-based systems. There are various methods of sensor fusion using AI paradigms proposed in the literature. This section briefly discusses these methods as a precursor to our system framework and experimentation setup.

Methodologies for Multimodal Data Fusion
A modality refers to a way in which something in the environment can be experienced. It is a type of information that can be sensed and stored. Examples include text, images, smell, taste, audio, video, and touch. Multimodal sensor fusion refers to combining sensor data from different sources to produce more consistent, accurate, and useful information than individual sensors can provide, reducing false positives and false negatives. The fusion architectures can be of three types: early fusion, late fusion, and hybrid fusion [27,28]. Early fusion combines the raw data or the features extracted from the raw data [29]. This is a suitable

Convolutional Neural Network
Each thermal image consists of non-linear features and is stored digitally in RGB format. Simple neural networks are not able to generalize complex patterns in images. Convolutional Neural Networks (CNN) learn to recognize differences and patterns in images. A CNN [30] consists of convolution, max pooling, flattening, and ANN layers. The primary purpose of convolution is to find features in an image using a feature detector and put them into a feature map.
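The convolution, max pooling, and flattening stages described above can be illustrated with a minimal pure-Python sketch. It is not the network used in this work: the toy image, the 2 × 2 edge kernel, and all function names are assumptions chosen only to show how a feature detector produces a feature map that is then pooled and flattened.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation (no padding, stride 1) of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise rectifier applied to a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def flatten(fmap):
    """Turn a 2D feature map into a 1D feature vector for the dense (ANN) layers."""
    return [v for row in fmap for v in row]


# Hypothetical 5 x 5 single-channel image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical feature detector that responds to left-to-right intensity increases.
edge_kernel = [[-1, 1], [-1, 1]]

features = flatten(max_pool(relu(conv2d(image, edge_kernel))))
```

In a trained CNN the kernel values are learned rather than hand-written, and several kernels are applied per layer to each image channel.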

Recurrent Neural Network
Recurrent Neural Networks (RNN) [31] contain a memory element, due to which the present output depends not only on the current input but also on previous inputs. However, as the input sequence length increases, the problem of vanishing gradients is observed. To address this, Long Short-Term Memory (LSTM) networks, consisting of gates and memory elements, were introduced. These gates help regulate and extract information from the input and pass gradients on to the next node, so that later parts of a sequence can be learned as effectively as earlier parts [32]. LSTMs are also more effective than conventional RNNs [33].
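To make the role of the gates concrete, the following is a minimal sketch of the standard LSTM cell update (forget, input, and output gates plus a candidate state) in pure Python. The weights are random and untrained, and the sizes (7 inputs, 5 hidden units) and the helper names are assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h, b):
    """Compute W x + U h + b, one value per hidden unit."""
    return [
        sum(w * xi for w, xi in zip(W[k], x))
        + sum(u * hi for u, hi in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a standard LSTM cell."""
    f = [sigmoid(v) for v in affine(p["Wf"], x, p["Uf"], h_prev, p["bf"])]    # forget gate
    i = [sigmoid(v) for v in affine(p["Wi"], x, p["Ui"], h_prev, p["bi"])]    # input gate
    o = [sigmoid(v) for v in affine(p["Wo"], x, p["Uo"], h_prev, p["bo"])]    # output gate
    g = [math.tanh(v) for v in affine(p["Wg"], x, p["Ug"], h_prev, p["bg"])]  # candidate state
    # The cell state carries memory forward: keep part of the old state, add part of the new.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def init_params(n_in, n_hidden, scale=0.1, seed=0):
    """Random, untrained parameters (illustrative only)."""
    rng = random.Random(seed)
    p = {}
    for gate in "fiog":
        p["W" + gate] = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                         for _ in range(n_hidden)]
        p["U" + gate] = [[rng.uniform(-scale, scale) for _ in range(n_hidden)]
                         for _ in range(n_hidden)]
        p["b" + gate] = [0.0] * n_hidden
    return p


def run_lstm(sequence, p, n_hidden):
    """Feed a sequence of input vectors through the cell and return the last hidden state."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
    return h


params = init_params(n_in=7, n_hidden=5)
sequence = [[0.1 * t] * 7 for t in range(4)]  # hypothetical 4-step sequence of 7 readings
features = run_lstm(sequence, params, n_hidden=5)
```

The additive update of the cell state `c` is what lets gradients flow across many time steps without vanishing as quickly as in a plain RNN.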
Sensor measurements form a continuous stream of data, and hence the LSTM framework is suitable for extracting features from them. The thermal camera provides images, for which a CNN is an appropriate choice for feature extraction. The two modalities have different characteristics and do not have any time-level correlation. Hence, in our proposed framework, we employ early fusion of the features extracted by the LSTM model from the gas sensors and by the CNN model from the thermal images. The next section details the pipeline for collecting data with the specified sensors, preprocessing the collected data, and developing the fusion frameworks for the proposed work.

Framework for System Design and Experimentation
The system consists of gas sensors, which measure gas concentrations, and a thermal camera, which captures thermal images of the gases. The block diagram of the data collection process is presented in Figure 1. Figure 2 shows the structure and steps followed for training the network, and Figure 3 shows the testing phase. A detailed description of the processes shown in these figures is provided in the following sections.

Gas Sensors
Gas sensors detect the presence of gas by converting chemical information into electrical information. Metal Oxide Semiconductor (MQ) gas sensors are appropriate as they are compact and have a fast response and a long service life [34,35]. Each sensor contains a heating element and produces an analog output voltage proportional to the gas concentration.
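As an illustration of how such analog outputs are typically digitized, the sketch below converts raw analog-to-digital converter (ADC) counts from a seven-sensor array into voltages. The 10-bit resolution, the 5 V reference, the sample counts, and the function names are assumptions; the acquisition hardware used in this work is not specified here.

```python
ADC_MAX = 1023  # assumption: 10-bit ADC, counts range from 0 to 1023
V_REF = 5.0     # assumption: 5 V reference voltage


def adc_to_voltage(count, adc_max=ADC_MAX, v_ref=V_REF):
    """Convert one raw ADC count to the sensor output voltage in volts."""
    if not 0 <= count <= adc_max:
        raise ValueError("ADC count out of range: %r" % (count,))
    return count * v_ref / adc_max


def read_sensor_array(counts):
    """Convert one sample (one count per sensor) to a vector of voltages."""
    return [adc_to_voltage(c) for c in counts]


# Hypothetical raw sample from an array of seven sensors.
sample = read_sensor_array([0, 128, 256, 512, 640, 768, 1023])
```

Each converted sample is a 1 × 7 vector, which is the shape of the sequence data fed to the gas sensor model.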

Thermal Camera
A thermal camera is a device that measures temperature variations using infrared light. Every pixel on the camera's image sensor is an infrared temperature sensor, so the temperature of all points is obtained at the same time. The images are generated from the temperature data and displayed in RGB format. Unlike normal imaging cameras, a thermal camera is not constrained by dark surroundings and can work in any environment regardless of its shape and texture [37]. The Seek Thermal camera used in this work is a compact thermal camera with a 206 × 156 thermal sensor (32,136 thermal pixels), a 36-degree field of view, a temperature measurement range of −40 °C to 330 °C, and a frame rate of <9 Hz.
The gas sensors and thermal camera are used simultaneously to collect data for training and testing of the developed fusion model. The next part of the paper describes the data collection and its preprocessing in detail.

Data Collection and Preprocessing
To the best of the authors' knowledge, no dataset combining thermal images and gas sensor measurements for the representation of gases has yet been collected and made available in the open domain for direct use. Hence, in this work, data from the gas sensors and the thermal imaging camera is collected manually for model training and validation.
The experimental data is collected using an array of 7 gas sensors together with the Seek Thermal camera. The gas sensors were placed 1 mm apart. In the experimentation, two specific gas sources are used, namely, gases originating from perfumes and gases emitted by incense sticks. The experimentation setup and workflow for the data collection is shown in

Data Preprocessing
Deep learning models require a large amount of training data to operate effectively. Because only limited data was available, data augmentation techniques were used to increase the dataset size. The diversity of the limited set of thermal images is increased using augmentation techniques such as rescaling and resizing. Figure 5 shows the ground truth image (Figure 5a) and the images generated using rotation and tilting operations (Figure 5b). During the development of the CNN model, multiple experiments were carried out with different architectures, and various hyperparameter tuning approaches were applied.
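A rotation-based augmentation of the kind shown in Figure 5 can be sketched as follows. This is a simplified nearest-neighbour rotation on a single-channel image stored as a list of rows; the angle set, the fill value, and the function names are assumptions and do not reproduce the augmentation settings used in this work.

```python
import math


def rotate(image, angle_deg, fill=0):
    """Rotate a 2D image about its center using nearest-neighbour sampling.

    The output has the same size as the input; pixels that map outside
    the source image are set to `fill`.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel that lands on (x, y).
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = image[iy][ix]
    return out


def augment(image, angles=(-15, -10, -5, 5, 10, 15)):
    """Generate one rotated copy of the image per angle (angles are hypothetical)."""
    return [rotate(image, angle) for angle in angles]


original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
augmented = augment(original)
```

For RGB thermal images the same mapping would be applied to each channel, and an image library with interpolation would normally replace this hand-written loop.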

Feature Extraction from Thermal Images Using CNN
It was found that an architecture with three convolution-pooling layers followed by a dropout layer (dropout rate of 0.25) provides the best accuracy and recall. The model was trained with different optimizers, and the best performing optimizer was selected for further processing. An Adam optimizer with a learning rate of 0.001 and a decay of 1 × 10⁻³ is used, and L1-L2 regularization (0.005) is applied in the first two Conv-Max Pool pairs to avoid overfitting. The model is trained for 300 epochs, which resulted in a testing accuracy of 93%.
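The optimizer settings above can be made concrete with a small sketch of an Adam update combined with an L1-L2 penalty. Two points are assumptions rather than details taken from this work: the decay is interpreted as an inverse-time schedule, lr / (1 + decay × step), and the single regularization value 0.005 is applied to both the L1 and the L2 term. The class and function names are hypothetical.

```python
import math


def l1_l2_penalty(weights, l1=0.005, l2=0.005):
    """Regularization term added to the loss: l1 * |w| + l2 * w^2, summed over weights."""
    return sum(l1 * abs(w) + l2 * w * w for w in weights)


def l1_l2_grad(w, l1=0.005, l2=0.005):
    """Gradient of the penalty for one weight (subgradient 0 is used at w = 0)."""
    sign = (w > 0) - (w < 0)
    return l1 * sign + 2.0 * l2 * w


class Adam:
    """Adam optimizer with an assumed inverse-time learning rate decay."""

    def __init__(self, lr=0.001, decay=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
        self.lr, self.decay = lr, decay
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.t = 0      # number of updates performed so far
        self.m = None   # first-moment estimates
        self.v = None   # second-moment estimates

    def current_lr(self):
        return self.lr / (1.0 + self.decay * self.t)

    def step(self, weights, grads):
        if self.m is None:
            self.m = [0.0] * len(weights)
            self.v = [0.0] * len(weights)
        lr_t = self.current_lr()
        self.t += 1
        updated = []
        for k, (w, g) in enumerate(zip(weights, grads)):
            self.m[k] = self.beta1 * self.m[k] + (1.0 - self.beta1) * g
            self.v[k] = self.beta2 * self.v[k] + (1.0 - self.beta2) * g * g
            m_hat = self.m[k] / (1.0 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[k] / (1.0 - self.beta2 ** self.t)
            updated.append(w - lr_t * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


def toy_grad(weights):
    """Gradient of the toy loss (w - 3)^2 plus the L1-L2 penalty."""
    return [2.0 * (w - 3.0) + l1_l2_grad(w) for w in weights]


optimizer = Adam()
weights = optimizer.step([0.0], toy_grad([0.0]))
```

With these settings the first update moves each weight by roughly the learning rate, and under the assumed schedule the effective learning rate halves after 1000 updates.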

Feature Extraction from Gas Sensor Measurements Using LSTM
Sensor measurements are sequential, and hence a sequence model, namely an LSTM network, is used for extracting features from them. The architecture of the LSTM model consists of an input layer followed by a single LSTM layer with 5 cells. The LSTM layer is regularized with L2 regularization and is followed by a classifier layer with the Softmax activation function.
This LSTM network was trained with different optimizers at a fixed learning rate of 0.001 to find the best optimizer. Through trial and error, it was observed that the Adam optimizer fitted the model best and also converged quickly. Hence, the Adam optimizer is selected for the analysis and experimentation. The model is trained for 300 epochs, and we obtained a testing accuracy of 83%.
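The classifier layer that follows the LSTM layer can be sketched as a dense layer with a Softmax activation over the four classes. The class label strings, the hand-picked weights, and the example feature vector below are hypothetical and serve only to show how five LSTM features are mapped to class probabilities.

```python
import math

# Hypothetical label names for the four classes considered in this study.
CLASSES = ("no_gas", "perfume", "smoke", "mixture")


def dense(features, weights, bias):
    """Fully connected layer: one logit per class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, bias):
    """Return the most probable class label together with all class probabilities."""
    probs = softmax(dense(features, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Hypothetical weights: class k simply reads LSTM feature k (4 classes x 5 features).
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
bias = [0.0, 0.0, 0.0, 0.0]
lstm_features = [0.1, 0.9, 0.2, 0.0, 0.3]  # hypothetical output of the 5-cell LSTM layer

label, probs = classify(lstm_features, weights, bias)
```

In the trained model the weights and biases are learned jointly with the LSTM parameters.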

Multimodal Fusion of Image and Sequence Data
In this phase of the work, the features extracted from the thermal images and gas sensor measurements are fused for accurate decision making. The proposed architectures of the image and sequence data fusion model are presented in Figure 6 (early fusion) and Figure 7 (late fusion). The focus of the work was to build a fused classifier that uses both the gas sensor sequences and the thermal images. In the fusion process, the outputs of the LSTM and the CNN must be in the same feature space before fusion can be performed.
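A minimal sketch of feature-level (early) fusion is given below: each modality's feature vector is first projected to a common size, the projections are concatenated in a merge step, and a dense layer followed by Softmax produces a single four-class output. The layer sizes, the random untrained weights, and all names are assumptions, not the architecture of Figure 6.

```python
import math
import random


def dense(x, W, b):
    """Fully connected layer without activation."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(z):
    m = max(z)
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


def init_layer(n_out, n_in, rng):
    """Random, untrained weights and zero biases (illustrative only)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def early_fusion_predict(cnn_feat, lstm_feat, layers):
    # Project both modalities into feature vectors of the same size.
    img = relu(dense(cnn_feat, *layers["img"]))
    seq = relu(dense(lstm_feat, *layers["seq"]))
    # Merge step: concatenate the two feature vectors.
    fused = img + seq
    # Dense layer on the fused features, then a four-class Softmax output.
    hidden = relu(dense(fused, *layers["hidden"]))
    return softmax(dense(hidden, *layers["out"]))


rng = random.Random(0)
layers = {
    "img": init_layer(8, 16, rng),     # hypothetical: 16 CNN features -> 8
    "seq": init_layer(8, 5, rng),      # hypothetical: 5 LSTM features -> 8
    "hidden": init_layer(8, 16, rng),  # 8 + 8 fused features -> 8
    "out": init_layer(4, 8, rng),      # 8 -> 4 classes
}
cnn_feat = [rng.random() for _ in range(16)]
lstm_feat = [rng.random() for _ in range(5)]
probs = early_fusion_predict(cnn_feat, lstm_feat, layers)
```

Because the classifier sees both feature vectors at once, it can learn interactions between the modalities, which is not possible when only the final predictions are combined.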

Results and Discussion
A multimodal AI-based fusion model for gas detection and identification is presented in this work. Two modalities, namely, thermal images and gas sensor measurements, are considered. The CNN architecture is applied for extracting features from the thermal images, and the LSTM architecture is applied for extracting features from the gas sensor sequences. With the individual models, a testing accuracy of 93% for thermal images and 82% for gas sequences is achieved. However, the early fusion of features from both the CNN and the LSTM provides a testing accuracy of 96%, which is greater than the individual model accuracies. In the case of late fusion (max fusion and average fusion), the accuracy was observed to be around 96%. Table 3 shows the individual training and testing accuracy, loss, precision, recall, and F1 scores for all four classes considered in this study. The accuracy comparison for the individual models is shown in Figure 8. It can be observed that the fusion models outperform the individual models, as the predictions of the fusion models are based on data from both modalities. It can also be noticed that the accuracy of early fusion is slightly higher than that of the late fusion models, as in this case the fusion happens at the feature level, which allows interaction among the modalities. The confusion matrices are also plotted for all the frameworks and are shown in Figure 9. The fusion model is trained for 300 epochs, its accuracy and loss curves are analyzed, and it provides better performance than the individual models.
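For comparison with feature-level fusion, the two late fusion rules mentioned above (max fusion and average fusion) can be sketched at the decision level, where each model first produces its own class probabilities. The probability values are made up for illustration, and renormalizing the max-fused scores so that they sum to one is an assumption.

```python
def average_fusion(p_a, p_b):
    """Average the class probabilities predicted by the two models."""
    return [(a + b) / 2.0 for a, b in zip(p_a, p_b)]


def max_fusion(p_a, p_b):
    """Take the larger probability per class, then renormalize (assumed) to sum to 1."""
    merged = [max(a, b) for a, b in zip(p_a, p_b)]
    total = sum(merged)
    return [m / total for m in merged]


def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical per-model outputs for one sample over the four classes.
cnn_probs = [0.10, 0.60, 0.20, 0.10]   # thermal image model
lstm_probs = [0.05, 0.30, 0.55, 0.10]  # gas sensor model (disagrees with the CNN)

avg = average_fusion(cnn_probs, lstm_probs)
mx = max_fusion(cnn_probs, lstm_probs)
```

In this made-up sample the two models disagree (classes 1 and 2), and both fusion rules settle on class 1, the prediction made with the higher confidence.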
It is evident from the confusion matrices that the false positives and false negatives obtained from the fusion models are considerably lower than those of the individual models. Hence, it can be concluded that the fusion models outperform the individual models. The higher testing accuracy of the fusion models also demonstrates that the resultant fusion system is more robust and reliable than the individual models and performs gas identification and classification better. False positives and false negatives arise from various aspects of the model and the data. The primary reason could be that the gases mix to an extent that makes it difficult for the model to classify them clearly. The majority of the false predictions arise when the predicted probability of a class is only moderate. Training the model on more numerous and more varied data samples could reduce the false predictions caused by such borderline probabilities.
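The relationship between a confusion matrix and the reported per-class precision, recall, and F1 scores can be sketched as follows. The 4 × 4 matrix below is a made-up toy example, not a result of this study; rows are assumed to be true classes and columns predicted classes.

```python
def per_class_metrics(cm):
    """Return (precision, recall, f1) for each class of a square confusion matrix."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true class k, predicted otherwise
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, true class otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Toy confusion matrix (illustrative values only), 10 samples per true class.
toy_cm = [
    [9, 1, 0, 0],
    [0, 8, 2, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 10],
]
toy_metrics = per_class_metrics(toy_cm)
toy_accuracy = accuracy(toy_cm)
```

A false positive for one class is always a false negative for another, which is why reducing off-diagonal entries improves precision and recall together.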

Conclusions
In this work, a multimodal AI-based fusion framework for reliable identification and detection of gases is developed. We considered four classes (two individual gases, namely alcohol vapor obtained from perfume and smoke from incense sticks; one mixture of these gases; and one no-gas class) for data collection using two sensing modalities: a thermal camera for capturing the thermal signature of the gases and an array of 7 gas sensors for detecting specific gases. The collected data is unique and has 5200 samples, each comprising a thermal image and a gas sensor sequence of vector size 1 × 7. Both modalities were fused using early and late fusion techniques. In summary, the contribution of this work is in bringing in innovative