MultimodalGasData: Multimodal Dataset for Gas Detection and Classification

The detection of gas leakages is a crucial concern in chemical industries, coal mines, homes, and other settings. Early detection and identification of the type of gas are required to avoid damage to human lives and the environment. The MultimodalGasData presented in this paper is a novel collection of simultaneous data samples taken using seven different gas-detecting sensors and a thermal imaging camera. Low-cost sensors are generally less sensitive and less reliable; hence, they are unable to detect gases from a longer distance. To overcome this drawback of using sensors alone, a thermal camera that senses temperature changes was also used while collecting this multimodal dataset. The dataset has a total of 6400 samples, with 1600 samples per class for smoke, perfume, a mixture of smoke and perfume, and a neutral environment. The dataset can help researchers and system developers build and train state-of-the-art artificial intelligence models and systems.


Introduction
Innovations in engineering design are helping humankind solve industrial and social problems. Technology is solving many problems in the chemical industries; however, industrial hazards can damage the nearby environment. Gas leakage is a common problem in the chemical industries, and explosions, leaks, waste emissions, and fires are some of the common causes of industrial disasters. Careless residential waste disposal also leads to fumes and hazardous gas leakages, and the burning of wood is one of the major causes of air pollution. Gases leaked during mining operations also cause the deaths of workers. Even though machines undergo gas leak tests before installation, gas leakage incidents are still reported. Gas-detecting sensors are generally deployed near leakage-prone areas. However, each sensor is sensitive to a particular gas and may be unable to identify a leakage when a mixture of gases is present. Manual identification of gases using chemical apparatus is not always feasible because of the hazardous conditions; for example, a smoke leak blocks visibility. Hence, automatic gas identification is extremely important in gas leakage situations, as it helps to save both human lives and machines.
Data 2022, 7, 112
In the literature, many techniques have been employed to detect gas leakage. In recent years, Internet of Things (IoT)-enabled systems that detect gas leakage using various low-cost sensors have been developed [1][2][3][4]. However, these systems have limited capabilities because they use low-sensitivity sensors. Detecting a gas in a mixed environment is a challenging task that requires more advanced techniques. Chemical methods such as colorimetric tape and gas chromatography are popular for measuring the concentration of a particular gas in a mixed environment [5][6][7]. Khalaf proposed a least-squares-based method for gas classification and concentration estimation [8]. Machine learning techniques are employed for gas detection in [9], and deep neural network-based methods are applied for accurate gas detection in [10][11][12][13]. However, all of these methods rely on input data from multiple gas-detecting sensors: an array of sensors sensitive to different gases is formed and used in the experiments.
However, relying on gas sensors alone for recognition and detection raises a few concerns. The concentration of a gas in air is generally low, and in some cases the gas is not detectable with a standard set of gas sensors. This leads to ambiguous detection and reduces the detection accuracy of the framework. In addition, cheaper sensors are usually less accurate and may not give exact measurements. When a leakage occurs, the surrounding temperature increases, and this temperature change can also be identified using a thermal camera [14]. A thermal camera also has the advantage of detecting gas leakage from a longer distance.
As artificial intelligence and data analytics methods become more widely used, the need for accurate training datasets is increasing. An available dataset not only helps to train a system but also provides a platform and a basis for generating new datasets. Existing gas detection datasets have mostly been collected using sensor arrays alone. These datasets are discussed in the next section; however, all of them lack multimodal information.
This paper presents a novel multimodal dataset collected using gas sensors together with a thermal camera. The key aspects of the dataset are summarized below:

• The dataset is logged using two modalities: images from the thermal camera and numerical values from the gas sensors.
• The dataset is collected using two gas sources (smoke and perfume), which are used to generate data for four classes: smoke, perfume, a mixture of smoke and perfume, and a neutral environment.
• To the authors' knowledge, this is the only open-source multimodal dataset for gas detection.
• Low-cost sensors are generally less sensitive and may not detect gas emissions from longer distances. Hence, the use of multimodal data (thermal images along with sensor measurements) helps to detect the presence of gas from a long distance, even at low concentrations.
• The dataset is of interest to researchers and professionals working on gas detection and electronic noses (e-noses). It is also useful for system designers developing e-noses for robotic and autonomous systems.
• The dataset can be used to train machine learning and deep learning models, which can then be deployed in real-time systems.
• The current version of the dataset also provides a basis for future extensions that include more gases and their mixtures.
The rest of this paper is organized as follows: Section 2 provides information and links for existing sensor-based gas datasets. Section 3 presents a detailed description of the proposed MultimodalGasData and its collection procedure, along with the download link for the dataset. Section 4 concludes the paper.

Gas Detection Datasets
Various datasets are available to researchers working to advance gas detection. This section provides information and links for these datasets.

Gas Sensor Array Drift Dataset
Between 2007 and 2011, Alexander Vergara collected this dataset from gas sensors. It consists of 13,910 measurement samples. A total of 16 metal oxide gas sensors were used, and data were collected for six gases at different concentration levels. This open-source dataset is available for research purposes at https://archive.ics.uci.edu/ml/datasets/gas+sensor+array+drift+dataset (accessed on 5 June 2022). It was primarily collected to develop gas sensor drift compensation methods and is discussed in [15]. An extension of this dataset with concentration information is available at http://archive.ics.uci.edu/ml/datasets/Gas+Sensor+Array+Drift+Dataset+at+Different+Concentrations (accessed on 5 June 2022).

Gas Sensor Array under Dynamic Gas Mixtures
Jordi Fonollosa collected this dataset using gas sensors exposed to gas mixtures. The gases considered in this experiment are ethanol, ethylene, ammonia, acetaldehyde, acetone, and toluene. An array of 16 gas-detecting sensors was formed, and data were collected for about 12 h. Two different mixtures were considered: ethylene and methane in air, and ethylene and CO in air. This open-source dataset is available at https://www.kaggle.com/uciml/gas-sensor-array-under-dynamic-gas-mixtures/version/2 (accessed on 5 June 2022). Experiments using the dataset are presented in [16].

Gas Sensors for Home Activity Monitoring Data Set
This dataset consists of data collected using eight different metal oxide gas sensors for home activity monitoring. It contains 100 recordings taken from the sensor array. The complete dataset is available at http://archive.ics.uci.edu/ml/datasets/gas+sensors+for+home+activity+monitoring (accessed on 5 June 2022). Experiments using this dataset are presented in [17].

Gas Sensor Array Exposed to Turbulent Gas Mixtures Dataset
This dataset was collected using eight different chemo-resistive gas sensors. It provides measurements taken when a mixture of ethylene and methane is present at different concentration levels. It contains measurements for 180 instances (http://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+exposed+to+turbulent+gas+mixtures (accessed on 5 June 2022)). The experimental validation of the collected dataset is presented in [18].
All the datasets discussed above were collected using gas sensors only. To the authors' knowledge, no existing dataset has been collected simultaneously with gas sensors and a thermal camera. Using a thermal camera together with gas sensors can improve detection accuracy, even in areas with low gas concentrations. Hence, this paper introduces a multimodal dataset, collected with various gas sensors and a thermal camera, for the accurate identification and classification of gases.

Multimodal Gas Detection Dataset
This section describes the dataset collected using multiple gas-detecting sensors and a thermal camera.

Setup for Dataset Collection
The system used to collect the dataset consists of seven metal oxide gas-detecting sensors and a thermal camera. Figure 1 shows the framework used for data collection. The sensors used are MQ2, MQ3, MQ5, MQ6, MQ7, MQ8, and MQ135, and the thermal camera is a Seek Compact Thermal Imaging Camera (UW-AAA). The experimental setup used to collect the present dataset is shown in Figure 2. Four classes are generated from the two gas sources. Each of the seven gas-detecting sensors responds to multiple gases, and their sensitivities differ.

Gas Sensors
Gas sensors detect the presence of a particular gas and generate a corresponding electrical signal. Metal oxide semiconductor (MOS) gas sensors of the MQ series are a suitable choice, as they are compact, respond quickly, and have a long service life [20,21]. Each sensor contains a heating element and delivers an analog output voltage that corresponds to the gas concentration. Table 1 lists the sensors and the gases to which each is sensitive.
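To illustrate how an output voltage relates to gas concentration: MQ-series sensors are typically wired so that the sensing resistance R_s forms a voltage divider with a load resistor R_L, and the datasheet relation R_s = R_L (V_c − V_out) / V_out recovers R_s from the measured voltage. The datasheet sensitivity curves then relate the ratio R_s/R_0 (R_0 being R_s in clean air) to concentration. The sketch below assumes a 5 V circuit voltage and a 10 kΩ load resistor; both values, and the function names, are hypothetical and are not reported for this setup.

```python
def sensor_resistance(v_out: float, v_c: float = 5.0, r_load: float = 10_000.0) -> float:
    """Sensing resistance R_s (ohms) of an MQ-type sensor in a voltage divider.

    v_out:  voltage measured across the load resistor (V)
    v_c:    circuit supply voltage (V); 5.0 V is an assumed value
    r_load: load resistor R_L (ohms); 10 kOhm is an assumed value
    """
    if not 0.0 < v_out <= v_c:
        raise ValueError("v_out must be in (0, v_c]")
    # R_s = R_L * (V_c - V_out) / V_out, from the divider equation
    return r_load * (v_c - v_out) / v_out


def resistance_ratio(r_s: float, r_0: float) -> float:
    """Ratio R_s / R_0, where R_0 is R_s measured in clean air during calibration."""
    return r_s / r_0
```

For example, with these defaults an output of 2.5 V corresponds to R_s = 10 kΩ. For the reducing gases these sensors target, a higher output voltage means a lower R_s, that is, more gas.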

Thermal Camera
A thermal imaging camera is a device that detects temperature variations from emitted infrared radiation. Each pixel of the camera is essentially an infrared temperature sensor, and all pixels measure the temperatures of their respective points simultaneously. The measured temperatures are then mapped to colors to produce an RGB image. Unlike conventional cameras, a thermal camera is not affected by poor lighting and can operate in any weather [22]. The Seek Compact Thermal Imaging Camera used here is compact, has a resolution of 206 × 156 pixels, measures temperatures from −40 °C to 330 °C, and has a frame rate below 9 Hz.
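The step from per-pixel temperatures to an image can be sketched as a mapping from each temperature to an 8-bit intensity, which a color palette would then look up. This is a minimal sketch under stated assumptions: it uses a fixed linear mapping over the camera's quoted −40 °C to 330 °C range and a 206-column by 156-row frame; the camera's actual palette and scaling are not documented here, and real cameras may instead scale to the minimum and maximum of each scene. All names are hypothetical.

```python
# Range and resolution quoted in the text; the linear mapping is an assumption.
T_MIN_C = -40.0
T_MAX_C = 330.0
WIDTH, HEIGHT = 206, 156


def temperature_to_gray(t_c: float, t_min: float = T_MIN_C, t_max: float = T_MAX_C) -> int:
    """Linearly map a temperature (deg C) to an 8-bit intensity 0..255, clamping out-of-range values."""
    t = min(max(t_c, t_min), t_max)
    return round((t - t_min) / (t_max - t_min) * 255)


def frame_to_gray(frame):
    """Map a 2-D list of temperatures (rows x columns) to 8-bit intensities."""
    return [[temperature_to_gray(t) for t in row] for row in frame]
```

With this mapping, a uniform 25 °C scene becomes intensity 45 everywhere; a palette (for example, an "iron" color map) would convert each intensity to an RGB triple.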
In this work, the thermal camera and gas sensors are used simultaneously to collect the dataset. Photographs of the thermal camera and sensors used in this work are shown in Figure 3.

Process of Dataset Collection
In this work, the dataset is generated using two gas sources, namely perfume and smoke. A Park Avenue deodorant containing 95% alcohol is used as the perfume source, and incense sticks are used to generate smoke. Smoke is mainly a mixture of carbon monoxide, carbon dioxide, nitrogen dioxide, and sulfur dioxide, along with small quantities of other gases [23].
The MultimodalGasData dataset presented in this work was collected using seven gas sensors together with a thermal camera. During the experiments, the gas sensors were kept at a distance of 1 mm from each other. Two distinct gas sources were used: the gases released by spraying perfume, and the smoke produced by lighting incense sticks.
Thermal images and gas sensor readings were logged for each gas source as an individual class. A further class was the mixture of gases obtained by releasing gas from both sources together. Data were logged continuously every 2 s for 90 min. During this period, gas was released into the environment every 15 s for the first 30 min, every 30 s for the next 30 min, and every 45 s for the last 30 min. For illustration, a few data samples are shown in Table 2, which gives two thermal images and the corresponding gas sensor measurements for each class. The images are data samples collected with the thermal camera, and the numerical values are the corresponding gas sensor readings.
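The release and logging schedule described above can be written out explicitly. This is a minimal sketch, assuming one continuous 90-minute run in which the first release of each 30-minute phase happens at the start of that phase; the constant and function names are hypothetical.

```python
PHASE_S = 30 * 60                   # each phase lasts 30 min
RELEASE_INTERVALS_S = (15, 30, 45)  # release interval in phases 1, 2 and 3
LOG_INTERVAL_S = 2                  # one sensor reading and thermal frame every 2 s


def release_times(phase_s=PHASE_S, intervals=RELEASE_INTERVALS_S):
    """Seconds from the start of the run at which gas is released."""
    times = []
    for k, step in enumerate(intervals):
        start = k * phase_s  # phase k begins after k full phases
        times.extend(range(start, start + phase_s, step))
    return times


def log_times(total_s=len(RELEASE_INTERVALS_S) * PHASE_S, step=LOG_INTERVAL_S):
    """Seconds at which a measurement is recorded over the whole run."""
    return list(range(0, total_s, step))
```

Under these assumptions, the run contains 120 + 60 + 40 = 220 gas releases and 2700 logging instants.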
The numerical values are the digital equivalents of the analog sensor signals, obtained with a 10-bit analog-to-digital converter, so each reading lies in the range 0 to 1023. Because the way the gases are discharged could affect their thermal signatures, the gases were always released in the same manner to maintain uniformity, and care was taken to release the same amount of gas each time.
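A 10-bit converter maps its input range onto the codes 0 to 1023, so a stored reading can be turned back into an approximate sensor voltage. This is a minimal sketch, assuming a 5 V ADC reference (the reference voltage is not stated for this setup) and dividing by 1023 so that the largest code maps to the full reference; some conventions divide by 1024 instead, which changes results by under 0.1%. The function name is hypothetical.

```python
ADC_BITS = 10
ADC_MAX = (1 << ADC_BITS) - 1  # 1023, the largest 10-bit code


def adc_to_voltage(raw: int, v_ref: float = 5.0) -> float:
    """Convert a 10-bit ADC code (0..1023) to volts, assuming reference voltage v_ref."""
    if not 0 <= raw <= ADC_MAX:
        raise ValueError(f"raw must be in 0..{ADC_MAX}")
    return raw * v_ref / ADC_MAX
```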

Dataset Description
The MultimodalGasData dataset contains 6400 samples in total, divided equally among four classes: 1600 samples of the perfume class, 1600 samples of the smoke class, 1600 samples of the mixture (perfume and smoke) class, and 1600 samples of the neutral environment (no gas) class. To show the variation in the dataset, a statistical analysis was also carried out; the statistical properties of the gas sensor data are presented in Table 3. The dataset is published in the Mendeley Data repository at https://data.mendeley.com/datasets/zkwgkjkjn9/2 (accessed on 10 June 2022). The repository contains two folders, namely:
1. Gas Sensors Measurements
2. Thermal Camera Images
The Gas Sensors Measurements folder contains a CSV file with the measurements from the seven gas sensors. In this file, the first column is the serial number; the next seven columns hold the gas sensor measurements in the order listed in the tables; the ninth column gives the class of the measurements (no gas, perfume, smoke, mixture); and the tenth column gives the name of the corresponding thermal image. The Thermal Camera Images folder contains four zip files, one per class (no gas, perfume, smoke, mixture), each holding the images of that class. Thermal images are named following the convention 'serialNumber_Class'. Matching serial numbers in the CSV file and in the thermal image names indicate that the measurements were taken simultaneously.
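The CSV layout described above can be read by column position, paired with the thermal image names, and summarized per class. This is a minimal sketch under several assumptions: the file has one header row whose exact column names are not relied on, the sensor columns follow the order MQ2, MQ3, MQ5, MQ6, MQ7, MQ8, MQ135 as listed in the setup description, and serial numbers are integers. All names in the code are hypothetical; the per-class means are only one of the statistics a Table 3-style summary could contain.

```python
import csv
import statistics
from collections import Counter, defaultdict
from dataclasses import dataclass

# Assumed sensor order for columns 2-8 of the CSV file.
SENSORS = ("MQ2", "MQ3", "MQ5", "MQ6", "MQ7", "MQ8", "MQ135")


@dataclass
class Sample:
    serial: int
    readings: tuple  # seven 10-bit sensor values
    label: str       # class name as written in the ninth column
    image_name: str  # 'serialNumber_Class' name of the paired thermal image


def load_samples(lines, has_header=True):
    """Parse rows of the measurements CSV (any iterable of text lines) by column position."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row without assuming its names
    samples = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        samples.append(Sample(
            serial=int(row[0]),
            readings=tuple(float(v) for v in row[1:8]),
            label=row[8].strip(),
            image_name=row[9].strip(),
        ))
    return samples


def image_serial(image_name):
    """Serial number from a 'serialNumber_Class' file name, with or without an extension."""
    stem = image_name.rsplit(".", 1)[0]
    return int(stem.split("_", 1)[0])


def class_counts(samples):
    """Number of samples per class (1600 each for the full dataset, per the description)."""
    return Counter(s.label for s in samples)


def per_class_means(samples):
    """Mean reading of each sensor, computed separately for each class."""
    by_class = defaultdict(list)
    for s in samples:
        by_class[s.label].append(s.readings)
    return {
        label: dict(zip(SENSORS, (statistics.mean(col) for col in zip(*rows))))
        for label, rows in by_class.items()
    }
```

A quick consistency check is that `image_serial(s.image_name) == s.serial` holds for every sample, which confirms that each sensor row is paired with the thermal image captured at the same time.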

Conclusions
MultimodalGasData is a dataset collected for researchers and system designers working on gas detection and classification applications. It is a novel multimodal dataset of samples collected using multiple gas-detecting sensors and a thermal imaging camera, which allows the presence of a gas to be detected using two modalities: images and numerical values. The dataset can be used to train machine learning and deep learning models, and the resulting algorithms can then be deployed in real-time systems. The current version of the dataset also provides a basis for researchers to extend it with more gases and their combinations and to increase the sample size.