Ensemble Machine Learning Model for Accurate Air Pollution Detection Using Commercial Gas Sensors

This paper presents the results of developing an ensemble machine learning model that combines commercial gas sensors for accurate concentration detection. Commercial gas sensors are low cost and have become key components of IoT devices for atmospheric monitoring. However, their coarse native resolution and poor selectivity limit their performance. Thus, we adopted recurrent neural network (RNN) models to extract the characteristics of time-series concentration data and improve detection accuracy. First, four types of RNN models (LSTM, GRU, Bi-LSTM, and Bi-GRU) were optimized to identify the best-performing single weak models for CO, O3, and NO2. Next, ensemble models that integrate multiple single weak models with a dynamic model were defined and trained. The testing results show that the ensemble models outperform the single weak models. Further, a retraining procedure was proposed so that the ensemble model can adapt more flexibly to changing environmental conditions. The significantly improved coefficients of determination show that retraining helps the ensemble models maintain stable long-term sensing performance in an atmospheric environment. The results can serve as a reference for applications of IoT devices with commercial gas sensors in environmental monitoring.


Introduction
Effectively monitoring and controlling air quality has become an important public concern. Air pollution significantly impacts human health, with effects ranging from mild chronic respiratory symptoms to acute respiratory infections and the exacerbation of pre-existing heart and lung diseases [1,2]. Even exposure to mild air pollution can shorten people's lives [3]. Thus, people living in urban areas or near industrial areas, who have a higher risk of exposure to elevated air pollution, demand information on atmospheric air quality [4].
To acquire air quality information, governments and environmental protection agencies have set up fixed-site air quality monitoring stations in various regions. These stations use accurate instruments to regularly monitor air quality, analyze pollutant concentrations, and provide information to the public [5,6]. However, building more fixed-site stations is difficult because of terrain limitations and the high cost of setting up and maintaining them. The monitoring data provided by the stations are therefore spatially sparse and cannot meet the increasing demand for air quality information. Various techniques have been proposed to improve the spatial density of air quality information, such as short-term air quality campaigns with a mobile laboratory [7], interpolation […] In fact, both ANNs and RNNs have made good progress in in-field gas sensing. ANNs first demonstrated the feasibility of gas classification and concentration detection; RNNs further processed the sequential measured concentration data and interference factors to improve accuracy. Moreover, sensors may perform inconsistently in different measurement scenarios. For example, performance can vary significantly between areas because of different environmental conditions [10,26,29]. In practical applications, the long-term correlation between low-cost gas sensors and reference instruments is not stable, mainly because of changes in field temperature and humidity [10]. Thus, generalizability remains an open problem for gas-sensing calibration technology.
Ensemble learning is a recent approach to the performance bottleneck of single deep learning models: it trains multiple single models and combines their predictions to improve prediction performance [30,31]. Ensemble machine learning has been widely used in fields such as face recognition, target tracking, and bioinformatics [32][33][34][35]. In contrast, current research on gas sensing with machine learning is based entirely on single models, so ensemble models have the potential to improve gas-sensing performance.
Therefore, this paper develops ensemble models to monitor in-field gas concentrations with low-cost commercial gas sensors. The generalizability of the trained models to sensors operating in different environmental conditions was also tested. First, we collected CO, O3, and NO2 concentrations and atmospheric conditions with homemade IoT devices for model training. The data preprocessing included outlier detection and normalization. Second, four types of RNN models (LSTM, GRU, Bi-LSTM, and Bi-GRU) were introduced, and a loss function and an evaluation function were defined for training and optimizing single RNN models. Third, the four types of optimized single RNN models were presented. Then, an ensemble model composed of static models (the optimized single RNN models) and a dynamic model was constructed and trained. To improve generalizability, a retraining procedure was applied to the ensemble model so that it adapts more flexibly to various environmental conditions and improves long-term sensing performance. Finally, the discussion and conclusions are presented.

IoT Device Designing and Data Collection
To detect gaseous pollutants in the environment, we developed a low-cost wireless gas-sensing device. Figure 1 shows the low-cost Internet of Things (IoT) [36] device used in this research. It consists of four components: gas sensors, a NodeMCU WIFI chip, a homemade PCB, and an Arduino Mega2560. The gas sensors are commercially available components that detect the concentrations of the target gases (CO, O3, and NO2) and simultaneously measure atmospheric temperature and humidity. The Arduino Mega2560 stores the data detected by the sensors, while the NodeMCU WIFI chip uploads the stored data to the cloud via WIFI. The homemade PCB, designed by the authors, integrates the interfaces of the electronic components, which reduces the occupied space and increases the stability of the device. The typical total cost of the device is 200 USD.
The low-cost IoT devices were then installed at a monitoring station established by the Environmental Protection Administration (EPA), Taiwan, located at Guting Elementary School, Taipei City. Air quality measurements were taken with the station's instruments (HORIBA APMA360 for CO, ECOTECH ML9810 for O3, and ECOTECH ML9841 for NO2) and published on the EPA website. Our IoT devices were placed in the instrument shelter of the Guting monitoring station so that their environment was not disturbed by sunlight, rain, or ground radiation. The IoT devices then continuously measured the ambient temperature, humidity, and gas concentrations for three months. The data collected from 5 January to 23 March 2021 were used to develop the machine learning models (training, validation, and first testing); the data recorded from 23 March to 14 April were used for additional testing to examine the models' long-term stability.
The IoT architecture is divided into perception, network, and application layers. As shown in Figure 2, the hardware of the perception layer is the low-cost IoT device, which detects target gas concentrations and transmits the concentration data to the network layer. The hardware of the network layer is a 4G WIFI router, which receives the data transmitted by the perception layer and uploads them to Google Cloud. The hardware of the application layer is a personal computer, which retrieves the perception-layer data from Google Cloud and preprocesses them. Preprocessing includes outlier cleaning, data normalization, and feature selection. The preprocessed atmospheric data are then passed through the IoT architecture to the trained AI model for gas concentration calculation.
The commercial gas sensors used in the IoT device have different performance characteristics. Table 1 shows their detection ranges and resolutions, and the cross-sensitivity information of each gas sensor is listed in Table 2.
The manufacturer of these gas sensors is SPEC [37]. This study refers to the annual average concentration of each air pollutant in 2020, as shown in Table 3 [38]. The annual average CO concentration in 2020 was 0.35 ppm, greater than the resolution of the CO gas sensor (0.1 ppm in Table 1). Thus, the CO sensor can respond effectively to the target gas at this concentration level. The annual average O3 concentration in 2020 was 30.9 ppb, slightly higher than the resolution of the O3 gas sensor (20 ppb). The O3 sensor can therefore also respond to the target gas at this level, but its concentration interpretation error is higher because its resolution is coarse relative to ambient levels. The annual average NO2 concentration in 2020 was 11.16 ppb, below the resolution of the NO2 gas sensor (20 ppb), with a standard deviation of ±5.01 ppb, indicating that NO2 concentrations are widely dispersed. Therefore, although the NO2 sensor may respond to the target gas at this concentration level, its concentration interpretation error is larger than those of the CO and O3 sensors because of its poor resolution.
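The comparison above amounts to asking how many resolution steps the typical ambient level spans. The short Python sketch below uses only the annual means and resolutions quoted in the text; the ratio itself is an illustrative summary of ours, not a metric defined in the paper.

```python
# Values quoted in the text: 2020 annual means (Table 3) and sensor resolutions (Table 1).
# Units: CO in ppm; O3 and NO2 in ppb (each ratio is unit-free).
ANNUAL_MEAN = {"CO": 0.35, "O3": 30.9, "NO2": 11.16}
RESOLUTION = {"CO": 0.1, "O3": 20.0, "NO2": 20.0}


def mean_to_resolution(gas: str) -> float:
    """Number of sensor resolution steps covered by the annual mean concentration."""
    return ANNUAL_MEAN[gas] / RESOLUTION[gas]


for gas in ("CO", "O3", "NO2"):
    print(f"{gas}: {mean_to_resolution(gas):.2f} resolution steps")
```

A ratio well above 1 (CO) means ambient levels are resolved comfortably, a ratio near 1 (O3) means they are only coarsely resolved, and a ratio below 1 (NO2) means the typical level lies under a single resolution step. This matches the ordering of interpretation errors described above.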

Outlier Detection for Feature Selection
In this study, a local outlier factor (LOF) algorithm [39] filtered outliers that appeared during data collection. Because the low-cost IoT devices monitored the environment continuously over a long period, their cache became insufficient, so the devices were designed to reset every 5 min. During a reset, conflicts among the Arduino Mega2560, the NodeMCU WIFI chip, and the gas sensor occasionally occurred and introduced outliers into the uploaded data; we refer to this as data loss. Data loss caused the gas concentration, temperature, and humidity features to change abruptly within a short period and disturbed data preparation. Therefore, we used the LOF algorithm to filter the outliers generated by data loss. The LOF algorithm estimates the local density of each sample point and compares it with the local densities of its neighboring points. Whether a sample point is an outlier depends on how much its local density differs from those of its neighboring reference points: if its local density is significantly lower than that of its neighbors, the point is regarded as an outlier; otherwise, it is regarded as an inlier. Filtering outliers with this method prevents abrupt data changes caused by data loss and effectively improves the quality of the training data.
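The LOF idea described above can be sketched in a few lines of NumPy. This is a simplified, from-scratch version that ignores ties in the k-nearest-neighbor set. The paper does not state which implementation, neighbor count k, or decision threshold was used, so `k=5` and `threshold=1.5` below are assumptions. scikit-learn's `LocalOutlierFactor` provides a complete implementation.

```python
import numpy as np


def local_outlier_factor(X, k=5):
    """Return the LOF score of each row of X; scores well above 1 suggest outliers."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances; a point is not counted as its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]           # indices of the k nearest neighbors
    k_dist = dist[np.arange(n), nbrs[:, -1]]         # distance to the k-th neighbor
    # Reachability distance: reach(p, o) = max(k_dist(o), d(p, o)).
    reach = np.maximum(k_dist[nbrs], dist[np.arange(n)[:, None], nbrs])
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)         # local reachability density
    # LOF: mean density of the neighbors relative to the point's own density.
    return lrd[nbrs].mean(axis=1) / lrd


def drop_outliers(X, k=5, threshold=1.5):
    """Keep only the rows whose LOF score is below a (hypothetical) threshold."""
    scores = local_outlier_factor(X, k)
    return np.asarray(X)[scores < threshold], scores
```

Each row of `X` would hold one time step's features (for example, raw gas, temperature, and humidity values). Because LOF is distance based, features with very different ranges should be scaled comparably before scoring.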
For example, the features collected by the O3 gas sensor are shown in Figure 3, which contains the raw values (orange line) and the corresponding physical quantities (blue line) of concentration, temperature, and humidity. The raw values are the digitized sensor signals from the ADC (in nanoamperes, nA). They are converted into the corresponding physical quantities with the conversion formula provided by the supplier: the O3 concentration (ppb), temperature (°C), and relative humidity (RH). The converted physical quantities are stored as integer variables, which discards the digits after the decimal point. Therefore, in feature selection, only the raw values of the gas sensor are retained, and the physical quantities obtained from the conversion formula are discarded.
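The reason for keeping only the raw values can be illustrated with a toy conversion. The supplier's formula is not given in the text, so the linear factor `SENSITIVITY_PPB_PER_NA` below is purely hypothetical. The point is only that integer storage merges distinct raw readings.

```python
# Hypothetical linear conversion from raw sensor current (nA) to concentration (ppb);
# the supplier's actual formula is not given in the text.
SENSITIVITY_PPB_PER_NA = 0.8  # illustrative value only


def to_ppb_stored_as_int(raw_na: float) -> int:
    """Convert to ppb, then store as an integer (the fractional part is truncated)."""
    return int(raw_na * SENSITIVITY_PPB_PER_NA)


raw_readings = [12.1, 12.4, 12.9, 13.6]  # four distinct raw values (nA)
stored = [to_ppb_stored_as_int(r) for r in raw_readings]
print(stored)  # → [9, 9, 10, 10]
```

Four distinct raw readings collapse into two stored values, so the model would lose information that the raw signal still carries.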

Data Preprocessing-Normalization and Division
The datasets collected from the environment required processing before the machine learning models were developed. The data collected by different gas sensors vary over different ranges, so normalization was adopted to prevent features with large ranges from dominating. Data were normalized with min-max scaling (MinMaxScaler):
$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$
where X_max is the maximum value of the data, X_min is the minimum value, and X is the original value. After normalization, X_norm lies in the range [0, 1], and the trends of the data are preserved.
The data collected from the environment are time-dependent and sequential. The commonly used random shuffling is unsuitable for developing our machine learning models, because data leakage occurs if sequential data are normalized after random shuffling: the normalized training data may then contain information from the testing dataset, such as its upper and lower limits, which can lead to over-optimistic results in offline model training.
This study divided the data into three parts: the training and validation sets used in the training phase and the testing set used in the testing phase. As an example, Figure 4 shows these three parts for the CO gas concentration; the blue, orange, and green lines represent the training, validation, and testing data, respectively. The sets do not overlap in time, which avoids data leakage during model development. The training set is used to train the model, the validation set is used to tune the hyperparameters by minimizing the validation error, and the testing set is used to evaluate the actual performance of each target-gas model. All datasets in this paper were divided in this way.
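The division and normalization steps above can be combined in a short NumPy sketch: split chronologically, then fit X_min and X_max on the training block only and reuse them for the other blocks. The 70/15/15 split fractions are assumptions; the paper does not state them in this section.

```python
import numpy as np


def chronological_split(data, train_frac=0.7, val_frac=0.15):
    """Split a time-ordered array into non-overlapping train/validation/test blocks."""
    n = len(data)
    i_train = int(round(n * train_frac))
    i_val = i_train + int(round(n * val_frac))
    return data[:i_train], data[i_train:i_val], data[i_val:]


class MinMaxNormalizer:
    """X_norm = (X - X_min) / (X_max - X_min), with X_min and X_max from training data only."""

    def fit(self, train):
        self.x_min = train.min(axis=0)
        self.x_max = train.max(axis=0)
        return self

    def transform(self, x):
        # Guard against constant features (zero range) to avoid division by zero.
        span = np.where(self.x_max > self.x_min, self.x_max - self.x_min, 1.0)
        return (x - self.x_min) / span


# Usage with a stand-in, steadily increasing feature column.
series = np.arange(100, dtype=float).reshape(-1, 1)
train, val, test = chronological_split(series)
scaler = MinMaxNormalizer().fit(train)
train_n, val_n, test_n = (scaler.transform(part) for part in (train, val, test))
```

Validation and test values may fall outside [0, 1]. That is expected: fitting the scaler on all data instead would hide exactly the future upper and lower limits that the text identifies as a leakage path.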
Sensors 2022, 21, x FOR PEER REVIEW

Basics of Machine Learning
In machine learning, hyperparameters are parameters that control the learning process. Proper hyperparameters can make the model converge to a better local minimum. Hyperparameters include model hyperparameters, such as the model type, the number of layers, and the number of neurons, and algorithm hyperparameters, such as the learning rate. Because the data in this study are time series, a recurrent neural network (RNN) [21] was selected. An RNN carries a hidden state across time steps and, in its gated variants, uses gate units to control the flow of information. This allows it to extract dependencies between successive data points and makes it well suited to time-series data.
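Before time-series data can be fed to an RNN, they are usually cut into fixed-length windows of consecutive time steps. The following minimal NumPy sketch assumes a hypothetical window length (24 steps by default) and a target aligned with the last step of each window. The window length actually used in the paper is not stated in this section.

```python
import numpy as np


def make_windows(features, targets, window=24):
    """Turn a time-ordered (T, F) feature array into RNN inputs of shape (N, window, F).

    Each sample holds `window` consecutive time steps and is paired with the target
    at the last step of that window, so no later time step enters a sample.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(features) - window + 1
    X = np.stack([features[i:i + window] for i in range(n)])
    y = targets[window - 1:window - 1 + n]
    return X, y
```

For example, 10 time steps with 2 features and `window=4` yield 7 samples of shape (4, 2).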

Recurrent Neural Network
The four types of RNNs used in the subsequent experiments are introduced in this section. The first is the Long Short-Term Memory (LSTM) network. An LSTM comprises multiple memory cells, each with three gate units: the forget gate, the input gate, and the output gate. The LSTM controls the flow of hidden-state information through these gates, which alleviates the exploding and vanishing gradient problems to which a simple RNN is prone when dealing with long time series. Figure 5 shows a schematic diagram of an LSTM composed of three memory cells, and its parameters are defined in the following equations. The forget gate is denoted by f_t, the input gate by i_t, and the output gate by o_t, where the index t denotes the time step. x_t and h_t are the input and the hidden state at the current time step, c̃_t is the updating value of the cell state, and c_t is the cell state at the current time step. The forget, input, and output gates are activated by the sigmoid function, and the updating value of the cell state by the hyperbolic tangent function. W_f, W_i, W_o, and W_c are the weight matrices of the forget gate, input gate, output gate, and cell-state update, respectively, and b_f, b_i, b_o, and b_c are the corresponding bias terms. The forget gate f_t controls how much of c_{t-1}, the cell state at the previous time step, is retained in the current cell state c_t; the input gate i_t determines how much of the current updating value c̃_t is added; and the output gate o_t determines the proportion of the cell state c_t that is output as the hidden state h_t of the memory cell at the current time step. For more information on LSTM, refer to Hochreiter and Schmidhuber [23].
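The LSTM equations referred to above are not reproduced in this copy. The NumPy sketch below shows one memory-cell update written with the symbols defined in the text. It follows the widely used LSTM formulation with a forget gate (as implemented, for example, in Keras). The concatenated-input form [h_{t-1}, x_t] and the dict-based parameter layout are our own choices, and the paper's exact equations may differ in notation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W and b are dicts keyed by 'f', 'i', 'o', 'c'; each W[k] has shape (H, H + D) and
    acts on the concatenation [h_{t-1}, x_t]:
        f_t  = sigmoid(W_f [h_{t-1}, x_t] + b_f)   forget gate
        i_t  = sigmoid(W_i [h_{t-1}, x_t] + b_i)   input gate
        o_t  = sigmoid(W_o [h_{t-1}, x_t] + b_o)   output gate
        c~_t = tanh(W_c [h_{t-1}, x_t] + b_c)      updating value of the cell state
        c_t  = f_t * c_{t-1} + i_t * c~_t
        h_t  = o_t * tanh(c_t)
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])
    i_t = sigmoid(W["i"] @ z + b["i"])
    o_t = sigmoid(W["o"] @ z + b["o"])
    c_tilde = np.tanh(W["c"] @ z + b["c"])
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


# Usage: run a random 5-step sequence through one cell (H hidden units, D input features).
rng = np.random.default_rng(0)
H, D = 4, 3
W = {k: rng.normal(scale=0.1, size=(H, H + D)) for k in "fioc"}
b = {k: np.zeros(H) for k in "fioc"}
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, c = lstm_step(x_t, h, c, W, b)
```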
The second type of RNN is the Gated Recurrent Unit (GRU) network, a simplified version of LSTM. Each GRU memory cell has two gate units: the update gate and the reset gate. Compared with LSTM, GRU has fewer parameters, which reduces training time and computational cost. Figure 6 shows a schematic diagram of a GRU composed of three memory cells, and Equations (8)-(11) define the operators.
In these equations, r_t is the reset gate, z_t the update gate, and h̃_t the updating value of the hidden state, while x_t and h_t are the input and the hidden state at the current time step, respectively. The reset gate and the update gate are activated by the sigmoid function, and the updating value of the hidden state by the hyperbolic tangent function. W_r, W_z, and W_h are the weight matrices of the reset gate r_t, the update gate z_t, and the updating value h̃_t, and b_r, b_z, and b_h are the corresponding bias terms. The update gate z_t controls the balance between the hidden state h_{t-1} at the previous time step (t - 1) and the updating value h̃_t at the current time step; the reset gate r_t determines how much of the previous hidden state h_{t-1} is reset when computing h̃_t. More information on GRU is given in reference [24].
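Equations (8)-(11) are also missing from this copy. The sketch below shows one GRU update using the symbols defined above, in the standard formulation of Cho et al. Conventions differ between references and libraries, for example in which of z_t and 1 - z_t weights h_{t-1} and in where the reset gate is applied, so this is one common variant rather than necessarily the paper's exact form. The dict-based parameter layout is our own choice.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(x_t, h_prev, W, b):
    """One GRU time step; W[k] has shape (H, H + D) and b[k] shape (H,), for k in 'r', 'z', 'h'.

        r_t  = sigmoid(W_r [h_{t-1}, x_t] + b_r)          reset gate
        z_t  = sigmoid(W_z [h_{t-1}, x_t] + b_z)          update gate
        h~_t = tanh(W_h [r_t * h_{t-1}, x_t] + b_h)       updating value of the hidden state
        h_t  = (1 - z_t) * h_{t-1} + z_t * h~_t
    """
    z_in = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W["r"] @ z_in + b["r"])
    z_t = sigmoid(W["z"] @ z_in + b["z"])
    h_tilde = np.tanh(W["h"] @ np.concatenate([r_t * h_prev, x_t]) + b["h"])
    return (1.0 - z_t) * h_prev + z_t * h_tilde
```

Compared with `lstm_step`-style code, there is no separate cell state and one fewer weight matrix, which is where the parameter savings mentioned above come from.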
The third and fourth types of RNNs are the Bi-directional Long Short-Term Memory (Bi-LSTM) and Bi-directional Gated Recurrent Unit (Bi-GRU) networks. Compared with LSTM and GRU, Bi-LSTM and Bi-GRU add a second time flow: in addition to the forward RNN, a backward RNN is built whose time flow is exactly opposite. A Bi-RNN therefore contains two RNNs with opposite time flows, the forward layer and the backward layer, and the information of the memory cells at each time step is provided by the outputs of both. Compared with a conventional RNN, a Bi-RNN can thus consider the information of the whole time series and make full use of its context. For more information on Bi-RNN, refer to Schuster and Paliwal [25].
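The bidirectional wrapping can be sketched independently of the cell type. For brevity, the NumPy sketch below uses a plain tanh RNN cell (parameters `W`, `U`, `b` are hypothetical names) instead of LSTM or GRU cells. Bi-LSTM and Bi-GRU apply the same wrapping to those cells, and Keras exposes it as `tf.keras.layers.Bidirectional`.

```python
import numpy as np


def rnn_pass(xs, W, U, b):
    """Simple tanh RNN over xs of shape (T, D); returns hidden states of shape (T, H)."""
    h = np.zeros(U.shape[0])
    out = []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
        out.append(h)
    return np.stack(out)


def bidirectional_pass(xs, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states at each time step -> (T, 2H)."""
    h_fwd = rnn_pass(xs, *fwd_params)
    # Run the backward layer on the reversed sequence, then flip its outputs back so
    # that row t of both halves refers to the same time step.
    h_bwd = rnn_pass(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)
```

At every time step t, the forward half has seen x_0 to x_t and the backward half has seen x_t to x_{T-1}, which is the "whole time series" context described above.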

Construction of Model
In this study, the machine learning models used to detect gas concentrations include three parts: the input layer, the hidden layer, and the output layer, as shown in Figure 9. The input layer preprocesses the raw data using the LOF algorithm and MinMaxScaler normalization. The hidden layer comprises the recurrent neural layer (the blue box in Figure 9) and the fully connected layers (the orange circles, referred to as FC layers). The recurrent neural layer in Figure 9 shows Bi-LSTM memory cells (hereafter referred to as BiL layers) as an example. The numbers of recurrent layers and neurons vary with the gas model, and the number of FC (dense) layers is set to two. The weights of the hidden layer are optimized by the Adam optimizer [40] with a mean square error (MSE) loss function so that the loss converges to a better local minimum. The output layer receives the information from the preceding hidden layer and computes the detected gas concentration through a linear activation function. In our study, this three-part structure was flexible enough to develop high-performance machine learning models.
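A minimal TensorFlow/Keras sketch of this structure follows (the paper used TensorFlow 2.4.0). The unit counts, ReLU activations in the FC layers, learning rate, and 24-step input window are illustrative assumptions, not the tuned values of Table 4. LOF filtering and min-max scaling are assumed to be applied to the data before they reach the model.

```python
import numpy as np
import tensorflow as tf


def build_gas_model(n_steps, n_features, bil_units=(64, 32), dense_units=(32, 16),
                    learning_rate=1e-3):
    """Stacked Bi-LSTM (BiL) layers -> two fully connected layers -> linear output."""
    layers = [tf.keras.Input(shape=(n_steps, n_features))]
    for idx, units in enumerate(bil_units):
        # All BiL layers except the last return full sequences for the next BiL layer.
        last = idx == len(bil_units) - 1
        layers.append(tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(units, return_sequences=not last)))
    for units in dense_units:  # two FC layers, as stated in the text
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
    layers.append(tf.keras.layers.Dense(1, activation="linear"))  # concentration output
    model = tf.keras.Sequential(layers)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mse")
    return model


model = build_gas_model(n_steps=24, n_features=3)
pred = model.predict(np.zeros((2, 24, 3), dtype="float32"), verbose=0)
```

Swapping `tf.keras.layers.LSTM` for `tf.keras.layers.GRU`, or removing the `Bidirectional` wrapper, gives the other three model types compared in the paper.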
The concentration-detecting models use a loss function and an evaluation function. The MSE loss function used for training is defined as:
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_{predict(i)} - y_{true(i)}\right)^2$$
where y_predict(i) is the gas concentration output by the model, y_true(i) is the actual value, and n is the total number of samples.
The mean absolute percentage error (MAPE) is used as the evaluation function:
$$MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_{true(i)} - y_{predict(i)}}{y_{true(i)}}\right|$$
For each sample, the difference between the model output and the corresponding actual value is divided by the actual value, and its absolute value gives the sample's percentage error; MAPE is the mean over all samples. Because MAPE is dimensionless, it is suitable for measuring the differences between model outputs and actual values even across different numerical magnitudes.
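Both functions translate directly into NumPy. The sketch below is a straightforward implementation of the two definitions. The note on zero-valued targets is a general property of MAPE rather than something stated in the text, but it matters for low-concentration gases such as NO2.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, used as the training loss."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_pred - y_true) ** 2)


def mape(y_true, y_pred):
    """Mean absolute percentage error (%), used as the evaluation function.

    Undefined when any true value is zero; such samples would have to be excluded.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```

For true values [10, 20, 40] and predictions [12, 18, 40], the per-sample errors are 20 %, 10 %, and 0 %, so `mape` returns about 10 %, while `mse` returns 8/3.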

Development of Ensemble Models
In this study, ensemble models consisting of different RNNs were developed to detect target gas concentrations. Traditionally, developers use a validation dataset and an evaluation index to select the best machine learning model. The steps include: 1. training models with a series of hyperparameter settings on the training data; 2. evaluating their performance on the validation data; and 3. selecting the hyperparameter configuration with the best evaluation in step 2. The model selected by this procedure achieves the best performance on the validation set, but the procedure has disadvantages. First, it is time- and labor-consuming, yet only one model, the one with the optimal hyperparameter configuration, is selected; the other models produced during hyperparameter optimization are abandoned. Second, the features of the validation dataset are not guaranteed to be consistent with those of future data, so the model that performs best on the validation set may not perform best on new data. Thus, we propose an ensemble model that reuses these candidate models, can be modified, and generalizes better to future data.
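The paper combines its static single models through a dynamic model whose details are not given in this section. As a generic illustration of why the candidates that single-model selection would discard can still be useful, the sketch below uses plain linear stacking: least-squares weights (plus an intercept) fitted on the candidates' predictions. This is a simpler, swapped-in technique, not the authors' dynamic model. The combiner should be fitted on data that were not used to train the candidate models.

```python
import numpy as np


def fit_linear_combiner(val_preds, y_val):
    """Least-squares weights (plus intercept) combining candidate-model predictions.

    val_preds: (n_samples, n_models) array, one column per candidate model.
    """
    A = np.column_stack([val_preds, np.ones(len(val_preds))])
    coef, *_ = np.linalg.lstsq(A, y_val, rcond=None)
    return coef


def combine(preds, coef):
    """Apply the fitted weights to new predictions from the same candidate models."""
    return np.column_stack([preds, np.ones(len(preds))]) @ coef
```

If two candidates err in opposite directions, for example predictions [2, 1, 4, 3] and [0, 3, 2, 5] for targets [1, 2, 3, 4], the combiner learns weights of about 0.5 each and cancels both errors, which neither candidate achieves alone.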
Developing the ensemble models includes optimizing hyperparameters, comparing memory cells, and ensembling and retraining the best models. The machine learning programs were implemented in Python 3.8 with TensorFlow-GPU 2.4.0 and executed on NVIDIA Titan XP and NVIDIA RTX 3080 graphics cards. Details are given in the following subsections.

Optimization of Hyperparameters
In machine learning, hyperparameters include model hyperparameters and algorithm hyperparameters. Model hyperparameters, such as the numbers of layers, neurons, and input features, affect the best performance the model can reach. Algorithm hyperparameters, such as the choice of optimizer, learning rate, and batch size, significantly affect the convergence and training time of the model. A series of hyperparameter optimizations was performed in this study. We first optimized the model hyperparameters to determine the basic architecture of each gas model. Then, the algorithm hyperparameters were optimized to shorten the training time of each gas model and help it converge to a better local minimum.
The model hyperparameters of a single weak model for detecting a specific gas were determined first. Configurations of the single-gas detection models (CO, O3, and NO2) are shown in Table 4. We compared the influence of the number of Bi-LSTM (BiL) layers, from one to three, on the performance of each gas model. Each layer has a number of neurons that is a power of 2, as shown in brackets. Finally, the CO model was set to two BiL layers, and the O3 and NO2 models were set to three. Next, we compared the validation set performance of models trained with different input features to determine the input features of each gas model. The input features are the raw values from the gas sensors (gas concentration, humidity, and temperature). The MAPE of the CO gas model is 16.31% when trained with the dual-gas features of CO and O3 and 17.35% when trained with the single-gas feature of CO, so the dual-gas features improve the MAPE by 1.04 percentage points. The MAPE of the O3 gas model was 36.98% after training with the O3 and NO2 dual-gas features and 41.67% after training with the O3 single-gas feature. The dual-gas features are 4.69 percentage points better than the single-gas feature because the O3 gas sensor is disturbed by NO2 gas (as mentioned in Section 2.1). Therefore, adding the NO2 gas feature significantly improves the validation set performance of the O3 gas model. The MAPE of the NO2 gas model was 86.27% after training with the O3 and NO2 dual-gas features and 68.05% after training with the NO2 single-gas feature. Here, the single-gas feature is 18.22 percentage points better than the dual-gas features, because the NO2 gas sensor is not disturbed by O3 gas in the same way. Therefore, adding the O3 gas feature reduces the validation set performance of the NO2 gas model.
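MAPE, the metric used to compare the feature sets, is assumed here to follow the standard definition, MAPE = (100/n) Σ |(yᵢ − ŷᵢ)/yᵢ|; the study's exact implementation is not shown. The sketch below implements it and reproduces the percentage-point differences quoted above from the reported values.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent (standard definition)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


print(mape([2.0, 4.0], [1.0, 5.0]))  # 37.5

# Validation MAPEs (%) reported in the text: (dual-gas features, single-gas features)
reported = {
    "CO model, CO+O3 vs. CO": (16.31, 17.35),
    "O3 model, O3+NO2 vs. O3": (36.98, 41.67),
    "NO2 model, O3+NO2 vs. NO2": (86.27, 68.05),
}
# Positive gain: the dual-gas features lower the MAPE (in percentage points)
gains = {name: round(single - dual, 2) for name, (dual, single) in reported.items()}
print(gains)
```

Note that these are differences in percentage points of MAPE, not relative improvements.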
Through the above experiments, we determined the model hyperparameters of each gas model, including the base architecture and input features. Next, the algorithm hyperparameters of each gas model were optimized.
The algorithm hyperparameters, e.g., the batch size and the dropout layer coefficient, are optimized to reduce the model's training time and improve its convergence. The batch size is the number of samples used in one training iteration. A larger batch size can shorten the training time, but the gradients computed by backpropagation vary only slightly from batch to batch. This lack of gradient randomness makes the model prone to falling into a poor local minimum. Smaller batches need more iterations to cover the training set, which prolongs the training time and increases the time cost of model tuning. However, compared with large batches, small batches benefit from gradient randomness, which helps them converge to a better local minimum. Choosing an appropriate batch size therefore balances the training time and convergence of the model. Figure 10 shows the effect of the batch size on the development of the gas models. Although a batch size of 64 achieved the best convergence, it took twice as long to compute as a batch size of 256, and the performance difference between the two was only 1.68%. Therefore, a batch size of 256 was selected for the CO gas model. The batch sizes of the O3 and NO2 gas models were also chosen with the time cost in mind and were set to 128 and 256, respectively.
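The link between batch size and gradient randomness can be illustrated numerically. The sketch below uses a synthetic one-dimensional linear regression (not the gas data, and not the study's RNNs) and measures how much the mini-batch mean gradient varies from batch to batch; the variance shrinks roughly in proportion to 1/batch size, which is the reduced randomness described above.

```python
import random
import statistics


def minibatch_gradient_variance(batch_size, n_batches=500, seed=0):
    """Variance of the mini-batch mean gradient of a squared-error loss for
    a 1-D linear model y_hat = w * x, evaluated at a fixed weight w."""
    rng = random.Random(seed)
    # Synthetic data set: y = 2x + noise (illustrative only)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(4096)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    w = 0.0
    # Per-sample gradient of (w*x - y)^2 with respect to w
    per_sample = [2.0 * (w * x - y) * x for x, y in zip(xs, ys)]
    batch_means = []
    for _ in range(n_batches):
        batch = rng.sample(per_sample, batch_size)  # draw without replacement
        batch_means.append(sum(batch) / batch_size)
    return statistics.variance(batch_means)


variances = {b: minibatch_gradient_variance(b) for b in (16, 64, 256)}
for b, v in variances.items():
    print(f"batch size {b:>3}: gradient variance {v:.5f}")
```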
After this, the dropout layer coefficient of each gas model was determined. Dropout mitigates overfitting by randomly deactivating a certain fraction of neurons at each training step, so that training does not rely too heavily on specific neurons [41]. By appropriately adjusting the dropout coefficient, the overfitting of each target gas model on the training set can be alleviated and its validation set performance improved. Figure 11 shows that with a dropout coefficient of 0.15, the CO and O3 gas models achieve their best validation set performance, with MAPEs of 10.25% and 34.07%, respectively. The NO2 gas model has its best validation performance with a dropout coefficient of 0.075, with a MAPE of 48.35%.
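For reference, a minimal sketch of inverted dropout, the variant used by common frameworks such as Keras's `Dropout` layer: during training each activation is zeroed with probability `rate` and the survivors are scaled by 1/(1 − rate), while at inference the layer passes its input through unchanged. The function name and the plain-list representation are illustrative assumptions.

```python
import random


def inverted_dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the survivors
    by 1 / (1 - rate) so the expected activation is unchanged. At inference
    (training=False) the input is returned unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    # A fresh random mask is drawn on every call, i.e. at every training step
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
x = [1.0] * 10000
y = inverted_dropout(x, rate=0.15, rng=rng)
frac_dropped = sum(v == 0.0 for v in y) / len(y)
mean_out = sum(y) / len(y)
print(frac_dropped, mean_out)  # close to 0.15 and 1.0
```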

Comparison of Memory Cells
The memory cell used in the model is a vital model hyperparameter. In the previous section, the recurrent layers used Bi-LSTM memory cells. Here, other memory cell types, namely Bi-GRU, LSTM, and GRU, are compared on the basis of their validation set performance. Figure 12 shows that gas models trained with different memory cells perform differently on the validation set. Fundamentally, the native resolution of the commercial sensors limits model performance. The NO2 sensor has the lowest ratio of average annual concentration to resolution, so the MAPE of the NO2 sensor model is always higher than those of the models for the other two gases. Compared with the CO and O3 gas models, the NO2 gas models show the largest performance variation across memory cells: the MAPEs of the NO2 gas models are 143.25% (Bi-GRU), 87.37% (LSTM), and 55.24% (GRU), respectively. Subsequently, by integrating the gas models trained with the four different memory cells, an ensemble model can be retrained to become the best model for gas concentration detection.

Ensemble Models to Obtain the Best Model
Ensemble models are proposed in this study to reuse all the single weak models trained in the previous steps. Figure 13 shows schematic diagrams of the ensemble models for CO gas; the models for detecting O3 and NO2 gas are constructed in the same way. The orange dashed line in Figure 13 indicates the four types of recurrent neural models trained in Section 4.2. These recurrent neural models are integrated into the ensemble model, and the parameters they inherit from the previous step are frozen; this part is therefore called the static model. The green dashed line in Figure 13 highlights a fully connected neural network that receives the output values of the static model together with the training data and determines its parameters through backpropagation. Since its weight coefficients (w_i and w_o in Figure 13) change in the subsequent retraining procedure, this network is called the dynamic model. Through retraining, the dynamic model learns the deviations among the different RNN models, aggregates the target gas concentrations calculated by the static model, and outputs the final target gas concentration.
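The static/dynamic split can be sketched with frozen base predictions feeding a trainable combiner. This is a simplified illustration under stated assumptions: the four static model outputs are synthetic (true value plus a fixed offset and noise), and the dynamic model is reduced to a single linear layer with bias trained by full-batch gradient descent, whereas the paper's dynamic model is a fully connected network whose exact architecture is not given here. Errors are measured on the training data for illustration only.

```python
import random


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def predict(w, b, static_outputs):
    """Dynamic model (simplified to one linear layer): weighted sum of the
    frozen static model outputs plus a bias."""
    return [sum(wk * sk for wk, sk in zip(w, s)) + b for s in static_outputs]


def train_dynamic_model(static_outputs, target, lr=0.1, epochs=500):
    """Full-batch gradient descent on MSE. Only w and b are updated; the
    static outputs are fixed inputs (the frozen static model)."""
    n_models = len(static_outputs[0])
    w = [1.0 / n_models] * n_models  # start from a plain average
    b = 0.0
    n = len(target)
    for _ in range(epochs):
        errors = [p - t for p, t in zip(predict(w, b, static_outputs), target)]
        grad_w = [2.0 * sum(e * s[k] for e, s in zip(errors, static_outputs)) / n
                  for k in range(n_models)]
        grad_b = 2.0 * sum(errors) / n
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


rng = random.Random(0)
offsets = [0.3, -0.2, 0.1, 0.5]  # synthetic systematic error of each static model
target = [rng.uniform(0.0, 1.0) for _ in range(200)]  # normalized "true" concentration
static_outputs = [[y + o + rng.gauss(0.0, 0.3) for o in offsets] for y in target]

w, b = train_dynamic_model(static_outputs, target)
single_mse = [mse([s[k] for s in static_outputs], target) for k in range(len(offsets))]
ensemble_mse = mse(predict(w, b, static_outputs), target)
print([round(m, 3) for m in single_mse], round(ensemble_mse, 3))
```

Because the static outputs are never updated, only the few combiner weights need to be learned, which is also what keeps a later retraining step cheap.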
The performance values of the ensemble model and every single weak model for the three target gases are shown in Figure 14. For the CO gas models, the ensemble model does not achieve the best validation set performance in the training phase, with a MAPE of 13.56%; the single models using Bi-LSTM and Bi-GRU memory cells have better MAPE values of 10.25% and 12.08%, respectively. However, in the testing phase, where new data (i.e., the testing dataset) were used, the ensemble model's MAPE of 15.23% is the best among all models. All the CO single weak models perform significantly worse on the testing set than on the validation set. The Bi-LSTM single model, which has the best performance on the validation set, is not the best on the testing set, indicating that a single weak model transfers poorly to new data. The ensemble model largely maintains its performance on the new data and thus generalizes better.
A similar result was obtained in the experiments with the O3 gas models. The ensemble model's validation performance for O3 gas (34.74% MAPE) is not the best compared with the four single weak models; the single-memory-cell models using GRU and Bi-LSTM achieved MAPEs of 29.78% and 34.07%, respectively. However, in the testing phase, in which all models were evaluated on new data (i.e., the testing dataset), the ensemble model again had the best MAPE (37.14%). The O3 gas ensemble model is therefore more applicable to, and generalizes better on, new data.
In the tests of the NO2 gas models, the ensemble model has the best performance on both the validation and testing sets, with MAPE values of 43.67% and 67.37%, respectively. From the validation set to the testing set, the performance of the NO2 ensemble model degrades less than that of the single weak models, indicating that the NO2 ensemble model is better suited to new data than the single weak models. However, there is still room for improvement compared with the CO and O3 gas ensemble models.
In summary, the static model part of the ensemble models contains the fundamental properties of the gas sensors found by the optimized single weak models. Further, the dynamic model part combines the calibrated models and can thus achieve better performance by tuning its weight coefficients. According to the experimental results for the different gas ensemble models, the deviation of a single-memory-cell model was effectively offset by integrating more types of recurrent neural models. Thus, ensemble models generalize better across the individual differences among commercial sensors of the same module; therefore, the ensemble model was chosen as the best model for gas concentration detection.

Ensemble Model Retraining
In the previous section, the ensemble model had the best performance for each gas, but further tests showed that its performance decayed as more new data were input. The reason is that the data used to train the ensemble RNN models were collected from January 5 to March 23 and thus cover only part of the year's variation. When new data collected from March 23 to April 14 were input into the model, the performance became unstable because atmospheric conditions change across seasons. Therefore, we propose a periodic retraining procedure, which regularly updates the dynamic model's weights to follow the shift in the feature distribution of the new data collected in the atmospheric environment in each period. This extends the life cycle of the integrated gas models.
The flow chart of the optimal gas model is shown in Figure 15. Data engineering is the first step in dealing with a new dataset, including preprocessing, outlier cleaning, and normalization, as described in Section 2. The second step, model engineering, is a recurring loop of model online, performance monitoring, and model retraining. We obtained the best ensemble model through the series of training steps described in Sections 4.1-4.3. The model was then activated to calculate gas concentrations; this stage is called model online. Under regular monitoring, the model's performance declined over time; therefore, the program counts the number of data points the model has processed and uses this count to decide whether to retrain the dynamic model. In this study, retraining was triggered whenever the count reached 200. The testing dataset then became the new training dataset for the retraining procedure. The new training dataset was scaled with the same upper and lower limits as the original training dataset to keep the new and old datasets consistent. We used the new training dataset to retrain the dynamic model and update its weight coefficients (described in Section 4.3). Periodically updating the dynamic model's weight coefficients stabilizes the gas model's performance in each period and extends its life cycle.
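Two details of this loop, the fixed scaling limits and the count-based trigger, can be sketched as follows. The class names `FixedRangeScaler` and `RetrainTrigger` are hypothetical, the scaling is assumed to be min-max normalization, and the training values are synthetic; only the period of 200 samples follows the text.

```python
class FixedRangeScaler:
    """Min-max scaling whose limits are taken once from the original
    training data and reused unchanged for every later dataset."""

    def __init__(self, training_values):
        self.lo, self.hi = min(training_values), max(training_values)
        if self.hi == self.lo:
            raise ValueError("training values must not be constant")

    def transform(self, values):
        return [(v - self.lo) / (self.hi - self.lo) for v in values]


class RetrainTrigger:
    """Counts processed samples and releases a batch for retraining
    every `period` samples (200 in the study)."""

    def __init__(self, period=200):
        self.period = period
        self.count = 0
        self.buffer = []

    def observe(self, sample):
        self.buffer.append(sample)
        self.count += 1
        if self.count % self.period == 0:
            batch, self.buffer = self.buffer, []
            return batch  # this period's data become the new training set
        return None


scaler = FixedRangeScaler([0.1, 0.4, 0.9, 1.3])  # synthetic original training data
scaled = scaler.transform([0.1, 1.3, 1.5])       # 1.5 lies above the original range
print(scaled[0], scaled[1], scaled[2] > 1.0)     # 0.0 1.0 True

trigger = RetrainTrigger(period=200)
batches = [b for b in (trigger.observe(i) for i in range(450)) if b is not None]
print(len(batches), [len(b) for b in batches])   # 2 [200, 200]
```

Reusing the original limits means that new readings outside the old range map outside [0, 1] instead of being silently rescaled, so old and new data stay on the same scale.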
The ensemble model for CO gas was used to demonstrate the performance of the retrained model. Figure 16 shows the actual CO concentration values provided by the EPA, and the lines in different colors indicate the dataset partitions. The initial single weak models were trained on the training dataset (the blue line, which contains 1500 data points), and the ensemble model for CO was defined. The weights of the dynamic model were then updated from those of the original model according to the newly added data in each period (i.e., the orange, green, red, purple, and brown lines). Each period contains 200 data points, and the initial learning rate for retraining is 0.1 times that used for the original static model. The O3 and NO2 gas models are retrained in the same way as the CO gas model: the previous period's data are used as new training data, so that the retrained model generalizes better in the next period.
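The period-by-period retraining can be expressed as a walk-forward schedule in which each period's data retrain the dynamic model, which is then evaluated on the following period. In this sketch, `walk_forward_schedule` is a hypothetical helper; the total of 2500 samples is an assumption chosen only to produce five 200-sample periods after the 1500 initial training samples, the pairing of the five periods into four retrain/evaluate intervals is one possible reading of the text, and `BASE_LR` is a placeholder because the original learning rate is not stated here.

```python
def walk_forward_schedule(n_samples, initial_train=1500, period=200):
    """Split the data after the initial training set into fixed-length periods
    and pair each period (retraining data) with the next one (evaluation data)."""
    starts = range(initial_train, n_samples - period + 1, period)
    periods = [(s, s + period) for s in starts]
    return [{"retrain_on": prev, "evaluate_on": nxt}
            for prev, nxt in zip(periods, periods[1:])]


BASE_LR = 1e-3              # placeholder: original learning rate not stated here
RETRAIN_LR = 0.1 * BASE_LR  # retraining starts at 0.1 times the original rate

schedule = walk_forward_schedule(n_samples=2500)  # 2500 is an assumed total
for step in schedule:
    print(step)
# first line: {'retrain_on': (1500, 1700), 'evaluate_on': (1700, 1900)}
```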
Figure 17a shows significant differences between the output concentrations of the original model and the actual values. Performance was evaluated with the coefficient of determination (R²), which is computed from the residual sum of squares. In the fourth interval, the R² value of the original CO gas model is negative, meaning that its predictions are worse than simply using the mean concentration of that interval; the original model is therefore not stable enough to handle changes in the new data. The results of the O3 and NO2 gas models are similar and are summarized in Table 5. The retrained model updates the gas model with the previous period's data and dynamically corrects the concentration interpretation of the static model in each period. Thus, Figure 17b shows a smaller difference in each period. The retrained model has an average R² of 0.73 over the four intervals.
As shown in Table 5, the long-term average R² values over the four intervals of the retrained O3 and NO2 gas ensemble models are 0.51 and 0.37, respectively, both better than those of the original ensemble models.
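Because a negative R² can look surprising, a short sketch of the coefficient of determination, R² = 1 − SS_res/SS_tot, may help: R² is 1 for perfect predictions, 0 for a constant prediction equal to the mean of the actual values, and negative for anything worse, which is the situation of the original CO model in the fourth interval. The values below are toy numbers, not measurements; scikit-learn's `r2_score` follows the same definition.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]                     # toy concentrations
print(r_squared(actual, actual))                  # 1.0: perfect prediction
print(r_squared(actual, [2.5] * 4))               # 0.0: always predicting the mean
print(r_squared(actual, [4.0, 3.0, 2.0, 1.0]))    # -3.0: worse than the mean
```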
Compared with the original model without retraining, the sensor performance of the retrained model is much more stable in the different periods. The retraining procedure can update the parameters of the dynamic model in the ensemble model, meaning that the machine learning model with regular retraining is a potential solution to calibrate sensors deployed in different environments (area or season). Thus, the model's life cycle is effectively prolonged.
Furthermore, the contribution of each static model to the final output of the ensemble model was estimated using the gradient contribution, which is obtained by differentiating the output of the dynamic model with respect to the output of each static model:

$$\text{gradient contribution} = \frac{\partial\,\text{Dynamic model output}}{\partial\,\text{Static model output}} \tag{14}$$

The gradient contribution of every data point in each interval was calculated and averaged, and the results are shown in Figure 18. In the original gas models, the gradient contribution of each static model to the dynamic model is unchanged in all periods. The original gas model interprets gas concentrations on the new data with the same gradient contributions in every period; because it cannot adjust over time, its performance and generalizability on the new dataset are poor. In contrast, the retrained gas model uses the new data of the previous period to update the weight coefficients of the dynamic model. The retrained gas model thus adjusts the gradient contribution of each static model and corrects the static model outputs in each period; the resulting improvements in concentration estimation are shown in Figure 17 and Table 5. The retraining procedure therefore gives the ensemble models better generalizability on new data.
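Eq. (14) can be evaluated analytically for a small dynamic model. The sketch assumes a dynamic model with one tanh hidden layer and hypothetical weights (the paper's architecture and trained weights are not reproduced here); it computes ∂(dynamic output)/∂(static output k) for each sample and averages over an interval. With TensorFlow, the same derivative could instead be obtained with `tf.GradientTape`.

```python
import math


def dynamic_forward(s, W, c, v, d):
    """Assumed dynamic model: one tanh hidden layer, linear output.
    s: static model outputs; returns (output, hidden activations)."""
    h = [math.tanh(sum(wjk * sk for wjk, sk in zip(Wj, s)) + cj) for Wj, cj in zip(W, c)]
    return sum(vj * hj for vj, hj in zip(v, h)) + d, h


def gradient_contribution(s, W, c, v, d):
    """Eq. (14) for one sample: d(output)/d(s_k) = sum_j v_j (1 - h_j^2) W_jk."""
    _, h = dynamic_forward(s, W, c, v, d)
    return [sum(v[j] * (1.0 - h[j] ** 2) * W[j][k] for j in range(len(W)))
            for k in range(len(s))]


def mean_contribution(interval, W, c, v, d):
    """Average the per-sample gradient contributions over one interval."""
    grads = [gradient_contribution(s, W, c, v, d) for s in interval]
    return [sum(g[k] for g in grads) / len(grads) for k in range(len(grads[0]))]


# Hypothetical weights: 4 static model inputs, 3 hidden units
W = [[0.5, -0.2, 0.1, 0.3], [0.2, 0.4, -0.3, 0.1], [-0.1, 0.2, 0.6, 0.2]]
c = [0.0, 0.1, -0.1]
v = [0.7, 0.5, -0.4]
d = 0.05
interval = [[0.2, 0.25, 0.18, 0.3], [0.5, 0.45, 0.55, 0.6], [0.8, 0.75, 0.9, 0.7]]
print([round(g, 3) for g in mean_contribution(interval, W, c, v, d)])
```

If the dynamic model were purely linear, the gradient contribution would simply equal its weights for every sample, so it could change only when retraining updates those weights.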

Feasibility of Onsite Gas Sensing
This study examined the feasibility of using IoT gas sensors in a natural atmospheric environment. As the kernel density estimation (KDE) [42] of humidity and temperature in Figure 19 shows, our sensors operated in an environment with rapidly changing humidity and temperature rather than in a constant temperature/humidity chamber in a laboratory. Under these varying conditions, our ensemble model still maintains relatively stable performance. With the ensemble machine learning model, IoT sensors can achieve accurate air pollution detection.
Figure 18. The gradient contributions of the static models in the original ensemble models for (a) CO, (b) O3, and (c) NO2 gases and in the retrained ensemble models for (d) CO, (e) O3, and (f) NO2 gases.
Figure 19. Distribution of (a) humidity and (b) temperature in each period.
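The KDE used for the humidity and temperature distributions can be sketched in a few lines. This is a one-dimensional Gaussian-kernel sketch under assumptions: the bandwidth follows the common normal-reference rule of thumb (1.06 · std · n^(-1/5)), which may differ from the study's setting, and the humidity values are synthetic. `scipy.stats.gaussian_kde` provides a ready-made implementation.

```python
import math


def gaussian_kde(samples, bandwidth=None):
    """Return a Gaussian kernel density estimate of `samples` as a function.
    Default bandwidth: normal-reference rule of thumb, 1.06 * std * n**(-1/5)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    h = bandwidth if bandwidth is not None else 1.06 * std * n ** (-0.2)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of one Gaussian bump of width h centred on every sample
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)

    return density


humidity = [55.0, 60.0, 62.0, 70.0, 71.0, 73.0, 80.0, 85.0, 90.0]  # synthetic %RH readings
kde = gaussian_kde(humidity)
step = 0.1
area = sum(kde(i * step) for i in range(int(150 / step) + 1)) * step
print(round(area, 3))  # 1.0: the estimate integrates to one
```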
A comparison of the sensing performance and equipment cost of gas-sensing devices from other manufacturers is shown in Table 6 (reference source: AQ-SPEC [43]). The R² performance of our low-cost IoT device with AI assistance is slightly lower than that of the other gas-sensing devices for all target gases, but the gap is not significant. Considering the cost of deploying a large number of gas detection devices, the low-cost IoT device in this study is far less expensive than other brands of gas detection equipment. Given this cost-performance advantage, more low-cost IoT devices such as the one developed in this study can be deployed to increase the spatial density of CO, O3, and NO2 concentration information.

This paper studied ensemble RNN models for onsite gas concentration detection using low-cost commercial sensors. IoT sensing devices for CO, O3, and NO2 were designed, fabricated, and deployed in the field to monitor atmospheric air conditions. Time-series data of concentration, temperature, and humidity were collected for three months. Single weak RNN models for the three target gases were developed first, and then ensemble models combining four types of RNN models were defined and studied. The ensemble models improved the sensing performance for all three gases: integrating the four types of RNN models significantly improved performance on the testing set, outperforming every single RNN model. The static part of the ensemble model captures the fundamental properties of the gas sensors, while the dynamic part combines the outputs of the static part to achieve better performance. Thus, the ensemble model generalizes better than any single model for gas concentration detection with commercial sensors.
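The static/dynamic split described above can be sketched as a stacking ensemble. The following is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the four trained weak models (LSTM, GRU, Bi-LSTM, Bi-GRU) are frozen and have already produced concentration predictions, passed in as a hypothetical array `weak_preds` with one column per model, and it substitutes a least-squares linear combiner with a bias term for the dynamic model, which in this study may be a different learner. The helper names `fit_dynamic_combiner` and `predict_ensemble` are hypothetical, and the demo data are synthetic.

```python
import numpy as np


def fit_dynamic_combiner(weak_preds: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Fit combiner weights (one per weak model, plus a bias) by least squares.

    weak_preds: (n_samples, n_models) outputs of the frozen weak RNN models,
                i.e. the "static" part of the ensemble (hypothetical input).
    target:     (n_samples,) reference concentrations, e.g. EPA station data.
    This linear combiner is an assumed stand-in for the dynamic model.
    """
    # A column of ones lets the combiner learn a constant offset.
    design = np.column_stack([weak_preds, np.ones(len(weak_preds))])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights


def predict_ensemble(weak_preds: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of the weak-model outputs plus the learned bias."""
    design = np.column_stack([weak_preds, np.ones(len(weak_preds))])
    return design @ weights


if __name__ == "__main__":
    # Synthetic, illustrative data only (not measurements from this study).
    rng = np.random.default_rng(0)
    truth = rng.uniform(0.2, 1.5, size=500)
    # Four stand-ins for the LSTM, GRU, Bi-LSTM, and Bi-GRU outputs:
    # each is a scaled, shifted, noisy view of the true concentration.
    weak = np.column_stack([
        0.8 * truth + 0.10 + rng.normal(0, 0.15, truth.size),
        1.1 * truth - 0.05 + rng.normal(0, 0.20, truth.size),
        0.9 * truth + rng.normal(0, 0.10, truth.size),
        1.2 * truth - 0.20 + rng.normal(0, 0.25, truth.size),
    ])
    weights = fit_dynamic_combiner(weak, truth)
    ensemble = predict_ensemble(weak, weights)
    single_mse = ((weak - truth[:, None]) ** 2).mean(axis=0)
    print("single-model MSE:", np.round(single_mse, 4))
    print("ensemble MSE:    ", round(float(((ensemble - truth) ** 2).mean()), 4))
```

Because each weak model's raw output (weight 1 on that model, 0 elsewhere, no bias) is one of the candidate solutions of the least-squares fit, the combiner's in-sample error can never exceed that of the best single weak model. A fair comparison, like the testing-set results reported in this paper, still requires evaluating on held-out data.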
Furthermore, a retraining procedure was designed to keep model performance stable and to prolong the model's life cycle. Without retraining, the performance of the original model fluctuates from period to period, whereas the retrained model largely resolves this problem. The periodic retraining procedure updates the parameters of the dynamic model in the ensemble, so the trained models can readily be applied when the sensors are deployed in a different environment (area or season). The long-term average determination coefficient (R²) reached 0.73 for the CO model, 0.51 for the O3 model, and 0.37 for the NO2 model. Performance is still limited by the native sensitivity and target selectivity of the sensors. Nevertheless, with the help of our ensemble models, the sensor outputs show a measurable correlation with the actual concentrations published by the EPA. These results demonstrate the feasibility of accurate air pollution detection using commercial gas sensors in environments with naturally varying temperature and humidity.
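The periodic retraining step can be illustrated with a similar sketch. It is hypothetical and rests on several assumptions: a least-squares linear combiner stands in for the dynamic model, the weak RNN models are frozen and represented only by their outputs, reference (EPA) concentrations for each period become available after that period ends, and the combiner is refit on a sliding window of the most recent `window` samples. The names `periodic_r2`, `period`, and `window` are illustrative choices, not parameters from this study, and the drifting-gain data in the demo are synthetic. Each period is scored with R² = 1 − SS_res / SS_tot, the same quantity that scikit-learn's `sklearn.metrics.r2_score` computes for a single output.

```python
import numpy as np


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def fit_dynamic_combiner(weak_preds, target):
    """Least-squares weights (plus bias) combining frozen weak-model outputs."""
    design = np.column_stack([weak_preds, np.ones(len(weak_preds))])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights


def predict_ensemble(weak_preds, weights):
    """Apply the combiner weights and bias to the weak-model outputs."""
    design = np.column_stack([weak_preds, np.ones(len(weak_preds))])
    return design @ weights


def periodic_r2(weak_preds, target, period, window, retrain=True):
    """R^2 of each `period`-sample block after an initial `window` of training data.

    Each block is predicted with the current combiner weights. If `retrain`
    is True, the weights are then refit on the latest `window` samples,
    which now include the block whose reference values were just observed.
    The weak models themselves are never updated.
    """
    weights = fit_dynamic_combiner(weak_preds[:window], target[:window])
    scores = []
    for start in range(window, len(target) - period + 1, period):
        stop = start + period
        pred = predict_ensemble(weak_preds[start:stop], weights)
        scores.append(r2_score(target[start:stop], pred))
        if retrain:
            weights = fit_dynamic_combiner(
                weak_preds[stop - window:stop], target[stop - window:stop]
            )
    return scores


if __name__ == "__main__":
    # Synthetic data: one weak model whose gain drifts over time (mimicking
    # sensor aging or a seasonal change) and one noisier but stable model.
    rng = np.random.default_rng(0)
    n = 1200
    truth = rng.uniform(0.0, 1.0, n)
    gain = 1.0 + 2.0 * np.arange(n) / n
    weak = np.column_stack([
        gain * truth + rng.normal(0, 0.02, n),
        truth + rng.normal(0, 0.30, n),
    ])
    static = periodic_r2(weak, truth, period=100, window=200, retrain=False)
    retrained = periodic_r2(weak, truth, period=100, window=200, retrain=True)
    print("mean R^2 without retraining:", round(float(np.mean(static)), 3))
    print("mean R^2 with retraining:   ", round(float(np.mean(retrained)), 3))
```

Only the small combiner is refit in this sketch, which is far cheaper than retraining the RNNs; this mirrors the statement that retraining updates the parameters of the dynamic model while the weak models stay fixed.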