Assessment and Calibration on Low-cost PM2.5 Sensor using Machine Learning (Hybrid-LSTM Neural Network): Feasibility Study to Build Air Quality Monitoring System

Abstract: Although commercially available low-cost air quality sensors have low accuracy, such sensor systems are being used to collect data for regulating PM2.5 emissions from industrial activities and to estimate personal exposure to PM2.5. In this work, to address the accuracy problem of low-cost PM sensors, we developed a new PM2.5 calibration model by combining a deep neural network (DNN), suited to the calibration problem, with a long short-term memory (LSTM) network, suited to time-dependent characteristics. First, two datasets were generated to test the accuracy and the generalization performance of the PM2.5 calibration machine learning (ML) model. PM2.5 concentration, temperature and humidity from low-cost sensors and PM2.5 concentration from a gravimetric-based measuring instrument were sampled for a sufficiently long time. The proposed model was compared with a benchmark (a multiple linear regression (MLR) model) and with the raw low-cost sensor results. In terms of root mean square error (RMSE) for PM2.5 concentration, the proposed model reduced the error by 41-60% compared with the raw low-cost sensor data and by 30-51% compared with the benchmark model. The R² values of the ML model, the MLR model and the raw data were 93%, 80% and 59%, respectively. The developed model also showed consistent calibration performance when applied to new sensors in different locations. Low-cost sensors combined with an ML model can not only improve on the calibration performance of the benchmark but can also be applied to sensor monitoring systems for various epidemiologic investigations and regulatory decisions.


Introduction
Air pollution caused by industrialization and urbanization is causing serious environmental and health problems. For example, fine particulate matter (PM) is generated by various emission sources such as industry, transportation and combustion. In particular, fine dust with a diameter of less than 2.5 μm (PM2.5) causes various diseases such as cardiovascular diseases, asthma, and neurotoxicity because it reaches the lungs and circulatory system directly. Therefore, it is very important to obtain data for regulating industrial emissions by monitoring the PM2.5 concentration generated by emission activities [1].
In South Korea, a gravimetric-based PM2.5 measuring instrument has been used as the national reference method (NRM) to monitor PM2.5 concentrations. However, installing NRM equipment at closely spaced sampling locations is expensive (>$10,000). This limits the availability of PM2.5 information at the community level.
Light-scattering low-cost PM2.5 sensors offer a possible solution to the cost problem. Because low-cost sensors can measure the PM concentration in real time, they have been used in various studies such as personal exposure assessment [3,4], indoor exposure estimation [5] and outdoor monitoring [6,7,8]. However, because they rely on light scattering, low-cost sensors are sensitive to environmental variables such as temperature and humidity. Badura et al. [9] conducted a long-term outdoor validation test to evaluate the reliability of low-cost sensors against national standard measuring equipment. Above 80% relative humidity, the raw data from the low-cost sensors clearly overestimated the PM2.5 concentration.
Vogt et al. [10] compared three models of low-cost PM2.5 sensors (Plantower 5003, Sensirion SPS30 and Alphasense OPC-N3) against a gravimetric device in the outdoor field. Among the low-cost sensors, the SPS30 showed high accuracy in PM2.5 concentration measurement and high correlation among individual sensors. However, the low-cost sensors still showed lower accuracy than the national standard measurement network equipment during certain measurement periods because of the physical limitations of the sensors.
Zusmana et al. [11] developed metropolitan region-specific calibration models based on multi-linear regression (MLR), using time-series data from various low-cost sensors (PM2.5, temperature and humidity) and from NRM network equipment (PM2.5), to address the sensitivity to environmental variables. Their calibration models confirmed that low-cost sensors can be applied at the community level by compensating for the accuracy degradation caused by the sensors' physical characteristics. However, the metropolitan region-specific calibration models still showed low accuracy (R² = 0.67-0.84) in specific data periods.
In this study, we develop a low-cost PM2.5 calibration model using machine learning (ML) to overcome the accuracy degradation caused by the light-scattering measurement principle of low-cost sensors. This work has two purposes: (1) to develop a more accurate ML calibration model for low-cost sensors and compare it with the MLR method; and (2) to perform a generalization test showing whether the PM2.5 calibration ML algorithm is applicable to new spatial and temporal field conditions.
The process of this study is shown in Figure 1. First, a low-cost Sensirion SPS30 and NRM equipment were colocated to develop the ML model. If high concentrations of PM2.5 are not sampled, incorrect performance evaluation results may be obtained [11]. Therefore, the experiment was carried out until PM2.5 samples above 50 μg/m³ were obtained. The ML model and the MLR model are developed from one part of the obtained data (train set), and model performance is compared using independent data not used for model development (test set). The MLR model is used as a benchmark for evaluating the newly developed ML calibration model. Finally, generalization tests are performed to confirm that the optimized ML model is applicable to a new location and period.
(2) Machine learning model development based on the collected dataset. The calibration performance of the developed model is compared with the raw low-cost sensor data and with the calibration results of the benchmark method (multi-linear regression). The proposed machine learning method is also applied to new sensors to test model generalization.
The novelty of this study falls into three categories. (1) Improvement of calibration performance over the existing calibration model (MLR) by using a new machine learning algorithm. This is very important because it not only provides reliable community-level monitoring but can also help exposure assessment in epidemiological studies. (2) Development of a low-cost sensor ML model using sensor data and a gravimetric sampling device for PM2.5. To our knowledge, very little literature calibrates low-cost sensor systems against reference gravimetric methods. (3) A generalization test of whether the developed ML model is applicable at new measurement locations and times. Most papers on low-cost sensor calibration confirm the validity of the calibration model only at the same location and time, not at new locations and times. A generalization test is essential for building a monitoring sensor network that provides PM2.5 concentrations at various locations.

Air quality measurement instruments
In general, an ML algorithm learns a function that relates input variables to output variables. In this study, the input variables were PM2.5 measured by the low-cost Sensirion SPS30 (<$50) and temperature and humidity measured by the Sensirion SHT85 (<$30), chosen to model the complex relationship between environmental variables and PM2.5 measured by light scattering.
Consistent precision among low-cost sensors is very important for building a monitoring sensor network system with an ML model. Because the low-cost Sensirion SPS30 has excellent inter-sensor precision [10], an ML model based on the SPS30 is likely to maintain consistent performance even with new sensors. Therefore, the PM2.5 measurement of the SPS30 sensor was set as an input variable. Environmental variables such as temperature and humidity decrease the accuracy of low-cost sensors based on light scattering [12]. Therefore, these two environmental variables were also set as inputs to model the complex physical relationship among PM2.5, temperature and humidity.
The target variable is the PM2.5 concentration measured by the gravimetric instrument. The quality of the target variable plays an important role in developing an ML model with high calibration performance. The gravimetric method is based on TEOM (Tapered Element Oscillating Microbalance) technology, which draws atmospheric air through a filter, heats it, continuously measures the filter weight, and calculates the mass concentration of PM in near real time. TEOM has been used in many countries to monitor PM2.5 concentrations in the field because it has shown high accuracy in field tests compared with various other air quality devices [13].

Dataset for calibration machine learning (ML) modeling
A dataset is required to develop and test the ML model. The dataset consists of labeled time series of the aforementioned input variables (PM2.5 from the SPS30, temperature and humidity from the SHT85) and the target variable (PM2.5 from the TEOM). In general, a dataset is divided into a training set for estimating the ML model parameters and a test set for evaluating the ML model calibration performance. In this study, two datasets were generated to test the accuracy and the generalization performance of the PM2.5 calibration ML model.
To validate the accuracy of the ML model, the test set must contain combinations of variables different from those in the train set. In addition, the design space covered by the training set must be adequate relative to the test set. For example, if a concentration higher than the range of the training set used for model development is input to the developed ML model, the calibration performance may deteriorate [14]. In other words, the training set must contain sufficiently high concentrations of PM2.5. Also, a dataset with a small PM2.5 concentration range may give misleading values for certain metrics, such as R² [11].
In this study, the low-cost Sensirion SPS30 and SHT85 and the NRM equipment were colocated to develop the ML model, as shown in Figure 2. The sampled time series (PM2.5 from the SPS30, temperature and humidity from the SHT85 and PM2.5 from the TEOM) are shown in Figure 3. Data were sampled over 110 days; 77 days were designated as the training set and 33 days as the test set. Sampling continued long enough for both the train set and the test set to include PM2.5 concentrations higher than 50 μg/m³, which is the high-concentration scenario defined by the World Health Organization (WHO). The maximum PM2.5 concentration measured by the gravimetric method was 115 μg/m³. Therefore, the PM2.5 concentrations sampled in this work are high enough to validate the calibration performance of the model. Table 1 presents the statistical information (maximum, minimum, average and standard deviation) of the collected dataset. Based on this dataset, we model the complex relationship between the input variables (low-cost sensor data) and the target variable (high-accuracy device data) with the ML method. In addition, another dataset was sampled with new sensors in different locations to validate the generalization performance of the ML model. A generalization test is essential for building a monitoring sensor network that provides PM2.5 concentrations at various locations.
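The 77-day/33-day division described above can be illustrated with a minimal sketch. It assumes that the split is chronological and that the samples are hourly; neither is stated explicitly in the text, and the function name, record layout and synthetic values are hypothetical.

```python
from datetime import datetime, timedelta


def chronological_split(records, train_days=77):
    """Split time-stamped records into a train set and a test set.

    `records` is a non-empty list of (timestamp, features, target) tuples.
    Records from the first `train_days` days form the train set and all
    later records form the test set (assumption: chronological split).
    """
    ordered = sorted(records, key=lambda r: r[0])
    cutoff = ordered[0][0] + timedelta(days=train_days)
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test


# Synthetic stand-in for the 110-day campaign (assumption: hourly samples).
# features = (low-cost PM2.5, temperature, humidity), target = reference PM2.5
start = datetime(2021, 1, 1)
records = [
    (start + timedelta(hours=h), (10.0, 20.0, 50.0), 9.0)
    for h in range(110 * 24)
]

train, test = chronological_split(records, train_days=77)
train_days_covered = len(train) // 24
test_days_covered = len(test) // 24
```

Splitting by time rather than at random keeps the test period strictly after the training period, which is one plausible way to make the test set independent of the data used for model development.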

Machine learning algorithm
The measured PM2.5 concentration and environmental variable data (temperature and humidity) have time series characteristics. That is, air quality data are time dependent: current data are related to past data. Among machine learning algorithms, the long short-term memory (LSTM) neural network is suited to time-dependent characteristics. In this study, in order to calibrate low-cost PM2.5 sensors, we develop a new PM2.5 calibration model by combining a deep neural network (DNN), suited to the calibration problem, with an LSTM, suited to time-dependent characteristics. The overall system architecture of the proposed PM2.5 calibration HybridLSTM algorithm is shown in Figure 4. (Step 1) The PM2.5, temperature, and humidity data of the low-cost sensors described in Section 2.1 are input to the network as a time series that includes the historical trend over the preceding 24 hours. The LSTM cells extract the time-dependent relationships in this historical input. (Step 2) The time-dependent values are passed to the DNN architecture, and the neural network parameters are trained to minimize the difference between the target variable (TEOM PM2.5) and the values predicted by the model. The key to the HybridLSTM algorithm is that it approaches the calibration problem differently from the conventional application of LSTM. Specifically, the HybridLSTM algorithm aligns the target variable (TEOM PM2.5) with the last time step of the input time series (low-cost PM2.5, temperature and humidity). The LSTM output then feeds the DNN, which captures the complex nonlinearity between input and target. Because the theory of time dependency in LSTM [15,16] and the calculation of the complex nonlinearity between input and output in a DNN [17,18] have been well described in many studies, the theory is omitted here to avoid unnecessary repetition.
Because the calibration accuracy of an ML model varies with the combination of hyper-parameters (such as learning rate, network architecture, batch size and optimization function), finding an optimized set of hyper-parameters is very important. However, the number of possible combinations is too large to compare exhaustively. Therefore, many studies determine the hyper-parameters by trial and error [17,18]. In this study, the hyper-parameters giving the best ML calibration performance were determined by testing various hyper-parameter combinations. The hyper-parameter optimization results are summarized in Table 2. Callback is the parameter that sets how far back in time the LSTM cell considers time dependency, and the numbers of DNN layers and nodes determine the degree of nonlinearity between the input variables and the output variable. Too many layers cause overfitting and degrade the calibration performance on new input data. The learning rate sets the step size with which the neural network updates its parameters to reduce the loss between the predicted output and the target. A learning rate that is too large prevents the solution from converging. Batch size is the number of samples in each subset into which the train set is divided for training the neural network. Epoch refers to the number of passes over the train set used to train the neural network. The Adam algorithm was used to optimize the neural network because it showed high convergence and accuracy among many algorithms in regression problems [17].
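The input arrangement of Step 1 can be sketched as follows: each training sample is a 24-step history of the three low-cost sensor variables, and its label is the reference PM2.5 at the last time step of that history rather than a future value. This is only an illustration of the data layout, not the authors' implementation; hourly sampling (so that 24 hours correspond to 24 steps) and all names are assumptions.

```python
def make_windows(features, targets, lookback=24):
    """Build (window, label) pairs for sensor calibration.

    features: list of (pm25_low_cost, temperature, humidity), one per step
    targets:  list of reference PM2.5 values, same length as `features`
    Each window holds `lookback` consecutive feature rows. Its label is
    the reference value at the window's LAST step, so the model calibrates
    the current reading instead of forecasting a future one.
    """
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")
    windows, labels = [], []
    for end in range(lookback - 1, len(features)):
        windows.append(features[end - lookback + 1:end + 1])
        labels.append(targets[end])
    return windows, labels


# Synthetic series of 30 steps (assumption: one step per hour).
features = [(float(t), 20.0, 50.0) for t in range(30)]
targets = [0.5 * t for t in range(30)]

windows, labels = make_windows(features, targets, lookback=24)
```

The resulting structure has the shape (samples, 24, 3), which is the layout that recurrent layers in common deep learning frameworks typically expect.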

Benchmark method
The multi-linear regression (MLR) method, which showed high correction performance in a previous study [11], was used as a benchmark to evaluate the performance of the HybridLSTM model proposed in this study. The benchmark model is given by the equation below:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$
where $x_1$, $x_2$ and $x_3$ are the SPS30 PM2.5, temperature and humidity, and $y$ is the PM2.5 result of the TEOM equipment. $\beta_0$ and $\beta_i$ ($i = 1, 2, 3$) are parameters optimized with the dataset described above. The benchmark model was developed using the same training data used to develop the HybridLSTM model, and its performance was evaluated using the same test data.
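As an illustration of how such a benchmark can be fitted, the sketch below estimates an intercept and three coefficients by ordinary least squares, regressing reference PM2.5 on low-cost PM2.5, temperature and humidity. The use of ordinary least squares via the normal equations is an assumption (the fitting procedure is not specified in the text), and the function names and synthetic data are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 + b3*x3.

    Solves the normal equations (A^T A) b = A^T y by Gaussian elimination
    with partial pivoting. Assumes the columns of X are not collinear.
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * t for a, t in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(M[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (M[i][k] - tail) / M[i][i]
    return b


def predict_mlr(b, row):
    """Apply fitted coefficients to one (pm25, temperature, humidity) row."""
    return b[0] + sum(coef * x for coef, x in zip(b[1:], row))


# Synthetic, noise-free data generated from known (made-up) coefficients.
X = [(10.0, 5.0, 40.0), (20.0, 10.0, 60.0), (35.0, 15.0, 80.0),
     (50.0, 20.0, 55.0), (15.0, 25.0, 90.0), (60.0, 12.0, 70.0)]
y = [2.0 + 0.5 * x1 - 0.1 * x2 - 0.05 * x3 for x1, x2, x3 in X]

b = fit_mlr(X, y)
calibrated = predict_mlr(b, (10.0, 5.0, 40.0))
```

Because the synthetic targets contain no noise, the fitted coefficients recover the generating values up to floating-point error; with real colocated data the fit would only approximate the reference measurements.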
The metrics used for model development and evaluation were R² and root mean square error (RMSE). In general, many performance metrics are used to evaluate regression models, but for evaluating sensor calibration performance these two indicators are sufficient [11]. R² and RMSE are expressed as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
where $y_i$, $\hat{y}_i$, $\bar{y}$ and $n$ represent the TEOM sample, the corrected result by the model, the average of the TEOM samples, and the total number of samples, respectively.
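The two metrics can be computed directly from paired reference and calibrated values. The sketch below uses the coefficient-of-determination form of R² (one minus the ratio of residual to total sum of squares), which matches the quantities listed above; the function names and the four sample values are hypothetical.

```python
import math


def rmse(reference, calibrated):
    """Root mean square error between reference and calibrated values."""
    n = len(reference)
    squared = sum((y - p) ** 2 for y, p in zip(reference, calibrated))
    return math.sqrt(squared / n)


def r_squared(reference, calibrated):
    """Coefficient of determination relative to the reference mean."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((y - p) ** 2 for y, p in zip(reference, calibrated))
    ss_tot = sum((y - mean_ref) ** 2 for y in reference)
    return 1.0 - ss_res / ss_tot


# Made-up values standing in for TEOM samples and model output (μg/m³).
reference = [10.0, 20.0, 30.0, 40.0]
calibrated = [12.0, 18.0, 33.0, 39.0]

error = rmse(reference, calibrated)       # sqrt(18 / 4)
fit = r_squared(reference, calibrated)    # 1 - 18 / 500
```

Note that R² computed this way becomes unstable when the reference values span a narrow range, because the total sum of squares in the denominator is then small; this is consistent with the requirement that the dataset cover a sufficiently wide concentration range.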

Comparison of accuracy among proposed model, benchmark and low-cost sensor
Figure 5 shows the learning process of the ML model over 10 repeated training experiments using the aforementioned optimal hyper-parameters. The validation set is used to determine whether the model overfits when given new PM2.5 data. As shown in Figure 5, the losses on both the validation set and the train set converged sufficiently in every repeated training experiment. In other words, the developed model is robust and does not overfit. Therefore, researchers can develop and use a consistent model for calibrating low-cost PM2.5 sensors with the hyper-parameters optimized in this study.
Figure 5. Results of robust model test. The robust test is performed on 10 repeated training experiments to validate that the ML model has consistent calibration performance.
Figure 6 shows the RMSE obtained on the test set at 7-day intervals for the optimized model, the benchmark, and the raw SPS30 sensor. The proposed model with time-dependent characteristics showed higher calibration performance in all periods than the benchmark model and the raw data. The quantitative error reduction rate for each period is shown in Table 3. The proposed model reduced the error by 41-60% compared with the raw data (low-cost sensor) and by 30-51% compared with the calibration results of the benchmark model.
Figure 6. Results of error comparison of HybridLSTM, benchmark and raw data at 1-week intervals based on the aforementioned test set.
Figure 7 compares the developed model, the benchmark model, and the raw sensor against all TEOM samples. Raw data from the low-cost PM2.5 sensors showed a large overestimation above 50 μg/m³. Incorrect monitoring in high-concentration situations leads not only to incorrect exposure assessment but also to errors in government regulatory decisions.
The benchmark method underestimated the concentration compared with the gravimetric method. In contrast, the proposed model deviated little from the TEOM results in both the high and the low concentration ranges. HybridLSTM produced the results most similar to TEOM, with an R² of about 93%.
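The per-period comparison can be illustrated with a short sketch that computes RMSE over consecutive blocks of the test set and converts each pair of values into a percentage error reduction. The definition of the reduction rate as (baseline RMSE − model RMSE) / baseline RMSE × 100 is an assumption, since the text does not give the formula; the function names and numbers are hypothetical.

```python
import math


def blockwise_rmse(reference, estimate, block_size):
    """RMSE over consecutive blocks of `block_size` samples.

    For weekly blocks of hourly data, block_size would be 7 * 24
    (assumption: hourly sampling).
    """
    results = []
    for start in range(0, len(reference), block_size):
        ref = reference[start:start + block_size]
        est = estimate[start:start + block_size]
        squared = sum((y - p) ** 2 for y, p in zip(ref, est))
        results.append(math.sqrt(squared / len(ref)))
    return results


def error_reduction(rmse_baseline, rmse_model):
    """Percentage by which the model lowers RMSE relative to a baseline."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


# Made-up values: reference, raw sensor and calibrated output (μg/m³).
reference = [10.0, 10.0, 10.0, 10.0]
raw = [14.0, 14.0, 14.0, 14.0]
model = [12.0, 12.0, 12.0, 12.0]

raw_rmse = blockwise_rmse(reference, raw, block_size=2)
model_rmse = blockwise_rmse(reference, model, block_size=2)
reductions = [error_reduction(r, m) for r, m in zip(raw_rmse, model_rmse)]
```

In this toy case the raw sensor has an RMSE of 4 and the calibrated output an RMSE of 2 in each block, so the reduction is 50% per block.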

Generalization test of the developed machine learning model
A highly reliable sensor monitoring network can be established only when the developed PM calibration model shows generalized calibration performance for new locations and devices. Therefore, in this study, a model generalization test was performed to see whether the verified model maintains its high calibration performance at new locations and with new sensors.
The generalization test was performed at new locations using the deep learning model developed in the previous accuracy evaluation, as shown in Figure 9. The two new positions are approximately 10 km apart and are located at a school (Figure 9 (a)) and near a road (Figure 9 (b)), respectively. Samples were collected until the peak PM2.5 concentration exceeded 70 μg/m³ so that calibration performance could be evaluated correctly. The generalization test set contains 504 samples. The statistical information (minimum, maximum, average and standard deviation) of the test set is shown in Figure 9. Near the road, the variation of the low-cost sensor was higher than at the school because of transportation emissions. However, the performance of the ML model (R² > 90%) was maintained despite the two different location characteristics and the new sensor equipment. Using the machine learning methodology proposed in this study, it was possible both to improve the accuracy of low-cost PM2.5 sensors and to build a reliable sensor monitoring network.

Discussion and conclusion
In this study, a new PM2.5 machine learning calibration model (HybridLSTM) was developed, and its calibration performance was compared with the existing metropolitan region-specific calibration model and with raw data. A generalization test was also performed to validate the possibility of establishing a sensor monitoring network. The results are summarized as follows.
(1) The HybridLSTM PM2.5 calibration model with time-dependent characteristics showed the best performance in improving the accuracy of low-cost PM2.5 sensors. In terms of RMSE, the proposed model reduced the error by 41-60% compared with the raw low-cost sensor data and by 30-51% compared with the benchmark model. Raw data from the low-cost PM2.5 sensors showed a large overestimation compared with the gravimetric method in samples with high PM2.5 concentrations, whereas the benchmark method underestimated the concentration compared with the gravimetric method. Incorrect monitoring in high-concentration situations leads not only to incorrect exposure assessment but also to errors in government regulatory decisions. The proposed model deviated only slightly from the NRM results.
(2) Most papers on low-cost sensor calibration confirm the validity of the calibration model only at the same location and time, not at new locations and times. When the developed model was applied to new positions and sensors, it showed consistent calibration performance. Low-cost sensors combined with machine learning models not only exceed the performance of existing benchmarks but can also make sensor monitoring systems produce more reliable results for various epidemiologic investigations and regulatory decisions.
The proposed model overcomes the existing accuracy limitations of low-cost sensors and can provide highly reliable results not only for monitoring but also for research in various environmental fields. Although generalization performance was demonstrated at two locations in this study, the proposed method needs to be verified at more locations to build a more reliable sensor monitoring network. Therefore, in future work, we plan to test whether low-cost PM2.5 sensors combined with machine learning can be applied to sensor network construction at various locations and times, including different seasons. Once a high-resolution, high-accuracy sensor network is constructed, we will test the possibility of providing air quality information for areas where sensors are not installed through interpolation.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The data presented in this work are available on request from the corresponding author.

Conflicts of Interest: The authors declare no conflict of interest.