Article

Prediction of PM2.5 Concentration on the Basis of Multi-Time Scale Fusion

School of Computer & Information Engineering, Heilongjiang University of Science & Technology, Harbin 150027, China
*
Author to whom correspondence should be addressed.
Processes 2022, 10(1), 171; https://doi.org/10.3390/pr10010171
Submission received: 20 December 2021 / Revised: 10 January 2022 / Accepted: 11 January 2022 / Published: 17 January 2022
(This article belongs to the Special Issue Air Quality Monitoring for Smart Cities and Industrial Applications)

Abstract

Long-term prediction of the hourly concentration of PM2.5 (particles suspended in the atmosphere with an effective diameter of 2.5 microns or less) is of great significance for environmental protection and public health. At present, most hourly PM2.5 predictions are single-step predictions, that is, they predict the PM2.5 concentration at a single future time point from a period of historical data. In this paper, a model based on multi-time scale fusion is proposed and evaluated on both single-step and multi-step prediction. Experimental results show that the proposed model outperforms stacked LSTM and CNN-LSTM in predicting the hourly PM2.5 concentration.

1. Introduction

At present, traditional research on PM2.5 concentration prediction draws on meteorology, environmental science, mathematics, and computational science, and falls into four conventional categories: empirical model prediction based on historical data and statistical methods, probability model prediction based on statistical and mathematical methods or models, prediction based on synthetic methods, and prediction based on conventional machine learning models. With the rapid development of deep learning, Fan et al. used a recurrent neural network model to predict the PM2.5 concentration one hour ahead from the air quality and meteorological data of the past 48 h [1]. Qi et al. proposed GCN-LSTM, a model based on a Graph Convolutional Network and Long Short-Term Memory (LSTM), and showed that it was superior to CNN (Convolutional Neural Network) and LSTM in predicting the air quality one hour ahead [2]. He et al. combined the wavelet transform with the LSTM model, took the daily average concentration as input to predict the pollutant concentration of the next day, and showed that the proposed model was superior to MLR (Mixed Logistic Regression), LSTM, and WT-MLR (Mixed Logistic Regression based on Wavelet Transform) [3]. Huang and Kuo constructed APNet (Attention-based Parallel Networks) and showed experimentally that it was superior to CNN and LSTM in predicting the PM2.5 concentration one hour ahead [4].
However, the above studies focus only on single-step prediction of PM2.5 concentration; they do not predict the concentration over a future period, that is, they do not make multi-step predictions. To address this problem, this article builds a multi-time scale fusion model that combines a CNN-LSTM network with an attention mechanism and integrates features from multiple time scales. The aim is to accurately predict the PM2.5 value for each hour of a continuous future period. Experiments verify the validity and superiority of the proposed method.

2. PM2.5 Prediction Model Based on Multi-Time Scale Fusion

2.1. LSTM (Long Short-Term Memory)

LSTM is a variant of the RNN (Recurrent Neural Network) designed to solve the long-term dependence problem that a plain RNN cannot handle. It effectively alleviates the vanishing-gradient problem of the RNN and is therefore better suited to time series prediction [5]. The architecture of an LSTM memory cell is shown in Figure 1. Each cell has three “gate” structures: the input gate, the forget gate, and the output gate. A chain of repeating cells forms the LSTM layer. The calculation applied in the LSTM layer to the spatiotemporal feature matrix X = [x1, x2, …, xt] is given in Equations (1)–(6). Equation (1) represents the forget gate, which decides what information should be discarded from the cell state: ht−1 and xt are fed into the forget gate, and its output ft is computed through the sigmoid activation function. Equations (2) and (3) represent the input gate, which decides what new information should be stored in the cell state: ht−1 and xt are fed into the input gate, and it and the candidate state c̃t are obtained through the sigmoid and tanh activation functions, respectively. Equation (4) uses the outputs of the forget gate and the input gate to update the current cell state. Equations (5) and (6) together produce the output of the current cell: first, ht−1 and xt are fed into the output gate, and its output ot is computed through the sigmoid activation function; then the current cell output ht is obtained from the output of the output gate and the current cell state.
The following Equations (1)–(6) describe the internal calculation process of an LSTM neural unit:
$$f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f) \tag{1}$$
$$i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i) \tag{2}$$
$$\tilde{c}_t = \tanh(w_c \cdot [h_{t-1}, x_t] + b_c) \tag{3}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{4}$$
$$o_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o) \tag{5}$$
$$h_t = o_t \odot \tanh(c_t) \tag{6}$$
where ft is the output of the forget gate, with values in (0,1); it is the output of the input gate, with values in (0,1); $\tilde{c}_t$ is the candidate cell state; $c_t$ is the state of the current cell and $c_{t-1}$ that of the previous cell; ot is the output of the output gate, with values in (0,1); $h_{t-1}$ is the output of the previous cell; $h_t$ is the output of the current cell; wf, wi, wc, and wo are the weight matrices applied to the concatenation of $h_{t-1}$ and the input vector xt at time step t; bf, bi, bc, and bo are the bias vectors; σ is the sigmoid activation function; tanh is the hyperbolic tangent function; ⊙ stands for element-wise multiplication; ⊗ stands for multiplication; and ⊕ stands for the sum operation.
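A minimal NumPy sketch of one LSTM time step may make Equations (1)–(6) more concrete. The function names `lstm_step` and `init_params`, the dimensions, and the random initialization are illustrative assumptions, not the implementation used in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Equations (1)-(6)."""
    concat = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["w_f"] @ concat + params["b_f"])       # Eq. (1), forget gate
    i_t = sigmoid(params["w_i"] @ concat + params["b_i"])       # Eq. (2), input gate
    c_tilde = np.tanh(params["w_c"] @ concat + params["b_c"])   # Eq. (3), candidate state
    c_t = f_t * c_prev + i_t * c_tilde                          # Eq. (4), cell state update
    o_t = sigmoid(params["w_o"] @ concat + params["b_o"])       # Eq. (5), output gate
    h_t = o_t * np.tanh(c_t)                                    # Eq. (6), cell output
    return h_t, c_t

def init_params(input_dim, hidden_dim, seed=0):
    # Hypothetical initialization: small random weights, zero biases.
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "c", "o"):
        params[f"w_{gate}"] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim + input_dim))
        params[f"b_{gate}"] = np.zeros(hidden_dim)
    return params

# Run five time steps over a toy input with three features per step.
params = init_params(input_dim=3, hidden_dim=4)
h, c = np.zeros(4), np.zeros(4)
for x_t in np.ones((5, 3)):
    h, c = lstm_step(x_t, h, c, params)
```

Because ot lies in (0,1) and tanh lies in (−1,1), every component of `h` stays strictly inside (−1,1), which is a quick sanity check on any implementation of Equation (6).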

2.2. Ensemble Empirical Mode Decomposition (EEMD)

As a noise-assisted signal decomposition method, EEMD adds white noise to the original signal, performs EMD (Empirical Mode Decomposition) on it, and finally computes the ensemble average over the results of multiple decompositions [6].
The specific operation steps are as follows:
(1) Set the number of ensemble trials M;
(2) Add a white noise ni(t) with standard normal distribution to the original signal x(t) to generate a new signal:
$$x_i(t) = x(t) + n_i(t)$$
where ni(t) is the i-th added white noise sequence; xi(t) is the noise-added signal of the i-th trial, i = 1, 2, 3, …, M.
(3) Perform EMD on each noisy signal xi(t) to express it as a sum of IMFs (Intrinsic Mode Functions) plus a residual:
$$x_i(t) = \sum_{j=1}^{J} c_{i,j}(t) + r_i(t)$$
where $c_{i,j}(t)$ is the j-th IMF obtained by decomposition after adding white noise for the i-th time; $r_i(t)$ is the residual term, which represents the average trend of the signal; and J is the number of IMFs;
(4) Repeat steps (2) and (3) M times, adding a different white noise signal each time; the resulting set of IMFs is $c_{1,j}(t), c_{2,j}(t), \ldots, c_{M,j}(t)$, where j = 1, 2, 3, …, J;
(5) Based on the principle that the statistical average of uncorrelated sequences is zero, the above IMFs are ensemble-averaged to obtain the final IMFs, namely:
$$c_j(t) = \frac{1}{M} \sum_{i=1}^{M} c_{i,j}(t)$$
where cj(t) is the j-th IMF of the EEMD, i = 1, 2, …, M, j = 1, 2, …, J.
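The ensemble-averaging logic of steps (1)–(5) can be sketched in a few lines of NumPy. This is only a sketch under stated assumptions: `toy_decompose` is a hypothetical stand-in for a real EMD routine (it splits the signal into a fast and a slow part with a moving average), the noise scale `noise_std` is an assumed parameter, and the decomposer is assumed to return the same number of components on every trial, which a real EMD does not guarantee.

```python
import numpy as np

def eemd(signal, decompose, trials=100, noise_std=0.2, seed=0):
    """Average the components returned by `decompose` over noisy copies of `signal`."""
    rng = np.random.default_rng(seed)
    scale = noise_std * np.std(signal)            # assumed noise amplitude
    total = None
    for _ in range(trials):
        noisy = signal + scale * rng.standard_normal(signal.shape)   # step (2)
        comps = decompose(noisy)                  # step (3), shape (J, N)
        total = comps if total is None else total + comps
    return total / trials                         # step (5), ensemble average

def toy_decompose(x, width=9):
    # Hypothetical stand-in for EMD: a moving average gives the slow part,
    # and the remainder is the fast part. The two parts sum back to x.
    kernel = np.ones(width) / width
    slow = np.convolve(x, kernel, mode="same")
    return np.stack([x - slow, slow])

t = np.arange(256)
x = np.sin(2 * np.pi * t / 8) + 0.5 * np.sin(2 * np.pi * t / 64)
imfs = eemd(x, toy_decompose, trials=200)
```

Since the added noise averages towards zero over the trials, the averaged components sum back to approximately the original signal, which illustrates the principle used in step (5).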

2.3. Attention Mechanism

The attention mechanism mimics the internal process of biological observation behavior [7]. Its principle is to use a set of weights $\alpha_{T_{s-e}}^{T_t} = [\alpha_{T_e}^{T_t}, \alpha_{T_{e+1}}^{T_t}, \ldots, \alpha_{T_s}^{T_t}]$ to express the relevance between the value at a certain time slice of the target sequence, $x_{T_t}$, and the dependent sequence $x_{T_{s-e}} = [x_{T_e}, x_{T_{e+1}}, \ldots, x_{T_s}]$. Each element in $x_{T_t}$ and $x_{T_{s-e}}$ has the same dimension. $x_{T_t}$ and $x_{T_{s-e}}$ are mapped to the parameter space:
$$\text{Query} = x_{T_t} W_Q$$
$$\text{Key} = x_{T_{s-e}} W_k$$
$$\text{Value} = x_{T_{s-e}} W_v$$
where WQ is the dx × dq Query parameter matrix; Wk is the dx × dk Key parameter matrix; and Wv is the dx × dv Value parameter matrix.
The attention mechanism is divided into three stages. In the first stage, the target $x_{T_t}$ is mapped from dimension dx to a Query of dimension dq; similarly, $x_{T_{s-e}}$ is mapped to a Key matrix with element dimension dk and a Value matrix with element dimension dv; and the similarity between Query and Key is calculated. In the second stage, the raw scores of the first stage are normalized by Softmax to give the weights $\alpha_{T_{s-e}}^{T_t}$ of Value. In the third stage, the Value is weighted and summed according to the weight coefficients to obtain the attention value.
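The three stages can be illustrated with a short NumPy sketch. The dot product is used here as the similarity between Query and Key, which is an assumption (the text does not name the similarity function) and requires dq = dk; the function name `attention` and all dimensions are likewise illustrative.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)              # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention(x_target, x_dep, W_q, W_k, W_v):
    """Return the attention value and the weights over the dependent sequence."""
    query = x_target @ W_q         # stage 1: map the target to Query, shape (d_k,)
    keys = x_dep @ W_k             # stage 1: map the sequence to Key, shape (L, d_k)
    values = x_dep @ W_v           # stage 1: map the sequence to Value, shape (L, d_v)
    scores = keys @ query          # stage 1: dot-product similarity, shape (L,)
    weights = softmax(scores)      # stage 2: normalize the raw scores
    return weights @ values, weights   # stage 3: weighted sum of Value

rng = np.random.default_rng(0)
d_x, d_k, d_v, length = 4, 3, 2, 6
x_target = rng.normal(size=d_x)
x_dep = rng.normal(size=(length, d_x))
W_q = rng.normal(size=(d_x, d_k))
W_k = rng.normal(size=(d_x, d_k))
W_v = rng.normal(size=(d_x, d_v))
context, weights = attention(x_target, x_dep, W_q, W_k, W_v)
```

The weights are positive and sum to one, so the attention value is a convex combination of the rows of Value.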

2.4. Multi-Time Scale Fusion Model

In this paper, the multi-time scale fusion model is applied to the prediction of the hourly PM2.5 concentration for the first time; the model process is shown in Figure 2. EEMD decomposes the original PM2.5 sequence into new sequences with different time scales. CNN-LSTM extracts the characteristic information of the time series. The attention layer uses the attention mechanism to focus on important features and ignore unimportant ones, which improves prediction accuracy.
The specific steps are as follows:
(1) Input the original PM2.5 concentration sequence into the EEMD model and decompose it. This is the first improvement that the model in this paper makes to the CNN-LSTM model. Compared with the original sequence, the decomposed sequences express the periodicity of the original sequence more precisely and capture information at different time scales better.
(2) Input the original PM2.5 sequence and the decomposed PM2.5 sequences into separate CNN-LSTM networks, each composed of two Conv1d layers and one LSTM layer, for feature extraction. CNN and LSTM are combined for feature extraction because the convolutional neural network has excellent feature extraction and feature expression capabilities, while LSTM has natural advantages in processing time sequences. The decomposed sequences are recombined into new sequences according to their time scales and, together with the original sequence, are used as the inputs of the different network branches.
(3) Pass the outputs of the different LSTM layers through the attention mechanism layer to produce the prediction results. The attention mechanism is the second improvement to CNN-LSTM: it directs attention to the more important feature information among the features of different time scales and thereby improves prediction accuracy.

3. Results and Discussion

3.1. Experimental Configuration and Data Set Description

The experiments in this paper were implemented in Python 3.7 with the TensorFlow + Keras framework on Windows, using several Python libraries for code implementation and result analysis.
The data in this paper are monitoring data from ground stations in Harbin, mainly including AQI, PM2.5, PM10, O3, and other data. The data are updated hourly and span May 2014 to April 2021. The PM2.5 series is shown in Figure 3.

3.2. Data Pre-Processing

In this paper, data pre-processing includes data cleaning and data normalization. During data cleaning, redundant data are removed. When pollutant data are missing, they are replaced with the 8 h moving average. Short-term missing values that remain after this step are filled by simple linear interpolation of adjacent values, and segments with overly long gaps are deleted.
Min-max normalization is used in this paper, as follows:
$$f^{*} = \frac{f - f_{\min}}{f_{\max} - f_{\min}}$$
where f* is the normalized value; f is the original value; fmax is the maximum value of the sample data; and fmin is the minimum value of the sample data.
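The interpolation and normalization steps can be sketched with NumPy as follows. The helper names (`fill_short_gaps`, `min_max_normalize`, `denormalize`) and the toy values are assumptions for illustration; the 8 h moving-average replacement and the deletion of long gaps are not reproduced here.

```python
import numpy as np

def fill_short_gaps(values):
    """Fill NaN entries by linear interpolation between the adjacent known values."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(values.size)
    known = ~np.isnan(values)
    return np.interp(idx, idx[known], values[known])

def min_max_normalize(values):
    """Scale values to [0, 1]; also return f_min and f_max for later inversion."""
    f_min, f_max = values.min(), values.max()
    return (values - f_min) / (f_max - f_min), f_min, f_max

def denormalize(scaled, f_min, f_max):
    """Map normalized predictions back to the original concentration scale."""
    return scaled * (f_max - f_min) + f_min

raw = np.array([35.0, np.nan, 45.0, 60.0, np.nan, 80.0])   # toy hourly readings
filled = fill_short_gaps(raw)
scaled, f_min, f_max = min_max_normalize(filled)
restored = denormalize(scaled, f_min, f_max)
```

Keeping fmin and fmax from the training data is what allows model outputs to be converted back to concentrations before error metrics are computed.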

3.3. EEMD Decomposition of PM2.5 Concentration

In this paper, the pre-processed time series of PM2.5 values is decomposed into 14 IMF series and one trend term, as shown in Figure 4.
For the IMF components, this paper uses the average period as the period of each component. The results are shown in Table 1. According to these results, IMF1–IMF4 are on the hour scale, IMF5–IMF9 on the day scale, IMF10–IMF12 on the month scale, and IMF13–IMF14 on the year scale.
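The assignment of IMF components to time scales can be reproduced from the periods in Table 1 with a short Python sketch. The boundaries of 24 h, 500 h, and 5000 h are illustrative assumptions chosen only so that the result matches the grouping reported here; the thresholds actually used are not stated.

```python
# Average periods in hours, taken from Table 1.
periods = {
    "IMF1": 3, "IMF2": 5, "IMF3": 8, "IMF4": 15,
    "IMF5": 25, "IMF6": 46, "IMF7": 89, "IMF8": 168, "IMF9": 321,
    "IMF10": 659, "IMF11": 1395, "IMF12": 4000,
    "IMF13": 8572, "IMF14": 20000,
}

# Hypothetical upper bounds (in hours) for each scale; anything above is "year".
BOUNDS = ((24, "hour"), (500, "day"), (5000, "month"))

def scale_of(period_h, bounds=BOUNDS):
    """Return the time-scale label for an average period given in hours."""
    for upper, name in bounds:
        if period_h < upper:
            return name
    return "year"

groups = {}
for name, period in periods.items():
    groups.setdefault(scale_of(period), []).append(name)
```

Each group can then be recombined into one sequence per time scale, for example by summing its members, although the exact recombination rule is an assumption.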

3.4. Evaluation Index

The following indicators are selected as the evaluation criteria in this paper:
(1) RMSE (Root Mean Square Error)
$$\text{RMSE} = \sqrt{\frac{1}{M} \sum_{m=1}^{M} (y_m - y'_m)^2}$$
where $y_m$ is the true value in the test set; $y'_m$ is the predicted value; and M is the number of samples in the test set.
(2) MAE (Mean Absolute Error)
$$\text{MAE} = \frac{1}{M} \sum_{m=1}^{M} \left| y'_m - y_m \right|$$
where $y'_m$ is the predicted value and $y_m$ is the true value.
(3) R2adj (Adjusted R-Square)
$$R^2 = 1 - \frac{\sum_{m=1}^{M} (y_m - y'_m)^2}{\sum_{m=1}^{M} (y_m - \bar{y})^2}$$
$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$
where $y_m$ is the true value in the test set; $y'_m$ is the predicted value; $\bar{y}$ is the average of the true values in the test set; R2 is R-Square; n is the number of samples; and p is the number of features. R2adj corrects R2 for the number of samples and features. Its value does not exceed one, and the larger the value of R2adj, the better the performance of the model.
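For reference, the three indicators can be computed with the following plain-Python sketch. The function names and the toy values are illustrative assumptions and are not taken from the experiments.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error over paired true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error over paired true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R-square for n samples and n_features features."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

# Toy example with five samples and one feature.
y_true = [10, 20, 30, 40, 50]
y_pred = [12, 18, 33, 37, 50]
print(rmse(y_true, y_pred), mae(y_true, y_pred), adjusted_r2(y_true, y_pred, 1))
```

In this toy example the squared errors sum to 26 and the total sum of squares is 1000, so R2 = 0.974 and, with n = 5 and p = 1, R2adj = 1 − 0.026 × 4/3 ≈ 0.965.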

3.5. Comparison of Experimental Results

3.5.1. Impact of Historical Time Windows on Model Performance

PM2.5 data are affected by a variety of related time series, but a change in any of these series does not immediately affect the PM2.5 concentration. In other words, the variable values at previous moments have a lagged effect on the PM2.5 concentration at the next moment, which may be strong in the short term and weak in the long term [8]. A smaller window size cannot guarantee sufficient long-term memory input for the LSTM model, while a larger window size adds irrelevant information and unnecessary computational complexity [9]. To determine the appropriate historical time window, the window in this study starts at 12 h and increases in steps of 12 h. The prediction target is the PM2.5 concentration 1 h ahead. The results are shown in Table 2. With a 36 h historical time window, the RMSE, MAE, and adjusted R2 of the model in this paper are 9.66, 6.95, and 0.95, respectively, which are its best values. For the LSTM model, according to Table 2, the best RMSE (14.0) and adjusted R2 (0.91) are obtained with a 24 h window, and the best MAE (9.15) with a 36 h window. For the CNN-LSTM model, a 24 h window gives the best values in Table 2: an RMSE of 12.90, an MAE of 8.80, and an adjusted R2 of 0.92. The model in this paper is superior to the comparison models on all indicators. Its RMSE is 31% lower than that of LSTM and 25% lower than that of CNN-LSTM; its MAE is 24% lower than that of LSTM and 22% lower than that of CNN-LSTM; and its adjusted R2 is 5% higher than that of LSTM and 3% higher than that of CNN-LSTM.
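The role of the historical time window can be made concrete with a small Python sketch that turns an hourly series into supervised samples. The function name `make_windows` and the toy series are assumptions; the sketch also assumes that a prediction horizon of h hours means predicting each of the next h hourly values.

```python
def make_windows(series, window, horizon):
    """Split a series into (input window, target horizon) pairs.

    Each input holds `window` consecutive hourly values and each target
    holds the `horizon` values that immediately follow it.
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window:start + window + horizon])
    return inputs, targets

# Toy hourly series of 100 values, a 36 h history, and a 4 h horizon.
series = list(range(100))
X, y = make_windows(series, window=36, horizon=4)
print(len(X), X[0][:3], y[0])
```

With 100 values, a 36 h window, and a 4 h horizon this yields 100 − 36 − 4 + 1 = 61 samples; a larger window lengthens every input and leaves fewer samples, which is one side of the trade-off discussed above.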

3.5.2. Performance Comparison of Multi-Step Prediction

To test the multi-step prediction performance of the model in this paper for the hourly PM2.5 concentration, the three models were evaluated at prediction horizons of 1 h, 4 h, 8 h, 12 h, and 24 h; the results are shown in Table 3. Table 3 shows that: (1) each model performs best when the prediction step is one hour, and the proposed model has the best evaluation indexes; (2) as the prediction step increases, prediction accuracy decreases, but the proposed model remains superior to LSTM and CNN-LSTM at every prediction time scale. This indicates that the proposed model is effective in improving long-term prediction accuracy.
To display the forecast results intuitively, the forecast data from 26 February 2021 to 18 March 2021 are shown in Figures 5–9. Blue represents the real data, yellow the prediction of the LSTM model, green the prediction of the CNN-LSTM model, and red the prediction of the model in this article. Figures 5 and 6 show that when the prediction step is short, the predictions and predicted trends of the other two models agree well with the real data, but the proposed model achieves better results and is also superior to the other two models in peak prediction. Figures 7–9 show that as the prediction horizon increases, the accuracy of both the peak prediction and the trend prediction of every model decreases. When the prediction step is 24 h, the predicted trends of LSTM and CNN-LSTM start to run opposite to the real data, as shown by the predicted values between 400 h and 450 h in Figure 9, whereas the predictions and trends of the model in this article agree better with the real data. Therefore, the model in this article is better suited to long-term PM2.5 forecasting.

4. Conclusions

The prediction of PM2.5 concentration is of great significance for people’s daily life and environmental governance. Because the characteristic information of different time scales influences the prediction results differently, a multi-time scale fusion model is proposed in this paper. The experimental results show that the proposed model is superior to the comparison models in both single-step and multi-step prediction, indicating that multi-time scale fusion is effective for long-term prediction. A limitation of this paper is that only the data of one site are used for the experiments, so the amount of data is small and the influence between sites is not taken into account. In future work, PM2.5 at adjacent stations will be studied and analyzed, and the accuracy of prediction will be improved by exploiting the spatial correlation between stations.

Author Contributions

Writing—original draft preparation, W.X.; writing—review and editing, J.Z.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by National Natural Science Foundation of China under Grant NSFC-61803148.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fan, J.X.; Li, Q.; Zhu, Y.J.; Hou, J.X.; Feng, X. Research on time and space prediction model of air pollution based on RNN. Sci. Surv. Mapp. 2017, 42, 76–83.
  2. Qi, B.L.; Guo, K.P.; Yang, B.; Du, Y.M.; Liu, M.; Wang, J.N. Air quality prediction based on GCN-LSTM. Appl. Comput. Syst. 2021, 30, 208–213.
  3. He, Z.X.; Li, L. A prediction model of air pollutant concentration based on wavelet transform and LSTM. Environ. Eng. 2021, 39, 111–119.
  4. Huang, C.J.; Kuo, P.H. A deep CNN-LSTM model for particulate matter (PM2.5) forecasting in smart cities. Sensors 2018, 18, 2220.
  5. Liu, J.; Shahroudy, A.; Xu, D.; Kot, A.C.; Wang, G. Skeleton-based action recognition using spatio-temporal LSTM network with trust gates. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 3007–3021.
  6. Wu, Z.; Huang, N.E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 1, 1–41.
  7. Lu, Y.; Yang, J.; Shao, Z.J.; Zhu, C.C. Robust prediction of PM (2.5) based on staged temporal attention network. Environ. Eng. 2021, 39, 1–9.
  8. Huang, W.J.; Li, D.Y.; Huang, Y. Long-term prediction of PM2.5 concentration based on deep learning. Appl. Res. Comput. 2021, 38, 1809–1814.
  9. Li, X.; Peng, L.; Yao, X.J.; Cui, S.L.; Hu, Y.; You, C.Z.; Chi, T. Long short-term memory neural network for air pollutant concentration predictions: Method development and evaluation. Environ. Pollut. 2017, 231, 997–1004.
Figure 1. The structure of LSTM neurons.
Figure 2. System flow of multi-time scale network model.
Figure 3. Changes in PM2.5 concentration over time.
Figure 4. EEMD decomposition results of PM2.5 concentration.
Figure 5. The prediction results of the three methods with a prediction step of 1 h.
Figure 6. The prediction results of the three methods with a prediction step of 4 h.
Figure 7. The prediction results of the three methods with a prediction step of 8 h.
Figure 8. The prediction results of the three methods with a prediction step of 12 h.
Figure 9. The prediction results of the three methods with a prediction step of 24 h.
Table 1. The period of each IMF component of PM2.5 concentration.

| IMF Component | Period/h |
|---|---|
| IMF1 | 3 |
| IMF2 | 5 |
| IMF3 | 8 |
| IMF4 | 15 |
| IMF5 | 25 |
| IMF6 | 46 |
| IMF7 | 89 |
| IMF8 | 168 |
| IMF9 | 321 |
| IMF10 | 659 |
| IMF11 | 1395 |
| IMF12 | 4000 |
| IMF13 | 8572 |
| IMF14 | 20,000 |
| RES | -- |

Table 2. Performance comparison of models in different historical time windows.

| Historical Time Window | LSTM RMSE | LSTM MAE | LSTM Adjusted R2 | CNN-LSTM RMSE | CNN-LSTM MAE | CNN-LSTM Adjusted R2 | Model of This Paper RMSE | Model of This Paper MAE | Model of This Paper Adjusted R2 |
|---|---|---|---|---|---|---|---|---|---|
| 12 h | 14.65 | 9.39 | 0.90 | 13.85 | 9.56 | 0.91 | 10.62 | 7.27 | 0.94 |
| 24 h | 14.0 | 9.23 | 0.91 | 12.90 | 8.80 | 0.92 | 9.79 | 7.02 | 0.95 |
| 36 h | 14.24 | 9.15 | 0.90 | 12.96 | 8.91 | 0.92 | 9.66 | 6.95 | 0.95 |
| 48 h | 16.87 | 10.57 | 0.86 | 13.30 | 9.20 | 0.91 | 10.37 | 7.25 | 0.94 |
| 60 h | 17.16 | 11.07 | 0.86 | 14.90 | 10.34 | 0.89 | 10.99 | 7.56 | 0.94 |
| 72 h | 17.41 | 11.45 | 0.85 | 14.92 | 10.40 | 0.89 | 11.06 | 7.42 | 0.94 |

Table 3. Comparison of the performance of the three methods for different time step predictions.

| Time Step (Predicted) | LSTM RMSE | LSTM MAE | LSTM Adjusted R2 | CNN-LSTM RMSE | CNN-LSTM MAE | CNN-LSTM Adjusted R2 | Model of This Paper RMSE | Model of This Paper MAE | Model of This Paper Adjusted R2 |
|---|---|---|---|---|---|---|---|---|---|
| 1 h | 14.25 | 9.15 | 0.90 | 12.96 | 8.91 | 0.92 | 9.96 | 6.95 | 0.95 |
| 4 h | 14.95 | 9.72 | 0.89 | 14.51 | 9.70 | 0.90 | 11.68 | 8.10 | 0.93 |
| 8 h | 16.88 | 11.16 | 0.86 | 15.20 | 10.26 | 0.89 | 14.25 | 9.69 | 0.90 |
| 12 h | 17.65 | 11.27 | 0.84 | 17.60 | 11.32 | 0.85 | 15.00 | 9.85 | 0.89 |
| 24 h | 21.21 | 13.48 | 0.78 | 20.80 | 13.63 | 0.79 | 18.48 | 11.78 | 0.83 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Zhang, J.; Xia, W. Prediction of PM2.5 Concentration on the Basis of Multi-Time Scale Fusion. Processes 2022, 10, 171. https://doi.org/10.3390/pr10010171

AMA Style

Zhang J, Xia W. Prediction of PM2.5 Concentration on the Basis of Multi-Time Scale Fusion. Processes. 2022; 10(1):171. https://doi.org/10.3390/pr10010171

Chicago/Turabian Style

Zhang, Jianfei, and Wangui Xia. 2022. "Prediction of PM2.5 Concentration on the Basis of Multi-Time Scale Fusion" Processes 10, no. 1: 171. https://doi.org/10.3390/pr10010171
