Haze Prediction Model Using Deep Recurrent Neural Network

Abstract: In recent years, haze pollution has been frequent and has seriously affected daily life and production. The main indicators of the degree of haze pollution are the concentrations of PM2.5 and PM10, so predicting PM2.5/PM10 concentration is of great significance. Since PM2.5 and PM10 concentration data are time series, their temporal characteristics should be considered in prediction. However, the traditional neural network is limited by its structure and is weak at processing time-related data. The recurrent neural network is a kind of network designed for sequence data modeling, in which the current output of the sequence is correlated with the historical output. In this paper, a haze prediction model is established based on a deep recurrent neural network. We obtained air pollution data for Chengdu from the China Air Quality Online Monitoring and Analysis Platform and conducted experiments on these data. The results show that the new method can predict haze more effectively and accurately, and can be used for social and economic purposes.


Introduction
Haze is one of the common meteorological disasters in China and has had an extensive impact on all aspects of people's productivity and life. In terms of physical health, it affects people's respiratory systems, greatly increasing the probability of respiratory diseases, and harms cardiovascular health. In hazy weather, inhalable particles in the air increase and air mobility is poor, so bacteria and viruses disperse slowly and their concentration in a given space rises, making people more vulnerable to disease. In terms of traffic, haze leads to low visibility, which easily causes traffic accidents.
Because of the many adverse effects of haze, many scientists have carried out relevant studies. Lv, Z. et al. [1] analyzed the causes of haze and concluded that it is related to the exhaust emissions of cars and other vehicles. Wu, J. et al. [2] analyzed the causes of haze from multiple angles. Liu, Y. et al. [3] studied industrial development under the influence of haze and pointed out the impact of haze on human production and life, especially on the air control industry. Thus, the emergence of haze has seriously interfered with our daily life [4][5][6].
Prediction of haze thus becomes extremely important for preventing the occurrence of severe haze. Baklanov, A. and Zhang, Y. [7] gave a systematic introduction to and summary of the development of modern atmospheric composition modeling and air quality forecast (AQF) systems in 2020. Gao et al. [8] evaluated the impacts of Global Positioning System Zenith Total Delay (GPS-ZTD) data assimilation on meteorology and aerosol simulations using the WRF-CMAQ (Weather Research and Forecasting model and Community Multiscale Air Quality) modeling system over the NCP during 1-31 December 2019. Hertwig et al. [9,10] developed the Lagrange dispersion model system in 2015 and applied it to haze prediction; the experiment showed that the system could provide a reliable prediction. Liu et al. [11,12] analyzed the fuchsia haze events in Nanjing in 2017. Singh, V. et al. [13] proposed a method based on co-kriging to predict air pollution, which has been officially applied to the urban area of Milan.
In recent years, deep learning has been applied successfully in many fields and has gradually replaced traditional methods [14][15][16][17]. In place of traditional techniques, artificial neural networks [18,19] have been widely used in the field of haze prediction. Yin et al. [20] established a multi-convolution neural network haze prediction model to forecast haze in 2021. Avijoy Chakma et al. proposed a method that uses a deep Convolutional Neural Network (CNN) to classify natural images into different categories based on their PM2.5 concentrations in 2017 [21]. Compared with the two-dimensional data of image processing [22], haze is more suitably regarded as a kind of sequence data. Ziyan Zhang et al. proposed a haze prediction method based on 1D-CNN in 2021 [23]. Because the BP neural network is able to predict nonlinear complex systems, it is also used in the field of haze prediction; Limei Ma et al. proposed a haze prediction method based on a BP neural network in 2017 [24]. As haze data are a time series, the above methods cannot analyze the temporal relationships in the data well. Therefore, the RNN (Recurrent Neural Network), which is designed for sequence analysis, has been applied in the field of haze prediction [25,26].
When solving practical problems, an RNN model with a single hidden layer has limited representation ability, and its haze predictions under multiple influencing factors are not satisfactory. To solve this problem, a haze prediction system based on a deep recurrent neural network is designed, the backpropagation through time algorithm is selected to train the prediction model, and PM2.5 and PM10 prediction experiments are carried out. The results show that the model is feasible and reasonable and produces accurate and reliable results, providing a valuable reference for the relevant departments to improve air monitoring.

Materials and Method
In this study, we collected the air pollution data of six monitoring stations in Chengdu from the website of the Chengdu air quality online monitoring and analysis platform, https://www.aqistudy.cn (accessed on 17 March 2018).
The six monitoring stations are located at Shahepu, Liangjia lane, Junpingjie, Caotangsi, Sanwayao and Shilidian in Chengdu. The data provided by the monitoring stations are the concentrations of PM2.5, PM10, O3, CO, NO2 and SO2; PM2.5, PM10, O3 and SO2 are measured in µg/m³, while NO2 and CO are measured in mg/m³. The monitoring data are updated once an hour. The geographical locations of the six monitoring points in the urban area of Chengdu are shown in Figure 1.
Using crawler code written in Python 2.70 (Python Software Foundation), this study collected the data of the above monitoring points from 1 June 2014 to 30 June 2017. The average of the six monitoring points is used to represent the air quality in Chengdu, giving a total of 26,120 records. The data format is shown in Table 1.

Mean Completion Data
As shown in Table 1, the original data contain missing values at some time points. We use the mean interpolation method to fill in these missing values, as shown in Equation (1):

$$X_t = \frac{X_{t-1} + X_{t+1}}{2} \tag{1}$$

where $X_t$ represents the missing weather data at the current time, $X_{t-1}$ represents the weather data at the previous moment, and $X_{t+1}$ represents the weather data at the following moment. After interpolation, a total of 27,024 valid records were used in this study.
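The interpolation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: missing readings are marked with `None`, only isolated gaps (with both neighbours present) are filled, and the function name `fill_missing_with_neighbor_mean` and the sample values are hypothetical rather than taken from the study.

```python
from typing import List, Optional


def fill_missing_with_neighbor_mean(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill isolated gaps with the mean of the previous and next readings."""
    filled = list(series)
    for t in range(1, len(series) - 1):
        prev_value, next_value = series[t - 1], series[t + 1]
        # Only fill when the reading is missing and both neighbours exist.
        if series[t] is None and prev_value is not None and next_value is not None:
            filled[t] = (prev_value + next_value) / 2.0
    return filled


# Hypothetical hourly readings; None marks a missing hour.
hourly_pm25 = [35.0, None, 41.0, 44.0, None, None, 50.0]
print(fill_missing_with_neighbor_mean(hourly_pm25))
```

Gaps longer than one hour are left untouched in this sketch, because the mean of the two neighbours is undefined when a neighbour is itself missing; how the study handled such runs is not stated.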

Standardized Data Processing
It can be seen from Table 1 that the numerical ranges of the six collected variables differ greatly. Variables with larger values would have a disproportionate influence on the neural network model and weaken the features of variables with smaller values. To avoid the error caused by different numerical ranges, the data are normalized in this study. All data are normalized to between −1 and 1, as shown in Equation (2):

$$X' = \frac{X - \bar{X}}{X_{\max} - X_{\min}} \tag{2}$$

where $X'$ indicates the weather data set after the standardization process is completed, $X$ represents the original weather data set, $\bar{X}$ indicates the mean of the weather data set, $X_{\max}$ indicates the maximum value of the weather data set, and $X_{\min}$ represents the minimum value of the weather data set.
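As a rough illustration of this normalization, the sketch below assumes that Equation (2) has the mean-normalization form (X − mean) / (X_max − X_min), which is what the symbol definitions suggest and which keeps every value strictly between −1 and 1. The function name `mean_normalize` and the sample values are hypothetical.

```python
from typing import List


def mean_normalize(values: List[float]) -> List[float]:
    """Scale values to (-1, 1) by subtracting the mean and dividing by the range."""
    mean = sum(values) / len(values)
    span = max(values) - min(values)
    if span == 0:
        # A constant series carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / span for v in values]


# Hypothetical concentrations of one pollutant.
print(mean_normalize([0.0, 5.0, 10.0]))
```

Each pollutant would be normalized separately, so that a variable with a large raw range does not dominate the others.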
In the subsequent experiments, the data set is divided in chronological order: the first 80% of the entire data set is used as the training set, the next 10% as the validation set, and the last 10% as the test set.
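A chronological split of this kind can be sketched as follows, assuming that the first 80% of the records serves as the training portion and that the records are already sorted by time. The function name `chronological_split` and its default fractions are illustrative.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    records: Sequence[T], train_frac: float = 0.8, val_frac: float = 0.1
) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered records into train, validation and test parts without shuffling."""
    n = len(records)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return list(records[:train_end]), list(records[train_end:val_end]), list(records[val_end:])


train, val, test = chronological_split(list(range(100)))
print(len(train), len(val), len(test))
```

Keeping the order intact means the model is always evaluated on hours that come after the ones it was trained on, which matters for time series.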

Methods
Recurrent neural networks (RNNs) are a class of neural networks used to process sequence data [27]. Sequence data are data that reflect the degree of a phenomenon over time and in which earlier and later values are closely connected. An RNN includes an input layer, a hidden layer, and an output layer, of which the hidden layer plays the most important role: its input includes not only the output of the input layer at the current moment but also its own output at the previous moment. The RNN's processing of sequence data therefore rests on memory: the characteristics of earlier data are retained and applied to the current output. Because its depth is reflected in the input-to-output process over time, the RNN is also called a deep network. The unfolded model of the RNN is shown in Figure 2.
Atmosphere 2021, 12, 1625 4 of 12

Unlike general neural networks, which make connections only between layers, the RNN also connects neurons within the hidden layer. The hidden layer can be regarded as the storage space of the entire network. When the RNN is unfolded, it can be used for supervised learning; its directed recurrent connections let it remember previous information and apply it to the current computation, which essentially distinguishes it from traditional feedforward neural networks. Compared with traditional neural networks, it can train on samples more efficiently and model complex feature relationships more realistically and credibly. The RNN is trained with backpropagation, in which the error is propagated backward according to the time order of the forward propagation [28].
However, in practical problem solving, a plain RNN is still insufficient: its representation ability is limited, and a single hidden layer cannot express the whole model. In this paper, therefore, a deep recurrent neural network is designed to better predict PM2.5 and PM10 concentrations. An RNN can be made deeper in three respects: the input-to-hidden function, the hidden-to-output function, and the hidden-to-hidden transition [29]. Since the depth of the deep recurrent network model is mainly reflected in the hidden layers, we start from the hidden layers and determine the optimal model by changing the number of hidden layers and the number of units in them [30] (Figure 3).

Deep Recurrent Neural Network Training Process
Recurrent neural networks can encode information from the distant past through their temporal memory. During the training of the deep recurrent neural network, the backpropagation through time algorithm is applied to the recurrent layers. The errors generated by backpropagation are propagated in two directions: to the preceding layer and to the previous time step.
(1) Define annotations
According to Figure 4a, training the network requires the following processing. The input layer → the hidden layer, as shown in Equations (3) and (4). The hidden layer → the output layer, as shown in Equations (5) and (6). Here, f and g are the activation functions of the hidden layer and the output layer, respectively. The activation functions make the whole neural network nonlinear, which significantly improves its expressive ability. b_j and b_k are biases.
(2) Select the loss function and activation function
This paper chooses the sum of squared errors as the loss function for training the neural network, as shown in Equation (7).
(3) Backpropagation through time
A neural network of more than two layers is designed to capture longer historical information and to propagate the error further (Figure 4b). The backward propagation of the error is defined by Equations (8) and (9), where h is the hidden layer node index at time step t, and j is the hidden layer node index at time step t − 1.
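The forward pass and loss described in steps (1) and (2) can be illustrated with a small single-hidden-layer recurrent network in plain Python. This is a sketch under several assumptions, not the authors' implementation: V is taken as the input-to-hidden weight matrix and W as the recurrent matrix (the names used later in the text), while the output matrix name `U`, the choice of tanh for f, the identity for g, the zero initial hidden state, and the factor 1/2 in the loss are all hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(inputs: List[Vector], V: Matrix, W: Matrix, U: Matrix,
                b_hidden: Vector, b_out: Vector) -> List[Vector]:
    """Run the network over a sequence and return the output at every time step."""
    hidden = [0.0] * len(b_hidden)  # assumed zero initial state
    outputs = []
    for x in inputs:
        # Input layer -> hidden layer: current input plus previous hidden state.
        net = [a + r + b for a, r, b in zip(matvec(V, x), matvec(W, hidden), b_hidden)]
        hidden = [math.tanh(value) for value in net]  # f assumed to be tanh
        # Hidden layer -> output layer: g assumed to be the identity.
        outputs.append([o + b for o, b in zip(matvec(U, hidden), b_out)])
    return outputs


def sum_squared_error(targets: List[Vector], outputs: List[Vector]) -> float:
    """Half the sum of squared differences over all time steps and output units."""
    return 0.5 * sum((t - y) ** 2
                     for target, output in zip(targets, outputs)
                     for t, y in zip(target, output))


sequence = [[1.0], [0.0]]
outputs = rnn_forward(sequence, V=[[1.0]], W=[[1.0]], U=[[1.0]],
                      b_hidden=[0.0], b_out=[0.0])
print(outputs)
print(sum_squared_error([[1.0], [1.0]], outputs))
```

In the usage example the second input is zero, yet the second output is not: the hidden state carries the first input forward, which is the memory effect that backpropagation through time has to account for when it propagates the error to earlier time steps.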

(4) Weight matrix update calculation
For brevity, the index p of the training sample is omitted from the notation here. The error of the output layer is expressed as Equation (10), and the output layer weights are expressed as Equation (11). The error gradient propagates from the output layer to the hidden layer, as shown in Equation (12), and the error vector is obtained with Equation (13). The weight matrix V is calculated as shown in Equation (14), and the recurrent weight matrix W is calculated as shown in Equation (15).
To predict the PM2.5 concentration, the inputs are the PM2.5 concentrations of the past 24 h together with the concentrations of PM10, O3, CO, NO2, and SO2 at the current moment, and the output is the PM2.5 concentration at the predicted moment. As a result, the number of input layer neurons is 29, and the number of output layer neurons is 1. PM10 concentration is forecast in the same way, with the roles of PM2.5 and PM10 in the input data exchanged.
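The 29-element input vector can be assembled as in the sketch below. It assumes that the target at hour t is predicted from the target pollutant's 24 preceding hourly values plus the other five pollutants at hour t; the text leaves the exact alignment of "current moment" and "predicted moment" open, so this alignment, the function name `build_samples`, and the dictionary layout are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

WINDOW_HOURS = 24
OTHER_POLLUTANTS = ("PM10", "O3", "CO", "NO2", "SO2")


def build_samples(
    data: Dict[str, List[float]],
    target: str = "PM2.5",
    others: Sequence[str] = OTHER_POLLUTANTS,
) -> List[Tuple[List[float], float]]:
    """Pair each 29-feature input vector with the target concentration at hour t."""
    series = data[target]
    samples = []
    for t in range(WINDOW_HOURS, len(series)):
        history = series[t - WINDOW_HOURS:t]          # past 24 h of the target pollutant
        current = [data[name][t] for name in others]  # other five pollutants at hour t
        samples.append((history + current, series[t]))
    return samples


# Hypothetical 30-hour data set in which every series simply counts the hours.
names = ("PM2.5",) + OTHER_POLLUTANTS
demo = {name: [float(hour) for hour in range(30)] for name in names}
samples = build_samples(demo)
print(len(samples), len(samples[0][0]))
```

For PM10, the same function would be called with `target="PM10"` and with PM2.5 taking PM10's place in `others`.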

Evaluation
The main content of this study is to construct different deep neural network prediction models by changing the number of hidden layers and the number of neural units in each hidden layer, and to explore its impact on PM2.5/PM10 prediction accuracy.
In this study, we used the root mean square error (RMSE) as the evaluation index [31][32][33], as given in Equation (16):

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(T_i - P_i\right)^2} \tag{16}$$

where i is the index of the test sample point and m is the total number of test sample points, T_i is the actual concentration at test sample point i, and P_i is the predicted concentration at that point. The experimental results are compared with the actual data at 360 time points. Different numbers of hidden layers are selected to compare their effect on the prediction accuracy of the model, and for each number of hidden layers, prediction models with different numbers of neurons are analyzed.
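A direct implementation of the RMSE, assuming the usual definition (the square root of the mean squared difference between actual and predicted concentrations), might look like this. The function name `rmse` and the sample values are illustrative.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted concentrations."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical actual and predicted values at two time points.
print(rmse([3.0, 3.0], [0.0, 0.0]))
```

Because the RMSE is expressed in the same unit as the concentration, a pollutant with a larger numerical range will tend to show a larger RMSE even at similar relative accuracy.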
In this research, the number of hidden layer nodes is determined according to the empirical formula $l = \sqrt{m + n} + \alpha$. The parameter m represents the number of input layer nodes, n represents the number of output layer nodes, α is a positive constant less than 10, and l represents the number of hidden layer neurons. Because the input of this model is 24 + 5 (the 24-h PM2.5 concentration values and the other five pollutant concentration values at the current moment), m is 29, and the output node number n is 1. Therefore, the number of hidden neurons is between 6 and 16. We chose 10 as the initial number of neurons in this model, and it decreases step by step as hidden layers are added.
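The arithmetic behind this range can be checked with a short sketch. It assumes that the empirical formula is the square root of (m + n) plus α, that α runs from 1 to 10, and that the bounds are rounded outward to whole neurons; these choices are assumptions made here to reproduce the stated range, and the function name `hidden_unit_range` is hypothetical.

```python
import math
from typing import Tuple


def hidden_unit_range(n_inputs: int, n_outputs: int,
                      alpha_min: float = 1.0, alpha_max: float = 10.0) -> Tuple[int, int]:
    """Return the (lowest, highest) hidden-unit counts suggested by sqrt(m + n) + alpha."""
    base = math.sqrt(n_inputs + n_outputs)
    # Round outward so that every value of alpha in the interval is covered.
    return math.floor(base + alpha_min), math.ceil(base + alpha_max)


# 29 inputs (24 lagged values plus 5 pollutants) and 1 output.
print(hidden_unit_range(29, 1))
```

With 29 inputs and 1 output the base term is about 5.48, so the chosen starting value of 10 neurons sits near the middle of the range.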
The number of hidden layers and the distribution of neurons are shown in Table 3. Each group of experiments was conducted five times, and the mean value was taken as the experimental result for analysis.

Table 3. Number of hidden layers and the distribution of neurons (columns: Hidden Layers, Neuron Distribution).
To explain the experimental results further, the PM2.5/PM10 concentration values are divided into six grades. The classification of PM2.5/PM10 concentration values is shown in Table 4. If the predicted result and the actual measurement fall in the same grade, the predicted result is judged to be superior. If they fall in two adjacent grades, the predicted result is judged to be acceptable. If they differ by more than two grades, the predicted result is judged to be unacceptable.
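The grade-based judgement can be sketched as follows. The grade boundaries below are placeholders, since the values of Table 4 are not reproduced here, and the sketch treats any gap of two or more grades as unacceptable, which is an assumption because the text does not say how a gap of exactly two grades is judged. The names `PLACEHOLDER_BOUNDS`, `grade`, and `judge` are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence

# Placeholder upper bounds of grades 1-5; grade 6 is everything above the last bound.
# These are NOT the values of Table 4 and should be replaced with them.
PLACEHOLDER_BOUNDS = (35.0, 75.0, 115.0, 150.0, 250.0)


def grade(concentration: float, bounds: Sequence[float] = PLACEHOLDER_BOUNDS) -> int:
    """Return the grade (1-6) whose interval contains the concentration."""
    return bisect_left(bounds, concentration) + 1


def judge(actual: float, predicted: float,
          bounds: Sequence[float] = PLACEHOLDER_BOUNDS) -> str:
    """Compare the grades of the actual and predicted concentrations."""
    gap = abs(grade(actual, bounds) - grade(predicted, bounds))
    if gap == 0:
        return "superior"
    if gap == 1:
        return "acceptable"
    return "unacceptable"  # assumed for any gap of two or more grades


print(judge(30.0, 20.0), judge(30.0, 40.0), judge(30.0, 120.0))
```

Judging by grade rather than by raw error rewards predictions that would lead to the same air quality category even when the numerical values differ.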

Results
The predictions of the RNN model are obtained and optimized by changing the number of hidden layers.
The prediction results are shown in Table 5, which gives the RMSE and the optimal evaluations for different numbers of hidden layers and neuron distributions.

Discussion
Aiming at the PM2.5/PM10 concentration prediction problem, this paper builds a deep recurrent neural network and uses it to predict the PM2.5/PM10 concentration in the Chengdu urban area. The analysis of the predictions shows that, other conditions being equal, the more hidden layers there are, the higher the prediction accuracy, until the number of layers reaches a specific value, beyond which the accuracy stays roughly the same. Under the same network, the prediction accuracy of PM2.5 is significantly higher than that of PM10.
Previous studies on PM2.5/PM10 prediction have mostly used linear regression methods, but the relationship between the influencing factors is quite complex and often nonlinear. Therefore, in this paper, deep learning theory is used to construct the haze prediction model. The deep recurrent neural network prediction model and the deep long-term memory prediction model predict the PM2.5/PM10 concentration in the short term. Deep recurrent neural networks have stronger predictive power than other neural networks and give more accurate prediction results. RNN and long-term memory prediction models are examples of recurrent networks that take the hidden state of the previous layer as input and generate the hidden state of the current layer as output, which improves prediction accuracy and widens the range of data that can be processed [34]. Compared with general models with ordinary structures, they fully consider the dynamic properties of data that change over time [35].
However, the research work is still in the preliminary stage. There are still many problems that can be further studied.
(1) In predicting PM2.5/PM10 concentrations, this article discards the meteorological data with low correlation. Instead, the highly correlated PM2.5, PM10, O3, CO, NO2, and SO2 concentrations are selected as variables to train the prediction model. Therefore, the factors causing PM2.5 pollution cannot be explored [36], and the conditions of haze formation cannot be predicted with high accuracy, which has a certain impact on the experimental prediction results. How to improve accuracy by using the factors that cause PM2.5/PM10 pollution in the prediction needs further study.
(2) In this paper, the prediction experiment covers only the PM2.5/PM10 concentration of Chengdu. Because geographical environments differ, local climate and haze pollution conditions differ as well, and the prediction results may not be the same. The best model found in this experiment may therefore not be applicable to other cities, and determining the best prediction model for another region requires retraining the model [37]. How to find a suitable multiregional haze prediction model still needs to be studied; further work could extend the scope of research to the whole of China and predict the changes in PM2.5/PM10 concentration there.
(3) The timescale chosen in this paper is small, so the prediction of PM2.5/PM10 is only short-term, and the prediction model is not designed as a sliding one. The model is therefore suitable only for predicting PM2.5/PM10 concentrations in short-term haze weather and cannot be used to predict long-term results directly. Instead, short-term predictions can be used as variables to predict long-term concentrations of PM2.5/PM10 [38,39]. In the next step, we will choose a longer timescale, extend the PM2.5/PM10 concentration range to the level of heavy pollution, and conduct time series analysis of areas with severe haze to study its development mechanism [40,41].

Conclusions
In this paper, interpolation completion and data standardization were carried out on the collected data, and correlation analysis was carried out on the processed data. The data were screened according to the correlation coefficient, and the PM2.5, PM10, O3, CO, NO2, and SO2 concentrations were finally selected as the input variables to predict PM2.5 and PM10. The same training and test data sets were used to build the deep recurrent neural networks. The experimental conclusions are as follows:
(1) The prediction results of PM2.5 and PM10 are related to the number of hidden layers. The larger the number of hidden layers, the higher the prediction accuracy. In addition, the prediction accuracy of PM2.5 is higher than that of PM10, which is related to the data characteristics of PM2.5 and PM10: as the concentration of PM2.5 is lower than that of PM10 and its concentration range is smaller, its RMSE is smaller and its accuracy is higher.
(2) According to the prediction results of PM2.5, with the number of nodes fixed, the prediction accuracy is related to the number of hidden layers. As the number of hidden layers increases, the prediction accuracy of the deep recurrent neural network rises, and the network with eight hidden layers gives the best prediction results.
(3) According to the PM10 prediction results, as with PM2.5, the prediction accuracy is related to the number of hidden layers. The larger the number of hidden layers, the higher the prediction accuracy, and the deep recurrent neural network with eight hidden layers gives the best prediction results.
(4) Provided that prediction accuracy is ensured, the simplest neural network should be selected to predict PM2.5/PM10. Good results can be achieved when PM2.5/PM10 is predicted using a deep recurrent neural network with seven hidden layers.
Funding: This work was jointly supported by the Sichuan Science and Technology Program (2021YFQ0003).