Research on the Spatiotemporal Characteristics and Concentration Prediction Model of PM 2.5 during Winter in Jiangbei New District, Nanjing, China

Accurate prediction of PM 2.5 concentration is one of the key tasks of air pollution assessment, early warning, and treatment. In this paper, four monitoring sites were arranged in Jiangbei New District of Nanjing City, China, and environmental parameters such as PM 2.5/PM 10 concentration, temperature, and humidity were monitored from January to February 2020. A PM 2.5 concentration prediction model based on a gated recurrent unit (GRU) network was established. The mean relative error (MRE), root mean square error (RMSE), and Pearson correlation coefficient were selected as the evaluation criteria for the accuracy of the GRU model. The data set was divided into a training set, a test set, and a validation set at a ratio of 7:2:1, and the GRU model was used to predict the hourly PM 2.5 concentration for the following week. The Pearson correlation coefficients between the predicted and monitored values at all four monitoring sites exceed 0.9, reflecting a strong correlation, and the mean relative errors are around 10%. The GRU model prediction for the NJAU (Nanjing Agricultural University)-Pukou Campus Site is the most accurate, with a correlation coefficient, MRE, and RMSE of 0.970, 7.85%, and 9.6049, respectively, reflecting the good prediction performance of the model. This research therefore supports air quality prediction in different cities and regions, so that people can take protective measures in advance and reduce the harm that air pollution causes to human health.


Introduction
Due to the rapid development of China's economy, environmental pollution is becoming more serious. According to the 2018 "China's Environmental State Communiqué" [1], the air quality in 217 of China's 338 cities reached light pollution or worse. Days on which PM 2.5 (particles with a diameter of 2.5 μm or less) was the primary pollutant accounted for 60.0% of the days with severe or worse pollution. The chemical composition of PM 2.5 is complex [2], and its scattering and extinction effects reduce atmospheric visibility [3], producing haze, which has a severe impact on residents' health [4]. Dockery et al. [5] studied the effect of air pollution on children's health in six cities and concluded that particulate matter could cause earaches, asthma, and other diseases, as well as irreversible loss of lung function. Zhao et al. [6] assessed the association between PM 2.5 exposure and asthma development. Their research showed that exposure to soluble PM 2.5 extract leads to the specific degradation of the tight junctions that regulate paracellular permeability and induces apoptosis of airway epithelial cells (AECs). This disrupts the barrier function of the airway epithelium and contributes to the development and exacerbation of asthma. Zhu et al. [7] were the first to use city-level baseline mortality rates to analyze PM 2.5-related premature deaths in Chinese cities. They found that in megalopolises and metropolises, PM 2.5 abatement clearly alleviated public health burdens, which has important policy implications for cities seeking to mitigate the negative impacts of air pollution.
In recent years, haze has remained one of the key concerns of the Chinese government. Traditional prediction methods include the grey model [8], time series analysis [9], linear regression [10], Bayesian methods [11], etc. However, the formation of PM 2.5 is complex and nonlinear, which limits the applicability of these traditional methods. Neural network prediction models are now widely used because of their advantages in nonlinear fitting [12]. Perez et al. [13] used an ANN (artificial neural network) model to predict hourly PM 2.5 concentrations. This model has a simple structure and excellent prediction ability, but it is prone to local optima and over-fitting. Chen et al. [14] used a BP (back propagation) neural network to predict the mass concentration of PM 2.5. The algorithm established the correlation between aerosol optical depth (AOD) and PM 2.5 concentration and achieved a prediction accuracy of 88.5%; however, it also suffers from local optima and long training times. Prakash et al. [15] used a wavelet function instead of the sigmoid function as the activation function of a wavelet neural network model, which significantly shortened the training time and effectively avoided falling into local minima. However, optimizing the network parameters with the gradient method caused defects such as slow convergence when approaching the minimum and oscillating descent during line search, which make it difficult for the model to adapt to the complex and changeable characteristics of PM 2.5. Huang et al. [16] proposed APNet, a deep learning model based on a CNN (convolutional neural network) and LSTM (long short-term memory), which performed better than either CNN or LSTM alone. Still, the model ignored spatial relationships, and its predictions covered only a few time steps.
The gated recurrent unit (GRU) is a type of RNN (recurrent neural network). It was proposed by Cho et al. [17] as an improvement on the LSTM. LSTM was proposed by Hochreiter et al. [18] and was the first network to introduce gate structures; its constant error carousel (CEC) unit mitigates the exploding and vanishing gradients that arise when an RNN is trained with BPTT (back propagation through time). Gers et al. [19] further improved the LSTM network structure by adding a forget gate and peephole connections, which solved the problem of the internal state of the LSTM network growing indefinitely when processing continuous input streams. However, the relatively complex internal structure of LSTM and its large number of parameters slow down training. GRU makes up for this deficiency: it combines the input gate and forget gate of LSTM into an update gate and replaces the output gate with a reset gate, which simplifies the network structure and reduces the number of model parameters by about one quarter (three sets of gate weights instead of four). In this paper, measuring sites were set up to acquire PM 2.5 concentration data in Jiangbei New District of Nanjing City, China, the spatiotemporal characteristics of PM 2.5 were analyzed [20], and a PM 2.5 concentration prediction model based on the GRU network and LSTM [21] is proposed. The purpose of this research is to obtain a superior prediction model through experimental comparison, with good prediction accuracy and generalization ability, and to provide technical guidance for exploring how PM 2.5 concentration varies and for adjusting and deciding on overall air quality management measures in Jiangbei New District of Nanjing.
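The size of the parameter saving follows directly from the number of gate blocks, because LSTM and GRU use the same weight layout per block. The sketch below counts the parameters of a single recurrent layer under an assumption: the layout of PyTorch's nn.LSTM and nn.GRU, with one input matrix, one recurrent matrix, and two bias vectors per block. The function names are hypothetical.

```python
def recurrent_layer_params(input_size: int, hidden_size: int, gate_blocks: int) -> int:
    """Count the parameters of one recurrent layer.

    Assumes the layout used by PyTorch's nn.LSTM / nn.GRU: every gate block has an
    input-to-hidden matrix (hidden x input), a hidden-to-hidden matrix
    (hidden x hidden) and two bias vectors of length hidden.
    """
    per_block = hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size
    return gate_blocks * per_block


def lstm_params(input_size: int, hidden_size: int) -> int:
    # Input gate, forget gate, cell candidate and output gate: four blocks
    return recurrent_layer_params(input_size, hidden_size, gate_blocks=4)


def gru_params(input_size: int, hidden_size: int) -> int:
    # Reset gate, update gate and candidate state: three blocks
    return recurrent_layer_params(input_size, hidden_size, gate_blocks=3)


if __name__ == "__main__":
    # Input dimension 2 and hidden dimension 4, as in the model settings reported in this paper
    lstm, gru = lstm_params(2, 4), gru_params(2, 4)
    print(f"LSTM: {lstm}, GRU: {gru}, saving: {1 - gru / lstm:.0%}")
```

For any layer size, the GRU keeps three quarters of the LSTM's parameters. If PyTorch is installed, `sum(p.numel() for p in torch.nn.GRU(2, 4).parameters())` should agree with `gru_params(2, 4)`.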

Research Objects
Nanjing is the capital of Jiangsu Province, China, and its GDP ranked 11th among Chinese cities in 2019. In recent years, the environmental quality of Nanjing has continuously improved, and its air quality has been generally stable [22]. However, there have been many episodes of severe haze, which have adversely affected human health, urban traffic, and other aspects of city life. The ambient air quality therefore still needs to be improved.
Jiangbei New District is an important base for scientific and technological innovation and advanced industry in Nanjing. In 2019, its GDP ranked second among the 12 districts of Nanjing. Jiangbei New District has three national economic development zones (High-tech Zone, Chemical Industrial Zone, and Science Industry Zone) and two provincial economic development zones (Pukou Economic Development Zone and Luhe Economic Development Zone). The district's focus on developing heavy industry and high-tech industry has posed severe challenges to its ambient air quality, with a variety of serious effects on people's health.

Data Acquisition and Preprocessing
In this research, ambient air quality monitoring sites were set up in the High-tech Zone of Jiangbei New District (32°9 …). A Renke air quality transmitter, shown in Figure 2, was used for data monitoring. The transmitter uses sensors made by Sensirion (Switzerland), and the main parameters of each sensor are shown in Table 1. The transmitter sent data over an RS485 bus, recorded data once a minute, and collected 1,184,354 records in total.

Table 1. Main parameters of each sensor.
Index               Accuracy   Resolution   Range
PM 2.5              10         1            0 ~ 1,0…
PM 10               10         1            0 ~ 1,0…
Temperature         0.5        0.1          −40 ~ …
Relative Humidity   3          0.1          0 ~ …

Due to monitoring errors and signal noise, the original data contain some outliers, and processing these outliers can improve the accuracy of the model. The data were preprocessed with MATLAB R2019b (MathWorks, Natick, MA, USA). PM 2.5 and PM 10 concentration values that satisfy Equation (1) are replaced with the average value. Temperature and humidity values change more smoothly, so values that satisfy Equation (2) are likewise replaced with the average value.
where abs is the absolute value function, $c_i$ is the $i$th value in the dataset, and $k_p = 1.5$ and $k_{TH} = 1.2$ are error coefficients. After preprocessing, 1,176,068 usable records were obtained; the statistical analysis of the PM 2.5 concentration data is shown in Table 2.
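Equations (1) and (2) themselves are not reproduced in this text, so the following is only a hypothetical sketch of how an error coefficient such as $k_p = 1.5$ could drive outlier replacement. It assumes that a reading is an outlier when it deviates from the median of its neighbours by more than $(k - 1)$ times that median. It also assumes that "the average value" means the mean of the nearest valid readings on either side. Neither assumption comes from the paper.

```python
from statistics import median
from typing import List, Optional


def flag_outliers(values: List[float], k: float, half_window: int = 3) -> List[bool]:
    """Flag readings far from the median of their neighbours.

    HYPOTHETICAL criterion: |c_i - ref| > (k - 1) * |ref|, with ref the median of up to
    half_window readings on each side of c_i. Not the paper's Equation (1)/(2).
    """
    flags = []
    for i, c in enumerate(values):
        lo, hi = max(0, i - half_window), min(len(values), i + half_window + 1)
        neighbours = values[lo:i] + values[i + 1:hi]
        if not neighbours:
            flags.append(False)
            continue
        ref = median(neighbours)
        flags.append(abs(c - ref) > (k - 1) * abs(ref))
    return flags


def _nearest_valid(values: List[float], flags: List[bool], start: int, step: int) -> Optional[float]:
    """Walk from start in direction step until an unflagged reading is found."""
    i = start
    while 0 <= i < len(values):
        if not flags[i]:
            return values[i]
        i += step
    return None


def replace_outliers(values: List[float], k: float) -> List[float]:
    """Replace each flagged reading with the average of the nearest valid readings around it."""
    flags = flag_outliers(values, k)
    cleaned = list(values)
    for i, bad in enumerate(flags):
        if bad:
            sides = [v for v in (_nearest_valid(values, flags, i - 1, -1),
                                 _nearest_valid(values, flags, i + 1, 1)) if v is not None]
            if sides:
                cleaned[i] = sum(sides) / len(sides)
    return cleaned


if __name__ == "__main__":
    pm25 = [40.0, 42.0, 300.0, 44.0, 41.0]  # synthetic minute readings with one spike
    print(replace_outliers(pm25, k=1.5))
```

A relative threshold suits strictly positive series such as PM concentrations. For winter temperatures near 0 °C, an absolute threshold would be a safer choice.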

Air Quality Level
The Ministry of Environmental Protection of China divides air quality into five levels: excellent, good, light pollution, moderate pollution, and severe pollution. The corresponding air pollution index and the impact on human activities are shown in Table 3. At the excellent and good levels, normal activities are not affected. When air quality reaches the light pollution level or worse, and especially at the heavy pollution level, it harms human health and becomes a main cause of higher disease incidence. It also seriously affects human capital in the region, so people should take protective measures.

The GRU network takes sequence data as input for recursive training and is often used to learn the nonlinear features of sequences. It provides memory and parameter sharing and is quite efficient. The GRU model mainly uses two sigmoid functions to map the memory state information and the current input data to values between 0 and 1 [19]. The closer a value is to 1, the more information is retained, which is how the network memorizes or discards data. The model structure is shown in Figure 3.
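The GRU network merges the LSTM's cell state $c_t$ and output $h_t$ into a single output $h_t$ of the GRU hidden layer, which can be calculated by the following four equations:

$$r_t = \sigma\left(W_{xr} x_t + W_{hr} h_{t-1} + b_r\right) \tag{3}$$

$$z_t = \sigma\left(W_{xz} x_t + W_{hz} h_{t-1} + b_z\right) \tag{4}$$

$$\tilde{h}_t = \tanh\left(W_{xh} x_t + W_{hh}\left(r_t \odot h_{t-1}\right) + b_h\right) \tag{5}$$

$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{6}$$

where $\odot$ is the Hadamard (element-wise) product, $x_t$ is the input at time $t$, $h_{t-1}$ is the hidden state at the previous time step, $r_t$ is the reset gate, $z_t$ is the update gate, $h_t$ is the hidden state at time $t$, $\tilde{h}_t$ is the candidate (activation) state of the hidden layer at time $t$, and $W$ and $b$ denote the weight matrices and bias vectors of the neurons, respectively. Their subscripts indicate the position of the weight in the network; for example, $W_{xr}$ is the weight matrix between the input and the reset gate, and $b_r$ is the bias vector of the reset gate. $\sigma$ denotes the sigmoid function; $\sigma$ and tanh are commonly used activation functions that help filter out unimportant information. The sigmoid compresses the outputs of the two gates to the range (0, 1), while tanh maps the candidate state to (−1, 1). Their expressions are given in Equations (7) and (8), respectively:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{7}$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{8}$$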
In Figure 3, $z_t$ and $r_t$ denote the update gate and the reset gate, respectively. The update gate $z_t$ determines how much useful information from the previous time step acts on the current time step. The larger its value, the more information from the current time step is retained; conversely, the smaller its value, the more of the previous hidden state $h_{t-1}$ is retained. In this way, the update gate controls how $x_t$ and $h_{t-1}$ are retained and updated. The reset gate $r_t$ controls how much information from the previous time step enters the candidate state $\tilde{h}_t$. The larger its value, the more historical information is retained, and the more the GRU tends to learn long-term dependencies. The biggest improvement of GRU over LSTM is that it processes current and historical information through the reset and update gates alone, without a separate cell state, which speeds up computation.
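The following minimal pure-Python sketch shows a single GRU time step and makes the gate computations concrete. It assumes the standard formulation of Cho et al. [17] with the convention described above, where a larger update gate keeps more of the new candidate state. The keys W_hr, W_xz, W_hz, W_xh, W_hh, b_z, and b_h are assumed names that extend the subscript scheme of $W_{xr}$ and $b_r$. Plain lists keep the example self-contained and are not meant to be efficient.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Logistic function, squashing x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for list-based matrices."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def vec_add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(items) for items in zip(*vectors)]


def gru_step(x_t: Vector, h_prev: Vector, p: Dict[str, list]) -> Vector:
    """Compute h_t from the input x_t and the previous hidden state h_prev."""
    # Reset gate r_t and update gate z_t, both in (0, 1)
    r_t = [sigmoid(a) for a in vec_add(matvec(p["W_xr"], x_t), matvec(p["W_hr"], h_prev), p["b_r"])]
    z_t = [sigmoid(a) for a in vec_add(matvec(p["W_xz"], x_t), matvec(p["W_hz"], h_prev), p["b_z"])]
    # The reset gate scales the previous state before it enters the candidate state
    reset_h = [r * h for r, h in zip(r_t, h_prev)]
    h_cand = [math.tanh(a) for a in vec_add(matvec(p["W_xh"], x_t), matvec(p["W_hh"], reset_h), p["b_h"])]
    # A larger z_t keeps more of the candidate (current) information, a smaller one more of h_prev
    return [(1 - z) * h + z * c for z, h, c in zip(z_t, h_prev, h_cand)]


def zero_params(n_in: int, n_hidden: int) -> Dict[str, list]:
    """Hypothetical helper: all-zero weights and biases with the right shapes."""
    p: Dict[str, list] = {}
    for gate in ("r", "z", "h"):
        p[f"W_x{gate}"] = [[0.0] * n_in for _ in range(n_hidden)]
        p[f"W_h{gate}"] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p[f"b_{gate}"] = [0.0] * n_hidden
    return p


if __name__ == "__main__":
    params = zero_params(n_in=1, n_hidden=2)
    # With zero weights both gates equal 0.5 and the candidate is 0, so h_t = 0.5 * h_prev
    print(gru_step([1.0], [0.4, -0.2], params))  # → [0.2, -0.1]
```

Pushing the update-gate bias strongly positive drives $z_t$ towards 1, so the step returns almost exactly the candidate state. This is a quick way to check which gate convention an implementation uses. PyTorch's `torch.nn.GRU` documentation, for instance, writes the interpolation with the roles of $z_t$ and $1 - z_t$ swapped.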

Data Set
In this research, the PM 2.5 concentration data were converted from minute-level to hourly values by averaging in MATLAB and divided into a training set, a test set, and a validation set in the proportion 7:2:1. To speed up convergence and improve the accuracy of the model, the data set is normalized to the range 0-1 by Equation (9) [14].
$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{9}$$

where $X_{norm}$ is the normalized value, $X$ is the raw data, and $X_{max}$ and $X_{min}$ are the maximum and minimum values of the original data, respectively.

… [21], and the algorithm steps are as follows:
Step 1: Forward propagation calculates the output value of each neuron (the activation function is the sigmoid function).
Step 2: Back propagation calculates the gradient of the network error with respect to each weight parameter.
Step 3: The weight of each synapse is updated.

The model uses a three-layer network with an input dimension of 2, a hidden layer dimension of 4, and a batch_size of 3. The number of iterations is 1000, and the training accuracy error is 0.0003. The Adam (adaptive moment estimation) algorithm is adopted as the gradient optimization algorithm of the model [23]. Compared with the traditional SGD (stochastic gradient descent) method, Adam is less likely to fall into local optima, avoids large gradient fluctuations, and updates quickly [24].
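The data preparation described in this section consists of averaging minute readings to hourly values, splitting 7:2:1, and applying min-max normalization. The sketch below is a hypothetical Python equivalent of that MATLAB processing. It assumes that the split is chronological, because the paper does not state how the data were ordered before splitting. The function names are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def hourly_means(readings: List[Tuple[datetime, float]]) -> List[Tuple[datetime, float]]:
    """Average minute-level readings within each clock hour, in chronological order."""
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for ts, value in readings:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return [(hour, sum(vals) / len(vals)) for hour, vals in sorted(buckets.items())]


def split_7_2_1(series: list) -> Tuple[list, list, list]:
    """Chronological 7:2:1 split into training, test and validation parts (order assumed)."""
    n_train = len(series) * 7 // 10
    n_test = len(series) * 2 // 10
    return series[:n_train], series[n_train:n_train + n_test], series[n_train + n_test:]


def min_max_normalise(values: List[float], x_min: float, x_max: float) -> List[float]:
    """X_norm = (X - X_min) / (X_max - X_min), mapping the data to [0, 1]."""
    span = x_max - x_min
    if span == 0:
        raise ValueError("x_max must differ from x_min")
    return [(x - x_min) / span for x in values]


def min_max_restore(values: List[float], x_min: float, x_max: float) -> List[float]:
    """Inverse of the normalisation, to express predictions in ug/m3 again."""
    return [x * (x_max - x_min) + x_min for x in values]


if __name__ == "__main__":
    start = datetime(2020, 1, 1)
    # Synthetic, steadily rising minute readings for ten hours (not measured data)
    minutes = [(start + timedelta(minutes=m), 30.0 + m) for m in range(600)]
    hourly = [v for _, v in hourly_means(minutes)]
    train, test, val = split_7_2_1(min_max_normalise(hourly, min(hourly), max(hourly)))
    print(len(train), len(test), len(val))
```

Following the paper, the sketch takes $X_{min}$ and $X_{max}$ from the whole data set. Computing them from the training part alone is a common alternative that keeps test information out of the scaling.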
The model is built with the PyTorch deep learning framework (v1.8.1, Facebook, Menlo Park, CA, USA) [25], a Python derivative of Torch. PyTorch supports dynamic neural networks and provides tensors that run on both CPU and GPU [26]. The mean square error is selected as the objective loss function:

$$L = \frac{1}{Q}\sum_{t=1}^{Q}\left(\hat{p}_t - p_t\right)^2$$

where $Q$ is the number of samples in the training set, $p_t$ is the true value of PM 2.5 concentration at time $t$, and $\hat{p}_t$ is the corresponding predicted value.
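The training procedure has three steps: a forward pass, gradients of the mean square error, and a weight update with Adam. The sketch below illustrates these steps on a toy problem and is not the authors' GRU. It fits a hypothetical straight line $\hat{y} = w x + b$ with a hand-written Adam optimizer. The moment coefficients β1 = 0.9, β2 = 0.999, and ε = 1e-8 are the commonly used defaults, which are also PyTorch's defaults for `torch.optim.Adam`. The learning rate of 0.05 is an assumption.

```python
import math
from typing import List, Tuple


def mse(pred: List[float], true: List[float]) -> float:
    """Mean square error over Q samples: (1/Q) * sum((p_hat - p)^2)."""
    return sum((ph - p) ** 2 for ph, p in zip(pred, true)) / len(true)


class Adam:
    """Minimal Adam optimiser for a flat list of scalar parameters."""

    def __init__(self, n_params: int, lr: float = 0.05, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentred variance) estimates
        self.t = 0

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


def fit_line(xs: List[float], ys: List[float], iterations: int = 1000) -> Tuple[float, float]:
    """Fit y_hat = w*x + b by minimising the MSE with Adam (toy stand-in for the GRU)."""
    w, b = 0.0, 0.0
    opt = Adam(2)
    q = len(xs)
    for _ in range(iterations):
        preds = [w * x + b for x in xs]                        # Step 1: forward pass
        errs = [ph - y for ph, y in zip(preds, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errs, xs)) / q   # Step 2: gradients of the MSE
        grad_b = 2 * sum(errs) / q
        w, b = opt.step([w, b], [grad_w, grad_b])              # Step 3: update the weights
    return w, b


if __name__ == "__main__":
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]
    ys = [2.0 * x + 0.5 for x in xs]  # synthetic, noise-free targets
    w, b = fit_line(xs, ys)
    print(round(w, 3), round(b, 3), mse([w * x + b for x in xs], ys))
```

In PyTorch the same roles are played by `torch.nn.MSELoss` and `torch.optim.Adam`, with autograd supplying the gradients of Step 2.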

Evaluation Index
In this paper, the mean relative error (MRE), root mean square error (RMSE), and Pearson correlation coefficient are selected as the evaluation criteria for the accuracy of the GRU model [27]. When the loss function of the model has stabilized and its prediction accuracy no longer improves significantly, the model is evaluated with the following equations:

$$MRE = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|x_i - y_i\right|}{x_i} \times 100\%$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2}$$

$$r_{xy} = \frac{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}}$$

where $N$ is the total number of prediction results, $x_i$ and $y_i$ are the actual and predicted PM 2.5 concentration values at the $i$th sampling point, respectively, and $\bar{x}$ and $\bar{y}$ are their mean values. The MRE measures the relative deviation between the predicted and measured values and indicates the reliability of the prediction: the smaller the value, the more reliable the model. The RMSE measures the deviation between the predicted and measured values and reflects the accuracy of the prediction: the smaller the value, the more accurate the prediction. The Pearson correlation coefficient measures the linear correlation between the predicted and measured values. The correspondence between the absolute value of $r_{xy}$ and correlation strength is shown in Table 4.

Table 4. Correspondence between $r_{xy}$ and correlation strength.

Support vector regression (SVR) is derived from the support vector machine (SVM). SVR is a regressor used to predict continuous ordered variables, and it addresses problems such as small samples, overfitting, and high dimensionality. Rather than using the curve as a decision boundary, as in classification, SVR uses the curve to fit the data, finding the best match between the data vectors and the position of the curve. SVR can use multiple classifiers trained on different types of data using probability rules, and it requires less computation than other regression techniques.
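The three evaluation criteria, MRE, RMSE, and the Pearson correlation coefficient, can be computed directly from paired measured values $x_i$ and predicted values $y_i$. The sketch below assumes the standard definitions. MRE is normalized by the measured value and expressed in percent, which is consistent with the 7.85% reported for the NJAU-Pukou Campus Site. The function names are hypothetical.

```python
import math
from typing import List


def mre(actual: List[float], predicted: List[float]) -> float:
    """Mean relative error in percent: (100 / N) * sum(|x_i - y_i| / x_i)."""
    return 100.0 * sum(abs(x - y) / x for x, y in zip(actual, predicted)) / len(actual)


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean square error: sqrt((1 / N) * sum((x_i - y_i)^2))."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(actual, predicted)) / len(actual))


def pearson(actual: List[float], predicted: List[float]) -> float:
    """Pearson correlation coefficient r_xy between measured and predicted values."""
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(predicted) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, predicted))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in actual))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in predicted))
    return cov / (sx * sy)


if __name__ == "__main__":
    measured = [10.0, 20.0, 30.0, 40.0]   # synthetic values, not data from the paper
    predicted = [11.0, 18.0, 33.0, 40.0]
    print(mre(measured, predicted), rmse(measured, predicted), pearson(measured, predicted))
```

MRE is undefined when a measured value is zero. Hourly PM 2.5 means are positive in practice, but a guard would be needed for other variables.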

PM2.5 Spatiotemporal Characteristics
MATLAB was used to average the preprocessed data by hour of the day (24 hours) and by day of the week (7 days), and the resulting line graphs are shown in Figure 4.
The Pearson correlation coefficient between any two monitoring sites is between 0.737 and 0.909 in Figure 4 and between 0.733 and 0.976 in Figure 5. All of these values indicate strong correlations, and the PM 2.5 concentration trends at the four monitoring sites are basically the same. In terms of temporal characteristics, the daily PM 2.5 concentration is lowest at around 3-4 p.m. and highest at around 7-8 a.m. Within a week, the PM 2.5 concentration varies significantly, with a maximum difference of 47.18 μg/m³, and peaks on Tuesday. The High-tech Zone Site falls to its lowest level on Saturday, whereas the other three monitoring sites are lowest on Thursday. Owing to human activities, PM 2.5 generally increases on Saturdays, Sundays, and Mondays and decreases once daily life and work have settled into a routine. In terms of spatial characteristics, Table 2 shows that the Shangcheng Community Site has the highest PM 2.5 concentration and the largest fluctuation, the NJAU-Pukou Campus Site has the lowest concentration and the most stable variation, and the High-tech Zone Site and Luhe Countryside Site are at similar levels. This is mainly because Jiangbei New District has focused on developing heavy and high-tech industry in recent years, which has made air pollution increasingly serious. Compared with the other three sites, the Shangcheng Community Site also has more frequent movement of people, more commercial activity, and more motor vehicle exhaust, which further raise PM 2.5 concentrations. This shows that air quality in Jiangbei New District is strongly related to location within the district.
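The daily and weekly curves come from averaging readings by hour of the day and by day of the week. The sketch below is a hypothetical Python equivalent of that MATLAB step for one site, assuming timestamped hourly readings. `profile_range` gives the kind of maximum-minus-minimum weekly difference quoted above, but the demo data are synthetic.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

Reading = Tuple[datetime, float]


def mean_profile(readings: List[Reading], key: Callable[[datetime], int]) -> Dict[int, float]:
    """Average the readings that share the same key(timestamp)."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for ts, value in readings:
        groups[key(ts)].append(value)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


def diurnal_profile(readings: List[Reading]) -> Dict[int, float]:
    """Mean concentration for each hour of the day (0-23)."""
    return mean_profile(readings, lambda ts: ts.hour)


def weekly_profile(readings: List[Reading]) -> Dict[int, float]:
    """Mean concentration for each day of the week (1 = Monday ... 7 = Sunday)."""
    return mean_profile(readings, lambda ts: ts.isoweekday())


def profile_range(profile: Dict[int, float]) -> float:
    """Maximum minus minimum of a profile, e.g. the largest weekly difference."""
    return max(profile.values()) - min(profile.values())


if __name__ == "__main__":
    start = datetime(2020, 1, 6)  # a Monday
    # Synthetic hourly series for two weeks (not measured data)
    series = [(start + timedelta(hours=i), 50.0 + (i % 24)) for i in range(24 * 14)]
    daily = diurnal_profile(series)
    print(max(daily, key=daily.get), min(daily, key=daily.get), profile_range(weekly_profile(series)))
```

The pairwise site correlations in Figures 4 and 5 can then be obtained by applying a Pearson correlation to the profiles of two sites, aligned by key.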

Model Prediction Results
The experiments were implemented in Python on a computer running Windows 10 with an Intel(R) Core(TM) i7-8750H CPU @ 2.20 GHz (Intel, Santa Clara, CA, USA), 16 GB of memory, and a 1 TB hard disk.
A PM 2.5 concentration prediction model was established in the PyCharm development environment on the configuration described above. The model used the test set to predict the hourly average PM 2.5 mass concentration for the following week at the four monitoring sites. The predicted PM 2.5 concentrations are compared with the monitored values, and the loss functions during the iteration process are shown in Figure 5a-d.
Figure 5 shows that the predicted values for the 168 hours of the following week are basically consistent with the monitored values. The loss decreases along the gradient descent direction and the model converges within 200 iterations, so training is fast, and no overfitting is observed. After 1000 iterations, the output of the loss function is close to zero. Table 5 gives the prediction errors and the correlation coefficients between the predicted and monitored values. As Table 5 shows, the correlation coefficients between the monitored values and the values predicted by the GRU-based PM 2.5 concentration model all exceed 0.9, indicating a strong correlation, and the MRE is around 10%. These results show that the PM 2.5 concentration prediction model built on the GRU network has good prediction performance. Among the sites, the NJAU-Pukou Campus Site has the highest prediction accuracy and the Luhe Countryside Site the lowest. Combined with the spatiotemporal characteristics of PM 2.5, this suggests that the model performs better when predicting lower PM 2.5 concentrations.
The SVR model used the test set to predict the hourly average PM 2.5 mass concentration for the following week at the four monitoring sites. The predicted PM 2.5 concentrations are compared with the monitored values in Figure 6a-d.
Figure 6 shows that the SVR predictions for the 168 hours of the following week are also basically consistent with the monitored values; however, SVR does not capture the sharp peaks of the curve well. Table 6 gives the prediction errors and the correlation coefficients between the predicted and monitored values for the SVR model. Compared with the SVR model and some existing studies, such as LSTM [21], the GRU model achieves correlation coefficients above 0.9, whereas those of the SVR model are only around 0.8. In addition, the MRE of the GRU model is around 10%, whereas that of the SVR model exceeds 20%. These indices show that the GRU deep learning model predicts more accurately and has a clear advantage in convergence speed, so it can better fit the nonlinear problem of air quality prediction. Because PM 2.5 has complex causes, its concentration is strongly correlated with human living and production activities. However, access to data on traffic, factory production, and similar activities is limited. In future work, more relevant data need to be obtained to improve the accuracy of the model.

Conclusions
To analyze and predict air quality, four monitoring sites were arranged in Jiangbei New District of Nanjing City to collect PM 2.5/PM 10 concentration, temperature, and humidity data, and the spatiotemporal characteristics of PM 2.5 were analyzed. PM 2.5 concentrations at the four sites followed the same trend, peaking around 7 a.m. and falling to their lowest around 4 p.m. Within a week, PM 2.5 concentrations were highest on Tuesday; the High-tech Zone Site dropped to its lowest on Saturday, while the other three sites dropped to their lowest on Thursday. Among the four sites, the Shangcheng Community Site had the highest PM 2.5 concentration with a large dispersion, while the NJAU-Pukou Campus Site had the lowest concentration with the most stable variation. A GRU network prediction model for PM 2.5 concentration was established to predict the hourly average mass concentration of PM 2.5 at the four sites. The predicted values are very close to the measured values: the Pearson correlation coefficients between them exceed 0.9, reflecting a strong correlation, and the MRE and RMSE are small. The prediction for the Pukou Campus of Nanjing Agricultural University is the best, with a Pearson correlation coefficient of 0.970, an MRE of 7.85%, and an RMSE of 9.6049. These experimental results show that the model has good prediction accuracy and generalization ability. The PM 2.5 spatiotemporal characteristics and GRU model studied in this paper show potential for analyzing and predicting urban air quality. The approach could be extended to other cities and regions and could provide decision support for air pollution assessment, early warning, and treatment.
It could also be applied to improve ambient air quality and lay a theoretical foundation for better urban development. At the same time, it can provide people with air quality forecasts, helping them anticipate future air quality and take preventive measures to reduce the impact of air pollution on their health.