Roadside Air Quality Forecasting in Shanghai with a Novel Sequence-to-Sequence Model

The establishment of an effective roadside air quality forecasting model provides important information for proper traffic management to mitigate severe pollution, and for helping residents plan outdoor activities to minimize exposure. Current deterministic models rely on numerical simulation and parameter tuning, while empirical models have powerful learning ability but have not fully considered the temporal periodicity of air pollutants. To incorporate the periodicity of pollutants into empirical air quality forecasting models, this study evaluates the temporal variations of air pollutants and develops a novel sequence-to-sequence model with weekly periodicity to forecast air quality. Two years of observation data from roadside air quality monitoring stations in Shanghai are used for analysis and modeling. The results show that fine particulate matter (PM2.5) and carbon monoxide (CO) concentrations exhibit clear daily and weekly variations, and these temporal patterns are nearly consistent with the periodicity of traffic flow in Shanghai. Compared with PM2.5, CO concentrations are more strongly affected by traffic variation. The proposed model outperforms the baseline models in accuracy, showing a higher linear consistency in PM2.5 prediction and lower errors in CO prediction. This study could help environmental researchers further improve urban air quality forecasting technologies, and serve as a tool to support policymakers in implementing related traffic management and emission control policies.


Introduction
Traffic emissions are one of the major contributors to urban air pollution in many cities around the world [1,2], and can degrade ambient air quality over a wide range of spatial scales. Epidemiological studies indicate that long-term exposure to traffic-related air pollution could harm human health [3], leading to respiratory and cardiovascular diseases and even increased mortality [4,5]. Even short-term exposure to ambient particulate air pollution could greatly increase the risk of myocardial infarction [6]. In addition, fine particulate matter (PM2.5) can notably reduce visibility [7] and thus affect road capacity and traffic safety. To protect public health and improve traffic safety, it is therefore crucial to build an efficient and accurate air quality forecasting system that can help transportation departments broadcast warnings to travelers, and guide residents in planning outdoor activities and avoiding health-threatening trips.
There are two main types of air quality forecasting models: deterministic models based on atmospheric physics and chemical mechanisms, and empirical models based on statistics and machine learning. Deterministic models, such as the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) [8,9] and the Community Multiscale Air Quality model (CMAQ) [10], can help explain the formation, development, and transport of air pollutants. However, their performance can be affected by uncertainties in emissions and chemical reactions, which in turn limits their ability to resolve spatial and temporal variations [11]. The input parameters of deterministic models (e.g., emission inventories) commonly contain very limited information on pollution sources, lack spatial and temporal dependencies for some air pollutants [12], and are difficult to update in a timely manner because of their high cost. These disadvantages can notably degrade the prediction performance of deterministic models.
Empirical models, based on the theories of statistics and machine learning, can provide more accurate predictions with relatively few parameters. The autoregressive integrated moving average (ARIMA) model has been widely used for time series analysis. A study of PM2.5 forecasting in Beijing demonstrates that the ARIMA model performs well for 1-h-ahead prediction, with a low error [13]; however, its predictions become worse as the time lag increases. Support vector regression (SVR) has also been employed for air quality prediction, and evolutionary algorithms, an important tool for parameter optimization [14,15], have been used to improve empirical models. For example, when the quantum-behaved particle swarm optimization (QPSO) algorithm is used to determine the input parameters of the SVR model, the hybrid QPSO-SVR model can achieve better performance in both computational time and accuracy [16]. Moreover, an SVR model that considers space-time factors outperforms the traditional SVR model in accuracy [11].
Recently, owing to its flexible model structure and powerful learning ability, deep learning has shown potential for enhancing the methodology of resilience assessment and has demonstrated strong performance in air pollution forecasting [17][18][19][20]. Furthermore, with improvements in training algorithms and computing power, it has become possible to train artificial neural networks (ANNs) with denser, multi-layered structures, known as deep neural networks (DNNs). DNNs can be applied to a wide variety of tasks and are especially suited to modeling complex nonlinear relationships; examples include the stacked autoencoder (SAE) [21], recurrent neural network (RNN) [22], gated recurrent unit (GRU) [23], and long short-term memory (LSTM) neural network [24]. A study on predicting hourly PM2.5 concentrations in Beijing shows that the prediction accuracy of LSTM is significantly better than that of ARMA and SVR models [25]. A long short-term memory-fully connected (LSTM-FC) neural network also performs well in predicting PM2.5 concentrations in Beijing [26]. An attention-based air quality predictor (AAQP), a variant of the Seq2Seq model, outperforms the traditional LSTM and the basic Seq2Seq model in prediction accuracy in Beijing [27]. However, the LSTM and Seq2Seq models perform worse as the time lag becomes longer. Although the deep learning models mentioned above can effectively capture spatiotemporal dependencies in air quality prediction, the characteristics of the time series itself (such as periodicity) are still insufficiently incorporated into these models. In fact, roadside air quality is directly affected by traffic-related pollutants, and its temporal variation closely follows the diurnal pattern of traffic flow.
Specifically, the diurnal variation of traffic-related particles shows two peaks, which are strongly associated with intensive emissions during the two traffic rush hours [28]. If the periodicity consistent with traffic flow patterns is fully considered in modeling, deep learning models could become more suitable for roadside air quality forecasting, with higher accuracy. Therefore, it is necessary to develop an advanced deep learning model that includes the periodic features of the time series.
To address this issue, we propose a novel sequence-to-sequence (Seq2Seq) model with weekly periodicity to predict traffic-related PM2.5 and carbon monoxide (CO) concentrations. In this study, we analyze the hourly variations of PM2.5 and CO concentrations measured at two roadside air quality monitoring stations in Shanghai. Then, the Seq2Seq model with weekly periodicity is developed by incorporating the diurnal variation patterns and taking full account of the periodic characteristics of the pollutants. Finally, the 24-h prediction results are compared with those of the baseline models. Among all air pollutants, PM2.5 and CO are selected in this paper for two main reasons. First, PM2.5 is one of the most important air pollutants in megacities [29], and CO is a stable gaseous pollutant that mainly results from traffic emissions [30]. Second, PM2.5 and CO represent two different kinds of air pollutants, namely particles and gaseous pollutants. Therefore, characterizing these two pollutants can reveal different temporal patterns of traffic emissions and help evaluate how the proposed model performs for different pollutants [31].

Study Area and Data Description
Shanghai, a megacity located in the Yangtze River Delta, is one of the most economically developed regions in China. Although industrial emissions are still the major local contributor to total PM2.5 concentrations in Shanghai, vehicle emissions contribute more to particulate pollution in urban areas [32]. It is therefore reasonable to select Shanghai as the study area for analyzing the temporal variations of air pollutants, especially local traffic-related air pollution. Roadside locations are strongly affected by traffic emissions and are widely regarded as serious pollution hotspots [33]. In this paper, we choose two roadside air quality monitoring stations, Xuhui and Jing'an, as case studies. Both stations are affiliated with the Shanghai Environmental Monitoring Center (SEMC), are adjacent to urban arterial roads, and were designed mainly to analyze the impacts of road traffic emissions on ambient air quality. Their locations are shown as red markers in Figure 1. Hourly pollutant data from eight additional monitoring stations in urban Shanghai, shown as purple markers in Figure 1, are also used to verify the external validity of the proposed air quality forecasting model.
This study uses hourly data on six air pollutants from 1 March 2015 to 28 February 2017, namely particulate matter with diameters less than 2.5 µm (PM2.5) and less than 10 µm (PM10), carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), and sulfur dioxide (SO2). The hourly mass concentrations of air pollutants are measured by automated monitoring instruments at the monitoring stations, and daily quality control and quality assurance (QC/QA) are implemented by professional SEMC staff [34]. Among the six pollutants, PM2.5 (the major pollutant in Shanghai) and CO (closely related to traffic emissions) are chosen as the main research objects; the other pollutants serve as auxiliary inputs to the forecasting model. The meteorological data are generated by the Weather Research and Forecasting (WRF) model [8] for the Yangtze River Delta region at a grid resolution of 5 km × 5 km. For each monitoring station, the data from the closest grid cell are selected to geographically match the meteorological and pollutant data and build the dataset. The valid meteorological variables (e.g., temperature, humidity, air pressure, wind, and precipitation) are also used as inputs to further strengthen the prediction performance of the proposed model.
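The station-to-grid matching step can be sketched as follows. This is a minimal illustration, assuming the WRF output has already been loaded into NumPy arrays: a 2-D grid of cell-centre latitudes and longitudes, and a (time, row, column, variable) array of hourly meteorology. The function names, the use of great-circle distance to define "closest", and the coordinates in the example are assumptions for illustration, not details given in the text.

```python
import numpy as np

def nearest_grid_index(grid_lat, grid_lon, station_lat, station_lon):
    """Return (row, col) of the grid-cell centre closest to a station.

    grid_lat, grid_lon: 2-D arrays of cell-centre coordinates in degrees.
    Closeness is measured by great-circle (haversine) angular distance.
    """
    lat1, lon1 = np.radians(station_lat), np.radians(station_lon)
    lat2, lon2 = np.radians(grid_lat), np.radians(grid_lon)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    angle = 2 * np.arcsin(np.sqrt(a))  # proportional to surface distance
    row, col = np.unravel_index(np.argmin(angle), angle.shape)
    return int(row), int(col)

def build_station_dataset(met, grid_lat, grid_lon, station_lat, station_lon, pollutants):
    """Append nearest-cell meteorology to a station's hourly pollutant data.

    met:        array (n_hours, n_rows, n_cols, n_met_vars) of hourly WRF output
    pollutants: array (n_hours, n_pollutants) for the same hours
    Returns an array of shape (n_hours, n_pollutants + n_met_vars).
    """
    row, col = nearest_grid_index(grid_lat, grid_lon, station_lat, station_lon)
    return np.concatenate([pollutants, met[:, row, col, :]], axis=1)

# Usage on a synthetic 3 x 3 grid with 0.05-degree spacing (illustrative values only)
lats = 31.0 + 0.05 * np.arange(3)
lons = 121.0 + 0.05 * np.arange(3)
grid_lon, grid_lat = np.meshgrid(lons, lats)
met = np.random.default_rng(0).normal(size=(48, 3, 3, 5))   # 48 h, 5 variables
pollutants = np.random.default_rng(1).normal(size=(48, 6))  # 6 pollutants
data = build_station_dataset(met, grid_lat, grid_lon, 31.06, 121.04, pollutants)
print(data.shape)  # (48, 11)
```

Matching on distance rather than on raw latitude/longitude differences keeps the choice sensible if the grid is not aligned with lines of latitude and longitude.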

Autocorrelation Analysis
Autocorrelation, also known as serial correlation, measures the correlation between a time series and a lagged copy of itself. When a time series contains periodic components, the autocorrelation function reaches local maxima at lags equal to multiples of the period, which makes it an indicator of periodicity. The autocorrelation function at lag $k$ is defined as follows [35]:
$$ r_k = \frac{\sum_{t=1}^{N-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{N} (y_t - \bar{y})^2} \quad (1) $$
where $N$ is the length of the time series, $\bar{y}$ is its mean, $k$ is the time lag, and $y_t$ represents the $t$-th value of the time series.
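Equation (1) translates directly into a few lines of NumPy. The sketch below is illustrative; the function names are hypothetical, and the usage example uses a synthetic series with a 24-h cycle rather than monitoring data, to show that the strongest autocorrelation near one day falls at a lag of 24 h.

```python
import numpy as np

def autocorrelation(y, k):
    """Sample autocorrelation of series y at lag k, as in Eq. (1)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(y)")
    dev = y - y.mean()                       # deviations from the series mean
    return float(np.sum(dev[: n - k] * dev[k:]) / np.sum(dev ** 2))

def acf(y, max_lag):
    """Autocorrelation function for lags 0..max_lag."""
    return np.array([autocorrelation(y, k) for k in range(max_lag + 1)])

# Synthetic hourly series with a 24-h cycle over four weeks (illustrative only)
hours = np.arange(24 * 28)
y = 10 + 5 * np.sin(2 * np.pi * hours / 24)
r = acf(y, 48)
best_lag = 12 + int(np.argmax(r[12:37]))     # strongest lag between 12 h and 36 h
print(best_lag)  # 24
```

Applied to hourly PM2.5 or CO data, a clear local maximum at lag 24 would indicate daily periodicity; the same function applied to daily means at lag 7 checks weekly periodicity.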

Long Short-Term Memory (LSTM) Networks
Air pollution data from ground-based monitoring stations are generally recorded as time series. Recurrent neural networks (RNNs) are therefore better suited to air quality forecasting than the feed-forward neural networks (FNNs) applied in previous studies [36]. Vanilla RNNs suffer from vanishing and exploding gradients when trained on long sequences, a problem that long short-term memory (LSTM) networks alleviate through their gating mechanism [37]. Figure 2 shows the structure of LSTM networks. Each LSTM cell contains an input gate ($i_t$), a forget gate ($f_t$), and an output gate ($o_t$), which control how much new information is added to the cell state, how much of the previous cell state is preserved, and how much of the cell state is output, respectively. The hidden state is updated according to Equations (2)-(7):
$$ i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \quad (2) $$
$$ f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \quad (3) $$
$$ o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \quad (4) $$
$$ \tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) \quad (5) $$
$$ c_t = f_t * c_{t-1} + i_t * \tilde{c}_t \quad (6) $$
$$ h_t = o_t * \tanh(c_t) \quad (7) $$
where $x_t$ is the input to the cell at time $t$, $c_t$ is the cell state, $\tilde{c}_t$ is the candidate cell state, $h_t$ is the hidden state, $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input, and $W$ and $b$ are the weight matrices and bias vectors of each gate. $\sigma$ refers to the sigmoid function, and $\tanh$ denotes the hyperbolic tangent function,
$$ \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, $$
while the operator $*$ in Equations (6) and (7) denotes element-wise multiplication.
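To make Equations (2)-(7) concrete, the following NumPy sketch performs one LSTM step and runs it over a 24-step input. It assumes the concatenated-input form with one weight matrix and bias vector per gate, and the standard ordering (input, forget, and output gates, candidate state, cell state, hidden state) in the equation-number comments; the weights are random placeholders, not trained parameters. In practice, a framework layer such as the Keras `LSTM` layer would be used instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W[g] has shape (hidden, hidden + n_in) and acts on
    [h_{t-1}, x_t]; b[g] has shape (hidden,), for gates g in 'i', 'f', 'o', 'c'."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate, Eq. (2)
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate, Eq. (3)
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate, Eq. (4)
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate cell state, Eq. (5)
    c_t = f_t * c_prev + i_t * c_tilde         # Eq. (6): element-wise products
    h_t = o_t * np.tanh(c_t)                   # Eq. (7)
    return h_t, c_t

# Run over a 24-h input sequence with random placeholder weights
rng = np.random.default_rng(0)
hidden, n_in = 8, 5
W = {g: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for g in "ifoc"}
b = {g: np.zeros(hidden) for g in "ifoc"}
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(24, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # (8,)
```

Because the forget gate multiplies the previous cell state rather than passing it through a squashing nonlinearity, information can persist across many timesteps, which is the property that mitigates vanishing gradients.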

Sequence to Sequence (Seq2Seq) Model
The sequence-to-sequence (Seq2Seq) model was developed by Sutskever et al. for machine translation [38] and subsequently refined by Cho et al. [39]. The Seq2Seq model is an encoder-decoder structure with sequential inputs and outputs, in which recurrent neural networks (e.g., vanilla RNN, LSTM, or GRU) are generally used to build the encoder and decoder. The architecture of the Seq2Seq model is shown in Figure 3. Specifically, the encoder RNN reads the input sequence and encodes it into a context vector, from which the decoder RNN generates the output sequence [38]. Here, $h_t \in \mathbb{R}^m$ denotes the encoder hidden state at past timestep $t$, where $m$ refers to the size of the context vector. Similarly, $s_t \in \mathbb{R}^n$ denotes the decoder hidden state at future timestep $t$, where $n$ is its dimension. The encoder hidden states and the context vector $c$, which is computed from all $T$ encoder hidden states, are given by:
$$ h_t = f(x_t, h_{t-1}), \qquad c = g(h_1, \ldots, h_T) $$
where $f$ and $g$ both refer to nonlinear functions. The decoder RNN is trained to predict the output $y_t$ at each future timestep $t$ from the context vector $c$ and the past outputs $\{y_1, \ldots, y_{t-1}\}$. In RNN units, the earlier outputs $\{y_1, \ldots, y_{t-2}\}$ are stored and updated in the decoder hidden state $s_{t-1}$, so the output at time $t$ is given by:
$$ s_t = q(y_{t-1}, s_{t-1}, c), \qquad y_t = p(y_{t-1}, s_{t-1}, c) $$
where $p$ and $q$ both refer to nonlinear functions.
In this study, LSTM networks are chosen to build both the encoder and the decoder, and the framework of the proposed model is shown in Figure 4. The input of the LSTM encoder is a sequence with 24 timesteps, namely a 24-h time series. This input can be viewed as the integration of three sequences: the air quality data for day T-1, the meteorological data for day T-1, and the PM2.5 or CO data for day T-7.
The air quality and meteorological data for day T-1 are chosen with reference to the input parameters of data-driven air quality forecasting models in previous studies [11,26], and the additional PM2.5 or CO data for day T-7 are used to reflect weekly periodicity in our model. An LSTM network is also used as the decoder to generate the output from the context vector. The output is a 24-h sequence of PM2.5 or CO concentrations, the same length as the input sequence.
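The input layout described above can be assembled from aligned hourly arrays as sketched below. This is a minimal illustration under stated assumptions: the arrays start at 00:00 and cover whole days, the model forecasts the 24 hourly values of day T, and the three sequences are stacked as feature columns at each of the 24 timesteps. The function names and feature ordering are hypothetical.

```python
import numpy as np

HOURS_PER_DAY = 24

def day_slice(a, d):
    """Rows of an hourly array belonging to day index d."""
    return a[d * HOURS_PER_DAY:(d + 1) * HOURS_PER_DAY]

def build_samples(aq, met, target):
    """Pair day-T targets with inputs from days T-1 and T-7.

    aq:     (n_hours, n_aq) hourly air quality data
    met:    (n_hours, n_met) hourly meteorological data
    target: (n_hours,) hourly PM2.5 or CO series to forecast
    Returns X of shape (n_samples, 24, n_aq + n_met + 1) and Y of shape (n_samples, 24).
    """
    n_days = len(target) // HOURS_PER_DAY
    X, Y = [], []
    for d in range(7, n_days):                     # day T needs day T-7 to exist
        X.append(np.column_stack([
            day_slice(aq, d - 1),                  # air quality, day T-1
            day_slice(met, d - 1),                 # meteorology, day T-1
            day_slice(target, d - 7),              # same pollutant, day T-7
        ]))
        Y.append(day_slice(target, d))             # 24-h output, day T
    return np.stack(X), np.stack(Y)

# Illustrative usage with 10 days of synthetic data (6 pollutants, 5 met variables)
n_hours = 10 * HOURS_PER_DAY
rng = np.random.default_rng(0)
aq = rng.normal(size=(n_hours, 6))
met = rng.normal(size=(n_hours, 5))
target = aq[:, 0]                                  # e.g. PM2.5 as the first column
X, Y = build_samples(aq, met, target)
print(X.shape, Y.shape)  # (3, 24, 12) (3, 24)
```

The resulting `X` has the (samples, timesteps, features) layout that Keras recurrent layers expect, so it could be fed directly to an LSTM encoder.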

Diurnal Variation
Before air quality forecasting, we first analyze the temporal characteristics of PM2.5 and CO concentrations. As illustrated in Figure 5, the diurnal variation of hourly-average PM2.5 concentrations exhibits two distinct peaks and valleys, with slight differences between the two monitoring stations. A similar pattern is found in the daily variation of CO concentrations, and the diurnal trend revealed in this study is consistent with that reported for Shanghai in related research [40]. For PM2.5, the two peaks appear at 8-9 a.m. and 7-8 p.m.; for CO, they occur at 8 a.m. and 5-6 p.m. In terms of the number of trips, Shanghai has two traffic rush hours, at 8 a.m. and 6 p.m. [41]. Thus, the CO peaks observed in Shanghai are close to the peaks in traffic volume, whereas the PM2.5 peaks are delayed by about one hour. This finding indicates that CO mainly results from traffic emissions and therefore follows diurnal patterns similar to those of traffic volume. One possible explanation for the delay in the PM2.5 peaks is that vehicle-emitted primary particles are mainly ultrafine particles in the Aitken nucleus mode [42], which take time to grow into the accumulation mode through coagulation and condensation.
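The hour-of-day averaging behind Figure 5, and the identification of its peaks, can be sketched as follows. The peak rule (an hour whose mean exceeds both neighbouring hours, treating midnight as adjacent to 23:00) is an assumed heuristic, not the procedure used in the study; noisy profiles may need smoothing first. The example uses a synthetic two-peak profile, not observed data.

```python
import numpy as np

def diurnal_profile(values, hours):
    """Mean concentration for each hour of day 0-23, ignoring missing (NaN) values."""
    values = np.asarray(values, dtype=float)
    hours = np.asarray(hours)
    return np.array([np.nanmean(values[hours == h]) for h in range(24)])

def local_peaks(profile):
    """Hours whose mean is higher than both neighbours (circular in time)."""
    prev_hour = np.roll(profile, 1)    # prev_hour[h] == profile[h - 1]
    next_hour = np.roll(profile, -1)   # next_hour[h] == profile[h + 1]
    return np.flatnonzero((profile > prev_hour) & (profile > next_hour))

# Synthetic 30-day hourly series with morning and evening peaks (illustrative only)
hours = np.tile(np.arange(24), 30)
shape = np.exp(-((hours - 8) ** 2) / 4) + np.exp(-((hours - 18) ** 2) / 4)
values = 40 + 20 * shape
values[5] = np.nan                                 # a missing observation
profile = diurnal_profile(values, hours)
print(local_peaks(profile).tolist())               # [8, 18]
```

Comparing the peak hours found this way for CO and PM2.5 with the rush hours is one simple way to quantify the roughly one-hour PM2.5 delay described above.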
Next, an autocorrelation analysis based on Equation (1) is performed to verify whether the PM2.5 and CO concentrations exhibit daily periodicity; the results are shown in Figure 6. The CO concentrations clearly exhibit a 24-h periodicity at both monitoring stations, whereas the PM2.5 concentrations show hardly any daily periodicity.
As shown in Figure 6, the autocorrelation coefficient of CO is higher at Xuhui station than at Jing'an station, indicating a more significant daily periodicity of CO at Xuhui station. Because the daily periodicity of CO closely resembles that of traffic flow, this suggests that CO at Xuhui station is more strongly driven by traffic. This can be explained by the location of Xuhui station on the separation zone between the two-way lanes, closer to the traffic emission sources in both directions, whereas Jing'an station is located at the roadside, relatively far from the on-road traffic flow. Therefore, the air quality data from Xuhui station are more affected by on-road traffic flow patterns than those from Jing'an station.
The lack of daily periodicity in PM2.5 mainly reflects the complexity of PM2.5 formation, which masks any periodic signal. In addition, the results suggest that there is no clear linear relationship between the PM2.5 time series and its lagged version.

Weekly Variation
The average PM2.5 concentrations exhibit a significant weekly variation, with a fluctuation of about 10 µg/m³ between the peak and the valley. The weekly variation of PM2.5 shows two peaks and one valley, which appear on Friday, Sunday, and Tuesday, respectively. The CO concentrations show a similar weekly trend. Unlike some other cities, such as Beijing, where PM2.5 concentrations are higher on weekdays than on weekends, Shanghai shows hardly any "weekend effect" in PM2.5 concentrations, consistent with previous studies [43].
We also perform the autocorrelation analysis on the daily average concentrations of PM2.5 and CO to investigate their periodic variations; the results are shown in Figure 7. The daily average concentrations of both PM2.5 and CO exhibit a significant weekly periodicity. In addition, Figure 8 suggests a linear relationship between each time series and its version lagged by seven days, which implies that this relationship could be incorporated into the proposed model to improve forecasting accuracy.
As shown in Figures 5 and 7, the daily and weekly variations of pollutant concentrations differ between the two monitoring stations. The CO concentrations at Xuhui station are significantly higher than those at Jing'an station, especially during traffic rush hours. The PM2.5 concentrations are nearly equivalent at the two stations, with slightly higher values at Jing'an station. The higher CO concentrations at Xuhui station could be explained by its location closer to the traffic emission sources and beneath an elevated road, which hinders the upward dispersion of traffic-emitted gaseous pollutants at ground level. The negligible difference in PM2.5 concentrations between the two stations indicates that PM2.5 is affected by both local traffic emission sources and regional air pollution.
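The weekly analysis (daily averaging followed by a seven-day lag comparison, as in Figures 7 and 8) can be sketched as below. The Pearson correlation between a series and its copy shifted by seven days is used here as a simple summary of the linear relationship in Figure 8; this choice, the function names, and the synthetic data are assumptions for illustration. Because each overlapping segment uses its own mean, this value differs slightly from the autocorrelation of Equation (1).

```python
import numpy as np

def daily_means(hourly, hours_per_day=24):
    """Average each complete day of an hourly series (a trailing partial day is dropped)."""
    hourly = np.asarray(hourly, dtype=float)
    n_days = len(hourly) // hours_per_day
    days = hourly[: n_days * hours_per_day].reshape(n_days, hours_per_day)
    return np.nanmean(days, axis=1)

def lag_correlation(series, lag):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    s = np.asarray(series, dtype=float)
    return float(np.corrcoef(s[:-lag], s[lag:])[0, 1])

# Synthetic eight-week hourly series: a weekly cycle plus a diurnal cycle (illustrative only)
days = np.arange(56)
daily_level = 50 + 10 * np.sin(2 * np.pi * days / 7)
hourly = np.repeat(daily_level, 24) + 5 * np.sin(2 * np.pi * np.arange(56 * 24) / 24)
dm = daily_means(hourly)
print(round(lag_correlation(dm, 7), 3))   # 1.0
```

Daily averaging removes the diurnal cycle, so any remaining correlation at a seven-day lag reflects weekly structure; on real data the value would be well below the idealized 1.0 shown here.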

Forecasting Model Results
To evaluate the prediction performance of the Seq2Seq model with weekly periodicity, several machine learning models are selected for comparison: the Seq2Seq model without weekly periodicity, Bidirectional LSTM [44], and LSTM [37]. All the models are implemented in Python with Keras, pandas, NumPy, and scikit-learn on a Linux system. We select three statistical indices to evaluate the prediction performance of the models: root mean square error (RMSE), normalized mean square error (NMSE), and the Pearson correlation coefficient (r). In these indices, P and O refer to the predicted and observed values, respectively, and $\bar{P}$ and $\bar{O}$ denote the means of the predicted and observed values.
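For reference, the conventional forms of the three indices in this notation are sketched below, with $n$ the number of paired samples and $P_i$, $O_i$ the $i$-th predicted and observed values. These are standard definitions rather than a transcription of the authors' equations; in particular, NMSE has several variants in the literature, and the normalization by $\bar{P}\bar{O}$ used here (common in air quality model evaluation) is an assumption.

```latex
\bar{P} = \frac{1}{n}\sum_{i=1}^{n} P_i, \qquad \bar{O} = \frac{1}{n}\sum_{i=1}^{n} O_i

\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2}

\mathrm{NMSE} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2}{\bar{P}\,\bar{O}}

r = \frac{\sum_{i=1}^{n}\left(P_i - \bar{P}\right)\left(O_i - \bar{O}\right)}{\sqrt{\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}}
```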
The model results in Table 1 demonstrate that the Seq2Seq model with weekly periodicity achieves better prediction performance, with lower errors (RMSE and NMSE) and a higher correlation coefficient (r). In addition, all the machine learning models present lower errors in CO forecasting than in PM2.5 forecasting; in terms of the correlation coefficient, however, they perform better in PM2.5 forecasting than in CO forecasting. The prediction accuracy for both PM2.5 and CO is higher at the Xuhui station than at the Jing'an station. In terms of NMSE, the Xuhui station shows a larger improvement in CO prediction accuracy than the Jing'an station, which could be partly explained by the periodicity of the air quality data at the Xuhui station discussed in Sections 3.1 and 3.2. Specifically, Figure 6 indicates that the 24-h periodicity (autocorrelation coefficient) at the Xuhui station is higher than that at the Jing'an station, and Figure 8 shows that the seven-day autocorrelation coefficient of CO at the Xuhui station is significantly higher than that at the Jing'an station. The air quality data of the Xuhui station therefore exhibit more pronounced periodicity, mainly because the station is located in the separation zone between two-way roads, directly under the elevated expressway, where it is more affected by traffic emissions. This stronger regularity makes the Xuhui data more predictable and thus yields higher prediction accuracy.
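The comparison in Table 1 rests on RMSE, NMSE, and r. A minimal NumPy sketch of how these could be computed for one model, station, and pollutant follows; the function name `evaluate_forecast` is hypothetical, and NMSE uses the assumed normalization by mean(pred) × mean(obs), which may differ from the authors' exact definition.

```python
import numpy as np


def evaluate_forecast(pred, obs):
    """Return (RMSE, NMSE, r) for paired predicted and observed values.

    NMSE here is the mean squared error divided by mean(pred) * mean(obs), an
    assumed variant common in air quality model evaluation.
    """
    p = np.asarray(pred, dtype=float).ravel()
    o = np.asarray(obs, dtype=float).ravel()
    if p.shape != o.shape or p.size < 2:
        raise ValueError("pred and obs need the same number of values (at least 2)")

    mse = np.mean((p - o) ** 2)
    rmse = np.sqrt(mse)
    nmse = mse / (p.mean() * o.mean())
    # Pearson r is the off-diagonal entry of the 2x2 correlation matrix.
    r = np.corrcoef(p, o)[0, 1]
    return rmse, nmse, r


# Made-up numbers for illustration only (not results from Table 1).
rmse, nmse, r = evaluate_forecast([30.0, 42.0, 55.0, 61.0], [28.0, 45.0, 50.0, 66.0])
print(f"RMSE={rmse:.2f}  NMSE={nmse:.4f}  r={r:.3f}")
```

Applying the same function to each model's test-set predictions, per station and pollutant, would give a grid of values laid out like Table 1.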
Table 2 presents the performance of the Seq2Seq model with weekly periodicity under different input parameters. Meteorological factors play an important role in improving the prediction accuracy of the proposed model, and the four other pollutants (besides PM2.5 and CO) also contribute to the improvement in model performance.
For an intuitive understanding of the model results, we compare the predicted and observed values from 1 January to 28 February 2017, as shown in Figure 9. The predicted CO concentrations present a periodicity similar to that of the observed data, and the predicted and observed temporal variations are relatively consistent. The predicted PM2.5 concentrations show a relatively smooth time-varying trend, although the observed series contains many abrupt changes. However, the proposed model's PM2.5 forecasts under heavily polluted conditions (e.g., peak values) still need further improvement.

Urban Monitoring Station Results
To verify whether the proposed model can be applied to other settings, we select eight additional urban monitoring stations in Shanghai to analyze the external validity of the proposed model. A detailed description of these stations is given in Table 3. The prediction results of the four models at the eight urban monitoring stations are shown in Figure 10. The Seq2Seq model with weekly periodicity generally performs better in PM2.5 forecasting than the other models at most sites, with lower errors and a higher correlation coefficient. In CO forecasting, however, its performance is similar to that of the Seq2Seq model without weekly periodicity.
As shown in Figure 10, the weekly periodicity markedly improves the model accuracy in PM2.5 forecasting but yields little improvement in CO forecasting. One possible explanation is that the temporal variation of the CO concentrations exhibits an obvious daily periodicity in addition to the weekly periodicity. Because the daily periodicity is already captured in the baseline models by revising the format of the input data, the additional weekly periodicity of CO provides only minor improvements.
Overall, the results at the eight state-controlled urban monitoring stations suggest that the Seq2Seq model with weekly periodicity could be applied to air quality forecasting at other monitoring stations, especially for PM2.5 forecasting.

Conclusions
To address the neglect of temporal periodicity in previous studies, a novel Seq2Seq model with weekly periodicity is developed to forecast roadside air quality. We analyze the daily and weekly variation patterns of roadside air quality data to establish the basis for the proposed model. To capture weekly periodicity, we build the Seq2Seq model with weekly periodicity by adding an auxiliary air quality sequence from one week earlier to the model input. Six kinds of air pollutants from roadside air quality monitoring stations in Shanghai are employed in the model, together with six meteorological parameters from the Weather Research and Forecasting (WRF) model. To further verify the external validity of the model, eight additional urban air quality monitoring stations are also used in this study.
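How the auxiliary week-ago sequence could enter the training data can be illustrated with a sample-building sketch. For each forecast origin, the encoder receives the most recent window of hourly data, and the auxiliary sequence is taken from the hours being forecast, shifted back exactly one week (168 h). The 24-h input window, the 24-h horizon, the 12-column feature layout (six pollutants plus six meteorological parameters), the alignment of the week-ago window with the forecast hours, and the function name `make_weekly_samples` are all illustrative assumptions, not the paper's configuration.

```python
import numpy as np

WEEK = 168  # hours in one week


def make_weekly_samples(data, target_cols, n_in=24, n_out=24):
    """Build (encoder input, weekly auxiliary input, target) arrays.

    data: array of shape (T, F), one row per hour and F feature columns.
    target_cols: indices of the columns to forecast (e.g. PM2.5 and CO).
    Assumption: the auxiliary sequence covers the forecast hours shifted back
    by exactly one week.
    """
    data = np.asarray(data, dtype=float)
    # t is the first forecast hour; the week-ago window starts at t - WEEK,
    # so t must be at least WEEK (and at least n_in for the encoder window).
    first = max(n_in, WEEK)
    last = len(data) - n_out
    if first > last:
        raise ValueError("series too short for the chosen windows")

    enc, aux, tgt = [], [], []
    for t in range(first, last + 1):
        enc.append(data[t - n_in:t])                              # last n_in hours
        aux.append(data[t - WEEK:t - WEEK + n_out, target_cols])  # same hours last week
        tgt.append(data[t:t + n_out, target_cols])                # hours to predict
    return np.stack(enc), np.stack(aux), np.stack(tgt)


# Usage with 30 days of synthetic hourly data (12 hypothetical features).
rng = np.random.default_rng(0)
hourly = rng.random((24 * 30, 12))
X_enc, X_week, y = make_weekly_samples(hourly, target_cols=[0, 1])
print(X_enc.shape, X_week.shape, y.shape)
```

In a Seq2Seq setup, `X_enc` would feed the encoder, while `X_week` might be concatenated with the decoder inputs or encoded by a second encoder; the sketch deliberately leaves that fusion step open.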
There are several general findings in this research: (1) The daily trends of CO and PM2.5 are consistent with the daily traffic volume trend in Shanghai, and PM2.5 shows a strong hysteresis (a lag of roughly one hour). Morning and evening traffic rush hours are also periods of high pollution. This suggests that the daily periodicity caused by traffic should not be overlooked in modeling and forecasting roadside air quality. Therefore, the weekly periodicity should be fully considered in air quality forecasting. (4) The proposed Seq2Seq model with weekly periodicity is also suitable for the eight urban monitoring stations in Shanghai. Compared with CO, the weekly periodicity has a more pronounced impact on PM2.5 forecasting; for CO forecasting, the weekly periodicity-based model is not necessarily appropriate for all monitoring stations.
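The roughly one-hour lag of PM2.5 behind traffic can be quantified as the lag that maximizes the correlation between hourly traffic volume and the concentration series. A minimal pandas sketch, assuming two aligned hourly series; the function name `best_lag` and the synthetic data are hypothetical and do not reproduce the study's analysis.

```python
import numpy as np
import pandas as pd


def best_lag(driver: pd.Series, response: pd.Series, max_lag: int = 6) -> int:
    """Return the lag (in time steps) at which response best correlates with driver.

    A lag k means response at time t is compared with driver at time t - k.
    """
    # Series.corr drops the NaN pairs introduced by shifting.
    corrs = {k: response.corr(driver.shift(k)) for k in range(0, max_lag + 1)}
    return max(corrs, key=corrs.get)


# Synthetic check (not study data): a response that follows the driver one hour later.
idx = pd.date_range("2017-01-01", periods=24 * 14, freq="h")
rng = np.random.default_rng(1)
traffic = pd.Series(rng.random(len(idx)), index=idx)
pm25 = traffic.shift(1) + rng.normal(0, 0.05, len(idx))
print(best_lag(traffic, pm25))
```

Real traffic and concentration series are strongly autocorrelated, so neighboring lags will also score highly; the location of the maximum, rather than its absolute value, is what indicates the delay.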
The major contributions of this research are as follows: (1) the temporal patterns of traffic and air quality are systematically evaluated and summarized based on two years of air quality monitoring data in a megacity; (2) weekly periodicity is incorporated into the deep learning-based air quality forecasting model, which substantially improves prediction accuracy; (3) the proposed Seq2Seq model with weekly periodicity is also applicable to urban air quality prediction (not only to traffic-related roadside air quality), and can thus be used by public authorities to make timely management adjustments based on air quality predictions to protect public health.
In terms of implications, the proposed Seq2Seq model shows that weekly periodicity substantially improves the predictive performance of deep learning models for air quality forecasting. One limitation of this research is that only the temporal patterns of the monitoring station data are considered; the spatial dependencies between stations are not integrated into the model. Future studies are expected to incorporate more spatiotemporal features and to consider the topological correlations among air quality monitoring stations when developing deep learning models.