Weather-Aware Long-Range Traffic Forecast Using Multi-Module Deep Neural Network

Abstract: This study proposes a novel multi-module deep neural network framework that aims to improve intelligent long-term traffic forecasting. Building on our previous system, the internal architecture of the new system adds deep learning modules that enable data separation during computation. As a result, prediction becomes more accurate in many sections of the road network and gives dependable results even under possible changes in weather conditions during driving. The performance of the framework is then evaluated for different cases, which include all plausible cases of driving, i.e., regular days, holidays, and days involving severe weather conditions. Compared with other traffic prediction systems that employ convolutional neural networks, the k-nearest neighbor algorithm, and a time series model, the proposed system achieves better performance and helps drivers schedule their trips well in advance.


Introduction
The intelligent transport system (ITS) is currently being developed for the road transport network, incorporating the latest information and communication technology. Many ITS applications, such as traffic control, emergency detection, and parking guidance, have extensively incorporated deep learning and have shown significant performance improvements [1][2][3]. This study reports a novel design of an intelligent traffic forecasting system using deep learning, applied alongside ITS for efficient driving and urban planning.
Many studies have used deep learning algorithms for traffic prediction. Some research studies have considered exploiting neural networks for better analysis of the spatio-temporal correlation of traffic data. The authors of [4] propose a short-term traffic forecast model using a two-dimensional long short-term memory (LSTM) network. In [5], the authors use a convolutional neural network (CNN) by converting traffic data from multiple road sections into 2D images. In [6,7], the authors adopt Google DeepMind's WaveNet network and a hierarchical temporal memory model, respectively.
Research has been conducted for predicting traffic behavior in unusual traffic conditions. In [8], the authors propose a traffic prediction method based on multi-task learning. They classify traffic data based on traffic situations and construct situation-aware prediction models. The authors of [9] suggest a pattern sensitive network with adversarial training. They train a joint distribution model to improve prediction accuracy under unusual traffic conditions. In [10], the authors adopt a regularized loss function to balance the predictions of light and heavy traffic conditions. Some studies have addressed more interesting factors affecting traffic behavior. In [11], the authors introduce a rainfall factor into the deep belief network (DBN) and LSTM to predict urban traffic flow. The authors of [12] combine the flow-based and weather-based traffic model with a DBN, and the authors of [13] propose a long-term speed prediction method based on LSTM. They analyze the correlation between the historical parameters of drivers, vehicles, roads, and traffic.
Research on other data-driven prediction models is still ongoing. In [14], the authors introduce an iterative k-nearest neighbor predictor for long-term traffic forecasting. The authors of [15,16] propose k-nearest neighbor traffic prediction models that adopt advanced similarity measures and dynamic parameter tuning, respectively, and the authors of [17] propose a short-term traffic prediction method based on an adaptive multi-kernel support vector machine.
Other studies continue to improve time-series-based prediction [18]. The authors of [19] proposed a traffic prediction scheme using both historical and real-time data. They identified the traffic situation with a Bayesian network and fused autoregressive integrated moving average and periodical moving average models with neural networks. The authors of [20] proposed a long-term traffic forecasting model based on functional nonparametric regression. They adopted autocorrelation analysis and functional principal component analysis for better accuracy.
This study proposes a new traffic forecasting model using a multi-module deep neural network (DNN) framework, called the time and weather aware DNN (TW-DNN). This system is developed based on our previous design [21], to generate more reliable and complete long-range traffic data. The framework contains two blocks of neural network modules. The first block extracts various features from the input, and the second block fuses the features to obtain a traffic prediction output. Large amounts of data such as weather forecast data, road network information, and traffic speed and flow data are collected to train the system. The experiments are conducted focusing on multiple traffic situations, including rush hours, holidays, and heavy rains, and the performance is compared with that of different but equally capable systems.
The remainder of this paper is organized as follows. Section 2 introduces the fundamentals and characteristics of traffic flows. Section 3 presents the proposed intelligent traffic forecasting using multi-module DNNs. Section 4 verifies the proposed algorithm through a performance evaluation based on extensive measurement data. Finally, Section 5 concludes this study and presents future research directions.

Fundamentals and Characteristics of Traffic Flow
Because road traffic is determined by the flow of cars, the prediction system should reflect this information properly. Let f(x, t) be the traffic flow at the t-th time interval and the x-th location. The traffic flow consists of three components: speed v(x, t), flow q(x, t), and density d(x, t). They satisfy the following relationship:
$$q(x, t) = v(x, t)\, d(x, t)$$
Taking the changes of time and space into account, the traffic flow should meet the following conservation equation, where g(x, t) denotes the entering flow of the x-th location at t:
$$\frac{\partial d(x, t)}{\partial t} + \frac{\partial q(x, t)}{\partial x} = g(x, t)$$
Although these quantities can be computed and used in real life, traffic predictions and simulations may not produce satisfactory predictions for the following reasons.

• On real roads, traffic flow can also be affected by non-traffic factors. For example, the traffic flow during heavy rain could exhibit unusual speed and density compared to a regular traffic flow. In [12], the authors verified the correlations between traffic flow and weather factors, including temperature, wind speed, and rainfall. Furthermore, the authors of [22] found that daily weather conditions also affect the daily traffic flow.
• Congested traffic behaves differently from regular traffic. The authors of [8,10] proposed traffic prediction methods to cope with heavy traffic situations. The day of the week and the time of day help forecast congestion because it usually occurs during rush hours and on holidays.
• The actual time and location intervals typically do not match those of the numerical model. The authors of [23] presented the constraints of time and location intervals to verify the behavior of the model. However, the traffic data of some highways do not satisfy those constraints because of an insufficient number of installed sensors.

New Traffic Forecasting Model Using Multi-Module Deep Neural Networks
The TW-DNN, which precisely forecasts the speeds of highway networks, is introduced in this section. Before describing the system, we first introduce the major variables that will be used, as well as their formats. Traffic data consist of the speeds and single-lane flows of each predefined road section (i.e., link), together with time. The set of links constructs the highway graph, where edges and vertices correspond to links and intersections, respectively. The specific information of a link, such as the number of lanes, length, highway route number, direction, and link shift (the distance of the current link from the origin of the highway), is stored in the corresponding edge. Because traffic varies with time, temporal information such as the date, time, and holiday information is included. In addition, weather data are included; we consider hourly weather as well as daily weather information. The hourly weather information represents the local weather of the link, whereas daily weather covers the weather of nine major cities of South Korea, measured and forecasted by the Korea Meteorological Administration [24]. Where hourly weather is not available, we estimate it by linear interpolation. The traffic forecasting system updates the weather every hour. Table 1 shows an example of inputs for one link. Date, time, and weather data are renewed at the corresponding update time. All numerical data are normalized to be within [0, 1.0]. TW-DNN forecasts all traffic values for a given time range [t_start, t_end] with a resolution of a time interval (or window) ∆t. As shown in Figure 1, there are two parts functioning together depending on the prediction periods, as explained below.

1. The initial traffic predictor predicts the initial speeds and single-lane flows for the first R time intervals, [t_start, t_start + (R − 1)∆t], using the given link information, date, time, and weather.

2. The iterative traffic predictor computes the speeds and single-lane flows for each link type (e.g., ordinary link, terminus link, and interchange link) of the following intervals [t_start + R∆t, t_end] sequentially, i.e., starting from t_start + R∆t, followed by t_start + (R + 1)∆t, ..., t_end. The predictor works in a manner similar to that of a discrete simulation model; it updates the input with the current values to proceed to the next time interval.

Both predictors contain two blocks of neural network modules. Each module handles a different class of input data, i.e., traffic data, link information, date, time, and weather. The neural network with multiple modules has the following characteristics [25].
• Because the number of input features is reduced and simplified, the neural network can be easily optimized.
• By assigning each input feature to the proper network architecture, prior knowledge of the data can be well integrated into the predictor.
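The two-stage procedure described above (an initial predictor for the first R windows, followed by an iterative predictor that feeds its own outputs back as input) can be illustrated with a minimal Python sketch. The function names `rollout`, `toy_initial`, and `toy_iterative` are hypothetical, and the toy predictors are simple stand-ins rather than the neural network modules of TW-DNN; only the control flow is meant to mirror the text.

```python
import numpy as np

def rollout(initial_predictor, iterative_predictor, n_windows, R):
    """Forecast `n_windows` time windows for one link.

    initial_predictor(R)         -> array (R, 2): speed and single-lane flow
    iterative_predictor(history) -> array (2,), given the last R windows (R, 2)
    Both callables are placeholders for trained models (an assumption).
    """
    if n_windows < R:
        raise ValueError("n_windows must be at least R")
    out = np.empty((n_windows, 2))
    # Stage 1: the initial predictor fills [t_start, t_start + (R - 1) * dt].
    out[:R] = initial_predictor(R)
    # Stage 2: each new window is computed from the R windows before it,
    # so earlier predictions become the input of later ones.
    for k in range(R, n_windows):
        out[k] = iterative_predictor(out[k - R:k])
    return out

def toy_initial(R):
    # Stand-in: constant speed of 100 km/h and flow of 20 vehicles per window.
    return np.tile([100.0, 20.0], (R, 1))

def toy_iterative(history):
    # Stand-in: average of the previous R windows.
    return history.mean(axis=0)

forecast = rollout(toy_initial, toy_iterative, n_windows=12, R=8)
```

In this sketch a single link is processed; in the system described here, the iterative step also receives date, time, weather, and link information, which are omitted for brevity.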
To find the optimal network architecture of each predictor, 1% of the training dataset is uniformly sampled as a validation set. The validation set is not used for training or testing. It is used to select the best predictor among a group of predictors with randomly designed network architectures.
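The validation procedure can be sketched as follows, under stated assumptions: the helper names `split_validation` and `select_best` are hypothetical, candidates are represented as plain callables, and the validation MAE is used as the selection criterion (the text does not state which criterion was used).

```python
import numpy as np

def split_validation(n_samples, fraction=0.01, seed=0):
    """Uniformly sample `fraction` of the indices as a held-out validation set."""
    rng = np.random.default_rng(seed)
    n_val = max(1, int(round(n_samples * fraction)))
    perm = rng.permutation(n_samples)
    val_idx = np.sort(perm[:n_val])
    train_idx = np.sort(perm[n_val:])
    return train_idx, val_idx

def select_best(candidates, val_x, val_y):
    """Return the name of the candidate with the lowest validation MAE.

    `candidates` maps a name to a callable predict(x) -> y_hat. Using MAE
    as the criterion is an assumption of this sketch.
    """
    errors = {
        name: float(np.mean(np.abs(predict(val_x) - val_y)))
        for name, predict in candidates.items()
    }
    best = min(errors, key=errors.get)
    return best, errors

train_idx, val_idx = split_validation(10_000)

# Two toy "architectures": one exact, one biased by +1.
val_x = np.linspace(0.0, 1.0, len(val_idx))
best_name, val_errors = select_best(
    {"exact": lambda x: x, "biased": lambda x: x + 1.0}, val_x, val_x
)
```

Keeping the validation indices disjoint from the training indices is what allows the validation error to serve as a fair comparison between randomly designed architectures.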

Iterative Traffic Predictor
After the initial predictor performs its operations, the iterative traffic predictor is in charge of delivering all speeds and single-lane flows for the time intervals [t_start + R∆t, t_end] in a sequential manner. When the traffic flow of one time window is determined, the predictor proceeds to the next window and accepts a new input. The input consists of date, time, weather, and link information, as well as the traffic data of the previous window. Notably, the speeds, single-lane flows, and link shifts (from link information) from previous windows are converted and rearranged as described below:

1. The multi-lane flows (total flows of all lanes), single-lane densities, and multi-lane densities are computed using the speeds and single-lane flows.

2. The traffic data and link shifts are rearranged in a two-dimensional (2D) array by spatial and temporal order. The 2D array displays the temporal and spatial correlation of traffic data well. Link shifts are used to reflect the distance between links. Figure 3 shows the rearranged 2D array of link x, whose nearby links are x − 1 and x + 1.

The 2D array is formed to reflect the road network connections; thus, link types are classified into
• Ordinary link: a middle link connected to a single entrance link and a single exit link,
• Terminus link: a link connected to local roads at the endpoint of a highway,
• Interchange link: a link having an additional connection of an on-ramp or off-ramp to a joining link.
Figure 4 shows the structure of the predictor used in the iterative traffic computation. Compared to the initial traffic predictor, it has an additional module (2D CNN) that accepts the traffic data obtained in the previous window. Figure 5 introduces the detailed structure of the 2D CNN that extracts features from the previous traffic data, considering their spatial and temporal correlation. Inspired by [18], we split the ordinary convolution having 2 × 2 filters into 2 × 1 and 1 × 2 filters, named Conv2d-space and Conv2d-time, respectively. The two filters process the spatial and temporal directions separately to prevent interference between the two kinds of information, resulting in better prediction performance. The results are combined into a single feature map at the depth concatenation layer before being fed into the next convolution. At the flatten layer, the feature map is rearranged into a 1D array, which is suitable for the remaining fully connected layers. Finally, a series of fully connected layers remaps the locally extracted features to yield global representations.
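The split convolution can be illustrated with a small NumPy sketch. Several details are assumptions made only for illustration: rows of the 2D array are taken as links (space) and columns as time windows, each branch has a single filter, zero padding keeps both outputs the same size, and a rectified linear unit follows each branch. The function names are hypothetical, and the actual layer configuration of Figure 5 is not reproduced here.

```python
import numpy as np

def conv_space(x, w):
    """2 x 1 filter: combines each cell with its neighbor along axis 0 (space)."""
    padded = np.pad(x, ((0, 1), (0, 0)))      # zero row appended at the end
    return w[0] * padded[:-1, :] + w[1] * padded[1:, :]

def conv_time(x, w):
    """1 x 2 filter: combines each cell with its neighbor along axis 1 (time)."""
    padded = np.pad(x, ((0, 0), (0, 1)))      # zero column appended at the end
    return w[0] * padded[:, :-1] + w[1] * padded[:, 1:]

def split_conv_block(x, w_space, w_time):
    """Apply both branches separately, then concatenate along the depth axis."""
    s = np.maximum(conv_space(x, w_space), 0.0)   # ReLU on the spatial branch
    t = np.maximum(conv_time(x, w_time), 0.0)     # ReLU on the temporal branch
    return np.stack([s, t], axis=-1)              # shape (H, W, 2)

# Toy 2D array: 3 links (rows) by 4 time windows (columns).
x = np.arange(12, dtype=float).reshape(3, 4)
features = split_conv_block(x, np.array([1.0, -1.0]), np.array([0.5, 0.5]))
flat = features.reshape(-1)   # flatten step before the fully connected layers
```

Because each branch sees neighbors along only one axis, the spatial and temporal differences are kept in separate channels until the depth concatenation, which is the separation the text attributes to Conv2d-space and Conv2d-time.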

Experimental Setup
We tested the developed forecasting system, TW-DNN; the initial setup is described as follows. For comparison, we employed three general traffic prediction models: CNN, k-nearest neighbor, and autoregressive integrated moving average predictors, referred to as CNN-only, k-nn, and ARIMA, respectively [5,14,26]. They were chosen as representatives of the neural network model, non-neural machine learning model, and time series model, respectively. Figure 6 shows a simplified view of TW-DNN, CNN-only, k-nn, and ARIMA. All models iteratively forecast speeds based on the previous traffic data. In the prediction, TW-DNN and CNN-only consider many nearby links, whereas k-nn and ARIMA consider one link at a time. In CNN-only, the prediction results are output after the data pass through convolution, pooling, and fully connected layers. Analogous to TW-DNN, CNN-only also rearranges the supplied input into a 2D array. Meanwhile, k-nn computes the speed by finding the stored speed values that most closely match the previous R windows. It selects the set of k speed data that minimize the prediction error D, a squared-error distance between v_candidate(x, t) and v_train(x, t), where L represents an index of the stored data. Here, v_candidate(x, t) and v_train(x, t) denote the speed values of a candidate and of the measured data in the training set, respectively. The prediction output is the average of these k speed values. ARIMA predicts the speeds as a linear combination of the speed data of the previous R windows, together with the differencing and noise terms of the predicted speed.
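The k-nn baseline described above can be sketched as follows. The exact form of the distance D is not reproduced in this text, so the sketch assumes the sum of squared differences between the last R observed speeds and each stored R-window; the function name `knn_forecast` and the toy data are hypothetical.

```python
import numpy as np

def knn_forecast(history, train_series, k):
    """Predict the next speed of one link from its last R speeds.

    history      : the R most recent speeds (length R)
    train_series : stored speeds of the same link, in time order
    Assumption: D(L) is the sum of squared differences between `history`
    and the stored window starting at index L.
    """
    history = np.asarray(history, dtype=float)
    train = np.asarray(train_series, dtype=float)
    R = history.size
    n = train.size - R            # stored windows that have a following value
    if n < k:
        raise ValueError("not enough stored data for k neighbors")
    windows = np.stack([train[L:L + R] for L in range(n)])
    targets = train[R:]           # speed that followed each stored window
    distances = ((windows - history) ** 2).sum(axis=1)
    nearest = np.argsort(distances)[:k]
    # The forecast is the average of the k values that followed the best matches.
    return float(targets[nearest].mean())

# Toy stored data: three fast windows are always followed by a slowdown.
stored = [100.0, 100.0, 100.0, 60.0] * 3
predicted = knn_forecast([100.0, 100.0, 100.0], stored, k=2)
```

With the toy data, every stored window that matches the history exactly is followed by a speed of 60, so the two nearest neighbors average to 60.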
Each prediction model was responsible for forecasting speeds for a time interval of 24 h (i.e., t_end − t_start = 24 h). Three input cases were used to represent different road traffic states in the prediction: a regular day, a holiday, and a day with severe weather. These cases were selected from 2016, and weather records for those days were loaded into the TW-DNN. All models were modified to fit our experiments and road network and to deliver the best results.
The following are the specific values used in the experiments.
• Prediction time interval ∆t = 5 min
• Historical data used for the current window R = 8, R∆t = 40 min
∆t is equal to the time interval of the traffic records. The initial traffic data of all three comparison models were obtained from the initial traffic predictor of TW-DNN. We used a rectified linear unit and the Adam optimizer [27] as the default activation function and optimizer, respectively. For k-nn, we empirically chose k = 5 to yield the best results.
The experiments used a typical highway network surrounding a big city, for which Daegu, South Korea, was selected. It consists of four bidirectional highways with 241 links, as shown in Figure 7. The network has five endpoints and 18 interchanges. The traffic data used in the training were obtained from the Korea Expressway Corporation [28]. This dataset is a collection of traffic data of all nodes and links recorded every five minutes. Traffic flow and weather data of 2015 were used as a training dataset, which was preprocessed prior to utilization in the prediction models as follows.
• When the training dataset was extracted, speeds and flows larger than the 98th percentile were removed as outlier data.
Experiments were performed using a Linux-based workstation with an Intel i7-7700 CPU, 32 GB of RAM, and an Nvidia GeForce GTX 1080 GPGPU. The prediction models were programmed using the scikit-learn, TensorFlow, and pmdarima packages. Prediction errors are expressed using standard forms such as the mean absolute error (MAE) and mean absolute percentage error (MAPE). They are computed as shown below:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| A_i - F_i \right|, \qquad \mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{A_i - F_i}{A_i} \right|$$
where A_i, F_i, and n denote the real (measured) value, forecast value, and amount of data, respectively.

Case I (Regular Day)
Figure 8 plots the MAPE of each prediction model. The errors of CNN-only, k-nn, and ARIMA are unusually large during dawn and rush hour, respectively. In most cases, TW-DNN produces the best results, with an error smaller than that of the others by up to 4%. Figure 9 shows the predicted speeds of each model in colors; this is called a heatmap. Blue and yellow represent low and high speeds, respectively, while white, which appears only in the real data, represents missing data. The link (location) and time are displayed on the vertical and horizontal axes, respectively. Careful observation of the real data in Figure 9a indicates that speeds generally lie between 80 and 120 km/h. The real traffic data show two interesting phenomena:
TW-DNN data shows a close match to the real data, while CNN-only, k-nn, and ARIMA data do not.
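The MAPE curves and MAE values reported in these experiments follow the standard definitions of the two metrics. A minimal sketch is given below; the function names are hypothetical, and the handling of missing measurements (skipping NaN entries, and skipping zero measured values in MAPE) is an assumption of this sketch rather than a documented step of the evaluation.

```python
import numpy as np

def _valid_pairs(actual, forecast):
    """Drop pairs where either value is missing (NaN). This is an assumption."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    keep = ~np.isnan(a) & ~np.isnan(f)
    return a[keep], f[keep]

def mae(actual, forecast):
    """Mean absolute error, in the unit of the data (e.g., km/h)."""
    a, f = _valid_pairs(actual, forecast)
    return float(np.mean(np.abs(a - f)))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    a, f = _valid_pairs(actual, forecast)
    nonzero = a != 0              # avoid division by zero (assumption)
    return float(100.0 * np.mean(np.abs((a[nonzero] - f[nonzero]) / a[nonzero])))

measured = [100.0, 80.0, 50.0]    # toy measured speeds A_i
predicted = [90.0, 88.0, 50.0]    # toy forecast speeds F_i
example_mae = mae(measured, predicted)
example_mape = mape(measured, predicted)
```

For the toy values, the absolute errors are 10, 8, and 0 km/h, and the relative errors are 10%, 10%, and 0%, so the MAE is 6 km/h and the MAPE is about 6.67%.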

Case II (Holiday)
Case II corresponds to the holiday season, during which heavy traffic is present on the roads. Figure 10 shows the MAPE, where significant errors appear around 12:00. TW-DNN outperforms the others, showing errors smaller by up to 6%, 3%, and 5% compared to CNN-only, k-nn, and ARIMA, respectively. The heatmap of Case II is shown in Figure 11. In real traffic, congestion occurs between 12:00 and 20:00 in some links. TW-DNN detects the congestion successfully except in some links. It also forecasts the regular traffic condition similarly to the real data. Although the other models do not detect the congestion, they still show only small errors in this case.

Case III (Severe Weather Day)
Case III covers a day on which rainfall starts at 2:00 and continues until the end of the day. TW-DNN shows the lowest MAPE among the models, as shown in Figure 12, and significant gaps between TW-DNN and the others can be observed around 4:00 and 17:00. These phenomena are better explained by the speed heatmap. The real data show slowdowns during 02:00-05:00 and 15:00-19:00. TW-DNN predicts the congestion due to the rainfall, especially at dawn and during rush hours. However, CNN-only, k-nn, and ARIMA cannot predict it because they do not consider weather factors at all.

Summary and Discussion
The overall MAE and MAPE of the prediction models are summarized in Table 2. The bold numbers show the best prediction errors for each case. TW-DNN has the lowest error in all three cases. How accurately the models forecast congestion is summarized in Table 3. Congested and congestion-free states are distinguished by a speed threshold of 70 km/h. TW-DNN delivers the best result. Although the heatmaps in Figures 9, 11, and 13 show a close match between the TW-DNN predictions and the actual data, the prediction accuracy is not yet satisfactory, i.e., TW-DNN predicts lighter and shorter congestion than the actual traffic data show. The novel features of our prediction model are listed to summarize the experimental results.

Concluding Remarks and Future Work
In this study, a novel multi-module DNN framework was proposed for intelligent traffic forecasting. The framework was designed to improve our previous system, realize a long prediction range, and obtain a weather-aware traffic prediction using short-term and long-term weather forecast data. Experimental results showed that our framework performs well in forecasting traffic for all kinds of driving situations, such as regular days, holidays, and days of severe weather.
More work would be needed to improve the computation speed and accommodate the live traffic-related information encountered during driving. The system may need a new module in charge of interacting with the current ones to dynamically reflect such real-time traffic data.