Context-Aware Link Embedding with Reachability and Flow Centrality Analysis for Accurate Speed Prediction for Large-Scale Traffic Networks

Abstract: This paper presents a novel method for predicting the traffic speed of the links on large-scale traffic networks. We first analyze how traffic flows in and out of every link through the lowest cost reachable paths. We aggregate the traffic flow conditions of the links on every hop of the inbound and outbound reachable paths to represent the traffic flow dynamics. We compute a new measure called traffic flow centrality (i.e., the Z value) for every link to capture the inherently complex mechanism of the traffic links influencing each other in terms of traffic speed. We combine the features regarding the traffic flow centrality with the external conditions around the links, such as climate and time of day information. We model how these features change over time with recurrent neural networks and infer traffic speed at the subsequent time windows. Our feature representation of the traffic flow for every link remains invariant even when the traffic network changes. Furthermore, we can handle traffic networks with thousands of links. The experiments with the traffic networks in the Seoul metropolitan area in South Korea reveal that our unique ways of embedding the comprehensive spatio-temporal features of links outperform existing solutions.


Introduction
Accurate prediction of speed on traffic networks helps improve traffic management strategies and generate efficient routing plans. However, precisely estimating the traffic speed in advance has been a non-trivial task since various factors determine traffic flows in many ways.
Various approaches have been used for traffic speed prediction using statistical methods [1][2][3][4][5][6][7] and machine learning with neural networks with deep hidden layers [8][9][10][11][12][13][14][15][16][17][18]. However, the existing solutions are mostly limited to estimating traffic speed for either a single road link or a small-scale sub-network with only a handful of traffic links (e.g., a crossroad). In reality, a road system is a large-scale connected graph with the traffic links affecting each other's traffic flow over time in a more complicated fashion that cannot be explained easily with a simple statistical model. The works without a broader view of the traffic links may not adequately unravel the hidden, but critical speed prediction factors.
In this paper, we adapt the node embedding techniques introduced by Hamilton et al. [19]. We represent the relationship between links with a feature vector whose size is invariant even when certain changes are made to the traffic networks' structure, such as the addition or deletion of roads.
We analyze how the traffic flows in and out of a link through reachable multi-hop paths computed with the Floyd-Warshall algorithm [20]. How the links impact each other given the traffic flow analysis is quantified through a novel metric we refer to as Traffic Flow Centrality (TFC). We combine the data related to TFC with the external conditions around each link, such as climate and time information (e.g., time of day, the indication of holidays). We use a recurrent neural network algorithm to correlate between the composite feature's temporal transition and the traffic speed of each link. Our method captures the traffic flow dynamics through sub-networks around every link without embedding the entire adjacency matrix. Therefore, our method is much more space efficient, and it can handle large-scale traffic networks with thousands of links. We also do not face the problem of considering irrelevant information such as the hollow points around traffic networks when they are represented either with a sparse adjacency matrix or raster graphics [10,18]. Our solution assesses the impact of the remote links in addition to the adjacent neighbors on traffic flows. Hence, with the broader contextual view and the exclusion of unnecessary information, we expect to outperform the works that based their speed prediction only on the limited view of adjacent links.
This paper is structured as follows: In Section 2, we first review the related work. In Section 3, we introduce the method for inferring traffic speed given the temporal transitions of links' composite contextual features that are modeled with a novel link embedding technique. In Section 4, we benchmark the performance of our approach against existing works, and we conclude in Section 5.

Related Works
In this section, we put our work in the context of various related research works on traffic speed prediction. Recently, artificial neural networks with deep hidden layers have gained popularity, as they are effective at modeling the non-linear traffic speed dynamics. Some of the notable works have employed FNNs (Fuzzy Neural Networks) [8,9], DNNs (Deep Neural Networks) [11,21], RNNs (Recurrent Neural Networks) [13,14], DBNs (Deep Belief Networks) [12,15], and the IBCM-DL (Improved Bayesian Combination Model with Deep Learning) [17] models. These works have shown more accurate speed prediction than the approaches that are based on classic statistical methods such as ARIMA (Auto-Regressive Integrated Moving Average) models [1][2][3], SVR (Support Vector Regression) [4,5], and K-NN (K-Nearest Neighbor) [6,7].
When the models are obtained by learning the pattern on a single specific link [1][2][3][13][14][15][17], the distinct features of other links may not be adequately accounted for. Thus, modeling the traffic speed pattern per individual link was discussed in [22]. Nonetheless, the work by Kim et al. [22] still did not reflect the substructure of the traffic network around each link.
The works presented in [10,16,23,24] extracted spatial features from a visual representation of traffic networks. In particular, Zheng et al. [23] used a two-dimensional traffic flow that is embedded in Convolutional Neural Networks (CNN). They also used a Long Short-term Memory (LSTM) [25] algorithm to model long-term historical data. Similarly, Du et al. [16] represented the passenger flows among different traffic lines in a transportation network into multi-channel matrices with deep irregular convolutional networks. Guo et al. [10] analyzed the congestion propagation patterns based on the traffic observations recorded at fixed intervals in time and fixed locations in space. They represented the traffic observation in raster data. Then, all the raster data were fed into a 3D convolutional neural network to model the spatial information. They used a 3 × 3 convolutional kernel that includes the hollow points where no road lies and no traffic flows. The hollow points in the convolutional kernel may accidentally reflect stale traffic flows. Instead, Du et al. [16] used an irregular convolution kernel that refers to the traffic flow values from the adjacent traffic lines to fill in the values of the hollow points. These methods commonly exploit the CNN architecture that effectively models visual imagery [26][27][28]. Furthermore, a family of RNN algorithms such as GRU [29] and LSTM [25] was used so that repetitive temporal patterns can be discovered. However, these works did not capture the relation between the flow points on the two-dimensional spaces such as junctions, crossroads, and overpasses. Therefore, these models are susceptible to errors by correlating between irrelevant traffic flows. For instance, they may confuse overpasses and overlapping roads as crossroads. Furthermore, these works did not address the impact of the external conditions on the traffic flow. 
Learning the correlation between traffic flows and weather parameters was useful for flow prediction, as presented in [12,30]. However, they overlooked the impact of the substructure around traffic links on traffic flows.
More recently, modeling the traffic flow based on the graph representation of the traffic networks has emerged. The works presented in [18][31][32][33] combined GNNs (Graph Neural Networks) [34,35] and RNNs to capture the temporal flow transition patterns given adjacency matrices that explicitly reflect the complex interconnections. These methods do not have to deal with information irrelevant to the traffic flow, such as the hollow points in the visual traffic networks that Du et al. [16] were forced to consider.
ST-TrafficNet [33] used Caltrans PeMS (Performance Measurement System) data from around 20 links between intersection points and predicted traffic speed with stacked LSTM using a spatially-aware multi-diffusion convolution block. These PeMS data were collected from 350 loop detectors at 5 min intervals from 1 January 2017 to 31 May 2017. This model reflects spatial influence through multi-diffusion convolution with forward, backward, and attentive channels.
TGC-LSTM was used by Cui et al. [18] to predict the traffic speed on four connected freeways in the Greater Seattle Area. They used publicly available traffic state data from 323 sensor stations over the entirety of 2015 at 5 min intervals. With their model, traffic speed was predicted with the RMSE (Root Mean Squared Error) as low as 2.1. However, the GNN architecture has to be restructured whenever the traffic networks undergo some changes. This is because TGC-LSTM uses the entire adjacency matrix as an input to the GNN instead of embedding the features of individual traffic links. Upon any change to the traffic networks, we have to re-train from scratch with the newly updated GNN architecture. Furthermore, since TGC-LSTM uses a very large adjacency matrix, both the time and space complexity of modeling the network structure become high. More importantly, the larger the adjacency matrix is, the sparser it becomes. Therefore, TGC-LSTM still faces the problem of incorporating unnecessary data such as the hollow points captured in a regular convolution kernel, as discussed in [10]. The shortcomings of these GNN-based approaches motivated us to devise a new method for embedding the characteristics of the traffic network.
We adapt the node embedding techniques introduced by Hamilton et al. [19]. We represent the relationship between links on the traffic network with a feature vector whose size is invariant even when any part of the network structure changes. We analyze how the traffic routes through a link via reachable lowest cost multi-hop paths that are computed with the Floyd-Warshall algorithm [20]. We compute every link's relative impact on other links based on its inbound/outbound traffic flow patterns and its neighbors' collective conditions. We refer to the relative cross-link impact value as Traffic Flow Centrality (TFC). We combine the features related to TFC with the external conditions around each link, such as climate and time information. We use a recurrent neural network algorithm to learn how such a composite feature change over time determines the traffic speed of each link.
Our method does not involve the process of embedding the entire adjacency matrix. Therefore, our solution is more space efficient and can easily handle large-scale traffic networks with thousands of links. Furthermore, it avoids incorporating irrelevant information such as the hollow points that can be present in traffic networks when they are represented with a sparse adjacency matrix or raster graphics [10,18]. Our solution considers the conditions of the remote links beyond the adjacent neighbors. By ruling out irrelevant information and having the broader contextual view, we expect to outperform the works that base their speed prediction myopically on the conditions of the adjacent links.
The advantage of our work, named TFC-LSTM, is summarized in Table 1, which shows the comparison between existing related works we have discussed so far. The "Traffic Network Structure" refers to the usage of the abstract representation of interconnections between links. The "Surrounding Conditions" refer to the consideration of external situations around links such as climate and time information. The "Traffic Flow Reachability Analysis" refers to the process of analyzing the pattern of traffic flowing in and out of links through reachable paths. The "Centrality Analysis" refers to the usage of the link's relative influence on others. The "Chains of Neighbors" column indicates the consideration of remote neighbors besides the adjacent ones when capturing the substructure around a link.

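[Table 1: comparison between the existing related works and TFC-LSTM. Columns: Traffic Network Structure, Surrounding Conditions, Traffic Flow Reachability Analysis, Centrality Analysis, Chains of Neighbors.]
"X" means yes, and "O" means no.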

Methodology
We outline the overall procedure for predicting the link's speed on the traffic networks, as shown in Figure 1. Given raw data such as adjacency, distance, and speed in a data tensor, we compute reachability information such as hop count, distance, and cost of traffic flowing between a pair of source and destination links through a chain of neighboring links. Paired with other external features such as weather conditions, time of day, and day of the week, we expect that encoding the dynamics of the temporal traffic flows on the neighboring links would significantly improve the prediction of links' speed. How we generate the embedding of complex context-aware spatio-temporal features is explained in greater detail in Section 3.1. We model the correlation between the input feature and each link speed with a Long Short-Term Memory (LSTM) algorithm. We use 512 perceptrons in the hidden layer, ReLU for the activation function [36,37], and Adam for the optimizer [38,39].

Link Embedding
Feature representation of a link is done in the following three steps. First, we abstract the traffic system as a link adjacency matrix $A$. A traffic network illustrated in Figure 2A is abstracted further as a directed graph structure, as shown in Figure 2B, with every directional link being uniquely identified. Figure 3A shows the link adjacency matrix for the sample traffic system in Figure 2A. $A_{l_i l_j}$ denotes whether traffic can flow from link $l_i$ to link $l_j$. For instance, according to the example in Figure 2A, $A_{l_2 l_3} = 1$ since traffic can flow from $l_2$ to $l_3$; if it could not, $A_{l_2 l_3}$ would equal 0. We exclude the links through which no traffic can reach any other parts of the traffic system. For instance, $l_1$ and $l_{16}$ are disconnected from the rest of the traffic system. Figure 3B illustrates a matrix of the distance between every pair of adjacent links. Given $A_{l_i l_j}$, Figure 3C shows the speed of the traffic flowing from $l_i$ to adjacent link $l_j$.
Second, we conduct a traffic flow reachability analysis. For every link $l$, we extract all inbound multi-hop paths through which traffic traverses towards $l$. Conversely, we compute all outbound multi-hop paths through which the traffic originating from $l$ diffuses. Given $A_{l_i l_j}$, we run the Floyd-Warshall algorithm [20] to obtain the minimum time it takes to travel from every source link $l_i$ to all other destination links $l_j$, as shown in Figure 3F. We generate a matrix of hop counts and minimum distances from every source link $l_i$ to all other destination links $l_j$, as shown in Figure 3D,E, respectively. For instance, the traffic on $l_5$ (at Hop 3) takes $l_6$ (at Hop 2) to reach $l_9$ at time $t_2$ in Figure 4B as opposed to taking $l_4$ (at Hop 1) at time $t_1$ in Figure 4A due to the heavy congestion on $l_4$. After we obtain the paths, we discard the links through which traffic cannot reach the source link before the beginning of the next time window.
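The reachability analysis described above can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' implementation: the function names, the use of minutes as the time unit, and the toy four-link network are assumptions made for illustration. It runs Floyd-Warshall over per-hop travel times, records the hop count of each lowest cost path, and keeps only the links reachable within one time window.

```python
import math


def floyd_warshall(travel_time):
    """All-pairs minimum travel time and the hop count of each such path.

    travel_time[i][j] is the time to move from link i to adjacent link j,
    or math.inf when traffic cannot flow directly from i to j.
    """
    n = len(travel_time)
    cost = [row[:] for row in travel_time]
    hops = [[1 if math.isfinite(travel_time[i][j]) else math.inf
             for j in range(n)] for i in range(n)]
    for i in range(n):
        cost[i][i] = 0.0
        hops[i][i] = 0
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = cost[i][k] + cost[k][j]
                if via_k < cost[i][j]:
                    cost[i][j] = via_k
                    hops[i][j] = hops[i][k] + hops[k][j]
    return cost, hops


def outbound_links(cost, source, window):
    """Links that traffic leaving `source` can reach within `window`."""
    return [j for j, c in enumerate(cost[source]) if j != source and c <= window]


def inbound_links(cost, target, window):
    """Links whose traffic can reach `target` within `window`."""
    return [i for i in range(len(cost)) if i != target and cost[i][target] <= window]


# Hypothetical network: 0 -> 1 -> 2 -> 3, plus a slow direct hop 0 -> 2.
INF = math.inf
travel_time = [
    [INF, 12.0, 54.0, INF],
    [INF, INF, 18.0, INF],
    [INF, INF, INF, 42.0],
    [INF, INF, INF, INF],
]
cost, hops = floyd_warshall(travel_time)
reachable_from_0 = outbound_links(cost, 0, window=60.0)
reaching_3 = inbound_links(cost, 3, window=60.0)
```

In this toy network, link 3 lies 72 minutes from link 0 along the lowest cost path, so it is dropped from the outbound set of link 0 for a 60 minute window. Floyd-Warshall is cubic in the number of links, which is worth keeping in mind for networks with thousands of links.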
For example, if the traffic on $l_9$ is too slow to reach the remote link $l_{15}$ by the end of the current time window, then we discard $l_{15}$ from the reachable outbound path from $l_9$, as shown in Figure 5.
In the final step, we compute the Z value for every link, which we refer to as the traffic flow centrality. Given the matrix of minimum time to travel from source link $l_i$ to destination link $l_j$ (Figure 6A), we count the number of inbound and outbound reachable paths ($R$), $P_{in}$ and $P_{out}$, for every link using Equation (1), as illustrated in Figure 6B. We also compute the speed and distance between every pair of adjacent links on the reachable paths. Equation (2) defines $\rho F$ as the weighted sum of the average traffic speed ($v$) on every link $i$ on the multi-hop reachable paths. The weight of each intermediate link $i$ is the inverse of the product of the fanout $f$ and the distance $d$ to $i$, where $f$ specifically represents the number of alternate paths on a junction. The weight represents the impact on a given link that the current traffic is either destined to or stems from. We expect the impact to be sensitive to the $f$ and $d$ values. For instance, we can capture the circumstance where the traffic on links with higher $f$ and $d$ values is less likely to move towards a target link $l$ than the traffic on a link with lower $f$ and $d$ values. This is because traffic on the link with higher $f$ and $d$ values is more likely to veer away by taking different turns on the junctions or halt the transition at any point on the path. As an example, computing the inbound and the outbound $\rho F$ values for the link $l_9$ is illustrated in Figure 6C,D. The aggregation steps above account for the traffic flow dynamics around every link. Note that the links adjacent to each other are also inter-dependent with regard to computing their Z values. Due to the inter-dependence, we have to compute Equations (4) and (5) iteratively until the condition in Equation (6) holds.
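The weighting described for Equation (2) can be illustrated with a short Python sketch. It assumes the plain reading of the text, namely that each link on the reachable paths contributes its average speed divided by the product of its fanout and its distance. The function name, the tuple layout, and the sample values are hypothetical.

```python
def rho_f(path_links):
    """Weighted sum of average speeds over the links on the reachable paths.

    Each item is (speed, fanout, distance): the average speed v on the link,
    the fanout f at its junction, and the distance d to the link.
    The weight of a link is 1 / (f * d), as described for Equation (2).
    """
    total = 0.0
    for speed, fanout, distance in path_links:
        if fanout <= 0 or distance <= 0:
            raise ValueError("fanout and distance must be positive")
        total += speed / (fanout * distance)
    return total


# Hypothetical inbound links: (speed, fanout, distance).
inbound = [(40.0, 2, 0.5), (30.0, 1, 1.5), (60.0, 3, 4.0)]
rho_f_in = rho_f(inbound)
```

The third link is the fastest of the three, yet it contributes the least because it is both far away and behind a junction with three alternatives.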
We obtain a converged Z value when Equation (6) is satisfied. We re-scale the $\rho F$ values to $L_x$ through normalization as specified in Equation (3). Suppose we have a set of adjacent inbound and outbound neighbors $N_{in}$ and $N_{out}$, respectively, for a given link. Then, we take the sums of $L_x / e^{hop}$ of every neighbor in $N_{in}$ and $N_{out}$. The division by $e^{hop}$ reflects that the impact of a link's $L_x$ value on its neighbor decreases with the hop distance between them. We take the natural logarithm of the sums as shown in Equation (4) and normalize the value as defined in Equation (5). Whenever Equation (4) repeats, $L_n$ is substituted by the Z obtained in the previous iteration. The threshold value in Equation (6) determines when Z has converged. We empirically set it to the value that leads to the most accurate link speed prediction. The iterative computation of the Z value for every link in a traffic network is illustrated in Figure 7.
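Two pieces of the iteration above can be sketched in Python without guessing at the equations themselves: the neighbor aggregation that precedes the logarithm, and the stopping rule. The sketch rests on several assumptions. It reads the division term as Euler's number raised to the hop count, it treats convergence as the largest per-link change falling below a threshold, and it leaves the normalization steps to the caller because their exact form is not given in the text. All names are hypothetical, and the update used in the demonstration is a stand-in, not the authors' Equations (4) and (5).

```python
import math


def neighbor_term(values, neighbors):
    """Natural logarithm of the sum of neighbor values, each divided by e**hop.

    `neighbors` is a list of (neighbor_id, hop) pairs. Treating the divisor
    as e raised to the hop count is an assumption about the notation.
    """
    total = sum(values[n] / math.exp(hop) for n, hop in neighbors)
    if total <= 0:
        raise ValueError("the sum must be positive to take its logarithm")
    return math.log(total)


def iterate_until_converged(z, step, threshold=0.005, max_iter=1000):
    """Apply `step` until no entry of Z changes by `threshold` or more.

    The form of the convergence test is an assumption; 0.005 is the
    threshold reported in the evaluation.
    """
    for _ in range(max_iter):
        new_z = step(z)
        change = max(abs(a - b) for a, b in zip(new_z, z))
        z = new_z
        if change < threshold:
            return z
    raise RuntimeError("Z did not converge within max_iter iterations")


# Aggregation for a link with two adjacent neighbors (hop = 1).
term = neighbor_term({"a": 1.0, "b": 1.0}, [("a", 1), ("b", 1)])

# Stand-in update with a known fixed point at 0.5, used only to
# exercise the stopping rule.
z_final = iterate_until_converged([0.0, 1.0], lambda z: [0.5 * v + 0.25 for v in z])
```

Passing the update as a function keeps the stopping rule separate from the normalization, so either can be changed without touching the other.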
The traffic flow centrality captures the inter-link relationship, and we expect it to be one of the key factors for accurately predicting traffic speed. Intuitively, it is highly probable that a link would experience traffic swarming in and causing congestion, especially when a large portion of its immediate neighbors also experience high inbound traffic through the reachable paths. Furthermore, a link with several surrounding links that spread out traffic quickly is likely to disseminate its traffic more easily. We embed the 7 features, ($V$, $P_{in}$, $P_{out}$, $\rho F_{in}$, $\rho F_{out}$, $Z_{in}$, $Z_{out}$), into a vector for every link, where $V$ is the traffic speed. We make this link embedding more context-aware by simply concatenating an additional vector of external conditions surrounding the link. The external conditions include temperature, precipitation, time of day, day of the week, and an indication of whether a given day is a public holiday. The steps for generating the final input matrix we feed into a neural network for speed prediction are illustrated in Figure 8.
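The concatenation described above can be sketched as follows. The function name, the ordering of the external conditions, and their numeric encoding (hour of day as 0 to 23, weekday as 0 to 6, holiday as 0 or 1) are assumptions made for illustration; the text does not specify how these values are encoded.

```python
def link_embedding(v, p_in, p_out, rho_f_in, rho_f_out, z_in, z_out, external):
    """Concatenate the seven traffic features with the external conditions.

    `external` is assumed to hold, in order: temperature, precipitation,
    hour of day, day of week, and a public-holiday flag.
    """
    return [v, p_in, p_out, rho_f_in, rho_f_out, z_in, z_out] + list(external)


# Two hypothetical links at the same time window. The vector length does
# not depend on how many neighbors each link has.
external = [21.5, 0.0, 8, 2, 0]
row_a = link_embedding(34.0, 12, 9, 65.0, 48.2, 0.41, 0.37, external)
row_b = link_embedding(52.0, 3, 4, 20.1, 25.6, 0.18, 0.22, external)
input_matrix = [row_a, row_b]
```

Each row has 12 entries in this sketch, so adding or removing neighboring links changes the feature values but not the shape of the input.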
No matter how many adjacent neighbors a link has, the length of the input vector remains invariant, thus making our solution resilient to changes such as the addition or deletion of neighboring links. We do not take the entire adjacency matrix as an input to the prediction engine based on recurrent neural networks. Therefore, our approach becomes space-efficient. Our input vectors contain only the essential information instead of the initial raw adjacency matrix that is sparse. With every link expressed with highly distinct features, we expect it to yield better prediction results. Note that the input vectors are computed at every time window. In the following section, we introduce the method for modeling the transition of the features over time.

Modeling the Temporal Patterns with Recurrent Neural Networks
Given the input matrix generated at a time window, we predict the traffic speed of every link in the next time window using recurrent neural networks, such as GRU and LSTM. With the hidden layers, we can model a complex non-linear relationship between the input and the output values. Specifically, the example of feeding in the input features of every link and predicting its speed through an LSTM network is illustrated in Figure 9. The states summarized in the previous time window are fed into the block of hidden layers at the subsequent time window. Besides the previous states, we feed in the updated input feature vectors of every link. By training this network, we can model how the spatial information changes over time, including the traffic system's structure, traffic flow dynamics, and external conditions. We also correlate the traffic speed of each link with the temporal transition of the most comprehensive set of features to date.
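Before a recurrent network can be trained on this data, the per-window feature vectors have to be arranged into input sequences paired with the speed of the following window. The sketch below shows one plausible arrangement for a single link. The function name and the toy data are hypothetical, and the number of time windows per sequence is a tunable setting rather than a value taken from the text.

```python
def make_sequences(features, speeds, n_windows):
    """Pair each run of `n_windows` feature vectors with the next speed.

    features[t] is the feature vector of one link at time window t and
    speeds[t] is its observed speed at t. Sample k covers windows
    k .. k + n_windows - 1 and targets the speed at window k + n_windows.
    """
    if len(features) != len(speeds):
        raise ValueError("features and speeds must have the same length")
    inputs, targets = [], []
    for start in range(len(features) - n_windows):
        inputs.append(features[start:start + n_windows])
        targets.append(speeds[start + n_windows])
    return inputs, targets


# Toy series of six hourly windows with one-dimensional features.
features = [[float(t)] for t in range(6)]
speeds = [10.0 * t for t in range(6)]
inputs, targets = make_sequences(features, speeds, n_windows=3)
```

With six windows and sequences of length three, this produces three training samples; the resulting lists can then be converted to the tensor shape that the chosen recurrent layer expects.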

Evaluation
In this section, we evaluate our approach using the data from TOPIS (Seoul Traffic Operation and Information service, https://topis.seoul.go.kr/), which contain hourly average speed information for the 4670 major traffic links in the Seoul metropolitan area. We obtained data at one-hour intervals, 8760 h in total, from 00:00 on 1 January 2018 to 23:00 on 31 December 2018. Temperature and precipitation information of each link was obtained from the nearest weather station. We retrieved the climate readings from the AWS (Automatic Weather System) of KMA (https://data.kma.go.kr/). We took 60% of the data as a training set, 20% as a validation set, and the rest as the test set. The attributes of the dataset used in this paper are listed in Table 2 with basic statistics, units, measurement intervals, and data types. We conducted our experiments on NVIDIA DGX-1 with an 80-Core (160 threads) CPU, 8 Tesla V100 GPUs with 32 GB of exclusive memory and 512 GB of RAM. NVIDIA DGX-1 was operated with the Ubuntu 16.04.5 LTS server, and the machine learning tasks were executed through the Docker containers. The machine learning algorithms were implemented with Python (v3.6.9), Tensorflow (v1.15.0), and Keras (v2.3.1) libraries. We used ReLU for the activation function [36,37] and Adam for the optimization function [38,39]. The learning rate was empirically set to 0.001. Early stopping was configured by setting the patience value to 200 through Keras. To measure the prediction performance, we employed the following metrics: (1) MAE (Mean Absolute Error) (Equation (7)); (2) RMSE (Root Mean Squared Error) (Equation (8)); and (3) MAPE (Mean Absolute Percentage Error) (Equation (9)). $y_t$ and $\hat{y}_t$ are the predicted speed and the actual speed, respectively. $n$ is the number of test cases. The threshold value for checking the convergence of the Z value, as defined in Equation (6), was empirically set as shown in Table 3. With a threshold of 0.005, we were able to achieve the lowest MAPE.
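The three metrics can be computed as in the following sketch, which uses the standard textbook definitions of MAE, RMSE, and MAPE. It assumes that MAPE is expressed as a percentage and that the absolute error is divided by the actual speed, which must therefore be non-zero. The function names and sample values are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical speeds for three test cases.
actual = [50.0, 40.0, 20.0]
predicted = [45.0, 44.0, 22.0]
scores = (mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

Every prediction in this example is off by 10% of the actual speed, so the MAPE is 10 even though the absolute errors differ; this scale independence is why MAPE is convenient for comparing links with very different typical speeds.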

Measurement of Prediction Performance
We measured the prediction performance of various approaches denoted as DNN, TFC-DNN, GRU, TFC-GRU, LSTM, and TFC-LSTM. The prefix TFC- stands for Traffic Flow Centrality, and it represents the following seven novel features that we introduced in Section 3, i.e., $V$, $P_{in}$, $P_{out}$, $\rho F_{in}$, $\rho F_{out}$, $Z_{in}$, and $Z_{out}$. We evaluated the effectiveness of the information about external conditions such as climate and date information separately. DNN, GRU, and LSTM are the artificial neural network architectures we employed for machine learning. Table 4 contains the prediction performance of each approach. For each model, we picked the empirically best hyper-parameter settings, such as the number of hidden layers, perceptrons per layer, and the number of time windows for the recurrent neural networks. For double hidden layers, we had 64 and 8 perceptrons for the first and the second layer, respectively. For a single hidden layer, we used 512 perceptrons. We cited the representative existing works that fall under the prediction model categories that do not use our techniques. We did not compare against the methods that could not deal with the traffic networks that scale to thousands of links. MSE values during the training and validation stages are plotted on the graphs in Figure 10. MSEs converged at epoch = 400.
The consideration of every link's external conditions was effective only when used by LSTM and TFC-LSTM. On the other hand, we observed that using the features related to traffic flow centrality consistently led to improvement over the baseline approaches. When both the external conditions and the features related to traffic flow centrality were used with LSTM, i.e., TFC-LSTM, we achieved the lowest MAPE of 10.39. The DNN-based approaches without the information about the external conditions performed poorly compared to the recurrent neural network models because they do not model the temporal transitions of the link features.

The Effect of Reachable Path Length Cutoff
We noticed that a link in the Seoul traffic system could be reached from any other link within an hour regardless of the distance, even during rush hours with heavy traffic, according to the Floyd-Warshall algorithm we used for retrieving the lowest cost inbound and outbound paths between any OD pairs. It turns out that the Floyd-Warshall algorithm retrieved several unrealistic reachable paths between OD pairs by ignoring specific restrictions on some of the traffic links. For example, the algorithm generated some routes that included U-turns that are not allowed on some roads. As a result, path lengths measured as the number of hops tended to be excessively long. Thus, we were unnecessarily taking into account the state of links that cannot practically influence the state of the remote links.
One possible solution to this problem is to have a shorter time window than an hour. However, TOPIS only makes the hourly data available to the public. Therefore, we revised the algorithm to limit the path length instead. When overall traffic flows at the lowest average speed, we limited the reachable inbound and outbound path lengths to 15 hops. On the other hand, for the period when overall traffic flows at the highest average speed, we relaxed the length limit to 45 hops. Furthermore, the link's low speed may be attributed to the congestion on the neighboring links in the vicinity. Thus, reachability from other links to the low-speed link is also limited. Therefore, it is sufficient to consider only the links in proximity. Compared to the period of lowest average traffic speed ("Lowest Speed Period" in Table 5), the reachability of the traffic from one link to the others is higher during the highest average traffic speed period. Thus, for such a period ("Highest Speed Period" in Table 5), we considered the states of the other links within a wider range. Our simple revision of the algorithm is empirically proven to be effective, as shown in Table 5. It shows the prediction performance for different path length cutoff settings. We observed that our approach performed most effectively with an MAPE of 10.39 when the range of neighboring links to consider was proportional to the traffic's average speed.
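One way to make the cutoff follow the average speed is to interpolate between the two reported settings, 15 hops at the lowest average speed and 45 hops at the highest. The sketch below does this linearly. The linear form, the function name, and the sample speeds are assumptions; the text only reports the two endpoint settings and that a speed-proportional range worked best.

```python
def hop_cutoff(avg_speed, lowest_speed, highest_speed, min_hops=15, max_hops=45):
    """Path length cutoff that grows linearly with the network's average speed.

    Returns `min_hops` at `lowest_speed`, `max_hops` at `highest_speed`,
    and a rounded linear interpolation in between. Speeds outside the
    range are clamped to the nearest endpoint.
    """
    if highest_speed <= lowest_speed:
        raise ValueError("highest_speed must exceed lowest_speed")
    ratio = (avg_speed - lowest_speed) / (highest_speed - lowest_speed)
    ratio = min(1.0, max(0.0, ratio))
    return round(min_hops + ratio * (max_hops - min_hops))


def within_cutoff(hop_counts, cutoff):
    """Indices of the links whose hop count does not exceed the cutoff."""
    return [i for i, h in enumerate(hop_counts) if 0 < h <= cutoff]


# Hypothetical average speeds in km/h.
cutoff_mid = hop_cutoff(35.0, lowest_speed=20.0, highest_speed=50.0)
kept = within_cutoff([0, 3, 18, 31, 47], cutoff_mid)
```

The hop counts passed to `within_cutoff` would come from the hop count matrix produced by the reachability analysis; a hop count of zero denotes the link itself and is excluded.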

The Discussion on Scalability
One of the merits of our approach is the capability to predict the traffic speeds even for a very large-scale traffic network such as the road system of Seoul. The larger the traffic network is, the greater the opportunity to predict speed accurately. This is because we can consider more conditions around the links that are often overlooked by the existing works. The works that only consider the conditions of adjacent neighbors [18] exhibited a decline in prediction accuracy compared to what was originally reported and performed worse than our approach. This is because their approaches were unable to capture the probable influence of the more distant links.
Naively feeding in the raw adjacency matrix was the most space inefficient approach [40,41]. We could not even compare the prediction performance with such approaches, as they quickly encountered out-of-memory errors when dealing with Seoul's large-scale road system. Our approach is agnostic of the scale of the network and even the structure change. Regardless of the scale and any changes, we ran the aggregation functions to compute the fixed-length input feature vector concerning the traffic flow centrality. Therefore, we did not need to restructure the neural network architecture upon changes to the traffic system (i.e., addition/deletion of links, change of adjacent links).
However, we dealt with a fragment of the entire national traffic system in South Korea. The traffic networks in Seoul are connected to the systems in other districts such as Gyeonggi Province and Incheon metropolitan area. Due to the fragmented view, we accidentally identified the links that bridge between different traffic networks as dead-ends, as shown in Figure 11. We could not accurately reflect the traffic flow centrality for these dead-end links. The $Z_{out}$ value was zero for these dead-end links because there is no way out. The $Z_{in}$ value is not credible as the inbound traffic from other regions was not accounted for. The inaccurate Z values on the dead-ends may negatively impact the neighboring links' Z values.
For this issue, we applied the average Z value of the whole system as the Z value of the dead-end links, which still cannot be viewed as an ideal solution. This motivated us to apply our approach to predicting the speed of all the links in the entire national traffic system. By expanding the view of the traffic network, we expect the accuracy to improve further. This is planned to be done soon after we are given integrated traffic information from all Korean regions.
Figure 11. Bridge links between traffic networks subject to analysis and out-of-range traffic network accidentally being recognized as dead-end links.
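The workaround for dead-end links can be sketched as a simple substitution. The text does not say whether the system-wide average includes the dead-end links' own unreliable values, so this sketch assumes the average is taken over the remaining links only. The function name and link identifiers are hypothetical.

```python
def fill_dead_end_z(z_values, dead_ends):
    """Replace the Z value of each dead-end link with the average Z.

    `z_values` maps link id to Z. The average is computed over links that
    are not dead ends, which is an assumption about the intended average.
    """
    regular = [z for link, z in z_values.items() if link not in dead_ends]
    if not regular:
        raise ValueError("at least one link must not be a dead end")
    mean_z = sum(regular) / len(regular)
    return {link: (mean_z if link in dead_ends else z)
            for link, z in z_values.items()}


# Hypothetical Z_out values; "bridge" leads out of the analyzed area.
z_out = {"a": 0.2, "b": 0.4, "bridge": 0.0}
z_out_filled = fill_dead_end_z(z_out, dead_ends={"bridge"})
```

The function returns a new mapping and leaves the original values untouched, which makes it easy to compare predictions with and without the substitution.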

Conclusions
We presented a novel method for describing the dynamic circumstances around any given traffic link. Specifically, by using a new measure called traffic flow centrality, we were able to concisely express the dynamics of the traffic flow in and out of any given link. Through this measure, we can reflect the inherently complex ways in which the interconnected traffic links affect each other. Combined with the information about the external conditions surrounding the links, e.g., climate and time of day, the new features and their temporal patterns are used for predicting traffic speed. According to the information available from TOPIS (Seoul Traffic Operation and Information service), training with the LSTM algorithm given our comprehensive spatio-temporal features yielded the lowest prediction error with an MAPE of 10.39. Our experiment also shows that our solution is easily applicable to large-scale traffic systems. We could predict the traffic speed on the road network with thousands of links, unlike the existing works without any efficient feature embedding approaches.
As future work, we plan to apply our solution to the nation-wide traffic system.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:
OD: Origin and Destination
TFC: Traffic Flow Centrality (Z value)