Short-Term Traffic Forecasting: An LSTM Network for Spatial-Temporal Speed Prediction

Abstract: Traffic forecasting remains an active area of research in the transport and data science fields. Decision-makers rely on traffic forecasting models for both policy-making and operational management of transport facilities. The wealth of spatial and temporal real-time data increasingly available from traffic sensors on roads provides a valuable source of information for policymakers. This paper adopts the Long Short-Term Memory (LSTM) recurrent neural network to predict speed by considering both the spatial and temporal characteristics of real-time sensor data. A total of 288,653 real-life traffic measurements were collected from detector stations on the Eastern Freeway in Melbourne, Australia. A comparative performance analysis against other models, including the Recurrent Neural Network (RNN), which has an internal memory that allows it to remember its inputs, and the Deep Learning Backpropagation (DLBP) neural network, is also reported. The LSTM results showed average accuracies in the outbound direction ranging between 88 and 99 percent over prediction horizons between 5 and 60 min, and average accuracies between 96 and 98 percent in the inbound direction. The models also showed resilience in accuracies as the prediction horizons increased spatially for distances up to 15 km, providing a remarkable performance compared to other models tested. These results demonstrate the superior performance of LSTM models in capturing the spatial and temporal traffic dynamics, providing decision-makers with robust models to plan and manage transport facilities more effectively.


Introduction
New technologies and advances in data analytics are providing innovative ways to sense and manage transport networks. The fast pace of breakthroughs in these technologies, including artificial intelligence and machine learning, is relentless and continues to unfold on many fronts. By determining when and how to take advantage of these technologies, policymakers and decision-makers have unique opportunities to enhance people's access to services and facilities, improve travel time reliability for road users, and enhance the economic and infrastructure productivity of vital infrastructure assets [1]. Short-term traffic forecasting has been an active area of research for more than three decades and has been discussed widely in the literature using a variety of theoretical models based on either simulated or field data [2]. Field data can be most useful for model development [1,3] but can be difficult to collect in a way that captures the spatial and temporal characteristics. In the absence of reliable field data, many studies reported in the literature have instead used simulated data generated from well-calibrated and validated traffic simulation models [4]. Although a large number of studies have used advanced computational models to develop accurate forecasting methods that deal with the complexity of massive amounts of data, the literature on comparative evaluations of different models based on the same set of data is scarce. The ability to undertake such comparative evaluations on the same set of field data will help identify the models best suited to short-term traffic forecasting, which in turn can improve road user experience and enhance operational capabilities. This paper demonstrates the feasibility of using advanced AI techniques for short-term traffic forecasting, taking into consideration the spatial and temporal characteristics of the data using the Long Short-Term Memory (LSTM) prediction network. This model is developed using historical data extracted from eight upstream and downstream detector stations on the Eastern Freeway, Melbourne, Australia.
Future Transp. 2021, 1 22

Contribution
The objective of this paper is to develop and evaluate robust models that can be used by policy and decision-makers to plan and manage transport facilities in an effective manner. While the topic of traffic forecasting has received considerable attention in the past, the literature that examines the spatial and temporal interactions of traffic phenomena remains scarce. Hence, this paper considers the locations between each detector station and evaluates the effectiveness of the proposed model considering the influence of locations as well as the temporal evolution of traffic. Another key contribution, which has not been investigated before, is the feasibility of replicating models that have been developed for a particular location to other locations. This type of analysis is important when the input data of one location is missing. From a practical perspective, the ability to forecast and predict traffic conditions helps decision-makers plan ahead rather than rely on existing reactive systems that do not have any predictive capability. Another contribution of this paper is the development and evaluation of spatial-temporal traffic forecasting models using large field data sets (288,653 observations) that have been validated from multiple locations on the road network in Melbourne.
This paper is organised as follows: Section 2 provides a scan of the literature on the topic. Section 3 discusses the model formulation and modelling architectures used in the study. Section 4 describes a case study that includes data collection and pre-processing, and model evaluation, including results and findings. Finally, Section 5 presents conclusions and future research directions.

Literature Review
The reform of urban mobility still presents major challenges to policymakers around the world. Despite years of investment in road infrastructure, the traditional approach of focusing on building out of congestion through additional road capacity did not meet with much success. In recent times, more focus has been given to managing the demand for travel. Among the many demand management solutions, pro-active or predictive traffic forecasting has been identified as an essential component of improving traffic conditions [5]. In short-term traffic forecasting, estimating traffic conditions usually focuses on speed and flow predictions. Speed prediction, in particular, is directly related to proactive traffic control system development [6,7]. However, estimating vehicle speed is problematic because it is affected by driver behaviour, road environment and the spatial and temporal complexities of traffic conditions. Solving this complex and uncertain speed prediction problem has been the focus of many studies in the literature [8,9]. Methodologies used for speed prediction can be classified into two categories or approaches: parametric and non-parametric [10]. These are discussed next.

Parametric Approaches
The parametric approaches, also known as model-based methods, are based on certain theoretical assumptions about a predetermined model structure. The model parameters can be computed using empirical data [11,12]. The most commonly used parametric time-series analysis approaches include linear models such as the Autoregressive Integrated Moving Average (ARIMA) model [13], the seasonal ARIMA (SARIMA) model [14], the exponential smoothing model [15,16] and ARIMA with a Kalman filter (KF) [17,18]. In an early reference [19], the authors found that the ARIMA model was capable of representing freeway time-series data accurately. Numerous extended models of ARIMA have emerged due to its superiority in forecasting traffic dynamics [20,21]. These parametric approaches require high-quality data in which the sequence is stable and accurate. As most real-life traffic data are unstable and stochastic, this has limited their use and applicability in complex traffic prediction applications [22].

Non-Parametric Approaches
Non-parametric approaches predict traffic conditions differently, as these models do not rely on fixed model structures and parameters [23]. Specifically, non-parametric methods make relaxed assumptions about their inputs and are hence more capable of processing missing data, noisy data and outliers [24]. With the recent advancements in machine learning, many models have shown great potential in solving non-linear problems even when using complex and multi-source field data. Such models include Support Vector Regression (SVR) [25-27] and Artificial Neural Networks (ANNs) [28,29].
For example, Ref. [30] compared four parametric approaches (Constant Speed (CS), Constant Acceleration (CA), the SUMO simulator model and the Intelligent Driver Model (IDM)) and two non-parametric approaches (Gaussian Mixture Regression (GMR) and the Artificial Neural Network (ANN) technique) to forecast speeds on highways. The results showed that the non-parametric models consistently outperformed the other models for both short- and long-term predictions. Compared to parametric approaches, non-parametric approaches can result in higher accuracies [31], particularly neural networks, which showed the best prediction results [32]. In another comparative study [33], Back-Propagation Neural Networks (BPNNs) and traditional approaches were compared, and the results showed the BPNN to be superior and more responsive to dynamic conditions. Therefore, for complex, non-linear problems like traffic prediction, ANN models are more accurate, with stronger capabilities for learning the association between multivariate inputs and output patterns [34]. In another study [30] that compared different neural network architectures, the authors' General Regression Neural Network (GRNN) was shown to be simple, stable, and descriptive of dynamic system characteristics [35].
Recently, novel architectures of neural networks such as the Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNNs) have attracted research interest as effective tools for traffic prediction [11,36,37]. For example, an RNN combined with LSTM traffic prediction [11] was found to provide higher performance compared to classical methods. Focusing on speed of individual vehicles, Ref. [36] used an RNN with LSTM and developed a car-following model to predict acceleration on roads. Other researchers have also developed LSTM-based RNN approaches for speed prediction models under various urban driving conditions with credible and accurate results [38].
Unlike previously mentioned studies which lacked the spatial consideration of the traffic state correlations, and with the availability of enriched traffic data, this paper extends more recent work that exploits the spatial and temporal traffic state features to develop robust traffic forecasting models. Some of the previous notable research where spatial-temporal characteristics were explored includes different approaches. For example, Ref. [39] developed a model based on the LSTM-NN, using graph convolution to mine spatial-temporal features. Also, Ref. [40] converted network traffic to images, where they also used predictive models to estimate future scenarios. Furthermore, Ref. [41] incorporated spatial and temporal patterns in short-term forecasting in traffic volume based on Modular Neural Networks (MNNs). However, it is difficult to establish from these studies, which were developed using different data sets, which models perform better and under which set of conditions. This limitation is addressed in this paper by considering six NN models that were trained, tested and validated based on a unique set of field observations with varying spatial and temporal state features by considering the locations between sensor stations and evaluating the effectiveness of the proposed models for both time and space analysis.

Model Development
The road facility is divided spatially into sections bounded by detector stations. The data for each detector location is then categorized into different time horizons {5, 10, 15, 30, 45 and 60} minutes. The goal is to predict the spatial and temporal information of the next time stamp (t + n) where n ranges from 5 to 60 min into the future with consideration to the spatial location for each detector, as shown in Figure 1.

Long Short-Term Memory (LSTM)
The RNN architecture described before provides good accuracy but does not perform well when long-term memory is required, as RNNs are unable to use information from the distant past. Long Short-Term Memory (LSTM) models extend the RNN to overcome this limitation: they can learn patterns with long dependencies, which traditional RNNs cannot. Therefore, the LSTM has generally been found to outperform RNNs in time series data forecasting [38].
LSTM models have a different structure from RNNs, as shown in the architecture presented in Figure 2. The following formulae are used to calculate the predicted values:
$$\text{Input gate:}\quad i_t = \sigma_g\left(W_i X_t + R_i h_{t-1} + b_i\right)$$
$$\text{Forget gate:}\quad f_t = \sigma_g\left(W_f X_t + R_f h_{t-1} + b_f\right)$$
$$\text{Cell candidate:}\quad \tilde{c}_t = \sigma_c\left(W_c X_t + R_c h_{t-1} + b_c\right)$$
$$\text{Output gate:}\quad o_t = \sigma_g\left(W_o X_t + R_o h_{t-1} + b_o\right)$$
where $\sigma_g$ is the gate activation function, $\sigma_c$ is the state activation function, $W_i$, $W_f$, $W_c$ and $W_o$ are input weight matrices, $R_i$, $R_f$, $R_c$ and $R_o$ are recurrent weight matrices, $X_t$ is the input, $h_{t-1}$ is the output at the previous time (t − 1), and $b_i$, $b_f$, $b_c$ and $b_o$ are bias vectors. The forget gate determines how much of the prior memory values should be removed from the cell state. Similarly, the input gate specifies new input to the cell state. Then, the cell state $c_t$ and the output $h_t$ of the LSTM at time t are calculated as follows:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \sigma_c\left(c_t\right)$$
where $\odot$ denotes the Hadamard product (element-wise multiplication of vectors).
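The gate mechanism described in this section can be illustrated with a minimal Python sketch. It assumes the standard LSTM formulation (sigmoid gate activation, tanh state activation) and, purely for readability, a single hidden unit with a scalar input; the names `lstm_step`, `run_sequence` and the parameter dictionary `p` are hypothetical and are not taken from the study's Matlab implementation.

```python
import math


def sigmoid(z):
    """Gate activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """Advance a single-unit LSTM by one time step.

    p maps each gate name ('i', 'f', 'c', 'o') to a tuple (W, R, b) of
    floats: input weight, recurrent weight and bias (illustrative layout).
    """
    def pre_activation(gate):
        W, R, b = p[gate]
        return W * x_t + R * h_prev + b

    i_t = sigmoid(pre_activation("i"))        # input gate
    f_t = sigmoid(pre_activation("f"))        # forget gate
    c_cand = math.tanh(pre_activation("c"))   # cell candidate
    o_t = sigmoid(pre_activation("o"))        # output gate

    c_t = f_t * c_prev + i_t * c_cand         # new cell state
    h_t = o_t * math.tanh(c_t)                # new output
    return h_t, c_t


def run_sequence(xs, p):
    """Feed a sequence through the cell, starting from zero state."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the cell candidate to 0, so the cell state is simply halved at each step; this makes the role of the forget gate easy to check by hand.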
For this work, the LSTM was implemented in Matlab [43]. First, the data for temporal prediction were arranged for each detector as two-column values; the first column corresponded to the speed at time (t), and the second column corresponded to the expected output at time (t + n), where n ranges from 5 to 60 min into the future. For spatial prediction, the data were also arranged as two-column values. However, the first column represented the speed at time (t) for the detector (r) and the second column represented the expected output of the speed at time (t) for the following detector (r + 1).
Then, the data were partitioned into training and test data. The model was trained on the first 60% of the sequence and tested on the last 40%. To prevent the model from overfitting, the training/testing data were standardised to have zero mean and unit variance. After that, the LSTM network was created using four layers: Sequence Input Layer (number of features = 1), LSTM Layer (number of hidden units = 300), Fully Connected Layer (number of responses = 1) and a Regression Layer. The model parameter settings are reported in Table 1. The tanh and sigmoid functions are used for the state and gate activation functions, respectively. The LSTM experiments were implemented in Matlab R2019b with the Deep Learning Toolbox functions trainNetwork, trainingOptions and predictAndUpdateState.
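The data arrangement described above can be sketched in Python as follows. This is an illustration only, not the study's Matlab code: the function names are hypothetical, the horizon is expressed in samples (equal to minutes for 1-min data), and standardising with training-set statistics is an assumption, since the text does not state which statistics were used.

```python
from statistics import mean, pstdev


def make_temporal_pairs(speeds, horizon):
    """Two-column arrangement: (speed at t, speed at t + horizon)."""
    return [(speeds[t], speeds[t + horizon])
            for t in range(len(speeds) - horizon)]


def make_spatial_pairs(speeds_r, speeds_next):
    """Two-column arrangement: (speed at detector r, speed at detector r + 1),
    both taken at the same time t."""
    return list(zip(speeds_r, speeds_next))


def split_sequence(pairs, train_percent=60):
    """Chronological split: first train_percent% for training, rest for testing."""
    cut = len(pairs) * train_percent // 100
    return pairs[:cut], pairs[cut:]


def standardise(values, mu, sigma):
    """Scale values to zero mean and unit variance given mu and sigma."""
    return [(v - mu) / sigma for v in values]


# Example with made-up speeds (km/h) and a 5-min horizon.
speeds = [float(v) for v in range(60, 80)]
train, test = split_sequence(make_temporal_pairs(speeds, horizon=5))
x_train = [x for x, _ in train]
mu, sigma = mean(x_train), pstdev(x_train)   # assumption: training statistics
x_train_std = standardise(x_train, mu, sigma)
```

The split is chronological rather than random so that the test portion always lies in the future of the training portion, matching the "first 60%, last 40%" description.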

Case Study Using Field Data from the Eastern Freeway (Melbourne/Australia)
This section presents a case study that demonstrates the proposed model's applicability and its robustness and effectiveness using field data collected from the Eastern Freeway in Melbourne/Australia. This section first describes the data collection and characteristics of the field data and then presents a comparative evaluation of the model's performance, including a detailed analysis of prediction results.

Data Collection
The real-life data used for the model development was collected from inductive loops embedded along the Eastern Freeway in Melbourne, Australia (Figure 3). This road facility is an 18-km freeway bounded by East Link at Nunawading (eastbound) and Alexandra Parade (westbound). The data used in this study included speed observations collected for 31 days from 1 July 2016 to 31 July 2016 for both eastbound (EB) and westbound (WB) directions. The data was aggregated at 1-min intervals across all lanes at each site for each 24-h period. Four detector locations were chosen for each direction (8 locations in total), distributed across the mainline carriageway and covering the entire freeway. It should be noted here that this is the most recent data available to the research team. This is not because of a lack of field data but because of the time and effort it takes to clean new field data and label it for such AI applications. The data described in this paper has been extensively cleaned, with considerable time spent on ensuring that only quality data for the incident and non-incident conditions are used in model development and evaluation. Locations of detectors are shown in Figure 3 and Table 2.

Considerable effort was given to pre-processing the data, including the removal of missing or corrupted data. For example, if the data was continuously missing for more than 10 min, the traffic values were removed. However, if the missing data was only for a short duration, for example, between 12 July 2016 17:48 and 12 July 2016 17:50, then the value at 12 July 2016 17:49 was estimated rather than removed.
The four detectors for each direction were chosen because they had the most reliable data, with few missing observations over the 31-day duration. They were also chosen to capture variable spatial locations and traffic characteristics along the freeway. For this research, the total number of valid observations for the eight detectors was 288,653. For each detector, the data was divided into three data sets: training (60%), and testing and validation (40%). The number of speed data tested for each detector (after pre-processing) is presented in Table 3. The training set was used to "calibrate" the neural network model in order to learn the traffic patterns. Using this data, the model learned the relationships between the different variables in the input values. The testing data (which was not used for model training and calibration) was used to validate the results and ensure that the neural network was not memorising the patterns and the outputs from the training data set. Figures 5 and 6 provide samples of speed data patterns for the eastbound and westbound locations.
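A minimal Python sketch of the pre-processing rule for missing observations follows. It assumes that short gaps are filled by linear interpolation between the neighbouring valid observations; this is an assumption made for illustration, and the estimator used in the study may differ. The function name `fill_short_gaps` and the use of `None` for a missing value are hypothetical.

```python
def fill_short_gaps(speeds, max_gap=10):
    """Fill short runs of missing values (None) in a 1-min speed series.

    Runs of up to max_gap consecutive missing samples are linearly
    interpolated between the nearest valid neighbours (assumed estimator).
    Longer runs, and runs touching either end of the series, are left as
    None so that they can be removed afterwards.
    """
    filled = list(speeds)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i                       # index of the first valid value after the gap
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue                  # cannot or should not interpolate
        left, right = filled[start - 1], filled[end]
        for k in range(start, end):
            frac = (k - start + 1) / (gap + 1)
            filled[k] = left + frac * (right - left)
    return filled


# A single missing minute between two valid readings (made-up values, km/h).
print(fill_short_gaps([80.0, None, 84.0]))
```

For a single missing minute, linear interpolation reduces to the average of the two adjacent readings.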

Model Evaluation
To evaluate Long Short-Term Memory (LSTM) prediction robustness, five machine learning systems were evaluated using the same data set. These included: Recurrent Neural Networks (RNNs), General Regression Neural Networks (GRNNs), Modular Neural Networks (MNNs), Deep Learning Backpropagation (DLBP) neural networks and Radial Basis Function Networks (RBFNs). These models have been widely used for future traffic forecasts, as shown in the example papers provided in the literature review section above. The models reported in this paper were developed using NeuralWorks Professional, which is an Artificial Neural Network commercial package and development system [45].
The Backpropagation Neural Network is the most popular learning algorithm used to capture non-linear relationships and self-learning. The typical back-propagation network always has an input layer, an output layer and more than one hidden layer, which is referred to as "Deep Learning". Each layer is fully connected to the succeeding layer. The implementation of the algorithm simply includes an input training pattern (feedforward), backpropagated error and weight adjustment. The parameters used for this experiment included 3 hidden layers with 4, 6, and 2 neurons. The transfer function is Tanh with a learning coefficient output (α = 0.15). The learning rule is Ext DBD with 100,000 iterations and a momentum of 0.4.
The Radial Basis Function Network (RBFN) uses a radial basis function instead of a linear function, and needs more neurons in the hidden layer than the multi-layer BP neural network. In general, an RBFN is any network which has an internal representation of hidden neurons (pattern units) that are radially symmetric. For a pattern unit to be radially symmetric, it requires three elements: a center, a distance measure and a transfer function. A center is a vector in the input space, which is typically stored in the weight vector from the input layer to the pattern unit. The distance measure, such as the Euclidean distance, determines how far an input vector is from the center. The transfer function, such as the Gaussian function, determines the output by mapping the output of the distance function. The following parameters are used in this experiment: proto (50), summation function (Euclidean), momentum (0.4), learn rule (Ext DBD) and transfer (Tanh).
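The three elements of a radially symmetric pattern unit can be sketched in Python as below. The sketch uses the Euclidean distance and a Gaussian transfer function, the two examples named in the text; it is not the NeuralWorks configuration reported for the experiment (which lists a Tanh transfer), and the `width` parameter and function names are hypothetical.

```python
import math


def euclidean(x, center):
    """Distance measure: how far the input vector is from the center."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, center)))


def gaussian_pattern_unit(x, center, width=1.0):
    """Output of one pattern unit: a Gaussian of the distance to its center.

    The output equals 1 at the center and decays with distance, identically
    in every direction (radial symmetry). width is an assumed spread value.
    """
    d = euclidean(x, center)
    return math.exp(-(d ** 2) / (2.0 * width ** 2))
```

Two inputs at the same distance from the center produce the same output regardless of direction, which is what radial symmetry means in this context.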
Modular Neural Network (MNN) models include modules and a gating network. The modules are referred to as "local experts", which approach the problem from various angles. The gating network is an integrated unit that allocates different features of the input space to the different local expert networks. The parameters chosen in this experiment were: hidden layers (1) with (14)
The General Regression Neural Network (GRNN) is a general purpose network paradigm based on linear regression theories but extends the regression to avoid assuming a specific functional form (such as linear) for the relationship between the inputs and outputs. The following parameters were used in this experiment: pattern neurons (50), summation function (Euclidean), radius of influence (R 0.250), σ scale (1), σ exponent (0.5) and Tau time constant (1000).
RNNs are neural networks with recurrent (feedback) connections that perform well with time series forecasting data. The type of RNN used is a Werbos RNN, in which the weights are updated by using the standard back-propagation algorithm. The parameters used for this experiment were: hidden layers (1) with (5) neurons, activation function (tanh), learn rule (Ext DBD) and epoch (770). Finally, the reader is referred to a number of other references [46,47] that provide further details about the use of the Neuralware platform for automated incident detection, and to a number of other studies that are relevant to the use of simulation tools [48-54] and how they can be used to evaluate different transport management strategies to enhance efficiency and reduce emissions.
Also, it is essential to establish metrics that allow the comparison of the different methods. In this paper, the Mean Absolute Percentage Error (MAPE) is used to calculate the accuracy of the model prediction for different time horizons in the future. MAPE calculates the average absolute percentage difference between the predicted output from the model (Y1) and the expected true output (Y) [55].
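The metric can be sketched in Python using the standard definition of MAPE. Reporting accuracy as 100 minus MAPE is an assumption made here to connect the metric to the percentage accuracies quoted in the results; the function names are hypothetical.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    actual holds the expected true outputs (Y) and predicted holds the
    model outputs (Y1). Every actual value must be non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((y - y1) / y) for y, y1 in zip(actual, predicted))
    return 100.0 * total / len(actual)


def accuracy_percent(actual, predicted):
    """Accuracy expressed as 100 - MAPE (assumed convention)."""
    return 100.0 - mape(actual, predicted)


# Made-up speeds (km/h): each prediction is off by 10% of the true value.
print(mape([100.0, 80.0], [90.0, 88.0]))
```

Because each error is divided by the true value, the metric is undefined when a true speed is zero; such records would need to be excluded or handled separately.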

Westbound Direction-Temporal Prediction Accuracies
The experimental results for temporal variations in the westbound direction are provided in Table 4 and Figure 7. The cells in the table that are highlighted in green show the highest accuracies obtained, while the cells highlighted in orange show the lowest. The results show that the LSTM generally has superior performance when compared to the RNN, GRNN, MNN, RBF and the DLBP neural network.
For detector 14003, the results show that the LSTM provides a forecasting accuracy of 94.24% for a 5-min prediction horizon and 94.97% for a 60-min prediction horizon. The accuracy therefore does not deteriorate as the prediction horizon increases, which shows the ability of the system to capture the complexity of longer speed prediction horizons. On the other hand, the MNN provided the least accurate predictions out of the six models tested. For detector 14031, the same trend can be noticed, with accuracies not deteriorating substantially for the LSTM with longer prediction horizons. For the 10-min horizon, the accuracy is 97.55%, while it remained steady at 98.96% for the 60-min horizon. Also, it can be noted that the LSTM provides better accuracy for the 10-min and 60-min prediction horizons, while the GRNN, MNN and RBF outperformed the LSTM for the other prediction horizons. This may be attributed to the data patterns for detector 14031, which showed more congestion at certain times during the month of July compared with other detectors. For longer prediction horizons (45 and 60 min), the DLBP neural network had the lowest performance of 95.22% and 94.14%, respectively.
For detector 14051, the LSTM outperforms the other five models with an accuracy ranging from 95.61% to 99.94%. The DLBP neural network and the MNN provided the lowest accuracies for shorter prediction horizons up to 30 min, while the RBF had the lowest accuracy of 89.72% compared to 99.94% for the LSTM for 45 min into the future. The RNN provided the least accurate predictions for 60-min horizons with 83.16% accuracy compared to 99.43% for the LSTM.
For detector 14063, the LSTM provided a high level of accuracy for all time horizons, ranging from 97.68% to 98.77%. The DLBP neural network achieved the lowest accuracy among the six models for the 15, 30 and 45 min predictions, whereas the RBF had the lowest accuracy for the 60-min horizon (91.77%), compared with the highest accuracy of 98.77% for the LSTM.
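To make the notion of accuracy per prediction horizon concrete, the following minimal sketch builds input windows and future targets from a single detector's speed series and scores a prediction for several horizons. It is an illustration under stated assumptions, not the pipeline used in this study: the 5-min aggregation step (`STEP_MIN`), the 12-step lookback, the definition of accuracy as 100 minus the mean absolute percentage error, the synthetic speed series, and the naive "last observed speed" predictor standing in for a trained network are all hypothetical.

```python
import numpy as np

STEP_MIN = 5  # assumed aggregation interval of the detector data, in minutes


def make_horizon_pairs(speeds, lookback, horizon_steps):
    """Build (input window, future target) pairs from a 1-D speed series.

    Row i of X holds speeds[i : i + lookback]; y[i] is the speed observed
    horizon_steps steps after the last value in that window.
    """
    speeds = np.asarray(speeds, dtype=float)
    n = len(speeds) - lookback - horizon_steps + 1
    if n <= 0:
        raise ValueError("series too short for this lookback and horizon")
    X = np.stack([speeds[i:i + lookback] for i in range(n)])
    first_target = lookback - 1 + horizon_steps
    y = speeds[first_target:first_target + n]
    return X, y


def accuracy_percent(y_true, y_pred):
    """Accuracy as 100 minus MAPE (an assumed definition, for illustration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
    return 100.0 - mape


if __name__ == "__main__":
    # Synthetic speeds (km/h) with a daily cycle; not data from the study.
    rng = np.random.default_rng(0)
    t = np.arange(2000)
    speeds = 80 + 15 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 2, t.size)

    for horizon_min in (5, 10, 15, 30, 45, 60):
        X, y = make_horizon_pairs(speeds, lookback=12,
                                  horizon_steps=horizon_min // STEP_MIN)
        y_hat = X[:, -1]  # naive stand-in: repeat the last observed speed
        print(f"{horizon_min:2d} min: {accuracy_percent(y, y_hat):.2f}%")
```

Replacing the naive predictor with a trained model and repeating the loop per detector would produce a table with the same layout as the temporal results reported here, with one accuracy per model and horizon.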

Westbound Direction-Spatial Prediction Accuracies
For the spatial analysis, the main observation is that the accuracy of prediction is not affected as the distance between detector locations increases (Table 5 and Figure 8). The LSTM provides the highest accuracy for all spatial ranges, with the accuracy ranging from 96.03% for a 5.4 km separation to 98.24% for an 18.6 km separation of detector locations. The DLBP neural network provided the lowest accuracies for short distances, while the MNN and RBF performed worse when the spatial separation increased between any two detector locations.
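One way to picture a spatial evaluation of this kind is to fit a model on the data of one detector and score it on the data of another detector further along the facility. The sketch below does this under loud assumptions: a least-squares linear model is swapped in for the LSTM purely to keep the example short and dependency-free, the two speed series are synthetic, the names `site_a` and `site_b` are hypothetical, and accuracy is again assumed to be 100 minus the mean absolute percentage error.

```python
import numpy as np


def windows(series, lookback):
    """Return one-step-ahead training pairs: X[i] = s[i:i+lookback], y[i] = s[i+lookback]."""
    s = np.asarray(series, dtype=float)
    X = np.stack([s[i:i + lookback] for i in range(len(s) - lookback)])
    y = s[lookback:]
    return X, y


def fit_linear(X, y):
    """Least-squares fit with an intercept (a simple stand-in for the LSTM)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply the fitted coefficients to new input windows."""
    A = np.hstack([X, np.ones((len(X), 1))])
    return A @ coef


def accuracy_percent(y_true, y_pred):
    """Accuracy as 100 minus MAPE (an assumed definition, for illustration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 - np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0


def transfer_accuracy(source_series, target_series, lookback=6):
    """Fit on the source detector, then score the same model on the target detector."""
    X_src, y_src = windows(source_series, lookback)
    coef = fit_linear(X_src, y_src)
    X_tgt, y_tgt = windows(target_series, lookback)
    return accuracy_percent(y_tgt, predict_linear(coef, X_tgt))


# Synthetic speeds (km/h) for two hypothetical detectors; not data from the study.
rng = np.random.default_rng(1)
t = np.arange(500)
site_a = 80 + 15 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 1, t.size)
site_b = 75 + 12 * np.sin(2 * np.pi * (t - 6) / 288) + rng.normal(0, 1, t.size)

acc_same = transfer_accuracy(site_a, site_a)  # scored where the model was fitted
acc_other = transfer_accuracy(site_a, site_b)  # scored at the other detector
print(f"same site: {acc_same:.2f}%  other site: {acc_other:.2f}%")
```

Repeating the comparison for detector pairs at increasing separations would give one accuracy per distance, which is the layout of the spatial results discussed above.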

Eastbound Direction-Temporal Prediction Accuracies
The experimental results for the temporal variations in the eastbound direction are provided in Table 6 and Figure 9. The cells in the table that are highlighted in orange show the highest accuracies obtained, while the cells highlighted in green show the lowest. The results show that the LSTM generally has superior performance when compared to the RNN, GRNN, MNN, RBF and DLBP neural network.

Detector 14010 shows that the LSTM provides higher prediction accuracies of 98.44% for 5-min prediction horizons and 99.10% for 60-min horizons. For the LSTM, the accuracies do not deteriorate as the prediction horizon increases. For example, the accuracy is 97.97% for the 10-min horizon and remains steady at 99.10% for 60-min prediction horizons. This shows the ability of the system to capture the complexity of a longer speed prediction horizon. For the eastbound direction, the RBF had the least accurate predictions out of the six models.
For detector 14032, the LSTM accuracies were the lowest among all detectors in both the westbound (WB) and eastbound (EB) directions. This may be due to the data patterns for this detector, which experienced heavy congestion at certain times during the months of July compared to other detectors. For the 10-min horizon, the accuracy is 84.74%, compared to 88.37% for the 60-min horizon. It can also be noted that the RNN provides better accuracy for the 10-min and 15-min prediction horizons. However, the LSTM provides better accuracy overall for detector 14032. For short prediction horizons, the GRNN and the DLBP neural network provided the least accurate predictions, while for longer prediction horizons (45 and

Eastbound Direction-Spatial Prediction Accuracies
For the eastbound direction, the results are similar to those reported for the westbound direction, in that the accuracy of prediction is not affected as the distance changes. As can be seen in Table 7 and Figure 10, the LSTM provides the highest accuracy for most distance ranges, whereas the RNN provides the lowest. Overall, the LSTM achieves its highest accuracy, 99.95%, for the detectors that are furthest apart (15.45 km).

Figure 10. Graphical representation of spatial prediction results for the eastbound direction.

Conclusions and Future Directions
This paper developed and evaluated robust models for spatial and temporal predictions of traffic conditions at different locations on a real-world road facility in Melbourne, Australia. These models were developed using the Long Short-Term Memory (LSTM) network and a large data set of traffic observations comprising 288,653 data points. The findings showed that the LSTM architecture provided the highest predictive accuracy for both temporal and spatial predictions. In particular, the models showed resilience as the prediction horizons increased spatially for distances up to 15 km and temporally up to 60 min, outperforming the other architectures tested. Another important contribution of this work is demonstrating the feasibility of transferring a model developed for one location to multiple other locations across the same facility, which is important for decision making when the input data for a location are missing or unavailable.
Future work will explore incorporating more inputs, such as flow and occupancy, as well as important external factors such as weather, to further refine the prediction models. In addition, more detector locations and spatial distributions along the entire road network will be utilised for future prediction analysis, including the generation of new data sets for edge cases based on calibrated traffic simulation models and digital twins of the road facilities under consideration. Finally, future studies should also include additional advanced architectures, explore the influence of network parameters, and carry out sensitivity analyses of the results.