Review

A Review of Traffic Flow Prediction Methods in Intelligent Transportation System Construction

Department of Computer Science and Information Engineering, Kunsan National University, Gunsan 54150, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3866; https://doi.org/10.3390/app15073866
Submission received: 10 March 2025 / Revised: 24 March 2025 / Accepted: 27 March 2025 / Published: 1 April 2025
(This article belongs to the Special Issue Future Information & Communication Engineering 2024)

Abstract
With the continuous development of intelligent transportation systems (ITSs), traffic flow prediction methods have become the cornerstone of this technology. This paper comprehensively reviews the traffic flow prediction methods used in ITSs and divides them into three categories: statistics-based, machine learning-based, and deep learning-based methods. Although statistics-based methods have lower data requirements and machine learning methods have faster calculation speeds, this paper concludes that deep learning methods have the best overall effect after a comprehensive analysis of the principles, advantages, limitations, and practical applications of each method. Deep learning methods can overcome many limitations that traditional statistical methods and machine learning methods cannot surpass, such as the ability to model complex nonlinear relationships. Experimental results show that hybrid neural networks are significantly superior to traditional methods in terms of their prediction accuracy and generalization abilities. By combining multiple models and techniques, hybrid neural networks can improve the accuracy of traffic flow prediction under different conditions. Although deep learning methods have achieved remarkable success in short-term prediction, challenges still exist, such as the generalization of models in different traffic scenarios and the difficulty of long-term traffic flow prediction. Finally, this paper discusses future research directions and anticipates the future development of ITS technology.

1. Introduction

1.1. Theme Background and Significance

In modern society, road traffic holds significant importance in both the economy and daily life. However, the rapid development of the automotive industry has outpaced public awareness of the limits and scarcity of traffic resources, and the surge in travel demand has placed immense pressure on transportation systems. Although the construction of new roads can alleviate congestion, most cities have turned to advanced technologies due to limitations in urban space and costs. In recent years, many countries worldwide have invested substantial resources into the research and development of intelligent transportation systems (ITSs), which have played a critical role in alleviating congestion and enhancing traffic flow efficiency.
As a key component of smart cities, ITSs play a vital role in improving the environmental quality and enhancing the quality of life of residents. ITSs utilize real-time monitoring, intelligent traffic signal control, and other technological means to directly reduce traffic congestion, lower accident rates, and relieve urban traffic pressures. Additionally, ITSs can decrease vehicle idling, thereby reducing exhaust emissions and indirectly improving the urban environmental quality. Most importantly, through services such as intelligent navigation systems and real-time public transportation information, ITSs enhance the convenience and comfort of residents’ travel [1,2].
In summary, ITSs play an irreplaceable role in achieving the development goals of smart cities. However, an ITS is a complex management system, encompassing multiple subsystems, such as rail transit management, intelligent traffic signal management, and vehicle navigation. To ensure the efficient operation of the ITS, traffic flow prediction technology is of paramount importance.
Traffic forecasting technology is the cornerstone of intelligent transportation systems. Its core principle is to model urban road networks, highways, and rail transit systems and, by integrating historical traffic data, predict traffic conditions for specific future time periods. By accurately predicting traffic flow data, traffic management authorities can more effectively guide vehicles, mitigate urban traffic congestion, and improve the vehicle throughput efficiency. Accurate traffic flow forecasting also helps to schedule vehicle operations in rail systems, preventing public safety issues that may arise from large crowds gathering for extended periods, thus ensuring the safety of citizens’ travel. Furthermore, precise traffic forecasting can reduce the energy consumption associated with travel, contributing to the construction of low-carbon and environmentally friendly cities. Therefore, the further enhancement of traffic flow forecasting technologies is imperative.

1.2. Research Status

The core of traffic flow prediction tasks lies in analyzing historical data to forecast future traffic flow trends, thus providing decision-making support to traffic management authorities. This requires a substantial amount of data, much of which is collected through sensors, cameras, and other monitoring technologies, and these data exhibit both temporal and spatial dependencies. Temporal dependency is mainly reflected in patterns such as the similarity in peak travel times during weekday rush hours and holidays. Spatial dependency is manifested in how the current traffic situation is influenced by the historical traffic conditions of upstream, downstream, and adjacent road monitoring points. Additionally, the traffic flow is also affected by external factors such as the weather conditions, holidays, and the distribution of points of interest, all of which significantly impact the accuracy of traffic flow predictions [3]. The related data are highly nonlinear and complex, making them difficult for traditional statistical and machine learning methods to accurately capture and model.
In recent years, the development of deep learning technologies has introduced powerful capabilities in feature extraction and pattern recognition, showing great potential in the field of traffic forecasting. Deep learning models such as deep neural networks, recurrent neural networks, and long short-term memory networks are capable of processing complex spatiotemporal data, leading to significant improvements over traditional methods. Moreover, the adaptive ability of deep learning allows it to maintain prediction accuracy even when faced with substantial noise and multidimensional features in traffic data.
Academia, both domestically and internationally, has conducted extensive and in-depth research in the field of traffic forecasting, developing a range of methods tailored to different application scenarios. These include traffic flow prediction for multi-intersection systems [4], highway traffic volume forecasting [5], and urban road traffic flow prediction [6], among others. Commonly used techniques include statistical analysis methods, machine learning approaches, and deep learning methods, which will be discussed in detail in the main body of this paper.

1.3. The Purpose of This Study

In future developments, smart cities will emerge as the predominant trend in urban growth. With the continuous advancement and application of technology, smart cities will play an increasingly significant role in urban management, social governance, and economic development. As a critical component of smart cities, the development level of intelligent transportation systems (ITSs) will directly impact urban traffic conditions and residents’ quality of life. The primary objective of this research is to promote innovation and development in traffic flow prediction technologies while providing scientific and reliable decision-making support for urban traffic management. By integrating state-of-the-art data processing techniques and machine learning algorithms, this study will analyze the advantages and disadvantages of various technologies, exploring the challenges that they present and their future development trajectories. This research is anticipated to provide more precise data support for key areas such as urban traffic planning, traffic signal control, and emergency response, thereby effectively mitigating traffic congestion, enhancing the public transportation efficiency, and ensuring traffic safety. Furthermore, this study will offer novel research perspectives for related fields such as transportation science, computer science, and data analysis.

1.4. Brief Layout of the Paper

The structure of this paper is clear and straightforward. To provide readers with a better understanding of the importance of traffic flow prediction in smart cities and the specific implementation methods of such predictions, this paper first elaborates on the significance of traffic forecasting systems within the context of smart cities. This includes, but is not limited to, mitigating traffic congestion, reducing accident rates, improving the urban environmental quality, and enhancing residents’ quality of life. Next, the paper outlines the implementation methods for traffic flow prediction, starting with earlier prediction approaches based on statistical analysis theories and progressing to the latest applications of deep learning methods. A summary of the datasets required for related simulations is also provided. The paper then presents a comparative analysis of methods from different eras through case studies to highlight the most effective approach at present. Finally, the paper concludes by summarizing the advantages and disadvantages of all methods discussed, offering forward-looking perspectives for the future development of traffic flow prediction.

2. Prediction Method Based on Statistical Analysis Theory

2.1. Introduction to Traditional Statistical Methods

Traditional statistical methods have been in use for an extended period in traffic flow prediction, particularly in the early stages of intelligent transportation system development. While deep learning and machine learning techniques have gradually gained prominence in recent years, traditional statistical methods still hold significant value in short-term traffic forecasting or scenarios with limited data. These methods are widely adopted for short-term traffic prediction due to their relatively simple theoretical foundations, lower computational complexity, and stable prediction results. The essence of traditional statistical methods lies in constructing mathematical models based on historical traffic data to capture characteristic patterns from time series, thereby enabling the prediction of future traffic flows.

2.2. Specific Methods Listed

(1)
Historical average model (HA)
The historical average method is a basic forecasting technique commonly employed to predict data with periodic or seasonal patterns. It forecasts future values by computing the average of the same time period from historical data, assuming that the data follow a certain cyclical pattern. The method is simple, and its parameters can be estimated online using the least squares (LS) method. It is capable of handling variations in traffic flows over different times and periods to some extent. However, its static prediction performance is limited, as it does not reflect the inherent uncertainty and nonlinear characteristics of dynamic traffic flows. In particular, it is unable to address the impact of random disturbances or respond to unforeseen events, such as accidents, within the traffic system [7].
(2)
Autoregressive Integrated Moving Average model (ARIMA)
Unlike other time series methods that require fixed initialization for simulation, the ARIMA model is composed of three components: Autoregression (AR), Differencing (I), and Moving Average (MA). By fitting the historical data, the model generates parameters and makes predictions. It views the traffic flow at a given time as a more general non-stationary stochastic process, typically with three or six model parameters [8].
Based on a substantial number of continuous data, the ARIMA model offers high prediction accuracy, making it particularly suitable for stable traffic flows. However, the model primarily approaches prediction from a pure time series analysis perspective and does not account for the flow relationships between upstream and downstream road sections. Therefore, it is recommended to use this model in conjunction with other models for more comprehensive predictions [9].
(3)
Kalman filter model
Kalman filtering theory (KF), introduced by Kalman in 1960, has found application in various domains, including time series modeling in statistics and economics. Kalman filtering is typically employed in non-stationary stochastic environments and offers relatively low computational complexity. However, it is only applicable to linear systems and is highly sensitive to noise. For state estimation and prediction in nonlinear systems, the use of extended Kalman filtering or nonlinear filtering methods is necessary. Given that the model is based on linear estimation, its performance may deteriorate when the prediction interval is less than 5 min and when the randomness and nonlinearity of traffic flow variations are more significant [10].
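To make the Kalman recursion above concrete, the following is a minimal, illustrative sketch (not drawn from the cited work): a scalar Kalman filter smoothing a noisy flow-count series under an assumed random-walk state model. The noise variances q and r and the flow values are invented for illustration.

```python
# Minimal scalar Kalman filter for a noisy traffic-flow series.
# Assumed state model: random walk x_t = x_{t-1} + w_t, observation z_t = x_t + v_t.

def kalman_filter(observations, q=1.0, r=4.0, x0=0.0, p0=1.0):
    """Return filtered state estimates for a 1-D series.

    q: process-noise variance, r: measurement-noise variance (both illustrative).
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: under a random walk, the predicted mean is unchanged.
        p = p + q
        # Update: blend prediction and measurement via the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

flows = [100, 102, 98, 120, 101, 99]   # synthetic vehicle counts per interval
est = kalman_filter(flows, x0=flows[0])
```

Because each update is a convex blend of the prediction and the measurement, the filtered estimates stay inside the range of the observed counts while damping the spike at 120, which is the smoothing behavior the linear model provides.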

2.3. Disadvantages

Although prediction methods based on statistical analysis theory have demonstrated good performance on small datasets with short observation periods and limited time series, the predictive capability of these statistical models is limited in traffic forecasting applications. This is due to their simple and transparent computational structure, their exclusive focus on time series data, and their lack of consideration for the complexity of spatiotemporal relationships. To address this challenge, some studies have developed extended time series methods, such as ST-ARIMA [11] and VARMA [12], which incorporate the interaction between space and time in a novel manner. However, because these methods rely on explicit parameterized functions with strong assumptions during the modeling process, they are not well suited for the simulation of real-world traffic scenarios.

3. Traditional Machine Learning Models

3.1. Introduction to Machine Learning Methods

In recent years, machine learning methods have enabled significant advances in various fields, such as industrial control, finance, and healthcare. These methods have also proven highly effective in the transportation domain, particularly in the development and application of intelligent transportation systems (ITSs). In the context of traffic flow prediction, machine learning methods, compared to traditional statistical techniques, are better able to adapt to the evolving nature of traffic networks. Through the use of historical data, pattern recognition, and the establishment of dynamic nonlinear models, machine learning methods play a vital role in enabling precise traffic flow predictions.

3.2. Specific Methods

Traditional machine learning methods applied in traffic prediction can be broadly categorized into three types: feature extraction-based methods, Gaussian process modeling methods, and state space modeling methods [13].
(1)
Feature extraction-based methods are primarily employed to train regression models to solve practical traffic prediction problems. Their main advantage lies in their simplicity and ease of implementation. However, these methods also suffer from limitations, such as focusing only on time series data and neglecting the complexities of spatiotemporal relationships. Cheng et al. [14] proposed an adaptive K-Nearest Neighbors (KNN) algorithm that treats spatial features of the road network as adaptive spatial neighbor nodes, time intervals, and spatiotemporal iso-value functions. The algorithm was evaluated using speed data collected from highways in California and urban roads in Beijing.
(2)
Gaussian process methods utilize multiple kernel functions to capture the internal features of traffic data, while considering both spatial and temporal correlations. These methods are useful and practical for traffic forecasting but come with higher computational complexity and greater data storage demands when compared to feature extraction-based methods.
(3)
State space modeling methods assume that the observations are derived from a hidden Markov model, which is adept at capturing hidden data structures and can naturally model uncertainty within the system. This is an ideal characteristic for traffic flow prediction applications. However, these models struggle to capture nonlinear relationships. Therefore, they are not always the best choice when modeling complex dynamic traffic data, particularly in long-term forecasting scenarios. Tu et al. [15] introduced a congestion pattern prediction model, SG-CNN, based on a hidden Markov model and compared it with the well-known ARIMA baseline model for traffic prediction. Experimental results demonstrated that the SG-CNN model exhibited strong performance.
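As a minimal sketch of the feature extraction-based family described in item (1), the following pure-Python k-nearest-neighbours regressor predicts the next flow value from lagged windows of the series. The lag length, k, and the data are illustrative assumptions; this is a plain KNN, not the adaptive spatiotemporal KNN of Cheng et al. [14].

```python
# KNN regression over lagged traffic-flow features (illustrative sketch).

def make_lagged(series, lags=3):
    """Turn a flow series into (feature window, next value) pairs."""
    return [(series[i - lags:i], series[i]) for i in range(lags, len(series))]

def knn_predict(train, window, k=3):
    """Predict the next flow value as the mean target of the k closest windows."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda pair: dist(pair[0], window))[:k]
    return sum(target for _, target in nearest) / k

history = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]   # synthetic flow counts
train = make_lagged(history, lags=3)
pred = knn_predict(train, window=[15, 14, 16], k=3)
```

The design mirrors the trade-off noted above: the method is simple to implement, but it looks only at the time series itself and ignores spatial structure.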

4. Deep Learning Models

4.1. Introduction to Deep Learning Technology

Deep learning technology, an extension of machine learning, involves numerous processing layers, which enable the learning of features at high levels of abstraction. In recent years, various sensor devices have been utilized to capture urban traffic parameters, generating large volumes of data with diverse types and characteristics. At the same time, deep learning techniques have demonstrated their capacity for high-dimensional data mining, extracting complex spatiotemporal dependencies from intricate traffic datasets, thus improving the accuracy of traffic predictions.
Essentially, deep learning bypasses the need for human involvement in the feature selection process. As a widely researched area in recent years, deep learning has achieved notable success in fields such as image recognition, object detection, and natural language processing. Compared to traditional machine learning, deep learning offers a broader range of applications, significantly expanding the field of artificial intelligence. The core of deep learning lies in constructing machine learning models with multiple hidden layers, which are trained on vast amounts of data to derive more meaningful features, thereby enhancing the accuracy of prediction and classification tasks. Prominent and effective deep learning models currently include convolutional neural networks (CNNs), autoencoders, recurrent neural networks (RNNs), graph convolutional networks (GCNs), attention mechanisms with Transformers, and hybrid neural networks in deep learning.

4.2. Multilayer Perceptron Network (MLP)

Among the many deep neural networks, the MLP was one of the first to be applied in short-term traffic flow forecasting. The MLP is the simplest type of deep neural network, and a typical MLP consists of an input layer, hidden layers, an output layer, and nonlinear activation functions, forming a multilayer feedforward artificial neural network, as shown in Figure 1. This method only involves matrix operations in the fully connected layer, resulting in high computational efficiency and a low computational cost. In the MLP, the input layer receives the input data and performs feature normalization, while the hidden layers process the input signals. The output layer makes decisions or predictions based on the processed information. Figure 1 illustrates a single-neuron perceptron model, where the activation function φ (Equation (1)) is a nonlinear function that maps the summation function (xw + b) to the output value y. In Equation (1), the terms x, w, b, and y represent the input vector, the weight vector, the bias, and the output value, respectively [16]. Figure 2 shows the structure of the MLP model.
y = φ(xw + b)    (1)
The multilayer perceptron (MLP) is capable of mapping multidimensional data to a one-dimensional output, solving the nonlinear issues that a single-layer network cannot address. The MLP is trained using the backpropagation (BP) algorithm, and, by applying a simple MLP, high prediction accuracy can be achieved for short-term traffic flow forecasting. In the context of short-term traffic prediction, the MLP is used to model the mapping relationship between the historical time series data (input) and the predicted results (output), enabling accurate traffic flow predictions. Slimania et al. [17] utilized the MLP, SARIMA, and support vector regression (SVR) algorithms to predict the traffic flow over a 42-day period on a road segment in Morocco, incorporating factors such as whether a given day was a holiday. Their experiment demonstrated that the MLP outperformed the other two models in terms of prediction accuracy. Aljuaydi et al. [18] proposed a multivariate machine learning-based highway traffic flow prediction model for non-recurrent events, using a dataset that included five features, namely the traffic flow, speed, density, road accidents, and rainfall, as well as two evaluation metrics, the root mean square error (RMSE) and mean absolute error (MAE). The MLP model used as a benchmark produced favorable forecasting results.
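A minimal sketch of Equation (1) and the layered MLP structure, assuming a sigmoid activation for φ and illustrative, untrained weights:

```python
import math

def sigmoid(s):
    """Nonlinear activation φ."""
    return 1.0 / (1.0 + math.exp(-s))

def neuron(x, w, b):
    """Single perceptron of Equation (1): y = φ(xw + b)."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)

def mlp_forward(x, hidden, output):
    """Tiny MLP: one hidden layer of perceptrons feeding one output neuron."""
    h = [neuron(x, w, b) for w, b in hidden]
    w_out, b_out = output
    return neuron(h, w_out, b_out)

# Illustrative weights (not trained): 2 inputs -> 2 hidden units -> 1 output.
hidden = [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)]
output = ([1.0, -1.0], 0.0)
y = mlp_forward([0.9, 0.1], hidden, output)
```

Training these weights with the backpropagation algorithm, as described above, is omitted for brevity; the sketch shows only the forward mapping from input vector to prediction.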

4.3. Convolutional Neural Networks (CNNs)

Convolutional neural networks (CNNs) are a class of deep learning models primarily used for the processing and analysis of data with a grid-like structure, such as images and audio. The design inspiration for CNNs stems from the understanding of the animal visual system in biology. CNNs have achieved remarkable success in various domains, including image recognition, image classification, object detection, and semantic segmentation. A typical CNN consists of convolutional layers, pooling layers, and fully connected layers. In the convolutional layers, the network applies a series of filters to extract features from the input data, generating feature maps. The pooling layers serve to reduce the spatial dimensions of the feature maps, thereby decreasing the computational complexity, often utilizing operations such as max pooling or average pooling. Finally, the fully connected layers map the high-level features to the final output, performing tasks such as classification or regression. These characteristics enable CNNs to effectively capture the features of traffic data, thereby excelling in traffic flow prediction tasks. The corresponding structure is shown in Figure 3. Since convolution is computationally intensive, especially in deep CNNs, substantial GPU computing power is required; this method therefore has a high computational cost, but it is also highly scalable and can handle more complex tasks as convolutional layers and parameters are added. Taking the basic CNN as an example, the data transfer is shown in Equation (2). In the formula, Y is the prediction result; X ∈ ℝ^(N×T×F) is the input of the CNN; N is the number of nodes; T is the length of the time series; F is the number of features; FC is the fully connected layer; Pool is the pooling layer; ReLU is the activation function; and Conv is the convolutional layer.
Y = FC(Pool(ReLU(Conv(X))))    (2)
Fast R-CNN, introduced by Ross [19], incorporates multiple innovations to improve both the training and testing speeds, while also enhancing the detection accuracy. Bogaerts et al. [20] proposed a deep neural network using a CNN-LSTM architecture for multi-step prediction. The CNN-LSTM framework is capable of identifying spatial and temporal relationships within traffic data sourced from GPS trajectories. Moreover, they introduced a data reduction technique and compared it with the advanced TF algorithm, yielding improved prediction results.
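The pipeline of Equation (2) can be sketched on a one-dimensional flow series; the kernel, pooling size, and fully connected weights below are illustrative, untrained values:

```python
def relu(v):
    """Elementwise ReLU activation."""
    return [max(0.0, x) for x in v]

def conv1d(x, kernel):
    """Valid 1-D convolution (correlation form) over a flow series."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]

def max_pool(x, size=2):
    """Non-overlapping max pooling to shrink the feature map."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

def fc(x, weights, bias):
    """Fully connected layer producing a single prediction."""
    return sum(xi * wi for xi, wi in zip(x, weights)) + bias

# Y = FC(Pool(ReLU(Conv(X)))) with illustrative parameters.
x = [10.0, 12.0, 9.0, 14.0, 11.0, 13.0]          # synthetic flow series
feat = max_pool(relu(conv1d(x, kernel=[0.5, 0.5])))
y = fc(feat, weights=[0.4, 0.6], bias=1.0)
```

The averaging kernel produces the feature map [11.0, 10.5, 11.5, 12.5, 12.0], pooling keeps the local maxima [11.0, 12.5], and the fully connected layer maps those features to the scalar prediction, mirroring each stage of Equation (2).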

4.4. Autoencoder (AE)

An autoencoder is an unsupervised neural network model that performs feature extraction by learning compressed representations of the input data. An autoencoder (AE) works by training each layer of the network progressively, with each layer encoding and decoding the output of the previous layer. This process allows the model to gradually learn multi-level abstract features of the data. The objective of training each autoencoder layer is to minimize the reconstruction error, which is the difference between the input data and the reconstructed data [21]. The specific model structure is shown in Figure 4. An AE consists of an encoder and a decoder. The encoding–decoding process is computationally intensive and has a high computational cost, but the architecture is also highly scalable. The AE has numerous variants, such as the variational autoencoder (VAE) and the denoising autoencoder (DAE), with sparse autoencoders being particularly effective in feature extraction. Sparse autoencoders can identify fewer but more meaningful features from the input data and can be combined with other deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to further enhance performance, and they are widely applied in the traffic domain. Taking the traditional autoencoder as an example, its encoding process is formulated as in Equation (3), while the decoding process is given by Equation (4). Let θ denote the network's weight and bias parameters; then, the loss function is defined as in Equation (5). W1 and b1 denote the encoder's weight matrix and bias, and W2 and b2 those of the decoder.
h = f(X) = f(W1X + b1)    (3)
X_d = g(h) = g(W2h + b2)    (4)
J_AE(θ) = J(X, X_d) = −∑_{i=1}^{n} [x_i log(x_i^d) + (1 − x_i) log(1 − x_i^d)]    (5)
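A minimal sketch of Equations (3)–(5), assuming sigmoid activations for both f and g and illustrative, untrained weights:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def encode(x, W1, b1):
    """Equation (3): h = f(W1 x + b1), with f = sigmoid (assumed)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]

def decode(h, W2, b2):
    """Equation (4): x_d = g(W2 h + b2), with g = sigmoid (assumed)."""
    return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b) for row, b in zip(W2, b2)]

def reconstruction_loss(x, xd):
    """Equation (5): cross-entropy between input and reconstruction."""
    eps = 1e-12   # guard against log(0)
    return -sum(xi * math.log(xdi + eps) + (1 - xi) * math.log(1 - xdi + eps)
                for xi, xdi in zip(x, xd))

# Illustrative 3 -> 2 -> 3 autoencoder with untrained weights.
W1, b1 = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]], [0.0, 0.1]
W2, b2 = [[0.6, -0.2], [0.1, 0.7], [-0.4, 0.3]], [0.0, 0.0, 0.1]
x = [0.9, 0.2, 0.7]
xd = decode(encode(x, W1, b1), W2, b2)
loss = reconstruction_loss(x, xd)
```

Training would adjust θ = (W1, b1, W2, b2) to drive this loss down, so that the 2-dimensional code h becomes a compressed feature representation of the 3-dimensional input.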

4.5. Recurrent Neural Networks (RNNs)

Recurrent neural networks (RNNs) are deep learning architectures specifically designed to handle sequential data. Unlike traditional neural network models, RNNs possess the ability to maintain the continuous transmission of information across time steps, thereby enabling the retention of historical input data. Within its network structure, the RNN employs a specific connection mechanism that ensures that the output at each time step is influenced not only by the current input but also by the previous state. This characteristic significantly enhances the accuracy and robustness of the model in time series predictions. The basic RNN model can be considered a time series prediction tool with memory capabilities. In the field of traffic data analysis, where the data inherently contain complex spatiotemporal dependencies, RNNs and their various variants have been widely applied for traffic flow prediction [22].
RNNs do not rigidly memorize fixed-length sequences; instead, they store information from previous time steps through their hidden states. A typical RNN structure is cyclical and consists of an input layer, an output layer, and a neural network unit. In this architecture, the RNN’s neural network unit not only establishes connections with the input and output layers but also possesses an internal loop that facilitates the flow of information across different time steps in the network. Although the computational complexity of recurrent neural networks (RNNs) is predominantly determined by the sequence length, the overall cost is not excessively high. However, the sequential dependency—where the computation at each time step relies on the preceding state—significantly impedes parallelization. Moreover, due to the vanishing gradient problem, RNNs face substantial challenges when applied to long sequences, thereby limiting their scalability in large-scale sequence tasks. An example is shown in Figure 5, where xi represents the input at the i-th time step, and yi corresponds to the output generated by xi. The computation formula for the RNN model is as follows:
y_i = g(V h_i),
h_i = f(U x_i + W h_{i-1}).
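The recurrence above can be sketched with scalar parameters, assuming f = tanh and g = identity; the weights U, W, V and the input sequence are illustrative:

```python
import math

def rnn_forward(xs, U, W, V):
    """Unroll h_i = tanh(U x_i + W h_{i-1}), y_i = V h_i on scalar inputs.

    U, W, V are scalar parameters; f = tanh and g = identity are assumptions
    made for this sketch.
    """
    h, ys = 0.0, []
    for x in xs:
        h = math.tanh(U * x + W * h)   # hidden state carries history forward
        ys.append(V * h)               # output at this time step
    return ys

ys = rnn_forward([0.1, 0.5, 0.9], U=1.0, W=0.5, V=2.0)
```

Because each hidden state feeds into the next step, the output at every time step depends on the whole prefix of the sequence, which is exactly the memory property described above.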

4.6. Long Short-Term Memory (LSTM) Recurrent Neural Networks

Due to the susceptibility of traditional RNNs to issues such as vanishing and exploding gradients, long short-term memory recurrent neural networks (LSTM-RNNs) were developed to address these limitations. LSTM-RNNs introduce unique gating mechanisms and memory cells, which enable them to effectively capture long-term dependencies within sequential data. Compared to traditional RNNs, the introduction of LSTM has paved new pathways for the development of RNN models. Particularly in the field of natural language processing, LSTM-RNNs have demonstrated superior performance over traditional RNNs, rapidly gaining widespread recognition and practical application. The specific model structure is shown in Figure 6. Compared with RNNs, LSTM introduces an additional gating mechanism, which effectively alleviates the vanishing and exploding gradient problems. However, this mechanism also significantly increases the computational overhead, making its computational complexity higher than that of an ordinary RNN. In addition, as the sequence length increases, the computational cost of LSTM grows rapidly, which further limits its scalability in large-scale sequence modeling tasks.
The memory cell of an LSTM network is primarily composed of an output gate, input gate, and forget gate.
The forget gate determines which information should be forgotten or retained from the cell state, and its computation is as follows:
f_t = σ(W_f · [h_{t-1}, X_t] + b_f).
The input gate consists of two components: a sigmoid layer that determines which values need to be updated and a tanh layer that creates a new candidate value vector, which will be added to the cell state. The computation formula for the input gate is as follows:
i_t = σ(W_i · [h_{t-1}, X_t] + b_i),
C̃_t = tanh(W_C · [h_{t-1}, X_t] + b_c).
The cell state is then updated by combining the forget and input gates: C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t.
The output gate determines the value of the hidden state and contains key information about the observed sequence. The computation formula for the output gate is as follows:
o_t = σ(W_o · [h_{t-1}, X_t] + b_o),
h_t = o_t ⊙ tanh(C_t).
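The gate equations above can be combined into a single scalar LSTM step. All weights below are illustrative, and the cell-state update c_t = f_t·c_{t-1} + i_t·c̃_t is the standard rule linking the gates:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step following the gate equations above.

    p holds scalar weights (w_f, u_f, b_f, ...) for each gate; all values
    are illustrative, not trained.
    """
    f = sigmoid(p["w_f"] * h_prev + p["u_f"] * x + p["b_f"])   # forget gate
    i = sigmoid(p["w_i"] * h_prev + p["u_i"] * x + p["b_i"])   # input gate
    c_tilde = math.tanh(p["w_c"] * h_prev + p["u_c"] * x + p["b_c"])
    c = f * c_prev + i * c_tilde                               # new cell state
    o = sigmoid(p["w_o"] * h_prev + p["u_o"] * x + p["b_o"])   # output gate
    h = o * math.tanh(c)                                       # new hidden state
    return h, c

params = {k: 0.5 for k in
          ["w_f", "u_f", "b_f", "w_i", "u_i", "b_i",
           "w_c", "u_c", "b_c", "w_o", "u_o", "b_o"]}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, p=params)
```

The forget gate scales the previous cell state while the input gate admits new candidate information, which is how the cell preserves long-term dependencies without the gradient collapsing through repeated multiplication.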
Traffic flow prediction based on RNNs has been widely applied. Zheng et al. [23] employed LSTM networks to predict traffic flows, analyzing the impacts of various factors on the prediction performance of LSTM. They further improved the prediction accuracy by incorporating an attention mechanism-based Conv-LSTM module. Zhao et al. [24] proposed a traffic flow prediction model based on LSTM networks, integrating the temporal and spatial interactions of the road network. Unlike traditional prediction methods, the LSTM network considers the spatiotemporal dependencies of the traffic system through a two-dimensional network consisting of multiple memory units. Comparative experiments with other typical prediction models demonstrated that the proposed novel LSTM-based traffic prediction model delivered superior performance.

4.7. Gated Recurrent Unit (GRU)

In RNNs, the GRU is a variant that is simpler than the LSTM network while still effectively addressing long-term dependencies and the vanishing gradient problem. The GRU retains key components of the gating mechanism—specifically, the update gate and the reset gate, which regulate the flow of information. However, it omits the memory cell found in LSTM networks. Compared to LSTM, the GRU has fewer parameters, resulting in improved computational efficiency. In many cases, it achieves comparable, or even superior, performance on various tasks. As a lightweight alternative to LSTM networks, the GRU simplifies the gating mechanism, thereby reducing the computational overhead, making it slightly more efficient than LSTM. However, as the sequence length increases, the computational cost of the GRU also increases rapidly, limiting its scalability in long sequence modeling tasks. Its structural model is shown in Figure 7.
Update Gate: The purpose of the update gate is to determine the extent to which the hidden state at the current time step should retain information from the previous time step. The formula is as follows:
z_t = \sigma\left(W_z [h_{t-1}, x_t]\right).
Reset Gate: The reset gate determines to what extent the hidden state from the previous time step should be disregarded. The formula is as follows:
r_t = \sigma\left(W_r [h_{t-1}, x_t]\right).
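Putting the two gates together, a full GRU step can be sketched as follows. The candidate-state and interpolation lines complete the standard GRU formulation, which the text above does not write out explicitly, and the dimensions are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(h_prev, x_t, W_z, W_r, W_h):
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)   # update gate: how much of the new state to take
    r_t = sigmoid(W_r @ concat)   # reset gate: how much of h_{t-1} to ignore
    # Candidate state and interpolation (standard GRU completion):
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    return (1.0 - z_t) * h_prev + z_t * h_cand

rng = np.random.default_rng(1)
H, D = 4, 3                                # hypothetical hidden and input sizes
W_z = rng.standard_normal((H, H + D))
W_r = rng.standard_normal((H, H + D))
W_h = rng.standard_normal((H, H + D))
h = np.zeros(H)
for x_t in rng.standard_normal((5, D)):    # run 5 time steps
    h = gru_cell(h, x_t, W_z, W_r, W_h)
```

Since the new state is a convex combination of the previous state and a tanh-bounded candidate, the hidden state stays bounded, which helps with gradient stability.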
Fu et al. were among the first to apply GRUs to traffic flow prediction and demonstrated experimentally that the GRU-based approach outperformed the ARIMA model in short-term traffic flow forecasting [25]. Dey et al. systematically evaluated three GRU variants that preserve the overall structure while reducing the parameters in the update and reset gates, achieving lower computational costs while maintaining performance [26]. Chung et al. proposed the Gated Feedback RNN (GF-RNN), which extends the conventional approach of stacking multiple recurrent layers by introducing a gating mechanism between layers. The method uses GRUs to address the vanishing gradient problem; by incorporating gating mechanisms between recurrent layers, each layer independently controls the flow and feedback of information, thereby enhancing the network’s ability to handle long-term dependencies and improving the overall performance [27].

4.8. Graph Convolutional Neural Networks (GCNs)

The CNN is only applicable to modeling tasks with Euclidean data, and the nodes in a traffic network usually present an irregular graph structure. The Euclidean and non-Euclidean data structures are shown in Figure 8. Therefore, the GCN [28], which models a non-Euclidean space, has gradually replaced the CNN and become a research hotspot in the field of short-term traffic flow prediction [29]. Since GCNs rely on graph convolution operations, they incur high computational costs. Moreover, due to the non-Euclidean nature of graph structures, the efficient parallelization of computations is challenging, which constrains their scalability in large-scale graph processing tasks. The core formula of the GCN is shown in Equation (11). Here, X \in \mathbb{R}^{N \times C} is the signal matrix, containing N signals with C-dimensional features; \Theta \in \mathbb{R}^{C \times F} is the graph convolution parameter matrix; and Z \in \mathbb{R}^{N \times F} is the signal matrix obtained after the convolution operation.
Z = D^{-1/2} A D^{-1/2} X \Theta \quad (11)
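Equation (11) can be implemented directly with matrix operations. The sketch below uses hypothetical toy dimensions and adds self-loops to the adjacency matrix, as is common in practice, so that every node retains its own signal:

```python
import numpy as np

def gcn_layer(A, X, Theta):
    """One graph-convolution layer: Z = D^{-1/2} A D^{-1/2} X Theta."""
    d = A.sum(axis=1)                       # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    A_norm = D_inv_sqrt @ A @ D_inv_sqrt    # symmetrically normalized adjacency
    return A_norm @ X @ Theta               # (N, F) output signal matrix

# A 4-node ring graph with self-loops on the diagonal (illustrative).
A = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], dtype=float)
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))       # N = 4 signals with C = 3 features
Theta = rng.standard_normal((3, 2))   # maps C = 3 features to F = 2
Z = gcn_layer(A, X, Theta)
```

Each output row mixes a node's own features with those of its neighbors, weighted by the normalized adjacency, which is exactly the neighborhood aggregation that the spatial-domain interpretation describes.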
Graph convolution operations can be broadly classified into two categories: spatial domain-based and spectral domain-based graph convolutions. Spatial domain methods define graph convolution as the aggregation of feature information between adjacent nodes in the graph. Spectral domain methods, on the other hand, begin by processing graph signals and introduce filters to derive graph convolutions. This operation is interpreted as the process of removing noise from the graph signals. Both methods offer distinct advantages when handling graph data, and the most appropriate approach can be selected based on the specific requirements of the task.
In most existing methods, graph convolutional networks (GCNs) use a fixed adjacency matrix to model spatial dependencies in traffic networks. However, in real-world scenarios, the spatial dependencies vary over time. Hu et al. proposed a spatiotemporal graph convolutional network (GLSTGCN) for traffic prediction. They designed a graph learning module to capture dynamic spatial relationships in the traffic network and employed a gating mechanism with dilated causal convolution networks to capture long-term temporal correlations in traffic data. Experimental results demonstrated the superior performance of the GLSTGCN [30].
To address the challenges of air traffic flow prediction, which involves complex spatial dependencies, nonlinear temporal dynamics, and weather influences, Zuo et al. proposed a cross-attention diffusion convolutional recurrent neural network (CA-DCRNN). This model utilizes diffusion convolution to capture spatial dependencies and an encoder–decoder architecture with scheduled sampling to handle temporal dependencies [31]. Zhao et al. combined a GCN with GRU networks to capture spatiotemporal dependencies, and their experimental results on two public datasets showed that the GCN exhibited strong spatial information capturing capabilities [32].

4.9. Attention Mechanism and Transformer

An attention mechanism is a tool that imitates the human visual or cognitive system and is widely used in deep learning. It was first widely adopted in natural language processing (NLP) [33] and later in fields such as computer vision. It assigns a different weight to each element in the input sequence according to its importance, and each weight is dynamically calculated based on the relevance between the corresponding element and the target.
The Transformer is a deep learning framework primarily used for tasks such as natural language processing [34]. Unlike traditional CNN and RNN architectures, the Transformer abandons these structures entirely and is built on an attention mechanism. The model consists of self-attention and feed-forward neural networks; by removing the sequential computation constraint of RNNs, it allows any two positions in the sequence to interact directly, thereby overcoming the long-term dependency issue inherent in RNNs [34]. The self-attention mechanism imposes high computational costs that grow quadratically with the sequence length. However, because it relies on matrix operations, the architecture is well suited to efficient parallel computation on GPUs, thereby exhibiting high scalability in large-scale training tasks. A diagram of the Transformer model is shown in Figure 9.
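The attention computation at the core of the Transformer can be sketched in a few lines of NumPy. This is a minimal single-head version of scaled dot-product self-attention with hypothetical toy dimensions, not a full Transformer:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # (T, T): every position attends to every other
    return weights @ V, weights

rng = np.random.default_rng(3)
T, D = 6, 8                                    # hypothetical sequence length and model width
X = rng.standard_normal((T, D))
W_q, W_k, W_v = (rng.standard_normal((D, D)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
```

The (T, T) weight matrix is what makes the cost quadratic in the sequence length, and also what lets any two positions interact in a single step, independent of their distance in the sequence.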
Yan et al. [35] utilized the multi-head attention mechanism and stacked layers to enable the Transformer to learn dynamic and hierarchical features in traffic data. They proposed the integration of global and local encoders to extract and fuse global and local spatial features, facilitating effective traffic flow prediction and providing a basis for traffic management strategies.
Fan et al. [36] introduced a novel spatiotemporal graph sandwich Transformer (STGST) for traffic flow prediction. In the STGST, two Transformers equipped with temporal encoding and one spatial Transformer with structural and spatial encoding are designed to model long-range temporal and deep spatial dependencies, respectively. By structuring these two types of Transformers in a sandwich configuration, the model captures rich spatiotemporal interactions. Experimental studies demonstrate that the STGST outperforms state-of-the-art baseline models.

4.10. Hybrid Neural Networks

Standalone models often focus on extracting either temporal or spatial features. To model both spatial and temporal correlations in traffic data simultaneously, deep learning hybrid models integrate statistical analysis, machine learning, and deep learning techniques for spatiotemporal data processing. These hybrid models aim to overcome the limitations of single models, enhancing both the accuracy and robustness of traffic flow predictions. By combining various models and techniques, hybrid models can more comprehensively account for the spatiotemporal characteristics of traffic flows, providing stronger support for traffic management and planning. The combined components are primarily drawn from CNNs, RNNs, LSTMs, and GCNs, and their computational complexity, interaction dynamics, and adaptability collectively determine the overall network’s computational overhead.
In recent years, the decomposition–reconstruction (DR) hybrid model has gained significant attention from researchers. Chen et al. classified DR-based models and thoroughly studied their applications in this field. A hybrid model based on DR for short-term traffic state prediction is shown in Figure 10 [37]. Guo et al. [38] proposed an attention mechanism-based spatiotemporal graph convolutional network (ASTGCN) model, which simultaneously models the three temporal attributes of traffic flows while using graph convolutional models and standard convolutions to extract spatial and temporal features. Furthermore, this model employs a spatiotemporal attention mechanism to effectively capture the dynamic spatiotemporal correlations in traffic data. Experimental results on the PEMS dataset show that the prediction performance of the ASTGCN surpasses that of other baseline models.

5. Traffic Prediction-Related Datasets

Data are extremely important for deep learning models. The quality of the samples in the dataset largely determines the prediction performance and generalization of the model. In Table 1, we summarize the most commonly used public and real-world datasets in the field of short-term traffic flow prediction research and list their basic characteristics.

5.1. Stationary Traffic Data

A stationary traffic dataset is a collection of traffic state information (such as road traffic flow, speed, and lane occupancy) captured in real time by fixed traffic data collection devices like cameras and lidar sensors. Since the data are collected using specialized equipment, they are generally more accurate and continuous.
(1)
PeMS: PeMS (the Caltrans Performance Measurement System) is a traffic flow database for California, containing real-time data from more than 39,000 independent detectors, located mainly on highways and in metropolitan areas. The data include vehicle speeds, flows, congestion levels, and other information, providing an important basis for traffic management, planning, and research. The minimum sampling interval of 5 min makes the data highly suitable for short-term prediction, and the historical average method can be used to automatically fill in missing values. In traffic flow prediction tasks, the most commonly used sub-datasets are PeMS03, PeMS04, PeMS07, PeMS08, and PeMS-BAY.
(2)
The METR-LA dataset contains traffic information collected from loop detectors on freeways in Los Angeles County: data from 207 detectors recorded from 1 March to 30 June 2012, with a sampling interval of 5 min.
(3)
The Loop dataset contains mainly loop data from the Seattle area, covering data from four highways: I-5, I-405, I-90, and SR-520. The Loop dataset contains traffic status data from 323 sensor stations, with a sampling interval of 5 min [39].
(4)
The Korean urban area dataset (UVDS) contains data on major urban roads collected by 104 VDS sensors, with traffic characteristics such as traffic flows, vehicle types, traffic speeds, and occupancy rates [40].
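The historical-average imputation mentioned for the PeMS data above can be sketched as follows; the (days, slots-per-day) layout and the example values are assumptions for illustration (5-minute PeMS data would have 288 slots per day):

```python
import numpy as np

def historical_average_fill(flows):
    """Replace missing readings (NaN) with the historical average of the
    same time-of-day slot across all days.

    `flows` is a (days, slots_per_day) array of detector readings."""
    filled = flows.copy()
    slot_mean = np.nanmean(flows, axis=0)    # per-slot average, ignoring NaNs
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = slot_mean[cols]     # fill each gap with its slot's mean
    return filled

# Three days, four slots, one missing reading in day 1 / slot 1.
flows = np.array([[100.0, 120.0, 90.0, 80.0],
                  [110.0, np.nan, 95.0, 85.0],
                  [105.0, 130.0, 92.0, 82.0]])
filled = historical_average_fill(flows)      # the gap becomes (120 + 130) / 2 = 125
```

Filling by time-of-day slot exploits the strong daily periodicity of traffic flows, which is why this simple method works well for detector data.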

5.2. Mobile Traffic Data

As the demand for real-time and dynamic traffic information continues to increase, traditional fixed traffic data collection technologies and information processing methods, despite being relatively mature, face issues such as low coverage, high maintenance costs, and poor reliability. As a result, mobile traffic data collection technologies have gained significant attention. Common data collection methods include floating car data collection, drone-based collection, and crowdsourcing techniques. While such data require processing, they are easier to obtain, highly accurate, and widely used in traffic flow prediction.
(1)
TaxiBJ Dataset: TaxiBJ is a dataset of Beijing taxi data that includes trajectory and meteorological data from over 34,000 taxis in the Beijing area over a period of 3 years. The data are converted into inflow and outflow traffic for various regions. The sampling interval for the dataset is 30 min, and it is primarily used for traffic demand prediction.
(2)
Shanghai Taxi Dataset: This dataset, proposed by the Smart City Research Group at the Hong Kong University of Science and Technology, contains GPS reports from 4000 taxis in Shanghai on 20 February 2007. The vehicle data are sampled at 1 min intervals and include information such as the vehicle ID, timestamp, longitude and latitude, and speed.
(3)
SZ-taxi Dataset: The SZ-taxi dataset consists of taxi trajectory data from Shenzhen, covering the period from 1 January to 31 January 2015. The dataset focuses on the Luohu District of Shenzhen and includes data from 156 main roads. Traffic speeds for each road are calculated every 15 min in this dataset.
(4)
NYC Bike Dataset: The NYC Bike dataset records bicycle trajectories collected from the New York City Bike Share system. The dataset includes data from 13,000 bicycles and 800 docking stations, providing detailed information about bike usage and movement across the city.

5.3. Common Evaluation Indicators

Traffic flow prediction, as a typical regression problem, is commonly evaluated using performance metrics such as the MAE, RMSE, and MAPE to assess the accuracy of the model’s predictions.
Mean Absolute Error (MAE): The MAE is the average of the absolute differences between the predicted and true values. It is the most intuitive error measure, computed as the average absolute error between the predicted and actual traffic flow values. Smaller MAE values indicate better model performance, where n represents the number of traffic prediction test samples. The formula is as follows:
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|.
Root Mean Square Error (RMSE): The RMSE is the square root of the mean of the squared differences between the predicted and true values. By squaring the errors before averaging, larger deviations from the true values incur a greater penalty, making this metric sensitive to large errors. A smaller RMSE indicates better model performance, where n represents the number of traffic prediction test samples. The formula is as follows:
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}.
Mean Absolute Percentage Error (MAPE): The MAPE measures the relative error between the predicted and true values as a proportion (often reported as a percentage). Because each error is normalized by the true value, it reflects the accuracy of predictions in proportional terms, though it is undefined when the true value is zero. A smaller MAPE indicates better model performance, where n represents the number of traffic prediction test samples. The formula is as follows:
\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|.
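All three metrics can be computed directly from the definitions; a minimal NumPy sketch with hypothetical observed and predicted flows:

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y))  # assumes no zero observations

y = np.array([100.0, 120.0, 80.0])        # hypothetical observed flows
y_hat = np.array([110.0, 115.0, 90.0])    # hypothetical predictions
scores = mae(y, y_hat), rmse(y, y_hat), mape(y, y_hat)  # ≈ (8.33, 8.66, 0.089)
```

Note how the RMSE (8.66) exceeds the MAE (8.33) on the same errors: the squaring penalizes the two larger deviations more heavily, which is exactly the sensitivity to large errors described above.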

6. Method Comparison and Analysis

6.1. Model Training and Validation

In the process of model training, the dataset is typically divided into a training set and a validation set. The model is optimized by minimizing the discrepancy between its predicted outputs and the ground truth on the training set, using a predefined loss function. This loss is then propagated backward to update the model parameters through optimization techniques. After training on the training set, the model’s performance is assessed on the validation set, allowing for adjustments to the model architecture and hyperparameters to enhance the predictive accuracy.
For deep learning models, key hyperparameters that require tuning often include the learning rate, batch size, and number of training epochs, among others. To evaluate the model performance, various metrics are commonly employed. In short-term traffic flow prediction, widely used evaluation metrics include the MAE, MAPE, and RMSE. Once training and validation are completed, the final model is tested on an independent test set to assess its generalization performance and obtain its ultimate predictive accuracy.
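The training/validation/test workflow described above can be sketched on a toy series; the synthetic data, the linear lag model, and the learning-rate grid below are all illustrative assumptions standing in for a real deep learning model and its hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical flow series: lagged values predict the next one.
series = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)
lags = 6
X = np.stack([series[i:i + lags] for i in range(len(series) - lags)])
y = series[lags:]

# Chronological split: 70% train / 15% validation / 15% test (no shuffling,
# so the validation and test periods come strictly after training).
n = len(X)
i1, i2 = int(0.7 * n), int(0.85 * n)
X_tr, y_tr = X[:i1], y[:i1]
X_va, y_va = X[i1:i2], y[i1:i2]
X_te, y_te = X[i2:], y[i2:]

def fit(lr, epochs=200):
    """Gradient descent on a linear lag model (minimizing squared error)."""
    w = np.zeros(lags)
    for _ in range(epochs):
        grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
        w -= lr * grad
    return w

# Tune the learning rate on the validation set, then evaluate once on test.
best_val_mae, best_lr = min(
    (np.mean(np.abs(X_va @ fit(lr) - y_va)), lr) for lr in [0.01, 0.05, 0.1]
)
w = fit(best_lr)
test_mae = np.mean(np.abs(X_te @ w - y_te))
```

The key point mirrored from the text is the separation of roles: the training set drives the loss minimization, the validation set selects hyperparameters, and the untouched test set yields the final generalization estimate.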

6.2. Experimental Results and Method Comparison

This study uses the traffic flow data of the PeMS08 dataset and the speed data of the METR-LA dataset for experiments, with a sampling period of 5 min. Table 2 shows the prediction errors of the ARIMA, MLP, LSTM, GRU, and ASTGCN models on these datasets.
Smaller MAE and RMSE values indicate superior model performance. As shown in the table, the ARIMA model exhibits suboptimal predictive performance due to its limited data mining capabilities. Deep learning models, such as LSTM and the GRU, significantly enhance the predictive accuracy. The hybrid neural network model ASTGCN achieves the highest overall accuracy by incorporating both temporal and spatial correlations.
Traditional statistical methods struggle to capture complex traffic flow patterns, model both short- and long-term dependencies, and represent spatial relationships. In contrast, deep learning leverages multilayer neural networks (e.g., CNN, LSTM, Transformer) to automatically learn intricate data patterns, capture nonlinear dependencies, and enhance the predictive accuracy. CNNs and GNNs further enable spatial modeling, improving urban traffic flow forecasting.
Machine learning methods rely on manual feature engineering, increasing the costs and limiting their generalizability. Deep learning autonomously extracts hierarchical features from raw data, enhancing the model stability and adaptability.
In conclusion, deep learning outperforms traditional methods in terms of predictive accuracy and computational efficiency, enabling high-precision real-time forecasting. For data scenarios where statistical methods like ARIMA are inadequate, deep learning remains an effective approach for short-term traffic flow prediction.

6.3. Method Comparison

As discussed earlier, while deep learning methods achieve superior predictive performance, this does not imply that other approaches lack merit. Table 3 provides a systematic comparison of the advantages and disadvantages of various traffic flow prediction methods, offering a comprehensive overview of their core characteristics. By outlining key aspects such as the data requirements, computational complexity, and interpretability, this table facilitates a clearer understanding of the distinctions among different methodologies. Moreover, it serves as a valuable reference for researchers in identifying the most suitable approach for specific application scenarios, thereby enhancing the methodological selection process in traffic flow prediction research.

7. Conclusions and Future Perspectives

This paper reviews the main methods of traffic flow prediction in the construction of intelligent transportation systems. In the preceding sections, we introduced three major types of traffic flow prediction methods: statistics-based, machine learning-based, and deep learning-based methods. After discussing the principles, advantages, limitations, and applications of these methods in intelligent transportation systems, we found that, with the rapid development of big data and artificial intelligence technologies, deep learning models have shown obvious advantages in traffic flow prediction, especially when dealing with large-scale, nonlinear, and high-dimensional data, and their prediction accuracy and generalization abilities are significantly better than those of traditional methods [41]. These models play a vital role in the development of ITSs, not only improving the efficiency and accuracy of traffic management but also helping to optimize resource allocation, alleviate traffic congestion, and enhance the public’s travel experiences.
Although the short-term traffic flow prediction method based on deep learning has obvious advantages over the other two methods, it also faces certain limitations and challenges. One of the main problems is data acquisition and processing. This study observes that the datasets utilized in existing research predominantly originate from urban road traffic flow data. Given the substantial differences in traffic conditions between urban and rural areas, relying on a single type of dataset may result in the model’s inability to adequately capture the traffic patterns in rural regions, thus compromising its predictive accuracy for rural data. To address this problem, in the future, we can adopt the transfer learning method and develop an adaptive transfer learning framework. By pre-training the model on urban data and then fine-tuning it with a small amount of rural data, the prediction accuracy can be improved. Wei et al. [42] adopted a multimodal transfer learning method to transfer knowledge from data-rich cities to data-scarce cities, effectively solving the problem of label scarcity.
In addition, the generalization ability of the model in different traffic scenarios, especially in the context of extreme weather conditions, large-scale events, or other unique events, still needs to be improved. A potential solution is to expand the diversity and sources of datasets by merging different types of data (such as traffic flow, speed, travel time, density, weather, population, and image data). The fusion of multimodal data can promote improvements in model performance and expand its applicability [43]. Future research may focus on developing joint learning frameworks that enable stakeholders from diverse data sources and traffic scenarios to collaboratively train a global model, thereby achieving the deep integration of multimodal data and cross-domain knowledge sharing [44]. Under conditions of extreme weather, large-scale events, or other unique circumstances, such joint learning approaches can be used to assimilate heterogeneous data, including traffic flow, speed, meteorological information, and population metrics, effectively mitigating the challenges posed by uneven data distributions and significantly enhancing the model’s generalization capabilities and prediction accuracy across various traffic scenarios.
In addition, with the continuous development of the field of traffic prediction, an increasing number of traffic prediction models are being proposed. However, the field of traffic flow prediction lacks consistent experimental settings and standardized public datasets. Different scholars use different datasets in their research, and each dataset is affected by various external factors, which complicates the ability to effectively compare the actual prediction performance. Establishing standardized experimental settings and public datasets will greatly accelerate the development of the field of traffic flow prediction and improve the comparability and repeatability of different studies.
In addition, long-term traffic flow forecasting is also a significant problem. At present, research on short-term and medium-term traffic flow prediction has achieved significant progress, while research on long-term traffic flow prediction is still relatively limited. Unlike short-term prediction, long-term prediction usually spans months or even years and faces challenges such as unstable results, high data and model complexity, and complex spatiotemporal dependencies. How to further improve the accuracy of long-term traffic flow prediction models based on existing research is one of the most important research directions for the future.
In summary, this study not only provides a new perspective and methodological contribution to the field of traffic flow prediction, but also points out the future research directions. With the continuous development of intelligent transportation systems and the increasing complexity of urban traffic management, traffic flow prediction technology will continue to play an important role. We anticipate that more innovative research results will be obtained in the future to jointly promote the development and application of traffic flow prediction technology.

Author Contributions

Conceptualization, R.L. and S.-Y.S.; methodology, R.L. and S.-Y.S.; writing—original draft preparation, R.L. and S.-Y.S.; supervision, S.-Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Qingcai, C.; Wei, Z.; Rong, Z. Smart city construction practices in BFSP. In Proceedings of the 2017 29th Chinese Control and Decision Conference (CCDC), Chongqing, China, 28–30 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2714–2717. [Google Scholar]
  2. Shen, X.; Zhang, Q. Thought on Smart City Construction Planning Based on System Engineering. In Proceedings of the International Conference on Education, Management and Computing Technology (ICEMCT-16), Hangzhou, China, 9–10 April 2016; Atlantis Press: Dordrecht, The Netherlands, 2016; pp. 1156–1162. [Google Scholar]
  3. Lana, I.; Del Ser, J.; Velez, M.; Vlahogianni, E.I. Road traffic forecasting: Recent advances and new challenges. IEEE Intell. Transp. Syst. Mag. 2018, 10, 93–109. [Google Scholar]
  4. Daeho, K. Cooperative Traffic Signal Control with Traffic Flow Prediction in Multi-Intersection. Sensors 2020, 20, 137. [Google Scholar]
  5. Ahn, J.; Ko, E.; Kim, E.Y. Highway traffic flow prediction using support vector regression and Bayesian classifier. In Proceedings of the 2016 International Conference on Big Data and Smart Computing (BigComp), Hong Kong, China, 18–20 January 2016; IEEE: Piscataway, NJ, USA, 2016. [Google Scholar]
  6. Dai, G.; Ma, C.; Xu, X. Short-term traffic flow prediction method for urban road sections based on space–time analysis and GRU. IEEE Access 2019, 7, 143025–143035. [Google Scholar]
  7. Smith, B.L.; Demetsky, M.J. Traffic flow forecasting: Comparison of modeling approaches. J. Transp. Eng. 1997, 123, 261–266. [Google Scholar]
  8. Chen, C.; Hu, J.; Meng, Q.; Zhang, Y. Short-time traffic flow prediction with ARIMA-GARCH model. In Proceedings of the 2011 IEEE Intelligent Vehicles Symposium (IV), Baden-Baden, Germany, 5–9 June 2011; IEEE: Piscataway, NJ, USA, 2011. [Google Scholar]
  9. Makridakis, S.; Hibon, M. ARMA models and the Box–Jenkins methodology. J. Forecast. 1997, 16, 147–163. [Google Scholar]
  10. Li, Q.; Li, R.; Ji, K.; Dai, W. Kalman filter and its application. In Proceedings of the 2015 8th International Conference on Intelligent Networks and Intelligent Systems (ICINIS), Tianjin, China, 1–3 November 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 74–77. [Google Scholar]
  11. Min, X.; Hu, J.; Chen, Q.; Zhang, T.; Zhang, Y. Short-term traffic flow forecasting of urban network based on dynamic STARIMA model. In Proceedings of the 2009 12th International IEEE Conference on Intelligent Transportation Systems, St. Louis, MO, USA, 4–7 October 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1–6. [Google Scholar]
  12. Min, W.; Wynter, L. Real-time road traffic prediction with spatio-temporal correlations. Transp. Res. Part C Emerg. Technol. 2011, 19, 606–616. [Google Scholar]
  13. Yin, X.; Wu, G.; Wei, J.; Shen, Y.; Qi, H.; Yin, B. Deep learning on traffic prediction: Methods, analysis, and future directions. IEEE Trans. Intell. Transp. Syst. 2021, 23, 4927–4943. [Google Scholar]
  14. Cheng, S.; Lu, F.; Peng, P.; Wu, S. Short-term traffic forecasting: An adaptive ST-KNN model that considers spatial heterogeneity. Comput. Environ. Urban Syst. 2018, 71, 186–198. [Google Scholar]
  15. Tu, Y.; Lin, S.; Qiao, J.; Liu, B. Deep traffic congestion prediction model based on road segment grouping. Appl. Intell. 2021, 51, 8519–8541. [Google Scholar]
  16. Ke, K.C.; Huang, M.S. Quality prediction for injection molding by using a multilayer perceptron neural network. Polymers 2020, 12, 1812. [Google Scholar] [CrossRef]
  17. Slimani, N.; Slimani, I.; Sbiti, N.; Amghar, M. Machine Learning and statistic predictive modeling for road traffic flow. Int. J. Traffic Transp. Manag. 2021, 3, 17–24. [Google Scholar]
  18. Aljuaydi, F.; Wiwatanapataphee, B.; Wu, Y.H. Multivariate machine learning-based prediction models of freeway traffic flow under non-recurrent events. Alex. Eng. J. 2023, 65, 151–162. [Google Scholar]
  19. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  20. Bogaerts, T.; Masegosa, A.D.; Angarita-Zapata, J.S.; Onieva, E.; Hellinckx, P. A graph CNN-LSTM neural network for short and long-term traffic forecasting based on trajectory data. Transp. Res. Part C Emerg. Technol. 2020, 112, 62–77. [Google Scholar]
  21. Palm, R.B. Prediction as a Candidate for Learning Deep Hierarchical Models of Data; Technical University of Denmark: Lyngby, Denmark, 2012; Volume 5, pp. 19–22. [Google Scholar]
  22. Kashyap, A.A.; Raviraj, S.; Devarakonda, A.; Nayak, K.S.R.; KV, S.; Bhat, S.J. Traffic flow prediction models–A review of deep learning techniques. Cogent Eng. 2022, 9, 2010510. [Google Scholar]
  23. Zheng, H.; Lin, F.; Feng, X.; Chen, Y. A hybrid deep learning model with attention-based conv-LSTM networks for short-term traffic flow prediction. IEEE Trans. Intell. Transp. Syst. 2020, 22, 6910–6920. [Google Scholar]
  24. Zhao, Z.; Chen, W.; Wu, X.; Chen, P.C.; Liu, J. LSTM network: A deep learning approach for short-term traffic forecast. IET Intell. Transp. Syst. 2017, 11, 68–75. [Google Scholar]
  25. Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU neural network methods for traffic flow prediction. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 324–328. [Google Scholar]
  26. Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1597–1600. [Google Scholar]
  27. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Gated feedback recurrent neural networks. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; PMLR: Westminster, UK, 2015; pp. 2067–2075. [Google Scholar]
  28. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  29. Han, X.; Gong, S. LST-GCN: Long Short-Term Memory embedded graph convolution network for traffic flow forecasting. Electronics 2022, 11, 2230. [Google Scholar] [CrossRef]
  30. Hu, N.; Zhang, D.; Xie, K.; Liang, W.; Hsieh, M.Y. Graph learning-based spatial-temporal graph convolutional neural networks for traffic forecasting. Connect. Sci. 2022, 34, 429–448. [Google Scholar]
  31. Zuo, D.; Li, M.; Zeng, L.; Wang, M.; Zhao, P. A Cross-Attention Based Diffusion Convolutional Recurrent Neural Network for Air Traffic Forecasting. In Proceedings of the AIAA SCITECH 2025 Forum, Orlando, FL, USA, 6–10 January 2025; p. 2120. [Google Scholar]
  32. Yao, Z.; Xia, S.; Li, Y.; Wu, G.; Zuo, L. Transfer learning with spatial–temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8592–8605. [Google Scholar]
  33. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  34. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  35. Yan, H.; Ma, X.; Pu, Z. Learning dynamic and hierarchical traffic spatiotemporal features with transformer. IEEE Trans. Intell. Transp. Syst. 2021, 23, 22386–22399. [Google Scholar]
  36. Fan, Y.; Yeh, C.C.; Chen, H.; Wang, L.; Zhuang, Z.; Wang, J.; Dai, X.; Zheng, Y.; Zhang, W. Spatial-Temporal Graph Sandwich Transformer for Traffic Flow Forecasting. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Turin, Italy, 18–22 September 2023; Springer Nature: Cham, Switzerland, 2023; pp. 210–225. [Google Scholar]
  37. Chen, Y.; Wang, W.; Hua, X.; Zhao, D. Survey of decomposition-reconstruction-based hybrid approaches for short-term traffic state forecasting. Sensors 2022, 22, 5263. [Google Scholar] [CrossRef]
  38. Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 922–929. [Google Scholar]
  39. Cui, Z.; Ke, R.; Pu, Z.; Wang, Y. Deep bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction. arXiv 2018, arXiv:1801.02143. [Google Scholar]
  40. Bui, K.H.N.; Yi, H.; Cho, J. Uvds: A new dataset for traffic forecasting with spatial-temporal correlation. In Proceedings of the Asian Conference on Intelligent Information and Database Systems, Phuket, Thailand, 7–10 April 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 66–77. [Google Scholar]
  41. Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; Wang, F.Y. Traffic flow prediction with big data: A deep learning approach. IEEE Trans. Intell. Transp. Syst. 2014, 16, 865–873. [Google Scholar]
  42. Wei, Y.; Zheng, Y.; Yang, Q. Transfer knowledge between cities. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1905–1914. [Google Scholar]
  43. Bui, K.H.N.; Cho, J.; Yi, H. Spatial-temporal graph neural network for traffic forecasting: An overview and open research issues. Appl. Intell. 2022, 52, 2763–2774. [Google Scholar]
  44. Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020, 37, 50–60. [Google Scholar]
Figure 1. Single-neuron perceptron model.
Figure 2. Structure of the MLP.
Figure 3. CNN model diagram.
Figure 4. AE model diagram.
Figure 5. RNN model diagram.
Figure 6. LSTM model diagram.
Figure 7. GRU model diagram.
Figure 8. Euclidean and non-Euclidean data structure diagrams.
Figure 9. Transformer model diagram.
Figure 10. DR model diagram.
Table 1. Dataset summary.

| Type | Dataset | Features | Time Interval | Traffic Data | Source |
|---|---|---|---|---|---|
| Stationary traffic data | PeMS | Flow, speed | 5 min | 39,000 | https://github.com/Davidham3/ASTGCN (accessed on 1 February 2025) |
| | METR-LA | Speed | 5 min | 207 | https://github.com/liyaguang/DCRNN (accessed on 1 February 2025) |
| | Loop | Speed | 5 min | 323 | https://github.com/zhiyongc/Seattle-Loop-Data (accessed on 1 February 2025) |
| | UVDS | Flow, speed | 5 min | 104 | [17] |
| Mobile traffic data | TaxiBJ | Flow | 30 min | 34,000 vehicles | https://github.com/TolicWang/DeepST/tree/master/data/TaxiBJ (accessed on 1 February 2025) |
| | SZ-taxi | Flow, speed | 15 min | 156 roads | https://paperswithcode.com/dataset/sz-taxi (accessed on 1 February 2025) |
| | NYC Bike | Flow | 60 min | 13,000 bicycles | https://paperswithcode.com/dataset/nycbike1 (accessed on 1 February 2025) |
| | Shanghai Taxi Dataset | Speed | 1 min | 4000 vehicles | https://gitcode.com/open-source-toolkit/659ba (accessed on 1 February 2025) |
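Most of the datasets above publish traffic flow as counts aggregated over fixed intervals (e.g., the 5 min bins used by PeMS, METR-LA, and Loop). A minimal pure-Python sketch of how raw detector timestamps can be bucketed into such fixed-width flow intervals (the function name and input format are illustrative, not part of any dataset's tooling):

```python
from collections import Counter

def aggregate_flow(timestamps, interval_s=300):
    """Bucket raw vehicle-detection timestamps (in seconds) into
    fixed-width intervals and return per-interval flow counts,
    zero-filled for intervals with no detections."""
    counts = Counter(int(t // interval_s) for t in timestamps)
    lo, hi = min(counts), max(counts)
    return [counts.get(b, 0) for b in range(lo, hi + 1)]

# Detections at 10 s, 250 s, 310 s, 900 s fall into 5 min bins 0, 0, 1, 3
print(aggregate_flow([10, 250, 310, 900]))  # → [2, 1, 0, 1]
```

Changing `interval_s` to 1800 or 3600 reproduces the 30 min and 60 min granularities of TaxiBJ and NYC Bike.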
Table 2. Comparison of model performance on different datasets.

| Model | PEMS08 MAE | PEMS08 RMSE | METR-LA MAE | METR-LA RMSE |
|---|---|---|---|---|
| ARIMA | 33.04 | 50.41 | 2.04 | 5.89 |
| MLP | 26.73 | 35.81 | 1.84 | 4.62 |
| LSTM | 27.34 | 37.43 | 1.86 | 4.66 |
| GRU | 26.95 | 36.61 | 1.86 | 4.71 |
| ASTGCN | 19.63 | 27.50 | 1.31 | 3.54 |
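The MAE and RMSE metrics reported in Table 2 follow their standard definitions; a minimal self-contained sketch (illustrative values, not drawn from the benchmark datasets):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical 5 min flow observations vs. model predictions
y_true = [120, 135, 150, 140]
y_pred = [118, 140, 149, 133]
print(round(mae(y_true, y_pred), 2), round(rmse(y_true, y_pred), 2))  # → 3.75 4.44
```

Because RMSE squares each residual before averaging, a model with a few large misses scores worse on RMSE than on MAE, which is why both columns are reported per dataset in Table 2.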
Table 3. Comparison of the advantages and disadvantages of different methods.

| Category | Model | Advantages | Disadvantages |
|---|---|---|---|
| Traditional statistical models | HA | The algorithm is simple, runs quickly, and has little execution-time overhead. | It predicts future data from the mean of historical data; such simple linear operations cannot represent the deep nonlinear relationships of spatiotemporal series, so it is not ideal for data with strong disturbances. |
| | ARIMA | The algorithm is simple; given a large amount of uninterrupted data, it achieves high prediction accuracy and is particularly suitable for stable traffic flows. | Its ability to model nonlinear trends or highly complex time series is relatively weak. Non-stationary data must first be made stationary; otherwise, its applicability is limited. |
| | Kalman filter | The algorithm is efficient, can quickly process large amounts of data, and is adaptive, automatically adjusting predictions based on historical data. It suits noisy sensor data, smooths data during real-time prediction, and fits short-interval traffic flow prediction. | Its iterative nature requires continuous matrix operations, incurring a large time overhead, and it cannot adapt to nonlinear changes in traffic flow. |
| Machine learning models | KNN | Highly portable, simple in principle, and easy to implement, with high accuracy and good adaptability to nonlinear, non-homogeneous data. Since KNN requires no complex training process, it suits smaller-scale datasets. | Processing efficiency on large-scale datasets is low, and running times are long; it may therefore fail to meet the real-time requirements of road traffic flow prediction. |
| Deep learning models | MLP | The training process is relatively simple, with many available optimization methods. It suits regression tasks such as traffic flow prediction based on static features. | Training may take a long time and require considerable computing resources. With few samples or features, the MLP may overfit, and its performance is usually limited on non-stationary data. |
| | CNN | CNNs effectively extract the spatial characteristics of traffic flow data, with strong generalization abilities and high prediction accuracy; they perform well in prediction tasks based on road-network topologies. | The complex model structure makes training resource- and time-intensive. The convolutional structure cannot capture temporal dependencies well, limiting its ability to process sequence data. |
| | AE | The AE learns the laws and patterns of historical traffic data, improving the efficiency and accuracy of traffic management. It is usually used not for prediction directly but as an auxiliary feature-extraction tool combined with other deep learning models. | For complex data distributions, training autoencoders can be difficult and requires careful tuning of the network structure and hyperparameters. |
| RNN-based models | RNN | The RNN captures the dynamic characteristics of time series data and effectively uses historical traffic flow information to predict future trends; it suits time-dependent traffic data and performs well in short-term prediction. | It is prone to vanishing or exploding gradients during training, making long-term dependencies difficult to learn. |
| | LSTM | LSTM's gating mechanism effectively alleviates the long-term dependency problem of traditional RNNs, capturing long-term trends and periodic changes and improving accuracy. It suits short- and long-term prediction, especially for highly periodic data. | The model is relatively complex, takes a long time to train, and requires large amounts of historical data to achieve good results. |
| | GRU | The GRU is simpler than LSTM while still addressing long-term dependency and vanishing gradients. It suits short- and medium-term prediction, especially when computing resources are limited. | Its performance depends on data quality and feature selection; in complex scenarios, it may be slightly inferior to LSTM. |
| | GCN | Suited to non-Euclidean structured data, it captures relationships between nodes and generalizes well. It fits prediction tasks with strong spatial correlation, especially modeling based on urban road networks. | Computational resource requirements are high, and overfitting may occur on large graphs. |
| | Transformer | Its parallel computation greatly improves efficiency, and self-attention lets the model attend to all positions in the sequence simultaneously, better capturing long-distance dependencies. It suits long-term traffic flow prediction. | It requires large amounts of historical data for training, and the computational cost is high. |
| | Hybrid neural network | By integrating the strengths of multiple neural network models, hybrid networks enable comprehensive learning and precise prediction. They are well suited to complex forecasting tasks, such as large-scale urban traffic modeling and intelligent transportation management, where capturing both spatial and temporal dependencies is essential. | The model is highly complex, and training and optimization are relatively difficult, requiring longer computing times and larger amounts of computing resources. |
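The HA baseline in Table 3 predicts each time-of-day slot as the mean of all historical observations in that slot; a minimal pure-Python sketch (the function name and toy data are illustrative):

```python
from collections import defaultdict

def historical_average(history, slots_per_day):
    """HA baseline: predict each time-of-day slot as the mean of the
    historical observations that fell in that slot."""
    buckets = defaultdict(list)
    for i, value in enumerate(history):
        buckets[i % slots_per_day].append(value)
    return [sum(vs) / len(vs) for _, vs in sorted(buckets.items())]

# Two "days" of four slots each; the prediction is the per-slot mean
hist = [10, 20, 30, 40,
        14, 22, 28, 44]
print(historical_average(hist, 4))  # → [12.0, 21.0, 29.0, 42.0]
```

The sketch makes the table's criticism concrete: the forecast is a fixed per-slot average, so it cannot react to disturbances or nonlinear dynamics, which is exactly where the learning-based rows of Table 3 gain their advantage.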
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Liu, R.; Shin, S.-Y. A Review of Traffic Flow Prediction Methods in Intelligent Transportation System Construction. Appl. Sci. 2025, 15, 3866. https://doi.org/10.3390/app15073866
