Urban Safety: An Image-Processing and Deep-Learning-Based Intelligent Traffic Management and Control System

With the rapid growth and development of cities, Intelligent Traffic Management and Control (ITMC) is becoming a fundamental component to address the challenges of modern urban traffic management, where a wide range of daily problems need to be addressed in a prompt and expedited manner. Issues such as unpredictable traffic dynamics, resource constraints, and abnormal events pose difficulties to city managers. ITMC aims to increase the efficiency of traffic management by minimizing the odds of traffic problems, by providing real-time traffic state forecasts to better schedule the intersection signal controls. Reliable implementations of ITMC improve the safety of inhabitants and the quality of life, leading to economic growth. In recent years, researchers have proposed different solutions to address specific problems concerning traffic management, ranging from image-processing and deep-learning techniques to forecasting the traffic state and deriving policies to control intersection signals. This review article studies the primary public datasets helpful in developing models to address the identified problems, complemented with a deep analysis of the works related to traffic state forecast and intersection-signal-control models. Our analysis found that deep-learning-based approaches for short-term traffic state forecast and multi-intersection signal control showed reasonable results, but lacked robustness for unusual scenarios, particularly during oversaturated situations, which can be resolved by explicitly addressing these cases, potentially leading to significant improvements of the systems overall. However, there is arguably a long path until these models can be used safely and effectively in real-world scenarios.


Introduction
Urban transportation is considered the lifeblood of the world's economy. The rapid increase in vehicles of all kinds and a steadily growing population in need of mobility pose challenges to cities, one of the major problems being the growth of traffic and its associated issues. According to the World Health Organization (https://www.who.int/publications/i/item/9789241565684, last accessed on 16 November 2021), each year, 1.35 million people are killed and 20 million are wounded on roadways around the world. Road crash injuries are estimated to be the eighth leading cause of death globally, with an estimated cost among the fatal and wounded victims of approximately USD 1.8 trillion from 2015-2030, equivalent to a yearly tax of 0.12% on global GDP (Chen et al. [1]). Furthermore, according to INRIX (https://inrix.com/press-releases/2019-traffic-scorecard-us/, last accessed on 16 November 2021), traffic congestion cost the U.S. economy nearly USD 88 billion in 2019 alone.
Intelligent Traffic Management and Control (ITMC) systems have emerged as a vital element of traffic management solutions, with the research community developing mechanisms to increase their accuracy, efficiency, and effectiveness. Traffic state forecast and intersection signal control are the two main components of ITMC. For the first component, two families of methods are commonly found: statistical methods and data-driven approaches. Statistical methods enable the formulation of hypotheses and the derivation of assumptions about traffic flow from macroscopic and microscopic perspectives. However, they cannot handle unstable traffic conditions and complex road settings (Elhenawy and Rakha [2]). To cope with the nonlinearity of traffic, data-driven approaches, such as Support Vector Machines (SVMs), K-Nearest Neighbors (KNNs), Bayesian methods, and Neural Networks (NNs), overcome the limitations of statistical methods with promising results (Huang et al. [3]). However, for a model to achieve a good performance, a large amount of time series data is required, and its efficiency largely depends on how well the model captures the spatial-temporal features of the traffic states. Moreover, corrupted or missing data pose difficulties for models, limiting their capacity to provide a useful and reliable forecast. Recently, deep-learning-based methods have addressed some of these limitations due to their ability to process large amounts of data efficiently and to capture hidden or unknown traffic dynamics (Bao et al. [4]).
On the other hand, efficient intersection traffic signal control, particularly in oversaturated conditions, requires actions to be taken based on the current traffic dynamic variables of the corresponding and neighboring intersections through the proper implementation of control policies. The most widely used method to tackle this problem is the Fixed Time (FT) controller, which uses historical data to determine the appropriate timing of traffic signals. However, this approach cannot meet stochastic traffic demand or handle unexpected traffic situations (Osorio and Wang [5]). Due to the limitations of FT controllers, Webster's method was introduced, where inductive detectors are employed to observe the actual traffic conditions and efficiently extend or terminate the green signal time by measuring the gap between vehicles. However, accumulated information is neglected, reducing the overall performance (Eriskin et al. [6]).
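The gap-based extension of the green time described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited controller: the function name `green_duration`, the thresholds `min_green`, `max_green`, and `max_gap`, and the input format (detector actuation times in seconds from the start of green) are all hypothetical.

```python
def green_duration(arrivals, min_green=10.0, max_green=40.0, max_gap=3.0):
    """Return the green time (s) under a simple gap-out / max-out rule.

    arrivals: sorted detector actuation times in seconds, measured from
    the start of the green phase (hypothetical input format).
    """
    # The green phase always lasts at least min_green.
    end = min_green
    for t in arrivals:
        if t > end:
            # The gap since the last extension expired before this
            # vehicle arrived, so the green is terminated (gap-out).
            break
        # Each actuation keeps the green alive for max_gap more seconds.
        end = max(end, t + max_gap)
        if end >= max_green:
            # Never exceed the maximum green (max-out).
            return max_green
    return end
```

With the default thresholds, actuations at 2, 5, 9, 11, and 13 s followed by one at 20 s give a green time of 16 s: the last vehicle arrives after the 3 s gap has expired. The rule only looks at the most recent gap, which illustrates why accumulated information (for example, queue length) plays no role in this kind of control.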
The Sydney Coordinated Adaptive Traffic System (SCAT) (Sims and Dobinson [7]) and Split, Cycle, and Offset Optimization Technique (SCOOT) (Hunt et al. [8]) adopt adaptive systems to suppress the drawbacks of the previous methods by gathering the data of the traffic flow in real-time at each intersection to control the timing of traffic lights effectively. The SCAT systems count vehicles at each stop line to gather traffic information, and the SCOOT applies a set of advanced detectors located upstream of the stop line. Using these detectors, the SCOOT provides a higher resolution of the current traffic conditions, such as traffic flow and the number of cars in the queue before they reach the stop line. The SCAT and SCOOT both use centralized control schemes, with systems being run locally, and the coordination between intersections is achieved by communication among the neighbors. For example, when an intersection releases several vehicles, it informs the next intersection about the time and number of vehicles to expect at a particular time. However, the performance of such methods heavily depends on the detector's position and reliability. Recently, deep-learning models have been applied to self-adaptive traffic signal control, exhibiting substantially better performance in terms of accuracy and robustness (Bouktif et al. [9]).
The success of image-processing and the associated deep-learning technologies in ITMC in comparison with statistical methods can be realized from Figure 1, not only in terms of the quantity of articles published, but also in the quality of the forums in which they are published. In 2020, in the Scopus database, out of 483 published articles, 87 were based on deep-learning or image-processing methods, and they showed much better performances compared to state-of-the-art methods.
This review focuses on image-processing and deep-learning-based approaches to ITMC. Although there is a considerable number of relevant articles on intelligent transportation/traffic management and control (Nagy et al. [10], Pasquale et al. [11], Mirchandani et al. [12]), to the best of the authors' knowledge, there is a limited number of works on ITMC based on image processing and deep learning. Within the literature, the emerging image-processing and deep-learning techniques are at an early stage of development, but with an increasing number of relevant implementations among the research community. In 2011, a total of 246 documents were published in the Scopus database, whereas in 2020, the number almost doubled, reaching 483, as summarized in Figure 1. Therefore, this study is valuable both for machine-learning and ITMC researchers and for decision-makers, who could identify the potential advantages of ITMC in their practice. The structure of this article is as follows: Section 2 presents the methodologies used to identify and select the documents to be analyzed. Section 3 presents different traffic state prediction/forecasting approaches and models with their corresponding structure, limitations, and performances. Section 4 is devoted to intersection-traffic-signal-control methods/policies, with a primary focus on their limitations and performances. In both Sections 3 and 4, particular emphasis is given to image-processing and deep-learning-based approaches, including a brief overview of commonly employed methods. Section 5 describes the research developed in this review, complemented with the significant research challenges found in the literature. Finally, Section 6 provides insights regarding the objectives drawn and points out the main conclusions.

Methodology of the Systematic Review
This section describes the research methodology used to locate, gather, and appraise the state-of-the-art works under study. The main requirement was to sort out the important recent works on intelligent traffic-management-and-control methods/systems based on image-processing and deep-learning approaches. The following complementary questions were considered:
• Which task of ITMC was addressed?
• Which dataset was used? Was it tested on different datasets?
• Which architecture/optimizer was utilized (developed or adapted)?
• What metrics were used for evaluation?
• Is the approach adopted or developed able to achieve real-time performance?
• For intersection-signaling schemes, what type of simulation environments (microscopic or macroscopic) were utilized? How were the policies evaluated?

Selection Criteria
The selection of the articles followed these criteria: (i) The studies should focus on intelligent traffic-management-and-control methods/systems based on the following approaches: image-processing and deep-learning techniques. The studies must identify problems, potential solutions, novelties, and limitations from which recommendations can be established. (ii) The studies should be peer-reviewed studies (research articles and literature reviews), best practices manuals (existing guidelines for ITMC), or policies. (iii) The research studies should include quantitative or qualitative research methods. (iv) All studies should be in English.

Databases and Search Steps
This literature review was conducted from March to July 2021 using Scopus, Science Direct, and Google Scholar. The authors extended the search to Google Scholar to include policies and best practices manuals. The search was performed with the following keywords in various combinations: "intelligent transportation", "intelligent traffic management and control", "image processing and deep learning-based intelligent traffic management and control", "short-term traffic forecasting", "image processing and deep learning-based short-term traffic forecasting", "intersection traffic signal control", "image processing and deep learning-based intersection traffic signal control".
In the first step, we selected studies addressing the keywords: "intelligent transportation", "intelligent traffic management and control", "short-term traffic forecasting", and "intersection traffic signal control" in various combinations. As a refinement step, we excluded duplicated articles and focused on image-processing and deep-learning approaches. The remaining articles were analyzed based on their titles and abstracts, with 332 articles retrieved. The 144 fully fledged research articles and reviews were sorted out to carry out in-depth studies through a complete reading of each document. We assessed the articles through the criteria that they should contain one of the following aspects: (1) research with actual users through qualitative or quantitative methods and the main method proposed and results obtained should be fully described; (2) specific guidelines or recommendations relating to the architecture, optimization, and metrics; (3) a review of existing literature regarding ITMC, as well as available policies that could contribute to the signaling schemes. In the final step, data from each document were organized in terms of type of study, primary focus, datasets used, adopted performance metrics, and limitations. Figure 2 illustrates these processes according to a PRISMA diagram.

Traffic State Prediction
Traffic state prediction aims to forecast future traffic variables such as flow or speed in the road network based on historical or observed traffic data and other supporting information relevant to the demand. Most of the models used for forecasts found in the literature deal with parametric or nonparametric approaches. The most popular parametric approaches are the Autoregressive Integrated Moving Average (ARIMA) models. Among the nonparametric techniques, various models have been proposed, such as NNs, SVMs for regression, and KNNs. Figure 3 represents the basic building blocks of an NN-based traffic state prediction model.

Autoregressive Integrated Moving Average
One of the most used and classical models for time series forecast is ARIMA (Box et al. [14]), which is based on the principle that future time series values can be generated from a linear function of past observations and white noise terms. The main advantage of ARIMA is its flexibility in following data patterns and higher forecast accuracy in the short term (Irhami and Farizal [15]). However, it requires noise-free datasets for model construction and has limitations in capturing nonlinear features.
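The principle that future values are a linear function of past observations can be illustrated with a deliberately reduced case. The sketch below assumes an ARIMA(1,1,0) model without a constant or moving-average term, fitted by ordinary least squares; the function names `fit_ar1` and `forecast_next` are hypothetical and are not taken from the cited works.

```python
def fit_ar1(diffs):
    """Least-squares estimate of phi in d[t] = phi * d[t-1] + noise."""
    num = sum(d1 * d0 for d0, d1 in zip(diffs, diffs[1:]))
    den = sum(d0 * d0 for d0 in diffs[:-1])
    # A constant series has zero differences; fall back to phi = 0.
    return num / den if den else 0.0


def forecast_next(series):
    """One-step forecast using first differencing plus an AR(1) term."""
    # The "I" in ARIMA: difference once to remove a linear trend.
    diffs = [b - a for a, b in zip(series, series[1:])]
    phi = fit_ar1(diffs)
    # Undo the differencing: last level plus the predicted change.
    return series[-1] + phi * diffs[-1]
```

For the noise-free linear series 10, 12, 14, 16, 18, the differences are constant, the estimated coefficient is 1, and the forecast is 20. The same sketch also shows the limitation noted above: the forecast is a purely linear extrapolation, so nonlinear features of the series are not captured.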

Support Vector Machines
Contrary to ARIMA, SVMs can handle nonlinear and high-dimensional problems. An SVM-based classifier tries to maximize the margin of the hyperplane separating two classes by solving a linearly constrained quadratic programming problem. It is robust to overfitting while providing high generalization performance (Li and Xu [16], Mingheng et al. [17]). However, the SVM models perform better in forecasting medium-duration incident cases than high-duration incident cases (Yu et al. [18]).

K-Nearest Neighbors
KNN is a data-driven model that is extremely sensitive to data quality. Nevertheless, as an instance-based learner, KNN is able to forecast the traffic state by exploring correlations among the data, avoiding a search through all historical data. For short-term traffic state prediction under special events, KNN has the potential to find the most similar historical patterns and ignore the dissimilar ones in the datasets. However, in common with most other traditional machine-learning approaches, KNN faces the curse of dimensionality in network-wide traffic prediction (Yu et al. [19]).
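The idea of finding the most similar historical patterns can be sketched as a pattern-matching forecaster. This is a minimal illustration, assuming a univariate series, Euclidean distance, and an unweighted average of the neighbors' next values; the function name `knn_forecast` and the parameters `k` and `window` are hypothetical.

```python
def knn_forecast(history, k=3, window=3):
    """Forecast the next value from the k most similar past windows."""
    # The most recent observations form the query pattern.
    query = history[-window:]
    candidates = []
    # Each candidate is a past window that has a known next value.
    for start in range(len(history) - window):
        past = history[start:start + window]
        nxt = history[start + window]
        dist = sum((a - b) ** 2 for a, b in zip(past, query)) ** 0.5
        candidates.append((dist, nxt))
    # Keep the k nearest patterns and ignore the dissimilar ones.
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(nxt for _, nxt in nearest) / len(nearest)
```

This brute-force version compares the query against every past window; practical systems typically restrict or index the search. The distance computation also hints at the dimensionality problem: for network-wide prediction each pattern would contain the states of many links, and distances become less discriminative as the pattern grows.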

Multilayer Feedforward Neural Networks
The Multilayer Feedforward Neural Network (MLFNN) is a simple feedforward NN consisting of a layer of input units, one or more hidden layers, and one layer of output units. Pioneering contributions to short-term traffic forecasting using MLFNNs can be found in the works of Smith and Demetsky [20], Gilmore and Abe [21], Florio and Mussone [22], and Dougherty and Cobbett [23].
Smith and Demetsky proposed a simple MLFNN with one hidden layer for short-term volume prediction, trained using real-world data (open data at VDOT, https://www.virginiaroads.org/datasets/traffic-volume/explore, last accessed on 16 November 2021), which exhibited a lower performance compared with the nearest-neighbor model. Gilmore and Abe improved the accuracy by employing two hidden layers, taking training and simulation time into consideration. To increase the accuracy further, Florio and Mussone used three hidden layers and preprocessed training data, which also mitigated the problem of long training times. Dougherty and Cobbett trained an MLFNN with one hidden layer to forecast short-term traffic flow, speed, and occupancy. The results showed that the speed forecast was much less successful, although the flow and occupancy forecasts exhibited promising results. As identified by Polson and Sokolov [24] and Huang et al. [3], capturing both spatial and temporal features of traffic states and using a correction mechanism can mitigate these problems.
Although the accuracy was not very promising, attempts were also made to model and forecast network-wide traffic using MLFNNs (Sun et al. [25], Elhenawy and Rakha [2]). Sun et al. combined Graphical Lasso (GL) with an NN for a multilink prediction model. Elhenawy and Rakha proposed a much more accurate and robust data-driven approach by considering current traffic state data, weather conditions, visibility levels, and seasonal predictors. Moreover, their work was a milestone for the identification of traffic problems up to 2 h in advance, when compared to Kumar et al. [26], whose work was only able to extend the time horizon to a maximum of 15 min.
MLFNN-optimization strategies were also found during the studies; for example, Vlahogianni et al. [27] proposed a genetic-algorithm-based structural-optimization strategy to help both in the proper representation of data with temporal and spatial features and in the selection of an appropriate structure. Table 1 summarizes the works using MLFNNs and their focus, limitations, and performances. Due to their capability of modeling nonlinear functions with a simple architecture, MLFNNs have been extensively used in traffic state prediction. However, these models have some limitations in the exploration of more complex data correlations.
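The layered structure of an MLFNN can be made concrete with a forward pass through one hidden layer. The sketch below is illustrative only: it assumes a tanh hidden activation and a linear output layer, and the names `dense` and `mlfnn_forward` are hypothetical. Training (for example, by backpropagation) is omitted.

```python
import math


def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlfnn_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: input units -> one hidden layer -> output units."""
    # The nonlinear hidden layer lets the model fit nonlinear functions.
    hidden = dense(inputs, hidden_w, hidden_b, math.tanh)
    # Linear output layer, as is usual for regression targets such as flow.
    return dense(hidden, out_w, out_b, lambda v: v)
```

Here `inputs` would hold recent traffic measurements and the single output the forecast value; adding further hidden layers amounts to chaining additional `dense` calls.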

Radial Basis Function Neural Networks
Radial Basis Function Neural Network (RBFNN) models use Radial Basis Functions (RBFs) as the activation functions and are composed of one input layer, one hidden layer, and one linear output layer. Park et al. [28] used an RBFNN for short-term freeway traffic volume prediction, with around 64.81% and 91.39% of the forecast traffic volumes falling within the 10% and 20% error ranges, respectively. The prediction accuracy was improved by combining fuzzy C-means with an RBFNN and by using a Generalized Regression Neural Network (GRNN) in the works of Park [29], Kuang et al. [30], and Buliali et al. [31]. On top of the GRNN, Buliali et al. used a Leave-One-Out Cross-Validation (LOOCV) method to determine a suitable smoothing factor in order to avoid overfitting, achieving an RMSE of 16.4. Furthermore, Xiaobin [32] explored the use of Particle Swarm Optimization (PSO) to appropriately select the training parameters of an RBFNN, leading to a significant increase in prediction accuracy with a MAPE of 3.37%. Moreover, the historical data of both the current intersection and the adjacent intersections were found to have a significant effect on the performance (Zhu et al. [33]). Table 2 indicates the works found and their primary focus, limitations, and performances. The simplicity of the K-means clustering algorithm, the width calculation, and the least mean squares algorithm for weight training make the method fast and efficient (Amin et al. [34]). However, the performance of an RBFNN depends on the choice of the RBFs' parameters, namely the selection of centers and widths.
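The dependence on centers and widths is easy to see in the forward pass of an RBFNN. The sketch below assumes Gaussian basis functions, which is a common but not the only choice; the function name `rbf_forward` and its parameters are hypothetical, and the selection of centers (for example, by K-means) and the weight training are omitted.

```python
import math


def rbf_forward(x, centers, widths, weights, bias=0.0):
    """RBFNN output: linear combination of Gaussian hidden units."""
    hidden = []
    for c, s in zip(centers, widths):
        # Each hidden unit responds to the distance from its center.
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        hidden.append(math.exp(-sq_dist / (2.0 * s ** 2)))
    # Linear output layer on top of the hidden activations.
    return sum(w * h for w, h in zip(weights, hidden)) + bias
```

An input that coincides with a center activates that unit fully (activation 1), while units whose centers are far away relative to their width contribute almost nothing. Poorly placed centers or badly scaled widths therefore leave parts of the input space without any responsive unit.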

Wavelet Neural Networks
A Wavelet Neural Network (WNN) is essentially an MLFNN model where an additional wavelet function is applied to the hidden layers instead of the traditional sigmoid or tanh activation functions. It takes advantage of the multiscale decomposition of the wavelet transform and the self-learning capability of NNs to represent complex patterns. Ge and Wang [35] proposed a WNN-based short-time traffic flow prediction model that increased the accuracy and facilitated the convergence time, primarily due to the use of small training datasets. To further reduce the running time, Lin et al. [36] employed the use of a KNN to preselect the optimal training datasets for the WNN. Li and Sheng [37] and Yang and Hu [38] placed particular emphasis on the improvement of the prediction accuracy. Li and Sheng proposed a modified adaptive particle swarm optimization algorithm based on cloud theory that exhibited better performance in comparison to other baselines. Yang and Hu combined an Improved Genetic Algorithm (IGA) with a clustering search strategy and a WNN (IGA-WNN), boosting the prediction accuracy and better handling nonlinear cases. Table 3 indicates the works found using WNNs and their focus, limitations, and performances. Similar to RBFNNs, WNNs require less training effort and the obtained models have a better representation ability than MLFNNs. A significant drawback of the WNN is the limited input dimensions. Constructing a WNN requires a large computational effort in the input decomposition, in particular with the higher dimensionality of the input vector.

Time-Delay Neural Networks
Time-Delay Neural Network (TDNN) models are generally defined as multilayer NNs where the time-shifting approach is used to capture the temporal dynamics of time series data by encoding on delayed inputs or states. Lingras and Mountford [39] and Zhong et al. [40] applied a Genetic Algorithm (GA) in the design of a TDNN for short-term traffic forecasting aimed to handle large coverage areas, obtaining 10% average errors. To improve the accuracy, Wang et al. [41] integrated spatial and temporal autocorrelations of road traffic networks using a Space-time-Delay Neural Network (STDNN) using a low learning rate, achieving a MAPE of 13.7. Khandani and Mikhael [42] included a pretransformed layer with a TDNN using Discrete Cosine Transform (DCT), combined with a mixed transform strategy, to improve the model learning process and increase accuracy significantly. Table 4 summarizes the works found using TDNNs and their primary focus, limitations, and performances.
TDNNs are a simple way to represent correlations between past and present values in a feedforward model, requiring lower computational effort when compared to other models. However, a longer training time and difficulties in capturing temporal dynamics are some of the significant drawbacks of TDNNs.
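The time-shifting idea behind TDNNs amounts to presenting delayed copies of the series as inputs to a feedforward model. A minimal sketch of that preprocessing step follows; the function name `make_delay_samples` and the parameter `delays` are hypothetical, and the network that consumes the samples is omitted.

```python
def make_delay_samples(series, delays=3):
    """Build (delayed inputs, target) pairs for a feedforward model."""
    samples = []
    for t in range(delays, len(series)):
        # The inputs are the `delays` values immediately preceding time t.
        samples.append((series[t - delays:t], series[t]))
    return samples
```

For the series 1, 2, 3, 4, 5 with three delays, this yields the pairs ([1, 2, 3], 4) and ([2, 3, 4], 5). The fixed number of delays bounds how far back the model can look, which is one reason longer temporal dynamics are hard to capture with this representation.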

Recurrent Neural Networks
Recurrent Neural Network (RNN) models are powerful and robust because of their internal memory: they retain information about the inputs they have received, which allows them to predict future events. Hence, they are helpful in modeling sequence data such as time series. In the literature, a considerable number of works on traffic state prediction are based on the standard RNN, Long Short-Term Memory (LSTM), and the Gated Recurrent Unit (GRU), which are briefly described in the following sections. Unlike traditional NNs, RNNs feed the output of previous steps into the input of the current state cell, which makes them particularly suitable for predicting future scenarios from the sequential characteristics of the data. Ulbricht [43] pioneered the use of RNNs for traffic forecasting with a multi-recurrent NN, which exhibited improved performance compared with conventional statistical methods. To improve the accuracy, in particular for datasets characterized by instability, dynamic fluctuations, and unpredictability, Yun et al. [44], Dia [45], and Ishak et al. [46] proposed time-delayed recurrent models, achieving a MAPE of around 4-6%. Zhang [47] employed autocorrelation and cross-correlation analysis to construct more adequate models and, with careful parameter optimization, improved the overall accuracy. Bohan and Yun [48] applied LSTM, a GRU, and a Bidirectional RNN to the same datasets (GPS data), showing the feasibility of recurrent neural networks for traffic flow forecasting. Table 5 summarizes the works found using RNNs and their focus, limitations, and performances.
One major drawback of standard RNNs is the exploding and vanishing gradient problems, which cause difficulties in training the models.
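The gradient problem can be illustrated with a scalar recurrence. The sketch below assumes the simplest possible RNN, h_t = tanh(w * h_{t-1} + u * x_t), and computes the derivative of the final state with respect to the initial state by the chain rule; the function name `gradient_through_time` is hypothetical.

```python
import math


def gradient_through_time(w, u, inputs, h0=0.0):
    """Return d h_T / d h_0 for h_t = tanh(w * h_{t-1} + u * x_t)."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + u * x)
        # Chain rule: each step multiplies by w * (1 - tanh^2).
        grad *= w * (1.0 - h * h)
    return grad
```

With zero inputs and a zero initial state, the hidden state stays at zero and each step multiplies the gradient by exactly w. A recurrent weight of 0.5 therefore gives 0.5^T, which shrinks toward zero as the sequence length T grows (vanishing), whereas a weight of 2 gives 2^T, which grows without bound (exploding).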

Long Short-Term Memory NNs
The Long Short-Term Memory (LSTM) model was proposed to overcome the gradient vanishing problem in traditional RNNs, which prevents the Vanilla RNN from capturing long-term dependencies (Hochreiter et al. [49]). The LSTM model employs a gating mechanism that allows deciding when and how to update its memory state. In the work of Ma et al. [50], an LSTM was applied to automatically determine the optimal time lags and overcome the backpropagation error decay problem. However, they considered only the temporal dependencies to be captured, resulting in relatively high errors and less robustness. Khan et al. [51] addressed incomplete data by utilizing a masking and imputation scheme, achieving a MAPE of 2.10% for annual daily forecasting. Moreover, Jia et al. [52] combined rainfall data in addition to speed data as the input and further improved the robustness and accuracy. Zhao et al. [53] took into consideration the spatiotemporal correlation in traffic using a 2D network, effectively improving both robustness and accuracy. Lu et al. [54] further improved the performance of LSTM by introducing cascading Temporal-aware Convolutional Context (TCC) blocks and a Loss-Switch Mechanism (LSM) to counteract non-Gaussian disturbances effectively. Table 6 summarizes the works found using LSTMs and their limitations and performance.
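The gating mechanism can be sketched with a single scalar LSTM cell. This follows the commonly used formulation with forget, input, and output gates; it is an illustration rather than the exact variant used in any of the cited works, and the function name `lstm_step` and the parameter dictionary layout are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps parameter names to floats."""
    # Forget gate: how much of the old memory to keep.
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])
    # Input gate: how much of the new candidate to write.
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])
    # Output gate: how much of the memory to expose.
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])
    # Candidate memory content.
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])
    # Additive memory update; this path is what eases gradient flow.
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c
```

When the forget gate saturates near 1 and the input gate near 0, the memory state is carried over almost unchanged from one step to the next, which is how the cell can preserve information over long time lags.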

Gated Recurrent Unit NNs
The Gated Recurrent Unit (GRU), a variation of the LSTM model, was introduced by Cho et al. [55]. Although the performances of LSTM and the GRU are similar in many applications, GRU networks contain fewer parameters and are faster to train. Fu et al. [56] were one of the first to apply a GRU on the PeMS [57] datasets for traffic forecasting, showing slightly better performance and faster convergence than LSTM. To improve the accuracy, Zhao et al. [58] proposed a data fusion method to fuse the information of two different datasets and applied a GRU for travel time prediction. Bartlett et al. [59] considered the computational cost and network structure optimization and proposed three recurrent neural network models, with the GRU model outperforming the others, achieving an RMSE of 9.26%. To further enhance the accuracy and robustness, Pu et al. [60] integrated a decay mechanism as extra gates of the GRU model to handle the missing value problem. Model transferability and reproducibility can be improved by considering both temporal and local features in traffic flow. An attention-based GRU model was proposed by Khodabandelou et al. [61], achieving an MAE of 1.26 for a 1 h data sampling rate. Table 7 indicates works found using GRUs and their focus, limitations, and performances.
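For comparison, a scalar GRU step can be sketched as follows. The interpolation convention used here, h = (1 - z) * h_prev + z * h_tilde, is one of two equivalent conventions found in the literature (the other swaps the roles of z and 1 - z); the function name `gru_step` and the parameter dictionary layout are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, p):
    """One step of a scalar GRU cell; p maps parameter names to floats."""
    # Update gate: balance between old state and new candidate.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of the old state feeds the candidate.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate state computed from the input and the reset old state.
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # No separate memory cell: the hidden state is updated directly.
    return (1.0 - z) * h_prev + z * h_tilde
```

The cell has three groups of parameters (update gate, reset gate, candidate), whereas an LSTM cell has four (forget, input, and output gates plus the candidate), which is the source of the smaller parameter count and faster training mentioned above.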

Convolutional Neural Networks
A Convolutional Neural Network (CNN) contains layers such as convolution, max pooling, and fully connected layers apart from the input and output layers. Unlike traditional feedforward NNs, in which each layer is fully connected to the next, the convolution layers in CNNs are connected locally through sliding filters, enabling the extraction of relevant features. Ma et al. [62] proposed a CNN-based network-wide speed prediction model that converts spatiotemporal traffic dynamics into the image space, outperforming other algorithms with an average accuracy improvement of around 42.91%. Zang et al. [63] further improved the results with the introduction of a three-channel CNN. Although they could slightly improve the training process and accuracy, the robustness was still a concern. In the work of Yu et al. [64], a Spatiotemporal Recurrent Convolutional Network (SRCN) was proposed that exploits the advantages of Deep CNNs (DCNNs) and LSTM. To improve the scalability and accuracy, Fouladgar et al. [65] considered a decentralized method where each node can accurately predict in real time based on the neighboring stations' state, utilizing a regularized Euclidean loss function. Table 8 summarizes the works found using CNNs and their focus, limitations, and performances.
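The locally connected sliding filter can be sketched on a small traffic "image". The layout assumed here, with rows as time steps and columns as road sections, is an illustrative choice and not necessarily the one used in the cited works; the function name `conv2d_valid` is hypothetical. As in most deep-learning libraries, the operation is implemented as a cross-correlation without padding.

```python
def conv2d_valid(image, kernel):
    """Slide a 2D filter over a matrix without padding ("valid" mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # Each output value depends only on a local kh x kw patch,
    # that is, on nearby time steps and nearby road sections.
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]
```

Applying the 2 x 2 filter [[1, 0], [0, 1]] to the 3 x 3 matrix [[1, 2, 3], [4, 5, 6], [7, 8, 9]] gives [[6, 8], [12, 14]]. Because the same filter weights are reused at every position, the number of parameters does not grow with the size of the network being observed.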

Deep Belief Networks
Deep Belief Networks (DBNs) consist of multiple layers of Restricted Boltzmann Machines (RBMs) with nondirectional connections between the layers and are able to learn a probability distribution over the input data. Hong et al. [66] proposed a multitask grouping neural network with a regression output layer at the top and a DBN at the bottom that achieved around 91.7% accuracy in traffic flow forecasting. Tan et al. [67] introduced two DBNs, one having Gaussian visible units and binary hidden units and the other having all binary units, with results showing improved accuracy but less robustness. Chen et al. [68] combined a DBN with Gaussian-Bernoulli restricted Boltzmann machines and a BPNN to improve the accuracy further, but robustness was still a concern. To enhance the prediction accuracy and robustness, Koesdwiady et al. [69] correlated weather parameters and traffic flow by employing a decision-level data fusion scheme. In the work of Bao et al. [4], the weather condition was also used, and the authors employed Support Vector Regression (SVR) to derive an improved DBN, which showed a good improvement in both robustness and accuracy. Table 9 summarizes the works found using DBNs and their focus, limitations, and performances.

Fuzzy Neural Networks
Fuzzy Neural Networks (FNNs) combine the merits of fuzzy systems and NNs. They can learn membership functions and appropriate fuzzy rules by engaging the adaptive approximation ability of NNs. Additionally, FNN models have better interpretability compared to NN-based models. Yin et al. [70] proposed an online-training-based FNN where a fuzzy approach was used to cluster the data and an NN was used to specify the input-output relationships. The results showed good performance, in particular under low traffic fluctuation. Quek et al. [71] introduced a Pseudo-Outer-Product FNN using the Truth-Value-Restriction method (POPFNN-TVR), but it was less capable of counteracting noisy data. Zhao [72] combined an Interval Type-2 Fuzzy Neural Network (IT2FNN) with a self-organizing learning algorithm, which failed to achieve a performance improvement. However, Li [73] was successful in improving the accuracy by introducing Dynamic Fuzzy Neural Networks (D-FNNs) for traffic flow prediction. Still, the model showed a lack of robustness and a relatively slow learning process. Tang et al. [74] mainly aimed at improving the learning ability by suggesting an FNN model with both unsupervised and supervised learning processes: a K-means method and a Gaussian fuzzy membership function were employed in the unsupervised process, whereas a weighted recursive least squares estimator was used in the supervised process. They not only improved the learning ability, but also achieved a 5% improvement in accuracy. In the work of An et al. [75], the focus was on robustness, with a Fuzzy-based Convolutional Neural Network (F-CNN) method proposed to incorporate uncertain traffic accident information, achieving a superior performance compared to other state-of-the-art works. Table 10 summarizes the works found using FNNs with their limitations and performances.
To solve the problem of the fuzziness and uncertainty of traffic states in a signalized intersection, Stacked Autoencoder (SAE) models are commonly employed. Lv et al. [76] and Yang et al. [77] were among the pioneers who applied the SAE model to traffic forecasting. They used SAEs to learn generic traffic flow features and trained them in a greedy layerwise fashion. Although the accuracy was promising, the models lacked robustness. Xiang and Chen [78] proposed a denoising SAE model consisting of K-means clustering and deep autoencoder networks to improve the robustness and accuracy, reaching 91.5% and 88% accuracy on simulation and empirical data, respectively (7.1% better than other decision-tree models). Table 11 indicates the works found using AEs with a focus on their limitations and performance.
Real-time information can be used to predict link travel times and is suitable for use in Modular Neural Networks (MNNs). Generally, unsupervised clustering techniques and MNNs are used to classify and predict link travel times, respectively. In the work of Park et al. [79], it was found that the MNN could give the best overall results compared to other relevant models. Ishak and Alecsandru [80] proposed multimodal techniques to improve prediction performance, but the results showed a lack of robustness. Vlahogianni et al. [81] suggested an MNN consisting of temporal genetically optimized structures of MLPs and showed a good improvement in accuracy with an MSE of 8.21%. Table 12 indicates the works found using MNNs and their focus, limitations, and performances.

Self-Organizing Neural Networks
These traffic forecasting models are based on Self-Organizing map Neural Networks (SONNs) and Self-Organizing Fuzzy Neural Networks (SOFNNs).
Tung and Quek [82] combined the fuzzy approach with a self-organizing neural network and proposed the Generic Self-organizing Fuzzy Neural Network (GenSoFNN) algorithm, which showed encouraging performance, obtaining an MSE of 0.244. Boto-Giralda et al. [83] proposed a SONN model based on a stationary wavelet denoising process and a fuzzy ARTMAP. Ll and Huang [84] proposed a traffic forecasting model using Autoregressive (AR) methods based on a Self-Organizing Map (SOM) neural network, significantly improving the prediction accuracy, yielding considerably better performance than other methods. Table 13 presents the works found using SONNs and their focus, limitations, and performances.

Bayesian Neural Networks
When the Bayesian Combined Predictor (BCP) uses an artificial neural network, it is called a BNN. Such a design intends to combine the strengths of neural networks and stochastic modeling. BNN models can generate a complete posterior distribution and produce probabilistic guarantees of the predictions (Petridis et al. [85]). Chan et al. [86] proposed an Adaptive Particle Swarm Optimization (APSO) utilizing Bayesian regularization to minimize the overfitting problem, showing relevant efficiency improvements in traffic forecasting. To improve the accuracy, Gu et al. [87] proposed an Improved Bayesian Combination Model with Deep Learning (IBCM-DL) to increase not only the accuracy, but also the stability. AlKheder et al. [88] focused on evaluating the impacts of adjacent intersections in terms of the traffic volume; using a BCNN, the authors were able to show improvements in both model coherency and accuracy, with an average MSE of 0.003468 during weekdays. Table 14 presents the works found using BNNs and their focus, limitations, and performances.
Table 15 presents the works found using RANs and their focus, limitations, and performances.
First introduced by Goodfellow et al. [92], Generative Adversarial Networks (GANs) are composed of two NNs competing against each other in order to generate new synthetic instances of data that can pass for real data. As a GAN can learn the joint distribution of the data and more effectively address the blurry prediction issue, it can be used to learn the distribution of future traffic flows conditioned on previous traffic flows and to take the most likely sample from that distribution as the prediction result. Liang et al. [93] proposed a deep Generative Adversarial Architecture (GAA) for network-wide prediction consisting of two LSTMs, and the experimental results showed much better performance compared to a BNN. To further increase the accuracy, Zhang et al. [94] proposed TrafficGAN employing both the CNN and LSTM models, which achieved an MAE of 1.76 during weekdays for a 30 min prediction horizon. In the work of Liang Zhang et al. [95], a Self-Attention Generative Adversarial Network (SATP-GAN) was proposed that used Reinforcement Learning (RL), showing an improvement of 6.5% over baseline methods. Different approaches of integrating rules as inductive biases into deep-learning-based prediction models were evaluated by Li et al. [96], confirming the usefulness of GANs in achieving better performance. Table 16 presents the works found using GANs with their focus, limitations, and performances.

Hybrid Schemes
Hybrid approaches have also been commonly employed in short-term traffic flow forecasting; in fact, most recent works are based on hybrid approaches because they outperform the other methods.

ARIMA, BPNNs, and GARCH
In these approaches, the linear features of the time series are first captured by an ARIMA model. For nonlinear features, a BPNN is then employed. To overcome the BPNN's disadvantage of slow convergence and to avoid falling into local minima, the Simulated Annealing (SA) algorithm is used (Yang et al. [97]). The joint ARIMA and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) modeling approach can improve short-term ridership forecasting by accounting for dynamic volatility, providing not only the expected value, but also the prediction interval (Lin et al. [98], Ding et al. [99]).
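As a rough illustration of how a volatility model turns a point forecast into a prediction interval, the sketch below applies the standard GARCH(1,1) variance recursion to one step ahead. The point forecast is assumed to come from a separately fitted mean model such as ARIMA; the parameter values, the function name, and the Gaussian multiplier `z = 1.96` (an approximate 95% interval) are assumptions made for the example, not settings from the cited works.

```python
import math


def garch_interval(point_forecast, last_residual, last_variance,
                   omega, alpha, beta, z=1.96):
    """One-step-ahead prediction interval from a GARCH(1,1) recursion.

    sigma^2_{t+1} = omega + alpha * eps_t^2 + beta * sigma^2_t
    """
    next_variance = omega + alpha * last_residual ** 2 + beta * last_variance
    half_width = z * math.sqrt(next_variance)
    return point_forecast - half_width, point_forecast + half_width, next_variance


# Hypothetical inputs: a forecast of 500 vehicles/h, the last forecast error,
# the last conditional variance, and made-up GARCH coefficients.
low, high, variance = garch_interval(
    point_forecast=500.0, last_residual=30.0, last_variance=400.0,
    omega=20.0, alpha=0.1, beta=0.8,
)
```

A large recent forecast error widens the interval for the next step, which is how dynamic volatility enters the forecast.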

KNN-LSTM
Generally, in KNN-LSTM schemes, the KNN is used to capture spatial features and the LSTM to model the temporal variability of traffic flow. A two-layer LSTM network can be applied to predict traffic flow, and the final prediction results are obtained by result-level fusion with the rank-exponent weighting method. This scheme exhibits competitive performance when compared with well-known prediction models (Luo et al. [100]). Li et al. [101] introduced a Diffusion Convolutional Recurrent Neural Network (DCRNN), achieving an MAE of 2.07 for a 1 h prediction horizon. Yu et al. [102] proposed a Spatiotemporal Recurrent Convolutional Network (SRCN) combining DCNN and LSTM, which showed superior results in both long- and short-term forecasting. Allström et al. [103] combined both parametric and nonparametric approaches in an ensemble Kalman filter, obtaining a MAPE of 6.1 for a 30 min prediction horizon. Kolidakis et al. [104] combined Singular Spectrum Analysis (SSA) with Artificial Neural Networks (ANNs) to provide proactive decisions to mitigate the economic and environmental impacts of traffic congestion. Table 17 indicates the works found using hybrid schemes and their primary focus, limitations, and performances.
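The two non-neural steps of a KNN-LSTM scheme can be sketched as follows: selecting the stations most similar to the target, and fusing their per-station predictions with rank-based weights. This is a minimal sketch under stated assumptions: Euclidean distance stands in for whatever similarity measure a given work uses, the weights follow the common rank-exponent form w_i proportional to (K - r_i + 1)^p, and the per-station predictions are taken as given rather than produced by an LSTM. All names and numbers are hypothetical.

```python
def k_nearest_stations(target_history, station_histories, k):
    """Station ids ranked by Euclidean distance of their flow history to the target's."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    ranked = sorted(
        station_histories,
        key=lambda sid: distance(target_history, station_histories[sid]),
    )
    return ranked[:k]


def rank_exponent_weights(k, exponent=2.0):
    """Weights for ranks 1..k (rank 1 = most similar station); they sum to 1."""
    raw = [(k - rank + 1) ** exponent for rank in range(1, k + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def fuse_predictions(ranked_predictions, exponent=2.0):
    """Result-level fusion: weighted sum of predictions ordered by station rank."""
    weights = rank_exponent_weights(len(ranked_predictions), exponent)
    return sum(w * p for w, p in zip(weights, ranked_predictions))


# Hypothetical flow histories for three candidate stations.
histories = {"A": [11, 19, 31], "B": [50, 60, 70], "C": [12, 22, 33]}
neighbours = k_nearest_stations([10, 20, 30], histories, k=2)

# Hypothetical per-station forecasts, ordered from most to least similar.
fused = fuse_predictions([100.0, 110.0, 130.0], exponent=2.0)
```

Raising the exponent concentrates the weight on the most similar stations; an exponent of zero reduces the fusion to a plain average.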

Traffic Signal Control
In this section, different intersection-traffic-signal-control systems and policies are discussed. In the literature, several strategies and policies were found during our studies, such as fixed-time traffic signal control, i.e., Webster's method, the SCAT, the SCOOT, Urban Traffic Optimization by Integrated Automation (UTOPIA), ImFlow, MaxPressure, the Generalized Proportional Allocation (GPA), and P0. Various machine-learning algorithms and controllers were also identified, such as Q-learning, neural networks, neuro-fuzzy methods, hybrid deep Q-networks, Deep RL, and Boosted GAs. In a multi-agent deep-reinforcement-learning system, traffic light duration is controlled by analyzing independent and shared rewards based on a given objective, for example waiting time and number of waiting vehicles. Figure 4 depicts the main blocks of a common deep-learning-based intersection-traffic-signal-control model.

Fixed-Time Traffic Signal Control
Fixed-time signal-control methods use repeated signal cycles with the same phase structure and are commonly employed in real-world scenarios, mainly due to their low cost of implementation. Their signal parameters, including phase sequences, cycle lengths, green splits, and offsets (for signal coordination), are calibrated by analyzing past traffic data. TRANSYT (Hale [106]) is the most popular of these control methods. Because traffic demand usually varies over time, the Time-Of-Day (TOD) mode is often used, which consists of a collection of distinct signal plans for different times of the day, such as peaks and off-peaks (Zheng et al. [107]). On the other hand, robust signal optimization is used to deal with traffic flow uncertainty, i.e., a scenario-based technique is used in order to ensure the performance of fixed-time signal systems (Zhang et al. [108]). For undersaturated and oversaturated demands, the unifying goals of these methods are to minimize vehicle delay and maximize intersection capacity, i.e., vehicle throughput. Signal timing optimization now takes queue lengths into account as well. For example, Jang et al. [109] devised a signal-optimization approach for the equalization of queue growth rates across connections in oversaturated road networks. Osorio and Wang [5] proposed a probabilistic network model to analytically approximate the stationary aggregate joint queue-length distribution of subnetworks. Hence, the developed model could be used to control traffic in cities. Furthermore, spill-backs have been considered in recent studies on signal timings, which have focused on the effects of delay variability. In addition to lowering vehicle delay, signal optimization also aims to minimize delay variability and spill-back likelihood (Mohajerpoor et al. [110]).

Webster's Method
The design of fixed-time (FT) splits under known (historical) constant demand rules by Webster, 1958 [111], and Webster and Cobbe, 1966 [112], has been extensively used in the last 50 years. It is efficient as long as traffic conditions are undersaturated, but fails when queues form in network links due to increasing demand. Kouvelas et al. [113] employed Webster's procedure within a Traffic-responsive Urban Control (TUC) for real-time operation, and the test implementation showed an average increase of speed by 11.3% compared to Traffic-Actuated Signal plan Selection (TASS) in relatively unsaturated conditions. Aiming at designing traffic signal timing at oversaturated intersections, Eriskin et al. [6] proposed an elimination pairing system and compared the proposal with Webster's method. The results showed the inefficiency of Webster's method to handle oversaturated traffic. Ali et al. [114] combined fuzzy logic and the Webster optimum cycle formula, showing an increase of the average waiting time by (18-34)% relative to MaxPressure and fixed-time, respectively. Considering intersection delay, fuel consumption levels, and emissions, Calle-Laguna et al. [115] applied Webster's method to estimate the optimum cycle length and eventually found an overestimation of the method. Table 18 presents the works found using Webster's method and their primary focus, limitations, and performances.
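For reference, Webster's optimum cycle formula is commonly written as C0 = (1.5 L + 5) / (1 - Y), where L is the total lost time per cycle in seconds and Y is the sum of the critical flow ratios (demand divided by saturation flow) over all phases, with the effective green time then shared among phases in proportion to their flow ratios. The sketch below implements this textbook form; the function name and the example inputs are illustrative assumptions.

```python
def webster_timing(flow_ratios, lost_time):
    """Webster's optimum cycle length and proportional green splits.

    flow_ratios: critical flow ratio y_i (demand / saturation flow) per phase.
    lost_time:   total lost time per cycle L, in seconds.
    Returns (cycle_length, effective_green_per_phase), both in seconds.
    """
    total_ratio = sum(flow_ratios)
    if total_ratio >= 1.0:
        # The formula diverges as Y approaches 1: demand meets or exceeds capacity.
        raise ValueError("oversaturated: sum of critical flow ratios >= 1")
    cycle = (1.5 * lost_time + 5.0) / (1.0 - total_ratio)
    effective_green = cycle - lost_time
    greens = [effective_green * y / total_ratio for y in flow_ratios]
    return cycle, greens


# Hypothetical two-phase intersection with 10 s of lost time per cycle.
cycle, greens = webster_timing([0.30, 0.25], lost_time=10.0)
```

The denominator makes the limitation noted above visible: as the summed flow ratio approaches 1, the computed cycle length grows without bound, so the method gives no usable timing in oversaturated conditions.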

Sydney Coordinated Adaptive Traffic System
The Sydney Coordinated Adaptive Traffic System (SCAT) is unique, consisting entirely of computers and being adaptive to traffic demand, with its communication networks providing effective, yet flexible management of the system. The SCAT not only reduces delay, but also improves flow and decreases congestion, leading to a reduction of accidents and petroleum resource use, with the significant benefits of the decrease of air pollution and improving residential amenities (Sims and Dobinson [7]).

Split, Cycle, and Offset Optimization Technique
The Split, Cycle and Offset Optimization Technique (SCOOT) was designed for general applications within computerized urban traffic control systems, responsible for the coordination of the adjustment of the signal timings. An online computer with algorithms calculates and implements the timing predictions from vehicle detectors that are analyzed to minimize congestion. It was found that the SCOOT reduces vehicle delay by an average of 12% when compared to up-to-date optimized FT plans (Hunt et al. [8]). Bretherton [116] and Hansen et al. [117] implemented the SCOOT in different simulation environments to investigate its feasibility for real-time operations. Although the simulation results showed an average delay time reduction by (12-30)%, a performance deterioration was observed with the increased network space. Table 19 presents these works and their focus, limitations, and performances.

Urban Traffic Optimization by Integrated Automation
Urban Traffic Optimization by Integrated Automation (UTOPIA) aims to respond to fluctuations in traffic patterns by adjusting signal timing following traffic demand to reduce traffic congestion, delays, and travel time. Absolute priority assignment is used to select public vehicles and private traffic optimization in all traffic conditions. A deeper analysis demonstrated that the system is capable of handling traffic in heavy traffic conditions, i.e., at peak hours, with gains arising over 35% (Mauro and Di [118]). Wahlstedt [119] and Pavelski et al. [120] simulated UTOPIA using the VISSIM platform to evaluate its potentiality for real-time implementation. The simulation results demonstrated the performance of UTOPIA in the reduction of the average delay time and queue length. Table 20 indicates these works and their focus, limitations, and performances.

ImFlow
ImFlow is a self-optimizing signal system with distributed intelligence and a structure similar to UTOPIA. The optimization is performed in two steps: (i) stage-based optimization at the network/route level based on a cost function with user-defined weights; (ii) signal-group-based optimization at the intersection level based on logical rules. A simulation showed an average reduction of the delay per bus by (26-35)% (Wahlstedt [119]).

MaxPressure/BackPressure Traffic Signal Control
The problems with infrastructure and the cost of centralized approaches motivate the emergence of decentralized control, i.e., a local traffic controller at a given intersection that only requires information from adjacent links; therefore, the required communication infrastructure is minimal. A decentralized algorithm for traffic signal control is MaxPressure (MP), sometimes called BackPressure, which Tassiulas and Ephremides initially developed in 1990 [121], although it was first adopted in urban traffic networks by Varaiya in 2013 [122]. The MP traffic controller, which requires the measurement of queue length, has the advantages of: (i) simple computation; (ii) no need for traffic demand knowledge; and (iii) not requiring the use of a fixed cycle time; instead, time-step-based policies are the actuating method. Le et al. [123] adapted a BackPressure scheme to study its stabilizing efficiency under any traffic demand, and the simulation showed an average reduction of travel time by 20.3%. To increase MP's accuracy, particularly in high-congestion situations, Gregoire et al. [124] proposed taking into account the queue capacities for the computation of the normalized pressures. Zaidi et al. [125] proposed a multicommodity BackPressure algorithm that showed significant improvement over a Fixed Schedule (FC) controller and a single-commodity backpressure controller in terms of queue length and travel times. Levin and Boyles [126] studied reservation-based intersection-control schemes using MP and P0, in particular for autonomous vehicles to improve throughput. Results on the downtown Austin network showed significant performance improvement over other baselines, although they failed to prove it to be actually throughput-optimal. Table 21 indicates these works and their primary focus, limitations, and performances.
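The core of a MaxPressure controller fits in a few lines: at each time step, compute a pressure for every phase from local queue measurements and activate the phase with the largest value. The sketch below uses a simplified link-level pressure (upstream queue minus downstream queue, weighted by saturation flow) rather than the per-movement, turning-ratio-weighted form of the original formulation; the link names, queue values, and function names are hypothetical.

```python
def phase_pressure(movements, queues, saturation_flows):
    """Sum over a phase's movements of saturation flow * (upstream - downstream queue)."""
    return sum(
        saturation_flows[(up, down)] * (queues[up] - queues[down])
        for up, down in movements
    )


def max_pressure_phase(phases, queues, saturation_flows):
    """Name of the phase with the highest pressure at this time step."""
    return max(
        phases,
        key=lambda name: phase_pressure(phases[name], queues, saturation_flows),
    )


# Hypothetical four-leg intersection: each phase serves two through movements.
phases = {
    "NS": [("N_in", "S_out"), ("S_in", "N_out")],
    "EW": [("E_in", "W_out"), ("W_in", "E_out")],
}
queues = {
    "N_in": 12, "S_in": 8, "E_in": 3, "W_in": 4,
    "N_out": 2, "S_out": 1, "E_out": 0, "W_out": 5,
}
saturation_flows = {movement: 1.0 for ms in phases.values() for movement in ms}

active = max_pressure_phase(phases, queues, saturation_flows)
```

Only queue lengths on adjacent links are needed, which reflects the minimal communication requirement and the absence of a fixed cycle described above.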

Generalized Proportional Allocation Policies
Generalized Proportional Allocation (GPA) policies are decentralized and fully scalable, as they rely on local feedback information only. They do not require any global information about the network topology, the exogenous inflows, or the routing, which makes them robust (Nilsson and Como [127]). Moreover, they consider the overhead time while switching between services (Nilsson and Como [128]). Although GPA is yet to be implemented in real time, Nilsson and Como simulated GPA using the SUMO platform to evaluate its potentiality for real-time operation. The simulation demonstrated a significant improvement in robustness, scalability, and performance relative to other state-of-the-art works. Table 22 presents these works and their focus, limitations, and performances.

P0 Policy
The P0 policy, first introduced by Smith [131], considers both route costs and stage pressure as a function of flows and green time. When the network is in free-flow conditions, it provides a highly accurate approximation of the maximum throughput. However, under cost imbalance conditions, i.e., when congestion appears, it has some difficulties in approximating the maximum throughput (Cantelmo et al. [132], Smith et al. [133]).

Q-Learning Controller
Many existing traffic-control systems need a predefined model of the traffic environment to achieve optimal performances. In Q-learning, no prespecified environment model is required, and the relationship among actions, states, and the environment is learned by interaction with the environment. One of the advantages of reinforcement learning is that such algorithms are truly adaptive. They can respond to dynamic sensory inputs from the environment and to a dynamically changing environment through ongoing learning and adaptation. Since the one-step Q-learning algorithm updates the Q-estimates at short intervals in conjunction with each action, it is adaptable to online real-time learning. Furthermore, Q-learning is an off-policy algorithm because it gains valuable experience while exploring actions that may later be nonoptimal. Abdulhai et al. [134] were among the first to introduce Q-learning in heavily congested intersection traffic signal control, showing encouraging results. Wiering et al. [135] proposed an adaptive optimization algorithm based on RL and compared it against nonadaptive controllers using the Green Light District (GLD) simulator; better performance was observed mainly for heavy traffic. To maximize throughput, Wunderlich et al. [136] proposed a Longest-Queue-First Maximal-Weight-Matching (LQF-MWM) algorithm utilizing the arbitrary assignment of high priority that outperformed other baselines in high-load conditions. A five-intersection traffic network was studied by Arel et al. [137] using a multi-agent RL approach, where an autonomous intelligent agent governed each intersection, with experimental results demonstrating the advantages of multi-agent-RL-based control over LQF. Another method to enhance the performance was suggested by Prashanth et al. [138], which, by incorporating multiple timescale stochastic approximation in a policy gradient actor-critic algorithm, obtained better performance than standard Q-learning approaches.
In the work of Abdoos et al. [139,140], a relatively large network was modeled using multi-agent systems, exploring Q-learning and holonic Q-learning approaches to control signals. Experimental results demonstrated the superior performance of holonic Q-learning in preventing oversaturation, reducing average delay, and increasing throughput. Information sharing among signal controllers was explored by Aziz et al. [141] by proposing an R-Markov Average-Reward-Technique-based RL (RMART) algorithm that not only outperformed the baselines in overcrowded conditions, but also significantly reduced emissions. Genders and Razavi [142] used an asynchronous n-step Q-learning algorithm with two NN hidden layers as the agents, showing a reduction of the total mean delay by 40% without compromising throughput. Table 23 presents these works and their focus, limitations, and performances.
The challenge for all Q-Learning Controllers (QLCs) is managing a very large state-action space. Without enough training examples, Q-learning has difficulties converging to the optimal point. However, Q-learning is a beneficial method since it includes an online-learning scheme to adapt to new situations.
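The one-step update underlying the controllers above is the standard tabular Q-learning rule, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The sketch below shows that rule together with epsilon-greedy action selection. The state encoding (discretized queue levels per approach), the two actions, the reward, and the hyperparameters are assumptions chosen for illustration and do not reproduce any of the cited controllers.

```python
import random
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One-step Q-learning update; off-policy because it bootstraps from the max."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])


def epsilon_greedy(q_table, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])


ACTIONS = ["keep_phase", "switch_phase"]
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# Hypothetical transition: queues were (high, low); keeping the phase cost a
# reward of -5 (e.g., negative waiting vehicles) and led to (medium, low).
q_update(q_table, ("high", "low"), "keep_phase", -5.0, ("medium", "low"), ACTIONS)

# With exploration switched off, the agent now prefers the untried action.
choice = epsilon_greedy(q_table, ("high", "low"), ACTIONS, 0.0, random.Random(0))
```

The table grows with every distinct state-action pair, which is the state-action space problem mentioned above and a main motivation for replacing the table with a neural network.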

Neural Network Controller
Artificial Neural Network (ANN) models have been widely used in traffic signal control because of their nonlinear mapping, self-adapting, self-organizing, and self-learning capabilities compared to the traditional methods. They are suitable for modeling the nonlinear characteristics of traffic states. To address the changing traffic patterns, Hua and Faghri [143] proposed a multilayer NN-based traffic-signal-control approach for an isolated intersection that paved the way for future research using ANNs. To improve the timing of traffic signals at intersections, Spall and Chin [144] used an ANN that showed approximately 10% improvement in the mean wait time. Saito and Fan [145] focused on finding optimal signal timing by presenting a feasibility testing platform named the Optimal Traffic Signal Control System (OTSCS), applied to the Optimal Traffic Signal Timing Model (OTSTM) based on an ANN, which reduces the time to reach the optimal solution. Kim et al. [146] studied the applicability of ANNs for the cycle-length design of Adaptive Traffic Control Systems (ATCSs), reducing the cycle length by 8.3% in saturated traffic conditions. Intersection traffic signal control solely based on video images instead of conventional traffic parameters, such as delays and queue lengths, was proposed by Jeon et al. [147], achieving a 23% delay time reduction compared with other baselines. To further enhance the performance, Bernas et al. [148] proposed a neuro-evolution strategy, which, compared with other decentralized baselines, showed a superior reduction of delay time. Table 24 summarizes these works, presenting their focus, limitations, and performances.
Because a neural network can learn and self-adapt, while a fuzzy system deals efficiently with the uncertainty and inaccuracies of real systems by using if-then rules, a hybrid approach combining neural networks and fuzzy logic generally provides excellent results.
Mir and Hassan [149] proposed a neuro-fuzzy-based approach where a Fuzzy Logic System (FLS) was used for model training and an NN was used for the calculation of the green light time, proving the potentiality of an efficient traffic signal control. Dong et al. [150] combined an NN and FLS to derive an Adaptive Fuzzy Neural Network (AFNN) algorithm that reduced the delay time by 8.45% with a 24.04% increase in average fuel economy.
To further enhance the performance by taking into account the traffic conditions on both the current lane and the adjacent lane, Mittal and Chawla [151] proposed a hybrid neuro-fuzzy-based adaptive system that, in comparison with FLS- and FT-based systems, reduced the intersection waiting time by (22.6-46.37)%. Table 25 presents these works with their primary focus, limitations, and performances.
The efficiency and accuracy of traffic signal control systems can be enhanced by fusing Deep Learning (DL) and Reinforcement Learning (RL). This type of approach can deal with large amounts of data processing, systematic perception, and expression, which is crucial to the coordinated control of arterial intersections (Chen et al. [152]). Luo et al. [153] combined DL and RL by utilizing the MDP and CNN, which reduced the queue length by 42.5% relative to DQN. Considering knowledge sharing among the agents, Li et al. [154] proposed the Knowledge-Sharing Deep Deterministic Policy Gradient (KS-DDPG) algorithm, which showed significant efficiency in controlling large-scale networks and coping with fluctuations in traffic flow. The inability of DRL algorithms to meet the demands of coordination among the agents inspired Wang et al. [155] to propose a Cooperative Group-Based Multi-agent reinforcement learning-ATSC (CGB-MATSC) framework that demonstrated a significant reduction of average waiting time by 42.08% relative to FT. Kekuda et al. [156] proposed an n-step State, Action, Reward, State, and Action (SARSA) algorithm to increase the implementability in low-cost real-time systems and compared it with LQF; it showed a 5.5% reduction of the average queue length. Table 26 indicates these works and their focus, limitations, and performances.
Table 26 (excerpt). Kekuda et al. [156]. Focus: proposed a low-cost real-time system using an n-step SARSA algorithm. Limitation: considered a risk-insensitive approach. Performance: reduction of the queue length by 5.5% relative to LQF.

• Combination of QL, NNs, and FL
Methods combining Q-learning, BP neural networks, and the fuzzy controller have shown promising efficient traffic signal control performance. In such approaches, QL and BPNNs are used to determine the optimal switching time of a particular phase and the fuzzy controller to select the optimal phase sequence (Zhao et al. [157]).

• Hybrid Deep Q-Networks
A hybrid deep Q-network combines both discrete and continuous DRL approaches to control traffic signals and simultaneously decide the proper phase and its associated duration. This type of framework can reduce the average queue length and travel time by a significant amount. Pálos and Huszák [158] [159], YOLOv3-tiny was retained and combined with OpenCV, and the traffic density was measured, which drives the signaling schemes using a trained DQN. For a multi-intersection scenario, it achieved an increase of the average speed by 18% compared with a static traffic light system. Table 27 summarizes these works concerning their primary focus, limitations, and performances.
Traffic control optimizations combining Machine Learning (ML) and Genetic Algorithms (GAs) are also efficient. In the work of Mao et al. [160], the Extreme-Gradient Decision-Tree (XGBT) and a GA were combined to reduce the total travel time by almost half when used under incident conditions.

Discussion
A systematic literature search was performed in the Science Direct, Scopus, and Google Scholar databases with the following keywords in various combinations: "intelligent transportation", "intelligent traffic management and control", "image processing and deep learning-based intelligent traffic management and control", "short-term traffic forecasting", "image processing and deep learning-based short-term traffic forecasting", "intersection traffic signal control", "image processing and deep learning-based intersection traffic signal control". One-hundred forty-four fully fledged research articles were finally selected based on the following inclusion criteria: most relevant, most cited, and most recent. For traffic state forecasting, the GAN-based methods and the hybrid approaches showed better performance on state-of-the-art datasets, i.e., PeMS (Li et al. [154], Zhang et al. [95]). For intersection signal control, DRL- and DQN-based approaches showed much better efficiency and robustness (Wang et al. [155], Bouktif et al. [9]) relative to other baselines. However, no model is self-sufficient to address all the problems, and hence, plenty of scope for improvement exists. A comparative analysis followed by a very brief summary of the fundamental research challenges is presented in the following sections.

Why Deep Learning?
Traffic states are generally affected by long-term and short-term traffic features. As an example, during the weekdays, traffic flow will always show a rapid increment and decrement in the morning and evening, respectively, referred to as long-term features, because it is affected by society's behaviors. There might be uncertain fluctuations due to adverse weather, traffic accidents, and other nonrecurrent events, which are called short-term features. For a model to capture these features, a considerable amount of data must be processed efficiently.
Moreover, the corrupted or missing value problem is common in time series data and is difficult to address with traditional machine-learning approaches. Additionally, the traffic states of the intersections are interrelated with those of their adjacent counterparts. Efficient intersection traffic signal control demands perceiving the environment correctly in order to take actions in a coordinated manner. Traditional machine-learning approaches have limitations in handling these demands. Deep-learning-based methods, on the contrary, have a much better ability to overcome these problems efficiently.

Comparative Analysis
One way to find out whether a method in ITMC is efficient or not is to analyze the number of documents published in recent times based on those techniques. In recent years, mainly from 2019, researchers have been applying the LSTM, GRU, CNN, GAN, DBN, FNN, BNN, RAN, and TDNN approaches in traffic state prediction, whereas for intersection signal control, RL, GPA, hybrid, and ANN approaches and Webster's method have been deployed. For traffic state prediction, out of the 71 studied articles, 39.2% of the works published between 2019 and 2021 were based on RNNs (LSTM and GRU more precisely). On the other hand, for intersection traffic signal control, 42.9% of the works out of 73 utilized reinforcement learning-based methods (DQN and DRL) within the same time horizon, as is depicted in Figure 5. However, the best way to judge the suitability of a method is to analyze it in terms of its performance. CNN-based traffic state prediction methods trained on datasets without any missing values revealed superior performance compared to other baselines. This is because CNN-based models can capture the spatiotemporal features more efficiently than other models. For example, an SRCN-based forecasting model achieved an RMSE of 4.32 on the PeMSD7 dataset (Yu et al. [64]). One major problem with this kind of method is robustness. With the arrival of nonrecurrent events such as congestion, their performance deteriorates. To achieve higher robustness, LSTM-based methods with an effective mechanism to counteract non-Gaussian disturbances showed much better performance relative to other methods, for example, in the work of Lu et al. [54].
On the contrary, a considerable amount of time series data is required for good performance, and it is challenging to find datasets without any missing value problems. The GAN-based approaches overcome these problems by providing new artificial data of the same quality as the training data. Furthermore, the CNN and LSTM embedded GAN-based methods have shown the best performance so far, by achieving an RMSE of 2.12 for prediction over a long time horizon (Zhang et al. [94]).
Multi-agent deep-reinforcement-learning-based methods are expected to be dominant over other state-of-the-art methods for intersection traffic signal control. Traffic states are highly unpredictable, and the states of an intersection depend on others. Hence, coordination among the different intersections is essential. Multi-agent deep-reinforcement-learning-based methods possess the provision to cope with this. For example, the knowledge-sharing deep deterministic policy gradient algorithm showed an average reduction of the queue length and intersection delay by 28.9% and 35.1%, respectively, relative to the MaxPressure (MP) method (Li et al. [154]).

Need for Better Datasets
Most of the works studied in the literature used personally collected datasets, as high-quality, fully public datasets are scarce. Nonetheless, PeMS, from the Caltrans Performance Measurement System, was the dataset most used by the researchers (Li et al. [101], Lu et al. [54]). The publicly available datasets found during our study are indicated in Table 28.

Reduction of Computational Complexity
Most state-of-the-art models, in particular DL models, typically require millions of parameters and billions of operations to produce human-level accuracy. Meeting these memory and computational requirements is challenging, in particular for deployment on low-power embedded platforms with limited power budgets (Maghazeh et al. [161]). Cloud-based infrastructures are a viable solution to this problem. However, privacy implications, the consumption of a significant amount of power, latency, and scalability are significant drawbacks that need to be addressed (Duan [162]).

Model Interpretability
Deep NNs have been found to be very efficient in handling the complex nature of traffic. However, the complexity of the models often makes the understanding of the prediction results difficult, and issues arise about these models' accuracy. The combination of FLS and NNs provides better model interpretability (Tang et al. [74]). However, with the increase in traffic complexity, they fail to provide optimal outputs. Hence, there are plenty of opportunities to enhance the models' interpretability.

Finding the Best Evaluation Methodologies
Different algorithms search for different trends and patterns. One algorithm may not be the best suited across all datasets. To find the best solution, it is necessary to evaluate them. Hence, evaluating how well a model generalizes to new and unseen data is very important. During this study, it was found that the F1-score, true positive rate, Mean Absolute Percent Error (MAPE), Mean Absolute Error (MAE), Root-Mean-Squared Error (RMSE), variance score, and R² value are often used as traffic forecasting model performance indicators. However, the Average Displacement Error (ADE), Final Displacement Error (FDE), and Maximum Distance (MaxDist) were also found to be used in some recent works.
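For reference, the most frequently reported forecasting indicators can be computed directly from paired observations and predictions. The sketch below uses the standard definitions of MAE, RMSE, MAPE (in percent), and R²; the sample values are made up for illustration.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-Mean-Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percent Error, in percent; undefined if any actual value is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and forecast flows (vehicles per hour).
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 380.0]
scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R2": r2_score(actual, predicted),
}
```

MAE and RMSE are expressed in the units of the predicted quantity, so values reported on different datasets or horizons are not directly comparable, whereas MAPE and R² are scale-free.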
For intersection traffic signal control, on the other hand, recent works use the average waiting time, queue length, travel time, intersection delay, and fuel economy as evaluation metrics. Figure 6 depicts the evaluation metrics typically used in different scenarios, based on the studied works. For problems related to probability prediction, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are most suitable, whereas for class label prediction, evaluation metrics should be selected based on the importance of the classes. For example, if all classes are equally important, "accuracy" can be used as an evaluation metric; otherwise, the F1-score, F2-score, and Matthews Correlation Coefficient (MCC) were found to be convenient. However, which metric is best suited to a particular problem, and whether new evaluation techniques are needed, remain to be addressed.
To evaluate the intersection signal control methods/strategies, researchers rely on powerful simulation environments. These simulation tools are helpful for testing and assessing different dynamic transportation issues that are challenging to solve in the real world. On top of that, simulation environments can replace actual experiments with trustworthy representations of the subject matter in a controllable computer program and allow researchers to compare algorithms and reproduce experiments. In most of the recent works, for example, in Nilsson and Como [130], Bouktif et al. [9], and Li et al. [154], the researchers employed the SUMO environment to evaluate their proposed methods. Table 29 indicates the simulation environments found during this study.
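To make the point about class importance concrete, the following minimal sketch computes the accuracy, F1-score, F2-score, and MCC from confusion-matrix counts, and the area under the ROC curve from predicted scores using its pairwise-ranking definition. The counts describe a hypothetical imbalanced incident-detection task (50 incidents in 1000 samples) and are not taken from any of the reviewed works; the function names are likewise illustrative.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score; beta=1 gives the F1-score, beta=2 weights recall more (F2)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient, in [-1, 1]; 0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a random positive sample
    is scored higher than a random negative one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical confusion-matrix counts: 50 true incidents among 1000 samples.
tp, fn, fp, tn = 30, 20, 10, 940
print("accuracy:", accuracy(tp, tn, fp, fn))
print("F1:", f_beta(tp, fp, fn, beta=1.0))
print("F2:", f_beta(tp, fp, fn, beta=2.0))
print("MCC:", mcc(tp, tn, fp, fn))
print("AUC:", roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

In this example the accuracy is 0.97 even though 20 of the 50 incidents are missed, while the F1-score, F2-score, and MCC all fall between 0.6 and 0.7, which illustrates why accuracy alone can be misleading when the classes are not equally important or equally frequent.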

Environmental Challenges
GPUs are often used to train and test NNs, as they deliver the highest arithmetic performance for 32-bit floating-point NN inference. However, operating at 200+ W, their use is becoming prohibitively expensive in terms of energy footprint. Research showed that the carbon footprint of training NNs on GPUs can be about five times the lifetime emissions of an average car (Strubell et al. [163]).

Conclusions
Traffic forecasting and intersection signal control are vitally important for an efficient ITMC system. For forecasting, data-driven approaches are gaining popularity because of their higher prediction power and accuracy. However, missing or imbalanced data make it difficult to find the optimal models. GANs can help overcome these difficulties by generating new artificial data that approximate the same unknown distribution as found in the limited training examples. For intersection signal control, multi-agent deep reinforcement learning and deep Q-networks can be explored in more detail to efficiently control multi-intersection traffic. In summary, with the advancement of image-processing and deep-learning technologies, ITMC research opens a new horizon that enables researchers to address more complex problems in a manageable way. Therefore, this review aimed to identify the state-of-the-art methods used in ITMC and systematically presented their structure, overall performance, and limitations.
Funding: This article is the result of the project Safe Cities-"Inovação para Construir Cidades Seguras", with reference POCI-01-0247-FEDER-041435, cofunded by the European Regional Development Fund (ERDF), through the Operational Programme for Competitiveness and Internationalization (COMPETE 2020), under the PORTUGAL 2020 Partnership Agreement.
Institutional Review Board Statement: Not applicable.