1. Introduction
Urban traffic congestion has emerged as one of the most pressing challenges facing cities worldwide as rapid urbanization continues to intensify. According to recent projections, approximately 68% of the global population is expected to reside in urban areas by 2050, placing unprecedented strain on existing transportation infrastructure [1]. The resulting traffic delays impose substantial economic costs, reduce quality of life, increase fuel consumption and emissions, and undermine the efficiency of urban transportation systems. Addressing these challenges requires innovative approaches to traffic management that leverage advances in data collection, computational capabilities, and predictive modeling. Improving urban traffic efficiency is also closely aligned with United Nations Sustainable Development Goal (SDG) 11, which aims to make cities inclusive, safe, resilient, and sustainable. Intelligent traffic management systems that reduce congestion, travel delays, and vehicle emissions contribute directly to more sustainable urban mobility and improved quality of life in rapidly growing cities.
In recent years, the proliferation of connected vehicles, roadside sensors, and floating car data has created new opportunities for real-time traffic monitoring and prediction. These data sources enable the collection of high-resolution traffic measurements at fine temporal and spatial scales, providing detailed insights into traffic conditions at individual intersections. Such granular data are particularly valuable for intersection-level traffic management, where delays are often concentrated and where effective intervention can yield substantial system-wide benefits. Accurate short-term prediction of traffic delay at intersections is therefore essential for supporting adaptive intersection control, dynamic route guidance, and real-time traveler information systems.
The development of deep learning methods has opened new avenues for traffic flow and delay prediction, overcoming many limitations of traditional statistical and shallow machine learning approaches. Recurrent neural networks, particularly Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU) [2,3], have demonstrated strong performance in modeling sequential traffic data due to their ability to capture long-term temporal dependencies [4,5,6]. These architectures have been successfully applied to various traffic prediction tasks, including traffic flow forecasting [7,8,9,10] and speed prediction [11]. More recently, attention mechanisms and graph-based models have been proposed to further enhance prediction accuracy by explicitly modeling spatial dependencies and dynamic relationships in traffic networks [12,13].
Despite these advances in model architecture, a critical aspect that has received comparatively less attention is the role of input feature selection in shaping prediction performance. Many existing studies focus primarily on developing more complex model architectures, treating input feature selection as a secondary consideration. However, the choice of input features—including which traffic measurements to include, how to engineer derived features, and which temporal lags to consider—can significantly influence prediction performance, particularly when data availability or computational resources are constrained. Understanding which types of features contribute most to prediction accuracy can inform more efficient data collection strategies, reduce model complexity, and improve interpretability. This gap is particularly important at intersections, where operational traffic dynamics such as queue formation and stop behavior strongly influence short-term delay patterns.
At intersections and roundabouts, traffic delays arise from the complex interplay of demand, capacity, signal timing, and queue dynamics. Conventional traffic engineering approaches have long recognized the importance of queue length, vehicle stops, and traffic intensity as key indicators of delay [14,15]. However, the relative importance of different feature groups—such as historical delay values, operational state measurements (queue length, stops), efficiency indicators (free-flow travel time ratios), and demand characteristics (traffic volume)—remains an open question in the context of deep learning-based prediction. Different feature groups may capture complementary or redundant information, and their relative contributions may vary depending on the prediction horizon, traffic conditions, and model architecture.
This study addresses this gap by conducting a systematic investigation of how different input feature configurations affect short-term traffic delay prediction at an urban intersection. Unlike prior studies that primarily focus on developing new prediction architectures, this work isolates the contribution of different feature categories under a controlled experimental framework. Accordingly, rather than introducing a novel model architecture, we adopt a controlled experimental design in which the model architectures (GRU and LSTM) and hyperparameters are held constant across multiple scenarios, while the input feature set is varied systematically. This approach allows us to isolate the effect of input features on prediction performance and to draw conclusions about the relative importance of different traffic measurements.
We define five feature scenarios based on distinct groups of traffic characteristics: a temporal baseline (S0), historical delay features (S1), operational state features (S2), combined historical and operational features (S3), and a full feature configuration that additionally incorporates demand measurements (S4). By comparing prediction performance across scenarios that include different combinations of these feature groups, we aim to quantify their individual and joint contributions to delay prediction accuracy. The analysis is conducted using high-resolution approach-level data collected at one-minute intervals from a roundabout in Jeddah, Saudi Arabia, providing insights relevant to similar urban intersection environments.
The findings of this study have practical implications for the design and deployment of traffic prediction systems. If certain feature groups are found to contribute little to prediction accuracy, they can be deprioritized in data collection or excluded from models to reduce computational burden. Conversely, identifying highly informative feature groups can guide investment in sensor technologies and data collection infrastructure. Moreover, understanding feature importance can aid model interpretability, helping traffic engineers and operators understand which traffic conditions drive delay predictions and supporting more informed decision-making.
The remainder of this paper is organized as follows.
Section 2 reviews related work on traffic prediction, delay modeling, and feature importance.
Section 3 describes the study site, data collection, feature engineering, and experimental design.
Section 4 presents the results of the controlled experiments.
Section 5 discusses the implications of the findings and study limitations.
Section 6 concludes the paper.
3. Methodology
3.1. Study Site and Data Collection
3.1.1. Intersection Description
This study focuses on a major urban roundabout located in Jeddah, Saudi Arabia, which consists of four approaches (Eastbound, Northbound, Westbound, and Southbound). The roundabout represents a typical high-demand urban traffic facility subject to recurrent congestion during peak periods.
To ensure methodological clarity and controlled analysis, the present study concentrates on a single approach (Approach 2), which exhibits the highest congestion levels and the largest delay variability during the observation period.
To verify the general applicability of the proposed modeling framework, the best-performing configuration was additionally evaluated on the remaining three approaches. Only summary performance statistics are reported for these approaches in order to maintain a focused methodological analysis.
3.1.2. Data Source and Period
Traffic data were obtained from the TomTom Junction Analytics platform (TomTom International BV, Amsterdam, The Netherlands), which provides high-resolution, approach-level traffic performance indicators derived from floating car data.
Study period: March 2025 (full month)
Temporal resolution: 1-min intervals
Total observations: 44,611 records for the selected approach
The study focuses on a single month (March 2025) to maintain consistent traffic conditions and avoid seasonal variability, enabling a controlled comparison of feature configurations. Moreover, the high temporal resolution enables short-term traffic delay prediction suitable for real-time traffic management applications.
3.2. Data Description and Variables
Table 1 summarizes the traffic variables available for each minute.
To further examine the relationships between key traffic variables, a pairwise correlation analysis was conducted between delay, queue length, and traffic volume. The results are presented in Table 2.
The results show a moderate positive correlation between queue length and traffic volume (r = 0.61), indicating that queue dynamics reflect variations in traffic demand. In contrast, the correlations between delay and the other variables are weaker (r ≈ 0.24–0.25), suggesting that delay is influenced by multiple operational factors beyond traffic demand alone. These findings support the interpretation that queue-related variables provide valuable information about congestion dynamics in intersection environments.
The delay (D) variable is selected as the primary prediction target because it directly reflects congestion severity and operational efficiency at intersections. Unlike travel time, which may include variations unrelated to intersection performance, delay isolates the additional time experienced by vehicles relative to free-flow conditions. As a result, delay provides a more informative indicator for evaluating traffic control performance and short-term congestion dynamics. Furthermore, delay is commonly used in intelligent transportation systems (ITS) to evaluate intersection performance and guide adaptive intersection control strategies.
Although the predictive models in this study focus primarily on Approach 2, descriptive statistics for all roundabout approaches are presented in Table 3. Providing statistics for all approaches allows a quick comparison of the traffic characteristics across the intersection and helps contextualize the prediction results discussed later in the paper.
The statistics reveal noticeable differences in traffic demand and operational conditions among the approaches. In particular, Approach 2 exhibits relatively high queue lengths and substantial delay variability compared with the other approaches, indicating pronounced congestion dynamics suitable for evaluating delay prediction models.
Approach 1, in contrast, shows significantly lower traffic volumes and shorter queues, indicating more stable traffic conditions. Such characteristics typically result in smoother traffic flow and lower variability in delay, which can make prediction tasks comparatively easier. Meanwhile, Approaches 3 and 4 present higher traffic demand levels and larger queue variations, reflecting more complex congestion patterns.
For these reasons, Approach 2 was selected as the primary modeling case in this study. It provides a balanced traffic environment where congestion dynamics are sufficiently pronounced to evaluate prediction models while still maintaining adequate data consistency for training deep learning models. In addition, the similarity of key traffic indicators across several approaches supports the general applicability of the proposed prediction framework beyond a single approach.
3.3. Feature Engineering
Rather than using raw variables directly, the input features are organized into conceptually meaningful groups to analyze their relative contribution to prediction performance. This grouping allows a structured investigation of how different sources of traffic information, including temporal persistence, operational state variables, and demand indicators, contribute to prediction accuracy.
Feature Groups
Group A: Historical Delay Features
These features capture short-term temporal persistence in traffic conditions:
D(t − 1): delay at the previous minute;
D(t − 5): delay five minutes earlier;
D_usual: usual (historical baseline) delay.
Group B: Traffic Efficiency Indicators
This group reflects vehicle movement efficiency and delay intensity, represented by efficiency indicators such as the ratio of observed travel time to free-flow travel time.
Group C: Queue-Related Measurements
Queue dynamics are represented using queue length and the number of vehicle stops.
Group D: Demand-Related Features
Traffic demand characteristics are described by the traffic volume observed at the approach.
3.4. Temporal Context Features
To capture periodic traffic patterns, temporal features are included in all scenarios. Temporal context variables include hour-of-day and day-of-week encoded using sine and cosine transformations to capture cyclic temporal patterns. In addition, a binary peak-period indicator was introduced to represent typical congestion periods, including the morning peak (07:00–09:00) and evening peak (16:00–19:00).
These features allow the model to learn daily and weekly traffic regularities without explicitly relying on calendar variables.
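The encoding described above can be sketched as follows. This is an illustrative Python snippet, not the study's MATLAB implementation; the boundary convention for the peak indicator (e.g., whether 09:00 itself counts as peak) is an assumption.

```python
import math

def cyclic_time_features(hour, day_of_week):
    """Sine/cosine encoding of hour-of-day (0-23) and day-of-week (0-6),
    so that e.g. 23:00 and 00:00 are mapped to nearby points."""
    return {
        "hour_sin": math.sin(2 * math.pi * hour / 24),
        "hour_cos": math.cos(2 * math.pi * hour / 24),
        "dow_sin": math.sin(2 * math.pi * day_of_week / 7),
        "dow_cos": math.cos(2 * math.pi * day_of_week / 7),
    }

def is_peak(hour):
    """Binary indicator for the morning (07:00-09:00) and evening (16:00-19:00) peaks."""
    return 1 if (7 <= hour < 9) or (16 <= hour < 19) else 0

feats = cyclic_time_features(hour=8, day_of_week=2)
feats["peak"] = is_peak(8)
```

Because sine and cosine are paired, each cyclic variable keeps a constant distance between adjacent time steps, which avoids the artificial discontinuity of raw hour or weekday indices at midnight or week boundaries.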
3.5. Prediction Target
The prediction task is formulated as short-term delay forecasting. Two prediction horizons are examined: one minute ahead and five minutes ahead. Given the input feature vectors over the lookback window, the model estimates

D̂(t + h) = f(X(t − L + 1), …, X(t)),

where h represents the prediction horizon (h = 1 or h = 5 min), X(t) denotes the input feature vector at time t, and L is the lookback window length.
This short prediction horizon is operationally meaningful for real-time traffic monitoring and adaptive control strategies.
3.6. Input Feature Scenarios
To systematically assess the impact of different input configurations, five feature scenarios are designed, as shown in Table 4.
This design enables controlled comparison between temporal, historical, operational, and demand-driven information.
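The scenario composition can be expressed compactly as feature-group unions. The mapping below is assembled from the textual description of S0–S4 (temporal features in all scenarios; S1 adds historical delay, S2 adds operational indicators, S3 combines both, S4 adds demand); the individual column names are illustrative assumptions, not the exact identifiers in the dataset.

```python
# Feature-group composition of the five scenarios (S0-S4).
TEMPORAL    = ["hour_sin", "hour_cos", "dow_sin", "dow_cos", "peak"]
HISTORICAL  = ["delay_lag1", "delay_lag5", "delay_usual"]   # Group A
OPERATIONAL = ["ff_ratio", "queue_len", "num_stops"]        # Groups B + C
DEMAND      = ["volume"]                                    # Group D

SCENARIOS = {
    "S0": TEMPORAL,
    "S1": TEMPORAL + HISTORICAL,
    "S2": TEMPORAL + OPERATIONAL,
    "S3": TEMPORAL + HISTORICAL + OPERATIONAL,
    "S4": TEMPORAL + HISTORICAL + OPERATIONAL + DEMAND,
}
```

Organizing scenarios this way makes the nesting explicit: each richer scenario strictly extends a poorer one, which is what allows attributing performance differences to the added feature group.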
3.7. Recurrent Neural Network Models
3.7.1. Recurrent Neural Network Architecture
To model temporal dependencies in traffic delay dynamics, recurrent neural networks are adopted. Specifically, both Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) architectures are evaluated. These architectures are widely used in traffic prediction because of their ability to capture sequential patterns in time-series data.
To ensure a fair comparison between architectures, both models follow the same overall network configuration, differing only in the recurrent layer type.
The network structure is defined as follows:
Input layer (scenario-dependent feature dimension);
Recurrent layer (GRU or LSTM) with 32 hidden units;
Dropout layer (rate = 0.1) to reduce overfitting;
Fully connected layer with 16 neurons and ReLU activation;
Output layer with a single neuron for delay prediction.
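To make the recurrent layer's internal computation concrete, the following pure-Python sketch performs one GRU update step (update gate z, reset gate r, candidate state n). This is a didactic illustration only: it uses toy dimensions, random weights, and omits bias terms for brevity, and it is not the MATLAB implementation used in the study.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(M, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def gru_step(x, h_prev, p):
    """One GRU time step: gates z and r, candidate n, blended hidden state."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h_prev))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h_prev))]
    rh = [ri * hi for ri, hi in zip(r, h_prev)]           # reset applied to history
    n = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], rh))]
    # New state interpolates between the old state and the candidate.
    return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h_prev, n)]

random.seed(0)
n_in, n_hid = 4, 8  # toy sizes; the paper's recurrent layer uses 32 hidden units
p = {k: [[random.uniform(-0.5, 0.5) for _ in range(n_in if k[0] == "W" else n_hid)]
         for _ in range(n_hid)]
     for k in ("Wz", "Uz", "Wr", "Ur", "Wn", "Un")}

h = [0.0] * n_hid
for t in range(10):  # unroll over a 10-step lookback window of random inputs
    x = [random.uniform(-1, 1) for _ in range(n_in)]
    h = gru_step(x, h, p)
```

Because the update is a convex combination of the previous state and a tanh-bounded candidate, the hidden activations stay within (−1, 1), which is one reason GRUs train stably on long sequences.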
3.7.2. Training Configuration
The selected network configuration represents a moderate-complexity architecture commonly adopted in short-term traffic prediction studies. To isolate the effect of input features, all hyperparameters were kept constant across scenarios and model architectures:
Lookback window: 10 time steps (10 min);
Optimizer: Adam;
Learning rate: 0.001;
Batch size: 64;
Loss function: Mean Squared Error (MSE);
Early stopping based on validation loss.
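With the 10-step lookback fixed, turning the per-minute series into supervised samples can be sketched as below. The exact window/target alignment used in the paper is an assumption; here the target is the delay h steps after the last row of the window.

```python
def make_sequences(features, target, lookback=10, horizon=1):
    """Build (X, y) pairs: X is the last `lookback` feature rows,
    y is the target value `horizon` steps after the window ends."""
    X, y = [], []
    for t in range(lookback, len(target) - horizon + 1):
        X.append(features[t - lookback:t])
        y.append(target[t + horizon - 1])
    return X, y

# Toy example: a single scalar feature equal to the minute index.
series = list(range(30))
feats = [[v] for v in series]
X, y = make_sequences(feats, series, lookback=10, horizon=1)
```

For the 5-min horizon the same function is called with `horizon=5`; the number of usable samples shrinks slightly because the last few windows have no target.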
3.8. Data Preparation
3.8.1. Cleaning and Filtering
Data preprocessing was conducted to ensure the reliability of the training dataset. Missing observations were removed using a complete-case strategy. In addition, records with extreme delay values greater than 300 s were excluded, as such values typically correspond to abnormal traffic conditions or measurement noise. These extreme observations represented less than 0.5% of the dataset and their removal did not materially affect the overall data volume. After these steps, only valid operational records were retained for model training and evaluation.
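The two filtering rules (complete-case removal and the 300 s delay cap) can be sketched as a single pass over the records. The field names are illustrative assumptions; only the 300 s threshold comes from the text.

```python
MAX_DELAY_S = 300  # threshold from the paper; larger delays treated as abnormal

def clean_records(records):
    """Complete-case filtering: drop rows with any missing value
    or with an extreme delay above MAX_DELAY_S seconds."""
    return [
        r for r in records
        if all(v is not None for v in r.values()) and r["delay"] <= MAX_DELAY_S
    ]

raw = [
    {"delay": 42.0,  "queue": 12.0, "volume": 18},
    {"delay": None,  "queue": 5.0,  "volume": 9},   # missing value -> dropped
    {"delay": 512.0, "queue": 80.0, "volume": 30},  # extreme delay -> dropped
    {"delay": 17.5,  "queue": 3.0,  "volume": 11},
]
clean = clean_records(raw)
```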
3.8.2. Feature Scaling
All numerical features were standardized using z-score normalization. To prevent information leakage and ensure reproducibility in real-time deployment scenarios, the scaling parameters (mean and standard deviation) were computed from the training set only and then applied to the validation and test sets. This approach ensures that no information from future observations is used during model training.
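A minimal sketch of this fit-on-train-only scaling, using only the standard library:

```python
from statistics import mean, stdev

def fit_scaler(train_columns):
    """Compute per-feature (mean, std) on the training split only."""
    return [(mean(col), stdev(col)) for col in train_columns]

def transform(columns, params):
    """Apply the training-set z-score parameters to any split."""
    return [[(v - mu) / sd for v in col] for col, (mu, sd) in zip(columns, params)]

train = [[10.0, 20.0, 30.0], [1.0, 2.0, 3.0]]  # two features, three samples each
test  = [[40.0], [4.0]]

params  = fit_scaler(train)
train_z = transform(train, params)
test_z  = transform(test, params)
```

Note that the test values are scaled with the training statistics, so out-of-range observations simply map to large z-scores rather than silently redefining the scale.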
3.8.3. Train–Validation–Test Split
A chronological split was applied to preserve temporal integrity:
Training set: 75%;
Validation set: 10%;
Test set: 15%.
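The chronological split can be expressed as index boundaries over the time-ordered records. The rounding convention below is an assumption; the record count is the reported total before cleaning and is used purely for illustration.

```python
def chronological_split(n, train=0.75, val=0.10):
    """Index ranges for a time-ordered 75/10/15 split (no shuffling),
    preserving temporal order so the test set lies strictly in the future."""
    i_train = int(n * train)
    i_val = int(n * (train + val))
    return range(0, i_train), range(i_train, i_val), range(i_val, n)

tr, va, te = chronological_split(44611)
```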
3.9. Evaluation Metrics
Prediction performance is assessed using:
Mean Absolute Error (MAE):

MAE = (1/N) Σ |y_i − ŷ_i|

Root Mean Squared Error (RMSE):

RMSE = √[ (1/N) Σ (y_i − ŷ_i)² ]

where y_i and ŷ_i denote the observed and predicted delay for sample i, and N is the number of evaluated samples.
MAE provides an interpretable measure in seconds, while RMSE penalizes large errors associated with congestion peaks.
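Both metrics are straightforward to implement; the sketch below uses synthetic observed/predicted delays (in seconds) for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error, in the same units as the target (seconds here)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error; penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

obs  = [30.0, 60.0, 45.0, 120.0]
pred = [28.0, 70.0, 40.0, 100.0]
```

By construction RMSE ≥ MAE for any error vector, and the gap between the two widens when a few large errors (e.g., missed congestion peaks) dominate.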
3.10. Visualization and Analysis Framework
Model performance is analyzed through a multi-level framework that includes:
Aggregate MAE and RMSE comparison across feature scenarios (S0–S4) to evaluate the contribution of temporal, historical, operational, and demand-related feature groups.
Relative improvement with respect to the historical baseline scenario (S1) to quantify the practical benefit of adding operational and demand-related inputs.
Comparison across recurrent architectures (GRU and LSTM) to examine whether the observed feature effects remain consistent under different model structures.
Comparison across prediction horizons (1-min and 5-min ahead forecasts) to assess how the value of different feature groups changes with forecasting range.
Repeated-run analysis (10 runs) using mean and standard deviation of performance metrics to evaluate result stability under stochastic training effects.
Statistical significance testing of pairwise scenario differences to determine whether observed performance gains reflect genuine improvements rather than random variation.
Time-window comparison of observed and predicted delay for representative scenarios to provide a qualitative view of temporal tracking behavior.
Residual analysis to examine prediction error distributions and identify systematic under- or over-estimation patterns.
Summary evaluation across the remaining three approaches using the best-performing configuration to assess the general consistency of the proposed framework beyond the main analyzed approach.
A hyperparameter sensitivity analysis to verify that the main findings are not highly dependent on a specific hidden-layer size choice.
This multi-level framework enables quantitative comparison, statistical validation, and practical interpretation of model behavior under different feature configurations, model architectures, forecasting horizons, and traffic conditions. All experiments were implemented using MATLAB R2025b (MathWorks, Natick, MA, USA).
4. Results
4.1. Overall Performance Comparison
Table 5 and Table 6 summarize the prediction performance of all feature scenarios for both the GRU and LSTM models under the two forecasting horizons considered in this study (1-min and 5-min ahead prediction, respectively). The reported values represent the mean and standard deviation obtained from 10 independent training runs, allowing assessment of both prediction accuracy and result stability.
The results presented in Table 5 and Table 6 reveal a clear hierarchy among the tested feature configurations. As shown in Table 5, which corresponds to the 1-min forecasting horizon, the temporal-only baseline scenario (S0) consistently produces the largest prediction errors for both models, with MAE values exceeding 22 s. The relatively small standard deviations observed across runs indicate stable model behavior despite stochastic training effects.
Introducing historical delay information (S1) improves prediction accuracy compared with S0. However, the improvement remains limited, with MAE values around 20–21 s. This suggests that relying solely on short-term temporal persistence does not fully capture the rapid dynamics of congestion formation at urban intersections.
A substantial improvement emerges when operational traffic indicators are incorporated. As shown in Table 5, scenarios S2 and S3 reduce the MAE to approximately 17.3 s for both GRU and LSTM models. These results confirm that operational indicators such as queue length, stop frequency, and efficiency ratios provide highly informative signals about the current traffic state.
The best overall performance for the 1-min horizon is achieved by the full feature configuration (S4). The GRU model achieves a mean MAE of 17.24 ± 0.07 s, while the LSTM model achieves 17.22 ± 0.11 s. Compared with the historical baseline scenario (S1), this corresponds to an improvement of approximately 17% in prediction accuracy.
Another important observation is that the performance differences between scenarios S2, S3, and S4 remain relatively small, and their standard deviations largely overlap. This indicates that operational traffic variables already capture a large portion of the predictive information, while the additional demand-related variables in S4 provide only marginal improvement.
For the 5-min forecasting horizon, the results in Table 6 show that the overall prediction errors increase and the differences between feature scenarios become smaller. For example, the MAE values for the GRU model range from 22.52 s in S0 to 21.84 s in S4. This narrowing performance gap indicates that the predictive advantage of richer feature sets decreases as the forecasting horizon becomes longer.
Comparing Table 5 and Table 6 therefore suggests that richer feature configurations are particularly beneficial for very short-term predictions, while their relative advantage decreases for longer forecasting horizons.
Although both GRU and LSTM architectures were evaluated, the performance differences between the two models remain consistently small across all scenarios and horizons. This indicates that both recurrent architectures are capable of effectively modeling the temporal dynamics of traffic delay.
Given its simpler structure and lower computational complexity, the GRU architecture may therefore be more suitable for real-time deployment scenarios where computational efficiency is important.
To verify whether the observed differences between feature scenarios are statistically meaningful rather than artifacts of stochastic training, the MAE values from the 10 repeated runs were compared using paired t-tests; the detailed results are reported in the following subsection.
4.2. Statistical Significance Analysis
To further verify whether the observed improvements between feature scenarios are statistically significant, paired t-tests were conducted on the MAE values obtained from the 10 independent training runs for each model, scenario, and forecasting horizon. The results are presented in Table 7.
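The paired test statistic is simple to compute from the per-run differences; the p-value is then read from a t distribution with n − 1 degrees of freedom (e.g., via `scipy.stats.ttest_rel`). The per-run MAE values below are synthetic placeholders, not the study's results.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t statistic for paired samples, e.g. per-run MAE of two scenarios.

    t = mean(d) / (stdev(d) / sqrt(n)), with d the per-run differences.
    The corresponding p-value uses a t distribution with n - 1 degrees
    of freedom and is left to a stats library."""
    d = [ai - bi for ai, bi in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Synthetic per-run MAE values (seconds) for two scenarios.
mae_s1 = [20.9, 21.1, 20.8, 21.0, 20.6]
mae_s4 = [17.2, 17.3, 17.1, 17.4, 17.0]
t = paired_t_statistic(mae_s1, mae_s4)
```

Pairing by run matters here: each run of the two scenarios shares initialization and data ordering effects, so differencing per run removes that common variance and sharpens the comparison.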
The results indicate that for the 1-min forecasting horizon, the full feature configuration (S4) significantly outperforms both the temporal baseline (S0) and the historical baseline (S1) for both GRU and LSTM models (p < 0.001). This confirms that incorporating operational and demand-related traffic indicators leads to a statistically meaningful improvement in prediction accuracy.
For the 5-min forecasting horizon, S4 remains significantly better than S0 for both models. However, the differences between S4 and the intermediate feature configurations (S1–S3) are generally not statistically significant. This suggests that the relative advantage of richer feature sets decreases as the prediction horizon increases, which is consistent with the narrowing performance differences observed in Table 6.
Overall, the statistical analysis supports the conclusion that richer feature representations provide the greatest benefit for very short-term traffic delay prediction, while their relative contribution becomes less pronounced for longer forecasting horizons.
4.3. Temporal Prediction Behavior
To better illustrate the qualitative prediction behavior of the models, Figure 1 and Figure 2 present time-series comparisons between the observed and predicted delay for a representative half-day time window.
Figure 1 shows the predictions obtained using the GRU model for the baseline scenario (S1) and the full feature configuration (S4). It can be observed that both scenarios are able to capture the general evolution of delay throughout the day. However, the predictions produced by S4 follow the observed delay pattern more closely than those produced by S1. In particular, S4 better captures the magnitude of congestion peaks and the timing of rapid delay changes.
In contrast, the baseline scenario S1 tends to produce smoother predictions and often underestimates high delay values during peak congestion periods. This behavior reflects the limited information content of historical delay features alone.
A similar pattern can be observed for LSTM. The full feature configuration (S4) consistently tracks the observed delay more accurately than the historical baseline (S1). However, the difference between S1 and S4 appears slightly less pronounced for LSTM than for GRU, which is consistent with the small performance gap observed in the quantitative metrics.
Overall, the time-series analysis confirms that incorporating operational and demand-related features improves the model’s ability to follow rapid traffic dynamics.
4.4. Residual Distribution Analysis
To further analyze the prediction errors, Figure 3 and Figure 4 present the residual distributions for scenarios S1 and S4 for both GRU and LSTM models. The residuals are defined as the difference between the observed and the predicted delay.
The histograms for the GRU model in Figure 3 show that the residual distribution of S4 is more concentrated around zero than that of S1. This indicates that the full feature configuration produces more stable predictions and reduces the magnitude of large prediction errors.
A similar pattern is observed for the LSTM model, as shown in Figure 4.
In both models, the S4 residual distribution exhibits a narrower spread and fewer extreme errors compared with the baseline scenario S1. This confirms that the richer feature set helps the model better capture sudden changes in traffic conditions.
4.5. Cross-Approach Generalization
To evaluate the generalizability of the proposed framework beyond the most congested approach (Approach 2), on which the main experiments were conducted, the best-performing configuration was additionally evaluated on the remaining three roundabout approaches.
Table 8 summarizes the prediction accuracy obtained for each approach using the S4 configuration.
The results show noticeable differences in prediction difficulty among the approaches. Approach 1 achieves the lowest prediction error, with an MAE of approximately 8.17 s, while the other approaches exhibit MAE values between approximately 16.6 and 17.4 s.
This difference is primarily related to the underlying traffic characteristics of each approach. As shown in the dataset summary statistics, Approach 1 exhibits significantly lower queue lengths and traffic volumes compared with the other approaches. Consequently, the traffic conditions at this approach are more stable and easier for the models to predict.
It is also important to note that the preprocessing step removed observations with delay values greater than 300 s. However, these extreme values represented less than 0.5% of the dataset for all approaches. Therefore, their removal had a negligible impact on the dataset size and does not explain the performance differences between approaches.
Overall, these results indicate that the proposed prediction framework remains applicable across different approaches, while the prediction difficulty is influenced by the underlying traffic demand and congestion variability.
4.6. Hyperparameter Sensitivity Analysis
To evaluate the robustness of the model with respect to hyperparameter selection, a sensitivity analysis was conducted by varying the number of hidden units in the GRU network (16, 32, and 64 units) while keeping all other training parameters fixed.
Table 9 presents the prediction accuracy and training time obtained for each configuration.
The results show only minor variations in prediction accuracy across the tested configurations. The best performance is obtained with 32 hidden units (MAE ≈ 17.19 s), while configurations with 16 and 64 units produce MAE values of approximately 17.28 s and 17.39 s, respectively.
These small differences indicate that the proposed framework is relatively insensitive to moderate variations in model capacity. In other words, the prediction performance is primarily driven by the selected feature configuration rather than extensive hyperparameter tuning.
5. Discussion
5.1. Operational Insights
The experimental results provide useful insights into the relative importance of different feature groups for short-term intersection delay prediction. In particular, the strong performance of scenarios incorporating operational variables highlights the critical role of real-time traffic state indicators.
Queue length and the number of vehicle stops emerge as particularly informative predictors. Queue length directly reflects the imbalance between demand and available capacity, while stop frequency captures stop-and-go traffic behavior associated with congestion. Together, these operational indicators provide an effective representation of the current traffic state.
Interestingly, the improvement obtained by combining historical and operational features remains relatively small. Although the combined scenario achieves the lowest prediction error, the difference compared with operational-only features is marginal. This suggests that operational measurements implicitly capture short-term traffic history, since queue formation inherently reflects congestion accumulation over preceding minutes.
The limited contribution of demand-related variables also indicates that congestion effects are more effectively represented through observable operational indicators than through raw demand measurements alone. Queue dynamics integrate the combined effects of demand and capacity, making them a more direct signal of traffic conditions.
5.2. Practical Implications and Limitations
From a practical perspective, the results suggest that effective short-term delay prediction can be achieved using a relatively compact set of operational features. This is advantageous for real-time deployment because it reduces both data requirements and computational complexity. Traffic monitoring systems may therefore benefit from prioritizing technologies capable of accurately measuring queue length and vehicle stops.
However, several limitations should be acknowledged. First, the analysis focuses primarily on one approach at a single roundabout, which may limit the generalizability of the findings to other intersection configurations. Although cross-approach experiments provide supporting evidence, further validation across multiple intersections would strengthen the conclusions.
Second, the experiments employ a fixed model configuration in order to isolate the effect of input features. Future research could examine whether similar feature importance patterns hold across alternative architectures or training strategies.
Finally, the study examines short-term horizons of up to five minutes ahead. Longer, multi-step prediction horizons may exhibit different sensitivities to input feature configurations, particularly with respect to demand-related variables. Extending the analysis to longer multi-step forecasting therefore represents an important direction for future research.
6. Conclusions
This study investigated the impact of input feature configurations on short-term traffic delay prediction at an urban intersection using a controlled experimental framework. Five feature scenarios were evaluated using two recurrent neural network architectures (GRU and LSTM) and high-resolution, approach-level traffic data collected at one-minute intervals from an urban roundabout in Jeddah, Saudi Arabia. The analysis considered two forecasting horizons (1-min and 5-min ahead) and employed repeated training runs to ensure the robustness of the results.
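The scenario-based setup described above reduces to a standard windowing pipeline: each scenario selects a subset of minute-level features, and sliding windows of the past few minutes form the supervised pairs for one-step-ahead prediction. The sketch below illustrates this; the scenario compositions (which columns enter S0–S4), the lookback length, and the synthetic data are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

def make_windows(features, target, lookback):
    """Build one-step-ahead supervised pairs from minute-level series.

    features: (T, F) array of input features
    target:   (T,) array of delay values
    lookback: number of past minutes fed to the recurrent model
    Returns X with shape (T - lookback, lookback, F) and y with shape (T - lookback,).
    """
    X = np.stack([features[i:i + lookback] for i in range(len(features) - lookback)])
    y = target[lookback:]
    return X, y

# Synthetic minute-level data for one approach (column meanings are assumed):
# 0: hour-of-day sine, 1: hour-of-day cosine, 2: lagged delay,
# 3: queue length, 4: vehicle stops, 5: traffic volume
rng = np.random.default_rng(0)
data = rng.random((120, 6))
delay = rng.random(120)

# Illustrative scenario compositions (S0-S4); the paper's exact lists may differ.
scenarios = {
    "S0": [0, 1],              # temporal only
    "S1": [0, 1, 2],           # + historical delay
    "S2": [0, 1, 3, 4],        # + operational (queue, stops)
    "S3": [0, 1, 2, 3, 4],     # historical + operational
    "S4": [0, 1, 2, 3, 4, 5],  # full set, including demand
}

for name, cols in scenarios.items():
    X, y = make_windows(data[:, cols], delay, lookback=10)
    print(name, X.shape, y.shape)
```

Each `X` window would be fed to the GRU or LSTM, with `y` as the one-minute-ahead delay target; the 5-min horizon simply shifts the target further ahead.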
The experimental findings provide several important insights regarding the role of different traffic features in intersection-level delay prediction. First, the temporal-only baseline scenario (S0) consistently produced the largest prediction errors for both models and forecasting horizons, confirming that temporal indicators alone are insufficient to capture short-term variations in intersection congestion. Introducing historical delay information (S1) improved prediction accuracy, but the improvement remained limited.
A substantial performance improvement emerged when operational traffic indicators were introduced. Scenarios incorporating queue-related and efficiency variables (S2–S4) significantly reduced prediction errors, highlighting the critical importance of real-time traffic state measurements. These results indicate that variables such as queue length, stop frequency, and efficiency ratios provide highly informative signals about current congestion conditions.
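Operational indicators of this kind can be derived directly from raw per-minute measurements. The definitions below are common conventions assumed for illustration (a minute counts as stopped flow below a speed threshold; the efficiency ratio is free-flow travel time over actual travel time), not necessarily the exact formulas used in the study, as are the link length and threshold values.

```python
import numpy as np

def operational_indicators(speeds_kmh, queue_counts, link_len_km=0.2,
                           free_flow_kmh=50.0, stop_speed_kmh=5.0):
    """Per-minute operational features from raw approach measurements.

    speeds_kmh:   (T,) mean approach speed per minute
    queue_counts: (T,) vehicles queued at the stop line per minute
    The definitions here are illustrative conventions, not the paper's own.
    """
    speeds = np.asarray(speeds_kmh, dtype=float)
    queue = np.asarray(queue_counts, dtype=float)

    # Stop indicator: flow in this minute is effectively stopped
    # when mean speed falls below the threshold.
    stops = (speeds < stop_speed_kmh).astype(float)

    # Efficiency ratio: free-flow travel time over actual travel time (<= 1).
    actual_tt = link_len_km / np.maximum(speeds, 1e-3)  # hours
    free_tt = link_len_km / free_flow_kmh
    efficiency = np.clip(free_tt / actual_tt, 0.0, 1.0)

    return np.column_stack([queue, stops, efficiency])

feats = operational_indicators([45.0, 4.0, 20.0], [2, 15, 6])
print(feats)
```

Because queue length and the efficiency ratio respond to the joint effect of demand and capacity, they behave as the integrated congestion signal discussed above.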
Among the tested configurations, the full-feature scenario (S4) generally achieved the best performance for the one-minute forecasting horizon for both the GRU and LSTM models. However, the performance differences between scenarios S2, S3, and S4 remained relatively small, suggesting that operational indicators already capture most of the predictive information required for short-term delay forecasting. In contrast, demand-related variables such as traffic volume provided only marginal additional benefits when queue dynamics were already included.
For the longer forecasting horizon (5 min), the overall prediction errors increased, and the differences between feature scenarios became noticeably smaller. This result indicates that the predictive advantage of richer feature sets diminishes as the forecasting horizon increases, which is consistent with the growing uncertainty associated with longer-term traffic prediction.
The comparison between the GRU and LSTM architectures revealed very similar prediction performance across all scenarios and horizons. This suggests that, for the considered dataset and prediction task, the selection of informative input features plays a more influential role than the specific recurrent architecture used.
Additional experiments across the remaining roundabout approaches demonstrated that the proposed framework remains applicable under different traffic conditions. However, the prediction difficulty varied across approaches, largely due to differences in traffic demand and queue characteristics. Approaches with lower traffic volumes and shorter queues exhibited more stable traffic patterns and consequently lower prediction errors.
A hyperparameter sensitivity analysis further showed that moderate variations in the number of hidden units produced only minor changes in prediction accuracy. This indicates that the overall findings are relatively robust with respect to reasonable changes in model capacity.
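The repeated-run sensitivity protocol can be sketched as follows. The `train_and_evaluate` function is a stub standing in for actual GRU training (it returns a simulated MAE with a mild, hypothetical capacity effect plus run-to-run noise), so the numbers it produces are placeholders; the aggregation of mean and standard deviation of MAE across seeded repeats is the technique described.

```python
import random
import statistics

def train_and_evaluate(hidden_units, seed):
    """Stub standing in for one GRU training run; returns a simulated MAE.

    In the real framework this would train the recurrent model with the
    given capacity and random seed, then report test-set MAE. The capacity
    effect and noise level below are illustrative assumptions.
    """
    rng = random.Random(seed * 1000 + hidden_units)
    base = 2.0 - 0.002 * hidden_units   # mild, hypothetical capacity effect
    return base + rng.gauss(0.0, 0.05)  # run-to-run training noise

def sensitivity(hidden_unit_grid, n_repeats=10):
    """Mean and std of MAE over repeated seeded runs per capacity setting."""
    results = {}
    for units in hidden_unit_grid:
        maes = [train_and_evaluate(units, seed) for seed in range(n_repeats)]
        results[units] = (statistics.mean(maes), statistics.stdev(maes))
    return results

for units, (mean_mae, std_mae) in sensitivity([32, 64, 128]).items():
    print(f"hidden units={units:4d}  MAE={mean_mae:.3f} +/- {std_mae:.3f}")
```

Comparing the mean MAE across capacity settings against the run-to-run standard deviation is what supports the robustness claim: when the between-setting differences are on the order of the within-setting noise, moderate capacity changes are immaterial.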
Overall, the results indicate that input feature configuration plays an important role in short-term intersection delay prediction for the considered dataset and prediction task. The findings further suggest that effective delay forecasting can be achieved using a relatively compact set of real-time operational measurements, without requiring extensive historical data or complex feature engineering.
Future research could extend this work by evaluating the proposed framework across multiple intersections and network configurations, investigating multi-step forecasting horizons, incorporating external factors such as weather and incidents, and exploring interpretable learning methods that provide deeper insights into the relationship between traffic dynamics and prediction performance.