A Scalable Context-Aware STGCN Framework for Real-Time Traffic Forecasting with Residual Correction

Karetsos, Panagiotis; Petkani, Viktoria; Tzanis, Dimitris; Mintsis, Evangelos; Mitsakis, Evangelos

doi:10.3390/futuretransp6030111

Open Access Article


Panagiotis Karetsos *, Viktoria Petkani, Dimitris Tzanis *, Evangelos Mintsis and Evangelos Mitsakis

Hellenic Institute of Transport, Centre for Research and Technology Hellas, 57001 Thessaloniki, Greece

* Authors to whom correspondence should be addressed.

Future Transp. 2026, 6(3), 111; https://doi.org/10.3390/futuretransp6030111

Submission received: 15 April 2026 / Revised: 9 May 2026 / Accepted: 19 May 2026 / Published: 21 May 2026

(This article belongs to the Special Issue Intelligent Transportation Systems and Traffic Management in Urban Networks)


Abstract

Accurate short-term traffic prediction is a key requirement for modern traffic management systems, yet many existing approaches remain focused on offline evaluation and do not address the challenges of continuous real-time deployment. In this work, we present a context-aware spatiotemporal graph convolutional network (STGCN) framework designed for low-latency, scalable traffic forecasting under operational conditions. The proposed approach integrates structural information from the road network, temporal regularities derived from historical data, and a residual correction mechanism trained on systematic prediction errors observed during real-time operation. The framework is designed to remain lightweight, enabling continuous minute-level inference without computational overhead that would hinder long-term deployment. The methodology is evaluated in two real-world case studies of different scale and complexity. In Thessaloniki, Greece, multiple forecasting models are evaluated across different temporal resolutions using one-minute speed data, and the proposed STGCN is selected for real-time deployment. A residual correction module trained on historical prediction errors further improves real-time forecasting accuracy compared to the baseline STGCN deployment. Scalability is demonstrated in the South Holland region of the Netherlands, where the same architecture is applied to a larger network and extended to multi-horizon forecasting. Results show that the proposed framework achieves competitive predictive performance while maintaining low computational cost, and that residual error learning provides a robust and practical way to improve forecasting accuracy in real-world deployments. These findings highlight the importance of combining domain-specific modeling with operational considerations in traffic prediction systems.

Keywords:

traffic forecasting; spatiotemporal graph convolutional networks; contextual traffic management; residual learning; real-time prediction; graph neural networks

1. Introduction

Accurate short-term prediction of traffic conditions is a central requirement for modern Intelligent Transportation Systems (ITS) [1]. Forecasts of metrics such as average speed and travel time support congestion mitigation, incident response, dynamic routing, and proactive traffic management. Unlike vehicle-count-based traffic flow prediction, speed forecasting provides more direct insight into network performance and can be immediately utilized by operators and decision-support systems. However, urban traffic dynamics are inherently complex, exhibiting strong temporal variability, abrupt disruptions, and spatial dependencies shaped by road topology and directional connectivity [2,3]. As a result, traffic forecasting models must effectively integrate temporal, spatial, and contextual information while remaining computationally efficient enough for continuous real-time deployment.

Traffic prediction now relies heavily on historical and real-time data collected from heterogeneous sources, reflecting the broader shift toward data-driven transportation management [4]. In recent years, deep learning approaches, including convolutional and recurrent neural network architectures, have substantially improved traffic forecasting performance by capturing complex temporal patterns in data [5]. At the same time, intelligent transportation applications have increasingly explored multimodal and heterogeneous data representations to improve predictive and decision-support capabilities [6].

However, many deep learning approaches still struggle to fully represent the spatial dependencies inherent in road networks, where traffic conditions propagate across connected segments in nonlinear and direction-dependent ways. To address this limitation, graph-based models such as Spatio-Temporal Graph Convolutional Networks have been introduced, enabling the joint modeling of temporal patterns and network topology [7]. Recent advances in graph-based traffic forecasting have demonstrated strong predictive capabilities through increasingly sophisticated architectures and richer contextual representations.

Despite these advances, most existing models are evaluated primarily offline or in simulated real-time settings that replay static datasets. As a result, limited attention has been given to their behavior under continuous real-time operation, where models must handle missing data, irregular measurements, non-stationary traffic patterns, and low-latency inference requirements. In such environments, models with strong offline accuracy may still lose robustness or face computational limitations that hinder practical deployment. These challenges highlight the need for forecasting frameworks that are not only accurate but also computationally efficient, scalable, and operationally robust.

In this context, this study develops a context-aware STGCN-based forecasting framework explicitly designed for real-time traffic operation. The proposed approach integrates structural context through a distance-based directed adjacency matrix derived from network topology, together with a hierarchical preprocessing and imputation scheme designed to support robust operation under missing and irregular measurements. In addition, an error-aware residual correction mechanism is introduced and trained on historical prediction errors observed during deployment. These components are embedded within an operational architecture that supports continuous, low-latency minute-by-minute inference. By combining graph-based modeling with adaptive residual correction, the framework aims to improve both the accuracy and stability of real-time traffic forecasts while maintaining computational efficiency.

The proposed methodology is evaluated in two real-world case studies of different scale and operational requirements. The primary case study focuses on Thessaloniki, Greece, a mid-sized but highly dynamic urban network. Using one-minute average-speed data from 583 predefined paths, established baseline architectures and the proposed STGCN model are trained at four temporal aggregation levels (1, 5, 15, and 30 min), each predicting the subsequent timestep at its corresponding resolution. This setup enables a systematic comparison of predictive performance and computational cost across models and temporal granularities.

Based on this comparative evaluation, the proposed STGCN model is selected for real-time deployment. For operational use, the 5-min model is adopted and produces updated forecasts every minute for the next 5-min interval; this minute-level update rate is needed to capture rapid traffic changes caused by incidents or sudden speed drops. To further improve real-time performance, a second STGCN model with identical architecture is trained on residual errors collected during real-time operation of the baseline model. The final prediction is obtained through additive residual correction. Weather variables were also tested but were excluded because they did not improve accuracy and increased computational cost.

Scalability is assessed through a second case study in the South Holland (Keukenhof) region of the Netherlands, a substantially larger network with 1159 directed segments. In this case, only the proposed STGCN model is considered, without residual correction, in order to evaluate its standalone performance under large-scale conditions. Data are aggregated to 15-min intervals, and the model generates multi-horizon forecasts at 15, 30, and 45 min ahead. Although training time increases due to the larger network and multi-horizon configuration, the model maintains stable performance and is deployed in real-time operation. Both case studies are implemented within the SYNCHROMODE Toolbox, a modular platform designed to support multimodal traffic management through data-driven and predictive capabilities [8].

Together, these case studies demonstrate a scalable and operationally robust STGCN-based forecasting framework that integrates contextual preprocessing, graph-based spatial modeling, and lightweight residual learning. Continuous real-world deployment across networks of different scale and complexity highlights the practical readiness of the proposed approach for modern traffic management systems.

The main contributions of this work can be summarized as follows:

We propose a lightweight, scalable, and deployment-oriented STGCN-based framework for real-time traffic forecasting, designed to balance predictive performance, computational efficiency, and low-latency operation under practical deployment constraints.
We develop an operational forecasting pipeline that integrates data preprocessing, imputation, temporal aggregation, and continuous inference, enabling robust real-time operation under missing and irregular traffic measurements.
We introduce a residual correction mechanism that leverages historical prediction errors collected during real-time deployment to improve forecasting accuracy without increasing architectural complexity or inference latency.
We conduct a comprehensive evaluation across multiple temporal resolutions and baseline graph forecasting models, providing a systematic comparison of predictive accuracy and computational cost.
We validate the scalability and transferability of the proposed framework through two real-world case studies, including a large-scale regional deployment under multi-horizon forecasting conditions.

2. Related Work

2.1. Traditional Traffic Forecasting Methods

Early approaches to traffic forecasting relied on statistical and time-series models such as ARIMA [9,10] and Kalman filtering [11], which model temporal dependencies under assumptions of linearity and stationarity. These methods are computationally efficient and interpretable, making them suitable for short-term forecasting under stable traffic conditions. To improve their predictive performance, several extensions have been proposed, including ARIMA with explanatory variables (ARIMAX) [12] and seasonal ARIMA (SARIMA) [13]. Despite these enhancements, such approaches remain limited in their ability to capture the nonlinear, highly dynamic, and spatially correlated nature of traffic systems. As a result, their performance degrades under complex traffic patterns, motivating the development of more flexible data-driven models.

2.2. Deep Learning Approaches for Traffic Forecasting

Deep learning methods have significantly advanced traffic forecasting by enabling the modeling of complex nonlinear relationships in large-scale data. Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) models and their variants, have been widely adopted to capture temporal dependencies in traffic time series [14,15,16]. These models improve prediction accuracy by learning sequential patterns that traditional statistical methods cannot effectively represent. Convolutional neural networks (CNNs) have also been applied to traffic forecasting, primarily to capture local spatial correlations alongside short-term temporal patterns. Hybrid architectures combining CNN and RNN components have improved performance by jointly modeling spatial and temporal dependencies [17,18]. In addition, attention mechanisms have been introduced to enhance the ability of models to focus on the most relevant temporal features [19,20].

Despite these advances, most deep learning approaches rely on grid-based or sequence-based representations and do not explicitly model the underlying graph structure of road networks. As a result, they are limited in their ability to capture complex spatial interactions between interconnected traffic segments, motivating the development of graph-based approaches.

2.3. Graph-Based Traffic Forecasting Models

To overcome the limitations of traditional deep learning approaches in modeling spatial dependencies, graph-based neural networks have been widely adopted for traffic forecasting. Early models such as STGCN [7] and T-GCN [21] integrate graph convolution with temporal modeling to capture both spatial and temporal dependencies in traffic networks. These approaches represent road networks as graphs, where nodes correspond to traffic sensors or road segments and edges encode connectivity, enabling more realistic modeling of traffic propagation.

Subsequent work has focused on improving the representation of spatial interactions and enhancing model flexibility. Graph WaveNet [22] introduces adaptive adjacency matrices to learn hidden spatial dependencies, while AGCRN [23] further improves this idea through node-specific parameter learning. Other approaches, such as the spatial–temporal complex graph convolution network (ST-CGCN) [24], incorporate multiple sources of spatial correlation and external factors to refine prediction performance.

More recent studies have explored dynamic graph learning, attention mechanisms, and increasingly context-aware architectures to better capture complex traffic dependencies. Graph attention-based approaches [25] and transformer-inspired models dynamically weight spatial and temporal relationships, improving the modeling of long-range dependencies and heterogeneous traffic patterns [26]. Dynamic graph-based architectures, such as DGCRN-TSA [27], further extend this idea by combining adaptive graph generation, recurrent graph convolutions, and temporal self-attention mechanisms to model evolving spatial correlations and temporal dynamics.

Recent work has also incorporated richer contextual and multi-graph representations. For example, STMGCN [28] integrates static and dynamic traffic knowledge graphs together with external contextual information, including meteorological data and points of interest, to improve traffic flow prediction in intelligent transportation systems. Similarly, Zhang et al. [29] proposed a context-aware knowledge graph framework combined with graph neural networks and attention mechanisms to incorporate spatial and temporal contextual dependencies into traffic speed forecasting.

However, most of these approaches are still evaluated primarily offline or in simulated real-time settings that replay static datasets, and they do not explicitly address the challenges of continuous operational deployment. Issues such as data irregularities, non-stationarity, computational efficiency, and the integration of heterogeneous external data streams remain underexplored. While advanced context-aware and multi-graph architectures can improve predictive performance, they often add operational complexity related to data acquisition, preprocessing pipelines, retraining, and large-scale deployment. This gap motivates a deployment-oriented design in which computational efficiency, scalability, and operational robustness are treated as requirements alongside accuracy.

3. Materials and Methods

Figure 1 gives a structured overview of the complete processing pipeline. The framework consists of a sequence of stages: data preprocessing and imputation, transformation of the time series into model-ready inputs, construction of the underlying graph structure, and spatiotemporal prediction using a graph-based model. In addition, a residual correction module improves predictive accuracy by learning from systematic errors observed during operation.

3.1. Problem Formulation

Let $G = (V, E)$ be a directed network of predefined paths, where $|V| = 583$ in the Thessaloniki case study. We consider a contiguous, minute-level time grid

$$T = \{t_0,\ t_0 + 1,\ t_0 + 2,\ \dots,\ t_1\} \subset \mathbb{Z} \tag{1}$$

with sampling interval $\Delta t = 1$ min, covering the six-month study window. All traffic variables are indexed over $T \times V$ and evolve on this common temporal grid.

After temporal alignment and initial validation, traffic measurements are available only at a subset of the minute-level grid. We observe average speed values (in km/h) at selected index pairs $(t, i) \in T \times V$.

We define the observation set as

$$\Omega = \{(t, i) \in T \times V : \text{a valid speed measurement is available at } (t, i)\}. \tag{2}$$

The observed traffic signal is represented as a partial function

$$x : \Omega \to \mathbb{R}, \qquad (t, i) \mapsto x_{t,i}, \tag{3}$$

and the corresponding binary observation indicator is defined as

$$R_{t,i} = \mathbb{1}\{(t, i) \in \Omega\} \in \{0, 1\}. \tag{4}$$

Due to data acquisition constraints and quality-control procedures, the observed signal is incomplete over $T \times V$. This motivates the preprocessing and imputation steps described in the following section.
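To make Equations (2) to (4) concrete, the following NumPy sketch places a sparse set of observations on the dense grid $T \times V$, using NaN for entries outside $\Omega$ and deriving the indicator $R_{t,i}$ from it. The function name `build_observation_grid` and the `(t, i, speed)` record layout are illustrative assumptions, not part of the described system.

```python
import numpy as np


def build_observation_grid(records, t0, t1, n_nodes):
    """Place sparse (t, i, speed) records on the dense grid T x V.

    records: iterable of (t, i, speed) with integer minute index t in [t0, t1]
             and node index i in [0, n_nodes) (hypothetical record layout).
    Returns (x, R): x holds NaN wherever (t, i) is outside Omega, and R is the
    binary observation indicator of Eq. (4).
    """
    n_steps = t1 - t0 + 1
    x = np.full((n_steps, n_nodes), np.nan)
    for t, i, speed in records:
        x[t - t0, i] = speed
    R = (~np.isnan(x)).astype(np.int8)
    return x, R


records = [(0, 0, 42.0), (0, 1, 37.5), (2, 1, 35.0)]
x, R = build_observation_grid(records, t0=0, t1=3, n_nodes=2)
print(R.tolist())  # [[1, 1], [0, 0], [0, 1], [0, 0]]
```

Storing the signal densely with NaN markers keeps $\Omega$ implicit, which is convenient for the vectorized imputation and aggregation steps that follow.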

3.2. Data Preprocessing and Imputation

Prior to model development, raw traffic observations are subjected to a preprocessing pipeline designed to ensure temporal consistency, structural validity, and data quality. Traffic forecasting models are inherently data-driven, and deficiencies in the input data can directly propagate into learned representations, leading to biased or unstable predictions. Robust preprocessing is therefore a necessary prerequisite for reliable spatiotemporal modeling under real-world conditions.

Traffic observations originating from heterogeneous sources may be recorded at slightly different timestamps. Let $x^{\text{raw}}_{t', i}$ denote a raw observation recorded at continuous time $t' \in \mathbb{R}$ for path $i \in V$. All records are first aligned to the discrete minute-level grid $T$ using

$$\tau(t') = \left\lfloor \frac{t'}{\Delta t} \right\rfloor \Delta t, \qquad \Delta t = 1\ \text{min}, \tag{5}$$

yielding synchronized observations $x_{\tau(t'), i}$. This alignment enforces a common temporal reference across all data streams and prevents artificial temporal distortions.

Following alignment, structural validation and quality control are applied. Duplicate records are removed, and each observation is evaluated using rule-based filters and consistency checks to detect implausible values. This process yields a validated observation set $\Omega \subset T \times V$, which is generally incomplete due to both missing data and filtering.
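The alignment of Equation (5) and the validation step can be sketched in plain Python as below. The concrete rule-based filters are not specified in this section, so the speed bounds `v_min` and `v_max` are hypothetical placeholders, and keeping the first plausible record when several fall into the same (minute, path) cell is likewise an assumption.

```python
import math

DT_MIN = 1.0  # sampling interval Δt in minutes, Eq. (5)


def align(t_prime_min):
    """Eq. (5): map a continuous timestamp (in minutes) to the grid point at or before it."""
    return math.floor(t_prime_min / DT_MIN) * DT_MIN


def align_and_validate(raw, v_min=0.0, v_max=150.0):
    """Align raw (t', i, speed) records and keep one plausible value per grid cell.

    v_min and v_max (km/h) are hypothetical stand-ins for the unspecified
    rule-based filters. If several plausible records map to the same
    (minute, path) cell, the first one is kept (an assumed tie-breaking rule).
    """
    validated = {}
    for t_prime, i, speed in raw:
        if not (v_min <= speed <= v_max):
            continue  # implausible value: excluded from Omega
        key = (int(align(t_prime)), i)
        if key in validated:
            continue  # duplicate for this grid cell
        validated[key] = speed
    return validated  # the keys form the validated observation set Omega


raw = [(10.4, 0, 50.0), (10.9, 0, 51.0), (11.2, 0, 300.0), (11.7, 1, 44.0)]
omega = align_and_validate(raw)
print(omega)  # {(10, 0): 50.0, (11, 1): 44.0}
```

Flooring rather than rounding guarantees that a record is never attributed to a minute that has not yet started, which matters when the same logic runs on a live stream.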

To obtain a fully populated time series, a hierarchical three-stage imputation procedure is applied.

First, a seasonal carryover strategy is used. For each missing entry $(t, i)$, the algorithm searches previous weeks for observations at the same minute-of-week and assigns the closest available value:

$$\tilde{x}^{A}_{t,i} = x_{t - \ell^{*} W,\ i}, \tag{6}$$

where $W$ denotes one week in minutes ($W = 7 \times 1440 = 10{,}080$) and $\ell^{*}$ is the nearest valid lag, i.e., the smallest $\ell \geq 1$ for which $(t - \ell W, i) \in \Omega$.

Second, if no weekly match is found, a minute-of-day climatology is applied. Missing values are estimated using the average of historical observations at the same minute-of-day over a trailing window:

$$\tilde{x}^{B}_{t,i} = \frac{1}{|S_{t,i}|} \sum_{s \in S_{t,i}} x_{s,i}, \tag{7}$$

where $S_{t,i}$ contains past observations with matching daily position.

Finally, any remaining gaps are filled using a continuity-based approach that applies forward and backward nearest-neighbor propagation along the time axis; the resulting values, denoted $\tilde{x}^{C}_{t,i}$, ensure temporal completeness.

The final imputed series $\tilde{x}_{t,i}$ is defined as

$$\tilde{x}_{t,i} = \begin{cases} x_{t,i}, & (t, i) \in \Omega, \\ \tilde{x}^{A}_{t,i}, & \text{if Stage A applies}, \\ \tilde{x}^{B}_{t,i}, & \text{if Stage B applies}, \\ \tilde{x}^{C}_{t,i}, & \text{otherwise}, \end{cases} \tag{8}$$

yielding a complete and temporally consistent minute-level dataset suitable for downstream modeling. A detailed mathematical formulation of the imputation procedure is provided in Appendix A.

3.3. Preparation of Model Inputs

Following preprocessing and imputation, the complete minute-level series $\tilde{x}_{t,i}$ is transformed into a format suitable for model training and evaluation. This transformation includes temporal aggregation, chronological dataset partitioning, normalization, and supervised window construction.

First, the minute-level data are aggregated into non-overlapping time bins to reduce noise and align with the operational forecasting horizon. Let $U = \{u_0, u_1, \dots, u_{n-1}\}$ denote the set of bin start times with fixed resolution (e.g., 5 min). For each node $i \in V$, the aggregated value is computed as the average within each bin:

$$\bar{x}_{u,i} = \frac{1}{\Delta} \sum_{m=0}^{\Delta - 1} \tilde{x}_{u+m,\ i}, \tag{9}$$

where $\Delta$ denotes the aggregation interval in minutes. Stacking across all nodes yields the network snapshot $\bar{X}_u \in \mathbb{R}^{N}$, where $N = |V|$.
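Equation (9) reduces to a reshape followed by a mean when the minute-level array is complete. The sketch below assumes the series starts on a bin boundary and drops a trailing partial bin; both are assumptions made for illustration.

```python
import numpy as np


def aggregate_bins(x_min, delta):
    """Eq. (9): mean over non-overlapping bins of `delta` minutes.

    x_min: complete (imputed) array of shape (n_minutes, N).
    A trailing partial bin is dropped (an assumption for this sketch).
    Returns shape (n_bins, N); row u is the snapshot X̄_u.
    """
    n_bins = x_min.shape[0] // delta
    trimmed = x_min[: n_bins * delta]
    return trimmed.reshape(n_bins, delta, -1).mean(axis=1)


x_min = np.arange(12, dtype=float).reshape(6, 2)  # 6 minutes, 2 nodes
print(aggregate_bins(x_min, delta=3).tolist())    # [[2.0, 3.0], [8.0, 9.0]]
```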

The aggregated time series is then partitioned into training, validation, and test sets using a chronological split to preserve temporal ordering. Specifically, the dataset is divided into 70%, 20%, and 10% segments, respectively:

$$U = U_{\text{train}} \cup U_{\text{val}} \cup U_{\text{test}}, \tag{10}$$

ensuring that all subsets are disjoint and ordered in time.

To stabilize training, a global standardization is applied. The scalar mean $\mu_{\text{glob}}$ and standard deviation $\sigma_{\text{glob}}$ are computed using only the training data:

$$\hat{x}_{u,i} = \frac{\bar{x}_{u,i} - \mu_{\text{glob}}}{\sigma_{\text{glob}}}, \tag{11}$$

and the same transformation is applied to the validation and test sets to avoid data leakage. Stacking across nodes yields the standardized snapshot $\hat{X}_u \in \mathbb{R}^{N}$.
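A minimal sketch of the chronological split (Equation (10)) and the training-only global standardization (Equation (11)) is given below. Reading "global" as a single scalar mean and standard deviation pooled over all nodes and training bins is our interpretation, and the population standard deviation (NumPy's default `ddof=0`) is an assumption.

```python
import numpy as np


def chrono_split(x_bar, frac_train=0.7, frac_val=0.2):
    """Eq. (10): contiguous 70/20/10 split along time, without shuffling."""
    n = x_bar.shape[0]
    n_train = int(round(n * frac_train))
    n_val = int(round(n * frac_val))
    return x_bar[:n_train], x_bar[n_train:n_train + n_val], x_bar[n_train + n_val:]


def fit_global_scaler(train):
    """Eq. (11) statistics: one scalar mean/std pooled over all nodes and bins."""
    return float(train.mean()), float(train.std())


def standardize(x_bar, mu, sigma):
    return (x_bar - mu) / sigma


x_bar = np.arange(20.0).reshape(10, 2)  # 10 bins, 2 nodes
train, val, test = chrono_split(x_bar)
mu, sigma = fit_global_scaler(train)    # fitted on the training segment only
val_hat = standardize(val, mu, sigma)   # same transform reused: no leakage
print(train.shape, val.shape, test.shape, mu)  # (7, 2) (2, 2) (1, 2) 6.5
```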

Finally, the standardized time series is converted into supervised learning samples using a rolling window approach. Given a look-back length $L$ and prediction horizon $H$, each input–target pair is defined as

$$X_u = \{\hat{X}_{u-L+1}, \dots, \hat{X}_u\}, \qquad Y_{u+1:u+H} = \{\bar{X}_{u+1}, \dots, \bar{X}_{u+H}\}. \tag{12}$$

Sliding the window across time yields the final dataset used for model training and evaluation. A detailed mathematical formulation of these steps is provided in Appendix B.
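Equation (12) can be sketched as a sliding slice over the time axis. Following the equation as written, inputs are taken from the standardized series and targets from the aggregated series in original units; the trailing feature axis of size $F = 1$ anticipates the input tensor shape used by the model. The function name `make_windows` is hypothetical.

```python
import numpy as np


def make_windows(x_hat, x_bar, L, H):
    """Eq. (12): L standardized snapshots ending at u as input, the next H
    aggregated snapshots as target.

    x_hat, x_bar: arrays of shape (n_bins, N).
    Returns X with shape (n_samples, L, N, 1) and Y with shape (n_samples, H, N, 1).
    """
    n_bins = x_hat.shape[0]
    us = range(L - 1, n_bins - H)  # every u with full history and full horizon
    X = np.stack([x_hat[u - L + 1 : u + 1] for u in us])[..., None]
    Y = np.stack([x_bar[u + 1 : u + H + 1] for u in us])[..., None]
    return X, Y


x_bar = np.arange(30.0).reshape(10, 3)
x_hat = (x_bar - 10.0) / 5.0
X, Y = make_windows(x_hat, x_bar, L=4, H=2)
print(X.shape, Y.shape)  # (5, 4, 3, 1) (5, 2, 3, 1)
```

The number of samples is $n - L - H + 1$ for $n$ bins, so each split should be windowed separately to keep targets from crossing split boundaries.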

3.4. Graph Construction and Adjacency Matrix

To model spatial dependencies between traffic segments, we construct a directed graph $G = (V, E)$, where nodes correspond to predefined paths and edges represent connectivity in the road network.

For each pair of nodes $i, j \in V$, the directed shortest-path distance $d_{ij}$ is computed based on edge weights representing physical distance or travel cost. These distances are obtained using Dijkstra's algorithm and are not necessarily symmetric ($d_{ij} \neq d_{ji}$ in general) due to the directed nature of the network.

A similarity matrix $W$ is then derived using a Gaussian kernel:

$$W_{ij} = \begin{cases} \exp\!\left(-d_{ij}^{2} / \gamma\right), & d_{ij} \leq \delta, \\ 0, & \text{otherwise}, \end{cases} \tag{13}$$

where $\gamma$ controls the decay rate and $\delta$ defines a distance cutoff for sparsification.

To further enforce sparsity, small weights are removed through thresholding:

$$A^{(0)}_{ij} = W_{ij} \cdot \mathbb{1}\{W_{ij} \geq \tau\}. \tag{14}$$

Finally, self-loops are added and the matrix is column-normalized to obtain the adjacency matrix used in the model:

$$A_{ij} = \frac{\tilde{A}_{ij}}{\sum_{p=1}^{N} \tilde{A}_{pj}}, \tag{15}$$

where $\tilde{A}_{ij} = A^{(0)}_{ij} + \gamma_0 \mathbb{1}\{i = j\}$ and $\gamma_0$ is the self-loop weight. The resulting matrix $A \in \mathbb{R}^{N \times N}$ is sparse, directed, and captures the spatial structure of the traffic network. A detailed formulation of the adjacency matrix construction is provided in Appendix C.
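The construction of Equations (13) to (15) can be sketched end to end as below. All-pairs Dijkstra is implemented with `heapq` to keep the example dependency-light (SciPy's `scipy.sparse.csgraph.dijkstra` is a common alternative); the toy network and the values of `gamma`, `delta`, `tau`, and `gamma0` are hypothetical, since the calibrated values are given in Appendix C rather than here.

```python
import heapq

import numpy as np


def dijkstra_all_pairs(n, edges):
    """Directed shortest-path distances d[i, j] (inf if j is unreachable from i).

    edges: list of (i, j, w) directed links with non-negative cost w.
    """
    adj = [[] for _ in range(n)]
    for i, j, w in edges:
        adj[i].append((j, w))
    d = np.full((n, n), np.inf)
    for src in range(n):
        d[src, src] = 0.0
        heap = [(0.0, src)]
        while heap:
            dist, u = heapq.heappop(heap)
            if dist > d[src, u]:
                continue  # stale heap entry
            for v, w in adj[u]:
                nd = dist + w
                if nd < d[src, v]:
                    d[src, v] = nd
                    heapq.heappush(heap, (nd, v))
    return d


def build_adjacency(d, gamma, delta, tau, gamma0=1.0):
    """Gaussian kernel with cutoff, thresholding, self-loops, column normalization."""
    W = np.where(d <= delta, np.exp(-d**2 / gamma), 0.0)  # Eq. (13)
    A0 = W * (W >= tau)                                   # Eq. (14)
    A_tilde = A0 + gamma0 * np.eye(d.shape[0])            # add self-loops
    return A_tilde / A_tilde.sum(axis=0, keepdims=True)   # Eq. (15)


# Toy one-way chain 0 -> 1 -> 2 with 100 m links (hypothetical values).
d = dijkstra_all_pairs(3, [(0, 1, 100.0), (1, 2, 100.0)])
A = build_adjacency(d, gamma=1e4, delta=250.0, tau=0.1)
print(np.round(A, 3))
```

In this toy case $d_{02} = 200$ passes the distance cutoff but its kernel weight $e^{-4} \approx 0.018$ falls below $\tau$ and is removed, while $A_{10} = 0$ because node 0 is unreachable from node 1, so the matrix stays directed.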

3.5. Model

Our proposed model is a spatiotemporal graph convolutional network (STGCN) architecture, illustrated in Figure 2, which serves as the core forecasting model within the overall framework. The inputs are the standardized, aggregated time series, arranged in rolling windows together with the directed adjacency matrix A. The same architecture is used both for the primary traffic prediction task and, as described later, for residual error modeling. We formalize each component below.

3.5.1. Input Representation

For each valid time index, the model input consists of a rolling window of standardized traffic observations over the network. Given a look-back horizon $L$, the input tensor is defined as

$$X \in \mathbb{R}^{B \times L \times N \times F}, \tag{16}$$

where $B$ is the batch size, $L$ the temporal window length, $N$ the number of nodes in the network, and $F$ the number of features per node (here $F = 1$, corresponding to average speed). Each sample therefore captures the recent temporal evolution of traffic conditions across all network nodes.

In addition to the temporal input, the model incorporates the directed adjacency matrix $A \in \mathbb{R}^{N \times N}$, which encodes the spatial structure of the road network and governs information propagation between connected nodes. The detailed mathematical formulation of the input tensors and graph representation is provided in Appendix D.

3.5.2. Temporal Processing

The temporal component of the model transforms the input sequence into a compact representation for each node by capturing sequential dependencies and filtering temporal noise. The process consists of four stages: linear encoding with node embeddings, recurrent temporal modeling, smoothing, and attention-based aggregation.

First, the standardized input tensor is projected into a higher-dimensional latent space through a linear encoder, while learnable node embeddings are added to incorporate node-specific information. This results in an initial hidden representation $H^{(0)}$.

Next, a gated recurrent unit (GRU) is applied independently to each node along the temporal dimension, producing a sequence of hidden states that capture temporal dependencies in the input data.

To improve robustness, a fixed Gaussian smoothing filter is applied along the temporal axis, reducing high-frequency noise while preserving the overall structure of the signal.

Finally, a temporal attention mechanism is used to aggregate the sequence into a single representation per node. The attention weights are learned to emphasize the most relevant time steps, resulting in a compact node-level representation

$$H^{(3)} \in \mathbb{R}^{B \times N \times C}, \tag{17}$$

where $C$ denotes the hidden (channel) dimension.

The detailed mathematical formulation of each component is provided in Appendix D.
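The four temporal stages can be illustrated with a NumPy forward pass on random weights. This is a sketch under stated assumptions, not the authors' implementation: the exact equations are in Appendix D, so the GRU variant (reset gate applied before the recurrent projection, biases omitted), the kernel width `sigma_t` and `radius`, the edge padding, and the single-vector attention score `w_att` are all our choices, and every parameter name is hypothetical. Only $L = 30$ is taken from Section 4.1.1.

```python
import numpy as np

rng = np.random.default_rng(0)
B, L, N, F, C = 2, 30, 5, 1, 8  # hypothetical sizes; L = 30 as in Section 4.1.1


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Random stand-ins for trained parameters (all names hypothetical).
W_enc = rng.normal(scale=0.1, size=(F, C))
E_node = rng.normal(scale=0.1, size=(N, C))  # learnable node embeddings
Wz, Wr, Wn = (rng.normal(scale=0.1, size=(C, C)) for _ in range(3))
Uz, Ur, Un = (rng.normal(scale=0.1, size=(C, C)) for _ in range(3))
w_att = rng.normal(scale=0.1, size=C)


def temporal_block(X, sigma_t=1.0, radius=2):
    b, steps, n, _ = X.shape
    # 1) Linear encoder plus node embeddings: (B, L, N, F) -> (B, L, N, C).
    H0 = X @ W_enc + E_node
    # 2) GRU run over time for every node independently, with shared weights.
    h = np.zeros((b, n, C))
    states = []
    for t in range(steps):
        x_t = H0[:, t]
        z = sigmoid(x_t @ Wz + h @ Uz)
        r = sigmoid(x_t @ Wr + h @ Ur)
        cand = np.tanh(x_t @ Wn + (r * h) @ Un)
        h = (1.0 - z) * cand + z * h
        states.append(h)
    H1 = np.stack(states, axis=1)  # (B, L, N, C)
    # 3) Fixed, normalized Gaussian smoothing along the time axis.
    k = np.exp(-0.5 * (np.arange(-radius, radius + 1) / sigma_t) ** 2)
    k /= k.sum()
    padded = np.pad(H1, ((0, 0), (radius, radius), (0, 0), (0, 0)), mode="edge")
    H2 = sum(k[j] * padded[:, j : j + steps] for j in range(2 * radius + 1))
    # 4) Temporal attention: softmax over time steps, weighted sum per node.
    scores = H2 @ w_att  # (B, L, N)
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return (alpha[..., None] * H2).sum(axis=1)  # H3: (B, N, C)


H3 = temporal_block(rng.normal(size=(B, L, N, F)))
print(H3.shape)  # (2, 5, 8)
```

Because the smoothing kernel is fixed and normalized, it adds no trainable parameters, which fits the lightweight design goal stated for the framework.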

3.5.3. Spatial Processing

The spatial component models interactions between nodes in the traffic network by propagating and aggregating information across the graph structure. This is achieved through a combination of diffusion convolution and graph attention mechanisms.

First, a diffusion convolution is applied to the node representations $H^{(3)}$, using the directed adjacency matrix to aggregate information from incoming neighbors over multiple hops. This operation captures spatial dependencies by allowing each node to incorporate information from progressively distant parts of the network.

The resulting representation is then refined using a graph attention layer, which assigns adaptive weights to neighboring nodes based on their relevance. This enables the model to focus on the most informative spatial interactions, rather than relying solely on fixed graph connectivity.

The output of the spatial processing block is

$$H^{(5)} \in \mathbb{R}^{B \times N \times C}, \tag{18}$$

which represents the final node-level embedding after spatiotemporal feature extraction. The detailed mathematical formulation is provided in Appendix D.
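The spatial block can be sketched in NumPy as a $K$-hop diffusion over incoming edges followed by a single-head, GAT-style attention layer restricted to the edges of $A$. This is an illustrative reading, not the formulation of Appendix D: we assume $A_{ij}$ weights the directed link $i \to j$ (so the column normalization of Equation (15) turns $A^{\top}h$ into an average over in-neighbors), use one weight matrix per hop, and omit nonlinearities. All parameter names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)


def diffusion_conv(H, A, thetas):
    """K-hop diffusion over incoming edges.

    H: (B, N, C); A: (N, N) column-normalized, A[i, j] weighting link i -> j.
    thetas: K + 1 matrices of shape (C, C_out), one per hop 0..K.
    """
    out = H @ thetas[0]
    Hk = H
    for theta in thetas[1:]:
        Hk = np.einsum("ij,bic->bjc", A, Hk)  # node j averages its in-neighbours i
        out = out + Hk @ theta
    return out


def graph_attention(H, A, W, a_src, a_dst, slope=0.2):
    """Single-head attention: node j attends over senders i with A[i, j] > 0."""
    Z = H @ W
    e = (Z @ a_src)[:, :, None] + (Z @ a_dst)[:, None, :]  # e[b, i, j]
    e = np.where(e > 0, e, slope * e)                       # LeakyReLU
    e = np.where(A[None] > 0, e, -np.inf)                   # keep graph edges only
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)        # softmax over senders
    return np.einsum("bij,bic->bjc", alpha, Z)


# Toy chain 0 -> 1 -> 2 -> 3 with self-loops, column-normalized as in Eq. (15).
B, N, C, K = 2, 4, 8, 2
A_tilde = np.eye(N) + np.eye(N, k=1)
A = A_tilde / A_tilde.sum(axis=0, keepdims=True)

H3 = rng.normal(size=(B, N, C))
thetas = [rng.normal(scale=0.1, size=(C, C)) for _ in range(K + 1)]
W_gat = rng.normal(scale=0.1, size=(C, C))
a_src, a_dst = rng.normal(size=C), rng.normal(size=C)

H5 = graph_attention(diffusion_conv(H3, A, thetas), A, W_gat, a_src, a_dst)
print(H5.shape)  # (2, 4, 8)
```

The self-loops added in Equation (15) guarantee that every node has at least one admissible sender, so the masked softmax never divides by zero.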

3.5.4. Residual Connection and Decoding

To preserve node-specific information, a residual connection is introduced using the learnable node embeddings. This residual pathway provides a stable, time-invariant representation that complements the spatiotemporal features extracted by the network.

The enriched representation is then passed through a linear decoder that maps each node's hidden state to future predictions. The decoder outputs multi-step forecasts for each node, producing

$$\hat{Y} \in \mathbb{R}^{B \times H \times N \times F}, \tag{19}$$

which represents the predicted traffic conditions over the forecasting horizon. The complete mathematical formulation is provided in Appendix D.
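The residual path and decoder can be sketched as follows, assuming the node-embedding residual is added (rather than concatenated) and that a single linear layer emits all $H \cdot F$ outputs per node before reshaping to the layout of Equation (19). The names `decode`, `W_dec`, and `b_dec` are hypothetical.

```python
import numpy as np


def decode(H5, E_node, W_dec, b_dec, horizon, n_feat=1):
    """Add the node-embedding residual, project to horizon * n_feat outputs per
    node, and reorder to (B, H, N, F) as in Eq. (19)."""
    Z = H5 + E_node                 # residual path: (B, N, C) + (N, C)
    out = Z @ W_dec + b_dec         # (B, N, H * F)
    b, n, _ = out.shape
    out = out.reshape(b, n, horizon, n_feat)
    return out.transpose(0, 2, 1, 3)


rng = np.random.default_rng(2)
B, N, C, horizon, F = 2, 4, 8, 3, 1
H5 = rng.normal(size=(B, N, C))
E_node = rng.normal(size=(N, C))
W_dec = rng.normal(size=(C, horizon * F))
b_dec = np.zeros(horizon * F)
Y_hat = decode(H5, E_node, W_dec, b_dec, horizon=horizon, n_feat=F)
print(Y_hat.shape)  # (2, 3, 4, 1)
```

In the Thessaloniki setting $H = 1$, while the Keukenhof deployment uses $H = 3$ (15, 30, and 45 min ahead), so only the decoder width changes between the two.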

3.6. Residual Correction Framework

While the proposed STGCN model provides accurate baseline forecasts, real-world operation reveals the presence of systematic prediction errors that persist over time. To address this limitation, a residual correction strategy is introduced.

The overall framework consists of a baseline STGCN followed by a residual correction module, as illustrated in Figure 3. The process operates in two stages. First, a baseline STGCN model is trained on historical traffic data to produce initial predictions. In an operational setting, these predictions can be compared with observed values, enabling the computation of residual errors over time.

Specifically, let $\hat{Y}^{(1)}$ denote the baseline prediction and $Y$ the ground truth. The residual is defined as

$$R = Y - \hat{Y}^{(1)}. \tag{20}$$

A second STGCN model with identical architecture is then trained on this residual signal to learn systematic prediction errors. Given an input sequence, this model produces a correction term $\hat{R}$, and the final prediction is obtained as

$$\hat{Y} = \hat{Y}^{(1)} + \hat{R}. \tag{21}$$

During inference, the two models operate sequentially: the baseline model produces an initial forecast, and the residual model provides an additive correction. This approach enables the second model to capture persistent biases and patterns not learned by the baseline model, improving overall prediction accuracy without modifying the underlying architecture.
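The mechanics of Equations (20) and (21) can be illustrated on synthetic data. To keep the example self-contained, the second STGCN is replaced by a deliberately simple stand-in, a per-node mean of residuals collected over a calibration period; this swapped-in estimator ignores the input sequence and only demonstrates how a learned correction removes a persistent bias. The data, bias values, and split are fabricated for illustration only and carry no information about the reported results.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, N = 500, 4

# Synthetic ground truth and a baseline forecast with a persistent per-node bias.
Y = 50.0 + 5.0 * rng.standard_normal((n_steps, N))
bias = np.array([-3.0, 0.0, 2.0, 4.0])
Y_hat1 = Y - bias + rng.standard_normal((n_steps, N))

# Residuals logged during "operation" of the baseline, Eq. (20).
calib, live = slice(0, 400), slice(400, n_steps)
R = Y[calib] - Y_hat1[calib]

# Stand-in residual model: per-node mean residual (NOT the paper's second STGCN).
R_hat = R.mean(axis=0)

# Sequential inference: baseline forecast plus additive correction, Eq. (21).
Y_hat = Y_hat1[live] + R_hat


def mae(y, y_pred):
    return float(np.mean(np.abs(y - y_pred)))


print(mae(Y[live], Y_hat1[live]), mae(Y[live], Y_hat))
```

Because the correction is additive, the baseline forecast remains available on its own, so the residual model can be retrained or disabled without touching the deployed baseline.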

The residual correction framework is applied in the Thessaloniki case study. In contrast, the Keukenhof case study serves as a baseline deployment scenario without residual correction, allowing the evaluation of the core STGCN model under larger-scale conditions.

4. Results

4.1. Case Study: Thessaloniki

The first case study focuses on the urban road network of Thessaloniki, Greece, a mid-sized metropolitan area characterized by dense traffic flows, mixed arterial and local roads, and pronounced temporal variability driven by daily commuting patterns and recurrent congestion. The network is represented by 583 predefined directional paths, selected to capture key corridors and traffic streams relevant for operational traffic management. For each path, historical average-speed measurements are obtained from floating car data, providing continuous coverage of real-world traffic conditions. Figure 4 illustrates the spatial layout of the modeled traffic network and the corresponding directional paths.

Traffic observations are originally available at a one-minute resolution and are processed through a preprocessing pipeline that ensures temporal consistency and robustness to missing or irregular measurements (Section 3.2). For offline evaluation, the data are additionally aggregated to coarser temporal resolutions in order to assess the impact of temporal granularity on forecasting accuracy and computational cost. Multiple models are evaluated, including baseline architectures and the proposed STGCN model, across all temporal resolutions.

Across all configurations, the forecasting task is formulated as a one-step-ahead prediction at the corresponding temporal resolution, targeting short-term average speed for each path.

While multiple temporal resolutions are examined offline, the operational system in Thessaloniki is based on a 5-min aggregation. This choice provides a practical balance between responsiveness to rapid traffic changes and robustness to measurement noise, while remaining compatible with real-time forecasting constraints. New observations are ingested every minute, allowing forecasts to be updated continuously and enabling timely adaptation to abrupt disruptions such as incidents or sudden speed drops. This deployment setting motivates the comparative evaluation and defines the configuration used in the real-time experiments presented in the following sections.

4.1.1. Offline Evaluation Across Temporal Aggregation Levels

To quantify the impact of temporal granularity on forecasting accuracy and computational efficiency, multiple models were trained on average-speed data aggregated at 1-, 5-, 15-, and 30-min intervals. The evaluated models include several baseline architectures as well as the proposed STGCN model. In all cases, a fixed input window of 30 timesteps was used, and the models were trained to predict the subsequent timestep at the corresponding aggregation level.

Prediction performance was assessed using mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE), while training time was recorded as an indicator of computational cost. The results are summarized in Table 1.
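For reference, the three error metrics can be written as a short Python sketch. Two details are assumptions, because the manuscript does not specify them: MAPE is expressed in percent, and a small `eps` guard handles zero-valued observations.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (e.g., km/h)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error; penalizes large deviations more than MAE does."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error in percent; eps (assumed) avoids division by zero."""
    return 100.0 * sum(abs(a - b) / max(abs(a), eps) for a, b in zip(y_true, y_pred)) / len(y_true)


observed = [50.0, 40.0, 30.0]
forecast = [45.0, 42.0, 30.0]
print(mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))
```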

Finer temporal resolutions generally lead to improved predictive accuracy but incur significantly higher computational cost due to increased data volume and temporal variability. Conversely, coarser aggregations reduce training time at the expense of reduced responsiveness to short-term traffic dynamics. Among the evaluated models, the proposed STGCN demonstrates consistent performance across all resolutions, achieving a favorable balance between accuracy and efficiency. Based on this trade-off, the 5-min aggregation is selected as the operational configuration for real-time deployment in Thessaloniki.

All training experiments were conducted on a fixed cloud computing environment using an AWS g4dn.8xlarge instance equipped with 32 vCPUs, 128 GB of system memory, and a single NVIDIA T4 Tensor Core GPU with 16 GB of video memory. The reported training times therefore reflect relative computational cost under consistent hardware conditions.

4.1.2. Real-Time Deployment and Residual Correction

Based on the offline evaluation, the 5-min STGCN configuration was selected for real-time deployment in Thessaloniki. The operational system continuously ingests one-minute traffic observations and updates forecasts in a rolling manner, producing a new prediction for the next 5-min aggregated timestep every minute.

In contrast to the offline setting, where predictions are generated on fixed aggregated sequences, the deployed system operates with a sliding input window that is updated every minute as new observations arrive. This enables rapid adaptation to evolving traffic conditions while preserving the original forecasting horizon.
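The per-minute update can be illustrated with a hypothetical helper, `RollingWindow`, for a single path. The helper assumes that at every minute the most recent L·Δ one-minute readings are re-aggregated into L trailing Δ-minute means that end at the newest reading. The manuscript does not specify how the deployed service aligns its bins, so this is one plausible reading and not the production code.

```python
from collections import deque


class RollingWindow:
    """Hypothetical per-path buffer holding the latest look_back * delta minute readings."""

    def __init__(self, look_back=30, delta=5):
        self.look_back = look_back                     # L: number of aggregated input steps
        self.delta = delta                             # Δ: minutes per aggregated step
        self.buffer = deque(maxlen=look_back * delta)  # oldest reading drops out automatically

    def push(self, value):
        """Ingest one new one-minute observation."""
        self.buffer.append(value)

    def ready(self):
        return len(self.buffer) == self.buffer.maxlen

    def aggregated_window(self):
        """Return L trailing Δ-minute means, oldest first, ending at the newest minute."""
        if not self.ready():
            raise ValueError("not enough history to fill the input window")
        data = list(self.buffer)
        return [sum(data[k:k + self.delta]) / self.delta
                for k in range(0, len(data), self.delta)]


# Toy run with L = 2 and Δ = 5 (the offline experiments use L = 30).
win = RollingWindow(look_back=2, delta=5)
for minute_value in range(1, 11):
    win.push(float(minute_value))
print(win.aggregated_window())  # [3.0, 8.0]
win.push(11.0)                  # one minute later, the whole window shifts by one minute
print(win.aggregated_window())  # [4.0, 9.0]
```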

During deployment, the baseline STGCN model was operated continuously for approximately four months, during which both predictions and corresponding observed values were recorded. This enabled the construction of a residual time series capturing systematic deviations between forecasts and real-world measurements.

To address these persistent errors, a residual correction model was trained using the same STGCN architecture, with residuals as the prediction target. The final forecast is obtained by combining the baseline prediction with the estimated residual, as described in Section 3.6.
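The additive combination can be sketched as follows. The sketch makes two assumptions for illustration. First, residuals are defined as observed minus predicted (Section 3.6 gives the exact formulation). Second, the residual STGCN is replaced by a deliberately trivial, hypothetical stand-in, the mean logged residual per time-of-day slot, so that the example stays self-contained.

```python
from collections import defaultdict


def residuals(observed, baseline):
    """Residual convention assumed here: r = observed - baseline prediction."""
    return [o - p for o, p in zip(observed, baseline)]


def fit_slot_bias(slots, res):
    """Toy stand-in for the residual model: mean logged residual per time-of-day slot."""
    sums, counts = defaultdict(float), defaultdict(int)
    for s, r in zip(slots, res):
        sums[s] += r
        counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}


def corrected_forecast(baseline, slots, bias):
    """Final forecast = baseline prediction + estimated residual (0 for unseen slots)."""
    return [p + bias.get(s, 0.0) for p, s in zip(baseline, slots)]


# Logged history: the baseline overestimates speed by 4 km/h in the "08:00" slot.
hist_slots = ["08:00", "08:00", "12:00", "12:00"]
hist_obs = [40.0, 42.0, 55.0, 55.0]
hist_pred = [44.0, 46.0, 55.0, 55.0]
bias = fit_slot_bias(hist_slots, residuals(hist_obs, hist_pred))
print(corrected_forecast([45.0, 56.0], ["08:00", "12:00"], bias))  # [41.0, 56.0]
```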

Table 2 summarizes the performance of the deployed system. While offline evaluation yields a MAPE of 5.42%, real-time operation introduces a modest degradation to 5.83%, reflecting the challenges of live data conditions. The residual correction framework reduces the real-time MAPE to 1.45%, highlighting the effectiveness of leveraging historical operational discrepancies.

The substantial improvement observed with residual correction can be attributed to the different nature of the residual signal compared to the original prediction task. While the baseline model is required to learn the full spatiotemporal dynamics of traffic flow, the residual model operates on a smoother and lower-variance signal, capturing systematic and temporally consistent deviations between predictions and observations. In practice, these residual patterns reflect recurring biases associated with specific locations, traffic regimes, or time-of-day effects that are difficult to fully capture in a single-stage model. By focusing on these structured errors, the residual model is able to learn correction patterns more efficiently, leading to improved overall accuracy when combined with the baseline forecasts.

While Table 2 provides an aggregate view of model performance, such summary metrics do not capture how residual correction influences predictions at the level of individual traffic streams. In practice, both forecasting errors and correction effects vary across different road segments and traffic conditions. To provide a more detailed view of model behavior under real-world operation, Figure 5 and Figure 6 present representative time-series examples comparing observed speeds, baseline forecasts, and residual-corrected predictions across different traffic regimes.

Overall, the time-series examples illustrate how residual correction systematically refines baseline forecasts across heterogeneous traffic regimes, improving alignment with observed dynamics both during the daytime period and under peak traffic conditions.

4.2. Case Study: South Holland (Keukenhof Region)

To evaluate the scalability and transferability of the proposed STGCN framework, a complementary case study was conducted in the South Holland (Keukenhof) region of the Netherlands. In contrast to the Thessaloniki case study, which serves as the primary methodological validation and benchmarking setting, this experiment is designed as a supplementary deployment scenario to assess out-of-the-box applicability under a larger-scale and structurally different traffic network. The regional network comprises 1159 directed road segments with heterogeneous traffic dynamics, representing a significantly larger and more complex environment than the urban network of Thessaloniki (Figure 7). No residual correction module is applied in this setting, allowing the evaluation to focus on the standalone performance and scalability of the proposed STGCN model.

Traffic observations are available at one-minute resolution through an operational Application Programming Interface (API). As in the Thessaloniki case study, incoming measurements are preprocessed using the same imputation and normalization strategy described in Section 3.2. Data are subsequently aggregated to 15-min intervals to align with regional traffic management requirements.

Unlike the Thessaloniki deployment, the prediction target in this case study is average travel time rather than speed, reflecting corridor-level monitoring objectives. The forecasting task is formulated as a multi-horizon problem, where a single STGCN model simultaneously predicts average travel time 15, 30, and 45 min ahead. These forecast horizons were selected in collaboration with regional traffic management authorities to support operational decision-making during the Keukenhof exhibition period.

The South Holland model was trained exclusively on data collected between January 2025 and early March 2025, with the objective of providing short-term operational support during the 2025 Keukenhof flower exhibition period rather than long-term traffic modeling. Consistent with the goal of assessing scalability and transferability, no architecture modifications, additional tuning, or residual correction mechanisms were introduced beyond those described for the Thessaloniki case study. The same STGCN configuration was applied directly to the larger regional network, enabling evaluation under large-scale and multi-horizon conditions without additional model enhancement.

Table 3 summarizes training accuracy and horizon-specific real-time prediction performance. Real-time metrics per forecast horizon were computed using controlled inference experiments with preserved horizon metadata. In the operational deployment, forecasts are generated in a rolling multi-horizon configuration, where horizon-specific metadata are not retained in the production system. As expected, prediction error increases with forecast horizon, reflecting increasing uncertainty at longer lead times. Nevertheless, the model maintains stable and bounded performance across all horizons, demonstrating robust generalization despite differences in network scale, traffic composition, and operational objectives. The reported MAE and RMSE values are expressed in seconds and correspond to relatively short travel-time segments, naturally resulting in small absolute error magnitudes despite meaningful relative differences captured by MAPE.

The real-time forecasting service follows the same architectural principles as the Thessaloniki deployment, with minor adaptations to data ingestion. Instead of consuming streamed measurements, the system retrieves the latest observations via the API, constructs the required historical input window, and performs inference asynchronously. Forecasts are generated every five minutes and stored in a database for downstream operational use.

Due to the rolling multi-horizon inference strategy, multiple predictions may correspond to overlapping target timestamps. Since horizon-specific metadata are not retained in the production system, overlapping forecasts are aggregated using a simple arithmetic mean to produce a single operational estimate per timestamp. While this aggregation introduces a mild smoothing effect and may slightly reduce point-wise accuracy, it reflects the actual configuration of the deployed real-time service.
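Aggregating overlapping forecasts amounts to grouping predictions by target timestamp and averaging each group. The sketch assumes that forecasts arrive as `(issue_minute, horizon_minutes, value)` tuples, which is a hypothetical record layout. After grouping, the horizon identity is discarded, mirroring the production behavior described above.

```python
from collections import defaultdict


def merge_overlapping(forecasts):
    """Average all forecasts that share a target time (issue time + horizon).

    forecasts: iterable of (issue_minute, horizon_minutes, value) tuples.
    Returns {target_minute: mean value}; which horizon produced a value is not kept.
    """
    buckets = defaultdict(list)
    for issued, horizon, value in forecasts:
        buckets[issued + horizon].append(value)
    return {target: sum(vals) / len(vals) for target, vals in sorted(buckets.items())}


# Two forecast runs issued 15 min apart, each predicting 15, 30, and 45 min ahead (travel time, s).
runs = [(0, 15, 100.0), (0, 30, 110.0), (0, 45, 120.0),
        (15, 15, 104.0), (15, 30, 114.0), (15, 45, 124.0)]
print(merge_overlapping(runs))  # {15: 100.0, 30: 107.0, 45: 117.0, 60: 124.0}
```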

Consistent with the objective of evaluating scalability under minimal intervention, no residual correction module was applied in this deployment. The results therefore represent a baseline validation of the proposed STGCN framework under large-scale, multi-horizon, and minimal-tuning conditions.

To provide a qualitative illustration of model behavior under this transfer setting, Figure 8 presents real-time travel time forecasts for a representative road segment located near the Keukenhof exhibition area on 5 May 2025. The selected segment exhibits moderate temporal variability and is representative of corridor-level traffic conditions during the exhibition period.

Despite the absence of residual correction and the use of aggregated multi-horizon forecasts, the model effectively captures the overall temporal evolution of travel times, including the morning build-up, midday peak, and afternoon dissipation phases. Short-term fluctuations are partially smoothed due to the operational aggregation strategy; however, the underlying structural dynamics of regional demand are consistently preserved.

These results indicate that the proposed STGCN framework maintains stable predictive behavior when transferred to a substantially larger network and deployed without additional tuning, demonstrating practical generalization capability beyond the primary urban case study.

5. Discussion

The results from the Thessaloniki and South Holland case studies provide insight into both the predictive performance and the operational characteristics of the proposed STGCN-based forecasting framework under realistic deployment conditions.

In the Thessaloniki case study, the offline evaluation shows that the proposed STGCN achieves the best overall balance between prediction accuracy and computational efficiency across the examined temporal aggregation levels. Several baseline models, including graph-based alternatives, provide competitive performance in specific configurations; however, they exhibit either higher prediction errors or substantially greater training cost. The proposed STGCN therefore provides a strong balance between model complexity, predictive accuracy, and deployment feasibility. As aggregation intervals increase, prediction error rises for all models, reflecting the loss of fine-grained temporal information and the reduced responsiveness of coarser representations to short-term traffic variations.

Beyond predictive accuracy, computational efficiency is a critical factor for real-time deployment. Some baseline models require higher computational resources, making continuous minute-level forecasting less practical under operational constraints. The proposed STGCN, by contrast, maintains low-latency inference and stable resource requirements, making it well suited for sustained real-time operation. This highlights the importance of balancing predictive performance with deployment feasibility when designing traffic forecasting systems.

The transition from offline evaluation to real-time operation reveals additional challenges. Even with strong offline performance, the baseline STGCN exhibits increased error under live conditions, reflecting unmodeled effects such as abrupt incidents, behavioral variability, and other stochastic influences inherent to traffic systems. These sources of uncertainty cannot be fully captured by deterministic forecasting models. The residual correction module effectively addresses part of this gap by learning systematic deployment-time errors from historical predictions, resulting in a substantial improvement in real-time accuracy without increasing inference latency or architectural complexity. This demonstrates the practical value of incorporating operational feedback into the forecasting pipeline.

The South Holland (Keukenhof) case study serves as a complementary transfer experiment, focusing on scalability and portability under a substantially larger regional network and an event-driven operational context. In contrast to Thessaloniki, this experiment is not intended as a benchmarking setting, but rather as a validation of out-of-the-box applicability. Despite differences in spatial scale, traffic composition, prediction target (travel time), and the use of multi-horizon forecasting, the same STGCN architecture was applied without structural modification, additional tuning, or residual correction. Horizon-specific evaluation under controlled conditions shows a gradual, bounded increase in error as forecast lead time increases, reflecting the expected growth in uncertainty further from the input window. In operational deployment, overlapping multi-horizon forecasts are aggregated into a single estimate, introducing mild smoothing effects while preserving overall temporal structure. Despite the absence of additional refinement mechanisms and the relatively short training period, the model maintains stable predictive behavior, demonstrating robust generalization across heterogeneous traffic environments.

Several limitations merit discussion. First, while the framework can incorporate exogenous variables, explicit weather features were not included, as their effects are largely reflected in the observed speed or travel time series and did not yield measurable performance gains in preliminary evaluations. Second, residual correction was evaluated only in the Thessaloniki deployment; its extension to large-scale and multi-horizon settings remains an avenue for future work. Third, in the South Holland deployment, horizon-specific predictions were aggregated operationally and per-horizon metadata were not retained, limiting post hoc analysis of forecast lead times in the production environment. Fourth, although the framework demonstrates strong short-term predictive capability, long-term adaptation strategies, such as periodic retraining under evolving traffic patterns or seasonal effects, were beyond the scope of this study.

Overall, the findings demonstrate that the primary contribution of the proposed framework lies not only in predictive accuracy, but in its ability to operate reliably under real-world constraints. The Thessaloniki deployment validates the effectiveness of graph-aware modeling combined with residual correction in an urban setting, while the South Holland case study confirms scalability and structural portability to a larger regional network under event-driven conditions. By integrating contextual preprocessing, directed graph modeling, and deployment-aware refinement within a unified pipeline, the proposed framework addresses key practical challenges faced by real-time traffic management systems and provides a scalable foundation for future intelligent transportation applications.

6. Conclusions

This study presented a scalable, context-aware STGCN framework for real-time traffic forecasting, designed to address the practical challenges of operational deployment in both urban and regional road networks. The proposed approach integrates structured data preprocessing, directed graph-based spatial modeling, and a lightweight residual correction mechanism to support accurate and stable short-term predictions under non-stationary traffic conditions.

Through two real-world case studies, the framework demonstrated both methodological effectiveness and deployment robustness. In the Thessaloniki case study, the proposed STGCN achieved strong performance across multiple temporal aggregation levels and offered a favorable balance between prediction accuracy and computational efficiency compared to the evaluated baseline models. Furthermore, the residual correction strategy led to substantial improvements in real-time performance, highlighting the value of leveraging historical deployment errors for refinement under live operating conditions.

In the South Holland (Keukenhof) case study, the same STGCN architecture was applied without structural modification to a substantially larger network under a multi-horizon forecasting configuration. Despite the absence of residual correction and the operational aggregation of overlapping forecasts, the model maintained stable predictive behavior and successfully captured the overall temporal evolution of regional traffic demand. These results demonstrate the scalability and transferability of the framework under heterogeneous spatial, temporal, and operational conditions.

Overall, the findings indicate that graph-based spatiotemporal modeling, when combined with deployment-aware design choices, provides a reliable and computationally efficient solution for real-time traffic forecasting. The residual correction mechanism further offers a practical pathway to enhance deployment-time accuracy without increasing architectural complexity.

Future work will explore the integration of additional exogenous variables, adaptive retraining strategies under evolving traffic regimes, and the extension of residual correction to large-scale and multi-horizon settings. The proposed framework provides a scalable and deployment-ready foundation for real-time traffic management systems where reliability, efficiency, and robustness are essential.

Author Contributions

Conceptualization, P.K. and D.T.; methodology, P.K. and V.P.; software, P.K.; validation, P.K., V.P. and D.T.; formal analysis, P.K. and V.P.; investigation, P.K.; resources, P.K., V.P. and D.T.; data curation, V.P.; writing—original draft preparation, P.K. and V.P.; writing—review and editing, P.K., V.P. and D.T.; visualization, P.K.; supervision, E.M. (Evangelos Mintsis) and E.M. (Evangelos Mitsakis); project administration, E.M. (Evangelos Mitsakis); funding acquisition, E.M. (Evangelos Mitsakis). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Union, Horizon Europe research and innovation programme, under grant agreement No 101104171 (SYNCHROMODE).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available due to licensing, privacy, and contractual restrictions. Requests to access the datasets should be directed to the corresponding author.

Code Availability Statement

The forecasting pipeline was implemented in Python using PyTorch and deployed within the SYNCHROMODE real-time forecasting framework. The source code is subject to institutional and project-related intellectual property restrictions and therefore cannot be publicly released. However, all architectural details, preprocessing steps, model configurations, and evaluation procedures are fully described in this manuscript to ensure methodological transparency and reproducibility. Further technical details may be provided by the corresponding author upon reasonable request, subject to applicable project agreements.

Acknowledgments

Generative artificial intelligence tools were used for language refinement and editorial assistance. No generative AI tools were used for data generation, model development, analysis, or interpretation of results.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AGCRN	Adaptive Graph Convolutional Recurrent Network
ARIMA	AutoRegressive Integrated Moving Average
ARIMAX	AutoRegressive Integrated Moving Average with Exogenous Variables
CNN	Convolutional Neural Network
DGCRN-TSA	Dynamic Graph Convolutional Recurrent Network with Temporal Self-Attention
FC-GNN	Fully Connected Graph Neural Network
GRUGCN	Gated Recurrent Unit Graph Convolutional Network
ITS	Intelligent Transportation Systems
LSTM	Long Short-Term Memory
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
RMSE	Root Mean Square Error
RNN	Recurrent Neural Network
SARIMA	Seasonal AutoRegressive Integrated Moving Average
ST-CGCN	Spatio-Temporal Contextual Graph Convolutional Network
STGCN	Spatio-Temporal Graph Convolutional Network
STMGCN	Spatio-Temporal Multi-Graph Convolutional Network
T-GCN	Temporal Graph Convolutional Network

Appendix A. Detailed Imputation Formulation

This appendix provides the full mathematical formulation of the preprocessing and hierarchical imputation procedure described in Section 3.2.

Appendix A.1. Preprocessing and Validation

The quality-control process is summarized by a binary validity indicator:

$$Q_{t,i} \in \{0, 1\}, \tag{A1}$$

where $Q_{t,i} = 1$ denotes a validated observation and $Q_{t,i} = 0$ indicates that the observation is discarded.

The resulting validated observation set is defined as:

$$\Omega = \{(t, i) \in T \times V : Q_{t,i} = 1\}, \tag{A2}$$

which is generally a strict subset of the temporally aligned data domain. As a consequence, the quality-control procedure may introduce additional missing values beyond those originally present in the raw data.

Appendix A.2. Stage A: Week-Back Seasonal Carryover

Let $W$ denote one week in minutes. For each $(t, i) \in (T \times V) \setminus \Omega$, we define the set of available weekly lags:

$$K(t, i) = \{l \in \{1, 2, 3\} : (t - l W, i) \in \Omega\}. \tag{A3}$$

If $K(t, i) \neq \emptyset$, the closest valid lag is selected:

$$l^{*}(t, i) = \min K(t, i), \tag{A4}$$

and the imputed value is given by:

$$\tilde{x}^{A}_{t,i} = x_{t - l^{*}(t,i) W,\, i}. \tag{A5}$$

The domain of successful Stage-A imputation is:

$$\Omega_{A} = \{(t, i) \in T \times V : K(t, i) \neq \emptyset\}. \tag{A6}$$
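Stage A for a single node can be sketched as follows. The sketch assumes that the minute-level series is a Python list indexed by t, starting at t = 0, with `None` at indices outside Ω. Only validated values are copied, so an index filled in this stage never serves as a source for another. The `week` parameter defaults to W = 10,080 min and is shortened here only to keep the example small.

```python
WEEK_MINUTES = 7 * 24 * 60  # W in (A3)


def stage_a(series, week=WEEK_MINUTES, max_lags=3):
    """Week-back seasonal carryover (A3)-(A5) for one node.

    series: list indexed by minute t; None marks indices outside Omega.
    Returns {t: imputed value} for every missing t with a valid weekly lag.
    """
    filled = {}
    for t, value in enumerate(series):
        if value is not None:
            continue  # already validated, nothing to impute
        for lag in range(1, max_lags + 1):  # smallest lag first, i.e. l* = min K(t, i)
            src = t - lag * week
            if src >= 0 and series[src] is not None:  # (src, i) must lie in Omega
                filled[t] = series[src]
                break
    return filled


# Toy "week" of 2 minutes: t=2 copies t=0, t=4 skips a missing lag-1 source, t=5 copies t=3.
print(stage_a([1.0, 2.0, None, 4.0, None, None], week=2))  # {2: 1.0, 4: 1.0, 5: 4.0}
```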

Appendix A.3. Stage B: Minute-of-Day Climatology

If $(t, i) \notin \Omega \cup \Omega_{A}$, a minute-of-day climatology is used. Let $D$ denote the number of minutes per day and define:

$$\phi_{D}(t) = t \bmod D. \tag{A7}$$

We define the historical sample set:

$$S^{(M)}_{t,i} = \{s \in [t - MD,\, t - 1] \cap T : \phi_{D}(s) = \phi_{D}(t),\ (s, i) \in \Omega\}, \tag{A8}$$

where $M$ is the trailing window size (here $M = 90$ days). If $S^{(M)}_{t,i} \neq \emptyset$, the imputed value is:

$$\tilde{x}^{B}_{t,i} = \frac{1}{|S^{(M)}_{t,i}|} \sum_{s \in S^{(M)}_{t,i}} x_{s,i}. \tag{A9}$$

The corresponding domain is:

$$\Omega_{B} = \{(t, i) \in T \times V : S^{(M)}_{t,i} \neq \emptyset\}. \tag{A10}$$
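Stage B can be sketched under the same assumptions as Stage A: a list indexed from t = 0 and `None` at indices outside Ω. In addition, t = 0 is taken to fall at midnight, so that t mod D gives the minute of day. The candidates s = t − kD for k = 1, …, M are then exactly the earlier minutes that share the minute of day of t within the trailing window of (A8).

```python
DAY_MINUTES = 24 * 60  # D in (A7)


def stage_b(series, unresolved, day=DAY_MINUTES, window_days=90):
    """Minute-of-day climatology (A8)-(A9) for one node.

    series: list indexed by minute (t = 0 assumed at midnight); None marks indices outside Omega.
    unresolved: minutes not covered by Omega or Stage A.
    Returns {t: mean of validated values at the same minute of day in the last window_days days}.
    """
    filled = {}
    for t in unresolved:
        lower = max(t - window_days * day, 0)         # window start, clipped to the start of T
        candidates = range(t - day, lower - 1, -day)  # s = t - k*D, k = 1, ..., M
        samples = [series[s] for s in candidates if series[s] is not None]
        if samples:                                   # S_{t,i}^{(M)} is non-empty
            filled[t] = sum(samples) / len(samples)
    return filled


# Toy "day" of 3 minutes: t=6 averages t=3 and t=0; t=7 has no validated history and stays open.
print(stage_b([10.0, None, None, 20.0, None, None, None, None, None], [6, 7], day=3))  # {6: 15.0}
```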

Appendix A.4. Stage C: Continuity Completion

Let the intermediate value after Stages A and B be:

$$z_{t,i} = \begin{cases} x_{t,i}, & (t, i) \in \Omega, \\ \tilde{x}^{A}_{t,i}, & (t, i) \in \Omega_{A} \setminus \Omega, \\ \tilde{x}^{B}_{t,i}, & (t, i) \in \Omega_{B} \setminus (\Omega \cup \Omega_{A}), \end{cases} \tag{A11}$$

and define the resolved-index set:

$$S_{i} = \{s \in T : (s, i) \in \Omega \cup \Omega_{A} \cup \Omega_{B}\}. \tag{A12}$$

For unresolved indices:

$$\begin{aligned} N^{-}(t, i) &= \max\{s \in S_{i} : s \leq t\}, \\ N^{+}(t, i) &= \min\{s \in S_{i} : s \geq t\}. \end{aligned} \tag{A13}$$

The final completion step is:

$$\tilde{x}^{C}_{t,i} = \begin{cases} z_{N^{-}(t,i),\, i}, & S_{i} \cap (-\infty, t] \neq \emptyset, \\ z_{N^{+}(t,i),\, i}, & S_{i} \cap (-\infty, t] = \emptyset \text{ and } S_{i} \cap [t, \infty) \neq \emptyset. \end{cases} \tag{A14}$$

The complete imputed signal is defined as:

$$\tilde{x}_{t,i} = \begin{cases} z_{t,i}, & (t, i) \in \Omega \cup \Omega_{A} \cup \Omega_{B}, \\ \tilde{x}^{C}_{t,i}, & \text{otherwise}. \end{cases} \tag{A15}$$

This procedure yields a fully populated time series $\tilde{x}_{t,i}$ over $T \times V$, provided that every node has at least one resolved index ($S_{i} \neq \emptyset$ for all $i \in V$); otherwise neither case of (A14) applies.
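Stage C is a last-value-carried-forward fill, with a backward fill for leading gaps. The minimal sketch below handles one node and assumes that `z` is the list produced by (A11), with `None` at unresolved indices.

```python
def stage_c(z):
    """Continuity completion (A13)-(A15) for one node.

    z: list indexed by minute holding Stage A/B results, None where still unresolved.
    Gaps take the nearest resolved value at or before them (first case of A14);
    leading gaps with no earlier value take the nearest later one (second case).
    """
    resolved = [s for s, v in enumerate(z) if v is not None]
    if not resolved:
        raise ValueError("node has no resolved index, so (A14) is undefined")
    out = list(z)
    last = None
    for t, v in enumerate(z):
        if v is not None:
            last = v                 # z_{N^-(t,i), i} for the following gaps
        elif last is not None:
            out[t] = last
        else:
            out[t] = z[resolved[0]]  # z_{N^+(t,i), i}: first resolved index after t
    return out


print(stage_c([None, 5.0, None, None, 7.0, None]))  # [5.0, 5.0, 5.0, 5.0, 7.0, 7.0]
```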

Appendix B. Dataset Construction Details

This appendix provides the detailed mathematical formulation of the data aggregation, dataset partitioning, standardization, and window construction procedures described in Section 3.3.

Appendix B.1. Temporal Aggregation

Let $\tilde{x}_{t,i}$ denote the imputed minute-level signal for all $(t, i) \in T \times V$, with $T = \{t_{0}, t_{0} + 1, \dots, t_{1}\}$.

We define the set of aggregation bins:

$$\begin{gathered} U = \{u_{0}, u_{1}, \dots, u_{n-1}\}, \\ u_{k} = t_{0} + \Delta k, \quad k = 0, 1, \dots, n - 1, \\ n = \left\lfloor \frac{t_{1} - t_{0} + 1}{\Delta} \right\rfloor, \end{gathered} \tag{A16}$$

where $\Delta$ denotes the aggregation interval (e.g., $\Delta = 5$ min). Bins are non-overlapping and anchored at $t_{0}$. Any trailing partial bin is discarded. Each bin $u \in U$ corresponds to:

$$B(u) = \{u, u + 1, \dots, u + \Delta - 1\}. \tag{A17}$$

The aggregated value for node $i \in V$ is:

$$\bar{x}_{u,i} = \frac{1}{\Delta} \sum_{m=0}^{\Delta - 1} \tilde{x}_{u+m,\, i}. \tag{A18}$$

The corresponding network snapshot is:

$$\bar{X}_{u} = [\bar{x}_{u,1}, \dots, \bar{x}_{u,N}]^{\top} \in \mathbb{R}^{N}. \tag{A19}$$
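The binning in (A16) to (A18) reduces to non-overlapping block means. The sketch below covers one node and assumes that the imputed series is a list whose first element corresponds to $t_{0}$.

```python
def aggregate(series, delta=5):
    """Non-overlapping Δ-minute means anchored at t0, as in (A16)-(A18).

    series: fully imputed minute-level values of one node, series[0] at t0.
    The trailing partial bin is dropped because n = floor(len(series) / delta).
    """
    n = len(series) // delta
    return [sum(series[k * delta:(k + 1) * delta]) / delta for k in range(n)]


print(aggregate([float(v) for v in range(12)], delta=5))  # [2.0, 7.0]; minutes 10 and 11 are dropped
```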

Appendix B.2. Chronological Dataset Partitioning

Let $n = |U|$. The dataset is partitioned into training, validation, and test subsets using a chronological split:

$$n_{tr} = \lfloor 0.7 n \rfloor, \quad n_{val} = \lfloor 0.2 n \rfloor, \quad n_{te} = n - n_{tr} - n_{val}. \tag{A20}$$

The corresponding index sets are:

$$\begin{aligned} U_{train} &= \{u_{0}, \dots, u_{n_{tr} - 1}\}, \\ U_{val} &= \{u_{n_{tr}}, \dots, u_{n_{tr} + n_{val} - 1}\}, \\ U_{test} &= \{u_{n_{tr} + n_{val}}, \dots, u_{n - 1}\}. \end{aligned} \tag{A21}$$

By construction, the three subsets are contiguous blocks taken in time order, so every training bin precedes every validation bin, which in turn precedes every test bin. The subsets are pairwise disjoint and together cover all bins:

$$U_{train} \cup U_{val} \cup U_{test} = U. \tag{A22}$$
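The split sizes in (A20) can be computed with integer arithmetic. This gives exact floors and avoids floating-point rounding of 0.7n and 0.2n. A minimal sketch:

```python
def chronological_split(bins):
    """Contiguous 70/20/remainder split of time-ordered bins following (A20)-(A21)."""
    n = len(bins)
    n_tr = (7 * n) // 10   # floor(0.7 n), computed exactly with integers
    n_val = (2 * n) // 10  # floor(0.2 n)
    return bins[:n_tr], bins[n_tr:n_tr + n_val], bins[n_tr + n_val:]


train, val, test = chronological_split(list(range(1005)))
print(len(train), len(val), len(test))  # 703 201 101
```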

Appendix B.3. Global Standardization

A global standardization is applied using statistics computed from the training set:

$$\begin{aligned} \mu_{glob} &= \frac{1}{|U_{train}| N} \sum_{u \in U_{train}} \sum_{i \in V} \bar{x}_{u,i}, \\ \sigma_{glob} &= \sqrt{\frac{1}{|U_{train}| N} \sum_{u \in U_{train}} \sum_{i \in V} (\bar{x}_{u,i} - \mu_{glob})^{2} + \varepsilon}, \end{aligned} \tag{A23}$$

where $\varepsilon$ is a small constant for numerical stability. The standardized values are:

$$\hat{x}_{u,i} = \frac{\bar{x}_{u,i} - \mu_{glob}}{\sigma_{glob}}. \tag{A24}$$

Stacking across nodes yields the standardized snapshot:

$$\hat{X}_{u} = [\hat{x}_{u,1}, \dots, \hat{x}_{u,N}]^{\top} \in \mathbb{R}^{N}. \tag{A25}$$

In vector form, with element-wise operations:

$$\hat{X}_{u} = \frac{\bar{X}_{u} - \mu_{glob}}{\sigma_{glob}}. \tag{A26}$$

The same parameters $(\mu_{glob}, \sigma_{glob})$, computed on the training split only, are applied to the validation and test splits and reused in production, which prevents data leakage. For reporting in original units (e.g., km/h for speed), the transformation is inverted:

$$\bar{X}_{u} = \sigma_{glob} \hat{X}_{u} + \mu_{glob}. \tag{A27}$$
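The standardization and its inverse can be sketched in plain Python, with each snapshot stored as a list of per-node values. The default `eps = 1e-8` is an assumed value for the unspecified constant ε. Only the training rows enter the statistics, and the resulting pair is then reused for every other split.

```python
import math


def fit_global_stats(train_rows, eps=1e-8):
    """mu_glob and sigma_glob from (A23), pooled over all training bins and nodes."""
    values = [x for row in train_rows for x in row]
    mu = sum(values) / len(values)
    var = sum((x - mu) ** 2 for x in values) / len(values)  # population variance
    return mu, math.sqrt(var + eps)


def standardize(rows, mu, sigma):
    """(A24)/(A26): element-wise (x - mu) / sigma."""
    return [[(x - mu) / sigma for x in row] for row in rows]


def destandardize(rows, mu, sigma):
    """(A27): map model outputs back to original units."""
    return [[sigma * x + mu for x in row] for row in rows]


train_rows = [[40.0, 60.0], [50.0, 50.0]]  # two bins, two nodes
val_rows = [[55.0, 45.0]]
mu, sigma = fit_global_stats(train_rows)    # statistics never see val_rows
val_std = standardize(val_rows, mu, sigma)
print(mu, destandardize(val_std, mu, sigma))
```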

Appendix B.4. Sliding Window Construction

Let $\bar{X}_{u} \in \mathbb{R}^{N \times F}$ denote the aggregated snapshot at bin $u \in U$, and let $\hat{X}_{u}$ be its standardized version. In our case, $F = 1$ (average speed, or average travel time in the South Holland case study). Given a look-back length $L \in \mathbb{N}$ and a prediction horizon $H \in \mathbb{N}$, we form for each valid center $u$ the history tensor:

$$X_{u} = \{\hat{X}_{u-L+1}, \dots, \hat{X}_{u}\} \in \mathbb{R}^{L \times N \times F}, \tag{A28}$$

and the corresponding target:

$$Y_{u+1:u+H} = \{\bar{X}_{u+1}, \dots, \bar{X}_{u+H}\} \in \mathbb{R}^{H \times N \times F}. \tag{A29}$$

For each dataset split, valid window centers are defined as:

$$C_{split} = \{u \in U_{split} : \{u - L + 1, \dots, u + H\} \subseteq U_{split}\}. \tag{A30}$$

We build the supervised dataset with a rolling window (stride $s = 1$):

$$D_{split} = \{(X_{u}, Y_{u+1:u+H}) : u \in C_{split}\}. \tag{A31}$$
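Within one split, (A28) to (A31) produce a stride-1 sequence of input and target pairs. The sketch below uses list positions as bin indices. Following (A28) and (A29) as written, it keeps the inputs standardized and the targets in aggregated units. The function name is illustrative.

```python
def make_windows(split_std, split_raw, look_back, horizon):
    """Stride-1 supervised pairs inside one split, following (A28)-(A31).

    split_std: standardized snapshots of the split in time order (inputs).
    split_raw: aggregated snapshots of the same bins (targets, as in (A29)).
    Positions act as bin indices; only centers whose full window stays in the split are used (A30).
    """
    samples = []
    for u in range(look_back - 1, len(split_std) - horizon):
        x = split_std[u - look_back + 1:u + 1]  # L past snapshots ending at u
        y = split_raw[u + 1:u + 1 + horizon]    # H future snapshots
        samples.append((x, y))
    return samples


bins = list(range(10))  # stand-in snapshots; real ones are per-node vectors
pairs = make_windows(bins, bins, look_back=3, horizon=2)
print(len(pairs), pairs[0])  # 6 ([0, 1, 2], [3, 4])
```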

Appendix C. Adjacency Matrix Construction Details

This appendix provides the full mathematical formulation of the adjacency matrix construction described in Section 3.4.

Appendix C.1. Directed Shortest-Path Distances

Let $G = (V, E)$ denote the directed road network, where edges $e \in E$ are associated with positive weights $w : E \to \mathbb{R}_{>0}$ representing physical distance or travel cost. For each pair of nodes $i, j \in V$, the directed shortest-path distance is defined as:

$$d_{ij} = \min_{\pi \in \Pi(i \to j)} \sum_{e \in \pi} w(e), \qquad d_{ii} = 0, \tag{A32}$$

where $\Pi(i \to j)$ denotes the set of all directed paths from $i$ to $j$, with $d_{ij} = \infty$ when no such path exists. Distances are computed using Dijkstra’s algorithm and are generally asymmetric ($d_{ij} \neq d_{ji}$), reflecting the directed nature of the network.
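All-pairs directed distances follow from running Dijkstra's algorithm once per source node. The compact sketch below uses the standard-library `heapq` module and assumes that edges are given as a hypothetical dictionary `{(i, j): w}`. Unreachable pairs keep an infinite distance.

```python
import heapq
import math


def directed_distances(nodes, edges):
    """All-pairs directed shortest paths (A32), running Dijkstra from every source.

    edges: dict {(i, j): w} with w > 0 for a directed edge i -> j.
    Returns dist[i][j]; unreachable pairs stay at math.inf.
    """
    out_edges = {v: [] for v in nodes}
    for (i, j), w in edges.items():
        out_edges[i].append((j, w))
    dist = {}
    for src in nodes:
        d = {v: math.inf for v in nodes}
        d[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > d[u]:
                continue  # stale queue entry; a shorter path was already settled
            for v, w in out_edges[u]:
                if du + w < d[v]:
                    d[v] = du + w
                    heapq.heappush(heap, (d[v], v))
        dist[src] = d
    return dist


nodes = ["a", "b", "c"]
edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "a"): 4.0}
dist = directed_distances(nodes, edges)
print(dist["a"]["b"], dist["b"]["a"])  # 1.0 6.0 (asymmetric)
```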

Appendix C.2. Distance Kernel and Sparsification

Given the distance matrix $D = [d_{ij}]$, a similarity matrix $W = [W_{ij}]$ is constructed using a Gaussian kernel with compact support:

$$W_{ij} = \begin{cases} \exp(-d_{ij}^{2} / \gamma), & d_{ij} \leq \delta, \\ 0, & \text{otherwise}, \end{cases} \qquad \gamma > 0,\ \delta > 0. \tag{A33}$$

To further enforce sparsity and remove weak connections, thresholding is applied:

$$A^{(0)}_{ij} = W_{ij}\, \mathbb{1}\{W_{ij} \geq \tau\}, \qquad \tau > 0. \tag{A34}$$

Appendix C.3. Self-Loops and Column Normalization

Self-loops are incorporated to preserve node-specific information:

$$\tilde{A}_{ij} = A^{(0)}_{ij} + \gamma_{0}\, \mathbb{1}\{i = j\}, \qquad \gamma_{0} > 0. \tag{A35}$$

The adjacency matrix is then normalized column-wise to produce a diffusion-style operator:

$$A_{ij} = \frac{\tilde{A}_{ij}}{\max\left(\sum_{p=1}^{N} \tilde{A}_{pj},\ \delta_{0}\right)}, \qquad \delta_{0} \in (0, 10^{-6}]. \tag{A36}$$

The resulting adjacency matrix $A \in \mathbb{R}_{\geq 0}^{N \times N}$ is sparse and directed, preserving the asymmetry of the original road network. The column-normalized structure allows it to be interpreted as a diffusion operator, where each column represents the distribution of incoming influence to a node. For completeness, we define the set of incoming neighbors for node $j$ as:

$$N_{in}(j) = \{i \in V : A_{ij} > 0\}. \tag{A37}$$
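Steps (A33) to (A36) can be chained into one function. The sketch takes a dense distance matrix, with `math.inf` for unreachable pairs. The default values `gamma0 = 1.0` and `delta0 = 1e-6` are hypothetical, because the manuscript does not report the values used.

```python
import math


def build_adjacency(dist, gamma, delta, tau, gamma0=1.0, delta0=1e-6):
    """Gaussian kernel (A33), thresholding (A34), self-loops (A35), column normalization (A36).

    dist: N x N nested list of directed distances d_ij (math.inf if unreachable).
    Returns A as an N x N nested list whose columns sum to 1.
    """
    n = len(dist)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = dist[i][j]
            w = math.exp(-d * d / gamma) if d <= delta else 0.0  # compact-support kernel
            a[i][j] = w if w >= tau else 0.0                     # drop weak links
        a[i][i] += gamma0                                        # self-loop weight
    for j in range(n):
        col_sum = max(sum(a[p][j] for p in range(n)), delta0)    # guarded denominator
        for i in range(n):
            a[i][j] /= col_sum
    return a


inf = math.inf
dist = [[0.0, 1.0, inf],
        [inf, 0.0, 1.0],
        [1.0, inf, 0.0]]
A = build_adjacency(dist, gamma=1.0, delta=2.0, tau=0.1)
print([round(sum(A[p][j] for p in range(3)), 6) for j in range(3)])  # [1.0, 1.0, 1.0]
```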

Appendix D. Detailed Model Formulation

This appendix provides the full mathematical formulation of the model described in Section 3.5.

Appendix D.1. Input Tensor Construction

For a valid center $u \in C_{split}$, the single-sample input window is:

$$X_{u} = \{\hat{X}_{u-L+1}, \dots, \hat{X}_{u}\} \in \mathbb{R}^{L \times N \times F}. \tag{A38}$$

Each sample $b$ in a mini-batch may have its own center $u^{(b)} \in C_{split}$. Mini-batches are constructed by stacking samples:

$$X \in \mathbb{R}^{B \times L \times N \times F}. \tag{A39}$$

With indices $b = 1, \dots, B$ (batch), $t = 1, \dots, L$ (look-back), $i = 1, \dots, N$ (node), and $f = 1, \dots, F$ (feature), the elements are:

$$X_{b,t,i,f} = \hat{x}_{u^{(b)} - L + t,\, i,\, f}, \tag{A40}$$

i.e., the standardized value at node $i$, feature $f$, and bin $u^{(b)} - L + t$ for sample $b$.

Appendix D.2. Graph Representation

The model operates on a directed graph represented by a column-normalized adjacency matrix $A \in \mathbb{R}_{\geq 0}^{N \times N}$ with self-loops. For node $i$, the incoming and outgoing neighborhoods are defined as:

$$N_{in}(i) = \{p : A_{pi} > 0\}, \qquad N_{out}(i) = \{q : A_{iq} > 0\}. \tag{A41}$$

The weighted in-degree is:

$$d_{in}(i) = \sum_{p=1}^{N} A_{pi}. \tag{A42}$$

By construction, $\sum_{p} A_{pi} \approx 1$ for all $i$. For convenience in diffusion operations, we also define the transition matrix:

$$P = A^{\top}, \tag{A43}$$

which is row-stochastic because the columns of $A$ sum to (approximately) one. Incoming aggregation at node $i$ then corresponds to multiplying by $P$ on the left, $(P x)_{i} = \sum_{p} A_{pi} x_{p}$, a weighted average over the incoming neighbors of $i$. Self-loops ensure $N_{in}(i) \neq \emptyset$.
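A two-node example makes the role of P concrete. Because every column of A sums to one, every row of P = Aᵀ also sums to one. As a result, P x returns, for each node, a weighted average of the values at its incoming neighbors, including the node itself through its self-loop. The adjacency values below are made up for illustration.

```python
def transpose(a):
    """P = A^T for a nested-list matrix."""
    return [list(col) for col in zip(*a)]


def incoming_aggregate(a, x):
    """(P x)_i = sum_p A[p][i] * x[p], i.e. aggregation over the incoming neighbors of i."""
    p = transpose(a)
    return [sum(p_ik * x_k for p_ik, x_k in zip(row, x)) for row in p]


# Column-normalized A: node 0 receives half its weight from itself and half from node 1;
# node 1 only has its self-loop.
A = [[0.5, 0.0],
     [0.5, 1.0]]
print(incoming_aggregate(A, [10.0, 20.0]))  # [15.0, 20.0]
print(incoming_aggregate(A, [7.0, 7.0]))    # [7.0, 7.0]: rows of P sum to one
```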

Appendix D.2.1. Linear Encoder and Node Embeddings

Let $C$ denote the hidden dimension. Given the standardized input tensor $X \in \mathbb{R}^{B \times L \times N \times F}$, we first project the feature dimension to the latent space using a linear transformation with parameters:

$$W_{\mathrm{enc}} \in \mathbb{R}^{F \times C}, \qquad b_{\mathrm{enc}} \in \mathbb{R}^{C} \tag{A44}$$

For each $(b, t, i)$:

$$Z_{b,t,i,:} = X_{b,t,i,:} W_{\mathrm{enc}} + b_{\mathrm{enc}} \in \mathbb{R}^{C} \tag{A45}$$

Each node $i$ is assigned a learnable, time-invariant identity vector $E_{i,:} \in \mathbb{R}^{C}$; stacking these rows for $i = 1, \dots, N$ gives:

$$E \in \mathbb{R}^{N \times C} \tag{A46}$$

The embedding is injected by broadcasting over $(b, t)$:

$$H^{(0)}_{b,t,i,:} = Z_{b,t,i,:} + E_{i,:} \tag{A47}$$

Thus:

$$H^{(0)} \in \mathbb{R}^{B \times L \times N \times C} \tag{A48}$$
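A minimal NumPy sketch of (A45) and (A47) follows. The parameters are passed in as plain arrays (in the model they are learned), and the function name `encode` is hypothetical. It relies on the fact that a matrix product with a 4-D left operand acts on the last axis, so the per-$(b, t, i)$ projection is a single call.

```python
import numpy as np

def encode(X, W_enc, b_enc, E):
    """Linear feature encoder (A45) plus node-embedding injection (A47).

    X: (B, L, N, F); W_enc: (F, C); b_enc: (C,); E: (N, C). Returns H0: (B, L, N, C).
    """
    Z = X @ W_enc + b_enc            # projects the feature axis: (B, L, N, C)
    return Z + E[None, None, :, :]   # same embedding row for node i at every (b, t)
```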

Appendix D.2.2. Recurrent Temporal Modeling (GRU)

Temporal dependencies are modeled independently per node using a gated recurrent unit (GRU). Let $\Theta_{\mathrm{gru}} = \{W_{\cdot}, U_{\cdot}, b_{\cdot}\}$ denote the GRU parameters; they carry no node index, so the same GRU is shared across all nodes. For each node $i$ and batch element $b$, we define the input sequence:

$$x^{(i)}_{b,t} := H^{(0)}_{b,t,i,:}, \quad t = 1, \dots, L \tag{A49}$$

The hidden state evolves as:

$$h^{(i)}_{b,t} = \mathrm{GRUCell}\big(x^{(i)}_{b,t},\, h^{(i)}_{b,t-1};\, \Theta_{\mathrm{gru}}\big), \quad t = 1, \dots, L \tag{A50}$$

with initial condition:

$$h^{(i)}_{b,0} = 0 \in \mathbb{R}^{C} \tag{A51}$$

Stacking hidden states over time yields:

$$H^{(1)}_{b,t,i,:} = h^{(i)}_{b,t} \tag{A52}$$

Thus:

$$H^{(1)} \in \mathbb{R}^{B \times L \times N \times C} \tag{A53}$$
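The formulation only names a generic GRUCell, so the sketch below uses one common GRU variant (reset gate applied to the recurrent term, one bias per gate) as an assumption. What it is meant to show is the structure of (A49) to (A53): a single parameter set, processed node by node with no cross-node mixing, starting from a zero state. The helper names `gru_cell` and `run_gru` and the dictionary layout of the parameters are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, p):
    """One step of a common GRU variant (an assumption), row-vector convention.

    x: (..., C_in), h: (..., C). p holds W_r, W_z, W_n (C_in, C),
    U_r, U_z, U_n (C, C) and b_r, b_z, b_n (C,).
    """
    r = sigmoid(x @ p["W_r"] + h @ p["U_r"] + p["b_r"])        # reset gate
    z = sigmoid(x @ p["W_z"] + h @ p["U_z"] + p["b_z"])        # update gate
    n = np.tanh(x @ p["W_n"] + r * (h @ p["U_n"]) + p["b_n"])  # candidate state
    return (1.0 - z) * n + z * h

def run_gru(H0, p):
    """Apply one shared GRU to every (b, i) sequence. H0: (B, L, N, C_in) -> (B, L, N, C)."""
    B, L, N, _ = H0.shape
    C = p["U_r"].shape[0]
    h = np.zeros((B, N, C))            # h_{b,0}^{(i)} = 0, as in (A51)
    out = np.empty((B, L, N, C))
    for t in range(L):
        h = gru_cell(H0[:, t], h, p)   # all nodes in parallel, same parameters
        out[:, t] = h                  # H^{(1)}_{b,t,i,:} = h_{b,t}^{(i)}, as in (A52)
    return out
```

Because every operation acts row-wise on the node axis, a node's hidden trajectory depends only on that node's own input sequence, which is what "independently per node" means here.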

Appendix D.2.3. Gaussian Temporal Smoothing

To reduce high-frequency noise, a fixed Gaussian filter is applied along the temporal axis. Let the kernel size be $K = 2M + 1$ and the bandwidth $\sigma > 0$. The normalized coefficients are:

$$g_{m} = \frac{\exp\!\left(-\frac{m^{2}}{2\sigma^{2}}\right)}{\sum_{r=-M}^{M} \exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right)}, \quad m = -M, \dots, M \tag{A54}$$

so that $\sum_{m=-M}^{M} g_{m} = 1$. For each batch element $b$, node $i$, channel $c$, and time $t = 1, \dots, L$, we set:

$$H^{(2)}_{b,t,i,c} = \sum_{m=-M}^{M} g_{m}\, H^{(1)}_{b,t+m,i,c} \tag{A55}$$

with “same-length” padding along $t$ to preserve length $L$. Equivalently, this is a depthwise 1D convolution along the time axis with $C$ groups (one per channel), applied independently for every node $i$; since $g_{m} = g_{-m}$, the correlation in (A55) and the corresponding convolution coincide. The smoothed representation has the same shape:

$$H^{(2)} \in \mathbb{R}^{B \times L \times N \times C} \tag{A56}$$

Intuitively, this attenuates high-frequency temporal noise while preserving level and trend before temporal attention.
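A small NumPy sketch of (A54) and (A55) is given below. The text does not state how the “same-length” boundary is padded, so zero padding is assumed here; replicate padding would be an equally plausible choice. The function names are hypothetical.

```python
import numpy as np

def gaussian_coeffs(M, sigma):
    """Normalized Gaussian weights g_{-M}, ..., g_{M}, as in (A54)."""
    m = np.arange(-M, M + 1)
    g = np.exp(-(m ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def gaussian_smooth(H1, M, sigma):
    """Temporal smoothing (A55) of H1 with shape (B, L, N, C), same output shape.

    Assumption: zero padding of M bins on each side of the time axis.
    """
    g = gaussian_coeffs(M, sigma)
    L = H1.shape[1]
    padded = np.pad(H1, ((0, 0), (M, M), (0, 0), (0, 0)))  # zeros before/after along t
    out = np.zeros_like(H1)
    for k, m in enumerate(range(-M, M + 1)):
        out += g[k] * padded[:, M + m : M + m + L]  # shifted copy: H1[:, t + m]
    return out
```

With zero padding, a constant signal is reproduced exactly only in the interior; the first and last $M$ bins are pulled towards zero, so level is preserved away from the window edges. In a deep-learning framework the same operation is typically a fixed-weight grouped 1D convolution (for example, PyTorch `nn.Conv1d` with `groups=C`) applied to the sequence of each node.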

Appendix D.2.4. Temporal Attention

Temporal attention aggregates the sequence into a single representation per node. Let $W_{\mathrm{att}} \in \mathbb{R}^{C \times C}$, $b_{\mathrm{att}} \in \mathbb{R}^{C}$, and $v_{\mathrm{att}} \in \mathbb{R}^{C}$ be learnable parameters. For each $(b, t, i)$, the attention score is:

$$s_{b,t,i} = v_{\mathrm{att}}^{\top} \tanh\!\big(W_{\mathrm{att}} H^{(2)}_{b,t,i,:} + b_{\mathrm{att}}\big) \tag{A57}$$

For each $(b, i)$, we define the stabilized scores

$$\tilde{s}_{b,t,i} = s_{b,t,i} - \max_{\tau \in \{1, \dots, L\}} s_{b,\tau,i}$$

and normalize across time:

$$\alpha_{b,i,t} = \frac{\exp(\tilde{s}_{b,t,i})}{\sum_{\tau=1}^{L} \exp(\tilde{s}_{b,\tau,i})}, \qquad \sum_{t=1}^{L} \alpha_{b,i,t} = 1 \tag{A58}$$

Subtracting the maximum leaves the softmax unchanged but prevents overflow in the exponential.

The attention-weighted context per node is:

$$C_{b,i,:} = \sum_{t=1}^{L} \alpha_{b,i,t} H^{(2)}_{b,t,i,:} \in \mathbb{R}^{C} \tag{A59}$$

Stacking over batch and nodes yields:

$$H^{(3)} = C \in \mathbb{R}^{B \times N \times C} \tag{A60}$$
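The scoring, stabilized softmax, and pooling steps in (A57) to (A60) can be sketched in NumPy as follows. Since (A57) uses a column-vector convention, $W_{\mathrm{att}} h$ is computed as `h @ W_att.T` on row vectors. Note that the returned weights are laid out as (B, L, N) rather than the $(b, i, t)$ index order of (A58). The function name is hypothetical.

```python
import numpy as np

def temporal_attention(H2, W_att, b_att, v_att):
    """Additive temporal attention, (A57) to (A60). H2: (B, L, N, C) -> context (B, N, C).

    Returns (context, alpha) with alpha of shape (B, L, N), normalized over the L axis.
    """
    s = np.tanh(H2 @ W_att.T + b_att) @ v_att   # scores s_{b,t,i}: (B, L, N)
    s_tilde = s - s.max(axis=1, keepdims=True)  # subtract the max over time
    w = np.exp(s_tilde)
    alpha = w / w.sum(axis=1, keepdims=True)    # softmax over t for each (b, i)
    context = (alpha[..., None] * H2).sum(axis=1)  # sum_t alpha * H2: (B, N, C)
    return context, alpha
```

A useful sanity check: with $v_{\mathrm{att}} = 0$ all scores tie, the weights become uniform, and the context reduces to the plain temporal mean of $H^{(2)}$.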

Appendix D.2.5. Diffusion Convolution

Let $H^{(3)} \in \mathbb{R}^{B \times N \times C}$ denote the node representations obtained from the temporal module. Spatial aggregation is performed using a diffusion process over the directed graph. We define the propagation operator:

$$P = A^{\top} \in \mathbb{R}^{N \times N} \tag{A61}$$

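Thus, $(PX)_{j} = \sum_{i} A_{ij} X_{i}$ aggregates information from the incoming neighbors of node $j$. Given learnable weights $\{W_{k} \in \mathbb{R}^{C \times C}\}_{k=0}^{K}$ and bias $b_{\mathrm{diff}} \in \mathbb{R}^{C}$, the $K$-hop diffusion convolution is as follows (here $K$ is the number of diffusion hops, not the Gaussian kernel size of Appendix D.2.3):

$$Z^{(\mathrm{diff})}_{b} = \sum_{k=0}^{K} \big(P^{k} H^{(3)}_{b}\big) W_{k} + \mathbf{1}\, b_{\mathrm{diff}}^{\top}, \quad b = 1, \dots, B \tag{A62}$$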

where $H^{(3)}_{b} \in \mathbb{R}^{N \times C}$ and $\mathbf{1} \in \mathbb{R}^{N}$ is the all-ones vector, so that $\mathbf{1}\, b_{\mathrm{diff}}^{\top} \in \mathbb{R}^{N \times C}$ broadcasts the bias across nodes. Note that $P^{0} = I$ retains each node’s own signal, while larger $k$ collect information from $k$-hop incoming neighborhoods. We apply a pointwise activation:

$$H^{(4)} = \mathrm{ReLU}\big(Z^{(\mathrm{diff})}\big) \in \mathbb{R}^{B \times N \times C} \tag{A63}$$
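The diffusion convolution (A61) to (A63) can be sketched in NumPy as below. It builds $P^{k} H^{(3)}_{b}$ iteratively, one hop at a time, instead of forming matrix powers, which keeps the cost at one sparse-friendly product per hop. The function name and the list-of-matrices layout for $W_{0}, \dots, W_{K}$ are assumptions for illustration.

```python
import numpy as np

def diffusion_conv(H3, A, W, b_diff):
    """K-hop diffusion convolution followed by ReLU, (A61) to (A63).

    H3: (B, N, C); A: (N, N) column-normalized; W: list of K+1 arrays (C, C); b_diff: (C,).
    """
    P = A.T                  # propagation operator (A61)
    Z = np.zeros_like(H3)
    Pk_H = H3                # P^0 H = H
    for k, W_k in enumerate(W):
        if k > 0:
            Pk_H = P @ Pk_H  # one more hop of incoming aggregation, for every b
        Z += Pk_H @ W_k
    Z += b_diff              # equivalent to adding 1 b_diff^T for each b
    return np.maximum(Z, 0.0)  # ReLU
```

With $K = 0$, $W_{0} = I$, and zero bias the layer reduces to $\mathrm{ReLU}(H^{(3)})$, which is a convenient check that the $P^{0} = I$ term carries each node's own signal.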

Appendix D.2.6. Graph Attention Layer

Given $H^{(4)} \in \mathbb{R}^{B \times N \times C}$ and the directed adjacency $A$, we apply a multi-head graph attention layer that aggregates incoming neighbors at each node $j$. Let the number of heads be $H_{\mathrm{att}} \in \mathbb{N}$. For head $h = 1, \dots, H_{\mathrm{att}}$, let $W^{(h)} \in \mathbb{R}^{C \times C}$ and $a^{(h)} \in \mathbb{R}^{2C}$ be learnable, and let $\mathrm{LReLU}_{\alpha}$ denote LeakyReLU with negative slope $\alpha$ (distinct from the attention coefficients below). For batch element $b$ and node $i$:

$$\begin{aligned} q^{(h)}_{b,i} &= H^{(4)}_{b,i,:} W^{(h)} \in \mathbb{R}^{C}, \\ e^{(h)}_{b,i \to j} &= \mathrm{LReLU}_{\alpha}\big(a^{(h)\top} [\, q^{(h)}_{b,i} \,\|\, q^{(h)}_{b,j} \,]\big), \quad i \in \mathcal{N}_{\mathrm{in}}(j) = \{\, p : A_{pj} > 0 \,\} \end{aligned} \tag{A64}$$

where $\|$ denotes concatenation.

Attention coefficients are a masked softmax over the incoming neighborhood:

$$\alpha^{(h)}_{b,i \to j} = \frac{\exp\big(e^{(h)}_{b,i \to j}\big)}{\sum_{p \in \mathcal{N}_{\mathrm{in}}(j)} \exp\big(e^{(h)}_{b,p \to j}\big)}, \quad i \in \mathcal{N}_{\mathrm{in}}(j) \tag{A65}$$

Each head aggregates projected neighbor features:

$$o^{(h)}_{b,j,:} = \sum_{i \in \mathcal{N}_{\mathrm{in}}(j)} \alpha^{(h)}_{b,i \to j}\, q^{(h)}_{b,i} \in \mathbb{R}^{C} \tag{A66}$$

We average across heads, rather than concatenating, to keep the hidden size $C$:

$$H^{(5)}_{b,j,:} = \frac{1}{H_{\mathrm{att}}} \sum_{h=1}^{H_{\mathrm{att}}} o^{(h)}_{b,j,:} \in \mathbb{R}^{C}, \qquad H^{(5)} \in \mathbb{R}^{B \times N \times C} \tag{A67}$$
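The masked multi-head attention of (A64) to (A67) can be sketched in NumPy as follows. The split of $a^{(h)}$ into a source half and a target half is just the concatenation in (A64) written out; the LeakyReLU slope of 0.2 is an assumed value (the text does not report $\alpha$), and the function names are hypothetical. Non-edges are masked with $-\infty$ before the softmax, and the self-loops guarantee every target node keeps at least one finite logit.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def graph_attention(H4, A, W_heads, a_heads, slope=0.2):
    """Masked multi-head attention over incoming neighbors, mean over heads, (A64) to (A67).

    H4: (B, N, C); A: (N, N) with A[i, j] > 0 meaning i is an incoming neighbor of j.
    W_heads: list of (C, C) arrays; a_heads: list of (2C,) arrays. slope=0.2 is assumed.
    """
    C = H4.shape[-1]
    mask = A > 0                                   # mask[i, j]: edge i -> j
    out = np.zeros_like(H4)
    for W_h, a_h in zip(W_heads, a_heads):
        q = H4 @ W_h                               # projected features q_{b,i}: (B, N, C)
        src = q @ a_h[:C]                          # source half of a^T [q_i || q_j]: (B, N)
        dst = q @ a_h[C:]                          # target half: (B, N)
        e = leaky_relu(src[:, :, None] + dst[:, None, :], slope)  # e[b, i, j] for i -> j
        e = np.where(mask[None], e, -np.inf)       # keep only i in N_in(j)
        e = e - e.max(axis=1, keepdims=True)       # stabilize per target node j
        w = np.exp(e)
        alpha = w / w.sum(axis=1, keepdims=True)   # softmax over sources i
        out += np.einsum("bij,bic->bjc", alpha, q)  # o_j = sum_i alpha_{i->j} q_i
    return out / len(W_heads)                      # average over heads
```

Setting $a^{(h)} = 0$ makes every logit equal, so each head reduces to a plain mean of $q^{(h)}_{b,i}$ over $\mathcal{N}_{\mathrm{in}}(j)$; this is a quick way to confirm that the mask follows the incoming (column) convention of $A$.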

Appendix D.2.7. Residual Connection from Node Embeddings

We inject a node-identity residual derived from the embeddings $E \in \mathbb{R}^{N \times C}$. A linear projection, trained jointly with the rest of the network, maps the embeddings into the current hidden space:

$$W_{\mathrm{res}} \in \mathbb{R}^{C \times C}, \qquad b_{\mathrm{res}} \in \mathbb{R}^{C}, \qquad R_{i,:} = E_{i,:} W_{\mathrm{res}} + b_{\mathrm{res}}, \quad i = 1, \dots, N \tag{A68}$$

Stacking across nodes gives $R \in \mathbb{R}^{N \times C}$. Broadcasting $R$ across the batch, we add it to the spatially attended features:

$$H^{(6)}_{b,i,:} = H^{(5)}_{b,i,:} + R_{i,:}, \qquad H^{(6)} \in \mathbb{R}^{B \times N \times C} \tag{A69}$$

This residual preserves a learnable, time-invariant node-identity pathway alongside the temporal-spatial processing.
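In array form, (A68) and (A69) amount to one projection of the embedding table followed by a broadcast add, as in the short NumPy sketch below (the function name is hypothetical). Because $R$ depends only on $E$ and the projection parameters, the added term is identical for every sample in a batch.

```python
import numpy as np

def add_node_residual(H5, E, W_res, b_res):
    """Node-identity residual, (A68) and (A69).

    H5: (B, N, C); E: (N, C); W_res: (C, C); b_res: (C,). Returns H6: (B, N, C).
    """
    R = E @ W_res + b_res  # (N, C): depends on the node only, not on the input window
    return H5 + R[None]    # broadcast R across the batch dimension
```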

Appendix D.2.8. Linear Decoder

The decoder maps each node’s hidden state to $H$ future steps and $F$ features per node. Let:

$$W_{\mathrm{dec}} \in \mathbb{R}^{C \times (HF)}, \qquad b_{\mathrm{dec}} \in \mathbb{R}^{HF} \tag{A70}$$

For each $(b, i)$,

$$z^{(\mathrm{dec})}_{b,i,:} = H^{(6)}_{b,i,:} W_{\mathrm{dec}} + b_{\mathrm{dec}} \in \mathbb{R}^{HF} \tag{A71}$$

where $b_{\mathrm{dec}}$ is broadcast across the $B \times N$ pairs. We interpret the $(HF)$-vector as an $(H, F)$ matrix and stack across nodes:

$$\hat{Y}_{b,1:H,i,1:F} = \mathrm{reshape}\big(z^{(\mathrm{dec})}_{b,i,:}, (H, F)\big), \qquad \hat{Y} \in \mathbb{R}^{B \times H \times N \times F} \tag{A72}$$

No output activation is applied, so the decoder is linear. Since the targets $Y_{u+1:u+H}$ were defined in original units, $\hat{Y}$ is directly comparable to the ground truth (e.g., km/h for speed). Because each node’s output vector is reshaped into an $(H, F)$ matrix, the same decoder naturally supports both multi-horizon ($H > 1$) and multi-feature ($F > 1$) settings.
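The decoder (A70) to (A72) can be sketched in NumPy as follows. The formulation does not state the memory order of the reshape, so row-major (C-order) layout is assumed, i.e., entry $hF + f$ of $z^{(\mathrm{dec})}_{b,i,:}$ becomes $\hat{Y}_{b,h,i,f}$ with 0-based $h, f$; the function name is hypothetical.

```python
import numpy as np

def decode(H6, W_dec, b_dec, horizon, n_feat):
    """Linear decoder, (A70) to (A72). H6: (B, N, C) -> Y_hat: (B, horizon, N, n_feat).

    Assumption: row-major reshape of each node's (horizon * n_feat)-vector.
    """
    B, N, _ = H6.shape
    z = H6 @ W_dec + b_dec                # (B, N, horizon * n_feat), no output activation
    Y = z.reshape(B, N, horizon, n_feat)  # read each node's vector as an (H, F) matrix
    return Y.transpose(0, 2, 1, 3)        # reorder axes to (B, H, N, F)
```

The final transpose only moves the horizon axis in front of the node axis to match the $B \times H \times N \times F$ layout of $\hat{Y}$; no values are mixed across nodes.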

References

1. Zhang, N.; Wang, F.Y.; Zhu, F.; Zhao, D.; Tang, S. DynaCAS: Computational Experiments and Decision Support for ITS. IEEE Intell. Syst. 2008, 23, 19–23.
2. Yin, X.; Wu, G.; Wei, J.; Shen, Y.; Qi, H.; Yin, B. Deep Learning on Traffic Prediction: Methods, Analysis, and Future Directions. IEEE Trans. Intell. Transp. Syst. 2022, 23, 4927–4943.
3. Jiang, W.; Luo, J.; He, M.; Gu, W. Graph Neural Network for Traffic Forecasting: The Research Progress. ISPRS Int. J. Geo-Inf. 2023, 12, 100.
4. Philip Chen, C.; Zhang, C.Y. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Inf. Sci. 2014, 275, 314–347.
5. Wang, K.; Ma, C.; Qiao, Y.; Lu, X.; Hao, W.; Dong, S. A hybrid deep learning model with 1DCNN-LSTM-Attention networks for short-term traffic flow prediction. Phys. A Stat. Mech. Its Appl. 2021, 583, 126293.
6. Liu, J.; Liu, W.; Li, X.; Chen, A.; Schonfeld, P.; Du, B. Hybrid Ensemble Learning Model Combining BERT and CNN for Predicting Urban Rail Transit Accident Consequences. IEEE Trans. Intell. Transp. Syst. 2025, 26, 12727–12739.
7. Yu, B.; Yin, H.; Zhu, Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18, Stockholm, Sweden, 13–19 July 2018; International Joint Conferences on Artificial Intelligence: Stockholm, Sweden, 2018; pp. 3634–3640.
8. Mitsakis, E.; Tzanis, D.; Petkani, V.; Dolianitis, A.; Mintsis, E.; Kotsi, A.; Psonis, V. A Data-Driven Decision Support System for Multimodal Network and Traffic Management–SYNCHROMODE. In Climate Crisis and Resilient Transportation Systems; Nathanail, E.G., Gavanas, N., Adamos, E., Eds.; Springer: Cham, Switzerland, 2025; pp. 94–106.
9. Chen, C.; Hu, J.; Meng, Q.; Zhang, Y. Short-time traffic flow prediction with ARIMA-GARCH model. In 2011 IEEE Intelligent Vehicles Symposium (IV); IEEE: New York, NY, USA, 2011; pp. 607–612.
10. Alghamdi, T.; Elgazzar, K.; Bayoumi, M.; Sharaf, T.; Shah, S. Forecasting Traffic Congestion Using ARIMA Modeling. In 2019 15th International Wireless Communications and Mobile Computing Conference (IWCMC); IEEE: New York, NY, USA, 2019; pp. 1227–1232.
11. Xu, D.W.; Wang, Y.D.; Jia, L.M.; Qin, Y.; Dong, H.H. Real-time Road Traffic State Prediction Based on ARIMA and Kalman Filter. Front. Inf. Technol. Electron. Eng. 2017, 18, 287–302.
12. Williams, B. Multivariate Vehicular Traffic Flow Prediction: Evaluation of ARIMAX Modeling. Transp. Res. Rec. 2001, 1776, 194–200.
13. Williams, B.; Hoel, L. Modeling and Forecasting Vehicular Traffic Flow as a Seasonal ARIMA Process: Theoretical Basis and Empirical Results. J. Transp. Eng. 2003, 129, 664–672.
14. Ramakrishnan, N.; Soni, T. Network Traffic Prediction Using Recurrent Neural Networks. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA); IEEE: New York, NY, USA, 2018; pp. 187–193.
15. Poonia, P.; Jain, V.K. Short-Term Traffic Flow Prediction: Using LSTM. In 2020 International Conference on Emerging Trends in Communication, Control and Computing (ICONC3); IEEE: New York, NY, USA, 2020; pp. 1–4.
16. Shao, H.; Soong, B.H. Traffic flow prediction with Long Short-Term Memory Networks (LSTMs). In 2016 IEEE Region 10 Conference (TENCON); IEEE: New York, NY, USA, 2016; pp. 2986–2989.
17. Cao, M.; Li, V.O.K.; Chan, V.W.S. A CNN-LSTM Model for Traffic Speed Prediction. In 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring); IEEE: New York, NY, USA, 2020; pp. 1–5.
18. Zhao, Z.; Li, Z.; Li, F.; Liu, Y. CNN-LSTM Based Traffic Prediction Using Spatial-temporal Features. J. Phys. Conf. Ser. 2021, 2037, 012065.
19. Topilin, I.; Jiang, J.; Feofilova, A.; Beskopylny, N. Traffic Flow Prediction via a Hybrid CPO-CNN-LSTM-Attention Architecture. Smart Cities 2025, 8, 148.
20. Biju, R.; Goparaju, S.U.; Gangadharan, D.; Mandal, B. Grid LSTM based Attention Modelling for Traffic Flow Prediction. In 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring); IEEE: New York, NY, USA, 2024; pp. 1–7.
21. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3848–3858.
22. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, Macao, China, 10–16 August 2019; International Joint Conferences on Artificial Intelligence: Macao, China, 2019; pp. 1907–1913.
23. Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6–12 December 2020; Available online: https://dl.acm.org/doi/10.5555/3495724.3497218 (accessed on 14 April 2026).
24. Bao, Y.; Huang, J.; Shen, Q.; Cao, Y.; Ding, W.; Shi, Z.; Shi, Q. Spatial–Temporal Complex Graph Convolution Network for Traffic Flow Prediction. Eng. Appl. Artif. Intell. 2023, 121, 106044.
25. Lin, C.Y.; Su, H.T.; Tung, S.L.; Hsu, W.H. Multivariate and Propagation Graph Attention Network for Spatial-Temporal Prediction with Outdoor Cellular Traffic. In CIKM ’21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2021; pp. 3248–3252.
26. Xiao, L.; Chen, H. Spatio-temporal Transformer Graph Network for Traffic Flow Forecasting. In 2024 3rd International Joint Conference on Information and Communication Engineering (JCICE); IEEE: New York, NY, USA, 2024; pp. 224–228.
27. Xin, L.; Yong-sheng, Q.; Zeng, J.; Yang, M.; Zhang, F. Dynamic Graph Convolutional Recurrent Network With Temporal Self-Attention for Accurate Traffic Flow Prediction. IET Intell. Transp. Syst. 2025, 19, e70118.
28. Alkhammash, M. STMGCN: A Spatiotemporal Multi-Graph Convolutional Networks for Real-Time Intelligent Transportation Traffic Flow Prediction. IEEE Trans. Consum. Electron. 2026, 72, 1300–1311.
29. Zhang, Y.; Wang, Y.; Gao, S.; Raubal, M. Context-Aware Knowledge Graph Framework for Traffic Speed Forecasting Using Graph Neural Network. IEEE Trans. Intell. Transp. Syst. 2025, 26, 3885–3902.
30. Das, A.; Kong, W.; Sen, R.; Zhou, Y. A decoder-only foundation model for time-series forecasting. arXiv 2024, arXiv:2310.10688.
31. Satorras, V.G.; Rangapuram, S.S.; Januschowski, T. Multivariate Time Series Forecasting with Latent Graph Inference. arXiv 2022, arXiv:2203.03423.
32. Gao, J.; Ribeiro, B. On the Equivalence Between Temporal and Static Graph Representations for Observational Predictions. arXiv 2023, arXiv:2103.07016.

Figure 1. Overall workflow of the proposed context-aware STGCN framework, illustrating the pipeline from raw data preprocessing to real-time traffic forecasting with residual correction.

Figure 2. Proposed STGCN architecture. A temporal block encodes per-node input sequences to produce $H^{(3)}$. The spatial block applies diffusion convolution with the directed adjacency $A$ to obtain $H^{(4)}$, followed by a masked GAT over incoming neighbors to yield $H^{(5)}$. A residual projection from the node embeddings is added, and a linear decoder maps to the $H$-step forecasts $\hat{Y}$. Shapes shown in the boxes refer to tensor dimensions, while arrows indicate data flow.


Figure 3. Residual correction framework. The baseline STGCN produces initial predictions $\hat{Y}^{(1)}$, while a second STGCN learns residual errors $\hat{R}$. The final prediction is obtained by summation.


Figure 4. Modeled traffic network of Thessaloniki, Greece. Directed paths represent the predefined traffic paths used for average-speed forecasting, covering major urban corridors and representative routes.

Figure 5. Daytime time-series comparison (07:00–19:00) of observed speeds, baseline STGCN forecasts, and residual-corrected predictions for three representative traffic paths in Thessaloniki. The selected paths correspond to high-, medium-, and low-speed regimes, capturing heterogeneous traffic dynamics across the network during the most active operating period.

Figure 6. Morning peak-hour (08:00–10:00) time-series comparison for the same representative paths shown in Figure 5. The results highlight the behavior of baseline and residual-corrected predictions under rapidly changing traffic conditions.

Figure 7. Regional traffic network of the South Holland (Keukenhof) case study. The network consists of 1159 directed road paths representing major regional corridors surrounding the Keukenhof exhibition area.

Figure 8. Real-time travel time forecasting performance for a representative road segment in the South Holland (Keukenhof) network.

Table 1. Prediction accuracy and training time of benchmark models across temporal resolutions in the Thessaloniki case study.

Model	Resolution (minutes)	Training Time (seconds)	MAE (km/h)	MAPE (%)	RMSE (km/h)
STGCN (Proposed)	1	6592	0.98	2.92	1.74
	5	4432	1.45	5.42	2.47
	15	2565	1.83	6.33	2.69
	30	2378	2.05	7.11	3.12
TimesFM [30]	1	N/A *	1.20	3.42	2.27
	5	N/A *	1.87	6.27	2.84
	15	N/A *	2.16	7.61	3.08
	30	N/A *	2.35	8.18	3.57
FC-GNN [31]	1	25,580	1.03	3.09	2.07
	5	13,616	1.55	5.52	2.64
	15	5568	1.92	6.63	2.98
	30	5201	2.15	7.19	3.29
GRUGCN [32]	1	8219	1.09	3.30	2.17
	5	4956	1.61	5.60	2.69
	15	3807	2.12	7.57	3.07
	30	3562	2.28	8.14	3.52

* TimesFM is a pretrained time-series foundation model and does not require task-specific training; therefore, no training time is reported.

Table 2. Performance of the 5-min STGCN model in the Thessaloniki case study under offline evaluation, real-time operation, and real-time operation with residual correction.

Evaluation Setting	MAE (km/h)	MAPE (%)	RMSE (km/h)
Offline evaluation	1.45	5.42	2.47
Real-time operation (baseline)	1.68	5.83	2.84
Real-time operation (with residual correction)	0.41	1.45	1.04

Table 3. Training and real-time prediction performance for multi-horizon average travel time forecasting in the South Holland (Keukenhof) case study.

Setting	Horizon (Minutes)	MAE (Seconds)	MAPE (%)	RMSE (Seconds)
Training	15	0.10	2.92	0.24
	30	0.15	5.42	0.44
	45	0.20	7.11	0.57
Real-Time	15	0.12	3.95	0.32
	30	0.22	7.61	0.62
	45	0.25	8.63	0.71

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Karetsos, P.; Petkani, V.; Tzanis, D.; Mintsis, E.; Mitsakis, E. A Scalable Context-Aware STGCN Framework for Real-Time Traffic Forecasting with Residual Correction. Future Transp. 2026, 6, 111. https://doi.org/10.3390/futuretransp6030111

