TFGCRN: Temporal–Frequency Graph Convolutional Recurrent Network for Incomplete Traffic Forecasting

Jiazhan Hu; Tao Feng

doi:10.3390/math13244003

Urban and Data Science Lab, Graduate School of Advanced Science and Engineering, Hiroshima University, Hiroshima 739-8529, Japan

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(24), 4003; https://doi.org/10.3390/math13244003

This article belongs to the Special Issue Optimization and Modeling in Spatio-Temporal Data Mining Using Graph Neural Networks

Abstract

Traffic forecasting is a crucial component of intelligent transportation systems. Spatial–temporal graph neural networks (STGNNs) are the mainstream solution and are widely used in traffic forecasting because they model spatial–temporal dependencies effectively. However, sensor failures caused by factors such as bad weather often lead to incomplete traffic data, which severely hinders STGNNs from modeling spatial–temporal dependencies and consequently degrades forecasting performance. To achieve accurate forecasting from incomplete traffic data, this paper proposes a Temporal–Frequency Graph Convolutional Recurrent Network (TFGCRN) that embeds a Temporal–Frequency Graph Convolutional Network into gated recurrent units. During recursive modeling, TFGCRN leverages both global and local information to resist the adverse effects of missing values while generating more accurate spatial relationships, thereby achieving precise incomplete traffic forecasting. Experiments on four real-world datasets show that TFGCRN outperforms multiple baselines and adapts effectively to different missing rates. Compared with the state-of-the-art baseline, TFGCRN reduces forecasting error by 2–6%.

Keywords:

incomplete traffic forecasting; intelligent transportation system; spatial-temporal graph neural networks; algorithms

MSC:

37M10; 62M30; 62M45; 97R40

1. Introduction

With the acceleration of urbanization and the rapid increase in the number of motor vehicles, problems such as traffic congestion, frequent accidents, and resource waste have become core bottlenecks that hinder the efficient operation of cities [1,2]. Intelligent transportation systems (ITSs) are the key technological framework to solve these issues [3,4]. As an important part of the ITS, traffic time series forecasting is gradually becoming a crucial technology for addressing traffic congestion and improving travel efficiency [5]. Simply put, traffic time series forecasting involves analyzing traffic data ordered by time to uncover patterns in traffic flow, speed, and other metrics as they change over time, thereby making precise forecasts about future traffic conditions [6]. Using traffic flow and speed data as examples, traffic time series forecasting can accurately capture and predict road congestion states and future changes [7]. By analyzing these patterns, traffic management departments can proactively prepare, such as by deploying additional traffic officers on key routes during peak hours or optimizing signal light timing, thus effectively alleviating traffic pressure [8]. In conclusion, traffic forecasting plays an irreplaceable core role in the construction of ITS [9]. It provides scientific and accurate decision-making support for traffic planners, enabling more rational allocation of traffic resources, injecting strong momentum into the efficient operation of urban traffic, and becoming an indispensable smart solution for modern urban traffic management [10].

To achieve accurate traffic forecasting, various mathematical algorithms have been proposed [11]. The mainstream approaches can be broadly categorized into statistical models and artificial intelligence (AI) models [12,13,14]. Statistical models primarily construct predictive models based on vehicle dynamics and mathematical formulas [15,16]. Although they are highly interpretable, they cannot model non-linearity and ignore the spatial dependencies between different traffic time series, which limits their practical value [17]. AI models introduce non-linear modeling grounded in mathematical theory, which allows them to capture traffic time series along both the temporal and spatial dimensions and thereby outperform statistical models [18]. Currently, the spatial–temporal graph neural network (STGNN) [19,20,21], a mainstream AI model, combines the Graph Convolutional Network (GCN) with sequence models to fully exploit the spatial–temporal dependencies in traffic time series, and has thus found widespread application.

However, the effectiveness of STGNNs usually relies on the assumption of data completeness [22]. Specifically, graph convolution requires prior knowledge or node features to model spatial relationships, while sequence models rely on temporal context to analyze periodicity and local temporal details in the data [23]. In reality, the sensors used for data collection are often affected by factors such as bad weather or component failures, leading to a large number of missing values in the collected data (typically represented as zeros). As a result, existing models often have to mine incomplete traffic time series to predict future values [24]. In this case, along the temporal dimension, missing values disrupt periodicity and trends while introducing erroneous local details (e.g., abrupt changes or flat segments). Along the spatial dimension, missing values alter the patterns among different data streams, causing incorrect spatial relationships during similarity analysis. These phenomena make it difficult for STGNNs to accurately capture spatial–temporal dependencies, resulting in suboptimal results in incomplete traffic forecasting.

To address the adverse effects caused by missing values and to accurately capture spatiotemporal dependencies, we propose a novel Temporal–Frequency Graph Convolutional Recurrent Network (TFGCRN). The core components of TFGCRN include spatial–temporal embedding, the Temporal–Frequency Graph Convolutional Network (TFGCN), and gated recurrent unit (GRU). Specifically, the spatial–temporal embedding introduces additional global information to mitigate the disruption of both global and local information caused by missing values, which can map missing values to available high-dimensional representations, preventing the model from uncovering incorrect patterns. Then, the TFGCN constructs graph structures from both the frequency and temporal domains to reconstruct spatial relationships from global and local perspectives, aiming to mitigate the impact of missing values on local information and thereby improving the accuracy of modeling spatial relationships. Finally, the TFGCN is embedded into the GRU and replaces all fully connected layers, enabling dynamic extraction of spatial–temporal dependencies and achieving satisfactory results in incomplete traffic forecasting. The main contributions of this paper are as follows:

To alleviate the adverse effects caused by missing values, this paper proposes the Temporal–Frequency Graph Convolutional Recurrent Network (TFGCRN), which aims to dynamically reconstruct spatial–temporal dependencies from both global and local perspectives, thereby achieving improved performance in incomplete traffic forecasting.
This paper proposes the Temporal–Frequency Graph Convolutional Network (TFGCN) by leveraging the advantages of frequency and temporal information in capturing global and local spatial relationships. The TFGCN is used to replace all fully connected layers within the GRU, thereby recursively modeling spatial–temporal dependencies.
Experiments on four real-world datasets demonstrate that TFGCRN outperforms ten mainstream baselines and adapts well to different missing rates, verifying its superiority and robustness.

2. Related Works

2.1. Classic Traffic Forecasting Model

Classical traffic forecasting models mainly establish the mapping between historical observations and future values by combining time series and feature engineering. In this context, long short-term memory (LSTM) networks, convolutional neural networks (CNNs), and ensemble learning have gained widespread attention [25]. Alkarim et al. [26] employ ensemble learning to integrate multiple deep learning and machine learning models, aiming to achieve stable and accurate forecasting of traffic data changes. Wang et al. [27] combine visual features with traffic flow and introduce deep learning to achieve excellent performance in statistical methods. Chen et al. [28] perform feature fusion based on road mixing rate and anomaly features, and use deep learning models to predict traffic flow changes.

In summary, these works have achieved better results than traditional statistical models and machine learning models. However, due to the lack of sufficient modeling of spatial relationships, they still fail to achieve satisfactory results in spatial–temporal traffic prediction.

2.2. Two-Stage Incomplete Traffic Forecasting Model

Mainstream two-stage models typically follow an imputation + forecasting approach: they first use an imputation model to recover missing values, and then apply a forecasting model on the recovered data to capture spatial–temporal dependencies and generate predictions. Next, we will introduce several mainstream imputation models and forecasting models, respectively.

Imputation models mainly fall into two categories: temporal-based imputation and spatial-based imputation [29]. The former primarily relies on attention mechanisms or transformer models to leverage contextual information for recovering missing values. For example, TA-SAITS [30] employs a bidirectional attention mechanism to effectively ensure high-quality data reconstruction. GPT4TS [31] introduces GPT-2 and patch-based modeling to fully exploit contextual information for imputation. Yang et al. [32] combine frequency-domain information with generative models to enhance the effectiveness of time series imputation. The latter mainly relies on spatial attention mechanisms or graph neural networks to effectively utilize the spatial relationships among different sequences for missing value recovery. For instance, GRIN [33] employs a combination of graph convolution and recurrent neural networks to efficiently reconstruct missing values. SPIN [34] integrates both spatial attention and temporal attention to ensure the reliability of data imputation. Islam et al. [35] combine attention mechanisms with diffusion models to enhance the performance of imputation.

Forecasting models. Currently, the mainstream forecasting models include multi-layer perceptron (MLP), transformer, and spatial–temporal graph neural networks [36,37,38]. The main idea behind MLP and transformer models is to model the temporal dependencies of traffic time series to predict their future values [39,40]. STID [41] uses spatial–temporal encoding with the MLP model to obtain accurate traffic forecasting results. STAEformer [42] combines spatial–temporal encoding with the transformer model to obtain accurate traffic forecasting results. MGSFformer [43] combines the concepts of spatiotemporal attention and multi-scale modeling to achieve accurate time series forecasting. LGSTformer [44] integrates multi-scale temporal convolutions, dynamic–static graph convolutions, and spatiotemporal self-attention to capture spatial–temporal dependencies. Chen et al. [45] introduce the idea of dynamic trend modeling to improve the transformer, achieving excellent forecasting results. Compared to the above two methods, STGNNs [46,47] use graph convolution to extract spatial relationships and sequence models to capture temporal information, thereby better modeling the spatiotemporal dependencies in traffic time series. Zhang [48] proposes dynamic Graph Convolutional Networks with temporal representation learning for traffic flow forecasting. Wang et al. [49] combine a hybrid Graph Convolutional Network with a frequency-domain-based decoupling layer to fully exploit spatial–temporal dependencies, thereby achieving accurate spatial–temporal prediction results. Jiang et al. [50] proposed a dynamic adaptive Graph Convolutional Network to accurately model the spatial relationships among different traffic sequences, thereby improving the accuracy of traffic forecasting.

In summary, the two-stage model can alleviate the impact of missing values to a certain extent, and spatial–temporal graph neural networks, as an effective traffic forecasting model, have been widely applied. However, the two-stage model suffers from an inherent error accumulation problem, which limits its practical value to some extent.

2.3. End-to-End Incomplete Traffic Forecasting Model

To address the issues of two-stage models, recent studies have gradually focused on end-to-end models, which enhance the model’s ability to handle missing values by introducing additional components [51,52]. Yang et al. [53] combine large language models with vision models to improve the accuracy of incomplete traffic forecasting. Zuo et al. [54] combine dynamic graph convolution, temporal convolutional networks, and attention to ensure the model’s ability to handle missing values and achieve incomplete traffic forecasting. Bikram et al. [55] combine graph learning, multi-head attention, and GRU to mitigate the impact of missing values, thereby achieving results superior to those of traditional neural networks. Merlin [56] uses knowledge distillation and contrastive learning to enhance the robustness of deep learning models in handling missing values, thereby achieving satisfactory experimental results in incomplete traffic forecasting. Ivan et al. [57] leverage downsampling and graph learning to design an STGNN suited to incomplete traffic prediction, which significantly enhances the model’s robustness in handling missing values.

In summary, existing end-to-end models improve their ability to handle missing values through additional components (such as graph learning) while avoiding the error accumulation problem inherent in two-stage models. In addition, it can be seen that effectively enhancing the utilization of both global and local spatial relationships in existing models is key to achieving accurate traffic prediction in mainstream end-to-end models. Therefore, one of the core techniques of this paper is the introduction of different graph structures to further construct spatial relationships from both global and local perspectives.

3. Methods

3.1. Preliminaries

Traffic Time Series [58,59]: A traffic time series consists of multiple time-varying sequences (typically speed and flow) collected by sensors at different locations, which can be represented by a tensor $X \in \mathbb{R}^{N_V \times N_H}$. Here, $N_V$ is the number of traffic time series and $N_H$ is the number of time steps.

Incomplete Traffic Forecasting [60,61]: In a classic traffic forecasting task, a historical observation tensor $X \in \mathbb{R}^{N_V \times N_H}$ holds the observations from $N_H$ historical time intervals, and the goal of the forecasting model is to predict the values for the next $N_F$ time steps, denoted as $Y \in \mathbb{R}^{N_V \times N_F}$. The objective of traffic forecasting is to build a mapping function from $X$ to $Y$. The main difference in incomplete traffic forecasting lies in the presence of missing values in the historical observation data. Therefore, a random percentage $M\%$ of the data points in $X$ is masked, which yields a new input feature tensor $X_M \in \mathbb{R}^{N_V \times N_H}$. The core objective is to build a mapping function from $X_M$ to $Y$.
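The masking step described above can be illustrated with a minimal NumPy sketch. The function name `random_mask`, the fixed seed, and the toy speed range are illustrative assumptions and are not taken from the paper; only the idea of zeroing a random fraction of the historical tensor follows the text.

```python
import numpy as np


def random_mask(x, missing_rate, seed=0):
    """Zero out a random fraction of entries; return (x_masked, observed)."""
    rng = np.random.default_rng(seed)
    n_missing = int(round(missing_rate * x.size))
    # Choose distinct flat positions to mask.
    missing_idx = rng.choice(x.size, size=n_missing, replace=False)
    observed = np.ones(x.size, dtype=bool)
    observed[missing_idx] = False
    observed = observed.reshape(x.shape)
    # Missing values are represented as zeros, as in the text.
    return np.where(observed, x, 0.0), observed


# Toy historical tensor X with N_V = 4 series and N_H = 12 steps (hypothetical values).
N_V, N_H = 4, 12
X = np.random.default_rng(1).uniform(20.0, 70.0, size=(N_V, N_H))
X_M, observed = random_mask(X, missing_rate=0.25)
print(X_M.shape, int((~observed).sum()))
```

The forecasting target $Y$ is left untouched here; only the historical input is masked.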

3.2. Overall Framework

The overall TFGCRN is shown in Figure 1. The core idea is to integrate TFGCN into the GRU to model spatial relationships from both global and local perspectives recursively. In addition, this paper introduces spatial–temporal embeddings, which aim to provide global information for the input features and alleviate the adverse effects of missing values to some extent. The overall framework and modeling steps are as follows:

Figure 1. Overall framework of TFGCRN. The inputs to the model are incomplete traffic time series, and the outputs of the model are the complete future values.

Step I: The historical input X_M is mapped to a high-dimensional representation based on spatial–temporal embeddings and fully connected layers.

Step II: The above representation is input into the TFGCRN encoder, where the spatial–temporal dependencies are dynamically mined through recursive modeling.

Step III: The representation of the final time step modeled by the TFGCRN encoder, along with the spatial embedding, is used as the input to the decoder. The final prediction result is obtained through the generative decoder based on the MLP.

3.3. Spatial–Temporal Embedding

The spatial–temporal embeddings provide effective global information to TFGCRN, helping to mitigate the negative impact of missing values. They consist of two temporal embeddings ($E_{TiD}$ and $E_{DiW}$) and one spatial embedding ($E_{spa}$). These embeddings are integrated with the input features $X_M$ through concatenation and fully connected layers [62]:

$$H_{in} = \mathrm{FC}\left(\mathrm{Concat}\left(X_M \,\|\, E_{TiD} \,\|\, E_{DiW} \,\|\, E_{spa}\right)\right) \tag{1}$$

where $\mathrm{FC}(\cdot)$ is the fully connected layer and $\mathrm{Concat}(\cdot)$ concatenates all the tensors. $E_{TiD}$ represents the time of day of each value, and $E_{DiW}$ represents the day of the week. $E_{TiD}$ and $E_{DiW}$ are fixed data constructed from timestamp information, as in existing time series models [59]. Their dimension is $1 \times N_H$, so they are copied along the node dimension to $N_V \times N_H$ before being concatenated with $X_M$. $E_{spa}$ is a learnable parameter matrix generated for each time series, with dimension $N_V \times N_H$. The dimension of $H_{in}$ ultimately becomes $N_V \times N_H \times d$. At this point, the missing values in $X_M$ have been mapped to high-dimensional representations.
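As a rough illustration of Equation (1), the sketch below stacks the four $N_V \times N_H$ tensors along a new last axis and applies one linear layer to reach $N_V \times N_H \times d$. Stacking along a feature axis, normalizing the timestamps to $[0, 1)$, the 288 steps per day, and the random weights are all assumptions made for this example; the paper does not specify these details.

```python
import numpy as np

rng = np.random.default_rng(0)
N_V, N_H, d = 4, 12, 8
steps_per_day = 288  # five-minute sampling (assumed for the toy timestamps)

X_M = rng.uniform(20.0, 70.0, size=(N_V, N_H))
X_M[0, 0] = 0.0  # simulate one missing value

# Fixed temporal features of shape 1 x N_H, built from a hypothetical time index.
t_index = np.arange(1000, 1000 + N_H)
E_TiD = ((t_index % steps_per_day) / steps_per_day)[None, :]
E_DiW = (((t_index // steps_per_day) % 7) / 7.0)[None, :]

# Copy along the node dimension so that both become N_V x N_H.
E_TiD = np.repeat(E_TiD, N_V, axis=0)
E_DiW = np.repeat(E_DiW, N_V, axis=0)

# Learnable spatial embedding; random values stand in for trained parameters.
E_spa = rng.normal(scale=0.1, size=(N_V, N_H))

# Concatenate along a new feature axis, then apply a fully connected layer.
features = np.stack([X_M, E_TiD, E_DiW, E_spa], axis=-1)  # N_V x N_H x 4
W = rng.normal(scale=0.1, size=(4, d))
b = np.zeros(d)
H_in = features @ W + b  # N_V x N_H x d

print(H_in.shape)
```

Even where `X_M` is zero, the corresponding slice of `H_in` is generally non-zero because the embeddings still contribute, which is the effect the text attributes to the spatial–temporal embedding.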

3.4. Temporal–Frequency Graph Convolution Network

The main function of TFGCN is to effectively combine the advantages of frequency and temporal information to reconstruct spatial relationships from both global and local perspectives. Therefore, we construct two types of graph structures that combine the high-dimensional representations processed by spatial–temporal encoding and learnable node embeddings from different perspectives to establish different spatial relationships.

Spatial attention graph: It first initializes a diagonal matrix $I_N \in \mathbb{R}^{N_V \times N_V}$ with ones on the diagonal, and randomly initializes a learnable node embedding matrix $E_N \in \mathbb{R}^{N_V \times d}$. The diagonal matrix ensures that each node preserves its own information while interacting spatially with other nodes, which is common practice in related work [63]. Then, based on the high-dimensional representation $h_{in} \in \mathbb{R}^{N_V \times d}$ produced by the spatial–temporal embedding and the graph embedding matrix $E_N$, the signal graph embedding matrix $E_{SAG} \in \mathbb{R}^{N_V \times d}$ for the spatial attention graph is obtained. Here, $h_{in}$ is the high-dimensional representation at one specific time step among the $N_H$ time steps of $H_{in}$.

$$E_{SAG} = \mathrm{softmax}\left(\frac{(W_q E_N) \ast (W_k h_{in})^{T}}{\sqrt{d_k}}\right) W_v h_{in} \tag{2}$$

where $\ast$ denotes matrix multiplication. $W_q$, $W_k$, and $W_v$ are weight coefficients, each implemented as a fully connected layer. The dimension of $(W_q E_N)$ is $N_V \times d$. $(W_k h_{in})^{T}$ is the transpose of $W_k h_{in}$, and its dimension is $d \times N_V$. $\mathrm{softmax}(\cdot)$ is the activation function, applied along the last dimension of the matrix. $\sqrt{d_k}$ is a scaling factor that stabilizes the distribution of the attention scores. The spatial attention graph $A_{SAG} \in \mathbb{R}^{N_V \times N_V}$ can be computed as follows:

$$A_{SAG} = I_N + \mathrm{softmax}\left(\mathrm{TopK}\left(\mathrm{ReLU}\left(E_{SAG} E_{SAG}^{T}\right)\right)\right) \tag{3}$$

where $E_{SAG}^{T}$ is the transpose of $E_{SAG}$, and $\mathrm{TopK}(\cdot)$ selects the top $K$ most important nodes.
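Equations (2) and (3) can be sketched in NumPy as follows. This is an illustration under stated assumptions: weights act on the right (`x @ W`, the usual row-vector convention of fully connected layers), TopK is applied per row, and entries outside the top $K$ are set to $-\infty$ so that the softmax assigns them zero weight. The helper names `topk_rows` and `spatial_attention_graph` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def topk_rows(scores, k):
    """Keep the k largest entries per row; the rest become -inf."""
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    out = np.full_like(scores, -np.inf)
    np.put_along_axis(out, idx, np.take_along_axis(scores, idx, axis=-1), axis=-1)
    return out


def spatial_attention_graph(h_in, E_N, W_q, W_k, W_v, k):
    d_k = W_q.shape[1]
    q, key, v = E_N @ W_q, h_in @ W_k, h_in @ W_v
    attn = softmax(q @ key.T / np.sqrt(d_k))   # Eq. (2): N_V x N_V attention
    E_SAG = attn @ v                           # Eq. (2): N_V x d embedding
    sim = np.maximum(E_SAG @ E_SAG.T, 0.0)     # ReLU(E_SAG E_SAG^T)
    A_SAG = np.eye(h_in.shape[0]) + softmax(topk_rows(sim, k))  # Eq. (3)
    return A_SAG, E_SAG


rng = np.random.default_rng(0)
N_V, d, K = 5, 8, 2
h_in = rng.normal(size=(N_V, d))
E_N = rng.normal(size=(N_V, d))
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
A_SAG, E_SAG = spatial_attention_graph(h_in, E_N, W_q, W_k, W_v, K)
print(A_SAG.shape, E_SAG.shape)
```

Under these assumptions each row of `A_SAG` sums to two: one from the identity term and one from the row-wise softmax over the $K$ retained entries.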

Frequency graph: It also initializes a diagonal matrix $I_N \in \mathbb{R}^{N_V \times N_V}$ with ones on the diagonal, and randomly initializes a learnable node embedding matrix $E_N \in \mathbb{R}^{N_V \times d}$. Then, the high-dimensional representation $h_{in} \in \mathbb{R}^{N_V \times d}$ is processed through a Fourier transform. Using concatenation and a fully connected layer, the result is combined with the graph embeddings $E_N$ to obtain the signal graph embedding matrix $E_{FG} \in \mathbb{R}^{N_V \times d}$, which is used for the frequency graph.

$$E_{FG} = \mathrm{FC}\left(\mathrm{FFT}(h_{in}) \,\|\, E_N\right) \tag{4}$$

where $\mathrm{FFT}(\cdot)$ is the fast Fourier transform. Since the spatial attention graph already preserves the information related to data amplitude, the frequency graph primarily retains phase information to reflect global patterns. The frequency graph $A_{FG} \in \mathbb{R}^{N_V \times N_V}$ can be computed as follows:

$$A_{FG} = I_N + \mathrm{softmax}\left(\mathrm{TopK}\left(\mathrm{ReLU}\left(E_{FG} E_{FG}^{T}\right)\right)\right) \tag{5}$$
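A possible reading of Equations (4) and (5) is sketched below. Several choices are assumptions, since the text does not fix them: the FFT is taken along the feature axis, only the phase of the spectrum (`np.angle`) is kept to obtain a real-valued input, and TopK is applied per row with non-selected entries set to $-\infty$ before the softmax. The function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def topk_rows(scores, k):
    """Keep the k largest entries per row; the rest become -inf."""
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    out = np.full_like(scores, -np.inf)
    np.put_along_axis(out, idx, np.take_along_axis(scores, idx, axis=-1), axis=-1)
    return out


def frequency_graph(h_in, E_N, W, b, k):
    spectrum = np.fft.fft(h_in, axis=-1)   # complex spectrum, N_V x d
    phase = np.angle(spectrum)             # keep phase only (assumption)
    # Eq. (4): concatenate with node embeddings, then a fully connected layer.
    E_FG = np.concatenate([phase, E_N], axis=-1) @ W + b
    sim = np.maximum(E_FG @ E_FG.T, 0.0)   # ReLU(E_FG E_FG^T)
    A_FG = np.eye(h_in.shape[0]) + softmax(topk_rows(sim, k))  # Eq. (5)
    return A_FG, E_FG


rng = np.random.default_rng(0)
N_V, d, K = 5, 8, 2
h_in = rng.normal(size=(N_V, d))
E_N = rng.normal(size=(N_V, d))
W = rng.normal(scale=0.1, size=(2 * d, d))  # maps the 2d concatenation back to d
b = np.zeros(d)
A_FG, E_FG = frequency_graph(h_in, E_N, W, b, K)
print(A_FG.shape, E_FG.shape)
```

Using the phase keeps the graph real-valued and insensitive to the magnitude of the input, which matches the stated intent of reflecting global patterns rather than amplitude.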

TFGCN: Next, this paper integrates the above two graph structures using layer normalization and the basic principles of graph convolution, which yields TFGCN. The specific formula is as follows:

$$H_{TFG} = F_{LN}\left(A_{SAG} H_{in} W_1 + b_1 + A_{FG} H_{in} W_2 + b_2\right) \tag{6}$$

where $F_{LN}(\cdot)$ is layer normalization, and $W_1$, $b_1$, $W_2$, and $b_2$ are the weights and biases of the two graph convolution branches. In this way, TFGCN integrates the frequency graph and the spatial attention graph, ensuring a more comprehensive exploration of spatial relationships.
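Equation (6) amounts to two graph convolution branches summed and normalized. The sketch below applies it to the representation of a single time step, with toy graphs standing in for the spatial attention graph and the frequency graph. The layer normalization here has no learnable scale or shift, and `toy_graph` is a hypothetical helper used only to produce matrices of the right form.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learnable gain)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def tfgcn(h, A_sag, A_fg, W1, b1, W2, b2):
    # One graph convolution per graph, summed and then normalized, as in Eq. (6).
    return layer_norm(A_sag @ h @ W1 + b1 + A_fg @ h @ W2 + b2)


def toy_graph(rng, n):
    # Identity plus a row-stochastic matrix, mimicking the form of Eqs. (3) and (5).
    s = rng.random((n, n))
    return np.eye(n) + s / s.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
N_V, d = 5, 8
h = rng.normal(size=(N_V, d))
A_sag, A_fg = toy_graph(rng, N_V), toy_graph(rng, N_V)
W1, W2 = rng.normal(scale=0.3, size=(d, d)), rng.normal(scale=0.3, size=(d, d))
b1, b2 = np.zeros((1, d)), np.zeros((1, d))
H_tfg = tfgcn(h, A_sag, A_fg, W1, b1, W2, b2)
print(H_tfg.shape)
```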

3.5. TFGCRN Cell

The TFGCRN cell is the most crucial structural component of the proposed model. Its core idea is to replace all fully connected layers in the GRU cell with TFGCN, enabling recursive extraction of spatial–temporal dependencies. Specifically, each hidden state $h_t \in \mathbb{R}^{N_V \times d}$ in the input sequence $H_{in} = \{h_1, h_2, \dots, h_T\}$ is modeled by a TFGCRN cell, which captures temporal and frequency-domain dependencies through its gated recurrent structure. Based on the basic principles of GRU [64], the formulas for each TFGCRN cell are given as follows:

$$z_t = \sigma\left(\mathrm{TFGCN}(h_t) + \mathrm{TFGCN}(c_{t-1}) + b_z\right) \tag{7}$$

$$r_t = \sigma\left(\mathrm{TFGCN}(h_t) + \mathrm{TFGCN}(c_{t-1}) + b_r\right) \tag{8}$$

$$\tilde{c}_t = \tanh\left(\mathrm{TFGCN}(h_t) + \mathrm{TFGCN}(r_t \odot c_{t-1}) + b_c\right) \tag{9}$$

$$c_t = (1 - z_t) \odot c_{t-1} + z_t \odot \tilde{c}_t \tag{10}$$

where $z_t$ is the update gate, $r_t$ is the reset gate, $\tilde{c}_t$ is the candidate hidden state, and $c_{t-1}$ is the recurrent state from the previous step. $\sigma(\cdot)$ is the sigmoid activation function, $\tanh(\cdot)$ is the hyperbolic tangent activation function, and $\odot$ is the Hadamard product (element-wise multiplication). $b_z$, $b_r$, and $b_c$ are bias terms with dimension $1 \times d$. All other tensors have dimension $N_V \times d$. Since the spatiotemporal embeddings map missing values into high-dimensional representations and TFGCN models spatial relationships on top of them, the gating mechanism can effectively mitigate the adverse effects caused by zero values.
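The recurrence in Equations (7)–(10) can be sketched as below. To keep the example short, the two graphs are fixed toy matrices rather than being rebuilt from the data at every step, each TFGCN call has its own randomly initialized parameters, and the initial state is zero. These simplifications and all helper names are assumptions of the sketch, not details confirmed by the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def tfgcn(x, A_sag, A_fg, p):
    # Eq. (6) with the parameters of one particular TFGCN instance.
    return layer_norm(A_sag @ x @ p["W1"] + p["b1"] + A_fg @ x @ p["W2"] + p["b2"])


def init_params(rng, d):
    names = ["zx", "zc", "rx", "rc", "cx", "cc"]  # one TFGCN per term in Eqs. (7)-(9)
    params = {
        n: {"W1": rng.normal(scale=0.3, size=(d, d)), "b1": np.zeros((1, d)),
            "W2": rng.normal(scale=0.3, size=(d, d)), "b2": np.zeros((1, d))}
        for n in names
    }
    for gate in ("b_z", "b_r", "b_c"):
        params[gate] = np.zeros((1, d))
    return params


def tfgcrn_cell(h_t, c_prev, A_sag, A_fg, params):
    g = lambda name, x: tfgcn(x, A_sag, A_fg, params[name])
    z = sigmoid(g("zx", h_t) + g("zc", c_prev) + params["b_z"])            # Eq. (7)
    r = sigmoid(g("rx", h_t) + g("rc", c_prev) + params["b_r"])            # Eq. (8)
    c_tilde = np.tanh(g("cx", h_t) + g("cc", r * c_prev) + params["b_c"])  # Eq. (9)
    return (1.0 - z) * c_prev + z * c_tilde                                # Eq. (10)


rng = np.random.default_rng(0)
N_V, T, d = 5, 12, 8
H_in = rng.normal(size=(N_V, T, d))
S1, S2 = rng.random((N_V, N_V)), rng.random((N_V, N_V))
A_sag = np.eye(N_V) + S1 / S1.sum(axis=1, keepdims=True)
A_fg = np.eye(N_V) + S2 / S2.sum(axis=1, keepdims=True)
params = init_params(rng, d)

c = np.zeros((N_V, d))
for t in range(T):  # recursive modeling over the input sequence
    c = tfgcrn_cell(H_in[:, t, :], c, A_sag, A_fg, params)
print(c.shape)
```

Because Equation (10) is a convex combination of the previous state and a `tanh` output, the state stays within $[-1, 1]$ when it starts at zero.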

4. Experiments

4.1. Datasets

To verify the performance of TFGCRN in incomplete traffic forecasting, this section conducts experimental studies on four public datasets, including METR-LA, PEMS-BAY, PEMS04, and PEMS08. METR-LA and PEMS-BAY are traffic speed datasets, while PEMS04 and PEMS08 are traffic flow datasets. The real-world distribution of the data, along with more comprehensive details, can be found in the descriptions provided in the related work [34,65]. The basic statistical information of these datasets is shown in Table 1, and their brief descriptions are shown as follows:

Table 1. The basic statistical information of these four datasets.

METR-LA: The dataset records traffic speeds measured by loop detectors installed throughout the Los Angeles County road network. It includes observations from 207 sensors collected between 1 March and 30 June 2012. Each sensor provides readings every five minutes, resulting in a total of 34,272 time points.
PEMS-BAY: The dataset consists of traffic speed records obtained from the California Transportation Agencies (CalTrans) Performance Measurement System (PeMS). It includes measurements from 325 sensors collected between 1 January and 31 May 2017. Each time series was sampled every five minutes, resulting in a total of 52,116 time points.
PEMS04: The dataset includes traffic flow information sourced from the CalTrans Performance Measurement System (PeMS). It comprises data from 307 sensors collected between 1 January and 28 February 2018. Each series is recorded at five-minute intervals, yielding a total of 16,992 time points.
PEMS08: The dataset contains traffic flow data obtained from the CalTrans Performance Measurement System (PeMS). It includes observations from 170 sensors recorded between 1 July and 31 August 2018. Each time series was sampled every five minutes, producing a total of 17,833 time points.

4.2. Experimental Setup

Baselines. The baselines adopted in this paper mainly consist of two types: the two-stage model (PatchTST [66] + SAITS [67], DCRNN [68] + GPT4TS [31], FourierGNN [69] + GATGPT [70], and MTGNN [71] + GRIN [33]) and the end-to-end forecasting model (LGnet [72], GC-VRNN [73], RIHGCN [22], GSTAE [74], BiTGraph [75], and GinAR+ [76]). The basic introduction of the adopted baselines is as follows:

PatchTST + SAITS: The former employs patch encoding and a transformer to perform traffic time series forecasting, while the latter uses an attention mechanism to restore missing values in the data.
DCRNN + GPT4TS: The former combines GCN and GRU to achieve traffic forecasting, while the latter uses Patch and GPT-2 to restore missing values.
FourierGNN + GATGPT: The former replaces graph convolution with Fourier computation to perform traffic forecasting, while the latter combines GAT and GPT-2 to restore missing values.
MTGNN + GRIN: The former combines graph learning and temporal convolutional networks to achieve traffic forecasting, while the latter employs Graph Convolutional Networks for data recovery.
LGnet: It combines memory unit components to enhance the performance of recurrent neural networks in incomplete traffic forecasting.
GC-VRNN: It integrates multi-space GCN with conditional variational RNN to achieve incomplete traffic forecasting.
RIHGCN: It combines graph convolutions constructed from geographic distances and historical similarity with a bidirectional recurrent imputation mechanism and LSTM to enable incomplete traffic forecasting.
GSTAE: It proposes a graph-based spatial–temporal autoencoder that follows an encoder–decoder structure for incomplete traffic forecasting.
BiTGraph: It employs the biased temporal convolution graph neural network to effectively achieve incomplete traffic forecasting.
GinAR+: It integrates interpolation attention, adaptive graph convolution, and simple recurrent units to achieve incomplete traffic forecasting.

Setting. This paper constructs the comparative experiment from the following perspectives to ensure fairness and robustness:

In this paper, following mainstream benchmarks [77], all datasets are divided proportionally into training sets, validation sets, and test sets.
Following mainstream-related works [78], both the historical observation length and the future forecasting length of the model are set to 12, and the evaluation metric is the average performance over the 12-step forecasting.
The experiments are primarily conducted under the classic random-missing setting [79]. Since the distribution of missing values is not fixed, the experimental results can better demonstrate the robustness of the model. The missing rates are set to 25%, 50%, and 75%. Missing values are generated by random masking and are uniformly set to zero.
Currently, most mainstream baselines conduct experiments directly on the datasets used in this paper without additional data cleaning. To ensure a fair comparison, and considering that real-world data naturally contain missing values, we did not perform extra data cleaning so as to better evaluate the model’s performance in realistic settings.
To ensure stability and reproducibility, five different random seeds are used for each missing rate, all models are evaluated in five repeated trials, and the final results are reported as the average over these trials.
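The 12-step input and 12-step output setting can be illustrated with a sliding-window sketch. The function name `make_windows`, the stride of one step, and the toy series are assumptions for illustration; the paper only states the two window lengths.

```python
import numpy as np


def make_windows(series, n_hist=12, n_fut=12):
    """Split a (N_V, T) series into history and future windows.

    Returns X with shape (S, N_V, n_hist) and Y with shape (S, N_V, n_fut),
    where S is the number of windows obtained with a stride of one step.
    """
    total_steps = series.shape[1]
    n_samples = total_steps - n_hist - n_fut + 1
    X = np.stack([series[:, i:i + n_hist] for i in range(n_samples)])
    Y = np.stack([series[:, i + n_hist:i + n_hist + n_fut] for i in range(n_samples)])
    return X, Y


# Toy series: 3 sensors, 30 time steps (hypothetical values).
series = np.arange(3 * 30, dtype=float).reshape(3, 30)
X, Y = make_windows(series)
print(X.shape, Y.shape)
```

In the incomplete setting, the random masking would then be applied to the history windows only, once per random seed.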

Metrics. This paper mainly uses mean absolute error (MAE) [80] and mean absolute percentage error (MAPE) [81] as the main evaluation metrics.
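For reference, the two metrics can be computed as in the sketch below. Excluding zero-valued targets from MAPE is an assumption made here to avoid division by zero; the paper does not state how such entries are handled.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error over all entries."""
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error (in %), skipping near-zero targets."""
    valid = np.abs(y_true) > eps
    ratio = np.abs((y_true[valid] - y_pred[valid]) / y_true[valid])
    return float(np.mean(ratio) * 100.0)


# Toy example with one zero-valued target.
y_true = np.array([[10.0, 20.0], [0.0, 40.0]])
y_pred = np.array([[12.0, 18.0], [1.0, 44.0]])
mae_value = mae(y_true, y_pred)
mape_value = mape(y_true, y_pred)
print(mae_value, mape_value)
```

In this toy case the absolute errors are 2, 2, 1, and 4, so the MAE is 2.25, and the MAPE averages the relative errors 0.2, 0.1, and 0.1 over the three non-zero targets.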

4.3. Main Results

Table 2 and Table 3 present the main results of TFGCRN and all baselines. It can be observed that TFGCRN achieves satisfactory results across all datasets, demonstrating the superiority and practicality of TFGCRN. The following conclusions can be drawn:

Table 2. The main results of TFGCRN and all baselines on traffic speed datasets.

Table 3. The main results of TFGCRN and all baselines on traffic flow datasets.

The forecasting performance of two-stage models is significantly lower than that of end-to-end models, which verifies the value of the end-to-end forecasting framework in the field of incomplete traffic forecasting. The main reason is that two-stage models inevitably suffer from error accumulation, which in turn limits the full exploitation of spatial–temporal dependencies in forecasting models.
Graph-based end-to-end models achieve better experimental results mainly because graph convolution can effectively model the spatial relationships among different time series, thereby enhancing the modeling of spatiotemporal dependencies and ensuring accurate incomplete traffic forecasting.
The proposed TFGCRN achieves the best experimental results across all datasets. First, the spatial–temporal embedding mitigates the adverse effects of missing values by introducing additional information. Second, TFGCN leverages both the frequency graph and spatial attention graph to ensure accurate modeling of spatial relationships from global and local perspectives. Finally, by integrating TFGCN into the GRU, the model realizes recursive modeling of spatial–temporal dependencies. Therefore, TFGCRN demonstrates excellent performance in incomplete traffic forecasting tasks.

4.4. Ablation Experiment

Ablation experiments can fully evaluate the impact of different components on the results and further validate the motivation of this paper. Therefore, this paper conducts ablation experiments from three perspectives: (1) w/o FG: It represents the removal of the frequency graph, in which case the model mainly relies on the spatial attention graph and local information to construct spatial relationships. (2) w/o SAG: It represents the removal of the spatial attention graph, in which case the model mainly relies on the frequency graph and global information to construct spatial relationships. (3) w/o STE: It represents the removal of spatial–temporal embedding, in which case the model does not use additional spatial–temporal information to alleviate the adverse effects caused by missing values. Figure 2 shows the results of the ablation experiments on METR-LA and PEMS04 datasets. The following conclusions can be drawn:

Figure 2. The results of the ablation experiments. w/o FG represents the removal of the frequency graph. w/o SAG represents the removal of the spatial attention graph. w/o STE represents the removal of spatial–temporal embedding.

The frequency graph has a significant impact when the missing rate is high. This is mainly because, when the missing rate is large, global information helps the model better analyze spatial–temporal dependencies.
The spatial attention graph has a significant impact when the missing rate is low. This is mainly because, when the missing rate is small, the model can better model spatial dependencies by leveraging local information.
The spatial–temporal embedding is beneficial to the experimental results under different missing rates. This indirectly confirms that additional spatial–temporal knowledge keeps the model from having to rely on zero-filled values alone when mining spatial–temporal information, thereby ensuring forecasting accuracy.

The above experimental results demonstrate that spatial–temporal embeddings can significantly improve the model’s forecasting accuracy. Building on this, we conduct further ablation studies in this section. Specifically, we remove the temporal embeddings and the spatial embedding separately and perform experiments on PEMS-BAY and PEMS08. In addition, both proposed graph structures introduce extra graph embeddings designed to provide additional node identity information, so we also perform ablation experiments on these components. Table 4 presents the ablation results. It can be observed that removing temporal embeddings leads to a larger performance drop under higher missing rates, mainly because they provide additional timestamp information for each time step, which becomes more beneficial when substantial information is missing. In addition, removing the graph embeddings also degrades the model’s performance, mainly because the node identity information they provide helps the model resist the impact of missing values on graph construction, thereby leading to better results.
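The role of the temporal and spatial embeddings described above can be illustrated with a minimal sketch. It is not the TFGCRN implementation: the helper names `make_table` and `embed`, the embedding dimension, and the use of fixed random tables in place of learnable parameters are all assumptions made for illustration. Only the 5-minute granularity and the 207 variates of METR-LA come from Table 1.

```python
import random

STEPS_PER_DAY = 288  # 24 h at the 5-minute granularity listed in Table 1


def make_table(rows, dim, seed):
    """Hypothetical stand-in for a learnable embedding table (fixed random values)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed(value, step_index, node_index, tod_table, dow_table, node_table):
    """Concatenate one (possibly zero-filled) reading with its identity embeddings."""
    time_of_day = step_index % STEPS_PER_DAY
    day_of_week = (step_index // STEPS_PER_DAY) % 7
    return ([value] + tod_table[time_of_day] + dow_table[day_of_week]
            + node_table[node_index])


DIM = 4  # assumed embedding size, chosen only for this example
tod_table = make_table(STEPS_PER_DAY, DIM, seed=0)
dow_table = make_table(7, DIM, seed=1)
node_table = make_table(207, DIM, seed=2)  # 207 variates, as for METR-LA

# Two missing readings (zero-filled) at different sensors and time steps.
missing_a = embed(0.0, 10, 3, tod_table, dow_table, node_table)
missing_b = embed(0.0, 300, 42, tod_table, dow_table, node_table)
```

Although both readings are zero, `missing_a` and `missing_b` differ in every embedding slot, so a downstream layer can still tell which sensor and which time step each position belongs to. This is the sense in which the embeddings supply information that the zero-filled input cannot.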

Table 4. The ablation experiment of spatial–temporal embedding and node embedding.

4.5. Hyperparameter Analysis

The model’s hyperparameters can significantly affect its forecasting performance, and a detailed analysis of these hyperparameters helps us to understand the most important aspects and core functions of the model. This paper evaluates the influence of four hyperparameters on the results: graph embedding size, number of TopK, embedding size, and number of layers. Table 5 presents the core hyperparameters of the TFGCRN model, while Figure 3 illustrates the experimental results under different hyperparameter settings (PEMS04 dataset). The following conclusions can be drawn:

Table 5. The values of the main hyperparameters of TFGCRN.

Figure 3. The results of the hyperparameter analysis on the PEMS08 dataset.

The impact of the graph embedding size on the experimental results is minimal, primarily because TFGCRN can generate relatively accurate graph structures with a small number of parameters, ensuring the accuracy of spatial relationships.
The number of TopK can be increased appropriately, as a larger value allows the graph to reflect spatial relationships more comprehensively.
The embedding size is an important hyperparameter and should not be too large. A larger embedding size may lead to overfitting, which in turn affects forecasting accuracy.
The number of layers plays a crucial role in forecasting performance. Too few layers may fail to adequately capture and analyze data characteristics, while too many layers can lead to overfitting and other issues.
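To make the TopK hyperparameter concrete, the following sketch shows one common way to sparsify a dense similarity matrix by keeping only the k largest weights in each row. It is an assumption that TFGCRN applies TopK in this row-wise form; the function name `top_k_sparsify` and the example matrix are hypothetical.

```python
def top_k_sparsify(adj, k):
    """Keep the k largest entries in each row of a dense adjacency matrix."""
    sparse = []
    for row in adj:
        # Column indices of the k largest weights in this row.
        keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k])
        sparse.append([w if j in keep else 0.0 for j, w in enumerate(row)])
    return sparse


# Toy 3-node similarity matrix (illustrative values only).
adj = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.3],
    [0.4, 0.6, 0.7],
]
sparse_adj = top_k_sparsify(adj, k=2)
```

With a larger k, each node keeps more neighbours, which matches the observation above that a moderately larger TopK value reflects spatial relationships more comprehensively.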

4.6. Component Replacement Experiment

To further verify the importance of each component in TFGCRN, this section conducts component replacement experiments. Specifically, we replace certain components with other classic network structures to further evaluate the model’s performance. The experiments cover the following aspects: (1) Multi-head attention: It is used to replace the spatial attention graph structure. (2) Graph attention: It is used to replace the spatial attention graph structure. (3) DTW: It is used to replace the FFT in the frequency graph. (4) Single TFGCN: It uses a single TFGCN to model $h_{t}$ and $c_{t - 1}$. (5) LSTM: It is used to replace the GRU. (6) Two F_LN: It fuses the information from the two graph structures by first applying normalization and then performing element-wise addition. (7) Mask attention: It replaces the spatiotemporal embeddings with a mask matrix and replaces the spatial attention graph with masked self-attention. In other words, the model directly maps missing values into high-dimensional representations and masks out the positions of missing values when computing spatial relationships.
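The difference between the "Two F_LN" variant in item (6) and a stack-then-normalize fusion can be sketched as follows. This is a simplified illustration on a single node's feature vectors, without learnable scale and shift parameters; the function names and the toy inputs are assumptions, not the paper's code.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def fuse_norm_then_add(freq_feat, attn_feat):
    """'Two F_LN' style: normalize each branch, then add element-wise."""
    return [a + b for a, b in zip(layer_norm(freq_feat), layer_norm(attn_feat))]


def fuse_stack_then_norm(freq_feat, attn_feat):
    """Stack (concatenate) both branches first, then apply one normalization."""
    return layer_norm(freq_feat + attn_feat)


# Toy outputs of the two graph branches for one node (illustrative values).
freq_feat = [1.0, 2.0, 3.0]
attn_feat = [10.0, 20.0, 30.0]

added = fuse_norm_then_add(freq_feat, attn_feat)
stacked = fuse_stack_then_norm(freq_feat, attn_feat)
```

In this toy case, normalizing each branch separately erases the scale difference between the two inputs, so `added` is almost the same as fusing `freq_feat` with itself. The stacked version keeps both branches in separate slots under a shared normalization, so their relative magnitudes remain visible.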

Table 6 presents the results of the component replacement experiments on PEMS-BAY and PEMS08. It can be observed that replacing the proposed components with other general modules leads to a clear performance drop for TFGCRN. Based on the experimental results, we can draw the following conclusions:

Table 6. The results of component replacement experiment.

Replacing the spatial attention graph with either multi-head attention or graph attention leads to a clear decline in prediction performance, which demonstrates the stability and effectiveness of incorporating node embeddings, node identity information, and attention-based graph structures.
Replacing the FFT with DTW does not significantly improve prediction performance, mainly because DTW focuses on local pattern alignment, whereas FFT emphasizes global analysis. In the context of incomplete traffic time series, modeling spatial relationships from a global perspective is more crucial.
LSTM performs worse than GRU, mainly because a lighter GRU structure can better avoid the noise introduced by missing values along the temporal dimension, thereby improving prediction performance.
The strategy of fusing graph structures by stacking them before layer normalization yields better prediction results, mainly because this approach preserves the differences between the graph structures and thus provides more comprehensive spatial information.
Using two separate TFGCNs to model $h_{t}$ and $c_{t - 1}$ leads to better experimental results, mainly because applying graph convolutions with different parameters to these two distinct representations helps capture more complete spatial relationships.
The performance of masked attention is not satisfactory. On the one hand, removing spatiotemporal embeddings leads to a significant drop in prediction accuracy; on the other hand, masking spatial relationships may cause the model to overlook important information propagation pathways, thereby limiting its predictive performance.
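The remark that FFT emphasizes global analysis can be illustrated with a small sketch that builds a node-similarity matrix from amplitude spectra. It is an assumption that the frequency graph compares spectra in this way; the cosine similarity, the function names, and the synthetic sine series are hypothetical. A naive discrete Fourier transform is used so the example needs only the standard library; in practice a routine such as `numpy.fft.rfft` would compute the same transform far more efficiently.

```python
import cmath
import math


def amplitude_spectrum(series):
    """Amplitudes of the non-negative frequency bins (naive DFT)."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(series)))
        for k in range(n // 2 + 1)
    ]


def cosine(u, v):
    """Cosine similarity between two non-negative spectra."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def frequency_graph(node_series):
    """Dense similarity matrix between the amplitude spectra of all nodes."""
    spectra = [amplitude_spectrum(s) for s in node_series]
    return [[cosine(a, b) for b in spectra] for a in spectra]


base = [math.sin(2 * math.pi * t / 8) for t in range(16)]   # period of 8 steps
shifted = base[3:] + base[:3]                               # same pattern, delayed
other = [math.sin(2 * math.pi * t / 4) for t in range(16)]  # period of 4 steps

graph = frequency_graph([base, shifted, other])
```

Because a circular time shift leaves the amplitude spectrum unchanged, `base` and `shifted` are judged almost identical even though their values differ at each time step, while `other`, which has a different period, receives a similarity close to zero. The comparison therefore depends on the whole series rather than on point-by-point alignment.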

5. Discussion

This paper proposes the Temporal–Frequency Graph Convolutional Recurrent Network for incomplete traffic forecasting. On one hand, the model introduces spatial–temporal embeddings to mitigate the impact of missing values. On the other hand, by incorporating the proposed TFGCN into the GRU, it recursively models spatial–temporal dependencies from both global and local perspectives, thereby achieving accurate incomplete traffic forecasting. The following aspects are discussed in turn.

Performance. Compared with mainstream end-to-end forecasting models and two-stage forecasting models, TFGCRN achieves the most satisfactory experimental results across all datasets and missing rates. In addition, ablation studies demonstrate the effectiveness of our proposed TFGCN module. These experimental results collectively verify the superiority of TFGCRN in the field of incomplete traffic forecasting.

Application. From a practical standpoint, missing values are widespread in real-world traffic data. Existing models are often affected by these missing values, leading to inaccurate modeling of spatial–temporal dependencies, which in turn limits their forecasting performance. The proposed TFGCRN leverages the advantages of both the spatial attention graph and the frequency graph to reconstruct spatial relationships from both global and local perspectives. This design ensures the model’s robustness against missing values, thereby demonstrating excellent practicality and application value.

Limitations and Future Work. Although the experimental results collectively demonstrate the performance of TFGCRN, several limitations remain. This work primarily uses spatial–temporal encoding to assist in filling missing values, without considering other important factors such as weather conditions and traffic situations. In the future, we plan to use these additional factors to further enhance the modeling process. In addition, spatial–temporal graph neural networks typically have high computational complexity and therefore require appropriate optimization. The experiments in this paper are mainly conducted on existing datasets. In future studies, we intend to integrate the model into real-world big data platforms to further improve its efficiency and explore its performance and robustness in practical scenarios [83].

6. Conclusions

In this study, we propose a model called the Temporal–Frequency Graph Convolutional Recurrent Network (TFGCRN) to address the negative impact of incomplete traffic data caused by sensor failures on spatial–temporal dependency modeling. By embedding the Temporal–Frequency Graph Convolutional Network into gated recurrent units, TFGCRN fully leverages both global and local information during the recursive modeling process, mitigating the adverse effects of missing values and generating more accurate spatial relationships, thereby improving the accuracy of incomplete traffic forecasting. Experimental results show that TFGCRN outperforms several baseline models on four real-world datasets and effectively adapts to different missing rates, demonstrating its superior performance in traffic forecasting tasks. In addition, the ablation study demonstrates that the proposed spatiotemporal embeddings effectively help the model mitigate the interference caused by missing values. Building on this, TFGCN can capture more effective spatial correlations, thereby ensuring strong predictive performance. The component replacement experiment further demonstrates that the designed components can adapt to modeling incomplete traffic time series better than classical networks such as multi-head attention, thereby ensuring the model’s forecasting accuracy.

In the future, we will consider incorporating additional features, such as weather and holiday information, to further enhance forecasting performance. Furthermore, we plan to integrate the model into other big data analytics platforms to enable the practical application of our work.

Author Contributions

Conceptualization, T.F.; methodology, J.H. and T.F.; software, J.H.; validation, J.H. and T.F.; formal analysis, J.H. and T.F.; investigation, J.H. and T.F.; resources, J.H. and T.F.; data curation, J.H.; writing—original draft preparation, J.H.; writing—review and editing, T.F.; visualization, J.H. and T.F.; supervision, T.F.; project administration, T.F.; funding acquisition, T.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets used in this paper are public and can be found at the following links: https://github.com/liyaguang/DCRNN (accessed on 10 October 2025) and https://github.com/guoshnBJTU/ASTGNN/tree/main/data (accessed on 10 October 2025).

Acknowledgments

The authors gratefully acknowledge financial support from the Next-Generation Fellowship Program, Hiroshima University, Japan. In addition, the authors would like to express their gratitude for the open-source code and datasets available at the following GitHub links, which have supported this research: https://github.com/GestaltCogTeam/BasicTS (accessed on 10 October 2025) and https://github.com/ChengqingYu/Merlin (accessed on 10 October 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ITS	Intelligent Transportation Systems
STGNN	Spatial–Temporal Graph Neural Network
GNN	Graph Neural Network
RNN	Recurrent Neural Network
GRU	Gated Recurrent Unit
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error

References

Yu, F.; Mi, X.; Yu, C.; Jiang, Y. Distributed Traffic Signal Control Model for Accurate Policy Learning Under Dynamic Traffic Flow: A Graph Forecast-State Vector Driven Deep Reinforcement Learning Framework. IEEE Trans. Intell. Transp. Syst. 2025, 26, 13573–13584.
Zheng, L.; Feng, Y.; Wang, W.; Men, Q. Performance Analysis of Switch Buffer Management Policy for Mixed-Critical Traffic in Time-Sensitive Networks. Mathematics 2025, 13, 3443.
Liu, H.; Zhang, X.-Y.; Yang, Y.-X.; Li, Y.-F.; Yu, C.-Q. Hourly traffic flow forecasting using a new hybrid modelling method. J. Cent. South Univ. 2022, 29, 1389–1402.
Shang, P.; Liu, X.; Yu, C.; Yan, G.; Xiang, Q.; Mi, X. A new ensemble deep graph reinforcement learning network for spatio-temporal traffic volume forecasting in a freeway network. Digit. Signal Process. 2022, 123, 103419.
Ahmed, S.F.; Kuldeep, S.A.; Rafa, S.J.; Fazal, J.; Hoque, M.; Liu, G.; Gandomi, A.H. Enhancement of traffic forecasting through graph neural network-based information fusion techniques. Inf. Fusion 2024, 110, 102466.
Zhang, J.; He, Q.; Lu, X.; Xiao, S.; Wang, N. A FIG-IWOA-BiGRU Model for Bus Passenger Flow Fluctuation Trend and Spatial Prediction. Mathematics 2025, 13, 3204.
Ting, C.-C.; Wu, K.-T.; Lin, H.-T.C.; Lin, S. MixModel: A Hybrid TimesNet–Informer Architecture with 11-Dimensional Time Features for Enhanced Traffic Flow Forecasting. Mathematics 2025, 13, 3191.
Li, X.; Xian, K.; Wen, H.; Bai, S.; Xu, H.; Yu, Y. PathGen-LLM: A Large Language Model for Dynamic Path Generation in Complex Transportation Networks. Mathematics 2025, 13, 3073.
Shao, Z.; Qian, T.; Sun, T.; Wang, F.; Xu, Y. Spatial-temporal large models: A super hub linking multiple scientific areas with artificial intelligence. Innovation 2025, 6, 100763.
Shao, Z.; Zhang, Z.; Wei, W.; Wang, F.; Xu, Y.; Cao, X.; Jensen, C.S. Decoupled dynamic spatial-temporal graph neural network for traffic forecasting. Proc. VLDB Endow. 2022, 15, 2733–2746.
Cheng, F.; Liu, H. Charging strategies optimization for lithium-ion battery: Heterogeneous ensemble surrogate model-assisted advanced multi-objective optimization algorithm. Energy Convers. Manag. 2025, 342, 120170.
Fu, Y.; Shao, Z.; Yu, C.; Li, Y.; An, Z.; Wang, Q.; Xu, Y.; Wang, F. Selective Learning for Deep Time Series Forecasting. arXiv 2025, arXiv:2510.25207.
Shao, Z.; Zhang, Z.; Wang, F.; Xu, Y. Pre-training Enhanced Spatial-temporal Graph Neural Network for Multivariate Time Series Forecasting. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 1567–1577.
Chengqing, Y.; Guangxi, Y.; Chengming, Y.; Yu, Z.; Xiwei, M. A multi-factor driven spatiotemporal wind power prediction model based on ensemble deep graph attention reinforcement learning networks. Energy 2023, 263, 126034.
Dhanasekaran, S.; Gopal, D.; Logeshwaran, J.; Ramya, N.; Salau, A.O. Multi-Model Traffic Forecasting in Smart Cities using Graph Neural Networks and Transformer-based Multi-Source Visual Fusion for Intelligent Transportation Management. Int. J. Intell. Transp. Syst. Res. 2024, 22, 518–541.
Liang, K.; Meng, L.; Li, H.; Wang, J.; Lan, L.; Li, M.; Liu, X.; Wang, H. From Concrete to Abstract: Multi-View Clustering on Relational Knowledge. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 9043–9060.
Yu, C.; Wang, F.; Shao, Z.; Sun, T.; Wu, L.; Xu, Y. DSformer: A Double Sampling Transformer for Multivariate Time Series Long-term Prediction. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 3062–3072.
Afandizadeh, S.; Abdolahi, S.; Mirzahossein, H. Deep learning algorithms for traffic forecasting: A comprehensive review and comparison with classical ones. J. Adv. Transp. 2024, 2024, 9981657.
Yu, C.; Yan, G.; Yu, C.; Mi, X. Attention mechanism is useful in spatio-temporal wind speed prediction: Evidence from China. Appl. Soft Comput. 2023, 148, 110864.
Ju, W.; Zhao, Y.; Qin, Y.; Yi, S.; Yuan, J.; Xiao, Z.; Luo, X.; Yan, X.; Zhang, M. COOL: A Conjoint Perspective on Spatio-Temporal Graph Neural Network for Traffic Forecasting. Inf. Fusion 2024, 107, 102341.
Khan, S.; Alghayadh, F.Y.; Ahanger, T.A.; Soni, M.; Viriyasitavat, W.; Berdieva, U.; Byeon, H. Deep learning model for efficient traffic forecasting in intelligent transportation systems. Neural Comput. Appl. 2025, 37, 14673–14686.
Zhong, W.; Suo, Q.; Jia, X.; Zhang, A.; Su, L. Heterogeneous Spatio-Temporal Graph Convolution Network for Traffic Forecasting with Missing Values. In Proceedings of the 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS), Washington, DC, USA, 7–10 July 2021; pp. 707–717.
Deng, J.; Chen, X.; Jiang, R.; Du, Y.; Yang, Y.; Song, X.; Tsang, I.W. Disentangling Structured Components: Towards Adaptive, Interpretable and Scalable Time Series Forecasting. IEEE Trans. Knowl. Data Eng. 2024, 36, 3783–3800.
Chauhan, J.; Raghuveer, A.; Saket, R.; Nandy, J.; Ravindran, B. Multi-variate time series forecasting on variable subsets. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 76–86.
Mi, X.; Yu, C.; Liu, X.; Yan, G.; Yu, F.; Shang, P. A dynamic ensemble deep deterministic policy gradient recursive network for spatiotemporal traffic speed forecasting in an urban road network. Digit. Signal Process. 2022, 129, 103643.
Alkarim, A.S.; Al-Malaise Al-Ghamdi, A.S.; Ragab, M. Ensemble Learning-based Algorithms for Traffic Flow Prediction in Smart Traffic Systems. Eng. Technol. Appl. Sci. Res. 2024, 14, 13090–13094.
Wang, Q.; Chen, J.; Song, Y.; Li, X.; Xu, W. Fusing visual quantified features for heterogeneous traffic flow prediction. Promet-Traffic Transp. 2024, 36, 1068–1077.
Chen, J.; Zhang, S.; Xu, W. Scalable prediction of heterogeneous traffic flow with enhanced non-periodic feature modeling. Expert Syst. Appl. 2025, 294, 128847.
Zhang, Y.; Kong, X.; Zhou, W.; Liu, J.; Fu, Y.; Shen, G. A Comprehensive Survey on Traffic Missing Data Imputation. IEEE Trans. Intell. Transp. Syst. 2024, 25, 19252–19275.
Sharma, A.; Samon, T.; Vellandurai, A.; Kumar, V. TA-SAITS: Time Aware-Self Attention based Imputation of Time Series algorithm for Partially Observable Multi-Variate Time Series. In Proceedings of the 2023 International Conference on Machine Learning and Applications (ICMLA), Jacksonville, FL, USA, 15–17 December 2023; pp. 2228–2233.
Zhou, T.; Niu, P.; Sun, L.; Jin, R. One fits all: Power general time series analysis by pretrained lm. Adv. Neural Inf. Process. Syst. 2023, 36, 43322–43355.
Yang, X.; Sun, Y.; Chen, X. Frequency-aware generative models for multivariate time series imputation. Adv. Neural Inf. Process. Syst. 2024, 37, 52595–52623.
Cini, A.; Marisca, I.; Alippi, C. Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks. In Proceedings of the International Conference on Learning Representations, Virtual, 25 April 2022.
Marisca, I.; Cini, A.; Alippi, C. Learning to reconstruct missing data from spatiotemporal graphs with sparse observations. Adv. Neural Inf. Process. Syst. 2022, 35, 32069–32082.
Islam, M.R.U.; Tadepalli, P.; Fern, A. Self-attention-based Diffusion Model for Time-series Imputation in Partial Blackout Scenarios. Proc. AAAI Conf. Artif. Intell. 2025, 39, 17564–17572.
Aouedi, O.; Le, V.A.; Piamrat, K.; Ji, Y. Deep Learning on Network Traffic Prediction: Recent Advances, Analysis, and Future Directions. ACM Comput. Surv. 2025, 57, 151.
Fu, Y.; Wang, F.; Shao, Z.; Yu, C.; Li, Y.; Chen, Z.; An, Z.; Xu, Y. LightWeather: Harnessing Absolute Positional Encoding to Efficient and Scalable Global Weather Forecasting. arXiv 2024, arXiv:2408.09695.
Zhou, S.; Zhang, J.; Pan, J.; Xie, H.; Zuo, W.; Ren, J. Spatio-temporal filter adaptive network for video deblurring. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2482–2491.
Wang, T.; Chen, J.; Lü, J.; Liu, K.; Zhu, A.; Snoussi, H.; Zhang, B. Synchronous Spatiotemporal Graph Transformer: A New Framework for Traffic Data Prediction. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 10589–10599.
Liu, H.; Yu, C.; Wu, H.; Duan, Z.; Yan, G. A new hybrid ensemble deep reinforcement learning model for wind speed short term forecasting. Energy 2020, 202, 117794.
Shao, Z.; Zhang, Z.; Wang, F.; Wei, W.; Xu, Y. Spatial-Temporal Identity: A Simple yet Effective Baseline for Multivariate Time Series Forecasting. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–22 October 2022; pp. 4454–4458.
Liu, H.; Dong, Z.; Jiang, R.; Deng, J.; Deng, J.; Chen, Q.; Song, X. Spatio-Temporal Adaptive Embedding Makes Vanilla Transformer SOTA for Traffic Forecasting. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 4125–4129.
Yu, C.; Wang, F.; Wang, Y.; Shao, Z.; Sun, T.; Yao, D.; Xu, Y. MGSFformer: A Multi-Granularity Spatiotemporal Fusion Transformer for air quality prediction. Inf. Fusion 2025, 113, 102607.
Zhou, W.; Shen, G.; Zhao, Z.; Deng, Z.; Tang, T.; Kong, X.; Tolba, A.; Alfarraj, O. A transformer-based approach for traffic prediction with fusion spatiotemporal attention. Knowl.-Based Syst. 2025, 329, 114466.
Chen, J.; Ye, H.; Ying, Z.; Sun, Y.; Xu, W. Dynamic trend fusion module for traffic flow prediction. Appl. Soft Comput. 2025, 174, 112979.
Liu, X.; Qin, M.; He, Y.; Mi, X.; Yu, C. A new multi-data-driven spatiotemporal PM2.5 forecasting model based on an ensemble graph reinforcement learning convolutional network. Atmos. Pollut. Res. 2021, 12, 101197.
Liang, K.; Meng, L.; Li, H.; Liu, M.; Wang, S.; Zhou, S.; Liu, X.; He, K. MGKsite: Multi-Modal Knowledge-Driven Site Selection via Intra and Inter-Modal Graph Fusion. IEEE Trans. Multimed. 2025, 27, 1722–1735.
Zhang, A. Dynamic graph convolutional networks with Temporal representation learning for traffic flow prediction. Sci. Rep. 2025, 15, 17270.
Wang, P.; Feng, L.; Zhu, Y.; Wu, H. Hybrid spatial–temporal graph neural network for traffic forecasting. Inf. Fusion 2025, 118, 102978.
Jiang, F.; Han, X.; Wen, S.; Tian, T. Spatiotemporal interactive learning dynamic adaptive graph convolutional network for traffic forecasting. Knowl.-Based Syst. 2025, 311, 113115.
Liu, Y.; Rasouli, S.; Wong, M.; Feng, T.; Huang, T. RT-GCN: Gaussian-based spatiotemporal graph convolutional network for robust traffic prediction. Inf. Fusion 2024, 102, 102078.
Shao, Z.; Wang, F.; Sun, T.; Yu, C.; Fang, Y.; Jin, G.; An, Z.; Liu, Y.; Qu, X.; Xu, Y. HUTFormer: Hierarchical U-Net transformer for long-term traffic forecasting. Commun. Transp. Res. 2025, 5, 100218.
Yang, N.; Zhong, H.; Zhang, H.; Berry, R. Vision-LLMs for Spatiotemporal Traffic Forecasting. arXiv 2025, arXiv:2510.11282.
Zuo, J.; Zeitouni, K.; Taher, Y.; Garcia-Rodriguez, S. Graph convolutional networks for traffic forecasting with missing values. Data Min. Knowl. Discov. 2023, 37, 913–947.
Bikram, P.; Das, S.; Biswas, A. Dynamic attention aggregated missing spatial–temporal data imputation for traffic speed prediction. Neurocomputing 2024, 607, 128441.
Yu, C.; Wang, F.; Yang, C.; Shao, Z.; Sun, T.; Qian, T.; Wei, W.; An, Z.; Xu, Y. Merlin: Multi-View Representation Learning for Robust Multivariate Time Series Forecasting with Unfixed Missing Rates. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, Toronto, ON, Canada, 3–7 August 2025; pp. 3633–3644.
Marisca, I.; Alippi, C.; Bianchi, F.M. Graph-based Forecasting with Missing Data through Spatiotemporal Downsampling. In Proceedings of the 41st International Conference on Machine Learning, Proceedings of Machine Learning Research, Vienna, Austria, 21–27 July 2024; pp. 34846–34865.
Shao, Z.; Li, Y.; Wang, F.; Yu, C.; Fu, Y.; Qian, T.; Xu, B.; Diao, B.; Xu, Y.; Cheng, X. BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, Toronto, ON, Canada, 3–7 August 2025; pp. 2502–2513.
Liang, Y.; Shao, Z.; Wang, F.; Zhang, Z.; Sun, T.; Xu, Y. BasicTS: An Open Source Fair Multivariate Time Series Prediction Benchmark. In Proceedings of the Benchmarking, Measuring, and Optimizing, Sanya, China, 3–5 December 2023; pp. 87–101.
Wang, F.; Li, Y.; Shao, Z.; Yu, C.; Fu, Y.; An, Z.; Xu, Y.; Cheng, X. ARIES: Relation Assessment and Model Recommendation for Deep Time Series Forecasting. arXiv 2025, arXiv:2509.06060.
Chen, J.; Yang, L.; Yang, Y.; Peng, L.; Ge, X. Spatio-temporal graph neural networks for missing data completion in traffic prediction. Int. J. Geogr. Inf. Sci. 2025, 39, 1057–1075.
Wang, Y.; Shao, Z.; Sun, T.; Yu, C.; Xu, Y.; Wang, F. Clustering-property matters: A cluster-aware network for large scale multivariate time series forecasting. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 4340–4344.
Jin, G.; Wang, M.; Zhang, J.; Sha, H.; Huang, J. STGNN-TTE: Travel time estimation via spatial–temporal graph neural network. Future Gener. Comput. Syst. 2022, 126, 70–81.
Yoon, C.; Yim, S.; Yoo, S.; Jung, C.; Yeon, H.; Jang, Y. V-DCRNN: Virtual Network-Based Diffusion Convolutional Recurrent Neural Network for Estimating Unobserved Traffic Data. IEEE Trans. Intell. Transp. Syst. 2025, 26, 10336–10352.
Li, Y.; Shao, Z.; Xu, Y.; Qiu, Q.; Cao, Z.; Wang, F. Dynamic Frequency Domain Graph Convolutional Network for Traffic Forecasting. In Proceedings of the ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 5245–5249.
Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
Du, W.; Côté, D.; Liu, Y. SAITS: Self-attention-based imputation for time series. Expert Syst. Appl. 2023, 219, 119619.
Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926.
Yi, K.; Zhang, Q.; Fan, W.; He, H.; Hu, L.; Wang, P.; An, N.; Cao, L.; Niu, Z. FourierGNN: Rethinking multivariate time series forecasting from a pure graph perspective. Adv. Neural Inf. Process. Syst. 2023, 36, 69638–69660.
Chen, Y.; Wang, X.; Xu, G. Gatgpt: A pre-trained large language model with graph attention network for spatiotemporal imputation. arXiv 2023, arXiv:2311.14332.
Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 753–763.
Tang, X.; Yao, H.; Sun, Y.; Aggarwal, C.; Mitra, P.; Wang, S. Joint Modeling of Local and Global Temporal Dynamics for Multivariate Time Series Forecasting with Missing Values. Proc. AAAI Conf. Artif. Intell. 2020, 34, 5956–5963.
Xu, Y.; Bazarjani, A.; Chi, H.-G.; Choi, C.; Fu, Y. Uncovering the missing pattern: Unified framework towards trajectory imputation and prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 9632–9643.
Wang, A.; Ye, Y.; Song, X.; Zhang, S.; Yu, J.J.Q. Traffic Prediction With Missing Data: A Multi-Task Learning Approach. IEEE Trans. Intell. Transp. Syst. 2023, 24, 4189–4202.
Chen, X.; Li, X.; Liu, B.; Li, Z. Biased temporal convolution graph network for time series forecasting with missing values. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2023.
Yu, C.; Wang, F.; Shao, Z.; Qian, T.; Zhang, Z.; Wei, W.; An, Z.; Wang, Q.; Xu, Y. GinAR+: A Robust End-to-End Framework for Multivariate Time Series Forecasting With Missing Values. IEEE Trans. Knowl. Data Eng. 2025, 37, 4635–4648.
Shao, Z.; Wang, F.; Xu, Y.; Wei, W.; Yu, C.; Zhang, Z.; Yao, D.; Sun, T.; Jin, G.; Cao, X.; et al. Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis. IEEE Trans. Knowl. Data Eng. 2025, 37, 291–305.
Yu, C.; Wang, F.; Shao, Z.; Qian, T.; Zhang, Z.; Wei, W.; Xu, Y. GinAR: An End-To-End Multivariate Time Series Forecasting Model Suitable for Variable Missing. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 3989–4000.
Mei, H.; Li, J.; Liang, Z.; Zheng, G.; Shi, B.; Wei, H. Uncertainty-aware Traffic Prediction under Missing Data. In Proceedings of the 2023 IEEE International Conference on Data Mining (ICDM), Shanghai, China, 1–4 December 2023; pp. 1223–1228.
Cheng, F.; Liu, H. Multi-step electric vehicles charging loads forecasting: An autoformer variant with feature extraction, frequency enhancement, and error correction blocks. Appl. Energy 2024, 376, 124308.
Cheng, F.; Liu, H.; Lv, X. Lithium-ion batteries remaining useful life prediction via Fourier-mixed window attention enhanced Informer with decomposition and adaptive error correction strategy. Adv. Eng. Inform. 2025, 65, 103292.
Cheng, F.; Liu, H.; Lv, X. MetaGNSDformer: Meta-learning enhanced gated non-stationary informer with frequency-aware attention for point-interval remaining useful life prediction of lithium-ion batteries. Adv. Eng. Inform. 2026, 69, 103798.
Dritsas, E.; Trigka, M. Applying Machine Learning on Big Data With Apache Spark. IEEE Access 2025, 13, 53377–53393.

Figure 1. Overall framework of TFGCRN. The inputs to the model are incomplete traffic time series, and the outputs of the model are the complete future values.

Figure 2. The results of the ablation experiments. w/o FG represents the removal of the frequency graph. w/o SAG represents the removal of the spatial attention graph. w/o STE represents the removal of spatial–temporal embedding.

Figure 3. The results of the hyperparameter analysis on the PEMS08 dataset.

Table 1. Basic statistics of the four datasets.

Datasets	METR-LA	PEMS-BAY	PEMS04	PEMS08
Variates	207	325	307	170
Timesteps	34,272	52,116	16,992	17,833
Granularity	5 min	5 min	5 min	5 min
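To make the scale of Table 1 concrete, the sample counts can be converted into the time span each dataset covers. The short sketch below only multiplies the listed timesteps by the 5 min granularity; the function and variable names are illustrative and do not come from the paper.

```python
# Sample counts copied from Table 1; every dataset is sampled every 5 minutes.
TIMESTEPS = {
    "METR-LA": 34_272,
    "PEMS-BAY": 52_116,
    "PEMS04": 16_992,
    "PEMS08": 17_833,
}
MINUTES_PER_STEP = 5
MINUTES_PER_DAY = 24 * 60


def coverage_days(timesteps: int, minutes_per_step: int = MINUTES_PER_STEP) -> float:
    """Return the time span covered by `timesteps` consecutive samples, in days."""
    return timesteps * minutes_per_step / MINUTES_PER_DAY


# Hypothetical helper output: dataset name -> covered days.
coverage = {name: coverage_days(steps) for name, steps in TIMESTEPS.items()}

for name, days in coverage.items():
    print(f"{name}: {days:.1f} days")
```

The conversion assumes the samples are contiguous, which the table itself does not state.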

Table 2. The main results of TFGCRN and all baselines on traffic speed datasets.

Datasets	Models	MAE (missing rate 25%)	MAPE, % (missing rate 25%)	MAE (missing rate 50%)	MAPE, % (missing rate 50%)	MAE (missing rate 75%)	MAPE, % (missing rate 75%)
METR-LA	PatchTST + SAITS	3.83	10.95	3.94	11.23	4.08	11.94
	DCRNN + GPT4TS	3.68	10.24	3.81	10.84	3.96	11.25
	FourierGNN + GATGPT	3.70	10.31	3.84	10.91	3.97	11.44
	MTGNN + GRIN	3.63	10.15	3.75	10.79	3.91	11.24
	LGnet	3.76	10.78	3.88	11.15	4.01	11.71
	GC-VRNN	3.62	9.97	3.71	10.48	3.85	11.14
	RIHGCN	3.57	9.94	3.66	10.41	3.77	10.87
	GSTAE	3.55	9.92	3.67	10.39	3.79	10.92
	BiTGraph	3.54	9.88	3.62	10.36	3.71	10.73
	GinAR+	3.51	9.83	3.59	10.23	3.67	10.58
	TFGCRN	3.44	9.73	3.56	9.92	3.65	10.51
PEMS-BAY	PatchTST + SAITS	2.34	5.45	2.49	5.81	2.81	7.06
	DCRNN + GPT4TS	2.23	5.24	2.37	5.61	2.59	6.28
	FourierGNN + GATGPT	2.27	5.28	2.40	5.68	2.64	6.35
	MTGNN + GRIN	2.21	5.23	2.35	5.63	2.58	6.25
	LGnet	2.31	5.39	2.45	5.71	2.71	6.44
	GC-VRNN	2.17	5.17	2.31	5.52	2.55	6.11
	RIHGCN	2.07	4.92	2.35	5.67	2.62	6.32
	GSTAE	2.12	5.08	2.34	5.61	2.59	6.27
	BiTGraph	2.08	4.96	2.30	5.43	2.51	6.07
	GinAR+	2.05	4.85	2.25	5.37	2.49	5.94
	TFGCRN	2.01	4.82	2.19	5.28	2.36	5.88
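The MAE and MAPE columns follow the usual definitions of mean absolute error and mean absolute percentage error. The following is a minimal sketch of how such metrics are commonly computed when some ground-truth readings are unavailable. The masking convention (skipping targets that are `None`, and additionally skipping zero targets for MAPE) and the function names are assumptions for illustration, not the evaluation code of the paper.

```python
from typing import Optional, Sequence


def masked_mae(pred: Sequence[float], true: Sequence[Optional[float]]) -> float:
    """Mean absolute error over positions whose target is observed."""
    pairs = [(p, t) for p, t in zip(pred, true) if t is not None]
    if not pairs:
        raise ValueError("no observed targets to evaluate")
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


def masked_mape(pred: Sequence[float], true: Sequence[Optional[float]]) -> float:
    """Mean absolute percentage error (in %) over observed, non-zero targets."""
    # Assumption: zero targets are skipped to avoid division by zero.
    pairs = [(p, t) for p, t in zip(pred, true) if t is not None and t != 0]
    if not pairs:
        raise ValueError("no observed non-zero targets to evaluate")
    return 100.0 * sum(abs(p - t) / abs(t) for p, t in pairs) / len(pairs)


# Toy speed readings; None marks a target that the sensor failed to record.
predictions = [60.0, 50.0, 30.0, 45.0]
targets = [62.0, None, 25.0, 50.0]

mae = masked_mae(predictions, targets)
mape = masked_mape(predictions, targets)
print(f"MAE = {mae:.2f}, MAPE = {mape:.2f}%")
```

Reporting MAPE in percent matches the magnitude of the values in the table.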

Table 3. The main results of TFGCRN and all baselines on traffic flow datasets.

Datasets	Models	MAE (missing rate 25%)	MAPE, % (missing rate 25%)	MAE (missing rate 50%)	MAPE, % (missing rate 50%)	MAE (missing rate 75%)	MAPE, % (missing rate 75%)
PEMS04	PatchTST + SAITS	24.37	17.45	26.04	18.17	27.23	18.67
	DCRNN + GPT4TS	23.17	17.04	25.38	17.41	26.84	18.45
	FourierGNN + GATGPT	22.98	16.61	25.31	17.55	26.67	18.30
	MTGNN + GRIN	23.06	16.64	25.32	17.38	26.73	18.22
	LGnet	23.73	17.22	25.59	17.64	27.04	18.53
	GC-VRNN	22.81	16.52	24.33	17.13	26.43	17.59
	RIHGCN	23.45	17.06	25.17	17.38	26.75	18.02
	GSTAE	22.67	16.48	24.27	17.04	26.07	17.51
	BiTGraph	22.49	16.31	23.73	16.92	25.98	17.45
	GinAR+	22.32	16.05	23.41	16.86	25.72	17.26
	TFGCRN	21.58	15.47	22.76	16.54	24.38	17.15
PEMS08	PatchTST + SAITS	21.43	13.79	22.18	14.25	25.06	16.42
	DCRNN + GPT4TS	20.64	13.56	21.96	14.03	24.58	15.79
	FourierGNN + GATGPT	20.77	13.45	21.91	14.08	24.89	15.85
	MTGNN + GRIN	20.59	13.34	21.78	13.98	24.51	15.68
	LGnet	21.51	13.85	22.14	14.31	24.95	16.26
	GC-VRNN	20.42	13.21	21.67	13.91	23.45	15.06
	RIHGCN	20.53	13.31	21.71	14.08	23.51	15.34
	GSTAE	20.62	13.39	21.83	14.12	23.79	15.63
	BiTGraph	20.21	13.04	21.59	13.82	23.06	14.79
	GinAR+	20.03	12.97	21.55	13.75	22.93	14.52
	TFGCRN	19.79	12.75	21.04	13.52	22.33	14.38
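One way to read Tables 2 and 3 is through the relative MAE reduction of TFGCRN over GinAR+, which has the lowest MAE among the baselines in every column. The sketch below copies the MAE values of these two rows from the tables and applies the standard formula (baseline − proposed) / baseline; the helper name and data layout are illustrative assumptions.

```python
# MAE values copied from Tables 2 and 3, ordered by missing rate (25%, 50%, 75%).
MISSING_RATES = (25, 50, 75)
MAE = {
    "METR-LA": {"GinAR+": (3.51, 3.59, 3.67), "TFGCRN": (3.44, 3.56, 3.65)},
    "PEMS-BAY": {"GinAR+": (2.05, 2.25, 2.49), "TFGCRN": (2.01, 2.19, 2.36)},
    "PEMS04": {"GinAR+": (22.32, 23.41, 25.72), "TFGCRN": (21.58, 22.76, 24.38)},
    "PEMS08": {"GinAR+": (20.03, 21.55, 22.93), "TFGCRN": (19.79, 21.04, 22.33)},
}


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` lowers the error of `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Dataset name -> reduction (%) at each missing rate.
reductions = {
    dataset: tuple(
        relative_reduction(b, p)
        for b, p in zip(scores["GinAR+"], scores["TFGCRN"])
    )
    for dataset, scores in MAE.items()
}

for dataset, values in reductions.items():
    row = ", ".join(f"{rate}%: {v:.2f}%" for rate, v in zip(MISSING_RATES, values))
    print(f"{dataset} -> {row}")
```

The same helper can be applied to the MAPE columns by swapping in those values.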

Table 4. Ablation results (MAE) for the spatial–temporal embedding (STE) and the node embedding (E_N).

Datasets	Models	Missing Rate 25%	Missing Rate 50%	Missing Rate 75%
PEMS-BAY	w/o STE	2.23	2.46	2.62
	w/o E_N	2.07	2.24	2.43
	w/o temporal embedding	2.09	2.31	2.54
	w/o spatial embedding	2.12	2.28	2.47
	TFGCRN	2.01	2.19	2.36
PEMS08	w/o STE	22.17	23.42	25.08
	w/o E_N	20.16	21.77	23.35
	w/o temporal embedding	21.36	22.65	24.02
	w/o spatial embedding	21.54	22.52	23.47
	TFGCRN	19.79	21.04	22.33

Table 5. The values of the main hyperparameters of TFGCRN.

Config	Values
optimizer	Adam [82]
learning rate	0.002
weight decay	0.0001
embedding size	64
graph embedding size	64
number of layers	3
TopK	24
dropout	0.15
learning rate schedule	MultiStepLR
gradient clipping norm	5
milestones	[1, 5, 25, 50, 75, 100]
discount coefficient	0.5
batch size	32
epochs	101
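The learning rate settings in Table 5 imply a step-wise decay. As a hedged illustration, the sketch below reproduces the closed-form behaviour of a multi-step schedule in plain Python, assuming that "MultiStepLR" refers to PyTorch's `torch.optim.lr_scheduler.MultiStepLR`, that the discount coefficient plays the role of its `gamma` argument, and that the scheduler is stepped once per epoch with epochs counted from 0. None of these assumptions is confirmed by the table itself.

```python
from bisect import bisect_right

# Values copied from Table 5.
BASE_LR = 0.002
MILESTONES = [1, 5, 25, 50, 75, 100]
GAMMA = 0.5        # "discount coefficient" (assumed to be the decay factor)
NUM_EPOCHS = 101


def lr_at_epoch(epoch: int) -> float:
    """Learning rate in effect during `epoch` (0-indexed) under multi-step decay."""
    # Number of milestones that have been reached by this epoch.
    reached = bisect_right(MILESTONES, epoch)
    return BASE_LR * GAMMA ** reached


# Learning rate for every training epoch under the stated assumptions.
schedule = [lr_at_epoch(epoch) for epoch in range(NUM_EPOCHS)]

for epoch in (0, 1, 5, 25, 50, 75, 100):
    print(f"epoch {epoch:3d}: lr = {schedule[epoch]:.3e}")
```

Under these assumptions the rate is halved six times over training, so the final epoch runs at 1/64 of the initial learning rate.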

Table 6. Results (MAE) of the component replacement experiment.

Datasets	Models	Missing Rate 25%	Missing Rate 50%	Missing Rate 75%
PEMS-BAY	Multi-head attention	2.15	2.34	2.54
	Mask attention	2.19	2.41	2.63
	Graph attention	2.13	2.33	2.52
	DTW	2.04	2.22	2.41
	Two F_LN	2.07	2.25	2.45
	Single TFGCN	2.08	2.27	2.46
	LSTM	2.05	2.24	2.42
	TFGCRN	2.01	2.19	2.36
PEMS08	Multi-head attention	21.21	22.65	23.92
	Mask attention	21.57	23.06	24.31
	Graph attention	21.03	22.34	23.66
	DTW	20.17	21.56	22.85
	Two F_LN	20.11	21.62	22.90
	Single TFGCN	20.58	22.01	23.37
	LSTM	20.04	21.43	22.79
	TFGCRN	19.79	21.04	22.33

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
