Article

Dynamic Graph Construction and Continuous Spatiotemporal Evolution for Traffic Forecasting

1
School of Electronic Information Engineering, Changchun University of Science and Technology, Satellite Street, Changchun 130022, China
2
Changchun Institute of Technology, 395 Kuanping Road, Changchun 130012, China
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(11), 2369; https://doi.org/10.3390/electronics15112369
Submission received: 10 April 2026 / Revised: 15 May 2026 / Accepted: 18 May 2026 / Published: 31 May 2026
(This article belongs to the Section Artificial Intelligence)

Abstract

Traffic prediction is a fundamental task in intelligent transportation systems, yet developing accurate prediction models remains challenging because of the complex spatial and temporal dependencies in real road networks. Existing methods commonly rely on discrete modeling paradigms to characterize spatiotemporal features. However, these approaches often fail to adequately capture the intrinsic spatiotemporal coupling among nodes and mainly depend on static adjacency matrices constructed from prior knowledge, which limits their ability to represent dynamic spatiotemporal correlations in real traffic scenarios. To address these limitations, this paper proposes a dynamic prediction model using continuous ordinary differential equations termed DPMCODE. The proposed method enables collaborative aggregation of global and local information through continuous neural ordinary differential equations and dynamically learns spatiotemporal dependencies via graph ODE networks for traffic prediction. Specifically, a continuous ordinary differential equation modeling strategy is introduced to alleviate the over-smoothing problem in discrete networks. Meanwhile, an adaptive dynamic graph structure is designed to reduce the reliance on prior knowledge graphs and capture richer latent spatiotemporal correlations. In addition, a local correlation-aware ODE module is developed to model potential dependencies between non-adjacent nodes, while a spatiotemporal fusion prediction module is further designed to promote effective collaboration between global and local information. Compared with conventional discrete network models, the proposed model generates more realistic and accurate predictions. Extensive experiments and theoretical analysis on five benchmark traffic prediction datasets demonstrate the superiority and state-of-the-art performance of DPMCODE.

1. Introduction

Spatiotemporal prediction is a fundamental task in intelligent transportation systems and plays an important role in traffic control, route planning, and urban operation scheduling [1]. Among these tasks, traffic forecasting is particularly important [2]. However, it remains challenging to build an accurate model because traffic states evolve over both time and space in real road networks and are influenced by topology [3], dynamic propagation [4], and regional interactions [5]. As shown in Figure 1, a real road network can be naturally represented as a spatiotemporal graph, where sensor nodes are connected by physical topology and also form continuously evolving spatiotemporal correlations over time. Therefore, traffic forecasting is not simply a univariate time-series extrapolation task but a problem of modeling spatiotemporal dependencies in complex road networks.
The core challenge of traffic forecasting lies in how to accurately characterize real spatiotemporal dependencies over road networks [6]. On the one hand, the relationships among traffic nodes are not fixed. Many existing methods construct adjacency matrices based on geographical distance [7], road connectivity [8], or semantic similarity [9], and use them as prior structures for graph propagation [10]. However, such predefined graph structures often fail to reflect latent and time-varying correlations among nodes in real traffic scenarios [11]. On the other hand, the propagation of traffic information is not determined only by global diffusion [12]. As shown in Figure 2, when an emergency or abnormal disturbance occurs in a local region, its influence may go beyond directly adjacent roads and further affect distant but strongly correlated areas. This indicates that an effective traffic forecasting model should capture long-range global propagation and also model local latent dependencies between non-adjacent nodes. In addition, most existing graph learning methods rely on discrete message passing. As the network goes deeper, node representations tend to degrade or become over-smoothed, which weakens the expressive ability of the model in complex traffic environments.
To address these issues, numerous studies have focused on spatiotemporal traffic forecasting in recent years [13]. Early statistical learning methods [14] and traditional machine learning approaches [15] laid the foundation for traffic modeling, but their ability to characterize complex nonlinear spatiotemporal dependencies remains limited. In recent years, deep learning-based traffic forecasting methods, especially those integrated with graph neural networks, have made substantial progress in modeling road topology and temporal dynamics [16]. Existing studies have substantially improved traffic forecasting performance by incorporating graph convolution [17], recurrent networks [18], temporal convolution [19], and attention mechanisms [20]. However, most existing methods still rely on predefined static graph structures and discrete propagation mechanisms, which limits their ability to characterize dynamic node relationships and continuous spatiotemporal evolution [21]. Meanwhile, recent continuous-time graph learning models have partly alleviated the representation degradation caused by deep discrete propagation [22]. However, most of these models are still built on fixed graph structures and remain insufficient for jointly modeling global diffusion and local non-adjacent interactions.
Based on the above observations, this paper proposes a dynamic prediction model using continuous ordinary differential equations, termed DPMCODE, for spatiotemporal forecasting in complex traffic scenarios. Unlike conventional methods that rely on static prior graphs, DPMCODE constructs an adaptive dynamic graph structure to capture latent time-varying correlations among nodes more flexibly. Furthermore, a continuous ODE propagation strategy is introduced to improve the continuity of spatiotemporal representation learning and alleviate the over-smoothing problem caused by stacked discrete propagation. On this basis, a local correlation-aware module is designed to model latent dependencies between non-adjacent nodes. In addition, a spatiotemporal fusion prediction module is developed to facilitate effective coordination between global and local spatiotemporal information. Through these designs, DPMCODE provides a more realistic and effective solution for modeling dynamic traffic evolution in complex road networks.
The main contributions of this paper are summarized as follows:
1.
A dynamic prediction model based on continuous ordinary differential equations is proposed to effectively alleviate the over-smoothing problem in discrete graph propagation and improve the continuity and expressive ability of traffic dependency modeling.
2.
An adaptive dynamic graph construction strategy is designed to replace traditional predefined static adjacency structures, so as to more effectively capture latent and time-varying correlations among road nodes.
3.
A local correlation-aware module and a spatiotemporal fusion prediction module are developed to jointly model local dependencies between non-adjacent nodes and global spatiotemporal interactions, thereby improving the overall prediction accuracy of the model.
4.
Extensive experiments on five real-world traffic benchmark datasets show that the proposed DPMCODE outperforms existing comparison methods in overall performance and demonstrates strong effectiveness and competitiveness.

2. Related Work

2.1. Static Graph-Based Traffic Forecasting Methods

Early traffic forecasting studies were mainly dominated by traditional statistical models and sequential deep learning architectures [23]. For example, ARIMA [24] captures temporal patterns in traffic series through autoregressive and differencing operations, while FC-LSTM [25] improves the modeling of nonlinear sequential dependencies through recurrent memory mechanisms. Although these methods have achieved some success in temporal forecasting, they usually treat traffic data as regular sequences and have difficulty explicitly characterizing the non-Euclidean spatial dependencies inherent in road networks. For this reason, traffic forecasting methods based on graphs have gradually emerged, among which models based on static graphs have become an important research branch. For example, STGCN [26] integrates temporal convolution and graph convolution into a unified framework and shows that graph operations can effectively capture spatial interactions in traffic networks. STSGCN [27] further constructs local spatiotemporal subgraphs to jointly model spatial and temporal dependencies, thereby improving the representation of short-term traffic evolution. GraphWaveNet [28] introduces spatial and temporal attention mechanisms to adaptively model correlations among different sensors and achieves favorable forecasting performance with an encoder–decoder architecture. In addition, STFGNN [29] uses dynamic time warping to construct a semantic adjacency matrix, which compensates for the incompleteness of physical road connectivity and enhances the learning of hidden spatiotemporal relationships. These studies clearly show that beyond temporal dependency learning, spatial modeling with graph structures plays an important role in traffic forecasting.
However, despite their promising performance, most of these methods are essentially built upon predefined static adjacency matrices. Such graph structures usually remain unchanged during both training and inference, making it difficult to faithfully reflect latent and time-varying interactions among nodes in complex traffic scenarios. In addition, although some methods enhance graph structures through semantic similarity or attention mechanisms, the learned spatial dependencies are still largely constrained by prior graph construction strategies. Therefore, traffic forecasting methods based on static graphs still have clear limitations in characterizing dynamic propagation patterns and hidden correlations in complex traffic systems.

2.2. Dynamic Graph-Based Traffic Forecasting Methods

To overcome the limitations of static graph construction, more and more studies in recent years have turned to dynamic graph-based traffic forecasting methods, where node relationships are allowed to change with traffic states [30], learned representations [31], or adaptive graph generation mechanisms [32]. This research direction is important because interactions among nodes in real traffic systems are inherently dynamic, and the dependency between two roads can change with time and scenarios. Among these studies, DCRNN [33] is one of the representative methods. It models traffic flow as a diffusion process on a directed graph and captures directional propagation patterns through forward and backward random walks, demonstrating the importance of graph diffusion mechanisms in spatiotemporal forecasting. DCCRN [34] further learns graph structures adaptively from data and provides a more flexible graph learning framework for multivariate time-series forecasting. DSTCGCN [35] constructs a cross-graph structure with dynamic spatiotemporal characteristics, which improves dependency modeling while reducing computational complexity. ST-DGDE [36] captures the temporal evolution of spatial nodes by integrating a dynamic graph learning network with differential equations. These studies indicate that introducing adaptive or dynamically evolving graph structures can substantially enhance the ability of traffic forecasting models to characterize real node interactions. However, most of these methods are still built upon discrete graph neural networks or recurrent neural network frameworks, where node relationships are usually updated through discrete time steps or hierarchical layer stacking. Although this strategy improves dynamic graph modeling, it may still suffer from representation homogenization and over-smoothing during deep propagation. 
Moreover, their dynamic graphs mainly serve global spatial dependency learning, while the explicit modeling of latent local relationships among non-adjacent nodes remains insufficient.
Meanwhile, another important research direction is to introduce continuous-time dynamics into the graph learning process. Neural ordinary differential equations provide a continuous modeling formulation for hidden state evolution, thereby avoiding the explicit stacking of discrete network layers. Building on this foundation, STGODE [37] introduces continuous graph propagation into traffic forecasting and improves the extraction of long-range spatiotemporal correlations. MTGODE [38] further develops a continuous-time aggregation mechanism to learn more expressive spatial and temporal dynamics. These studies suggest that continuous representation learning provides an effective way to overcome the limitations of deep discrete graph propagation and opens up a new direction for spatiotemporal traffic forecasting. However, these ODE-based methods mainly focus on continuous graph propagation itself, while their graph structures still largely rely on predefined or relatively fixed spatial relationships. As a result, they provide limited characterization of traffic state-driven dynamic graph generation. In addition, these methods place greater emphasis on global graph propagation while lacking the explicit modeling of latent interactions among local non-adjacent yet highly correlated nodes [39].
In summary, although existing studies have made significant progress in graph structure modeling, continuous-time representation learning, and spatiotemporal dependency characterization, they still struggle to solve several key problems within a unified framework. First, discrete graph neural networks tend to suffer from over-smoothing during stacked propagation, which limits their ability to represent complex spatiotemporal heterogeneity. Second, existing methods remain insufficient in modeling continuous spatiotemporal evolution in traffic systems and thus cannot faithfully reflect the dynamic changes of traffic states. Third, most studies have difficulty jointly characterizing the combined effect of global propagation mechanisms and local non-adjacent interactions and thus remain inadequate in mining latent correlations in complex traffic scenarios. These limitations motivate the present study, which proposes a dynamic prediction model based on continuous ordinary differential equations to model dynamic spatiotemporal dependencies in complex traffic systems more effectively. The proposed DPMCODE does not simply stack dynamic graph learning with an ODE module. Instead, it provides a unified design from three perspectives: dynamic relation construction, continuous state evolution, and global-local collaborative forecasting. First, DPMCODE extracts latent relational features from current traffic states through a local correlation-aware dynamic relation module and generates probabilistic dynamic graph structures using the Gumbel-softmax reparameterization mechanism, enabling node relationships to be adaptively updated with changing traffic states. 
Second, DPMCODE embeds dynamic graph propagation into a continuous ODE evolution framework to characterize the dynamic changes of traffic states in a continuous hidden space, thereby mitigating the representation over-smoothing caused by discrete hierarchical stacking. Finally, DPMCODE introduces a spatiotemporal fusion prediction module to jointly model global propagation information and local correlations among non-adjacent nodes, thereby enhancing the model capacity to represent implicit dependencies in complex traffic scenarios.
To more clearly highlight the differences between DPMCODE and related methods, Table 1 compares them in terms of graph structure type, dynamic graph updating strategy, continuous modeling mechanism, and local non-adjacent relation modeling.

3. Problem Formulation

A real road network can be represented as a graph structure $G = (V, E, A)$, where $V$ denotes the set of traffic sensor nodes, $E$ denotes the set of edges between nodes, and $A \in \mathbb{R}^{N \times N}$ denotes the corresponding adjacency matrix, with $|V| = N$. In traffic forecasting tasks, the graph structure is used to characterize the spatial topological relationships among road nodes and provides a basic representation for subsequent spatiotemporal dependency modeling. Based on the above graph representation, the traffic forecasting task can be further formulated as a signal modeling problem on a spatiotemporal graph, where each node is associated with a traffic state sequence that changes over time.
Problem 1: The over-smoothing problem
To further explain how the continuous ODE propagation mechanism alleviates the over-smoothing problem, we analyze the difference between discrete graph propagation and continuous state evolution. For conventional discrete GCNs, multilayer graph propagation can be formulated as
$$H^{(K)} = \hat{A}^{K} X W$$
where $\hat{A}$ denotes the normalized adjacency matrix, $K$ is the number of graph propagation layers, and $X$ and $W$ denote the input features and learnable parameters, respectively. As shown in this formulation, when $K$ increases, node representations are repeatedly smoothed by the same adjacency matrix. Repeated multiplication by $\hat{A}$ gradually drives node representations toward neighborhood-averaged states, reducing the representation discrepancy among nodes. This phenomenon corresponds to the over-smoothing problem in graph neural networks, which is mainly caused by the repeated stacking of neighborhood aggregation operations in discrete graph propagation.
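The smoothing effect described above can be reproduced numerically. The following is a minimal sketch, not the authors' code: it assumes a row-normalized adjacency with self-loops, fixes $W$ to the identity, and uses a 5-node path graph and a hypothetical `node_spread` measure chosen only for illustration.

```python
import numpy as np


def path_graph(n: int) -> np.ndarray:
    """Adjacency matrix of an undirected path 0 - 1 - ... - (n-1)."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj


def row_normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1}(A + I): self-loops added, every row sums to 1 (assumed normalization)."""
    a_tilde = adj + np.eye(adj.shape[0])
    return a_tilde / a_tilde.sum(axis=1, keepdims=True)


def propagate(a_hat: np.ndarray, x: np.ndarray, k: int) -> np.ndarray:
    """Compute A_hat^K X, i.e. K propagation steps with W fixed to the identity."""
    return np.linalg.matrix_power(a_hat, k) @ x


def node_spread(h: np.ndarray) -> float:
    """Largest per-feature gap between any two nodes (0 means all nodes are identical)."""
    return float((h.max(axis=0) - h.min(axis=0)).max())


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))  # 5 nodes, 3 features
a_hat = row_normalized_adjacency(path_graph(5))
for k in (0, 1, 4, 16, 64):
    # The spread shrinks toward zero as K grows: node features become indistinguishable.
    print(k, node_spread(propagate(a_hat, x, k)))
```

Because each row of the assumed $\hat{A}$ is a convex combination, the per-feature range across nodes can never grow from one step to the next, which is the over-smoothing behavior in its simplest form.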
Different from discrete layer-wise propagation, DPMCODE models graph propagation as a continuous hidden state evolution process. The continuous ODE propagation can be formulated as
$$\frac{dH(t)}{dt} = f_\theta\left(H(t), A^{P_m}\right)$$
where $H(t)$ denotes the node hidden state at continuous time $t$, $A^{P_m}$ is the probabilistic dynamic graph structure generated by the local correlation-aware module, and $f_\theta(\cdot)$ denotes the continuous state derivative parameterized by the graph propagation function. The corresponding integral solution is given by
$$H(T) = H(0) + \int_{0}^{T} f_\theta\left(H(t), A^{P_m}\right) dt$$
This formulation shows that ODE propagation does not update node representations by explicitly stacking multiple discrete graph convolutional layers. Instead, it updates node states through continuous integration in the hidden space. Therefore, the evolution of node representations is governed by a continuous dynamical system rather than repeated smoothing caused by high-order powers of a fixed adjacency matrix. Compared with the discrete propagation form $\hat{A}^{K}$ in GCNs, ODE propagation performs state updates over a finite continuous-time interval, which weakens the repeated neighborhood averaging effect introduced by deep discrete stacking.
Moreover, in DPMCODE, the propagation matrix $A^{P_m}$ is adaptively generated from current traffic states rather than being fully determined by static road topology. Thus, node states can be updated according to dynamic traffic relations during continuous evolution, enabling the model to capture global propagation trends while preserving local differences. In this sense, the continuous ODE propagation mechanism alleviates the risk of representation homogenization by avoiding the mechanical stacking of fixed graph convolutional layers and incorporating state-driven dynamic graph updates. This provides a theoretical explanation for its ability to mitigate the over-smoothing problem.
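The link between the discrete and continuous forms can be made explicit with a short worked step. This is an illustrative derivation under an assumed linear special case of the derivative, $f_\theta(H, A) = (A - I)H$, which the text above does not prescribe. A forward Euler step of size $\Delta t$ then reads:

```latex
\begin{aligned}
H(t + \Delta t) &= H(t) + \Delta t \,\bigl(A^{P_m} - I\bigr) H(t) \\
                &= \bigl[(1 - \Delta t)\, I + \Delta t \, A^{P_m}\bigr] H(t).
\end{aligned}
```

Setting $\Delta t = 1$ recovers the discrete update $H \leftarrow A^{P_m} H$ (one propagation step without weights or nonlinearity), whereas for $0 < \Delta t < 1$ each step retains a fraction $1 - \Delta t$ of every node's own state. Under this assumption, discrete stacking corresponds to the coarsest discretization of the same dynamics.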
Problem 2: Spatiotemporal forecasting
Let $X \in \mathbb{R}^{N \times C \times T}$ denote the spatiotemporal graph signal tensor in the traffic system, where $N$ denotes the number of nodes, $C$ denotes the feature dimension of each node, and $T$ denotes the length of historical time steps. For a given time step $t$, the historical traffic observation sequence can be written as
$$X_H = \left\{ X_{t-S+1}, X_{t-S+2}, \ldots, X_t \right\}$$
where $S$ denotes the length of the historical observation window and $X_H$ contains the historical spatiotemporal information used to predict future traffic states.
The goal of traffic forecasting is to learn a mapping function $F(\cdot)$, which uses the historical traffic observation sequence $X_H$ to predict the traffic state sequence in future time periods:
$$Y = \left\{ X_{t+1}, X_{t+2}, \ldots, X_{t+S} \right\}$$
where $S$ denotes the length of the prediction horizon.
Accordingly, the traffic forecasting task can be formulated as
$$\hat{Y} = F_\theta(X_H)$$
where $F_\theta(\cdot)$ denotes the learnable forecasting model parameterized by $\theta$ and $\hat{Y}$ denotes the predicted future traffic state sequence.
The core of this task is to jointly model the spatial dependencies in road networks and the dynamic patterns of traffic states over time, so as to achieve accurate prediction of future traffic states.
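The formulation above amounts to slicing the signal tensor into paired history and target windows. The sketch below shows one way to build such pairs, assuming the history window and the prediction horizon share the same length S as in the notation above; the function name `make_windows` and the output layout are illustrative choices, not part of the paper.

```python
from typing import Tuple

import numpy as np


def make_windows(x: np.ndarray, s: int) -> Tuple[np.ndarray, np.ndarray]:
    """Split a signal tensor of shape (N, C, T) into history/target pairs.

    Returns two arrays of shape (num_samples, N, C, S): sample i pairs the
    history x[..., i:i+S] with the target x[..., i+S:i+2S].
    """
    total_steps = x.shape[2]
    num_samples = total_steps - 2 * s + 1
    if num_samples <= 0:
        raise ValueError("need at least 2 * S time steps")
    history = np.stack([x[:, :, i:i + s] for i in range(num_samples)])
    target = np.stack([x[:, :, i + s:i + 2 * s] for i in range(num_samples)])
    return history, target


# Toy signal: 2 nodes, 1 feature, 10 time steps.
signal = np.arange(20, dtype=float).reshape(2, 1, 10)
x_hist, y_future = make_windows(signal, s=3)
print(x_hist.shape, y_future.shape)
```

A forecasting model $F_\theta$ would then be trained to map each history window to its paired target window.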

4. Model

As shown in Figure 3, the proposed model mainly consists of the local correlation-aware dynamic relation module, the coupled continuous spatiotemporal evolution block, and the spatiotemporal fusion prediction module. These three components jointly accomplish dynamic relation modeling, spatiotemporal state evolution, and final prediction output.

4.1. Local Correlation-Aware Dynamic Relation Module

Most existing graph-based traffic forecasting methods characterize spatial dependencies among nodes mainly based on predefined road topology or globally shared propagation patterns. Although these methods can describe coarse-grained interactions in road networks to some extent, they remain inadequate in modeling fine-grained, time-varying, and latent associations among non-adjacent nodes in real traffic systems. In complex traffic scenarios, the influence of local disturbances is often not limited to directly connected roads but may further propagate to regions that are spatially non-adjacent yet functionally highly correlated. This indicates that correlations among nodes cannot be fully determined by static adjacency relationships but should be adaptively characterized according to the dynamic changes of traffic states.
Based on the above considerations, this study designs a local correlation-aware dynamic relation module to explicitly extract latent local relational features from traffic states and further transform them into dynamic graph structure signals. Unlike conventional attention mechanisms that are mainly used for feature weighting and information aggregation, the core function of this module is to identify latent node interaction patterns under current traffic states, especially to discover implicit dependencies among non-adjacent but functionally related nodes. Specifically, given an input traffic sequence, the model first obtains node hidden representations through a temporal embedding layer. Let $X_t \in \mathbb{R}^{N \times C}$ denote the traffic observation state at time step $t$, where $N$ is the number of traffic nodes and $C$ is the input feature dimension. The node hidden representation can be formulated as
$$H_t = \phi(X_t)$$
where $\phi(\cdot)$ denotes the TCN-based temporal embedding mapping function. $H_t \in \mathbb{R}^{N \times d}$ represents the node state representation at time step $t$, and $d$ is the hidden feature dimension. This process maps raw traffic observations into a unified hidden feature space, providing the basic representations for subsequent local relation extraction.
Subsequently, to extract local relational features, the node hidden representation is further projected into a relation-aware feature space $Q_t$:
$$Q_t = f_q(H_t) = \sigma\left(H_t W_q + b_q\right)$$
where $W_q$ and $b_q$ are learnable parameters, $\sigma(\cdot)$ denotes the sigmoid nonlinear activation function, and $Q_t \in \mathbb{R}^{N \times d}$ represents the local relational features at time step $t$. Different from the original traffic state features, $Q_t$ focuses more on describing the relational tendencies of nodes participating in local interactions under the current traffic state, and thus serves as the basis for subsequently computing latent correlations among nodes.
To further incorporate temporal context information, the current local relational features are concatenated with the temporal context embeddings within the historical input window to obtain the local correlation-aware hidden state:
$$H_l(t) = \mathrm{Concat}\left(Q_i, t_0, \ldots, t_{S/S_1}\right)$$
where $H_l(t)$ denotes the hidden state representation of the local correlation-aware module at time step $t$, $Q_i$ denotes the local relational feature extracted at the $i$-th time step, $t_0, \ldots, t_{S/S_1}$ denotes the temporal context within the input window for modeling local dynamic correlations, and $\mathrm{Concat}(\cdot)$ denotes the concatenation operation along the feature dimension. This process integrates local relational information from different time steps into a unified representation, thereby providing a relational basis for subsequent continuous evolution and dynamic graph construction.
To enhance the temporal continuity of local relation representations, this study further embeds the above local relation modeling process into an ODE dynamic framework, allowing local correlation features to evolve and update in a continuous hidden space. It can be formulated as
$$Q^{ode} = \mathrm{ODESolve}\left(\frac{dQ(t)}{dt}, H_l^{0}, t_0, t_{S_1}\right)$$
$$\frac{dQ(t)}{dt} = Q_i \times_1 (\hat{A} - I) + Q_i \times_2 (U - I) + Q_i \times_3 (W - I) + H_l^{0}$$
where $H_l^{0}$ denotes the initial hidden state in the local correlation modeling process, $\hat{A}$ denotes the normalized adjacency matrix, and $U$ and $W$ denote the transformation parameters on the temporal and feature dimensions, respectively. Through the above continuous evolution process, the model can continuously refine latent relational representations among nodes under local structural constraints, thereby obtaining more stable and more expressive local dynamic relations.
After obtaining the continuously evolved local relation representation $Q^{ode}$, this study further computes the latent relation probability between node pairs. Specifically, for node $i$ and node $j$, their relation probability is derived from the similarity between their evolved local relation features:
$$\theta_{ij} = \sigma\left(\frac{\left(q_i^{ode}\right)^{\top} q_j^{ode}}{\sqrt{d}}\right)$$
where $q_i^{ode}$ and $q_j^{ode}$ denote the continuously evolved local relation features of node $i$ and node $j$, respectively; $d$ is the hidden feature dimension; and $\theta_{ij}$ measures the probability of a latent dynamic connection between node $i$ and node $j$ under the current traffic state.
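The pairwise relation probability can be computed for all node pairs at once. The following sketch assumes the similarity is a dot product scaled by the square root of the feature dimension before the sigmoid (the common scaled dot-product convention, taken here as an assumption); the function names are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Numerically stable logistic function, written via tanh."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def relation_probabilities(q_ode: np.ndarray) -> np.ndarray:
    """Map evolved relation features of shape (N, d) to an (N, N) matrix theta.

    theta[i, j] = sigmoid(q_i . q_j / sqrt(d)); the sqrt(d) scaling is an assumption.
    """
    d = q_ode.shape[1]
    scores = q_ode @ q_ode.T / np.sqrt(d)
    return sigmoid(scores)


rng = np.random.default_rng(1)
q = rng.normal(size=(4, 8))  # 4 nodes, hidden dimension 8
theta = relation_probabilities(q)
print(theta.round(3))
```

With this form, the matrix is symmetric, and every diagonal entry is at least 0.5 because a feature vector's dot product with itself is non-negative.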
On this basis, this paper employs the Gumbel-softmax mechanism to perform reparameterized sampling on inter-node relations, thereby generating the probabilistic graph adjacency matrix $A^{P_m}$. Its form is defined as
$$A_{ij}^{P_m} = \sigma\left(\frac{\log\left(\theta_{ij} / (1 - \theta_{ij})\right) + \left(g_{ij}^{1} - g_{ij}^{2}\right)}{\tau}\right)$$
where $\tau$ is the temperature parameter that controls the smoothness of sampling, and $g_{ij}^{1}$ and $g_{ij}^{2}$ are random noise terms drawn from the Gumbel distribution. As $\tau \to 0$, $A_{ij}^{P_m}$ takes the value of 1 with probability $\theta_{ij}$ and 0 with the remaining probability, while larger values of $\tau$ yield smoother entries between 0 and 1. Through the above reparameterization process, the model can sample latent node connections within a differentiable framework, allowing the dynamic graph generation process to be jointly optimized with the forecasting task in an end-to-end manner.
Different from conventional adjacency matrices that rely on manual priors or fixed topology, $A^{P_m}$ is not directly determined by physical road connectivity but is adaptively induced by the local correlation-aware module from the traffic state evolution process. Therefore, this matrix can not only preserve latent node dependencies beyond explicit topology but also dynamically adjust relation strengths among nodes with changes in traffic states, thereby reflecting time-varying interaction patterns in complex traffic systems more faithfully.
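The reparameterized sampling step can be sketched in a few lines. This is a minimal NumPy illustration of the binary Gumbel-softmax relaxation, assuming standard Gumbel(0, 1) noise and a small clipping constant to keep the log-odds finite; the function names are hypothetical, and no gradient computation is shown.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Numerically stable logistic function, written via tanh."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def gumbel_sigmoid_adjacency(theta: np.ndarray, tau: float,
                             rng: np.random.Generator) -> np.ndarray:
    """Sample a relaxed adjacency matrix from edge probabilities theta.

    Computes sigmoid((log(theta / (1 - theta)) + g1 - g2) / tau) with
    g1, g2 ~ Gumbel(0, 1). Small tau pushes entries toward 0 or 1.
    """
    eps = 1e-6  # clipping constant (an assumption) to avoid log(0)
    theta = np.clip(theta, eps, 1.0 - eps)
    logits = np.log(theta / (1.0 - theta))
    g1 = rng.gumbel(size=theta.shape)
    g2 = rng.gumbel(size=theta.shape)
    return sigmoid((logits + g1 - g2) / tau)


rng = np.random.default_rng(0)
theta = np.full((200, 200), 0.8)
adj = gumbel_sigmoid_adjacency(theta, tau=0.1, rng=rng)
# The fraction of entries above 0.5 should be close to theta = 0.8.
print((adj > 0.5).mean())
```

The difference of two independent Gumbel variables follows a logistic distribution, which is why an entry exceeds 0.5 with probability exactly $\theta_{ij}$ regardless of the temperature; the temperature only controls how close the entries sit to 0 or 1.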

4.2. Coupled Continuous Spatiotemporal Evolution Module

After mining local latent relations and generating the probabilistic dynamic graph matrix $A^{P_m}$, the key problem of the model is how to further achieve stable and effective spatiotemporal state evolution on this dynamic structure. Most existing traffic forecasting methods treat spatial modeling and temporal modeling as two relatively independent processes. For example, they use graph convolution to extract spatial dependencies and then use recurrent networks or temporal convolution to model temporal dynamics. Although this discrete serial design is straightforward to implement, it often struggles to simultaneously characterize the continuous evolution of traffic states and the coordinated update of spatiotemporal dependencies within a unified framework. Especially when the network becomes deeper, discrete graph propagation can easily make node representations overly similar, which weakens the ability of the model to represent complex spatiotemporal heterogeneity.
Based on the above issues, this paper further constructs a coupled continuous spatiotemporal evolution block as the main structure of the whole model. The core idea of this module is to take dynamic graph relations as the basis of propagation, use graph convolution to characterize spatial relation updating, use temporal convolution to describe temporal dynamic evolution, and couple the two within an ODE-based continuous propagation framework. In the spatial dimension, this paper first uses graph convolution to propagate node states on the dynamic graph structure. Traditional graph convolution usually relies on a fixed normalized adjacency matrix $\hat{A}$ to aggregate information from each node and its neighbors, and its general form can be written as
$$H^{k+1} = \mathrm{GCN}(H^{k}) = \sigma\left(\hat{A} H^{k} W\right)$$
where $H^{k}$ denotes the input node representation at the $k$-th layer, $W$ denotes the learnable parameter matrix, and $\sigma(\cdot)$ denotes the nonlinear activation function. However, the propagation relations characterized by a fixed adjacency matrix are essentially static and thus cannot reflect latent and time-varying node interactions in complex traffic scenarios.
To address this issue, this paper replaces the static graph structure in traditional graph convolution with the dynamic graph matrix $A^{P_m}$, which is adaptively generated by the local correlation-aware module, and thus obtains the following updated form:
$$H^{k+1} = \mathrm{GCN}(H^{k}) = \sigma\left(A^{P_m} H^{k} W\right)$$
where $A^{P_m}$ has the same dimension as the original adjacency matrix, but its relation strengths are driven by the current traffic state and are dynamically updated over time. In this way, the spatial propagation process is no longer constrained by fixed topological priors but can flexibly characterize more realistic relations among nodes in a data-driven manner. In the temporal dimension, this paper uses a temporal convolutional network (TCN) to model the dynamic changes of traffic states over time. Compared with methods based on recurrent structures, a TCN offers advantages in long-sequence modeling and parallel computation and can more stably extract multi-scale temporal dependencies.
Let the input spatiotemporal feature be X. Then the temporal convolution representation at the k-th layer can be defined as
$$H_{Tcn}^{k} = \begin{cases} X, & k = 0 \\ \mathrm{Sigmoid}\left(W^{k} *_{d^{k}} H_{Tcn}^{k-1}\right), & k = 1, 2, \ldots, S \end{cases}$$
where $W^{k}$ denotes the convolution kernel parameter at the $k$-th layer and $d^{k}$ denotes the dilation rate, which is used to enlarge the receptive field layer by layer. With this dilated temporal convolution mechanism, the model can perceive traffic state changes at different time scales, thereby providing richer temporal context for the subsequent continuous propagation process.
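As a rough illustration of the dilated temporal convolution described above, the following sketch applies a single-channel dilated convolution followed by a Sigmoid and computes the receptive field of a stack of such layers. The causal left padding, the kernel size of 2, and the dilation schedule used in the usage note are assumptions made for illustration and are not settings reported in this paper; the function names are hypothetical.

```python
import math

def dilated_conv1d(x, kernel, dilation):
    """Causal dilated 1D convolution over a list of floats, followed by Sigmoid.

    Assumption: inputs before the start of the sequence are treated as zero.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation      # look back j * dilation steps
            if idx >= 0:
                acc += w * x[idx]
        out.append(1.0 / (1.0 + math.exp(-acc)))
    return out

def receptive_field(kernel_size, dilations):
    """Number of input steps seen by one output after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With a kernel size of 2 and dilation rates 1, 2, and 4, `receptive_field(2, [1, 2, 4])` covers 8 input steps, which shows how the dilation rate enlarges the temporal context layer by layer.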
On this basis, this paper further embeds spatial propagation and temporal evolution into a unified ODE framework to construct a continuous spatiotemporal state update mechanism. For the graph convolution branch, this paper represents its hidden state in continuous space as
$$H_{g}(t) = \mathrm{ODESolve}\left(\frac{dH_{g}(t)}{dt},\, H_{g}^{0},\, t_{0},\, t_{T}\right)$$
$$\frac{dH_{g}(t)}{dt} = H_{g}(t) \times_{1} (A_{Pm} - I) + H_{g}(t) \times_{2} (U - I) + H_{g}(t) \times_{3} (W - I) + H_{g}^{0}$$
where $H_{g}^{0}$ denotes the initial hidden state, $U$ denotes the transformation matrix on the temporal dimension, and $W$ denotes the transformation matrix on the feature dimension. Through the above continuous propagation form, the model no longer relies on discrete stacking with a limited number of layers to extract spatial dependencies but treats the update of node representations as a dynamic process that continuously evolves along the depth direction. This design not only alleviates the over-smoothing problem that commonly appears in deep discrete propagation but also enables spatial structure updating and temporal state evolution to be coupled within a unified continuous dynamic framework.
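To make the continuous propagation concrete, the sketch below integrates a simplified version of the graph ODE with a fixed-step Euler scheme. It is only an illustration under stated assumptions: the temporal and feature transformations are taken to be identity matrices so that only the node-dimension term and the restart term remain, the hidden state is a plain node-by-feature matrix, and a fixed-step Euler method stands in for whatever ODE solver is actually used. All function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def euler_graph_ode(adj, h0, t_end=1.0, steps=10):
    """Integrate dH/dt = (A - I) H + H0 from t = 0 to t_end with Euler steps.

    Assumption: U = I and W = I, so the temporal and feature terms vanish.
    """
    n = len(adj)
    a_minus_i = [[adj[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
    h = [row[:] for row in h0]
    dt = t_end / steps
    for _ in range(steps):
        prop = matmul(a_minus_i, h)     # node-dimension propagation term
        h = [[h[i][c] + dt * (prop[i][c] + h0[i][c])
              for c in range(len(h0[0]))] for i in range(n)]
    return h
```

The restart term keeps the initial state in the dynamics at every step, which is the property that lets the state evolve in depth without collapsing to a representation that forgets its input.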
However, dual-branch continuous propagation is still insufficient to guarantee effective coordination between global diffusion information and local relational information. Therefore, this paper further designs a unified state update mechanism to achieve stable fusion and the continuous updating of multi-source spatiotemporal states. Specifically, let the hidden representations from the GCN branch and the local correlation branch be denoted as GCN and LAM, respectively. This paper first uses an aggregation function to jointly model the two types of states, and its form is given as
$$H' = \mathrm{AGG}(GCN, LAM) = \frac{1}{2K} \sum_{m \in K} \sum_{n \neq m \in K} F_{i} \odot \mathrm{Sigmoid}(F_{j})$$
where $H'$ denotes the aggregated hidden state, $\odot$ denotes element-wise multiplication, and $GCN$ and $LAM$ denote the state representations of the global propagation branch and the local correlation branch, respectively. By introducing the Sigmoid gating term, the model can adaptively adjust the contribution strength of different branch features during aggregation, thereby avoiding the dominance of a single path in continuous propagation and improving the effectiveness of the joint modeling of global and local spatiotemporal information.
After completing multi-branch state aggregation, this paper further adopts a residual update strategy to stabilize node representations, and its form is defined as
$$H'' = \mathrm{Update}(H, H') = \alpha_{res}\, \sigma(W_{res} H' + b_{res}) + \beta_{res} H$$
where $H$ and $H'$ denote the input state before updating and the aggregated state after fusion, respectively, while $W_{res}$ and $b_{res}$ denote the residual mapping parameters.
To enable adaptive adjustment of the contributions of the two update paths, this paper further defines $\alpha_{res}$ and $\beta_{res}$ as learnable gating coefficients:
$$\alpha_{res} = \frac{\exp(a)}{\exp(a) + \exp(b)}$$
$$\beta_{res} = \frac{\exp(b)}{\exp(a) + \exp(b)}$$
where a and b are learnable parameters and are automatically optimized through gradient descent during training. In this way, the model can establish a dynamic balance between preserving the original state and injecting newly aggregated information, thereby improving the stability and expressive ability of continuous spatiotemporal evolution.
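The gating coefficients and the residual update can be sketched as follows. This is a minimal element-wise version under several assumptions: the residual mapping is reduced to a scalar weight and bias, the nonlinearity is taken to be ReLU, and the transformed term is applied to the aggregated state while the skip term carries the input state. None of these choices is confirmed by the paper, and the function names are hypothetical.

```python
import math

def gate_coefficients(a, b):
    """Two-way softmax over the learnable scalars a and b (sums to 1)."""
    m = max(a, b)                       # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)

def residual_update(h_in, h_agg, w_res, b_res, a, b):
    """alpha * relu(w_res * h_agg + b_res) + beta * h_in, element by element.

    Assumptions: scalar w_res and b_res, ReLU as the activation.
    """
    alpha, beta = gate_coefficients(a, b)
    return [alpha * max(0.0, w_res * ha + b_res) + beta * hi
            for hi, ha in zip(h_in, h_agg)]
```

Because the two coefficients come from a softmax, they always sum to one, so training can shift weight between the preserved state and the newly aggregated information without changing the overall scale of the update.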

4.3. Spatiotemporal Fusion Prediction Module

After the local correlation-aware dynamic relation module completes probabilistic dynamic graph construction and the coupled continuous spatiotemporal evolution block completes state extraction and updating, the model has obtained two complementary types of spatiotemporal representations. One comes from the global structural dependencies learned through dynamic graph propagation, and the other comes from the latent interactions between non-adjacent nodes mined by the local correlation-aware mechanism. However, although these two types of features reflect different levels of information in the traffic system, their joint modeling value is difficult to fully exploit if they are only processed by simple linear mapping or direct concatenation. Especially in complex traffic scenarios, global propagation patterns and local sudden correlations often influence future state evolution at the same time. Therefore, the prediction stage requires not only information integration but also a fusion mechanism that can explicitly coordinate the contributions of global and local spatiotemporal features.
Based on this consideration, this paper further designs a spatiotemporal fusion prediction module as the key component that connects representation learning with prediction output in the whole DPMCODE framework. This module does not treat prediction as a simple mapping of the final hidden state. Instead, it uses an attention-driven collaborative fusion mechanism to jointly model the spatiotemporal representations learned by the previous layers, so that the complementary relationship between global dependencies and local correlations can be explicitly characterized in the prediction layer. Specifically, let the output representation from the previous continuous spatiotemporal evolution block be denoted by $\hat{X}$. This paper first applies linear projection to it to generate the query, key, and value representations:
$$Q = \hat{X} W_{q} + b_{q}, \quad K = \hat{X} W_{k} + b_{k}, \quad V = \hat{X} W_{v} + b_{v}$$
where $W_{q}$, $W_{k}$, and $W_{v}$ denote the learnable weight matrices for the query, key, and value, respectively, while $b_{q}$, $b_{k}$, and $b_{v}$ denote the corresponding bias terms. Through this projection process, the model can map the high-dimensional spatiotemporal representations extracted in the previous stage to a representation space that is suitable for relation matching and information fusion.
On this basis, this paper further uses the scaled dot-product attention mechanism to evaluate the correlations among different representations and obtains the attention scores in the fusion stage:
$$\mathrm{AttentionScores} = \frac{Q^{T} \cdot K}{\sqrt{h / C}}$$
where $h$ denotes the number of attention heads, $C$ denotes the node embedding dimension, and $\sqrt{h / C}$ is the scaling factor used to alleviate the numerical instability caused by the increase in inner product values with dimension. Through this process, the model can adaptively identify more discriminative spatiotemporal dependency patterns from the mixed representations of global propagation and local relations.
Subsequently, this paper uses the softmax function to normalize the attention scores and applies them to the value representations, thereby obtaining the fused spatiotemporal predictive representation:
$$X_{i} = \mathrm{softmax}(\mathrm{AttentionScores}) \cdot V_{i}$$
To further map the fused representation to the target prediction space, this paper introduces a multilayer perceptron at the end of the prediction module to perform nonlinear transformation on the above spatiotemporal representation and output the final future traffic state prediction $\hat{Y}$:
$$\hat{Y} = \mathrm{MLP}(X_{i})$$
where $\mathrm{MLP}(\cdot)$ denotes the multilayer perceptron mapping function, whose role is to fully exploit the discriminative information in the fused features and enhance the final prediction ability of the model for future traffic evolution trends.
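The attention-based fusion step can be illustrated with a small single-head sketch. It assumes that the rows of the query, key, and value matrices index the items being compared, and it takes the scaling factor as an explicit argument rather than fixing its value. The linear projections and the final multilayer perceptron are omitted, and the function names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]

def attention(q, k, v, scale):
    """Scaled dot-product attention for small lists of row vectors."""
    scores = [[sum(a * b for a, b in zip(q_row, k_row)) / scale
               for k_row in k] for q_row in q]
    weights = [softmax(row) for row in scores]
    return [[sum(w * v_row[c] for w, v_row in zip(w_row, v))
             for c in range(len(v[0]))] for w_row in weights]
```

When two keys match a query equally well, the softmax assigns them equal weight and the output is the average of their values, which is the sense in which the module blends global and local representations instead of selecting one.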

4.4. Loss Function

In traffic flow prediction tasks, the Huber loss is widely used because it combines the advantages of the squared loss and the absolute loss [40]. Its general expression is as follows:
$$L(\hat{Y}, Y) = \begin{cases} \frac{1}{2} (\hat{Y} - Y)^{2}, & |\hat{Y} - Y| \le \delta, \\ \delta\, |\hat{Y} - Y| - \frac{1}{2} \delta^{2}, & |\hat{Y} - Y| > \delta, \end{cases}$$
where $\delta$ is a predefined threshold hyperparameter, $Y$ is the actual future spatiotemporal data, and $\hat{Y}$ is the forecasted future data.
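A minimal sketch of the Huber loss is given below. Averaging over the elements and the default threshold of 1.0 are assumptions made for illustration; the paper does not state the reduction or the threshold value in this section.

```python
def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss over paired predictions and targets."""
    total = 0.0
    for p, y in zip(pred, target):
        r = abs(p - y)
        if r <= delta:
            total += 0.5 * r * r                    # quadratic near zero
        else:
            total += delta * r - 0.5 * delta ** 2   # linear for large errors
    return total / len(pred)
```

For a threshold of 1.0, an error of 0.5 falls in the quadratic branch and an error of 3.0 falls in the linear branch, so large deviations are penalized less severely than under a pure squared loss.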

5. Experiments

This section comprehensively evaluates the proposed DPMCODE on five real-world traffic datasets. Specifically, this paper first introduces the experimental setup, including datasets, comparison methods, and implementation details. Then, the forecasting performance of the model is evaluated by comparing it with existing methods. On this basis, ablation experiments and variant comparison experiments are further conducted to verify the effectiveness of each key component. Meanwhile, hyperparameter sensitivity analysis and efficiency performance evaluation are employed to further examine the robustness and practical value of the model. Finally, visualization experiments are conducted to further confirm the experimental results of the proposed framework.
We employ three performance metrics to measure our model’s effectiveness:
$$\mathrm{MAE}(x, \hat{x}) = \frac{1}{|H|} \sum_{i \in H} \left| x_{i} - \hat{x}_{i} \right|$$
$$\mathrm{RMSE}(x, \hat{x}) = \sqrt{\frac{1}{|H|} \sum_{i \in H} \left( x_{i} - \hat{x}_{i} \right)^{2}}$$
$$\mathrm{MAPE}(x, \hat{x}) = \frac{1}{|H|} \sum_{i \in H} \left| \frac{x_{i} - \hat{x}_{i}}{x_{i}} \right|$$
where MAE measures the average magnitude of the forecast error, RMSE penalizes large errors more heavily and is therefore sensitive to anomalies, and MAPE expresses the error relative to the actual value, which makes it independent of the measurement unit. $x_{i}$ denotes the actual value for the $i$-th observation, $\hat{x}_{i}$ represents the predicted value for the same observation, and $H$ indicates the set of indices for the observed samples in our experiments.
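The three metrics can be computed as in the following sketch. It assumes that all actual values are nonzero, since MAPE is undefined otherwise; any masking of zero or missing readings used in the experiments is not described here and is not reproduced.

```python
import math

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(x - p) for x, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((x - p) ** 2 for x, p in zip(actual, pred)) / len(actual))

def mape(actual, pred):
    """Mean absolute percentage error as a fraction (assumes nonzero actual values)."""
    return sum(abs((x - p) / x) for x, p in zip(actual, pred)) / len(actual)
```

Multiplying the MAPE value by 100 gives the percentage form in which this metric is commonly reported.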

5.1. Dataset

Our experiments are conducted on five publicly available traffic datasets, namely, PEMS03, PEMS04, PEMSD7, PEMS08, and PEMS-BAY. Among them, the first four datasets comprise data collected from the Performance Measurement System (PeMS) of the California Department of Transportation, while PEMS-BAY was obtained from traffic data in the San Francisco Bay Area. A brief statistical summary of these datasets is presented in Table 2, including the number of nodes, the number of edges, the total time steps, and the corresponding time range. For PEMS-BAY and PEMSD7, the data are divided into training, validation, and test sets in a ratio of 7:1:2. For PEMS03, PEMS04, and PEMS08, the data split ratio is set to 6:2:2. Following the standard experimental setting, raw traffic observations are collected every 30 s and further aggregated into 5 min intervals for model training and evaluation.
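The preprocessing described above can be sketched as follows: ten consecutive 30 s readings form one 5 min interval, and the resulting series is split in time order. Aggregating by the mean and splitting chronologically without shuffling are assumptions, because the paper states only the interval lengths and the split ratios. The function names are hypothetical.

```python
def aggregate(samples, factor=10):
    """Average every `factor` consecutive readings (10 x 30 s = 5 min).

    Assumption: mean aggregation; a trailing incomplete interval is dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]

def chronological_split(series, ratios=(6, 2, 2)):
    """Split a series into train, validation, and test parts in time order."""
    total = sum(ratios)
    n = len(series)
    train_end = n * ratios[0] // total
    val_end = n * (ratios[0] + ratios[1]) // total
    return series[:train_end], series[train_end:val_end], series[val_end:]
```

Calling `chronological_split(series, (7, 1, 2))` gives the split used for PEMS-BAY and PEMSD7, while the default ratios correspond to PEMS03, PEMS04, and PEMS08.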

5.2. Experimental Setup

All experiments are implemented based on Python 3.8 and PyTorch 1.13.0 and are trained on an NVIDIA GeForce RTX 3090 GPU. The model is optimized with Adam. The input sequence length is set to 12 time steps, and the prediction length is also set to 12 time steps, corresponding to traffic forecasting for the next hour. The other main experimental settings are shown in Table 3. The number of model layers is set to 2, the embedding dimension is set to 32, and the batch size is set to 16.
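With 5 min intervals, 12 input steps and 12 output steps each span one hour. The sketch below shows one common way to cut a series into such input and target pairs; the stride of one step is an assumption, and the function name is hypothetical.

```python
def make_windows(series, in_len=12, out_len=12):
    """Build (history, future) pairs with a sliding window of stride 1."""
    pairs = []
    last_start = len(series) - in_len - out_len
    for start in range(last_start + 1):
        history = series[start:start + in_len]
        future = series[start + in_len:start + in_len + out_len]
        pairs.append((history, future))
    return pairs
```

A series with 30 time steps yields 7 such pairs, and a series shorter than 24 steps yields none.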

5.3. Baseline

The benchmarks employed for our comparative analysis fall into four distinct categories. The first group encompasses traditional methods for forecasting time series.
ARIMA [24]: The Autoregressive Integrated Moving Average (ARIMA) model aims to extract the time-series patterns hidden behind the data through the autocorrelation and differencing of the data and then use these patterns to predict future data.
FC-LSTM [25]: A Long Short-Term Memory (LSTM) network with fully connected hidden units, widely used for capturing sequential dependencies.
The second category is spatiotemporal graph convolutional networks with static graphs.
STGCN [26]: STGCN incorporates graph structure convolutions and 1D temporal convolutions to efficiently model spatial dependencies and temporal correlations.
GraphWaveNet [28]: This model combines adaptive graph convolution with 1D dilated causal convolutions, providing a robust approach to capturing spatiotemporal dependencies in the data.
STGODE [38]: STGODE leverages continuous differential equations for node representation in traffic forecasting networks.
The third category is spatiotemporal graph models with dynamic graphs.
DCRNN [33]: Diffusion Convolutional Recurrent Neural Networks view traffic flow as diffusion and integrate a diffusion convolutional layer into the GRU, creating the DCGRU.
DGCRN [34]: DGCRN uses a Dynamic Graph Convolutional Recurrent Module to detect spatiotemporal patterns in a seq2seq framework, modeling dynamic graphs.
The fourth category is spatiotemporal graph models with cross dependencies.
STSGCN [27]: Spatiotemporal Synchronous Graph Convolutional Networks break down the issue into localized subgraphs, aiding the network in capturing local spatiotemporal correlations and addressing heterogeneities in spatiotemporal data.

5.4. Model Performance Experiment

To comprehensively validate the forecasting performance of the proposed DPMCODE, this paper compares it with several representative methods on five real-world traffic datasets, and the results are shown in Table 4 and Table 5. The comparison models cover traditional statistical methods, sequence modeling methods, static graph-based spatiotemporal forecasting methods, and continuous-time graph learning methods. Therefore, they can more comprehensively reflect the applicability and performance advantages of the proposed model in different traffic scenarios.
From the overall results, DPMCODE achieves the best performance on most evaluation metrics across the five datasets, showing strong stability and generalization ability. Compared with the traditional statistical model ARIMA and the pure sequence modeling method FC-LSTM, the proposed model achieves significantly lower errors on all datasets. This indicates that relying only on temporal patterns is insufficient to fully characterize spatiotemporal dependencies in complex road networks, while the introduction of dynamic graph modeling, continuous propagation, and collaborative prediction of global and local information enables the model to learn the dynamic variation patterns of traffic states more effectively.
On the PEMS03 dataset, DPMCODE achieves the best results on MAE, RMSE, and MAPE, reaching 15.89, 27.31, and 16.38, respectively. Compared with the strong baselines DGCRN and STGODE, the proposed model further improves all three metrics. This indicates that on this dataset, the proposed method can characterize latent and time-varying dependencies among nodes more accurately and effectively improve the overall forecasting accuracy.
On the PEMS04 dataset, the advantage of DPMCODE is more evident, with MAE, RMSE, and MAPE reaching 18.52, 29.41, and 11.99, respectively, and all three metrics are significantly better than those of all comparison methods. In particular, compared with strong graph models such as STSGCN and STGODE, the proposed model achieves larger improvements in error control. This indicates that the proposed dynamic graph relation construction and continuous spatiotemporal propagation mechanisms can work more effectively on this dataset, thereby improving the fitting ability of the model for complex traffic variation patterns.
On the PEMS08 dataset, DPMCODE also outperforms all comparison methods on all three metrics, with MAE, RMSE, and MAPE reaching 15.83, 23.91, and 10.80, respectively. Compared with the second-best method, STGODE, the proposed model achieves further improvements on all three metrics. The advantage is especially clear on RMSE, which indicates that the proposed method has stronger robustness and representation ability when dealing with sharp traffic fluctuations and complex dynamic changes. On the larger-scale datasets PEMSD7 and PEMS-BAY, DPMCODE still maintains clear advantages. For PEMSD7, the proposed model achieves 2.88, 5.21, and 7.33 on MAE, RMSE, and MAPE, respectively, and outperforms all comparison methods. For PEMS-BAY, DPMCODE achieves 1.63, 3.21, and 3.72 on the three metrics, respectively, and also significantly outperforms advanced models such as GraphWaveNet, STGODE, and STSGCN. These results further indicate that the proposed model is not only suitable for small- and medium-scale road networks but can also maintain strong modeling ability and forecasting accuracy in more complex and larger-scale real traffic systems.
From the perspective of method categories, STGCN, DCRNN, STSGCN, and GraphWaveNet can effectively combine spatial topology and temporal dynamic information, but most of them still rely on static graph structures or discrete propagation processes. Therefore, they still have limitations in characterizing latent and time-varying node interactions. In contrast, the proposed DPMCODE adaptively generates a probabilistic dynamic graph adjacency matrix through the local correlation-aware dynamic relation module, and combines it with the coupled continuous spatiotemporal evolution block to jointly model spatial propagation and temporal evolution. In this way, it can more fully mine latent dependencies in traffic systems. In addition, the spatiotemporal fusion prediction module further strengthens the collaboration between global propagation information and local relational information, which enables the model to achieve better forecasting performance on different datasets.

5.5. Ablation Experiment

To further evaluate the deployment potential of the proposed model in practical traffic forecasting scenarios, this study conducts a comprehensive comparison of the complexity and predictive performance of different model variants on the PEMS-BAY dataset. The analysis considers five aspects: training time, inference time, the number of parameters, FLOPs, and the prediction error measured by MAE. The results are reported in Table 6 and Figure 4. In Figure 4, the horizontal axis represents the training time, the vertical axis denotes the prediction error in terms of MAE, the bubble size indicates the number of parameters, and the color intensity reflects FLOPs. The annotations further provide the parameter size and inference time of each model.
The compared variants are constructed from two perspectives: module removal and adjacency matrix replacement. For module removal, this study first constructs a BaseLine control model. This model adopts a classical GCN-plus-TCN architecture, where GCN is used to extract spatial topological dependencies in the road network and TCN is used to model temporal dynamic features in traffic sequences. It should be noted that BaseLine is not an ablation variant obtained by removing a specific module from the complete DPMCODE but a basic backbone control model. Compared with the complete model, BaseLine does not include the continuous ODE propagation mechanism, the local correlation-aware ODE module, the spatiotemporal fusion prediction module, or the dynamic adaptive adjacency matrix. It only serves as a basic discrete spatiotemporal modeling framework for measuring the performance gains introduced by the proposed modules.
On this basis, three module removal variants are further constructed. First, the w/o ODE variant removes the continuous ODE propagation mechanism and retains only discrete graph propagation, aiming to verify the contribution of continuous dynamic evolution modeling to spatiotemporal dependency learning. Second, the w/o LCA variant removes the local correlation-aware ODE module to analyze its role in modeling latent dependencies among non-adjacent nodes. Third, the w/o SFP variant removes the spatiotemporal fusion prediction module to examine how the collaborative fusion of global and local spatiotemporal information affects the final prediction results. In addition, to further disentangle the contributions of different components in the dynamic graph generation mechanism, this study adds the w/o Gumbel variant. This variant removes the Gumbel softmax reparameterization process and directly uses the continuous relation probability matrix for graph propagation. Through this controlled experiment, the role of the reparameterized sampling mechanism in generating learnable probabilistic dynamic graph structures can be further verified.
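For readers unfamiliar with the reparameterization that the w/o Gumbel variant removes, the sketch below draws one relaxed sample from a set of relation logits using the standard Gumbel softmax construction. The temperature value, the two-class edge or no-edge setting in the usage note, and the function name are assumptions; the paper's actual module operates on the full relation probability matrix.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return one relaxed categorical sample (entries are positive, sum to 1)."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        gumbel = -math.log(-math.log(u))                # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [v / total for v in exps]
```

Applying the function to a pair of logits such as `[2.0, -1.0]` gives a soft edge or no-edge decision for one node pair. Lower temperatures push the sample toward a one-hot vector while keeping it differentiable with respect to the logits.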
For adjacency matrix variants, this paper further designs two alternative graph construction methods. The first variant is the static adjacency matrix model DPMCODE-S, which constructs a weighted adjacency matrix with a Gaussian kernel distance threshold mechanism and obtains a fixed graph structure by adding self loops and performing symmetric normalization. This method can preserve prior topological relations in the road network, but it is difficult for it to adaptively mine latent and time-varying dependencies among nodes according to changes in traffic states. The second variant is the diffusion adjacency matrix model DPMCODE-D, which follows the bidirectional diffusion convolution idea of DCRNN and realizes graph information propagation through the forward random walk matrix and the backward random walk matrix. Although this method can characterize directional features in the traffic propagation process to some extent, its propagation structure still relies on predefined diffusion matrices and lacks the ability to dynamically adjust node relations according to sample states. Therefore, by comparing with DPMCODE-S and DPMCODE-D, the advantages of the proposed dynamic adaptive adjacency matrix in latent relation mining and complex spatiotemporal dependency modeling can be verified more intuitively.
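The two alternative graph constructions can be sketched as below. The Gaussian kernel form exp(-d²/σ²), the thresholding rule, and the function names are assumptions chosen to match common practice for distance-based traffic graphs; the paper does not list the exact formulas in this section.

```python
import math

def gaussian_adjacency(dist, sigma, threshold):
    """Weighted adjacency from pairwise distances, zeroing weak links."""
    n = len(dist)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w = math.exp(-(dist[i][j] ** 2) / (sigma ** 2))
                adj[i][j] = w if w >= threshold else 0.0
    return adj

def sym_normalize(adj):
    """Add self loops, then compute D^(-1/2) (A + I) D^(-1/2)."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def random_walk(adj):
    """Row-normalized transition matrix; rows with zero degree stay zero."""
    out = []
    for row in adj:
        d = sum(row)
        out.append([v / d if d > 0 else 0.0 for v in row])
    return out

def diffusion_matrices(adj):
    """Forward and backward random walk matrices for bidirectional diffusion."""
    transposed = [list(col) for col in zip(*adj)]
    return random_walk(adj), random_walk(transposed)
```

Both constructions are computed once from the road network and then held fixed, which is the limitation that the dynamic adaptive adjacency matrix is designed to remove.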
From the overall results, BaseLine has the lowest computational cost, with a training time of only 54 s, an inference time of 2 s, 284,631 parameters, and 0.45 G FLOPs. However, since this model only adopts a basic GCN and TCN architecture and lacks continuous ODE propagation, local correlation-aware modeling, dynamic graph generation, and spatiotemporal fusion prediction mechanisms, its MAE reaches 1.89, indicating clearly lower prediction accuracy than the other variants. This suggests that relying solely on a lightweight discrete spatiotemporal modeling structure can reduce computational cost, but is insufficient to fully characterize dynamic spatiotemporal dependencies in complex traffic systems. Compared with BaseLine, the variants incorporating the proposed core components show increased computational complexity, but all achieve clear improvements in prediction accuracy. For example, the MAE values of w/o ODE, w/o LCA, and w/o SFP are 1.76, 1.72, and 1.69, respectively, all of which are significantly better than that of BaseLine. This indicates that even when some core modules are removed, the retained dynamic graph relation modeling, local correlation learning, or spatiotemporal fusion mechanisms can still effectively enhance prediction performance. Meanwhile, the gradual increase in training time and FLOPs of these variants also shows that different modules introduce additional computational costs while improving the model representation capacity. A further comparison of different graph structure variants shows that DPMCODE-S and DPMCODE-D obtain MAE values of 1.71 and 1.67, respectively, both of which are worse than the complete DPMCODE. Among them, DPMCODE-D has 615,894 parameters, 1.85 G FLOPs, and an inference time of 64 s, all of which are higher than those of the complete model. However, its prediction error is still higher than that of the complete DPMCODE. 
This suggests that graph propagation based on the diffusion adjacency matrix can enhance directional propagation modeling, but its higher computational cost does not lead to the best performance. In contrast, the complete DPMCODE contains 546,937 parameters, requires 1.57 G FLOPs, and has an inference time of 60 s, achieving a lower MAE of 1.63 with less computational cost than DPMCODE-D. This demonstrates that the proposed dynamic adaptive graph structure can characterize latent node relationships more effectively, rather than relying on a larger parameter scale or higher computational complexity to improve performance.
In addition, the w/o Gumbel variant achieves an MAE of 1.74, with 474,309 parameters and an inference time of 52 s. Compared with the complete DPMCODE, this variant has lower computational cost but reduced prediction accuracy. This indicates that after removing the Gumbel softmax reparameterization mechanism, the model can still retain a certain ability to learn dynamic graph relationships, but the discriminability and state adaptability of the dynamic graph structure are weakened. Therefore, although the Gumbel softmax mechanism introduces certain additional computational costs, it plays a positive role in probabilistic dynamic graph generation and prediction performance improvement. Overall, DPMCODE achieves the lowest MAE with moderate parameter size, FLOPs, and inference time, indicating that its performance improvement does not rely on model scale expansion but stems from the effective collaboration of continuous ODE propagation, local correlation-aware dynamic graph construction, and the spatiotemporal fusion mechanism.
To further analyze the computational cost of DPMCODE, let $N$ denote the number of traffic nodes, $T$ the length of the historical input window, $C$ the hidden feature dimension, $L$ the number of continuous spatiotemporal evolution layers, $K$ the graph propagation order, and $M$ the average number of function evaluations required by the ODE solver. According to the model architecture, the main complexity of DPMCODE comes from local correlation-aware dynamic graph construction and coupled continuous spatiotemporal evolution. The local correlation-aware module models latent relations among nodes, leading to a complexity of approximately $O(TN^{2}C)$. The Gumbel softmax reparameterization mainly operates on the relation probability matrix, with a complexity of approximately $O(TN^{2})$, and is usually not the dominant term. The coupled continuous spatiotemporal evolution module consists of dynamic graph convolution, temporal convolution, and ODE numerical integration, and its main complexity can be approximated as $O(MLKTNC^{2})$. The spatiotemporal fusion prediction module mainly includes query, key, and value mappings as well as feature fusion, with a complexity of approximately $O(TNC^{2})$. Therefore, the overall computational complexity of DPMCODE can be approximated as
$$O\left(TN^{2}C + MLKTNC^{2} + TNC^{2}\right)$$
where the first two terms correspond to dynamic graph relation construction and continuous spatiotemporal state evolution, respectively, and constitute the main sources of the model's computational cost. Since the $O(TNC^{2})$ term is smaller than the second term by a factor of $MLK$, it can be absorbed, and the overall complexity simplifies to
$$O\left(TN^{2}C + MLKTNC^{2}\right)$$
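To give a feel for the relative size of the three terms, the sketch below evaluates them as raw operation counts with constants ignored. The window length of 12, embedding dimension of 32, and 2 layers follow the experimental setup, whereas the node count, propagation order, and number of function evaluations in the usage note are illustrative assumptions. The function name is hypothetical.

```python
def dpmcode_cost_terms(n, t, c, layers, order, evals):
    """Return the three asymptotic cost terms as operation counts.

    graph:     T * N^2 * C            (dynamic graph construction)
    evolution: M * L * K * T * N * C^2 (continuous spatiotemporal evolution)
    fusion:    T * N * C^2            (fusion prediction module)
    """
    graph = t * n * n * c
    evolution = evals * layers * order * t * n * c * c
    fusion = t * n * c * c
    return graph, evolution, fusion

# Illustrative values only: N = 300, K = 2, and M = 4 are assumptions.
graph, evolution, fusion = dpmcode_cost_terms(
    n=300, t=12, c=32, layers=2, order=2, evals=4)
```

In this setting the fusion term is 16 times smaller than the evolution term, which matches the factor $MLK$, while the graph construction term grows quadratically with the number of nodes and becomes the main concern for larger networks.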

5.6. Hyperparameter Sensitivity Experiments

To further verify the rationality of the key structural parameter settings of the proposed model, this paper conducts hyperparameter sensitivity experiments and focuses on the effects of the number of layers in the coupled continuous spatiotemporal evolution block, the node embedding dimension, and the batch size on model performance. These parameters correspond to the depth of continuous propagation, the capacity of spatiotemporal feature representation, and the optimization stability during training, respectively, and therefore have an important influence on the final prediction results. Specifically, the layer number experiment is used to evaluate the modeling performance of the coupled continuous spatiotemporal evolution block under different propagation depths. If the number of layers is too small, the model cannot fully extract deep spatiotemporal dependencies from dynamic graph propagation and temporal evolution. If the number of layers is too large, redundant propagation may be introduced and representation homogenization may be aggravated. The embedding dimension experiment is mainly used to analyze the representation ability of the model under different feature space capacities. A small embedding dimension limits the representation ability of local correlation modeling and global propagation features, while an overly large embedding dimension may bring redundant representations and extra optimization burden. The batch size experiment is used to examine the effect of gradient estimation stability during training on model performance. An overly small batch size may lead to large gradient fluctuations, while an overly large batch size may weaken the generalization advantage brought by stochastic optimization. Figure 5 shows the variation trends of MAE, MAPE, and RMSE under different parameter settings in PEMS08. 
It can be observed that the model shows relatively stable parameter sensitivity in all three groups of experiments and achieves the best performance within a moderate parameter range. In particular, when the number of layers in the coupled continuous spatiotemporal evolution block is set to 2, the embedding dimension is set to 32, and the batch size is set to 16, the model achieves the best results on all three metrics.

5.7. Visual Experiment

To more intuitively evaluate the predictive performance of the proposed model, this paper randomly selects two nodes from the PEMS08 dataset and plots the real traffic flow curves and the corresponding predicted curves over a specific period, as shown in Figure 6. It can be observed that for both node 120 and node 58, the prediction curves of DPMCODE are overall closer to the variation trend of the real curves and show better fitting performance at local peaks, valleys, and turning points. In contrast, although STGODE can capture the overall variation trend of traffic flow to some extent, it still shows obvious deviations in some sharply fluctuating intervals. Its predictions are relatively less consistent with the ground truth, especially in local peak valley changes and short-term disturbance responses. The green dashed line representing DPMCODE in Figure 6 remains more consistent with the blue ground-truth curve over most time steps. This indicates that the proposed model can not only learn the global variation patterns of traffic flow well but also characterize local dynamic fluctuations more accurately, thereby showing stronger prediction stability and better tracking ability for time-varying trends.
Furthermore, to analyze the actual modeling effect of the proposed dynamic graph mechanism, Figure 7 presents the heatmap visualization results of the dynamic adaptive graph at different time steps $t = 2, 4, 6$.
It can be observed from the figure that the distributions of adjacency relation strengths are different at different time steps, which indicates that the graph structure learned by the model is dynamically adjusted with changes in traffic states rather than remaining fixed all the time. Some node pairs show higher response strengths at specific time steps, while they become relatively weaker at other time steps. This reflects that the model can adaptively reconstruct latent dependencies among nodes according to the current traffic states. These results further indicate that the proposed dynamic adaptive adjacency matrix can not only overcome the limitations of static topological graphs but also continuously capture time-varying interaction patterns during the evolution process, thereby providing more flexible and more discriminative structural support for subsequent spatiotemporal propagation.

6. Limitations and Future Work

Although DPMCODE achieves promising predictive performance on multiple real-world traffic forecasting datasets, there remains room for further extension. First, the current experiments are mainly conducted on commonly used benchmark traffic datasets, whose network scales are still relatively limited compared with large, city-scale transportation systems. Therefore, the scalability of DPMCODE to ultra-large traffic networks with more than 1000 nodes still needs to be further investigated. In future work, we will conduct experiments on larger, city-scale networks and explore sparse dynamic graph learning, graph partition-based acceleration, and more efficient ODE solvers to reduce computational overhead and improve real-time inference capability. Second, future work will consider jointly modeling missing data imputation and spatiotemporal forecasting to improve the adaptability of the model under sensor anomalies or incomplete observations. In practical intelligent transportation systems, sensor failures, communication interruptions, and incomplete observations may affect the reliability of traffic prediction. Therefore, integrating missing-pattern-aware imputation with the forecasting process is a promising direction for improving the robustness of DPMCODE. Finally, external factors such as weather conditions, traffic accidents, holidays, and large-scale events can be incorporated as auxiliary node features or dynamic graph update signals. These factors may help the model better capture abnormal traffic fluctuations and further enhance the robustness and deployment value of DPMCODE in real intelligent transportation systems.

7. Conclusions

This paper proposes DPMCODE, a dynamic prediction model based on continuous ordinary differential equations, to address the challenge of accurately characterizing spatiotemporal dependencies in complex traffic scenarios. Unlike conventional methods that rely on static graph structures and discrete propagation mechanisms, DPMCODE provides a unified traffic forecasting framework with three components: dynamic graph relation construction, continuous spatiotemporal evolution modeling, and collaborative prediction using global and local information. Specifically, a local correlation-aware dynamic relation module is first constructed to adaptively mine latent correlations between non-adjacent nodes from evolving traffic states and to generate a probabilistic dynamic graph adjacency matrix. Then, a coupled continuous spatiotemporal evolution block integrates dynamic graph propagation, temporal convolution modeling, and continuous ODE propagation, thereby enabling continuous feature extraction and stable updates of complex spatiotemporal states. Finally, a spatiotemporal fusion prediction module jointly models global propagation information and local relational information, followed by a multilayer perceptron that produces the final predictions. Extensive experiments on five real-world traffic benchmark datasets show that DPMCODE outperforms the compared methods across multiple evaluation metrics, verifying its effectiveness in complex traffic forecasting tasks. Ablation studies further demonstrate that the continuous ODE propagation mechanism, local correlation-aware modeling, the spatiotemporal fusion prediction strategy, and the dynamic adaptive adjacency matrix all contribute to the performance improvement.
Meanwhile, the efficiency analysis shows that the proposed model achieves a favorable balance among prediction accuracy, parameter scale, training cost, and computational complexity, indicating strong potential for practical applications. Future work will explore more complex heterogeneous traffic scenarios, the incorporation of multi-source external factors, and more efficient continuous propagation mechanisms to improve the generalization ability and deployment performance of the model in large-scale real traffic systems.
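To make the notion of continuous propagation concrete, the sketch below integrates a simple linear graph ODE, dH/dt = AH − H, with the classical fourth-order Runge-Kutta method. This right-hand side is a simplified form assumed here for illustration and is not the coupled evolution block of DPMCODE; the toy adjacency matrix, the initial features, and all function names are likewise hypothetical.

```python
def matmul(a, h):
    """Multiply an n x n matrix by an n x d feature matrix."""
    return [[sum(a[i][k] * h[k][c] for k in range(len(h))) for c in range(len(h[0]))]
            for i in range(len(a))]


def graph_ode_rhs(adj, h):
    """Right-hand side dH/dt = A H - H (aggregation minus current state)."""
    ah = matmul(adj, h)
    return [[ah[i][c] - h[i][c] for c in range(len(h[0]))] for i in range(len(h))]


def add_scaled(h, k, scale):
    """Return h + scale * k element-wise as a new matrix."""
    return [[h[i][c] + scale * k[i][c] for c in range(len(h[0]))] for i in range(len(h))]


def rk4_integrate(adj, h0, t_end=1.0, steps=10):
    """Integrate the graph ODE from t = 0 to t_end with classical RK4."""
    h, dt = h0, t_end / steps
    for _ in range(steps):
        k1 = graph_ode_rhs(adj, h)
        k2 = graph_ode_rhs(adj, add_scaled(h, k1, dt / 2))
        k3 = graph_ode_rhs(adj, add_scaled(h, k2, dt / 2))
        k4 = graph_ode_rhs(adj, add_scaled(h, k3, dt))
        # h <- h + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
        for k, weight in ((k1, 1), (k2, 2), (k3, 2), (k4, 1)):
            h = add_scaled(h, k, weight * dt / 6)
    return h


# Symmetric, row-stochastic toy adjacency for 3 nodes; one scalar feature each.
adj = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
h0 = [[1.0], [0.0], [0.0]]
h1 = rk4_integrate(adj, h0, t_end=1.0, steps=10)
print([round(r[0], 3) for r in h1])
```

In this formulation the integration time `t_end` plays the role that the number of stacked layers plays in a discrete graph network, and it can be varied continuously rather than in whole-layer increments.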

Author Contributions

Conceptualization, Y.Z. and Y.Y.; methodology, Y.Z.; software, Y.Z.; validation, P.L. and C.W.; formal analysis, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, Y.Z.; visualization, P.L.; supervision, P.L.; funding acquisition, P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Development Program of Jilin Province, grant number 20250102228JC.

Data Availability Statement

Publicly available datasets were analyzed in this study. The data can be found at: https://github.com/liyaguang/DCRNN (accessed on 1 January 2020).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Yu, H.; Jiang, C.; Fang, Q.; Wei, T.; Xu, L. Deep learning driven spatiotemporal prediction of global carbon emissions from container shipping. Transp. Res. Part D Transp. Environ. 2026, 151, 105169. [Google Scholar] [CrossRef]
  2. Liu, Z.; Zou, G.; Wang, T.; Tu, M.; Wang, H.; Li, Y. Learning and Predicting Traffic Conflicts in Mixed Traffic: A Spatiotemporal Graph Neural Network with Manifold Similarity Learning. Expert Syst. Appl. 2026, 309, 131183. [Google Scholar] [CrossRef]
  3. Ji, J.; Wang, J.; Mou, Y.; Long, C.; Wu, J. How to Break It Down for Building It Up? Theory Guided Graph Decomposition Learning for Spatiotemporal Traffic Prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2026, 48, 5442–5459. [Google Scholar] [CrossRef] [PubMed]
  4. Zhang, Q.; Shang, Q.; Zhou, Q.; Yang, M.; Xu, B.; Yang, Z. DGSSformer: Dynamically Global Aware Spatiotemporal Synchronous Transformer for Traffic Prediction. Expert Syst. Appl. 2026, 307, 131063. [Google Scholar] [CrossRef]
  5. Yang, S.; Wu, Q.; Li, M. Decoupled Multi Spatio Temporal Fusion Graph Convolutional Recurrent Network for Traffic Prediction. Eng. Appl. Artif. Intell. 2026, 163, 112956. [Google Scholar] [CrossRef]
  6. Liu, P.; Zhu, Y.; Yang, Y.; Tang, J.; Jiang, X.; Wang, J. MSF GODE: Multi scale Frequency Domain Learning in Graph Neural ODEs for Accurate Traffic Flow Forecasting. Neurocomputing 2025, 658, 131566. [Google Scholar] [CrossRef]
  7. Luo, Q.; He, S.; Han, X.; Wang, Y.; Li, H. LSTTN: A Long Short Term Transformer Based Spatiotemporal Neural Network for Traffic Flow Forecasting. Knowl.-Based Syst. 2024, 293, 111637. [Google Scholar] [CrossRef]
  8. Méndez, M.; Merayo, M.G.; Núñez, M. Long Term Traffic Flow Forecasting Using a Hybrid CNN BiLSTM Model. Eng. Appl. Artif. Intell. 2023, 121, 106041. [Google Scholar] [CrossRef]
  9. Xu, Y.; Han, L.; Zhu, T.; Sun, L.; Du, B.; Lv, W. Generic Dynamic Graph Convolutional Network for Traffic Flow Forecasting. Inf. Fusion 2023, 100, 101946. [Google Scholar] [CrossRef]
  10. Huo, G.; Zhang, Y.; Wang, B.; Gao, J.; Hu, Y.; Yin, B. Hierarchical Spatio Temporal Graph Convolutional Networks and Transformer Network for Traffic Flow Forecasting. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3855–3867. [Google Scholar] [CrossRef]
  11. Zhao, W.; Yuan, G.; Zhang, Y.; Liu, X.; Liu, S.; Zhang, L. An Interpretable and Efficient Multi Scale Spatio Temporal Neural Network for Traffic Flow Forecasting. Expert Syst. Appl. 2026, 296, 128961. [Google Scholar] [CrossRef]
  12. Liu, P.; Zhu, Y.; Yang, Y.; Wang, C.; Jie, J. A Unified Diffusion Framework for Traffic Imputation and Prediction with Physical Priors. IEEE Trans. Mob. Comput. 2025, 25, 341–357. [Google Scholar] [CrossRef]
  13. Chen, H.; Grant Muller, S. Use of Sequential Learning for Short Term Traffic Flow Forecasting. Transp. Res. Part C Emerg. Technol. 2001, 9, 319–336. [Google Scholar] [CrossRef]
  14. Hung, R.J. Graph Neural Networks for City Scale Electric Vehicle Charging Demand and Road Network Flow Forecasting: Empirical Ablations on Graph Structure and Exogenous Features. Electronics 2026, 15, 859. [Google Scholar] [CrossRef]
  15. Freeman, R.; Bagui, S.S.; Bagui, S.C.; Mink, D.; Cameron, S.; Carvalho, G.C.S.D. A Hybrid Time Series Forecasting Model Combining ARIMA and Decision Trees to Detect Attacks in MITRE ATT&CK Labeled Zeek Log Data. Electronics 2026, 15, 871. [Google Scholar] [CrossRef]
  16. Li, H.; Liu, W.; Chen, H. Multi Scale Graph Decoupling Spatial Temporal Network for Traffic Flow Forecasting in Complex Urban Environments. Electronics 2026, 15, 495. [Google Scholar] [CrossRef]
  17. Pang, J.; Wu, M.; Xie, B.; Bi, Y.; Luo, Z. Dynamic Graph Information Bottleneck for Traffic Prediction. Electronics 2026, 15, 623. [Google Scholar] [CrossRef]
  18. Wei, C.; Chen, C.; Wu, X.; Pan, D.; Yu, Q.; Zheng, X.; Luo, Y. Attention Dynamic Graph Convolutional Network for Traffic Flow Prediction. Eng. Appl. Artif. Intell. 2026, 163, 112642. [Google Scholar] [CrossRef]
  19. Li, X.; Bao, Y. Adaptive Gated Meta Graph Retention Network: A Model for Urban Traffic Flow Prediction. Expert Syst. Appl. 2026, 298, 129703. [Google Scholar] [CrossRef]
  20. Chen, Y.; Xia, D.; Liu, Y.; Zhang, F.; Zhang, W.; Hu, Y.; Li, Y.; Li, H. Mamba CorRL: Mamba Correlation Graph Convolutional Networks with Reinforcement Learning for Traffic Flow Prediction. Eng. Appl. Artif. Intell. 2026, 165, 113369. [Google Scholar] [CrossRef]
  21. Lv, H.; Chen, X.; Xiu, W. TSAformer: A Traffic Flow Prediction Model Based on Cross Dimensional Dependency Capture. Electronics 2026, 15, 231. [Google Scholar] [CrossRef]
  22. Gao, M.; Yu, H.; Jiao, P. MLGO: Multi Layer Graph Neural ODEs for Traffic Forecasting. Neural Netw. 2026, 198, 108540. [Google Scholar] [CrossRef]
  23. Shi, Y.; Zhou, W. MP Transformer: A Hybrid Model Integrating Multi Period ARIMA and Dynamically Gated Attention for Time Series Forecasting. Inf. Technol. Control 2026, 55, 243–256. [Google Scholar] [CrossRef]
  24. Liu, C.; Kou, Y.; Wang, S.; Xie, Z.; Su, Y. Research on Traffic Flow Prediction of Progressive Graph Convolutional Networks Based on Spatio Temporal Self Attention Mechanism. Sci. Rep. 2026, 16, 14112. [Google Scholar] [CrossRef] [PubMed]
  25. Tang, J.; Zhu, R.; Wu, F.; He, X.; Huang, J.; Zhou, X.; Sun, Y. Deep Spatio Temporal Dependent Convolutional LSTM Network for Traffic Flow Prediction. Sci. Rep. 2025, 15, 11743. [Google Scholar] [CrossRef]
  26. Ji, J.; Dong, H. Spatio Temporal Graph Convolutional Networks for Traffic Prediction Considering Multiple Spatio Temporal Information. In Proceedings of the 2024 20th International Conference on Mobility, Sensing and Networking (MSN), Harbin, China, 20–22 December 2024; IEEE: New York, NY, USA, 2024; pp. 730–737. [Google Scholar]
  27. Song, C.; Lin, Y.; Guo, S.; Wan, H. Spatial Temporal Synchronous Graph Convolutional Networks: A New Framework for Spatial Temporal Network Data Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; AAAI: Washington, DC, USA, 2020; Volume 34, pp. 914–921. [Google Scholar]
  28. Zhong, L.; Wang, B.; Tian, Z.; Liu, W.; She, W. WaveGFormer: A Wavelet Enhanced Graph Transformer for Spatio Temporal Traffic Flow Forecasting. Inf. Sci. 2026, 740, 123187. [Google Scholar] [CrossRef]
  29. Li, M.; Zhu, Z. Spatial Temporal Fusion Graph Neural Networks for Traffic Flow Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; AAAI: Washington, DC, USA, 2021; Volume 35, pp. 4189–4196. [Google Scholar]
  30. Hosseini, S.M.; Rahmatinia, S.M.; Hosseini Seno, S.A. Integrated Spatio Temporal Modeling with Hybrid Graph Convolutions and the Graph Fourier Neural Operator for Traffic Prediction. Sci. Rep. 2026, 16, 12945. [Google Scholar] [CrossRef] [PubMed]
  31. Pandey, S.; Sharma, S.; Kumar, R.; Moreira, J.M.; Chandra, J. STARK: Enhancing Traffic Prediction Through Spatiotemporal Adaptive Refinement With Knowledge Distillation. IEEE Trans. Comput. Soc. Syst. 2026, 1–13. [Google Scholar] [CrossRef]
  32. Zhang, H.; Qi, F.; Zhang, Y.; Qin, Y.; Li, Y. A Traffic Flow Forecasting Model Based on Dynamic Graph Learning and Temporally Adaptive Attention. Saf. Sci. 2026, 195, 107063. [Google Scholar] [CrossRef]
  33. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion Convolutional Recurrent Neural Network: Data Driven Traffic Forecasting. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  34. Weng, W.; Fan, J.; Wu, H.; Zhang, Y.; Li, F.; Yu, B. A Decomposition Dynamic Graph Convolutional Recurrent Network for Traffic Forecasting. Pattern Recognit. 2023, 142, 109670. [Google Scholar] [CrossRef]
  35. Hu, J.; Lin, X.; Wang, C. DSTGCN: Dynamic Spatial Temporal Graph Convolutional Network for Traffic Prediction. IEEE Sens. J. 2022, 22, 13116–13124. [Google Scholar] [CrossRef]
  36. Zhou, J.; Qin, X.; Ding, Y.; Ma, H. Spatial Temporal Dynamic Graph Differential Equation Network for Traffic Flow Forecasting. Mathematics 2023, 11, 2867. [Google Scholar] [CrossRef]
  37. Fang, Z.; Long, Q.; Song, G.; Xie, K. Spatial Temporal Graph ODE Networks for Traffic Flow Forecasting. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual, 14–18 August 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 364–373. [Google Scholar]
  38. Jin, M.; Zheng, Y.; Li, Y.F.; Chen, S.; Yang, B.; Pan, S. Multivariate Time Series Forecasting with Dynamic Graph Neural ODEs. IEEE Trans. Knowl. Data Eng. 2022, 35, 9168–9180. [Google Scholar] [CrossRef]
  39. Wiseman, Y. Real-Time Monitoring of Traffic Congestions. In Proceedings of the IEEE International Conference on Electro Information Technology, Lincoln, NE, USA, 14–17 May 2017; IEEE: New York, NY, USA, 2017; pp. 501–505. [Google Scholar]
  40. Jiang, S.; Huang, S. Robust Learning of Huber Loss under Exponentially Strongly Mixing Sequence. J. Complex. 2026, 95, 102045. [Google Scholar] [CrossRef]
Figure 1. Traffic network graph and spatiotemporal graph.
Figure 2. Traffic accident simulation diagram.
Figure 3. DPMCODE model.
Figure 4. Efficiency-performance trade-off among different model variants on the PEMS-BAY dataset. The x-axis and y-axis denote training time and MAE, respectively. Bubble size represents the number of parameters, and color indicates FLOPs.
Figure 5. Hyperparameter sensitivity analysis of DPMCODE on the PEMS-BAY dataset. Subfigures (a–c) illustrate the effects of the number of layers, embedding dimension, and batch size on MAE, MAPE, and RMSE, respectively, showing that the proposed model achieves the best performance under moderate parameter settings.
Figure 6. Comparative visualization of traffic flow predictions by the DPMCODE and STGODE models for nodes 120 and 58 in the PEMS08 dataset.
Figure 7. Heatmap visualization of the dynamic adaptive graphs at time steps t = 2, 4, and 6.
Table 1. Comparison between DPMCODE and related methods.

| Method | Graph Structure Type | Dynamic Graph Update | Continuous Modeling Mechanism | Local Non-Adjacent Relation Modeling |
| --- | --- | --- | --- | --- |
| STGCN | Predefined static graph | × | × | × |
| STFGNN | Predefined static graph | × | × | × |
| DCRNN | Predefined diffusion graph | × | × | ✓ |
| DGCRN | Adaptive dynamic graph | ✓ | × | × |
| ST-DGDE | Adaptive dynamic graph | ✓ | ✓ | × |
| STGODE | Predefined static graph | × | ✓ | × |
| DPMCODE | Probabilistic adaptive dynamic graph | ✓ | ✓ | ✓ |

Table 2. A brief overview of the datasets used in the experiments.

| Dataset | \|V\| | \|E\| | Time Steps | Time Range |
| --- | --- | --- | --- | --- |
| PEMS03 | 358 | 547 | 26,208 | 09/2018–11/2018 |
| PEMS04 | 307 | 340 | 16,992 | 01/2018–02/2018 |
| PEMSD7 | 228 | 1132 | 12,672 | 05/2012–06/2012 |
| PEMS08 | 170 | 295 | 17,856 | 07/2016–08/2016 |
| PEMS-BAY | 325 | 2369 | 52,116 | 01/2017–05/2017 |

Table 3. Main hyperparameter settings of DPMCODE.

| Hyperparameter | Value |
| --- | --- |
| Framework | Python + PyTorch |
| GPU | NVIDIA GeForce RTX 3090 |
| Optimizer | Adam |
| Input length | 12 |
| Prediction horizon | 12 |
| Number of layers | 2 |
| Embedding dimension | 32 |
| Batch size | 16 |
| Learning rate | 0.001 |

Table 4. Forecasting performance comparison of models on PEMS03/04/08. Lower MAE/RMSE/MAPE indicates better performance. The best results are shown in bold, and the second-best results are underlined.

| Dataset | Metric | ARIMA | FC-LSTM | GraphWaveNet | STGCN | STGODE | DCRNN | DGCRN | STSGCN | Ours |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PEMS03 | MAE | 35.41 | 22.33 | 19.12 | 17.55 | 16.50 | 17.99 | 15.98 | 17.48 | **15.89** |
| PEMS03 | RMSE | 47.59 | 35.11 | 32.77 | 30.42 | 27.84 | 30.36 | 27.41 | 29.21 | **27.31** |
| PEMS03 | MAPE | 33.78 | 25.33 | 18.89 | 17.34 | 16.69 | 18.34 | 17.36 | <u>16.87</u> | **16.38** |
| PEMS04 | MAE | 33.73 | 26.77 | 24.89 | 21.16 | 20.84 | <u>22.11</u> | 22.84 | 19.77 | **18.52** |
| PEMS04 | RMSE | 48.80 | 40.65 | 39.66 | 34.72 | 32.84 | <u>34.46</u> | 33.62 | 30.44 | **29.41** |
| PEMS04 | MAPE | 24.18 | 18.23 | 17.29 | 13.83 | <u>13.77</u> | 14.17 | 14.64 | 13.00 | **11.99** |
| PEMS08 | MAE | 31.09 | 23.09 | 18.28 | 17.15 | <u>15.94</u> | 16.86 | 16.22 | 17.38 | **15.83** |
| PEMS08 | RMSE | 44.32 | 35.17 | 30.05 | 27.09 | 26.28 | <u>26.36</u> | 26.10 | 27.28 | **23.91** |
| PEMS08 | MAPE | 22.73 | 14.99 | 12.15 | 11.29 | <u>11.09</u> | 12.06 | 12.06 | 10.96 | **10.80** |

Table 5. Forecasting performance comparison of models on PEMSD7 and PEMS-BAY. Lower MAE/RMSE/MAPE indicates better performance. The best results are shown in bold, and the second-best results are underlined.

| Dataset | Metric | ARIMA | GraphWaveNet | DCRNN | STSGCN | STGCN | STGODE | Ours |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PEMSD7 | MAE | 7.27 | 3.19 | 3.83 | 3.04 | 4.01 | <u>2.97</u> | **2.88** |
| PEMSD7 | RMSE | 13.20 | 6.24 | 7.18 | 5.93 | 7.55 | <u>5.66</u> | **5.21** |
| PEMSD7 | MAPE | 10.38 | 8.02 | 9.81 | 7.55 | 9.67 | <u>7.36</u> | **7.33** |
| PEMS-BAY | MAE | 3.38 | <u>1.95</u> | 2.07 | 2.02 | 2.49 | 2.04 | **1.63** |
| PEMS-BAY | RMSE | 6.50 | <u>4.48</u> | 4.74 | 4.63 | 5.69 | 4.89 | **3.21** |
| PEMS-BAY | MAPE | 8.30 | <u>4.61</u> | 4.90 | 4.79 | 5.79 | 4.61 | **3.72** |

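The three metrics reported in the comparison tables can be computed as in the sketch below, which uses the standard textbook definitions. It is assumed, not confirmed by this section, that the paper uses the same definitions; skipping zero-valued targets in MAPE is a common convention adopted here as a further assumption, and the sample values are made up.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, ignoring zero targets."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


# Made-up ground truth and predictions for four observations.
y_true = [100.0, 200.0, 0.0, 400.0]
y_pred = [110.0, 190.0, 5.0, 380.0]

print(mae(y_true, y_pred))             # 11.25
print(rmse(y_true, y_pred))            # 12.5
print(round(mape(y_true, y_pred), 2))  # 6.67
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE normalizes each error by the magnitude of the true value, which is why the three metrics can rank models differently.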
Table 6. Ablation study on the PEMS-BAY dataset.

| Model | MAE | Training Time (s) | Inference Time (s) | Parameters | FLOPs (G) |
| --- | --- | --- | --- | --- | --- |
| BaseLine | 1.89 | 54 | 22 | 84,631 | 0.45 |
| w/o ODE | 1.76 | 132 | 46 | 361,248 | 1.01 |
| w/o LCA | 1.72 | 145 | 51 | 428,516 | 1.22 |
| w/o SFP | 1.69 | 153 | 55 | 487,903 | 1.31 |
| w/o Gumbel | 1.74 | 151 | 52 | 474,309 | 1.29 |
| DPMCODE-S | 1.71 | 161 | 58 | 513,290 | 1.45 |
| DPMCODE-D | 1.67 | 178 | 64 | 615,894 | 1.85 |
| DPMCODE | 1.63 | 170 | 60 | 546,937 | 1.57 |

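As a worked reading of the ablation results, the sketch below computes the relative MAE increase of each ablated variant with respect to the full model. The MAE values are copied from Table 6; the percentages are derived here by simple arithmetic and are not figures reported in the paper.

```python
# MAE values on PEMS-BAY, taken from Table 6.
full_mae = 1.63
ablation_mae = {
    "w/o ODE": 1.76,
    "w/o LCA": 1.72,
    "w/o SFP": 1.69,
    "w/o Gumbel": 1.74,
}


def relative_increase(variant, reference):
    """Percentage by which the variant's error exceeds the reference."""
    return 100.0 * (variant - reference) / reference


increase = {name: relative_increase(v, full_mae) for name, v in ablation_mae.items()}
for name, pct in sorted(increase.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{pct:.2f}% MAE")
```

Under this reading, removing the ODE propagation produces the largest MAE increase among the four ablated variants, followed by removing the Gumbel-based graph sampling.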
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhu, Y.; Wang, C.; Liu, P.; Yang, Y. Dynamic Graph Construction and Continuous Spatiotemporal Evolution for Traffic Forecasting. Electronics 2026, 15, 2369. https://doi.org/10.3390/electronics15112369


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
